Standard metrics like precision and recall can feel limiting in the context of predictive maintenance. These models deal with complex, imbalanced datasets where failures are rare but expensive, and false alarms are costly.
Most of the time, traditional metrics don't capture the true financial impact of a prediction, such as preventing a machine breakdown or optimizing a maintenance schedule.
How do you know whether your model is truly reducing costs, saving time, or improving maintenance efficiency?
In this blog post, you’ll learn how to craft classification metrics that not only reflect model performance but also connect directly to business outcomes.
The ML Use Case at Hand
I developed a predictive maintenance model using a Gradient Boosting Classifier to identify potential machine failures in a milling process. The dataset is synthetic, with 10,000 rows and 14 features simulating the typical operation of a milling machine. These features include air and process temperature, rotational speed, torque, and tool wear, which collectively provide a comprehensive view of the machine's condition during operation.
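For reference, here is a minimal sketch of how such a model could be trained. The file name, feature selection, and hyperparameters are assumptions: the column names follow the public UCI copy of the AI4I 2020 dataset, and the blog does not specify the exact configuration used.

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical loading step; assumes the UCI copy of the AI4I 2020 dataset saved as ai4i2020.csv
df = pd.read_csv("ai4i2020.csv")

features = [
    "Air temperature [K]",
    "Process temperature [K]",
    "Rotational speed [rpm]",
    "Torque [Nm]",
    "Tool wear [min]",
]
X = df[features]
y = df["Machine failure"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Default hyperparameters shown for illustration only
model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

Because failures are rare in this dataset, the resulting class imbalance is exactly why the metrics below matter more than plain accuracy.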
For a data scientist, these features open up numerous possibilities to explore custom metrics tailored to the specific nuances of predictive maintenance.
Only Two Python Functions
NannyML Cloud allows you to add custom metrics with just two Python functions: the calculate function and the optional estimate function.

The first function, calculate, is mandatory and serves as the foundation for evaluating your custom metric. If you have ground truth data available, calculate assesses how well your model performed by comparing its predictions to the actual outcomes. Without ground truth to validate predictions, it becomes difficult to determine whether a machine would have failed had no maintenance been performed, or whether a prediction of continued operation was correct.

The estimate function comes into play when ground truth is missing or delayed. It provides an approximation of performance based on calibrated probabilities, which NannyML generates from the model's predictions, so you can monitor your model even without real-time labels.

Whether you have access to true target values or need to estimate performance, you have everything you need to track your model's impact.
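Concretely, these are two plain Python functions with fixed signatures. Here is a minimal skeleton with placeholder bodies, mirroring the signatures used in the MCC example below:

import pandas as pd

def calculate(
    y_true: pd.Series,   # ground truth labels, when available
    y_pred: pd.Series,   # model predictions
    **kwargs
) -> float:
    # Compare predictions against actual outcomes and return a single score
    ...

def estimate(
    estimated_target_probabilities: pd.DataFrame,  # calibrated probabilities generated by NannyML
    y_pred: pd.Series,                             # model predictions
    **kwargs
) -> float:
    # Approximate the same score when ground truth is missing or delayed
    ...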
Matthews Correlation Coefficient
The Matthews Correlation Coefficient (MCC) is a solid choice for evaluating predictive maintenance models, like the one we've built for the milling machine.
It’s especially useful when your dataset is imbalanced, which is often the case in predictive maintenance.
Here is the code for the calculate and estimate functions:
from sklearn.metrics import matthews_corrcoef
import pandas as pd

def calculate(
    y_true: pd.Series,
    y_pred: pd.Series,
    **kwargs
) -> float:
    # Cast labels to integers before scoring against ground truth
    y_true = y_true.astype(int)
    y_pred = y_pred.astype(int)
    return matthews_corrcoef(y_true, y_pred)
import numpy as np
import pandas as pd

def estimate(
    estimated_target_probabilities: pd.DataFrame,
    y_pred: pd.Series,
    **kwargs
) -> float:
    y_pred = np.asarray(y_pred)
    estimated_target_probabilities = estimated_target_probabilities.to_numpy().ravel()

    # Drop rows with missing values so probabilities and predictions stay aligned
    data = pd.DataFrame({
        'estimated_target_probabilities': estimated_target_probabilities,
        'y_pred': y_pred
    })
    data.dropna(axis=0, inplace=True)
    estimated_target_probabilities = data['estimated_target_probabilities'].to_numpy()
    y_pred = data['y_pred'].astype(int).to_numpy()

    # Calculate estimated confusion matrix elements from the calibrated probabilities
    tp = np.sum(np.where(y_pred == 1, estimated_target_probabilities, 0))
    fp = np.sum(np.where(y_pred == 1, 1 - estimated_target_probabilities, 0))
    fn = np.sum(np.where(y_pred == 0, estimated_target_probabilities, 0))
    tn = np.sum(np.where(y_pred == 0, 1 - estimated_target_probabilities, 0))

    numerator = (tp * tn) - (fp * fn)
    denominator = np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # MCC is undefined when the denominator is 0; return 0, matching sklearn's convention
    if denominator == 0:
        return 0.0
    mcc_estimated = numerator / denominator
    return np.nan_to_num(mcc_estimated)
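Written out, the quantity that both calculate and estimate derive from the (estimated) confusion matrix is the standard MCC formula:

\[
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\]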
We’ve created a range of resources to support you as you add these Python functions and metrics to the dashboard. Check out our documentation for detailed steps, explore our blog post for a full walk-through, or watch our webinar to see it all in action.
Once the metric has been added correctly in the dashboard, it will be displayed on the Performance page like so:
This formula gives a value between -1 and 1, where 1 represents perfect predictions, 0 is equivalent to random guessing, and -1 indicates total disagreement between predictions and actual outcomes.
Business Value Metric
The Business Value Metric is useful when communicating with non-technical stakeholders because it translates technical performance into direct financial impact.
Here is how I configured it for this model:
The metric will be visualised as follows:
When ground truth labels are unavailable, NannyML offers performance estimation algorithms, which estimate the confusion matrix and apply the same business value formula to approximate the model’s financial impact.
This metric is also available for multi-class models.
In the above image, the first two rows are default rules that cannot be removed; you can only change the weights. The first row is essentially “correct prediction” whereas the second row is “incorrect prediction”.
All the rows after that let you configure overrides: if, say, a correct prediction for a specific class carries a higher value, you can add a rule for it with the appropriate weight.
These rules are applied from top to bottom without any checks on duplicate rows.
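Under the hood, the result is a weighted sum over the confusion matrix: each cell's count is multiplied by the weight your rules assign to it. Here is a minimal sketch of that calculation for the binary case; the weight values are purely illustrative, and when ground truth is missing the counts would come from the estimated confusion matrix instead.

import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative weights: rows are actual outcomes (0 = no failure, 1 = failure),
# columns are predictions, so e.g. a missed failure (FN) costs 5000 per occurrence
value_matrix = np.array([
    [0.0,     -100.0],  # TN value, FP cost
    [-5000.0,  500.0],  # FN cost, TP value
])

def business_value(y_true, y_pred, value_matrix):
    # Count TN, FP, FN, TP and weight each cell by its business impact
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    return float((cm * value_matrix).sum())

Because every prediction contributes its weight exactly once, the total can be read directly as money gained or lost over the monitored period.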
In the dataset, the milling machine has five independent failure modes: tool wear failure, heat dissipation failure, power failure, overstrain failure, and random failure.
Misclassifying a tool wear failure as a heat dissipation issue wastes time and effort troubleshooting the wrong problem, potentially doubling both the repair time and the technician costs. The weight of every correct and incorrect classification should be determined together with your team.
Conclusion
In this blog, we covered how to create and add custom metrics to your model monitoring workflow.
We looked at metrics like the Matthews Correlation Coefficient and business value estimation. These metrics help you communicate clearly with stakeholders and show the financial impact of predictive maintenance.
At NannyML, we know that each use case brings unique challenges. If you're facing specific hurdles with model monitoring, consider scheduling a demo with our founders. They’re here to discuss your needs and help you get the best out of your data science solutions.
Read More…
Looking for more custom metrics?
If you’re interested in learning how to maintain and monitor ML models in manufacturing, take a look at my other blogs.
References
The dataset used in the blog is part of the following publication: S. Matzka, "Explainable Artificial Intelligence for Predictive Maintenance Applications," 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), 2020, pp. 69-74, doi: 10.1109/AI4I49448.2020.00023.