Building Custom Metrics for Predictive Maintenance

Standard metrics like precision and recall can feel limiting in the context of predictive maintenance. These models deal with complex, imbalanced datasets where failures are rare but expensive, and false alarms are costly.
Most of the time, traditional metrics fail to capture the true financial impact of predictions, such as preventing a machine breakdown or optimizing maintenance schedules. What you need is custom metrics.
How do you know whether your model is truly reducing costs, saving time, or improving maintenance efficiency?
In this blog post, you’ll learn how to craft classification metrics that not only reflect model performance but also connect directly to business outcomes.

The ML Use Case at Hand

I developed a predictive maintenance model using a Gradient Boosting Classifier to identify potential machine failures in a milling process. The dataset is synthetic, with 10,000 rows and 14 features simulating the typical operation of a milling machine. These features include air and process temperature, rotational speed, torque, and tool wear, which collectively provide a comprehensive view of the machine's condition during operation.
For a data scientist, these features open up numerous possibilities to explore custom metrics tailored to the specific nuances of predictive maintenance.
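As a rough sketch, training such a model might look like the following. The column names come from the AI4I 2020 dataset; the file path and the train/test setup are assumptions for illustration.

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Load the AI4I 2020 dataset (assumed to be saved locally as ai4i2020.csv)
df = pd.read_csv("ai4i2020.csv")

features = [
    "Air temperature [K]",
    "Process temperature [K]",
    "Rotational speed [rpm]",
    "Torque [Nm]",
    "Tool wear [min]",
]
X = df[features]
y = df["Machine failure"]  # 1 = failure, 0 = normal operation

# Hold out a test set; stratify because failures are rare
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)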

Only Two Python Functions

NannyML Cloud allows you to add custom metrics with just two Python functions: the calculate function and the optional estimate function.
The first function, calculate, is mandatory and serves as the foundation for evaluating your custom metric. If you have ground truth data available, calculate will assess how well your model performed by comparing its predictions to actual outcomes.
Without ground truth data to validate predictions, it becomes difficult to determine whether a machine would have failed had no maintenance been performed or whether a prediction of continued operation is correct.
The estimate function comes into play when ground truth is missing or delayed. It can provide an approximation of performance based on calibrated probabilities, ensuring that you can monitor your model even without real-time labels.
These calibrated probabilities are generated by NannyML from the model's predictions.
Whether you have access to true target values or need to estimate performance, you have everything you need to track your model's impact.
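In practice, the contract is just two signatures. Here is a minimal skeleton; the concrete MCC implementations in the next section fill in the bodies.

import pandas as pd

def calculate(y_true: pd.Series, y_pred: pd.Series, **kwargs) -> float:
    # Compare predictions against ground truth and return the metric value
    ...

def estimate(estimated_target_probabilities: pd.DataFrame, y_pred: pd.Series, **kwargs) -> float:
    # Approximate the metric from NannyML's calibrated probabilities
    # when ground truth is missing or delayed
    ...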

Matthews Correlation Coefficient

The Matthews Correlation Coefficient (MCC) is a solid choice for evaluating predictive maintenance models, like the one we've built for the milling machine.
It’s especially useful when your dataset is unbalanced, which is often the case in predictive maintenance.
Here is the code for the calculate and estimate functions:
from sklearn.metrics import matthews_corrcoef
import numpy as np
import pandas as pd

def calculate(
    y_true: pd.Series,
    y_pred: pd.Series,
    **kwargs
) -> float:
    # Ensure both series are integer-encoded (0 = no failure, 1 = failure)
    y_true = y_true.astype(int)
    y_pred = y_pred.astype(int)

    return matthews_corrcoef(y_true, y_pred)

def estimate(
    estimated_target_probabilities: pd.DataFrame,
    y_pred: pd.Series,
    **kwargs
) -> float:
    y_pred = np.asarray(y_pred)
    estimated_target_probabilities = estimated_target_probabilities.to_numpy().ravel()

    # Align predictions with their calibrated probabilities and drop missing rows
    data = pd.DataFrame({
        'estimated_target_probabilities': estimated_target_probabilities,
        'y_pred': y_pred
    })
    data.dropna(axis=0, inplace=True)

    estimated_target_probabilities = data['estimated_target_probabilities'].to_numpy()
    y_pred = data['y_pred'].astype(int).to_numpy()

    # Estimate the expected confusion matrix elements: each positive prediction
    # contributes its calibrated failure probability to TP and the complement
    # to FP; negative predictions contribute to FN and TN analogously
    tp = np.sum(np.where(y_pred == 1, estimated_target_probabilities, 0))
    fp = np.sum(np.where(y_pred == 1, 1 - estimated_target_probabilities, 0))
    fn = np.sum(np.where(y_pred == 0, estimated_target_probabilities, 0))
    tn = np.sum(np.where(y_pred == 0, 1 - estimated_target_probabilities, 0))

    numerator = (tp * tn) - (fp * fn)
    denominator = np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # MCC is undefined when a marginal sum is zero; follow the
    # sklearn convention and return 0 in that case
    if denominator == 0:
        return 0.0

    mcc_estimated = numerator / denominator
    return float(np.nan_to_num(mcc_estimated))
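As a quick sanity check before adding the functions to the dashboard, you can call both on a handful of hand-made predictions (the values below are toy data for illustration, assuming the functions and imports above are in scope):

import pandas as pd

# Toy data: six predictions, two actual failures
y_true = pd.Series([0, 0, 1, 0, 1, 0])
y_pred = pd.Series([0, 0, 1, 1, 1, 0])
probs = pd.DataFrame({'estimated_target_probabilities': [0.1, 0.2, 0.9, 0.4, 0.8, 0.1]})

print(calculate(y_true=y_true, y_pred=y_pred))                         # realized MCC, ~0.71
print(estimate(estimated_target_probabilities=probs, y_pred=y_pred))   # estimated MCC, ~0.57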
📌 We’ve created a range of resources to support you as you add these Python functions and metrics to the dashboard. Check out our documentation for detailed steps, explore our blog post for a full walk-through, or watch our webinar to see it all in action.
Once the metric has been added correctly in the dashboard, it will be displayed on the Performance page like so:
Matthews Correlation Coefficient Metric Plot
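The code above implements the standard MCC formula, here applied to the estimated confusion matrix counts:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))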
This formula gives a value between -1 and 1, where 1 represents a perfect prediction, 0 is equivalent to random guessing, and -1 indicates a completely wrong model.

Business Value Metric

The Business Value Metric is useful when communicating with non-technical stakeholders because it translates technical performance into direct financial impact.
Here is how I configured it for this model:
Business Value Configuration Panel
The metric will be visualised as follows:
Business Value Metric for Predictive Maintenance (Binary Classification)
When ground truth labels are unavailable, NannyML offers performance estimation algorithms, which estimate the confusion matrix and apply the same business value formula to approximate the model’s financial impact.
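Conceptually, the business value boils down to a weighted sum over the confusion matrix. Here is a minimal sketch; the weights are made-up placeholders, since in NannyML Cloud you set them in the configuration panel shown above:

def business_value(tp, fp, fn, tn,
                   value_tp=2500.0,   # e.g. savings from a correctly predicted failure
                   value_fp=-150.0,   # e.g. cost of an unnecessary inspection
                   value_fn=-5000.0,  # e.g. cost of an unplanned breakdown
                   value_tn=0.0):     # no action taken, no cost incurred
    # Weighted sum over (realized or estimated) confusion matrix counts
    return (tp * value_tp) + (fp * value_fp) + (fn * value_fn) + (tn * value_tn)

# Works with realized counts or with the estimated counts from estimate()
print(business_value(tp=12, fp=30, fn=3, tn=9955))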
This metric is also available for multi-class models.
Business Value Configuration for Multi-Class Classification
In the above image, the first two rows are default rules that cannot be removed; you can only change the weights. The first row is essentially “correct prediction” whereas the second row is “incorrect prediction”.
All the rows after that allow you to configure overrides: if, say, a correct prediction for a specific class carries a higher value, you can add a rule for it with the appropriate weight. These rules are applied from top to bottom without any checks for duplicate rows.
In the dataset, the milling machine had five independent failure modes: tool wear failure, heat dissipation failure, power failure, overstrain failure, and random failure.
Misclassifying a tool wear failure as a heat dissipation issue wastes time and effort troubleshooting the wrong problem, potentially doubling both the time required and the technician costs. The weight of every correct and incorrect classification should be determined together with your team.
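To make this concrete, here is a small sketch of how such a rule list could be evaluated. This is a conceptual illustration with hypothetical weights, not NannyML's internal implementation:

# The first two rows of the panel are fixed defaults; every row after
# them is an override for a specific (true, predicted) combination.
# Weights below are made-up placeholders.
default_correct = 1.0
default_incorrect = -1.0
overrides = {
    # Misreading tool wear as a heat dissipation issue doubles the cost
    ('tool wear failure', 'heat dissipation failure'): -2.0,
}

def rule_weight(true_label: str, pred_label: str) -> float:
    # Specific overrides take precedence over the two default rules
    if (true_label, pred_label) in overrides:
        return overrides[(true_label, pred_label)]
    return default_correct if true_label == pred_label else default_incorrect

# Business value of a batch of predictions = sum of per-row weights
pairs = [('power failure', 'power failure'),
         ('tool wear failure', 'heat dissipation failure')]
print(sum(rule_weight(t, p) for t, p in pairs))  # 1.0 + (-2.0) = -1.0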
Business Value Metric for Predictive Maintenance for Multi-Class Classification

Conclusion

In this blog, we covered how to create and add custom metrics to your model monitoring workflow.
We looked at metrics like the Matthews Correlation Coefficient and business value estimation. These metrics help you communicate clearly with stakeholders and show the financial impact of predictive maintenance.
At NannyML, we know that each use case brings unique challenges. If you're facing specific hurdles with model monitoring, consider scheduling a demo with our founders. They’re here to discuss your needs and help you get the best out of your data science solutions.

Read More…

Looking for more custom metrics?
If you’re interested in learning how to maintain and monitor ML models in manufacturing, take a look at my other blogs.

References

The dataset used in the blog is part of the following publication: S. Matzka, "Explainable Artificial Intelligence for Predictive Maintenance Applications," 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), 2020, pp. 69-74, doi: 10.1109/AI4I49448.2020.00023.

Ready to learn how well your ML models are working?

Join 1,100+ other data scientists now!

Subscribe

Written by

Kavita Rana

Data Science Intern at NannyML