Using Concept Drift as a Model Retraining Trigger

Discover how NannyML’s innovative Reverse Concept Drift (RCD) algorithm optimizes retraining schedules and ensures accurate, timely interventions when concept drift impacts model performance.

Introduction

Machine learning (ML) models undergo AI "aging" after deployment. Vela et al. coined this term to describe how ML model performance degrades in complex ways over time.
Many companies use an automated retraining schedule to prevent AI aging and keep model performance within their specified levels. Yet triggering retraining can be costly: it requires additional compute and time, and automated retraining risks producing a worse-performing model.
However, retraining is a vital issue-resolution tool when the ML model's learned pattern is no longer relevant, as in cases of concept drift (a changing relationship between the input and target space).
So, how can we trigger retraining on performance drops attributed to concept drift?
This blog post will guide you through NannyML's new Reverse Concept Drift (RCD) algorithm. The RCD algorithm helps you detect when your ML model is undergoing concept drift and quantify how that drift affects your model's performance. Additionally, we will walk you through our proposed automated retraining workflow to ensure that retrained models always provide business value.

Standard NannyML ML Workflow

NannyML can capture performance degradation without targets through our performance estimation algorithms. Image by NannyML designers.
As discussed in the blog Monitoring Workflow for Machine Learning Systems, NannyML advocates for a three-pronged approach to monitoring ML models. This approach centers on performance monitoring and estimation. No further action is needed if performance does not degrade below the business’s defined threshold levels.
If a performance degradation occurs, a root cause analysis should be completed. The root cause analysis ensures that the issue-resolution steps address the reason for the performance drop. Miles Weberman wrote an excellent blog, Retraining Is Not All You Need, which explains how to resolve performance degradation through targeted, issue-specific resolution. His blog also demonstrates better first-pass solutions than retraining, especially in cases of poor data quality or when the input features cover only a small part of the actual population space. In these instances, retraining could result in a worse-performing model.
The NannyML proposed ML monitoring workflow. Image conception by NannyML designers.
Yet, suppose there is an alert that performance has fallen, and following a root cause analysis, concept drift is to blame. Here, retraining is appropriate, as the ML model needs to learn the new underlying relationships between the targets and the input space. But what is concept drift, and how does it affect your model?

Concept Drift: Deep Dive

What is concept drift?

Concept drift is the phenomenon whereby the pattern between an ML model's input and output changes. Since the model is trained to predict this relationship, the changed pattern will result in the model's unsatisfactory performance for future predictions.
Mathematically, concept drift is defined as a change in $P(Y|X)$ with no change in $P(X)$, where $P(X)$ refers to the distribution of the input space and $P(Y|X)$ is the distribution of the targets given the inputs, i.e., the pattern being modeled.
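To make this concrete, here is a minimal, self-contained simulation (illustrative only, not part of NannyML) in which the input distribution $P(X)$ stays fixed while the labeling rule $P(Y|X)$ changes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# P(X) stays fixed: the same input distribution before and after the drift.
X_old = rng.normal(size=(5000, 2))
X_new = rng.normal(size=(5000, 2))

# P(Y|X) changes: the labeling rule (the "concept") is different afterwards.
y_old = (X_old[:, 0] + X_old[:, 1] > 0).astype(int)  # old concept
y_new = (X_new[:, 0] - X_new[:, 1] > 0).astype(int)  # new concept

model = LogisticRegression().fit(X_old, y_old)

print("accuracy on old concept:", accuracy_score(y_old, model.predict(X_old)))
print("accuracy on new concept:", accuracy_score(y_new, model.predict(X_new)))
# The inputs look identical, yet performance collapses because P(Y|X) changed.
```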
Concept Drift Walkthrough. Image by NannyML designers.
For instance, let's say the underlying pattern between the input space and the targets shifts, causing the decision boundary between the two classes to move. Samples that previously belonged to the negative class (purple) would now belong to the positive class (blue), even though their features are unchanged. Since the model still predicts with the old boundary, performance would substantially decrease.
Decision Boundary Change Due to Concept Drift. Image by the author.

Real-world example

A real-world example of concept drift can be illustrated by the imagined case of a bank that uses an ML model to classify loan applications. The original model is trained to classify loan applications as high-risk (negative class = purple) or low-risk (positive class = blue). The model was trained on historical data such that:
  • High-Risk (Negative Class - Purple): Loan applications from individuals with low credit scores, high debt-to-income ratios, and frequent late payments are classified as high-risk.
  • Low-Risk (Positive Class - Blue): Loan applications from individuals with high credit scores, low debt-to-income ratios, and consistent on-time payments are classified as low-risk.
Over time, however, the characteristics of high-risk and low-risk borrowers might change due to shifts in the economy that alter consumer behavior, new financial regulations that alter creditworthiness criteria, or the entrance of new financial products like cryptocurrencies.
Due to these changes, the decision boundary between high-risk and low-risk borrowers shifts, and more high-risk loans are classified as low-risk. This misclassification would lead to loan approval for individuals who might default, increasing the bank's financial risk and potential for significant losses.

Impact on the ML Model Performance

Let’s understand how concept drift impacts model performance to justify retraining for issue resolution. To contextualize this retraining workflow, we will compare the impact of concept drift and covariate shift on performance.
The impact of a covariate shift on model performance depends on how the input feature distribution changes and does not always lead to performance degradation. For instance, the model's performance could improve if the data drifts away from the class boundary. If performance has decreased, however, a range of issue resolution steps, such as shifting the prediction thresholds for model classification decisions, could be more appropriate than retraining.
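As a hedged illustration of that alternative, here is a minimal sketch (the names `y_recent` and `proba_recent` are hypothetical stand-ins for a recent labeled window and the model's scores on it) that re-tunes the decision threshold instead of retraining:

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, y_scores, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the decision threshold that maximizes F1 on a recent labeled window."""
    f1s = [f1_score(y_true, (y_scores >= t).astype(int)) for t in grid]
    return float(grid[int(np.argmax(f1s))])

# Hypothetical usage:
# new_threshold = best_threshold(y_recent, proba_recent)
# predictions = (proba_recent >= new_threshold).astype(int)
```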
Concept drift, however, almost always leads to performance degradation, so we know that some form of corrective action will be needed. Retraining the model will usually rectify its performance, provided data quality remains satisfactory. So how can we set up a retraining schedule that triggers based on concept drift?

Reversed Concept Drift (RCD) Algorithm

Reverse Concept Drift (RCD) is an algorithm developed by NannyML and only available through the NannyML Cloud service. The algorithm enables monitoring of the magnitude of concept drift and its impact on model performance. The algorithm trains a new comparison model on the monitored data to make predictions on the reference dataset. This approach helps us estimate how the original model under investigation, called the monitored model, would perform if the reference data followed the same concept as the monitored data.
💡
Data Chunk
A data chunk is NannyML terminology for a data sample. All NannyML algorithms work on the data chunk level, typically derived from a time period.
Datasets: Reference and Monitored
NannyML operates on two different datasets: a reference and a monitored dataset.
The reference dataset contains observations for a time period during which the model had acceptable performance levels. Depending on the time span the model has been in production, this could be the test dataset or a selected benchmark dataset.
The monitored dataset is a subset of the data with observations for an analysis period you want NannyML to evaluate.
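In code, splitting production data into these two datasets is typically a simple filter on time. The file name, column names, and cut-off date below are hypothetical:

```python
import pandas as pd

# Hypothetical production log with timestamps, features, predictions, and targets.
df = pd.read_parquet("loan_model_logs.parquet")
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Reference: a period with known, acceptable performance (e.g., the test window).
reference = df[df["timestamp"] < "2019-01-01"]

# Monitored: the analysis period you want NannyML to evaluate.
monitored = df[df["timestamp"] >= "2019-01-01"]
```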

Understanding the algorithm

Steps of the RCD algorithm. Image by author.
💡
It is important to note that concept drift detection relies on having access to the actual targets within the monitoring dataset.
The steps of the algorithm follow from the flow chart above:
  1. Train a new model (the comparison model) on the current chunk of the monitored dataset.
  2. Use this comparison model to make predictions on the same reference dataset chunk as the original monitored model.
  3. Measure the difference between the comparison and monitored models to estimate the magnitude or performance impact of the concept drift.
  4. Plot the predictions against time. If the predictions are similar, concept drift is unlikely to have occurred; if they differ, concept drift has likely taken place.
RCD underlying algorithm visualized. Image by NannyML designers.
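A minimal sketch of these steps for a single chunk, using a gradient boosting comparison model as suggested in the assumptions below. This illustrates the idea only and is not NannyML's actual implementation:

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score

def rcd_style_check(reference_X, monitored_ref_preds, chunk_X, chunk_y):
    """Fit a comparison model on one chunk of monitored data (the new concept),
    replay it on the reference inputs, and score the monitored model's own
    reference predictions against the comparison model's predictions."""
    comparison = HistGradientBoostingClassifier().fit(chunk_X, chunk_y)
    ref_preds_new_concept = comparison.predict(reference_X)
    # High agreement -> the concept is stable; a drop signals concept drift.
    return accuracy_score(ref_preds_new_concept, monitored_ref_preds)
```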

Assumptions

While the RCD algorithm is robust, it has some underlying assumptions and requirements that must be met to enable its accuracy.
  • The comparison model can capture the underlying pattern within the data. This assumption generally holds, especially when using a gradient boosting (GB) model, given GB's ability to capture complex data relationships.
  • The chunk size is large enough to capture a new concept (~1,000 data points). The chunk size can be adjusted at the point of retraining. With smaller chunks, the tradeoff is between adequately capturing the concept and business-case value (i.e., monitoring runs every day vs. every week, depending on chunk size).
  • No covariate shift to unseen regions. While the algorithm will return valid estimates in most cases of covariate shift, if the input feature distribution shifts such that it covers a very narrow area of the input space, then the comparison model will likely learn an overly simplified version of the actual pattern.

Magnitude Estimation of Concept Drift

The degree of concept drift can be represented graphically as the area between the original concept's curve and the new concept's curve. Thus, the magnitude is the integral of the difference between the concept learned by the monitored model and the concept learned by the comparison model, evaluated on the reference dataset.
Magnitude estimation depiction, with the shaded area corresponding to the degree of the concept's shift. Image by NannyML designers.
The resulting value, called the Magnitude Estimation (ME), ranges between 0 and 1. The ME can be visualized in the cloud dashboard to understand the extent of concept drift within the monitored dataset.
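As a rough illustration (not NannyML's exact computation), a discrete stand-in for that integral is the average absolute gap between the two models' predicted probabilities on the reference inputs, which is likewise bounded between 0 and 1:

```python
import numpy as np

def magnitude_estimate(monitored_proba, comparison_proba):
    """Illustrative ME: mean absolute difference between the monitored and
    comparison models' predicted probabilities on the same reference data."""
    gap = np.abs(np.asarray(monitored_proba) - np.asarray(comparison_proba))
    return float(gap.mean())
```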
However, ME was designed to quantify the amount of concept drift. To understand how concept drift has affected the model’s performance, the performance impact estimation functionality of the RCD algorithm should be used.
A cloud dashboard of the magnitude estimation depicts concept drift alerts from April 2019 until August 2019. Image from the NannyML cloud product.

Quantifying Concept Drift’s Impact on Performance

To quantify the impact of concept drift on performance ($\Delta_{\text{perf}}$) and compare it against the business's performance thresholds, we find the difference between the estimated performance of the original monitored model under the new concept ($\text{perf}_{\text{new}}$) and its performance under the old concept ($\text{perf}_{\text{old}}$), using the same reference dataset:

$$\Delta_{\text{perf}} = \text{perf}_{\text{new}} - \text{perf}_{\text{old}}$$

Calculating estimated performance under the new concept

For this calculation, we assume that the new comparison model’s predictions are the ground truth for the reference set. Additionally, we assume that the predicted scores are well-calibrated and represent actual probabilities.
These assumptions allow us to construct a confusion matrix over a large enough chunk size (~1,000 data points) to reasonably estimate how the monitored model would have performed under the shifted concept. We can then calculate any metric derived from the confusion matrix as our $\text{perf}_{\text{new}}$. In the example above, we use estimated accuracy.
💡
Even if you can’t get 1000 data points for the estimation, there will be a confidence band around the performance estimation, which will assist you in deciding whether to trust the estimation.
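A rough sketch of this calculation under the stated assumptions, treating the comparison model's calibrated scores as the probability that the true label is positive (illustrative only, not NannyML's implementation):

```python
import numpy as np

def estimated_accuracy_new_concept(monitored_pred, comparison_proba):
    """Expected accuracy of the monitored model under the new concept."""
    p = np.asarray(comparison_proba, dtype=float)  # assumed calibrated P(y=1)
    yhat = np.asarray(monitored_pred)
    exp_tp = p[yhat == 1].sum()        # expected true positives
    exp_tn = (1 - p[yhat == 0]).sum()  # expected true negatives
    return float((exp_tp + exp_tn) / len(p))
```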

Calculating estimated performance under the old concept

The $\text{perf}_{\text{old}}$ is obtained by running the CBPE algorithm on the monitored model using the reference dataset. This value provides a metric for how the monitored model would be expected to perform on the old concept.
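CBPE is available in the open-source nannyml package. A typical setup looks like the following, where the column names, target name, and chunk size are assumptions for a binary classification model:

```python
import nannyml as nml

estimator = nml.CBPE(
    y_pred_proba="y_pred_proba",   # assumed column of calibrated scores
    y_pred="y_pred",               # assumed column of hard predictions
    y_true="repaid",               # assumed target column
    timestamp_column_name="timestamp",
    problem_type="classification_binary",
    metrics=["accuracy"],
    chunk_size=5000,
)
estimator.fit(reference)                       # reference data with known targets
estimated_perf = estimator.estimate(monitored)
estimated_perf.plot().show()
```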

Performance Impact Estimation

Each chunk of data is used to retrain the comparison model to capture the new concept and make predictions on the base reference data. If, at any point, the performance impact drops below the set thresholds, an alert is generated. Relying on the performance impact estimation alerts from RCD provides a reliable way to assess whether or not to retrain the ML model.
Cloud dashboard of performance estimation for a model with concept drift. Image by Santiago Víquez.

Optimized Retraining Schedule using Reverse Concept Drift

Now that we have covered the fundamentals of concept drift and NannyML's RCD algorithm, we are ready to unpack the recommended retraining workflow. This workflow uses the drop in estimated performance impact, detected via the RCD algorithm, to trigger retraining automatically.
💡
Why is retraining the best issue-resolution option for concept drift? The concept is the underlying pattern in the data that the model is trained to capture. If the concept changes, the model can no longer make valid predictions. Retraining ensures the model learns the correct mapping between the input and target data.

Concept drift triggered retraining workflow

This retraining schedule starts with the RCD performance impact functionality, which directly captures changes in the concept as a decrease in performance. Retraining is automatically triggered if performance drops below the business’s defined thresholds.
NannyML's Recommended Retraining Schedule for Reverse Concept Drift. Image by author.
Although retraining can resolve the effects of concept drift, concept drift is not guaranteed to be the only cause of the performance decrease (see the Retraining Is Not All You Need blog). Consequently, before deploying the newly retrained model, it is essential to validate its performance using the standard NannyML methods of PAPE (cloud) or CBPE (open source).
If the retrained model meets the business's performance requirements, it is deployed.
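Glued together, the workflow might look like the following sketch; the threshold value is hypothetical, and the `train_fn`, `validate_fn`, and `deploy_fn` hooks stand in for your own pipeline steps:

```python
PERFORMANCE_THRESHOLD = 0.90  # hypothetical business-defined accuracy floor

def retraining_workflow(estimated_impacted_perf, train_fn, validate_fn, deploy_fn):
    """Trigger retraining on an RCD performance-impact alert, validate the
    candidate with estimated performance (e.g., CBPE/PAPE), then deploy."""
    if estimated_impacted_perf >= PERFORMANCE_THRESHOLD:
        return "no action needed"
    candidate = train_fn()                        # retrain on recent data
    if validate_fn(candidate) >= PERFORMANCE_THRESHOLD:
        deploy_fn(candidate)
        return "retrained model deployed"
    return "candidate rejected: investigate other root causes"
```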

Improvements Over Other Retraining Schedules

The proposed retraining schedule offers several advantages over traditional retraining workflows. Retraining on time-, volume-, or data-drift-based triggers can be wasteful because these triggers are only loosely related to the main issue retraining is meant to solve: concept drift.
NannyML's RCD algorithm allows us to use concept drift directly to trigger retraining. As a result, retraining only happens when it is likely needed and when it is expected to solve the underlying performance drop. This saves compute time, resources, cost, and downstream troubleshooting.

Conclusion

In this blog, we explored the challenge of AI aging and how concept drift can deteriorate ML model performance over time. We introduced NannyML's RCD algorithm, which estimates when concept drift occurs and quantifies its impact on model performance. This innovative approach allows for timely and accurate identification of performance issues attributed to concept drift, enabling businesses to maintain model efficacy.
We also presented a new automated retraining workflow using the RCD algorithm. This workflow triggers retraining only when performance drops below defined thresholds due to concept drift, ensuring that resources are used efficiently. By validating the performance of the retrained model before deployment, businesses can ensure that the new model meets performance requirements, thereby reducing unnecessary retraining and saving on computational costs. This targeted approach ensures that models remain effective, relevant, and aligned with business needs. Ready to enhance your ML model monitoring and retraining process? Check out the NannyML Cloud 30-day free trial to test the RCD algorithm on your models.

Further reading

This blog follows on from other NannyML issue-resolution blogs. To get a fuller perspective on corrective actions and monitoring strategies, see the following blogs:
  • Monitoring Workflow for Machine Learning Systems
  • Retraining Is Not All You Need

Frequently Asked Questions

How does the Reverse Concept Drift (RCD) algorithm help manage concept drift?
The RCD algorithm uses a newly trained comparison model to predict outcomes on a reference dataset and compares these predictions with those from the original model on the same dataset. The new model's predictions are taken as the ground truth. This approach enables quantification of the performance impact by assessing the difference in model performance under the new and old concepts.
How does the new automated retraining workflow using the RCD algorithm optimize resource usage?
The automated retraining workflow uses the RCD to trigger retraining only when performance drops below defined thresholds due to concept drift. This targeted approach ensures that retraining occurs only when necessary, saving computational resources, time, and costs. Additionally, businesses can ensure that the new model meets performance requirements by validating the retrained model's performance before deployment using NannyML's PAPE or CBPE algorithms.


Written by

Taliya Weinstein

Data Science Writer at NannyML