How to Go Bankrupt: A Quick Look at Unmonitored Deployed Models

AI and machine learning models are making headlines these days, from reports that generative models might take over customer service roles to the most recent Physics Nobel Prize being awarded for foundational work in AI.
Obviously, we all want in on the game, and companies are racing to add chatbots, generative AI solutions, and predictive ML models to their ranks to avoid being left behind amid seemingly meteoric technological advances. However, implementing ML models can be a double-edged sword. It is common for machine learning models to lose around 20% of their value within six months of deployment, and sometimes this degradation leads to catastrophic losses for companies, or even their demise, as was the case in last year’s infamous Silicon Valley Bank collapse.

Unmonitored Models: Short-Term Gain, Long-Term Pain

A predictive model running without supervision might seem like a band-aid that temporarily takes care of a problem; if it stops working, the only issue is that it becomes unusable, right? In reality, it can pose serious risks to companies and institutions when the inevitable happens: consumer behavior changes, investments have already been made, interest rates and inflation rise, and liquidity… evaporates into thin air. During the pandemic, Silicon Valley Bank invested heavily in debt securities while interest rates were low, assuming they would stay low. After 2021, however, most of the tech sector was still trying to get back on its feet after the lockdowns and a broader economic downturn, and it was not a great omen that most of SVB’s customers came from that very sector. When most of your clients need cash all at once, chaos ensues. The investments made when rates were low were now competing with newer, higher-yielding alternatives, and the low-yield treasury bonds SVB was holding had lost much of their market value. Hence, unrealized losses. Lots of them. Which created even more unease among clients. Which ultimately led to the bank’s downfall.
According to some analysts, SVB’s collapse was caused, at least in part, by the “bad models” it employed to assess its own risks.

The Telltale Signs of Post-Deployment Model Failure

Could this have been avoided? Could the Silicon Valley Bank collapse have been predicted and taken into account through model monitoring?
ML models tend to assume that the future will look like the past, which is especially untrue in fast-paced environments such as, well, the world after a global pandemic. Many banks and governments kept granting loans under unchanged criteria before, during, and right after the pandemic, with less-than-ideal results, as people and businesses are still struggling to pay off their debt. JPMorgan Chase, one of the largest lenders, had to reassess its credit models, which were built on pre-pandemic data, to better understand borrower risk under the new global conditions.
Consumer loans during the period of March 2019 to March 2021. Source: FDIC
A loan model, like many others, can bring in billions per year, but if its predictions decay over time, it can go from making billions to losing billions. At scale, the effects of underestimating risk were crystal-clear in 2008, and again recently during the SVB collapse. Machine learning model failure is called “silent” because the signals are not visible to the naked eye: the model keeps outputting predictions that look perfectly reasonable up close, while performance quietly degrades underneath. To make matters worse, realized performance is often difficult or even impossible to measure, since actual target data isn’t always available.
The main issue is that re-training models is expensive. Meanwhile, the data the model faces in production keeps changing over time while the training data remains static, so the model starts making wrong assumptions. That’s our challenge.

Why does a Model Fail?

As time goes by, the data and patterns a model sees in production drift away from what it saw during training, so performance decays. For example, a loan evaluation model might have been trained exclusively on clients with considerable job seniority. When entry-level workers start applying for loans, the model becomes less reliable on those applications, since it has never seen such profiles. This causes the model to degrade and potentially underestimate risk.
As we have come to learn, there can always be stochastic shocks: new regulations, sudden recessions, or other external forces that change people’s behavior. If the government updates regulations in favor of lending, for example, customers from different backgrounds may start applying for loans, data the model has never trained on. These population shifts can also happen when a company expands into new regions or territories, with customers whose behavior patterns differ sharply from those in the training data.
But why do models deployed in very dissimilar industries fail? What do they have in common?
  • covariate shift
  • concept drift
  • data quality
Covariate shift refers to a shift in the input variables: the distribution of a feature changes, but its relationship with the target does not. This can happen when inflation is reflected in income figures while the target variable (probability of defaulting, for example) remains unchanged. Money is worth less due to inflation, but people earn nominally more, so the outcome doesn’t really change. Covariate shift can happen gradually, as with inflation, or suddenly, as with stochastic shocks.
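To make the distinction concrete, here is a toy Python sketch of covariate shift. All the numbers and the default-risk rule are made up purely for illustration: nominal incomes shift upward with inflation, but because wages keep pace with prices, the relationship between purchasing power and default stays the same, and so does the default rate.
```python
import numpy as np

rng = np.random.default_rng(0)

def p_default(income, price_level):
    # In this toy world, default risk depends only on purchasing power.
    purchasing_power = income / price_level
    return 1 / (1 + np.exp((purchasing_power - 40_000) / 5_000))

# Training period: price level 1.0, nominal incomes around 45k.
income_train = rng.normal(45_000, 6_000, size=10_000)

# Production period: 20% inflation, and wages keep pace (covariate shift).
income_prod = income_train * 1.20

# The income distribution the model sees has clearly shifted ...
print(income_train.mean(), income_prod.mean())   # roughly 45k vs 54k

# ... but the outcome (the default rate) is essentially unchanged.
print(p_default(income_train, 1.0).mean(), p_default(income_prod, 1.2).mean())
```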
Concept drift, on the other hand, implies a change in the relationship between a given variable and the target: the real-world meaning of that variable changes. Going back to the same example, if the market suffers inflation but incomes do not change their distribution, the target variable (again, probability of defaulting) can change. People earn the same amount of money, but money is worth less, so more of them become unable to pay back their loans.
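Continuing the same toy sketch from above, concept drift is the mirror image: nominal incomes stay where they were, but prices rise, so the very same income figure now maps to a higher default risk. The feature distribution hasn’t moved; its relationship with the target has.
```python
# Concept drift: the income distribution is unchanged, but 20% inflation
# erodes purchasing power, so the same incomes now carry more default risk.
print(p_default(income_train, 1.0).mean())  # training-period default rate
print(p_default(income_train, 1.2).mean())  # production default rate is now higher
```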
Lastly, what nobody wants to hear: data quality is fundamental. ML models can seem like black boxes into which we throw huge datasets and which, after doing their magic, spit predictions back out. In reality, if the data fed to the model is low quality or suddenly changes because of human error or a shift in data collection (say, a company stops asking about income), predictions will deteriorate.
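Data quality issues like that one can often be caught with a very simple check. Here is a hypothetical pandas sketch (column names and values invented for illustration) comparing the share of missing values per column between a training batch and a production batch:
```python
import pandas as pd

# Hypothetical loan-application batches; columns and values are illustrative only.
training_batch = pd.DataFrame({
    "income": [45_000, 52_000, 61_000, 39_000],
    "employment_type": ["senior", "senior", "mid", "mid"],
})
production_batch = pd.DataFrame({
    "income": [None, None, 58_000, None],            # field stopped being collected
    "employment_type": ["entry", None, "mid", "entry"],
})

# A sudden jump in the missing-value rate of a column is a data quality red flag.
print(training_batch.isna().mean())
print(production_batch.isna().mean())
```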
A more in-depth read on data drift can be found in this article we wrote a while back.
If left unchecked, all these changes in data and consumer behavior can lead to catastrophic outcomes. The complexity of the issue lies in the fact that the human eye cannot detect the warning signs in time.

The Aftermath of Model Degradation

In many industries, the best-case scenario when model degradation goes unaddressed is losing some clients or leaving money on the table. But what happens when faulty predictions lead to catastrophic decisions? When a financial model systematically approves loans for clients who will never repay them? When a car’s image recognition model fails to identify road signs? Or when diagnostic software misclassifies patients? With time, the consequences of model decay become evident, but by then it is too late. In some industries, monitoring deployed models can literally be a matter of life and death. In most, it can help prevent bankruptcy, as it might have for Silicon Valley Bank.
And what’s more: no process happens in isolation. Loan approval involves marketing and customer service. Autonomous driving involves manufacturing and engineering teams. Medical imaging classification involves equipment vendors and healthcare providers. And so on. Thus, when a model decays, the energy, resources, and money invested across all those departments go to waste as well.

How can NannyML help?

NannyML’s monitoring software can estimate the performance of a deployed model even when ground truth is delayed or unavailable. When incorporated into the production pipeline, NannyML raises alerts when things go south so the data science team knows something is up.
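As a minimal sketch, this is roughly what plugging NannyML’s open-source Python library into a pipeline looks like, using the synthetic car-loan dataset that ships with the library. CBPE (confidence-based performance estimation) estimates the model’s ROC AUC on unlabeled production data and flags chunks that breach its thresholds; exact argument names can vary slightly between NannyML versions.
```python
import nannyml as nml

# Reference data (with ground truth) and analysis data (production, no labels yet).
reference, analysis, _ = nml.load_synthetic_car_loan_dataset()

# Estimate post-deployment performance without waiting for actual targets.
estimator = nml.CBPE(
    problem_type='classification_binary',
    y_pred_proba='y_pred_proba',
    y_pred='y_pred',
    y_true='repaid',
    timestamp_column_name='timestamp',
    metrics=['roc_auc'],
    chunk_size=5_000,
)
estimator.fit(reference)                  # learn what "healthy" performance looks like
estimated = estimator.estimate(analysis)  # estimate performance chunk by chunk

# Chunks whose estimated performance crosses the thresholds carry an alert flag.
print(estimated.filter(period='analysis').to_df())
estimated.plot().show()
```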
Alerts fire only when something is actually detrimental to predictions. At NannyML we know that data drift doesn’t always imply performance degradation: if data drifts but model performance is unaffected, there is no need to pay close attention to it. This keeps energy and effort focused where they matter and prevents time wasted on changes that aren’t relevant, thus averting burnout (a.k.a. alert fatigue).

So, what are the main steps to monitor models?

In order to assess model degradation, performance monitoring is key. This is where NannyML does its magic. Maybe everything is going well and no further action is needed. But if model decay is confirmed, further analysis is needed to identify the root cause of the problem. This root cause analysis includes data quality checks as well as looking for possible concept drift and data drift, both univariate and multivariate. Finally, once the cause has been pinpointed, the problem can be addressed with the appropriate strategy, whether that concerns the model itself or the company’s business process.
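Continuing the sketch above, the root cause analysis step might look roughly like this: only once the performance estimate raises alerts do we check for univariate and multivariate drift. The feature names below come from the same synthetic dataset (swap in your own), and argument names may differ across NannyML versions.
```python
# Feature columns of the synthetic car-loan dataset used above.
features = [
    'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length',
    'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure',
]

# Univariate drift: did any single feature's distribution move?
univariate = nml.UnivariateDriftCalculator(
    column_names=features,
    timestamp_column_name='timestamp',
    continuous_methods=['jensen_shannon'],
    categorical_methods=['jensen_shannon'],
    chunk_size=5_000,
)
univariate.fit(reference)
print(univariate.calculate(analysis).filter(period='analysis').to_df())

# Multivariate drift: did the structure *between* features change?
multivariate = nml.DataReconstructionDriftCalculator(
    column_names=features,
    timestamp_column_name='timestamp',
    chunk_size=5_000,
)
multivariate.fit(reference)
print(multivariate.calculate(analysis).filter(period='analysis').to_df())
```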
To prevent unwelcome surprises (or even bankruptcy), a company should have a monitoring workflow in place at all times to make sure predictions are running smoothly, and to act whenever they hit a hiccup. ML model monitoring is essential if we expect models to perform well over time and stay robust when faced with new inputs, which is why integrating a monitoring system such as NannyML’s is a must. Gone are the days when deploying a model and letting it be was the norm.

Summary

Unmonitored models put companies at serious risk of facing catastrophic consequences. A deployed model with no performance monitoring is part of a perfect recipe for disaster.
The warning signs visible to the naked eye are only symptoms of a far greater problem that, by the time it is detected, has already done irreparable damage. By monitoring the performance of your ML models with NannyML, you can rest assured that potential failures will not go unnoticed and that you can take action in time.
Not only can NannyML save companies vast amounts of money (and perhaps even prevent bankruptcy), but it can also give them the peace of mind to direct their efforts towards growth and improvement instead of patching up holes in their deployed machine learning models.
 
To learn more about NannyML and contribute to its community, make sure to check out github.com/nannyml.


Written by

Aldana Rizzo

Data Science Intern at NannyML