3 Custom Metrics for Your Forecasting Models


Achieve more with Custom Metrics

Many data scientists struggle to demonstrate the true impact of their models on business decisions because common metrics like MSE or RMSE don’t reflect real-world needs.
When business stakeholders ask for actionable insights and better forecasts, these gaps in traditional metrics leave you feeling stuck. You’ve worked hard to build a model that should add value, but the wrong metrics can make it look like your work is falling short.
By employing custom metrics that align more closely with business needs, you can demonstrate the real value of your work.
I’ll cover weighted MAE, forecast attainment, and SMAPE—each offering a unique way to measure model performance that truly matters in demand forecasting. Plus, you’ll find Python code snippets throughout the blog for all the metrics discussed.

Weighted Mean Absolute Error

The costs of under-forecasting and over-forecasting are rarely equivalent. In many cases, underestimating demand can lead to stockouts, creating missed sales opportunities and potential customer dissatisfaction. On the other hand, overestimating can result in excess inventory, tying up capital and increasing storage costs.
Despite their distinct financial consequences, traditional metrics like MAE treat these errors equally—a 10-unit over-forecast is penalized the same as a 10-unit under-forecast. This oversimplification can lead to flawed decisions, as it fails to reflect the true economic impact of forecasting errors.
Weighted Mean Absolute Error (WMAE) resolves this by assigning different weights to over- and under-forecasting errors.
To add a custom metric for a regression use case in NannyML Cloud, you need two Python functions: loss and aggregate.
In my WMAE example, heavier penalties are placed on underestimates, but this can be tailored or expanded in complexity to suit your specific needs.
import pandas as pd
import numpy as np

def loss(
    y_true: pd.Series,
    y_pred: pd.Series,
    chunk_data: pd.DataFrame,
    **kwargs
) -> np.ndarray:
    # Penalize under-forecasts (stockouts) twice as heavily as over-forecasts.
    underforecast_weight: float = 2.0
    overforecast_weight: float = 1.0

    # Keep only rows where both the actual and the predicted value are present.
    mask = y_true.notna() & y_pred.notna()
    y_true_valid = y_true[mask].to_numpy(dtype=float)
    y_pred_valid = y_pred[mask].to_numpy(dtype=float)

    # Apply the heavier weight wherever the model under-forecast (prediction < actual).
    abs_errors = np.abs(y_true_valid - y_pred_valid)
    weights = np.where(y_pred_valid < y_true_valid, underforecast_weight, overforecast_weight)
    losses = abs_errors * weights

    # Return one loss per input row; rows with missing values stay NaN.
    full_losses = np.full(len(y_true), np.nan, dtype=float)
    full_losses[mask.to_numpy()] = losses

    return full_losses

import numpy as np
import pandas as pd

def aggregate(
    loss: np.ndarray,
    chunk_data: pd.DataFrame,
    **kwargs
) -> float:
    # Average the per-row weighted errors, ignoring rows that are NaN due to missing data.
    return float(np.nanmean(loss))
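Before uploading the pair to the Cloud dashboard, you can sanity-check it locally. The sample below is purely illustrative (toy numbers and an empty chunk_data frame) and assumes the loss and aggregate functions above are already defined:
import numpy as np
import pandas as pd

# Two 10-unit under-forecasts and two 10-unit over-forecasts.
y_true = pd.Series([100, 120, 80, 90])
y_pred = pd.Series([90, 110, 90, 100])
chunk_data = pd.DataFrame()  # not used by these functions

per_row = loss(y_true, y_pred, chunk_data)
print(per_row)                         # [20. 20. 10. 10.] -- under-forecasts count double
print(aggregate(per_row, chunk_data))  # 15.0, versus a plain MAE of 10.0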
After adding these two functions to the Cloud dashboard and re-fitting the model, the custom metrics will appear alongside the standard metrics like this:
Weighted Mean Absolute Error Plot
📌 Learn how to seamlessly integrate these custom metrics into your NannyML Cloud setup. Check out the full tutorial here.

Forecast Attainment

Forecast attainment is a metric used to assess how closely your forecast aligns with actual results. It’s valuable in operational settings where hitting a specific demand target is a key focus.
Put simply, it answers: how much of the projected demand was actually fulfilled? It is calculated as actual demand divided by forecast demand, expressed as a percentage.
Anything above 100% indicates under-forecasting, where actual demand exceeded the forecast, while values below 100% reflect over-forecasting, where the projected demand was higher than what was needed. For example, forecasting 1,000 units against an actual demand of 1,100 gives an attainment of 110%.
This metric shines when the priority is hitting precise targets rather than minimizing errors across the entire dataset.
import numpy as np
import pandas as pd

def loss(
    y_true: pd.Series,
    y_pred: pd.Series,
    chunk_data: pd.DataFrame,
    **kwargs
) -> np.ndarray:
    # Per-row attainment: actual demand as a percentage of forecast demand.
    # A zero forecast would divide by zero, so treat those rows as missing.
    y_pred_safe = y_pred.replace(0, np.nan)
    attainment_values = (y_true / y_pred_safe) * 100
    return attainment_values.to_numpy()
import numpy as np
import pandas as pd

def aggregate(
    loss: np.ndarray,
    chunk_data: pd.DataFrame,
    **kwargs
) -> float:
    # Combine per-row attainment into one chunk-level value (root mean square),
    # skipping rows where the forecast was zero or missing.
    return float(np.sqrt(np.nanmean(loss ** 2)))
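As a quick local check (toy numbers only, assuming the attainment loss and aggregate functions above are defined), a chunk where actuals mostly exceed the forecast lands above 100%:
import numpy as np
import pandas as pd

# Actual demand mostly comes in above a flat forecast of 100 units.
y_true = pd.Series([110, 95, 120])
y_pred = pd.Series([100, 100, 100])
chunk_data = pd.DataFrame()  # not used by these functions

per_row = loss(y_true, y_pred, chunk_data)
print(per_row)                         # [110.  95. 120.]
print(aggregate(per_row, chunk_data))  # ~108.8 -> above 100%, so demand was under-forecast overall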
Forecast Attainment Plot

Symmetric Mean Absolute Percentage Error

MAPE has a drawback: when the actual and predicted values are very different, the percentage error becomes extremely large.
A small forecasting mistake for low demand can seem disproportionately large in percentage terms, distorting the analysis. And when actual values hit zero, MAPE becomes unusable, as it cannot handle division by zero.
The Symmetric Mean Absolute Percentage Error (SMAPE) was developed to address some of these limitations.
There are two formal ways to define SMAPE:
  1. The first form keeps SMAPE bounded between 0% and 100% by normalizing the error by the sum of the absolute actual and predicted values. It can be calculated as SMAPE-100 = (100 / n) · Σ |F_t − A_t| / (|A_t| + |F_t|).
  2. The second version divides that denominator by 2, which raises the upper limit of the metric to 200%: SMAPE-200 = (100 / n) · Σ |F_t − A_t| / ((|A_t| + |F_t|) / 2). SMAPE-100 may seem the more obvious choice, but SMAPE-200 slightly overestimates negative percentage errors and slightly underestimates positive ones.
I have implemented SMAPE-200 below.
import numpy as np
import pandas as pd

def loss(
    y_true: pd.Series,
    y_pred: pd.Series,
    chunk_data: pd.DataFrame,
    **kwargs
) -> np.ndarray:
    # Keep only rows where both the actual and the predicted value are present.
    valid_mask = y_true.notna() & y_pred.notna()
    y_true = y_true[valid_mask]
    y_pred = y_pred[valid_mask]

    if y_true.empty:
        return np.array([])  # nothing to score in this chunk

    # SMAPE-200 denominator: the mean of the absolute actual and predicted values.
    # Replace zeros with a tiny constant so an all-zero pair cannot divide by zero.
    denominator = (np.abs(y_true) + np.abs(y_pred)) / 2
    denominator = np.where(denominator == 0, 1e-10, denominator)

    # Per-row SMAPE in percent, bounded between 0% and 200%.
    smape_values = np.abs(y_true - y_pred) / denominator * 100

    return smape_values.to_numpy()
import numpy as np
import pandas as pd

def aggregate(
    loss: np.ndarray,
    chunk_data: pd.DataFrame,
    **kwargs
) -> float:
    # Chunk-level SMAPE: the average of the per-row percentage errors.
    return float(loss.mean())
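A quick local check (toy numbers only, assuming the SMAPE loss and aggregate functions above are defined) shows the 0–200% range of this formulation: a perfect prediction scores 0%, a 50% over-forecast scores 40%, and a forecast that is wildly off approaches the 200% ceiling:
import numpy as np
import pandas as pd

# Perfect, moderately off, and wildly off predictions of the same actual value.
y_true = pd.Series([100, 100, 100])
y_pred = pd.Series([100, 150, 1])
chunk_data = pd.DataFrame()  # not used by these functions

per_row = loss(y_true, y_pred, chunk_data)
print(per_row)                         # approximately [0., 40., 196.]
print(aggregate(per_row, chunk_data))  # ~78.7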
Symmetric Mean Absolute Percentage Error Plot
A decreasing SMAPE is a good sign, showing that the model's predictions are getting sharper over time. As the gap between the actual and predicted values shrinks, it means the forecasting is becoming more reliable.
However, it’s important to keep an eye on any sudden jumps in the SMAPE graph. These spikes might indicate problems like overfitting or data inconsistencies that need to be looked into to keep the model performing well.

Conclusion

Custom metrics are game-changers for showcasing the real impact of your models.
Weighted Mean Absolute Error (WMAE) helps you address the financial implications of forecasting mistakes.
Forecast Attainment keeps your targets on track, while SMAPE gives you a clear view of prediction accuracy.

Most of a machine learning model’s life happens after deployment, yet 91% of models degrade. There are three prime reasons behind model degradation: Covariate Shift, Concept Drift and Data Quality Issues. To deal with these, we have developed a suite of algorithms designed to quantify their impact and identify potential faults before they affect your business.
Ready to integrate post-deployment data science into your solutions? Schedule a call with the founders of NannyML today.
Don’t just deploy and forget—monitor continuously so your model delivers lasting value.

Read More…

Check out these other NannyML blogs to discover more insights that can elevate your data science workflows, post-deployment.

Frequently Asked Questions

What are the key KPIs for demand forecasting?
Key KPIs include forecast accuracy, inventory turnover, stockout rate, and customer service level. These metrics help assess the effectiveness of demand planning and inventory management.
What are the best metrics for evaluating forecasts?
The best metrics for evaluating forecasts are those that align business goals with technical accuracy. Weighted Mean Absolute Error (WMAE), Forecast Attainment, and Symmetric Mean Absolute Percentage Error (SMAPE) are metrics that can help with evaluation.
What are the metrics for demand forecasting accuracy?
Common metrics for forecasting accuracy are Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Symmetric Mean Absolute Percentage Error (SMAPE). These provide insights into forecast reliability.

Ready to learn how well your ML models are working?

Join 1,100+ other data scientists now!

Subscribe

Written by

Kavita Rana

Data Science Intern at NannyML