ML in Production is doing WORSE than locally

First, congratulations on getting your model to production. That was no easy task!

Remember that models in production typically perform at least slightly worse than they did offline, even when you evaluate them with the same metrics, because the live model is seeing data it hasn't seen before. But what could explain a large difference?

  • As an aside: if the model performance is almost perfect even in production (without leakage), you may not need ML; a rules-based approach may be just as good and easier to maintain.

Part I: Possible Reasons

If you're seeing a large difference between the model performance you saw locally and what's now happening in production, here are some possible reasons:

  • Overfitting on the local version

    • Assessment: How did it perform locally on brand-new, out-of-sample data that it had never seen before?

  • Bug in data processing/ETL pipeline/etc.

  • Schema changed

  • The data sample/split used for model training misses seasonality present in the production process

  • Metrics on the live model are different from what was optimized for locally

  • Model drift, for instance because customer behavior has changed or the data is biased

  • Someone intervened to stop poor performance when the model went live

  • (Most likely cause) Leakage in the local version
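Two of the reasons above, a changed schema and drift in the input data, can be checked mechanically before digging deeper. The sketch below compares the column names of the training extract with those of live data, and computes the Population Stability Index (PSI) for one numeric feature. PSI is a common drift statistic that this post does not itself prescribe; the helper names (schema_diff, psi), the 10 equal-width bins, and the rule of thumb that a PSI above roughly 0.25 signals a significant shift are all assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def schema_diff(train_cols: Sequence[str], live_cols: Sequence[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: list columns that disappeared from or appeared in live data."""
    train_set, live_set = set(train_cols), set(live_cols)
    return {
        "missing_in_live": sorted(train_set - live_set),
        "new_in_live": sorted(live_set - train_set),
    }


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Hypothetical helper: Population Stability Index of one numeric feature.

    Bins are equal-width over the training ("expected") range; live values outside
    that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # constant feature: avoid dividing by zero

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at a tiny share so log() below never sees zero.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy data: a column was renamed, and the live feature has shifted upward.
print(schema_diff(["age", "income", "region"], ["age", "income_usd", "region"]))
# {'missing_in_live': ['income'], 'new_in_live': ['income_usd']}
train_values = [float(v) for v in range(100)]
live_values = [float(v) for v in range(50, 150)]
print(round(psi(train_values, live_values), 2))  # 6.65
```

A PSI near 0 means the live distribution matches training; a large value like the one in the toy example points toward drift or a broken upstream feed rather than a modeling problem.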

Part II: Diving Deeper into Leakage

What is leakage? Data leakage happens when the training data contains information about what you're trying to predict [ref].

  • e.g. (Most common) Leaking information from the future into the past [ref]

    • Assessment: Are you seeing 99.9% accuracy in your offline model?

  • e.g. “Leaking data from the test set into the training set” [ref]
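The two kinds of leakage above can be illustrated with a minimal, standard-library sketch, assuming rows of the hypothetical form (event_date, feature_value, label). A time-based split keeps every training row strictly before a cutoff date, so the model cannot learn from the future; normalization statistics are then fitted on the training rows only, so nothing from the test set leaks into training. The function names and row layout are illustrative, not taken from any particular library.

```python
from datetime import date
from statistics import mean, pstdev
from typing import List, Tuple

# Hypothetical row layout: (event_date, feature_value, label).
Row = Tuple[date, float, int]


def time_based_split(rows: List[Row], cutoff: date) -> Tuple[List[Row], List[Row]]:
    """Train on rows strictly before `cutoff`, test on rows on or after it.

    Unlike a random shuffle, this never lets the model train on events that
    happen after the ones it is evaluated on.
    """
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


def fit_scaler(train: List[Row]) -> Tuple[float, float]:
    """Fit normalization statistics on the TRAINING rows only.

    Computing them over train + test would leak test-set information into training.
    """
    values = [r[1] for r in train]
    return mean(values), pstdev(values) or 1.0  # constant feature: avoid dividing by zero


def apply_scaler(rows: List[Row], mu: float, sigma: float) -> List[float]:
    """Standardize feature values with statistics fitted on the training rows."""
    return [(r[1] - mu) / sigma for r in rows]


rows = [
    (date(2023, 1, 5), 10.0, 0),
    (date(2023, 2, 9), 12.0, 0),
    (date(2023, 3, 14), 14.0, 1),
    (date(2023, 4, 2), 30.0, 1),
]
train, test = time_based_split(rows, cutoff=date(2023, 3, 1))
mu, sigma = fit_scaler(train)          # uses only the January and February rows
print(mu, sigma)                       # 11.0 1.0
print(apply_scaler(test, mu, sigma))   # [3.0, 19.0]
```

The same discipline applies to any fitted preprocessing step (imputation, encoding, feature selection): fit it on the training portion, then reuse those fitted values on the evaluation data.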

Examples of Leakage:

Part III: Recommended Next Steps

Depending on the reason for the performance gap, you may want to consider turning off the live model and exploring the potential reasons above locally, on both the old data extract and a fresher one, to understand why live and offline performance differ so much.
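One way to run that local comparison is to score the same model, with the same metric used in production, on both the old and the fresher extract. The sketch assumes a model is any callable mapping a feature vector to a label, uses accuracy only as a stand-in metric, and introduces hypothetical names (compare_extracts, ExtractComparison, toy_model); swap in whatever metric the live model is actually judged on.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical layout: each example is (feature_vector, label).
Example = Tuple[Sequence[float], int]
Model = Callable[[Sequence[float]], int]
Metric = Callable[[Model, List[Example]], float]


def accuracy(model: Model, data: List[Example]) -> float:
    """Stand-in metric: share of examples labeled correctly."""
    return sum(1 for x, y in data if model(x) == y) / len(data)


@dataclass
class ExtractComparison:
    old_score: float
    fresh_score: float

    @property
    def drop(self) -> float:
        """How much worse the model does on the fresher extract."""
        return self.old_score - self.fresh_score


def compare_extracts(model: Model, old_extract: List[Example],
                     fresh_extract: List[Example],
                     metric: Metric = accuracy) -> ExtractComparison:
    """Score the same model with the same metric on both extracts."""
    return ExtractComparison(metric(model, old_extract), metric(model, fresh_extract))


def toy_model(x: Sequence[float]) -> int:
    """Toy threshold classifier, purely for illustration."""
    return int(x[0] > 0.5)


old = [([0.2], 0), ([0.7], 1), ([0.9], 1), ([0.1], 0)]
fresh = [([0.6], 0), ([0.8], 1), ([0.3], 0), ([0.55], 0)]
result = compare_extracts(toy_model, old, fresh)
print(result.old_score, result.fresh_score, result.drop)  # 1.0 0.5 0.5
```

A large drop on the fresher extract points toward drift or seasonality; similar scores on both extracts, while production still lags, point back toward the pipeline, schema, or metric mismatches listed in Part I.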

Do you need an expert to help you figure out what happened? Please reach out.

Keywords: AI/ML in production, data products, customer understanding
