The Four "R"s of Machine Learning Software Development

Having developed and maintained data-driven products built on ML models in production, at companies of all shapes and sizes, including international ones (!), I've found that ML software development comes down to four themes.

    1. Relevance

      • Do you understand the business question? Is the POC you're developing promising enough to put into production, and will it meet both the business and the technical requirements?

    2. Robustness

      • How robust is the data processing? How high do you score on the ML Test Score (Google, 2017)?

      • Design: Do you have an architecture diagram? Have you defined an input and output spec?

      • Is there a testing suite?

    3. Reproducibility

      • Do you use version control?

      • Package management?

      • Docker?

      • Do you connect to the data source(s) directly?

    4. Readability

      • (Python) Do you follow the PEP 8 guidelines, and a style guide on top of them?

      • Does the code need refactoring? Or is it (relatively) easy to understand what the code is doing and how to modify it?

      • Is there logging?

      • Should I not ask about documentation? :)
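
To make a few of these questions concrete, here is a minimal Python sketch of what an input and output spec, a small test suite, and logging might look like together. Everything in it is an assumption for illustration: the `churn_model` logger name, the `ModelInput` and `ModelOutput` fields, and the placeholder scoring rule in `predict` are hypothetical and stand in for whatever your real model does.

```python
import logging
from dataclasses import dataclass

# Readability: configure logging once, so every scoring call leaves a trace.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("churn_model")  # hypothetical model name


@dataclass(frozen=True)
class ModelInput:
    """Input spec (hypothetical fields): what the model accepts."""

    tenure_months: int
    monthly_spend: float

    def __post_init__(self) -> None:
        # Robustness: fail fast on data that violates the spec.
        if self.tenure_months < 0:
            raise ValueError(f"tenure_months must be >= 0, got {self.tenure_months}")
        if self.monthly_spend < 0:
            raise ValueError(f"monthly_spend must be >= 0, got {self.monthly_spend}")


@dataclass(frozen=True)
class ModelOutput:
    """Output spec (hypothetical): a probability between 0 and 1."""

    churn_probability: float


def predict(x: ModelInput) -> ModelOutput:
    """Placeholder scoring rule standing in for a trained model."""
    score = 1.0 / (1.0 + 0.1 * x.tenure_months + 0.01 * x.monthly_spend)
    logger.info("scored input=%s probability=%.3f", x, score)
    return ModelOutput(churn_probability=score)


# A tiny testing suite; pytest would collect these by their test_ prefix.
def test_predict_returns_probability() -> None:
    out = predict(ModelInput(tenure_months=12, monthly_spend=50.0))
    assert 0.0 <= out.churn_probability <= 1.0


def test_negative_tenure_is_rejected() -> None:
    try:
        ModelInput(tenure_months=-1, monthly_spend=50.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for negative tenure")
```

The spec lives in the types rather than in a wiki page, so the tests and the validation errors double as (some of) the documentation.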

Keywords: Data products, Machine Learning software development
