Dear Advisor: How do I avoid the biggest AI/ML mistakes others make?

This post was originally published in June 2020 and was updated in August and September 2020 for relevance.

You keep hearing in the media that AI can accomplish anything, and vendors tell you their products just work out of the box. You don't know what you don't know, and you don't have in-house data and analytics expertise.

Having spent 10 years in the industry solving customer pain points with data and analytics to drive product-market fit, I've had the unique opportunity to invent 4 novel AI algorithms that brought in millions of dollars in revenue. It's not trivial: it takes years -- and a bit of luck -- to develop something that works for a very specific use case where other options fail.

If you're not an analytics/AI expert, here are the most common challenges I've seen in the scoping and development of data-driven products. (This blog post is based in part on a "making data-driven decisions" office hour on the 805 Startups Discord server on June 15th, discussing the biggest mistakes I continually see start-ups make with analytics/AI.)

Missing Information

  1. It’s impossible to get insights from data you didn’t collect. Are you tracking everything about your business?

  2. (On the flip side) Tracking virtually everything about the business, but across many different data providers and vendors that don't talk to one another.

    • Assessment: Do you know how many new active users there are on your platform today?
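
If the answer is "not off the top of my head," this is usually a few lines of code away. Here's a minimal sketch in pandas, assuming a hypothetical event log with user_id and timestamp columns (all names here are illustrative, not from the post):

```python
import pandas as pd

# Hypothetical event log: one row per user action.
# Column names (user_id, timestamp) are illustrative.
events = pd.read_csv("events.csv", parse_dates=["timestamp"])

today = pd.Timestamp.now().normalize()

# Users active today, and users ever seen before today.
active_today = set(events.loc[events["timestamp"] >= today, "user_id"])
seen_before = set(events.loc[events["timestamp"] < today, "user_id"])

# New active users = active today but never seen before.
new_active_today = active_today - seen_before
print(f"New active users today: {len(new_active_today)}")
```

If stitching this together first requires exporting from three different vendor dashboards, that's the "providers that don't talk to one another" problem above.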

Too early for ML -- all of the above and:

  1. Starting with ML/AI (e.g. predictive analytics) before understanding what's happening with your customers and product, historically and now (i.e. descriptive analytics).

    • Assessment: Is a Data Scientist one of your first technical hires?

    • Assessment: Do you currently know which customers cancelled your service within the last week, and how you acquired those customers in the first place?
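
To make the descriptive-analytics point concrete: answering that assessment question shouldn't require any ML. A minimal sketch, assuming hypothetical cancellations and customers tables (table and column names are illustrative):

```python
import pandas as pd

# Hypothetical tables; names are illustrative, not prescriptive.
cancellations = pd.read_csv("cancellations.csv", parse_dates=["cancelled_at"])
customers = pd.read_csv("customers.csv")  # user_id, acquisition_channel

last_week = pd.Timestamp.now() - pd.Timedelta(days=7)
recent = cancellations[cancellations["cancelled_at"] >= last_week]

# Join each cancellation back to how that customer was acquired.
report = recent.merge(customers, on="user_id", how="left")
print(report[["user_id", "cancelled_at", "acquisition_channel"]])
print(report["acquisition_channel"].value_counts())
```

If a report like this is hard to produce, predictive churn modeling is premature.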

Treating ML/analytics as a Silver Bullet -- all of the above and:

  1. Doing analytics for the sake of analytics, without understanding how the result will solve the business/customer's pain point, how it will be used by the stakeholder to solve that pain point, and what data is/isn't available.

    • Assessment: Are you trying to do ML for a process that you don't understand?

  2. Inadvertently scoping the MVP of a product/feature such that it requires better-than-state-of-the-art ML.

    • Example: Many pitch decks that end with "... and we'll use AI to do this" :(

  3. Buying a multi-year software vendor license without doing an internal POC to see if the software actually solves the problem you're buying it for.

    • Assessment: What's not working now? What does the software need to do in the short term? In the long term?

Executing on ML products as if they're software engineering tasks -- all of the above and:

  1. Thinking you have clean data :) (see the first sketch after this list).

  2. Developing biased models because of biased data or 4 other reasons.

  3. Not treating ML systems as data products that you scope down and iterate on, from proof-of-concept (POC) to v1, v2, etc.

    • Assessment: Do you have an ad-hoc/simple model that answers your business question, against which you can compare and evaluate the next iteration of the ML model? (See the second sketch after this list.)

  4. Asking for a guarantee on ML model performance (based on ML metrics, KPIs, etc.):

    • Guarantee that offline model performance will be at least X -- which is impossible to guarantee because it depends on data quality, or

    • Guarantee that live model performance will be at least as good as the offline model's -- which is impossible to guarantee because customer behavior or the product offering may change, the data collection/processing pipeline may break or change, the model may have (inadvertently) overfit the offline data, etc., or

    • Guarantee that a live model will never need to be updated.

  5. Difficulty hiring DS/ML/AI candidates, most likely because job descriptions list software requirements instead of a 30/60/90-day plan of what the expectations and deliverables look like.

  6. Not knowing about the Hidden Technical Debt in Machine Learning Systems (Sculley et al., 2015).

    • Assessment: Do you set aside time to tackle software engineering and machine learning debt?

  7. Not understanding what the algorithm is doing.

    • Assessment (start-up): Can you give a 1-2 sentence overview of what the algorithm is doing?

    • Assessment (data scientist): Can you give a 1-2 sentence non-technical overview of what the algorithm is doing? Why did you pick that one, and why did you choose those parameters?
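
On point 1 above: the "clean data" assumption is cheap to test before any modeling starts. A minimal profiling sketch in pandas, using a hypothetical training table (file and column names are illustrative):

```python
import pandas as pd

df = pd.read_csv("training_data.csv")

# Fraction of missing values per column.
print(df.isna().mean().sort_values(ascending=False))

# Exact duplicate rows often signal a broken pipeline upstream.
print(f"Duplicate rows: {df.duplicated().sum()}")

# Out-of-range values in a hypothetical numeric column.
if "age" in df.columns:
    print(df.loc[(df["age"] < 0) | (df["age"] > 120), "age"])
```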
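On point 3: the simplest useful baseline is often a trivial one, and every later iteration has to beat it. A minimal sketch with scikit-learn, using a bundled dataset as a stand-in for your own data:

```python
from sklearn.datasets import load_breast_cancer  # stand-in for your data
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# v1 model: it only earns its complexity if it beats the baseline.
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

print("baseline:", accuracy_score(y_test, baseline.predict(X_test)))
print("model v1:", accuracy_score(y_test, model.predict(X_test)))
```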

Not executing on (aspects of) ML products as if they're software engineering tasks -- all of the above and:

  1. Not testing and monitoring ML products in production.

    • Assessment: How high do you score on the ML Test Score rubric?
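
To make "testing in production" concrete, here is a minimal sketch of one kind of guardrail: validating model outputs on every live batch before serving them. The function and thresholds are hypothetical, not taken from the ML Test Score paper:

```python
import numpy as np

def validate_predictions(probs: np.ndarray) -> None:
    """Cheap guardrails to run on every batch of live predictions.

    Thresholds are illustrative; tune them to your product.
    """
    assert np.all(np.isfinite(probs)), "NaN/inf in model output"
    assert np.all((probs >= 0.0) & (probs <= 1.0)), "probability out of [0, 1]"
    # A score distribution collapsing to a constant often means a
    # broken feature pipeline, not a confident model.
    assert probs.std() > 1e-6, "degenerate (constant) predictions"

# Fake scores standing in for live model output:
validate_predictions(np.array([0.1, 0.7, 0.4]))
```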

Nobody is perfect, has clean data, or has ML running with no downtime in production. Now that you know what to focus on, start small and iterate.

Did you try this yourself -- and now need more support with what that process looks like for you, or how to execute it? Please reach out.

Keywords: AI, ML, start-ups, data strategy, data products, customer understanding
