Dear Advisor: How do I avoid the biggest Data/AI/ML mistakes others make?

This post was originally published in June 2020 and was updated in August 2020, September 2020, April 2021, and April 2022 for relevancy.

You keep hearing that you need big data, the media tells you that AI can accomplish anything, and vendors tell you that things just work out-of-the-box. You don't know what you don't know -- and you don't have the data and analytics expertise.

I've spent 10 years in the industry solving customer pain points with data and analytics to drive product market fit, and have 4 inventions of novel AI algorithms under my belt -- inventions that brought in millions of dollars in revenue. I'm here to tell you that developing data products with high business impact is not trivial. And developing a novel algorithm may take years (and a bit of luck!) before you have something that works for a very specific, unique use case, when other options fail.

If you're not a data/analytics/AI expert, here are the most common recurring challenges I've seen when it comes to product scope and development of data-driven products.

Not answering the Business Question

  1. Doing data processing or creating predictive models without understanding how the result will solve the business/customer's pain point, including how it will be used by the stakeholder to solve that pain point.

    • Assessment: Do you know how the output of the ML model will be used to specifically answer the business/customer's pain point?

    • Assessment: Are you trying to do ML without a question that the business needs help answering?

  2. Implementing a data warehouse as the first data-related task, without understanding how the data will be used to help the business make better, data-driven decisions.

    • Assessment: Is there an existing POC deliverable that suggests there's high value in existing data -- and the workflow "just" needs to be more streamlined?

Missing Information -- all of the above and:

  1. It’s impossible to get insights from data you didn’t collect. Are you tracking everything about your business?

  2. (On the flip side) Tracking virtually everything about the business, but with many different data providers and vendors that don't talk to one another.

    • Assessment: Do you know how many new active users there are on your platform today?
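To make the assessment above concrete, here is a minimal sketch of what "new active users today" could look like once the data from your different vendors lands in one place. The event shape (`user_id`, activity date), the sample values, and the definition itself (a user whose first recorded activity is today) are my assumptions for illustration; your business may define it differently.

```python
from datetime import date


def new_active_users(events, today):
    """Count users whose first recorded activity falls on `today`.

    `events` is an iterable of (user_id, activity_date) pairs. Both this
    shape and the definition of "new active user" are assumptions.
    """
    first_seen = {}
    for user_id, activity_date in events:
        # Keep the earliest activity date seen for each user.
        if user_id not in first_seen or activity_date < first_seen[user_id]:
            first_seen[user_id] = activity_date
    return sum(1 for first_date in first_seen.values() if first_date == today)


# Hypothetical activity log, already merged from several data providers.
events = [
    ("u1", date(2022, 4, 1)),
    ("u1", date(2022, 4, 2)),
    ("u2", date(2022, 4, 2)),
    ("u3", date(2022, 4, 2)),
    ("u3", date(2022, 4, 2)),
]

count = new_active_users(events, date(2022, 4, 2))
print(count)
```

If you can't produce the merged `events` list in the first place, that is the missing-information problem, not an ML problem.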

Too early for ML -- all of the above and:

  1. Starting with ML/AI (e.g. predictive analytics) to automate insights/forecasts, before understanding what's happening with customers and product historically and now (e.g. descriptive analytics).

    • Assessment: Is a Data Scientist one of your first technical hires?

    • Assessment: Do you currently know who cancelled your service within the last week, and how you acquired that customer in the first place?

  2. Your MVP is under development.
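The cancellation question above is plain descriptive analytics, and it can be answered without any ML. Here is a rough sketch, assuming you have a list of cancellations and a lookup of how each customer was acquired. The function name, the data shapes, the channel names, and the 7-day window are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def cancellations_by_channel(cancellations, acquisition_channel, today, days=7):
    """Count cancellations in the last `days` days, grouped by acquisition channel.

    cancellations: iterable of (customer_id, cancel_date) pairs (assumed shape)
    acquisition_channel: dict mapping customer_id -> channel name (assumed shape)
    """
    cutoff = today - timedelta(days=days)
    counts = Counter()
    for customer_id, cancel_date in cancellations:
        if cutoff < cancel_date <= today:
            # Customers with no acquisition record are counted as "unknown".
            counts[acquisition_channel.get(customer_id, "unknown")] += 1
    return counts


# Hypothetical data: c4 has no acquisition record at all.
cancellations = [
    ("c1", date(2022, 4, 14)),
    ("c2", date(2022, 4, 10)),
    ("c3", date(2022, 4, 1)),
    ("c4", date(2022, 4, 12)),
]
acquisition_channel = {"c1": "paid_search", "c2": "referral", "c3": "paid_search"}

recent = cancellations_by_channel(cancellations, acquisition_channel, date(2022, 4, 15))
print(dict(recent))
```

A large "unknown" bucket is a useful finding on its own: it tells you which data you aren't collecting yet.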

Treating ML/analytics as a Silver Bullet -- all of the above and:

  1. Doing analytics for the sake of analytics, without understanding how customer actions generated the data you see -- and how the deliverable will solve the business/customer's pain point.

    • Assessment: Are you trying to do ML for a process that you don't understand?

  2. Inadvertently scoping the MVP for a product/feature so that it has to be better than state-of-the-art ML.

    • Example: Many pitch decks that end with "... and we'll use AI to do this" :(

  3. Buying a multi-year software vendor license without doing an internal POC to see if the software actually solves the problem you bought it for.

    • Assessment: What's not working now? What does it need to have in the short term? In the long term?

Executing on ML products as if they're software engineering tasks -- all of the above and:

  1. Thinking you have clean data :)

  2. Assuming that the data won't change :(

  3. Developing biased models because of biased data or 4 other reasons -- or the 21 (sometimes contradictory) definitions of fairness.

  4. Not treating ML as data products that you scope down and iterate over, from proof-of-concept (POC) to v1, v2, etc.

    • Assessment: Do you have an ad-hoc/simple model that answers your business question, that you can compare + evaluate the next iteration of the ML model against?

    • Recommendation: For each POC, time-box the data exploration stage to help you scope down (and scope out) the next phase.

  5. Asking for a guarantee on ML model performance (based on ML metrics, KPI, etc.).

  6. Not knowing about the Hidden Technical Debt in Machine Learning Systems

    • Assessment: Do you set aside time to tackle software engineering and machine learning debt?

  7. Not understanding what the algorithm is doing.

    • Assessment (start-up): Can you give a 1-2 sentence overview of what the algorithm is doing?

    • Assessment (data scientist): Can you give a 1-2 sentence non-technical overview of what the algorithm is doing? Why did you pick that one? And why did you choose the parameters you did?
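The "ad-hoc/simple model" from the list above can be very simple indeed. Here is a minimal sketch that compares a candidate model against a baseline that always predicts the most common label. The labels, the candidate's predictions, and the choice of accuracy as the metric are made up for illustration; use whatever metric actually maps to your business question.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def majority_baseline(y_train):
    """Return a predictor that always outputs the most common training label."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return lambda rows: [most_common_label for _ in rows]


# Hypothetical labels and predictions.
y_train = [0, 0, 0, 1, 0, 1, 0, 0]
y_test = [0, 1, 0, 0, 1]
x_test = [None] * len(y_test)      # features are ignored by the baseline
candidate_pred = [0, 1, 0, 1, 1]   # stand-in for the next model iteration

baseline = majority_baseline(y_train)
baseline_acc = accuracy(y_test, baseline(x_test))
candidate_acc = accuracy(y_test, candidate_pred)

print(f"baseline:  {baseline_acc:.2f}")
print(f"candidate: {candidate_acc:.2f}")
```

If the next iteration can't clearly beat a baseline like this, that's a signal to revisit the scope before investing in v2.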

Not executing on (aspects of) ML products as if they're software engineering tasks -- all of the above and:

  1. Not testing and monitoring ML products in production

    • Assessment: How high do you score on the ML Test?
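One small piece of monitoring in production is checking whether incoming data still looks like the data the model was built on. The sketch below flags a feature whose mean has shifted by more than a threshold, measured in standard deviations of the reference data. This is just one simple check I'm using for illustration, not a complete monitoring setup; the threshold of 3.0 and the sample values are arbitrary assumptions.

```python
from statistics import mean, stdev


def mean_shift(reference, current):
    """Absolute shift of the current mean, in reference standard deviations."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference values have zero variance")
    return abs(mean(current) - mean(reference)) / ref_std


def has_drifted(reference, current, threshold=3.0):
    """Flag drift when the mean shift exceeds `threshold` (an arbitrary choice)."""
    return mean_shift(reference, current) > threshold


# Hypothetical values of one numeric feature.
reference = [10, 12, 11, 9, 10, 11, 12, 9]   # data the model was built on
stable = [10, 11, 12, 10]                    # recent data, similar range
shifted = [20, 22, 21, 19]                   # recent data after an upstream change

print(has_drifted(reference, stable))
print(has_drifted(reference, shifted))
```

A check like this can run on a schedule and alert a human, which is often enough for a v1.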

Difficulty hiring data professionals

  1. See "Who should be my first data hire?" to help you:

    • Align job title and description of requirements, and/or

    • Align job title/description with how the role actually fills the needs of the team, including listing software packages over a 30/60/90 plan of what the expectations and deliverables look like.

Nobody is perfect, has clean data, or has ML running with no downtime in production. Now that you know what to focus on, start small and iterate.

Do you need an expert to help you improve your product market fit and scale by leveraging data to make your customers happier? Please reach out.

Keywords: AI, ML, start-ups, data strategy, data products, customer understanding

This blog post was originally based on an office hour I hosted on the 805 Startups Discord server on June 15th, 2020.
