Dear advisor: How do I get clean data? (or) Why clean data doesn't exist

June 2021, updated June 2022 and February 2023 for clarity

Every company has data challenges (no joke!). Throughout my career as a data expert, two questions have come up again and again (from founders and data professionals):

“I hear data professionals spend 90% of their time doing data cleaning -- how do I get clean data to work with from the very beginning?” and

"So you have your PhD in Statistics. You're a data expert. Can we get your stamp of approval that our data's perfect?"

These are great questions! I’m here to disappoint you... because I'm here to tell you... clean/perfect data just doesn't exist -- and that’s actually a good thing (!). Here’s why:

Best Case Scenario: Incomplete Data is Expected

It's expected because it reflects the underlying issues that your customers are experiencing -- and that’s a good thing.

In all other cases -- clean data is not a good thing.

Typical Scenario 1: Uninspected Data

You’re tracking seemingly everything but not checking it as it comes in, from the very beginning, so you don't know how clean or complete it is. Jeff Wilke argues that "uninspected data is always wrong".
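
As a rough illustration of what "checking it as it's coming in" could look like, here is a minimal Python sketch that counts missing fields and duplicate records in a batch of incoming events. The field names (`user_id`, `event`, `timestamp`), the `inspect_events` helper, and the sample batch are all hypothetical, made up for this example; swap in whatever your own tracking actually records.

```python
from collections import Counter

# Hypothetical required fields for an incoming event; adjust to your own schema.
REQUIRED_FIELDS = ("user_id", "event", "timestamp")


def inspect_events(events):
    """Count missing required fields and exact duplicate records in a batch."""
    missing = Counter()
    seen = set()
    duplicates = 0
    for e in events:
        # A field counts as missing if it is absent, None, or an empty string.
        for field in REQUIRED_FIELDS:
            if e.get(field) in (None, ""):
                missing[field] += 1
        # Two events are treated as duplicates if all required fields match.
        key = tuple(e.get(f) for f in REQUIRED_FIELDS)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"rows": len(events), "missing": dict(missing), "duplicates": duplicates}


# Made-up sample batch, for illustration only.
batch = [
    {"user_id": 1, "event": "signup", "timestamp": "2021-06-01T10:00:00"},
    {"user_id": 1, "event": "signup", "timestamp": "2021-06-01T10:00:00"},
    {"user_id": None, "event": "purchase", "timestamp": "2021-06-01T10:05:00"},
    {"user_id": 2, "event": "purchase", "timestamp": ""},
]
report = inspect_events(batch)
print(report)
```

Even a small report like this, run on every batch from day one, tells you how complete the data is instead of leaving you to assume it.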

Typical Scenario 2: (Partially) Incomplete Data

On the flip side, is it possible that you're not tracking all of the product and customer touch-points?

Typical Scenario 3: Biased Data

Or you're only tracking certain (customer) events/outcomes?

Typical Scenario 4: No Data 

Or you don't know how to get started?

Other Reasons for "Weirdness" in Data:

In my experience, the most common reasons for any weirdness in the data may be due to

Recommended Next Steps

Do you need an expert to help you execute these steps, to help your company make better data-driven decisions? Please reach out.

