Unit Testing for Data Science


As data scientists or analysts, our goal is to extract as much knowledge and value from data as possible. We write code to do it, and that code may be quick and dirty. Often, that’s OK, because we are exploring and trying things that we might not end up using.

But if you are successful, that messy and imperfect code ends up being run regularly, maybe even deployed, and soon becomes a problem.

  • Have you ever tried to change a model, only to break it?

  • Have you ever had a working model fail because the input data changed?

  • Have you ever had a model break when you handed it over to someone else, or took it over from them?

  • Have you ever been hit by the same bug twice?

  • Has fear of breaking something ever stopped you from changing your code?

  • Has any of the above ever made you miss a deadline or made a project take longer than expected?

If so, automated testing, or unit testing, will help.
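
To make that concrete, here is a minimal sketch of what a unit test can look like. The function `fill_missing_with_median` is a hypothetical data-cleaning helper invented for this illustration, not part of any library; the tests are written as plain functions whose names start with `test_`, which is the convention a runner such as pytest uses to find them.

```python
from statistics import median


def fill_missing_with_median(values):
    """Hypothetical helper: replace None entries with the median of the rest."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Fail loudly instead of silently returning a column of None values.
        raise ValueError("no observed values to compute a median from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def test_fills_missing_values_with_median():
    # The median of the observed values 1 and 3 is 2.
    assert fill_missing_with_median([1, None, 3]) == [1, 2, 3]


def test_rejects_input_with_no_observed_values():
    # An all-missing column should raise rather than pass through unnoticed.
    try:
        fill_missing_with_median([None, None])
    except ValueError:
        return
    raise AssertionError("expected ValueError for all-missing input")
```

Each test pins down one behaviour you rely on. If a later change to the function, or a change in the input data, breaks that behaviour, the test fails right away instead of the model failing weeks later.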


There’s no free lunch in statistics, but when doing data science or analytics, unit testing comes pretty close to one.

Testing is easy once you know how and have started doing it. In fact, you’ll go from frustration to calmness, from churn to flow. Yet despite the benefits, I have run into many data scientists who do not test. You have to know how to get started, and you have to know how to keep going.

I made this site to help you get started with automated testing in data science and go from churn to flow.