Dec. 1, 2020

Three-layered cake of data science

Erik Jan de Vries

Data Scientist

This blog post provides some context for my more extensive blog post titled Data Science is Boring.

What makes up a data science project? I like to think of one as a three-layered cake: Business, Analysis, Data.

Three-layered cake of a data science project. (Here's how to bake your own at Odlums.)

Generally speaking, during a data science project we move from top to bottom, and back up again, although in reality the path tends to be more mercurial.

This is not BAD at all, but the acronym makes it easy to remember the layers.

From Business to Data, and back again

  1. A good data science project starts with a clear Business perspective, defining the goal. Why are we doing it? How is it going to benefit the business? How will we measure our success (or failure)?
  2. Next comes the science: how can we translate our business problem (or opportunity) into a scientific Analysis?
  3. Once we have a clear picture of the analysis we would like to do, we can deal with the Data: What have we got? And how can we best shape it for our analytical purposes?
  4. With the data in place, we can execute our Analysis and verify the results.
  5. Finally, we conclude a project with the Business validation and implementation. Have we met our goals?
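
The five steps above can be sketched as a small data structure. This is only an illustration of the round trip through the layers, not a project framework: the names `Layer`, `PROJECT_PATH` and `is_round_trip` are hypothetical and were invented for this sketch.

```python
from enum import Enum


class Layer(Enum):
    """The three layers of the cake, from top to bottom."""
    BUSINESS = "Business"
    ANALYSIS = "Analysis"
    DATA = "Data"


# The five steps, in order, as (layer, guiding question) pairs.
PROJECT_PATH = [
    (Layer.BUSINESS, "What is the goal, and how will we measure success?"),
    (Layer.ANALYSIS, "How do we translate the business problem into an analysis?"),
    (Layer.DATA, "What data have we got, and how do we shape it?"),
    (Layer.ANALYSIS, "Execute the analysis and verify the results."),
    (Layer.BUSINESS, "Validate and implement: have we met our goals?"),
]


def is_round_trip(path):
    """Return True if the path starts and ends at the Business layer
    and reaches the Data layer somewhere in between."""
    layers = [layer for layer, _ in path]
    return (
        len(layers) >= 3
        and layers[0] is Layer.BUSINESS
        and layers[-1] is Layer.BUSINESS
        and Layer.DATA in layers[1:-1]
    )


if __name__ == "__main__":
    for step, (layer, question) in enumerate(PROJECT_PATH, start=1):
        print(f"{step}. [{layer.value}] {question}")
```

Under this sketch, a path that stops after step 4 (analysis executed, but no business validation) does not count as a round trip, which mirrors the idea that a project starts and ends with the business.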

Complementary views

I have often seen a data science project described as a process that flows from Ideation through Experimentation to Industrialisation (the exact names of the phases vary):

The data product life cycle

This description does not contradict the Business, Analysis and Data layers, but it serves a different purpose. The process flow is very instructive for project execution (for example in an IT organisation, where many data science projects land), showing clearly which sequences of actions are to be executed and which stage gates are to be passed.

The risk of the process flow description is that it may give the impression that a project is successfully finished once a model has been deployed in production; it is easy to forget that the ultimate goal of the project is to generate business value. The layered cake makes it visually clear that a project needs to produce business value in the end. Implementing the changes in business processes that are needed to reap the benefits of your model tends to be the hardest part of a project, so it is important not to forget about this!

I'd love to hear from you! What's the cherry on top of your cake?