The three-layered cake of data science
This blog post provides some context for my more extensive blog post titled Data Science is Boring.
What makes up a data science project? I like to think of a data science project as a three-layered cake: Business, Analysis, Data.
Generally speaking, during a data science project we move from the top layer to the bottom and back up again, although in reality the path tends to be less predictable, with detours and repeated steps along the way.
From Business to Data, and back again
- A good data science project starts with a clear Business perspective, defining the goal. Why are we doing it? How is it going to benefit the business? How will we measure our success (or failure)?
- Next comes the science: how can we translate our business problem (or opportunity) into a scientific Analysis?
- Once we have a clear picture of the analysis we would like to do, we can deal with the Data: What have we got? And how can we best shape it for our analytical purposes?
- With the data in place, we can execute our Analysis and verify the results.
- Finally, we conclude a project with the Business validation and implementation. Have we met our goals?
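For readers who prefer code to cake, the five steps above can be written down as a small Python sketch. The names Layer, PROJECT_PATH and is_round_trip are made up for illustration, and the rule that each step moves exactly one layer up or down is a simplifying assumption of the idealised path, not a claim about how real projects behave.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The three layers of the cake, numbered from top to bottom."""
    BUSINESS = 0
    ANALYSIS = 1
    DATA = 2


# The five steps from the list above, as (layer, guiding question) pairs.
PROJECT_PATH = [
    (Layer.BUSINESS, "Why are we doing it, and how will we measure success?"),
    (Layer.ANALYSIS, "How do we translate the business problem into an analysis?"),
    (Layer.DATA, "What have we got, and how can we best shape it?"),
    (Layer.ANALYSIS, "Execute the analysis and verify the results."),
    (Layer.BUSINESS, "Validate and implement: have we met our goals?"),
]


def is_round_trip(path):
    """Return True if the path starts and ends at the Business layer and
    every step moves exactly one layer up or down (an assumed, idealised rule)."""
    layers = [layer for layer, _ in path]
    if not layers:
        return False
    starts_and_ends_at_business = layers[0] == layers[-1] == Layer.BUSINESS
    moves_one_layer_at_a_time = all(
        abs(a - b) == 1 for a, b in zip(layers, layers[1:])
    )
    return starts_and_ends_at_business and moves_one_layer_at_a_time


if __name__ == "__main__":
    for step, (layer, question) in enumerate(PROJECT_PATH, start=1):
        print(f"{step}. {layer.name.title()}: {question}")
    print("Round trip:", is_round_trip(PROJECT_PATH))
```

A path that stops at the Data layer, or at the second Analysis step, fails the check. That is exactly the kind of project that never makes it back up to the Business layer.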
I have often seen a data science project described as a process that flows from Ideation through Experimentation to Industrialisation (the exact names of the phases vary).
This description does not contradict the Business, Analysis and Data layers, but it serves a different purpose. The process flow is very instructive for project execution (for example in an IT organisation, where many data science projects land), because it shows clearly which sequences of actions are to be executed and which stage gates are to be passed.
The risk of the process flow description is that it may give the impression that a project is successfully finished once a model has been deployed in production; it is easy to forget that the ultimate goal of the project is to generate business value. The layered cake makes it visually clear that a project needs to end where it started, in the Business layer. Implementing the changes in business processes that are needed to reap the benefits of your model tends to be the hardest part of a project, so it is important not to lose sight of it!
I'd love to hear from you! What's the cherry on top of your cake?