Forecasting beer sales for HEINEKEN’s customers

Categories: Big data, Data science

As one of the world’s leading brewers, HEINEKEN works together with its customers to offer a diverse array of over 250 brands to consumers in over 170 countries, in 70 of which it operates breweries. One of the leading challenges in serving consumers is ensuring on-shelf availability of its products in retail outlets. Market research has shown that when consumers consistently notice that their favourite brand is missing from the shelves, they may quickly switch to a competitor’s brand instead. This blog post is a tutorial on getting started with retail sales forecasting, a necessary component in preventing such situations.

Integrating Pandas and scikit-learn with pipelines

Categories: Data engineering, Data science

Scikit-learn and Pandas are both great tools for exploratory data science. Both take a bit of practice to get the hang of. The phrase “Hey Alex, how do I achieve the following in Pandas?” is often heard in our office.

Everyone at BigData Republic – no exceptions – is familiar to some degree with Python and Pandas. Nearly all products start off as a clutter of Pandas operations before gradually maturing into a production-ready code base.

Custom Optimizer in TensorFlow


Introduction

Neural networks play a very important role in modeling unstructured data, such as in language or image processing. The idea of such networks is to simulate the structure of the brain using nodes and edges with numerical weights, processed by activation functions. The output of such a network is mostly a prediction, such as a classification. This is achieved by optimizing the weights against a given target using a loss function.
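To make the idea of weights, activation functions and loss optimization concrete, here is a minimal sketch of a single sigmoid node trained with plain gradient descent on the log loss. The names `train_neuron` and `predict`, the toy data and the hyperparameters are all assumptions chosen for illustration; this is not TensorFlow code and not necessarily the approach taken in the full post.

```python
import math


def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, lr=0.5, epochs=2000):
    """Fit one sigmoid node (weight w, bias b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            p = sigmoid(w * x + b)
            # For log loss with a sigmoid output, the gradient with respect
            # to the pre-activation (w * x + b) is simply (p - y).
            grad_w += (p - y) * x
            grad_b += (p - y)
        # Step against the average gradient.
        w -= lr * grad_w / len(samples)
        b -= lr * grad_b / len(samples)
    return w, b


# Hypothetical toy data: negative inputs belong to class 0, positive to class 1.
samples = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train_neuron(samples)


def predict(x):
    """Predicted probability that x belongs to class 1."""
    return sigmoid(w * x + b)
```

A custom optimizer would replace the two update lines with a different rule (for example one that keeps a running average of past gradients), while the loss and the node itself stay the same.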

RE-WORK Deep Learning Summit London


On the 21st and 22nd of September we attended the RE-WORK Deep Learning Summit in London. This event focused on the development of deep learning techniques and on business applications based on deep learning. Facebook and Amazon presented their Natural Language Processing research using neural networks, which was a recurring theme at the summit. Bayesian neural networks and generative models were other main topics discussed at the summit.

Flink Forward Berlin 2017 – An overview


A small delegation of BigData Republic consultants went to the Flink Forward conference in Berlin. One of the key statements at this conference that resonated was: “ING is an IT company with a banking license.” In a world where data is increasingly important for staying ahead of the competition, excellent IT infrastructure and data-driven software are key to realizing a data-driven vision. This realization also showed in the statement that Flink jobs should be viewed as applications in their own right, delivering business value. Most talks were more technical and did not always point clearly to the actual business value delivered. In this blog we give a brief overview of what we view as the highlights of this conference.

Machine learning for predictive maintenance: where to start?

Categories: Big data, Data science

Think about all the machines you use during a year, all of them, from a toaster every morning to an airplane every summer holiday. Now imagine that, from now on, one of them would fail every day. What impact would that have? The truth is that we are surrounded by machines that make our lives easier, but we are also becoming more and more dependent on them. Therefore, the quality of a machine depends not only on how useful and efficient it is, but also on how reliable it is. And together with reliability comes maintenance.

Why do Big Data projects fail and how to make them succeed?

Categories: Big data, Data science

The goal of Big Data is to automate the delivery of actionable business insights from data. To do this, you often end up wanting diverse data sources, large data sets and a vast amount of computational power. However, most of these are symptoms of an approach, not prerequisites of the goal.

This often leads to senior management focusing on the tools used by the competition, instead of on why the competition is using those tools in the first place and what steps are required to end up in the same league.

Key takeaways from the Scala Days keynote

Categories: Data engineering


There’s no Scala Days conference without a keynote by Martin Odersky himself. This year he spoke about his current work: Dotty. Dotty is the new Scala compiler that will be part of Scala 3. The first release candidate was released just hours before the keynote and comes with the compiler itself (dotc), a REPL (doti), a doc tool (dotd) and an IDE. It implements the Microsoft Language Server Protocol, enabling it to serve several front ends: VS Code and Emacs (IntelliJ support is in the works). With Dotty, IDEs can use the regular compiler as the presentation compiler.


How to obtain advanced probabilistic predictions for your data science use case

Categories: Data science, Deep learning

Many data science use cases involve predicting a continuous quantity. For instance, a grid operator might want to predict next week’s energy consumption level for a group of households. In order to deliver these predictions, the Big Data Scientist will apply machine learning algorithms to a large collection of features, such as family size, weather forecasts, property value and last week’s consumption levels. There are many use cases of this type, for example predicting sales numbers, hotel rooms booked, money transfers or the time-to-failure of critical components. But what number do we actually want our algorithm to output?
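One way to see why this question matters: the "best" single number depends on the loss you optimize. The sketch below uses the quantile (pinball) loss, a standard loss from quantile regression, to pick a constant forecast for a handful of made-up consumption values. The data and the helper names `pinball_loss` and `best_constant` are assumptions for illustration only and are not taken from the full post.

```python
def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss of one prediction at quantile level q (0 < q < 1)."""
    diff = y_true - y_pred
    # Under-predictions are weighted by q, over-predictions by (1 - q).
    return q * diff if diff >= 0 else (q - 1) * diff


def best_constant(values, q, candidates):
    """Return the candidate forecast with the lowest total pinball loss."""
    return min(candidates, key=lambda c: sum(pinball_loss(v, c, q) for v in values))


# Hypothetical weekly consumption figures, including one unusually high week.
consumption = [10, 12, 13, 15, 40]
candidates = range(0, 51)

mean_forecast = sum(consumption) / len(consumption)          # minimizes squared error
median_forecast = best_constant(consumption, 0.5, candidates)  # q = 0.5 gives the median
high_forecast = best_constant(consumption, 0.9, candidates)    # q = 0.9 gives a high quantile
```

The three forecasts differ because each answers a different question: the expected level, the typical level, and a level that is rarely exceeded. Predicting several quantiles at once is one simple route towards a probabilistic forecast.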