DataworksSummit Berlin - Wednesday morning

2018-04-19 06:50 More posts about Apache BigData hortonworks
Data strategy - cloud strategy - business strategy: Aligning the three was one of the main themes (initially put forward in his opening keynote by CTO of Hortonworks Scott Gnau) thoughout this weeks Dataworks Summit Berlin kindly organised and hosted by Hortonworks. The event was attended by over 1000 attendees joining from 51 countries.

The inspiration hat was put forward in the first keynote by Scott was to take a closer look at the data lifecycle - including the fact that a lot of data is being created (and made available) outside the control of those using it: Smart farming users are using a combination of weather data, information on soil conditions gathered through sensors out in the field in order to inform daily decisions. Manufacturing is moving towards closer monitoring of production lines to spot inefficiencies. Cities are starting to deploy systems that allow for better integration of public services. UX is being optimized through extensive automation.

When it comes to moving data to the cloud, the speaker gave a nice comparison: To him, explaining the difficulties that moving to the cloud brings is similar to the challenges that moving "stuff" to external storage in the garage brings: It opens questions of "Where did I put this thing?", but also about access control, security. Much the same way, cloud and on-prem integration means that questions like encryption, authorization, user tracking, data governance need to be answered. But also questions like findability, discoverability and integration for analysis purposes.

The second keynote was given by Mandy Chessell from IBM introducing Apache Atlas for metadata integration and governance.

In the third keynote, Bernard Marr talked about the five promises of big data:

  • Informing decisions based on data: The goal here should be to move towards self service platforms to remove the "we need a data scientist for that" bottleneck. That in turn needs quite some training and hand-holding for those interested in the self-service platforms.
  • Understanding customers and customer trends better: The example given was a butcher shop that would install a mobile phone tracker in his shop window in order to see which advertisement would make more people stop by and look closer. As a side effect he noticed an increase in people on the street in the middle of the night (coming from pubs nearby). A decision was made to open at that time, offer what people were searching for at that time according to Google trends - by now that one hour in the night makes a sizeable portion of the shop's income. The second example given was Disney already watching all it's Disney park visitors through wrist bands, automating line management at popular attractions - but also deploying facial recognition watching audiences watch shows in figure out how well those shows are received.
  • Improve the customer value proposition: The example given was the Royal Bank of Scotland moving closer to it's clients, informing them through automated means when interest rates are dropping, or when they are double insured - thus building trust and transparency. The other example given was that of a lift company building sensors into lifts in order to be able to predict failures and repair lifts when they are least used.
  • Automate business processes: Here the example was that of a car insurance that would offer dynamic rates if people would let themselves monitor during driving. Those adhering to speed limits, avoiding risky routes and times would get lower rates. Another example was that of automating the creation of sports reports e.g. for tennis matches based on sensors deployed, or that of automating Forbes analyst reports some of which get published without the involvement of a journalist.
  • Last but not least the speaker mentioned the obvious business case of selling data assets - e.g. selling aggregated and refined data gathered through sensors in the field back to farmers. Another example was the automatic detection of events based on sounds detected - e.g. gun shots close to public squares and selling that back to the police.


After the keynotes were over breakout sessions started - including my talk about the Apache Way. It was good to see people show up to learn how all the open source big data projects are working behind the scences - and how they themselves can get involved in contributing and shaping these projects. I'm looking forward to receiving pictures of feather shaped cookies.

During lunch there was time to listen in on how Santander operations is using data analytics to drive incident detection, as well as load prediction for capacity planning.

After lunch I had time for two more talks: The first explained how to integrate Apache MxNet with Apache NiFi to bring machine learning to the edge. The second one introduced Apache Beam - an abstraction layer above Apache Flink, Spark and Google's platform.

Both, scary and funny: Walking up to the Apache Beam speaker after his talk (having learnt at DataworksSummit that he is PMC Chair of Apache Beam) - only to be greeted with "I know who *you* are" before even getting to introduce oneself...