BBDC workshop on Data Processing and Machine Learning

Workshop on Distributed and Stream Data Processing and Machine Learning at Smart Data Forum

The workshop Distributed and Stream Data Processing and Machine Learning, organized by the Berlin Big Data Center (BBDC), took place on March 22, 2019 at the Smart Data Forum (SDF) showroom. Over 70 participants came to hear four insightful presentations on data processing and machine learning.

Prof. Dr. Volker Markl, the organizer of the workshop, spoke about the latest research of the BBDC, the Database Systems and Information Management (DIMA) group at the Technical University (TU) Berlin, and the Intelligent Analytics for Massive Data (IAM) research group at the German Research Center for Artificial Intelligence (DFKI). Since part of this research employs the Apache Flink framework, he mentioned that the startup data Artisans – a spin-off from TU Berlin – had been successfully commercialized, and that it is now time to move on and develop new systems in a different environment.

Albert Bifet, Professor at Telecom ParisTech, head of the Data, Intelligence and Graphs (DIG) group and research associate at the WEKA Machine Learning Group, gave a talk on Machine Learning for Data Streams. According to him, data availability and computational scale have increased substantially in recent years; as a result, machine learning is gaining relevance and developing faster than ever before. Prof. Bifet also discussed some key challenges of the French AI strategy. One of its priorities, he explained, is Green AI, for which he suggests reducing Big Data to Small Data in order to make data stream methods more efficient. Moreover, he addressed the challenges of Explainable AI as well as ethical issues arising from these technological developments.

Amr El Abbadi, Professor of Computer Science at the University of California, Santa Barbara, continued with a talk on The Cloud, the Edge and Blockchains: Unifying Them. He spoke about how to manage large data sets, focusing mainly on scalability, availability and fault tolerance, as well as on the consistency of data located in different places. He argued that the fields of distributed systems, data management and cryptography need to be further integrated in order to better address issues such as throughput and latency.

Seif Haridi and Paris Carbone concluded the workshop by presenting the topic From Stream Processing to Continuous and Deep Analytics. Seif Haridi, Chief Scientific Advisor of RISE SICS and Chair Professor of Computer Systems at the KTH Royal Institute of Technology in Stockholm, talked about his team's current research projects: the startup Logical Clocks developed Hopsworks, which is a key project at RISE SICS. Furthermore, his team works on Apache Flink as well as on Continuous Deep Analytics (CDA).

Paris Carbone, senior researcher at the Swedish Institute of Computer Science (SICS), which is part of RISE, presented the current research on CDA. Its mission is to design systems that make it possible to go efficiently from data to decisions. The project aims to provide a unified approach to declaring and executing analytical tasks, and to integrate them seamlessly with continuous services, streams and data-driven applications at scale. This is especially relevant as there are more data-centric applications than ever before, such as relational data streams, dynamic graphs, simulation tasks, feature learning and many more. Additionally, he explained Arc, the language used to capture both batch and stream analytics, as well as the project's sophisticated distributed runtime.