Data Analysis

Data analytics focuses on extracting valuable insights from heterogeneous, interrelated data sources. Visualising the results in a comprehensible and exhaustive manner is also part of the analytics process, so that they are presented meaningfully and are ready to be used. This helps businesses make smarter decisions, reduce costs and develop new, higher-quality products and services.

We present recent challenges and trends in this field, Machine Learning systems, and the most important programmes and initiatives. Our collection of related sources helps you delve further into the topic.

© Smart Data Forum

Machine Learning

When speaking of data analysis, it is inevitable to mention machine learning (ML): the process of getting computers to learn from large amounts of data by finding patterns and extracting information that helps them make decisions without being explicitly programmed for each task. ML algorithms have to be trained by feeding them significant amounts of data. Among the many methods of getting computers to learn, deep learning is a popular subset of ML that is particularly good at learning useful representations of data. Deep learning systems have been successfully applied to information retrieval, object recognition and detection, sentiment analysis, personalized medicine and other areas. Several open-source machine learning and deep learning frameworks are listed below.

  • MXNet – open-source deep learning framework designed to train and execute deep neural networks. Because it scales across multiple GPUs and machines, it can train models quickly. It supports many programming languages, such as Python, JavaScript, Go, C++, Scala, Matlab and R, among others.
  • TensorFlow – open-source software library for numerical computation using data flow graphs. It runs across many platforms.
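As described above, ML models are trained by feeding them data, and frameworks such as MXNet speed this up by spreading the work over several GPUs or machines. The sketch below illustrates both ideas in plain Python under simplifying assumptions: the model is a toy linear function y = w*x + b fitted by gradient descent on the mean squared error, and the "workers" are simulated one after another in a single process rather than running on real devices. The function names, data and hyperparameters are hypothetical and do not reflect the API of MXNet or any other framework.

```python
# A minimal sketch of synchronous data-parallel training, simulated in one process.
# Each "worker" stands in for a GPU or machine; all names, data and hyperparameters
# are illustrative assumptions, not the API of any real framework.


def shard_gradient(w, b, shard):
    """Summed gradient of the squared error over one shard for the model y = w*x + b."""
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard)
    grad_b = sum(2 * (w * x + b - y) for x, y in shard)
    return grad_w, grad_b


def split_into_shards(data, num_workers):
    """Distribute samples round-robin so each worker gets a similar share."""
    return [data[i::num_workers] for i in range(num_workers)]


def data_parallel_train(data, num_workers=4, learning_rate=0.01, epochs=5000):
    """Fit w and b by gradient descent on the mean squared error over all data."""
    w, b = 0.0, 0.0
    shards = split_into_shards(data, num_workers)
    n = len(data)
    for _ in range(epochs):
        # On real hardware these per-shard computations would run in parallel.
        partial = [shard_gradient(w, b, shard) for shard in shards]
        # Combine the partial sums into the full-batch gradient (an "all-reduce").
        # In a real system every worker would receive this same gradient and apply
        # the same update, keeping the model replicas in sync.
        grad_w = sum(gw for gw, _ in partial) / n
        grad_b = sum(gb for _, gb in partial) / n
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Samples from the rule y = 2x - 1; the model is never told this rule.
    data = [(float(x), 2.0 * x - 1.0) for x in range(10)]
    w, b = data_parallel_train(data)
    print(f"learned w={w:.3f}, b={b:.3f}")  # prints: learned w=2.000, b=-1.000
```

Because the partial sums are added before dividing by the total sample count, the combined gradient equals the full-batch gradient no matter how the data is sharded. Averaging per-worker mean gradients instead would give the same result only when all shards have equal size.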

Deep Learning Libraries and Frameworks

  • Caffe – deep learning framework focused on speed, expression and modularity.
  • Caffe2 – modular and scalable deep learning framework that gives the possibility to try new deep learning algorithms and models.
  • Theano – Python library for defining, optimizing and evaluating mathematical expressions involving multi-dimensional arrays.
  • Torch – scientific computing framework that puts GPUs first, with wide support for machine learning, signal processing, parallel processing, computer vision and more.
  • PyTorch – open-source deep learning library designed for fast and flexible experimentation. It is used from Python, is applied in areas such as natural language processing, and offers strong GPU acceleration.
  • Chainer – open-source framework for neural networks. It can run on multiple GPUs and supports per-batch architectures. Networks are written as ordinary Python code, which makes them easy to debug.
  • Keras – Python-based deep learning library that can run on top of TensorFlow. It enables fast prototyping through modularity and extensibility, and supports convolutional and recurrent networks.
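Several of the libraries above, including Theano, PyTorch and Chainer, derive the gradients of user-defined mathematical expressions automatically, which is what makes training deep networks practical; PyTorch and Chainer record the computation graph while ordinary Python code runs (an approach Chainer calls "define-by-run"). The toy sketch below shows the core idea, reverse-mode automatic differentiation on scalars. The `Value` class is hypothetical and far simpler than any of these libraries: it handles only addition, multiplication and tanh, with no arrays or GPU support.

```python
import math


class Value:
    """Hypothetical scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes self.grad to the parents via the chain rule

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    __radd__ = __add__
    __rmul__ = __mul__

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def backward_fn():
            # d tanh(z)/dz = 1 - tanh(z)^2
            self.grad += (1 - t * t) * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph recorded while the expression ran,
        # then apply the chain rule from the output back towards the inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


if __name__ == "__main__":
    # One neuron y = tanh(w*x + b); the graph is built as this line executes.
    x, w, b = Value(2.0), Value(0.5), Value(-0.5)
    y = (w * x + b).tanh()
    y.backward()
    print(f"y={y.data:.4f}, dy/dw={w.grad:.4f}, dy/db={b.grad:.4f}")
    # prints: y=0.4621, dy/dw=1.5729, dy/db=0.7864
```

Gradients are accumulated with `+=` so that a variable used more than once (for example `a * a + a`) receives the sum of all its contributions, as the chain rule requires.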

Challenges

  • Efficient execution on heterogeneous hardware environments.
  • Efficient delivery and application of trained models on different hardware environments.

Selected Programmes & Initiatives

  • Smart Data Innovation Lab – its SDIL Platform is a powerful in-memory computing infrastructure offered free of charge to research projects. (BMBF)