Data Acquisition

Large amounts of data are generated every second by billions of devices, from smartphones, wearables and social media to the Internet of Things, Industry 4.0 and cyber-physical systems. These diverse sources produce highly heterogeneous types of data. Before you can start extracting insights from data sets, you need to understand your requirements as well as the availability and quality problems of the data you are considering. You need to know which data can be useful to your project and where you can obtain suitable, high-quality data.
To ensure that this data is collected and used in a meaningful way, substantial data storage capabilities have to be developed and deployed. New big data storage systems have to handle huge datasets and cope with a constant flow of data in real time.

We present selected Open Data repositories, the recent challenges and trends in this field, and the most important programmes and initiatives. Our collection of related sources helps you delve further into the topic.

© Smart Data Forum

Examples of Open Data repositories

In addition to the data that researchers and businesses create themselves, they can make use of the huge amounts of publicly available Open Data to enrich their databases and gain more valuable insights.

Re3data: By offering detailed information on more than 2,000 research data repositories, re3data has become the most comprehensive source of reference for research data infrastructures globally.

public, research and corporate data / various formats

World Bank Open Data: This portal publishes development data sets for various countries and regions; most of the data is published under the Creative Commons Attribution 4.0 (CC BY 4.0) licence.

economic data, demographics / numeric

World Economic Outlook Databases: The World Economic Outlook Databases are maintained by the International Monetary Fund, which publishes comprehensive economic outlook datasets twice a year.

economic, research data / numeric

GDELT: Collects and offers georeferenced global news and social network data. In addition to raw data, it offers a data analysis tool supported by Google Jigsaw.

public, social data / machine-readable

DBpedia: A Semantic Web database of structured content extracted from Wikipedia across all languages.

public data / text and numeric

World Data System: Promoting long-term stewardship of, and universal and equitable access to, quality-assured scientific data and data services, products, and information across a range of disciplines in the natural and social sciences, and the humanities.

research data / various formats

Data One: A community-driven project providing access to data across multiple member repositories, with enhanced search and discovery of earth and environmental data.

research data / machine-readable

European Data Portal: Harvests the metadata of Public Sector Information available on public data portals across European countries. Information regarding the provision of data and the benefits of re-using data is also included.

public data / metadata, WMS, WFS, KML 

EU Open Data Portal: The EU ODP gives you access to open data published by EU institutions and bodies. All the data you can find via this catalogue are free to use and reuse for commercial or non-commercial purposes.

public data / metadata, TSV, SDMX-ML formats

Zenodo: Built and developed by researchers to ensure that everyone can take part in Open Science, Zenodo is a catch-all repository for EU-funded research.

research data / machine-readable

GovData: Offers open data on various topics relevant for the business sector, research, administration, civil society and media.

public data / various formats

BMBF-Daten-Portal: The portal provides open data collected from BMBF sources.

public data / various formats

The datasets are made available by the public administration for free public use.

public data / various formats

RADAR: The Research Data Repository (RADAR) is a platform for archiving research data from completed research projects and publications. The repository is intended exclusively to support the research community.

research data / machine-readable

Gesundheitscloud: The non-profit Gesundheitscloud enables you to take control of your health data. Patients can upload and securely store their health data online and share it with their healthcare providers or with researchers.

German research data / machine-readable, various formats

RKI-Gesundheitsmonitoring: The Robert Koch Institute (RKI) continuously collects data on the health of the population resident in Germany, mainly through surveys and medical examinations.

German research data / machine-readable, various formats

EudraCT: A European health database that collects the results of clinical trials of investigational medicinal products conducted across the European Union.

European public data / machine-readable, various formats

OpenTrials: A linked database of all available information on every clinical trial ever conducted. It is built and updated by its users.

Global research data / various formats, also unstructured data

GHDx: The world’s most comprehensive catalog of surveys, censuses, vital statistics and other health-related data.

Global public, research data / various formats

Federal Health Monitoring System: The project aims to improve the availability of health data in Germany. The repository is a joint service of the RKI and Destatis.

German public, research data / machine-readable

DRYAD: Dryad is an international repository of data underlying scientific and medical publications, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication.

Global research data / various formats

Global Health Observatory: The Global Health Observatory is the WHO’s gateway to health-related data, covering more than 1,000 indicators for its 194 Member States.

Global public data / machine-readable, various formats

cBioPortal: The portal contains data from cancer genomics studies and provides tools for data visualization and analysis.

Global research data / machine-readable, various formats

mCloud: A repository for Open Data from the mobility sector. It serves developers, researchers and public administration by providing direct access to Open Data.

German public data / numeric

MDM: Provides data on traffic flows, congestion, roadworks, parking facilities and more in Germany. MDM serves as an interface for stakeholders in business, research, politics and public administration.

German public data / numeric

DB Open-Data-Portal: The German national railway company Deutsche Bahn publishes substantial amounts of the infrastructure and mobility data it generates.

German corporate data / machine-readable, various formats

Transforming Transport: The objective of Transforming Transport (TT) is to provide the community working on transport data across the transport domains covered by TT with open datasets they can reuse for their own purposes. It also provides links and metadata for existing datasets that cannot be published under an open data licence but for which ad-hoc agreements may be established between data producers and potential data reusers.

European corporate data / various formats, metadata

opentraffic: A global data platform that processes anonymized positions of vehicles and smartphones into real-time and historical traffic statistics.

Global research data / various formats, machine-readable

Uber Movement: Provides anonymized data from over two billion trips to support urban planning around the world.

Global corporate data / various formats, machine-readable

SMARD: A database with records on energy generation, consumption and networks. It receives its data directly from the European Network of Transmission System Operators for Electricity (ENTSO-E). Only data verified by the Bundesnetzagentur is published on SMARD.

German public data / machine-readable

Open Power System Data: A free-of-charge data platform dedicated to electricity system researchers. The project collects, checks, processes, documents and publishes data that are publicly available but currently inconvenient to use.

German public, research data / numeric

OpenEI: Trusted source of energy data, specifically for renewable energy and energy efficiency. Users can view, edit, add and download data for free.

Global corporate, research data / numeric

Selected Programmes & Initiatives
  • International Data Spaces – A virtual data space that supports the secure, standards-based exchange of data. (BMBF)
  • SDIL Innovation Lab – An exchange and operating platform for big data applications. Its objective is to accelerate cooperation between the business sector, the public sector and the research community concerning big data and smart data technologies. (BMBF)
  • Kompetenznetzwerk Trusted Cloud – Offers a platform for knowledge exchange on cloud technologies, specifically in the context of the digital transformation of commerce. (BMWi)