Recent state-of-the-art approaches and technologies for generating RDF graphs from non-RDF data use languages designed for specifying transformations or mappings of data in various formats. This work presents a new approach for generating ontology-annotated RDF graphs that link data from multiple heterogeneous streaming and archival data sources, with high throughput and low latency. To support this, and in contrast to existing approaches, we propose embedding in the RDF generation process a close-to-sources data processing and linkage stage, which supports the fast template-driven generation of triples in a subsequent stage. This approach, called RDF-Gen, has been implemented as a SPARQL-based RDF generation approach. RDF-Gen is evaluated against the latest related work, RML and SPARQL-Generate, using real-world datasets.
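The two-stage idea described above (process records close to the source, then fill a triple template) can be illustrated with a minimal Python sketch. This is not RDF-Gen's implementation or template syntax: the `example.org` IRIs, the field names, and the functions `process_record` and `generate_triples` are all hypothetical, the linkage step is omitted, and field values are assumed not to need escaping.

```python
from string import Template

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Hypothetical triple template: one (subject, predicate, object) row per triple,
# with $-placeholders bound to record fields. Not RDF-Gen's actual syntax.
TRIPLE_TEMPLATE = [
    ("<http://example.org/person/$id>", RDF_TYPE, "<http://example.org/onto#Person>"),
    ("<http://example.org/person/$id>", "<http://example.org/onto#name>", '"$name"'),
    ("<http://example.org/person/$id>", "<http://example.org/onto#city>", '"$city"'),
]


def process_record(raw):
    """Close-to-source stage (assumed): normalise one raw record.

    Here it only converts values to stripped strings; values are assumed
    to contain no characters that would need escaping.
    """
    return {key: str(value).strip() for key, value in raw.items()}


def generate_triples(records, template=TRIPLE_TEMPLATE):
    """Template stage: yield one N-Triples-style line per template row.

    `records` may be any iterable, including a generator fed by a stream,
    so triples are produced record by record without buffering the input.
    """
    compiled = [tuple(Template(part) for part in row) for row in template]
    for raw in records:
        rec = process_record(raw)
        for s, p, o in compiled:
            yield f"{s.substitute(rec)} {p.substitute(rec)} {o.substitute(rec)} ."


if __name__ == "__main__":
    sample = [{"id": 1, "name": " Ada ", "city": "London"}]
    for line in generate_triples(sample):
        print(line)
```

Compiling the template once and reusing it for every record keeps the per-record work down to simple substitutions, which is the intuition behind a fast template-driven stage.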
The evaluation uses three different data sets, covering typical to large volumes of data, with sizes varying between 100 and 100,000 entries:
- An artificial dataset of Persons, generated by GenerateData.com, mapping 8 properties
- A real-life archival dataset of aircraft (compiled from FlightRadar24.com), mapping 9 properties
- Aircraft surveillance streaming data, mapping 5 properties
Figures 1, 2, and 3 present the achieved throughput of RDF-Gen for each of the data sets as their size varies.
Figure 1: Achieved throughput on “Persons” data set
Figure 2: Achieved throughput on “Aircrafts” data set
Figure 3: Achieved throughput on “Surveillance” data set
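For readers who want to run a comparable measurement, the sketch below shows one way to record throughput over varying input sizes. It assumes throughput means entries processed per second, which the text does not define. The converter `toy_convert` is a hypothetical stand-in for an RDF generator, and the intermediate sizes between 100 and 100,000 are illustrative choices.

```python
import time


def measure_throughput(convert, records):
    """Return (items produced, items per second) for one run of `convert`."""
    start = time.perf_counter()
    count = 0
    for _ in convert(records):
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


def toy_convert(records):
    # Hypothetical stand-in for an RDF generator: one output line per entry.
    for rec in records:
        yield (
            f'<http://example.org/e/{rec["id"]}> '
            f'<http://example.org/onto#value> "{rec["value"]}" .'
        )


def sweep(sizes=(100, 1_000, 10_000, 100_000)):
    """Measure throughput for each input size; returns {size: (count, rate)}."""
    results = {}
    for n in sizes:
        # A generator of records simulates entries arriving one at a time.
        records = ({"id": i, "value": i * 2} for i in range(n))
        results[n] = measure_throughput(toy_convert, records)
    return results


if __name__ == "__main__":
    for size, (count, rate) in sweep().items():
        print(f"{size:>7} entries: {rate:,.0f} entries/s")
```

Absolute numbers from such a sketch depend on the machine and the converter, so they are not comparable with the figures above.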
Georgios M. Santipantakis, Konstantinos I. Kotis, George A. Vouros, Christos Doulkeridis. RDF-Gen: Generating RDF from Streaming and Archival Data
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
(c) AI-Group/UNIVERSITY OF PIRAEUS RESEARCH CENTER (UPRC)