BigSIoT: Big Data Management for the Semantic Internet of Things
Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Final Report Abstract
The project aimed to investigate distributed data storage and processing in semantic web and Internet of Things (IoT) environments. In IoT, many heterogeneous systems have to interact with each other. We proposed using a semantic web DBMS to improve interoperability because it can handle arbitrary schema-less data sources. We therefore implemented the DBMS LUPOSDATE3000 in Kotlin, because the language allows us to compile a single code base for several targets, such as the Java Virtual Machine, for high performance that competes with other DBMS in benchmarks, and JavaScript, for running LUPOSDATE3000 completely in the browser. LUPOSDATE3000 is designed for distributed settings and for easy extension, and is intended to run on whichever device and operating system best fits the desired IoT environment, where we expect much heterogeneity between and within all components. We started to develop a local strategy that uses multiple partitioning schemes simultaneously, so that the optimizer has more options at runtime to choose the best partitioning for a given query. These options can further boost the advantages of merge joins beyond the standard sorted input. Furthermore, this removes the otherwise necessary partitioning thread with its time-consuming locking. We proposed a new network simulator called SIMORA, which allows the application to access routing protocol information that would otherwise be hidden in the protocol. With this simulator, the application can apply enhanced communication strategies to reduce the overall network traffic. Our DBMS used the topology information retrieved from the routing protocol to optimize the join order such that the path of the data through the network is shortened. For this, the database does not need a separate topology view; it simply reuses the information available in the routing protocol on any device. For the benchmarks of the DBMS, we proposed a new benchmark scenario.
We needed a new benchmark scenario because state-of-the-art benchmarks send their data simultaneously to a single DBMS instance rather than into a distributed environment. The new distributed data insertion allows the database to keep the data at the instance where it naturally arises, without much traffic overhead at the beginning. We ran experiments showing that different data distribution schemes are suitable for different kinds of queries. Besides standard optimization techniques, we developed a machine learning approach to optimize the join order of SPARQL queries, which typically consist of a considerable number of join operators. In our proposed solution, the memory consumption scales quadratically with the number of joins per query. Additionally, we were able to include the cost of network traffic in the reward function in order to minimize network traffic.
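The idea of choosing a join order from topology information can be pictured with a small sketch. It is not the LUPOSDATE3000 implementation. It assumes a made-up cost model in which each join input lives on one device, the smaller side of a join is shipped to the other side, the cost is rows times hop count, and the intermediate result is as large as the smaller input. The names `hop_counts`, `plan_cost`, and `best_join_order`, the link list, and all row counts are hypothetical.

```python
from collections import deque
from itertools import permutations


def hop_counts(links, source):
    """Breadth-first search: hop count from `source` to every reachable device."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def plan_cost(order, inputs, hops):
    """Row-hops needed by a left-deep plan that joins `inputs` in `order`."""
    device, rows = inputs[order[0]]
    cost = 0
    for name in order[1:]:
        other_device, other_rows = inputs[name]
        distance = hops[device][other_device]
        if rows <= other_rows:
            # Ship the intermediate result to the next input's device.
            cost += rows * distance
            device = other_device
        else:
            # Ship the next input to where the intermediate result lives.
            cost += other_rows * distance
        # Assumption: the join result is as large as its smaller input.
        rows = min(rows, other_rows)
    return cost


def best_join_order(inputs, links):
    """Exhaustively pick the cheapest order (only feasible for few inputs)."""
    devices = {device for device, _ in inputs.values()}
    hops = {d: hop_counts(links, d) for d in devices}
    best = min(permutations(sorted(inputs)),
               key=lambda order: plan_cost(order, inputs, hops))
    return best, plan_cost(best, inputs, hops)


# Hypothetical line topology A - B - C - D with three join inputs.
links = [("A", "B"), ("B", "C"), ("C", "D")]
inputs = {"p1": ("A", 100), "p2": ("C", 10), "p3": ("D", 1000)}

if __name__ == "__main__":
    print(best_join_order(inputs, links))
```

With these made-up numbers, the cheapest plans join the two neighbouring inputs on C and D first and bring in the distant input on A last, for a total of 40 row-hops. A learned optimizer, as described in the abstract, would replace the exhaustive enumeration, which grows factorially with the number of inputs.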
Publications
-
“Generating Sound from the Processing in Semantic Web Databases”. In: Open Journal of Semantic Web (OJSW) 8.1 (2021), pp. 1–27. ISSN: 2199-336X.
Sven Groppe; Rico Klinckenberg & Benjamin Warnke
-
Sound of databases. Proceedings of the VLDB Endowment, 14(12), 2695–2698.
Groppe, Sven; Klinckenberg, Rico & Warnke, Benjamin
-
“Flexible data partitioning schemes for parallel merge joins in semantic web queries”. In: Datenbanksysteme für Business, Technologie und Web (BTW), 19. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme”, Dresden, Germany. Ed. by Kai-Uwe Sattler, Melanie Herschel, and Wolfgang Lehner. LNI. Gesellschaft für Informatik, Bonn, 2021, pp. 237–256.
Benjamin Warnke et al.
-
A SPARQL benchmark for distributed databases in IoT environments. Proceedings of the International Workshop on Big Data in Emergent Distributed Environments, 1-6. ACM.
Warnke, Benjamin; Mantler, Johann; Groppe, Sven; Sehgelmeble, Yuri Cotrado & Fischer, Stefan
-
SIMORA: SIMulating Open Routing protocols for Application interoperability on edge devices. 2022 IEEE 6th International Conference on Fog and Edge Computing (ICFEC), 42-49. IEEE.
Warnke, Benjamin; Sehgelmeble, Yuri Cotrado; Mantler, Johann; Groppe, Sven & Fischer, Stefan
-
Distributed SPARQL queries in collaboration with the routing protocol. International Database Engineered Applications Symposium Conference, 99-106. ACM.
Warnke, Benjamin; Fischer, Stefan & Groppe, Sven
-
Using Machine Learning and Routing Protocols for Optimizing Distributed SPARQL Queries in Collaboration. Computers, 12(10), 210.
Warnke, Benjamin; Fischer, Stefan & Groppe, Sven
-
“Data Partitioning and Query Optimization in the Semantic Internet of Things”. Dissertation, Universität zu Lübeck, 2023.
Benjamin Warnke
