Project Details

BigSIoT: Big Data Management for the Semantic Internet of Things

Subject Area: Security and Dependability, Operating-, Communication- and Distributed Systems; Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Term: from 2019 to 2023
Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 422053062

Final Report Year: 2023

Final Report Abstract

The project aimed to investigate distributed data storage and processing in semantic web and Internet of Things (IoT) environments. In the IoT, many heterogeneous systems have to interact with each other. We proposed using a semantic web DBMS to improve interoperability because such a DBMS can handle arbitrary schema-less data sources. We therefore implemented the DBMS LUPOSDATE3000 in Kotlin, because the language allows us to compile a single code base for several targets: the Java Virtual Machine, for high performance that competes with other DBMSs in benchmarks, and JavaScript, for running LUPOSDATE3000 entirely in the browser. LUPOSDATE3000 is designed for distributed settings and to simplify further extensions, and it is intended to run on whichever device and operating system best fits the desired IoT environment, where we expect considerable heterogeneity both between and within all components.

We started to develop a local strategy that uses multiple partitioning strategies simultaneously, so that the optimizer has more options at runtime to choose the best partitioning for a given query. These options can further increase the benefit of merge joins beyond what the standard sorted input alone provides. Furthermore, this removes the otherwise necessary partitioning thread and its time-consuming locking.

We proposed a new network simulator called SIMORA, which allows the application to access routing protocol information that would otherwise remain hidden inside the protocol. With this simulator, the application can apply enhanced communication strategies to reduce the overall network traffic. Our DBMS used the topology information retrieved from the routing protocol to optimize the join order so that the path the data travels within the network is shortened. For this, the database does not need a separate topology view; it simply reuses the information already available in the routing protocol on every device.

To benchmark the DBMS, we proposed a new benchmark scenario. We needed it because, in the state of the art, benchmarks insert all their data at once into a single DBMS instance instead of into a distributed environment. Distributed data insertion allows the database to keep the data at its natural instance without much traffic overhead at the beginning. Our experiments show that different data distribution schemes are suitable for different kinds of queries.

Besides standard optimization techniques, we developed a machine learning approach to optimize the join order of SPARQL queries, which typically contain a considerable number of join operators. In our proposed solution, memory consumption scales quadratically with the number of joins per query. Additionally, we were able to include the cost of network traffic in the reward function in order to minimize network traffic.
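The abstract does not spell out how several partitionings coexist. The following Python sketch shows one plausible reading, assuming the same triples are hash-partitioned twice, once on the subject and once on the object, with every partition kept sorted by its key. For a chain join such as `?x knows ?y . ?y livesIn ?z`, the left pattern can be read from the object layout and the right one from the subject layout: equal values hash to the same partition index and both sides arrive already sorted, so each partition can be merge-joined locally. All names (`build_layouts`, `join`, `N_PARTS`) are hypothetical and are not the LUPOSDATE3000 API.

```python
import zlib

# A triple is (subject, predicate, object); positions are indexed 0, 1, 2.
POS = {"s": 0, "p": 1, "o": 2}
N_PARTS = 4  # hypothetical number of partitions (e.g. one per node)


def part_of(value):
    # Stable hash, so equal values land in the same partition index under every layout.
    return zlib.crc32(value.encode()) % N_PARTS


def build_layouts(triples, keys=("s", "o")):
    """Store the same triples under several partitionings at once, each partition sorted by its key."""
    layouts = {}
    for key in keys:
        idx = POS[key]
        parts = [[] for _ in range(N_PARTS)]
        for t in triples:
            parts[part_of(t[idx])].append(t)
        for p in parts:
            p.sort(key=lambda t: t[idx])
        layouts[key] = parts
    return layouts


def merge_join(left, li, right, ri):
    """Classic merge join of two lists already sorted on positions li and ri."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        a, b = left[i][li], right[j][ri]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Collect the run of equal keys on both sides and emit their cross product.
            i2, j2 = i, j
            while i2 < len(left) and left[i2][li] == a:
                i2 += 1
            while j2 < len(right) and right[j2][ri] == a:
                j2 += 1
            out.extend((l, r) for l in left[i:i2] for r in right[j:j2])
            i, j = i2, j2
    return out


def join(layouts, left_pred, left_key, right_pred, right_key):
    """Join two single-predicate patterns on left_key (left) and right_key (right), partition by partition."""
    if left_key not in layouts or right_key not in layouts:
        raise ValueError("no suitable partitioning; data would have to be reshuffled")
    li, ri = POS[left_key], POS[right_key]
    result = []
    for k in range(N_PARTS):
        # Filtering by predicate keeps each partition's sort order intact.
        left = [t for t in layouts[left_key][k] if t[1] == left_pred]
        right = [t for t in layouts[right_key][k] if t[1] == right_pred]
        result.extend(merge_join(left, li, right, ri))
    return result


triples = [("alice", "knows", "bob"), ("bob", "knows", "carol"),
           ("bob", "livesIn", "berlin"), ("carol", "livesIn", "paris")]
layouts = build_layouts(triples)
# ?x knows ?y . ?y livesIn ?z: join the object of the left pattern with the subject of the right.
rows = join(layouts, "knows", "o", "livesIn", "s")
print(sorted((l[0], l[2], r[2]) for l, r in rows))
# [('alice', 'bob', 'berlin'), ('bob', 'carol', 'paris')]
```

Storing a second layout costs extra space, but it lets the optimizer pick, per query, the layout whose partition key matches the join variable instead of reshuffling data at runtime.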
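The topology-aware join ordering can be illustrated with a small Python sketch. It assumes the routing protocol exposes a tree of parent pointers (for example, the preferred-parent tree of an RPL DODAG) and that the data for each triple pattern sits on a known node; `parent`, the node placement, and all function names are hypothetical stand-ins, since the interface SIMORA actually exposes is not described here. The sketch derives hop distances from the routing tree alone, without a separate topology view, and brute-forces the left-deep join order whose intermediate result travels the fewest hops before reaching the node that issued the query.

```python
from itertools import permutations


def path_to_root(node, parent):
    """Follow parent pointers (as a tree-based routing protocol might expose them) up to the root."""
    path = [node]
    while node in parent:  # the root has no parent entry
        node = parent[node]
        path.append(node)
    return path


def hops(a, b, parent):
    """Hop distance between two nodes in the routing tree, via their lowest common ancestor."""
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    depth_in_b = {n: i for i, n in enumerate(pb)}
    for i, n in enumerate(pa):
        if n in depth_in_b:  # first shared node is the lowest common ancestor
            return i + depth_in_b[n]
    raise ValueError("nodes are not in the same routing tree")


def best_join_order(pattern_nodes, sink, parent):
    """Brute-force the left-deep join order whose intermediate result travels the fewest hops.

    The intermediate result starts at the node holding the first pattern, moves to the node
    of each following pattern, and finally goes to the sink that issued the query.
    Result sizes are ignored; only path length counts.
    """
    best = None
    for order in permutations(pattern_nodes.items()):
        nodes = [node for _, node in order]
        cost = sum(hops(a, b, parent) for a, b in zip(nodes, nodes[1:]))
        cost += hops(nodes[-1], sink, parent)
        if best is None or cost < best[0]:
            best = (cost, [name for name, _ in order])
    return best


# Routing tree: 0 is the root; 1 and 2 are its children; 3 and 4 hang below 1, 5 below 2.
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
# Hypothetical placement of three triple patterns' data; node 2 issued the query.
cost, order = best_join_order({"A": 3, "B": 5, "C": 4}, sink=2, parent=parent)
print(cost, order)  # 7 ['A', 'C', 'B']
```

Brute force is only viable for a handful of patterns, and a realistic cost model would also weight each hop by the amount of data shipped.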
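The abstract states that network traffic costs were included in the reward function of the learned join-order optimizer but does not give the formula. A minimal Python sketch, assuming the reward is the negative plan cost made up of local work (sum of intermediate result sizes) plus network traffic (rows sent times hops) scaled by a hypothetical `traffic_weight`, shows how the traffic term can change which plan an agent prefers. The function name, parameters, and numbers are illustrative only.

```python
def reward(intermediate_sizes, transfers, traffic_weight):
    """Hypothetical reward for a join-ordering agent: the negative cost of the finished plan.

    intermediate_sizes: number of rows produced by each join in the plan (local work).
    transfers: (rows_sent, hops) for every time an intermediate result crosses the network.
    traffic_weight: how strongly network traffic is penalized relative to local work.
    """
    local_cost = sum(intermediate_sizes)
    network_cost = sum(rows * hop_count for rows, hop_count in transfers)
    return -(local_cost + traffic_weight * network_cost)


# Two candidate plans for the same query (made-up numbers for illustration only).
plan_a = dict(intermediate_sizes=[100, 40], transfers=[(100, 3)])  # smaller results, long transfer
plan_b = dict(intermediate_sizes=[120, 40], transfers=[(40, 1)])   # larger results, short transfer

for w in (0.0, 1.0):
    ra, rb = reward(traffic_weight=w, **plan_a), reward(traffic_weight=w, **plan_b)
    # Without the traffic term plan A scores higher; with it, plan B does.
    print(w, ra, rb, "A" if ra > rb else "B")
```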
