Query- and Specification Languages for Graph- and Grid-Structured Data
Final Report Abstract
The present project investigated foundations of graph- and grid-structured data. In terms of grid-structured data (aka tabular data or csv-like data), we have shown that the proposal of schema for tabular data of the World Wide Web Consortium can be significantly extended while still allowing efficient validation and static analysis algorithms. Furthermore, we have built a validation engine for this more extended fragment. Since some of these goals had already been reached during the review phase of the project proposal, we continued our work on grid-structured data in the framework of document spanners, which was the basis of a proposal by Arenas et al. (PVLDB 2016) for handling csv-like data. We obtained numerous fundamental results on speeding up the evaluation of document spanners by parallelization, and on how to extend the document spanner framework with weights and aggregation. The core of the present project was on evaluation of navigational queries on graphs. Here, we produced a number of theoretical papers on the efficiency of evaluating modern graph database queries, i.e., under simple paths, trail, and shortest paths semantics, as will be implemented in GQL and SQL/PGQ. We note that two of these modes (simple paths and trail) were already possible in Cypher. We could go significantly beyond the goals of the project by collaborating in several papers around the ISO standardization of query languages for property graphs. We feel that especially this work shows that the project achieved its goal of bringing theory and practice closer together. Furthermore, one of the papers in this part of the project won a prestigious SIGMOD Research Highlight Award. In terms of static analysis of navigational queries on graphs, we have provided a detailed overview of the complexity of containment of CRPQs (the core of pattern matching queries in today’s systems). We focused in practically relevant fragments of CRPQs, guided on our analysis of real-world queries. Our work shows a detailed overview of when containment is expected to be feasible in practice. Finally, we have built a number of practical tools that connect theory with practice. Our tool for validating csv-like data was already finished during the review phase of the project. The other important tool that we built was for analyzing huge amounts of SPARQL queries. We managed to analyze logs consisting of around 800 million SPARQL queries using the tool, which gave us invaluable insights for the fundamental work packages (B,C) in the project. The work packages (A,B,C,E) were therefore completed successfully.
Publications
-
Bridging Theory and Practice with Query Log Analysis. SIG- MOD Rec. 48(1): 6-13 (2019)
Wim Martens & Tina Trautner
-
Dichotomies for Evaluating Simple Regular Path Queries. ACM Transactions on Database Systems, 44(4), 1-46.
Martens, Wim & Trautner, Tina
-
Split-Correctness in Information Extraction. Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 149-163. ACM.
Doleschal, Johannes; Kimelfeld, Benny; Martens, Wim; Nahshon, Yoav & Neven, Frank
-
Containment of Simple Conjunctive Regular Path Queries. Proceedings of the Seventeenth International Conference on Principles of Knowledge Representation and Reasoning, 371-380. International Joint Conferences on Artificial Intelligence Organization.
Figueira, Diego; Godbole, Adwait; Krishna, S.; Martens, Wim; Niewerth, Matthias & Trautner, Tina
-
SHARQL: Shape Analysis of Recursive SPARQL Queries. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2701-2704. ACM.
Bonifati, Angela; Martens, Wim & Timm, Thomas
-
PG-Keys: Keys for Property Graphs. Proceedings of the 2021 International Conference on Management of Data, 2423-2436. ACM.
Angles, Renzo; Bonifati, Angela; Dumbrava, Stefania; Fletcher, George; Hare, Keith W.; Hidders, Jan; Lee, Victor E.; Li, Bei; Libkin, Leonid; Martens, Wim; Murlak, Filip; Perryman, Josh; Savković, Ognjen; Schmidt, Michael; Sequeda, Juan; Staworko, Slawek & Tomaszuk, Dominik
-
Graph Pattern Matching in GQL and SQL/PGQ. Proceedings of the 2022 International Conference on Management of Data, 2246-2258. ACM.
Deutsch, Alin; Francis, Nadime; Green, Alastair; Hare, Keith; Li, Bei; Libkin, Leonid; Lindaaker, Tobias; Marsault, Victor; Martens, Wim; Michels, Jan; Murlak, Filip; Plantikow, Stefan; Selmer, Petra; van Rest, Oskar; Voigt, Hannes; Vrgoč, Domagoj; Wu, Mingxi & Zemke, Fred
-
Towards Theory for Real-World Data. Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 261-276. ACM.
Martens, Wim
-
Weight Annotation in Information Extraction. Logical Methods in Computer Science, Volume 18, Issue 1.
Doleschal, Johannes; Kimelfeld, Benny; Martens, Wim & Peterfreund, Liat
-
A Trichotomy for Regular Trail Queries. Logical Methods in Computer Science, Volume 19, Issue 4.
Martens, Wim; Niewerth, Matthias & Popp, Tina
