Efficient Enumeration of Path Query Results for Graph Databases
Final Report Abstract
The main objective of the research project was the development and theoretical investigation of efficient evaluation algorithms (including enumeration) for path queries for graph databases. We were also interested in other data models and their respective query classes. The project’s results can be structured into three main branches. Graph databases and path queries: With respect to graph databases, a special focus of the project was on query languages that use regular expressions (and variants thereof) as their main navigational feature. In this regard, we provide an in-depth analysis of how the well-known concept of capture groups (also called backreferences) of practical regular expression engines can be added to the regular expression-based navigational features of practically relevant path query languages. In addition, we analysed the classical regular path queries (a core-feature of academical and commercial graph database systems) in the modern complexity framework of fine-grained complexity. Information Extraction and Document Spanners: Document spanners – a current hot-topic in database theory – extract tuples of intervals (which are called spans) from text documents. We formulated a class of document spanners that, in terms of expressive power, covers a large portion of the rather powerful yet mostly intractable class of core spanners, while at the same time exhibit evaluation complexities much closer to the well-behaved class of regular spanners. Moreover, we designed algorithms for evaluating (including enumerating) spanners on documents that are compressed by straight-line programs (SLPs) – a grammar-based compressions scheme that is famous in different areas of theoretical computer science, while also having many practical applications. The running times of our algorithms only depend linearly on the compressed size of the documents, which, in the best case, might be exponentially smaller compared to the uncompressed data. Due to the high compressibility of textual data in practical settings and due to the fact that many algorithms for SLP-compression exist, our approach is of practical relevance. Moreover, we were able to extend our algorithms to the dynamic setting, where we can perform updates to our compressed textual data without the need for decompression. Complex Event Processing and Subsequence Queries: The processing and analysis of traces of events that are generated on runtime by large software systems is a rather recent topic of both applied as well as theoretical research on databases. Based on classical results on the inductive inference of Angluin-style pattern languages, we formulated a class of subsequence-based queries for event traces, and provided an algorithm that infers the common structure of a given set of such event traces. This work also inspired new algorithmic challenges that are of independent interest within the field of string algorithms that we were able to partly solve.
Publications
-
Conjunctive Regular Path Queries with String Variables. Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 361-374. ACM.
Schmid, Markus L.
-
Spanner Evaluation over SLP-Compressed Documents. Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 153-165. ACM.
Schmid, Markus L. & Schweikardt, Nicole
-
Conjunctive Regular Path Queries with Capture Groups. ACM Transactions on Database Systems, 47(2), 1-52.
Schmid, Markus L.
-
Discovering event queries from traces: Laying foundations for subsequence-queries with wildcards and gap-size constraints. In 25th International Conference on Database Theory, ICDT 2022, March 29 to April 1, 2022, Edinburgh, UK (Virtual Conference), pages 18:1–18:21, 2022.
Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt & Matthias Weidlich
-
Document Spanners - A Brief Overview of Concepts, Results, and Recent Developments. Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 139-150. ACM.
Schmid, Markus L. & Schweikardt, Nicole
-
Query Evaluation over SLP-Represented Document Databases with Complex Document Editing. Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 79-89. ACM.
Schmid, Markus L. & Schweikardt, Nicole
-
Subsequences with gap constraints: Complexity bounds for matching and analysis problems. In 33rd International Symposium on Algorithms and Computation, ISAAC 2022, December 19-21, 2022, Seoul, Korea, pages 64:1–64:18, 2022.
Joel D. Day, Maria Kosche, Florin Manea & Markus L. Schmid
-
Discovering multi-dimensional subsequence queries from traces - from theory to prac tice. In Datenbanksysteme für Business, Technologie und Web (BTW 2023), 20. Fachtagung des GI-Fachbereichs ,,Datenbanken und Informationssysteme” (DBIS), 06.-10, März 2023, Dresden, Germany, Proceedings, pages 511–533, 2023.
Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt & Matthias Weidlich
-
Fine-Grained Complexity of Regular Path Queries. Logical Methods in Computer Science, Volume 19, Issue 4.
Casel, Katrin & Schmid, Markus L.
-
Refl-Spanners: A Purely Regular Approach to Non-Regular Core Spanners. Logical Methods in Computer Science, Volume 20, Issue 4.
Schmid, Markus L. & Schweikardt, Nicole
