GraphQueryML: Using Machine Learning to Optimize Queries in Graph Databases
Final Report Abstract
The goal of this project was to build on and extend the current state of the art in learned query optimization to enable the application of machine learning to query optimization in graph database systems. In database systems research, query optimization is one of the major open problems. As relational database queries grow in complexity and graph database systems gain importance, many approaches emerged that study the application of machine learning to database query optimization. Conceptually these approaches can roughly be categorized into three groups. The first group of approaches use machine learning to estimate the result cardinalities of a query and its subqueries. These learned cardinalities are then injected into a traditional query optimizer. The second group of approaches learn to predict good execution plans directly and thus replace the traditional optimizer. Finally, approaches that fall into the third group operate entirely outside the traditional optimizer by learning hint sets for queries that will steer the optimizer to an efficient plan. In this project, we mainly contributed to the first group of approaches, i.e., learned cardinality estimation. Before venturing into graph database systems, we first conducted several subprojects in the setting of relational database systems, which are still more wide-spread and standardized benchmarks are more readily available to evaluate and demonstrate contributions. Observing that most state-of-the-art learned cardinality estimation technique require an impractically large amount of training data, we studied how geometric learning can be applied to obtain a more sample efficient learned cardinality estimator. As a precursor to regular path queries in graph database system, we investigated the related problem of LIKE-predicate cardinality estimation in SQL. We contributed a novel estimation technique based on a novel language model that significantly outperforms the state of the art in terms of accuracy. Moving on to graph database systems, we studied how sequence-based learning can be applied to cardinality estimation for path queries and proposed a new technique with lower overall q-error than current state-of-the-art techniques. Finally, the project also investigated the use of quantum computing for cardinality estimation and proposed a quantum neural network approach to estimate the cardinality of join queries. Apart from these contributions to the first group of learned query optimization techniques, the project also contributed a new end-to-end benchmarking framework to evaluate approaches of the second and third group. All of the contributions of this project have resulted in publications that were well-received and published in the top A*-ranked conferences of the database systems research community.
Publications
-
Sample-Efficient Cardinality Estimation Using Geometric Deep Learning. Proceedings of the VLDB Endowment, 17(4), 740-752.
Reiner, Silvan & Grossniklaus, Michael
-
Is Your Learned Query Optimizer Behaving As You Expect? A Machine Learning Perspective. Proceedings of the VLDB Endowment, 17(7), 1565-1577.
Lehmann, Claude; Sulimov, Pavel & Stockinger, Kurt
-
LPLM: A Neural Language Model for Cardinality Estimation of LIKE-Queries. Proceedings of the ACM on Management of Data, 2(1), 1-25.
Aytimur, Mehmet; Reiner, Silvan; Wörteler, Leonard; Chondrogiannis, Theodoros & Grossniklaus, Michael
-
QardEst: Using Quantum Machine Learning for Cardinality Estimation of Join Queries. Workshop on Quantum Computing and Quantum-Inspired Technology for Data-Intensive Systems and Applications, 2-13. ACM.
Kittelmann, Florian; Sulimov, Pavel & Stockinger, Kurt
-
GenJoin: Conditional Generative Plan-to-Plan Query Optimizer that Learns from Subplan Hints. Proceedings of the ACM on Management of Data, 3(4), 1-25.
Sulimov, Pavel; Lehmann, Claude & Stockinger, Kurt
-
SPACE: Cardinality Estimation for Path Queries Using Cardinality-Aware Sequence-based Learning. Proceedings of the ACM on Management of Data, 3(3), 1-26.
Aytimur, Mehmet; Chondrogiannis, Theodoros & Grossniklaus, Michael
