Project Details
Projekt Print View

GraphQueryML: Using Machine Learning to Optimize Queries in Graph Databases

Subject Area Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Term from 2020 to 2024
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 441617860
 
Final Report Year 2025

Final Report Abstract

The goal of this project was to build on and extend the current state of the art in learned query optimization to enable the application of machine learning to query optimization in graph database systems. In database systems research, query optimization is one of the major open problems. As relational database queries grow in complexity and graph database systems gain importance, many approaches emerged that study the application of machine learning to database query optimization. Conceptually these approaches can roughly be categorized into three groups. The first group of approaches use machine learning to estimate the result cardinalities of a query and its subqueries. These learned cardinalities are then injected into a traditional query optimizer. The second group of approaches learn to predict good execution plans directly and thus replace the traditional optimizer. Finally, approaches that fall into the third group operate entirely outside the traditional optimizer by learning hint sets for queries that will steer the optimizer to an efficient plan. In this project, we mainly contributed to the first group of approaches, i.e., learned cardinality estimation. Before venturing into graph database systems, we first conducted several subprojects in the setting of relational database systems, which are still more wide-spread and standardized benchmarks are more readily available to evaluate and demonstrate contributions. Observing that most state-of-the-art learned cardinality estimation technique require an impractically large amount of training data, we studied how geometric learning can be applied to obtain a more sample efficient learned cardinality estimator. As a precursor to regular path queries in graph database system, we investigated the related problem of LIKE-predicate cardinality estimation in SQL. We contributed a novel estimation technique based on a novel language model that significantly outperforms the state of the art in terms of accuracy. Moving on to graph database systems, we studied how sequence-based learning can be applied to cardinality estimation for path queries and proposed a new technique with lower overall q-error than current state-of-the-art techniques. Finally, the project also investigated the use of quantum computing for cardinality estimation and proposed a quantum neural network approach to estimate the cardinality of join queries. Apart from these contributions to the first group of learned query optimization techniques, the project also contributed a new end-to-end benchmarking framework to evaluate approaches of the second and third group. All of the contributions of this project have resulted in publications that were well-received and published in the top A*-ranked conferences of the database systems research community.

Publications

 
 

Additional Information

Textvergrößerung und Kontrastanpassung