Project Details
Metaserve: A Platform for Application-driven Data Profiling
Applicant
Professor Dr. Thorsten Papenbrock
Subject Area
Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Term
since 2025
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 560957958
The Metaserve project aims to develop an innovative data profiling approach based on a novel, declarative data profiling query language. Data profiling is the activity of extracting implicit metadata, such as schema descriptions, data types, and various kinds of data dependencies, from a given dataset. Such structural metadata statements are important for many data-intensive applications, such as data integration, data cleaning, machine learning, and query optimization. State-of-the-art data profiling algorithms are, however, difficult to deploy; they also discover too many metadata statements, most of which are unsuitable for the application at hand, and their profiling times are still unreasonably long. For this reason, we propose a novel, declarative data profiling query language (DPQL) in combination with an effective data profiling engine to discover both simple and complex metadata structures based on the concrete needs of applications and users. More specifically, we study the design and mathematical foundations of a query language that allows the formulation of concrete metadata structures, such as foreign-key relationships, schema normal form violations, data cleaning rules, circular dependencies, or feature correlations. Then, we design and develop a holistic execution engine that processes these data profiling queries with needs-based results, higher performance, and better accessibility than existing data profiling solutions. Metaserve aims to make data profiling more practically accessible to modern applications, ranging from database systems and various data engineering tools to data analytics and machine learning workflows. The Metaserve project builds upon the popular open-source data profiling framework Metanome and intends to feed into its development.
DFG Programme
Research Grants
