ALIEN: Abstractions, Languages, and Implementation Techniques That Cross the Program/Query Divide
Software Engineering and Programming Languages
Final Report Abstract
During the last four decades, database query languages (QLs) and programming languages (PLs) have been designed, developed, and studied in largely disconnected threads of research. This has left a lot of opportunities for the integration of both strands untapped: query languages are specific, often functional, programming languages; programming languages are used to analyze, transform, or maintain (and thus: query) collections of data. Both fields have repeatedly witnessed—oftentimes mediocre—reinventions of insights and techniques that have long been established as folklore on the other side of the language fence. The research project A LIEN has been defined to take peeks over that fence and to explore opportunities to adopt PL-inspired problem solutions on the QL side and vice versa. It was expected that plain adoption would not be sufficient: design goals (e.g., scalability, usability, interpretation vs. compilation) and system context (e.g., runtime systems, availability of secondary storage, persistence, concurrency requirements) may considerably vary for PLs and QLs. This called for the adaptation and further development of established ideas. Applying PL insights and techniques on the QL side. A main driving force behind Alien has been the desire to perform complex computation close to the data and thus inside the database system which holds the data instances. Complex computational tasks deviate from the vanilla database query and thus called for new approaches to query plan generation. To this end, we adopted techniques of modular compilation that assembles complex plans from pieces and admits local plan changes that do not require plan regeneration. We also learned that it pays off to carefully consider expression compilation, e.g., through the seamless integration of just-in-time (JIT) code generation in database systems that otherwise depend on plan interpretation—expressions are widely considered second-class citizens in query planning and execution, leaving considerable performance improvements on the floor. User-defined functions (PL/SQL UDFs, in particular) admit arbitrary computation inside a database kernel. With a compilation pipeline that relies on single static assignment and administrative normal forms (SSA/ANF), and trampoline-style functions, we devised a translation from PL/SQL to plain recursive SQL that enables more database systems to provide Turing-complete UDFs and do so without any overhead that comes with PL/SQL interpretation. Adaptation of PL ideas indeed went so far as to reverse the established order of compilation stages (here: from an imperative form towards recursion). Pursuing the inverse direction, we translated declarative SQL queries into simple imperative programs which were then subject to Weiser-style program slicing. This paved the road towards fine-granular data provenance analysis for a class of very complex queries that was considered out of range for established approaches to provenance derivation. Applying QL insights and techniques on the PL side. The first fruitful idea that we explored in this domain was the idea of incremental view maintenance and how it could be applied to PLs. The main challenge, compared to the database setting, was how to deal with arbitrary computation, including recursion, and how to deal with first-class functions. We have developed an influential calculus of incremental computation, which forms the basis of several implementations of incremental view maintenance. Another field of inspiration was the strong emphasis on data modeling in the DB field, which we looked at through the lens of categorical duality to identify a match between data and data-consuming queries on one hand and codata and codata-producing coqueries on the other hand. Finally, the declarative and purely functional reading of database queries has been the motivation for exploring the use of effect handlers, which allow a mix of declarative queries as known from databases and fine-grained control mechanisms as required for many programming problems.
Publications
-
Precision Performance Surgery for PostgreSQL—LLVM-Based Expression Compilation, Just in Time. In Proc. of the VLDB Endowment (PVLDB), volume 9, New Delhi, India, 2016
D. Butterstein and T. Grust.
-
Dualizing generalized algebraic data types by matrix transposition. In Amal Ahmed, editor, Programming Languages and Systems - 27th European Symposium on Programming, ESOP, volume 10801 of Lecture Notes in Computer Science, pages 60–85. Springer, 2018
Klaus Ostermann and Julian Jabs
-
Effect handlers for the masses. Proc. ACM Program. Lang., 2(OOPSLA):111:1–111:27, 2018
Jonathan Immanuel Brachthäuser, Philipp Schuster, and Klaus Ostermann
-
How How Explains What What Computes—How- Provenance for SQL and Query Compilers. In 10th USENIX Workshop on Theory and Practica of Provenance (TaPP), London, UK, 2018
D. O’Grady, T. Müller, and T. Grust
-
Incremental λ-calculus in cache-transfer style. In Luís Caires, editor, Programming Languages and Systems (ESOP), pages 553–580. Springer International Publishing, 2019
Paolo G. Giarrusso, Yann Régis-Gianas, and Philipp Schuster
-
PgCuckoo—Laying Plan Eggs Into PostgreSQL’s Nest. In Proc. of the ACM SIGMOD Int’l Conference on Management of Data (SIGMOD), Amsterdam, The Netherlands, 2019
D. Hirn and T. Grust
-
Compiling PL/SQL Away. In Proc. of the Conference on Innovative Data Systems Research (CIDR), Amsterdam, The Netherlands, 2020
C. Duta, D. Hirn, and T. Grust
-
Decomposition diversity with symmetric data and codata. Proc. ACM Program. Lang., 4(POPL):30:1–30:28, 2020
David Binder, Julian Jabs, Ingo Skupin, and Klaus Ostermann
-
Effekt: Capability-passing style for type- and effect-safe, extensible effect handlers in scala. J. Funct. Program., 30:e8, 2020
Jonathan Immanuel Brachthäuser, Philipp Schuster, and Klaus Ostermann