Querschnitte: XML und formale Sprachen - Theorie und Praxis
Final Report Abstract
The Extensible Markup Language (XML) is the standard data format for exchanging data on the World Wide Web. The loose structure of XML is in strong contrast with the rigid structure of relational databases and therefore, the data format XML has posed a wealth of new challenges in the data management community. Since the foundations of XML are based on formal languages (regular expressions, finite automata, regular tree languages, etc.), the deeper research questions in the area are often closely tied to foundational work in formal language theory. This project had two core goals: we want to strengthen the connections between XML research and formal language research and we want to strengthen the ties between theory and practice. These ties between theory and practice should be applicable both in XML research and in formal language research. Both of these goals have been achieved. We have made substantial contributions to formal language theory by solving problems that were motivated by XML research. Examples of such problems were questions about determinism in regular expressions. We solved several openstanding problems in this area, such as the complexity of determining if a regular language is definable by a deterministic regular expression (as used in DTD), and the question if a language is definable by a deterministic regular expression with counters (as used in XML Schema). Another major contribution to formal language theory was the proof that regular languages can be separated by piecewise testable languages in polynomial time. This was the first (nontrivial) tractable separability result in the literature. In the area of incremental query evaluation, we proved that answers to MSO queries on tree-structured data can be enumerated in logarithmic delay, using linear preprocessing time. We further made substantial contributions to database theory by solving the two major open problems concerning tree pattern optimization: the question whether minimality equals nonredundancy, and the computational complexity of the problem. We have also substantially contributed to strengthening the ties between theory and practice. We have developed, from the ground up, a new schema language for XML data, called bonXai. This development spurred theoretical questions but also led to an implementation of a freely available tool for development and validation of bonXai schemas. The tool can be used not only to develop schemas but also to analyse and better understand XML Schema definitions.
Publications
-
Efficient Separability of Regular Languages by Subsequences and Suffixes. In International Conference on Automata, Languages, and Programming (ICALP), pages 150–161, 2013
Wojciech Czerwinski, Wim Martens, and Tomas Masopust
-
Simplifying XML Schema: Single-type approximations of regular tree languages. J. Comput. Syst. Sci., 79(6): 910–936, 2013
Wouter Gelade, Tomasz Idziaszek, Wim Martens, Frank Neven, Jan Paredaens
-
MSO queries on trees: enumerating answers under updates. In Joint Meeting of the Conference on Computer Science Logic (CSL) and the ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 67:1–67:10, 2014
Katja Losemann, Wim Martens
-
Definability by Weakly Deterministic Regular Expressions with Counters is Decidable. In Mathematical Foundations of Computer Science (MFCS), pages 369–381, 2015
Markus Latte, Matthias Niewerth
-
Closure properties and descriptional complexity of deterministic regular expressions. Theor. Comput. Sci. 627: 54–70, 2016
Katja Losemann, Wim Martens, Matthias Niewerth
-
Querying Graphs with Data. J. ACM, 63(2), 14:1–14:53, 2016
Leonid Libkin, Wim Martens, and Domagoj Vrgoč
-
BonXai: Combining the Simplicity of DTD with the Expressiveness of XML Schema. ACM Trans. Database Syst., 42(3): 15:1–15:42, 2017
Wim Martens, Frank Neven, Matthias Niewerth, Thomas Schwentick
-
Deciding definability by deterministic regular expressions. J. Comput. Syst. Sci. 88: 75–89, 2017
Wojciech Czerwinski, Claire David, Katja Losemann, Wim Martens
-
Minimization of Tree Patterns. J. ACM 65(4):26:1–26:46, 2018
Wojciech Czerwinski, Wim Martens, Matthias Niewerth, Pawel Parys
-
Reasoning about XML Constraints based on XML-to-relational mappings. Theory Comput. Syst. 62(8): 1826–1879, 2018
Matthias Niewerth, Thomas Schwentick