Project Details
Projekt Print View

Quantitative Aspects of Grammar-Based Compression

Subject Area Theoretical Computer Science
Term from 2014 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 261105198
 
Final Report Year 2021

Final Report Abstract

In the QUANT-KOMP project we made progress on several intriguing problems in grammar-based compression. For grammar-based string compression, our main achievements are the following: • We solved several open problems from the seminal paper of Charikar et al. concerning the approximation ratio of grammar-based string compressors. For LZ78 and BiSection we precisely determined the approximation ratio. For the global grammar-based string compressors RePair and Greedy we improved the existing lower bounds on the approximation ratio. For our conference paper, we received the best paper award at SPIRE 2016. The journal version was published at IEEE Transactions on Information Theory. • We solved a long standing open problem concerning the depth of string straight-line programs by showing that every string-straight line program of size n that produces a string of length N can be transformed (in linear time) into a string straight-line program of size O(n) and depth O(log N ). In fact, a more general version of this result that covers also grammarbased tree compression can be shown. The result has many applications for algorithms on grammar-based strings and trees. The major part of our results were obtained in the area of grammar-based string compression. • A classical special case of grammar-based tree compression is compression with DAGs (directed acyclic graphs). For a large variety of different classes of random tree models we proved upper and lower bounds on the average size of the DAG. These results generalize older results of Devroye, Flajolet, Gourdon, Martinez, Sipala, and Steyaert. We used these results in order to obtain upper bounds of o(n) on the average case redundancy for a large class of random tree models. • We also studied intensively grammar-based tree compression using tree/forest straight-line programs. We proved an upper bound of O(n/ log n) on the size of a smallest tree/forest straight-line program for a ranked/unranked tree. Using this bound we developed a grammarbased tree compressor that achieves worst-case redundancy o(n) for several natural random tree models. • Based on tree coverings we came up with a tree compressor that achieves all redundancy bounds that are achieved by the tree compressors from the two previous points. This tree compressor has the advantage that it allows to execute a large variety of query operations on compressed trees in constant time. • Finally, we defined a new version of tree entropy, compared it to other notions of tree entropy and proved that in terms of compression ratio this tree entropy can be achieved by grammarbased tree compressors.

Publications

  • Approximation of smallest linear tree grammars, Information and Computation 251, pp. 215–251, 2016
    Artur Jeż and Markus Lohrey
    (See online at https://doi.org/10.1016/j.ic.2016.09.007)
  • Tree compression using string grammars, Algorithmica 80(3), pp. 885–917, 2018 (special issue for LATIN 2016)
    Moses Ganardi, Danny Hucke, Markus Lohrey and Eric Nöth
    (See online at https://doi.org/10.1007/s00453-017-0279-3)
  • Constructing small tree grammars and small circuits for formulas, Journal of Computer and System Sciences 86, pp. 136–158, 2017
    Danny Hucke, Artur Jeż , Moses Ganardi, Markus Lohrey and Eric Nöth
    (See online at https://doi.org/10.1016/j.jcss.2016.12.007)
  • A universal tree balancing theorem, ACM Transactions on Computation Theory 11(1), Article No. 1, 2018
    Moses Ganardi and Markus Lohrey
    (See online at https://doi.org/10.1145/3278158)
  • Universal tree source coding using grammar-based compression, IEEE Transactions on Information Theory 65(10), pp. 6399–6413, 2019
    Moses Ganardi, Danny Hucke, Markus Lohrey and Louisa Seelbach
    (See online at https://doi.org/10.1109/TIT.2019.2919829)
  • Grammar-based compression of unranked trees, Theory of Computing Systems 61(1), pp. 141–176, 2020 (special issue for CSR 2018)
    Adria Gascon, Markus Lohrey, Sebastian Maneth, Carl Philipp Reh and Kurt Sieber
    (See online at https://doi.org/10.1007/s00224-019-09942-y)
  • On the collection of fringe subtrees in random binary trees, Proceedings of LATIN 2020, LNCS 12118, pp. 546–558, Springer 2020
    Louisa Seelbach and Stephan Wagner
    (See online at https://doi.org/10.1007/978-3-030-61792-9_43)
  • The smallest grammar problem revisited, IEEE Transactions on Information Theory 67(1), pp. 317–328, 2020
    Hideo Bannai, Momoko Hirayama, Danny Hucke, Shunsuke Inenaga, Artur Jeż, Markus Lohrey and Carl Philipp Reh
    (See online at https://doi.org/10.1109/TIT.2020.3038147)
  • Balancing straight-line programs, Journal of the ACM 68(4), Article No. 27, pp. 1–40
    Moses Ganardi, Artur Jeż , Markus Lohrey
    (See online at https://doi.org/10.1145/3457389)
  • Hypersuccinct trees – new universal tree source codes for optimal compressed tree data structures and range minima, ESA 2021
    Ian Munro, Patrick K. Nicholson, Louisa Seelbach Benkner and Sebastian Wild
    (See online at https://doi.org/10.4230/LIPIcs.ESA.2021.70)
 
 

Additional Information

Textvergrößerung und Kontrastanpassung