Project Details
Projekt Print View

Practical and Parallel Text Compression for Highly Repetitive Data

Subject Area Theoretical Computer Science
Term since 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 501086801
 
We want to develop practical algorithms for compressing highly repetitive data that overcome the shortcomings of currently common compressors such as gzip or bzip2. These have been established in the 1990s and targeted hardware that was standard in those days; their main disadvantage is that they do not capture repetitions of substrings that are far apart. The first goal is to design and engineer a compression tool that does also benefit from such long range repetitions, but still has only moderate memory requirements.As a second goal, we want to exploit the shared-memory parallelism present in virtually any CPU in order to speed up compression, without losing too much compression ratio. Here, we want to have a broader look at compression algorithms, and in particular include grammar compressors which offer excellent opportunitie for parallelization.In the ideal case, both ideas to make better use of modern resources will be integrated into production-ready software repositories (like Linux distributions) so that end consumers can benefit easily from our algorithm engineering efforts.
DFG Programme Research Grants
 
 

Additional Information

Textvergrößerung und Kontrastanpassung