Parameterized algorithmics for bioinformatics
Zusammenfassung der Projektergebnisse
The goal of the PABI project was to better understand the source of computational hardness of problems in bioinformatics. The approach was parameterized complexity, which examines whether the seemingly unavoidable explosion in running time can be confined to problem-specific parameters. If these parameters are not too large in real-world instances, we can then solve them exactly with strong guarantees on the running time, whereas classic approaches give either non-optimal results or have no useful running time guarantees. In the course of the project, we tackled a large number of problems from bioinformatics, including reconstruction of phylogenetic trees, data clustering, analysis of biological networks, and analysis of genomic data. For many of these, we developed parameterized algorithms, sometimes for several parameters for a problem. As we demonstrated by experiments with real-world data, these algorithms can solve many practically relevant instances optimally within reasonable time bounds. All of our software is publicly available and can be employed by practitioners. One particular tool we employed is data reduction. The idea is to remove redundant parts of the input before starting the actual solving process. Strong data reduction is particularly useful, since it can be combined with any other approach, be it exact, approximative, or heuristic. Parameterized complexity allows to analyze the power of data reduction rules, and thereby guides the development. We found in our experiments that often data reduction actually is much more effective than predicted by these worst-case bounds. One way of explaining this is to look for more parameters that describe the structure of real-world instances. While in the beginning of parameterized complexity studies typically only the “most natural” parameter was considered, the trend also during our project was towards a more differentiated analysis taking several parameters into account, also leading to a “multivariate complexity analysis”. The most common way of solving combinatorially hard problems to optimality in practice is probably integer linear programming (ILP); therefore, we used it as a comparison point in several studies. It turned out that even though lacking guarantees on the running time, these approaches were often surprisingly fast. Therefore, for software intended for real-world use, it might make sense to use hybrid approaches, in particular combinations of data reduction (as developed using the tools of parameterized complexity analysis and kernelization) and ILPs.
Projektbezogene Publikationen (Auswahl)
-
Fixed-parameter algorithms for cluster vertex deletion. Theory of Computing Systems, 47(1): 196–217, 2010
F. Hüffner, C. Komusiewicz, H. Moser, and R. Niedermeier
-
Separator-based data reduction for signed graph balancing. Journal of Combinatorial Optimization, 20(4): 335–360, 2010
F. Hüffner, N. Betzler, and R. Niedermeier
-
Exploiting bounded signal flow for graph orientation based on cause–effect pairs. Algorithms for Molecular Biology, 6(1):1–21, 2011
B. Dorn, F. Hüffner, D. Krüger, R. Niedermeier, and J. Uhlmann
-
Parameterized algorithmics for finding connected motifs in biological networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(5):1296–1308, 2011
N. Betzler, R. van Bevern, C. Komusiewicz, M. R. Fellows, and R. Niedermeier
-
A cubic-vertex kernel for flip-consensus tree. Algorithmica, 2012. Online available
C. Komusiewicz and J. Uhlmann
-
Exact combinatorial algorithms and experiments for finding maximum k-plexes. Journal of Combinatorial Optimization, 24(3): 347–373, 2012
H. Moser, R. Niedermeier, and M. Sorge
-
Partitioning into colorful components by minimum edge deletions. In Proceedings of the 23rd Annual Symposium on Combinatorial Pattern Matching (CPM ’12), volume 7354 of LNCS, pages 56–69. Springer, 2012
S. Bruckner, F. Hüffner, C. Komusiewicz, R. Niedermeier, S. Thiel, and J. Uhlmann
-
Partitioning biological networks into highly connected clusters with maximum edge coverage. In Proceedings of the 9th International Symposium on Bioinformatics Research and Applications (ISBRA 13), volume 7875 of LNCS, pages 99– 111. Springer, 2013
F. Hüffner, C. Komusiewicz, A. Liebtrau, and R. Niedermeier