Project Details
WhatsHap: Software to take Genome Research and Clinical Diagnostics to the Haplotype Level
Applicant
Professor Dr. Tobias Marschall
Subject Area
Bioinformatics and Theoretical Biology
Epidemiology and Medical Biometry/Statistics
Term
from 2018 to 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 391137747
Rapid advances in genome sequencing technology are currently transforming biomedical research and health care. The genomes of any two patients are distinct, leading to differences in clinically relevant traits such as susceptibility to disease or intolerance to specific drugs. Personalized medicine aims to exploit such genomic information to improve diagnosis and therapy allocation. That is, an individual's genomic makeup is taken into account in clinical decision making. Genome sequencing machines produce big data, which are analyzed by complex data processing pipelines. Software solutions used for clinical purposes need to meet the highest quality standards. This requirement, however, is in stark contrast to many current bioinformatics tools that remain at a research prototype stage, often due to a lack of dedicated software engineers.
The emergence of second-generation sequencing technologies from 2006 onwards has enabled genome sequencing at manageable costs, with prices for sequencing a single human genome dropping from hundreds of millions of dollars in 2001 to less than one thousand dollars today. This massive decrease in prices has enabled an equally massive increase in data volume, which has by far outpaced Moore's law.
Today, we are witnessing the emergence of third-generation sequencing technologies. These technologies enable a paradigm shift in the field of genomics: while many studies have focussed on genotype data in the past, we are now in a position to reconstruct human haplotypes. The availability of haplotype information is critically important for a large number of analyses, in particular for studies of human ancestry, migration and admixture, but also for understanding and diagnosing haplotype-specific clinical conditions. Therefore, third-generation sequencing technologies are first impacting genomic and clinical research and will subsequently become part of routine clinical diagnostics and care.
To turn these opportunities into reality, sustainably maintained production-quality software to process these data is a prerequisite.
Our WhatsHap software provides an excellent starting point for this. It is already in use in multiple large-scale projects, is rapidly gaining the attention of life sciences researchers as well as of vendors of sequencing machines, and is provided as high-quality open source software. It has hence already attained the status of a demonstrator. The swiftly expanding sector of genome sequencing and precision medicine provides an outstanding environment in which to raise funds for a non-profit foundation that will sustainably maintain WhatsHap after the funding period. Although foundations that support open source projects have a long history, this model is, in our view, under-explored for academic software. With this project, we aim to raise awareness of this approach and will document our experiences in a public report.
DFG Programme
Research data and software (Scientific Library Services and Information Systems)