Project Details
Accounting for spatial heterogeneity of parameters in the sequentially Markov coalescent process
Applicant
Julien Dutheil, Ph.D.
Subject Area
Mathematics
Term
from 2015 to 2019
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 285412928
The sequentially Markov coalescent (SMC) is an approximation of the coalescent process with recombination that enables its application to whole-genome data sets. The SMC differs from the standard coalescent in that it models the genealogy of a set of sequences spatially along the alignment rather than chronologically. In addition, the process of genealogy change along the genome is Markovian, which allows hidden Markov models to be used for inference of population genomic parameters. Although the SMC models the coalescent in space, current models assume that parameters are homogeneous along the genome. This assumption is clearly at odds with our knowledge of the biology of genomes, as mutation rate, recombination rate and effective population size are highly heterogeneous. SMC models have also been applied exclusively to higher eukaryotic species, essentially primates. These species have very large genomes, in which parameter heterogeneity is rather diluted. With next-generation sequencing data becoming increasingly affordable, population genomic data sets are being generated for species with smaller, more compact genomes. For these data sets, parameter heterogeneity can be much more extreme than for primate genomes. Such species include economically important fungal pathogens, which cannot be analyzed with current, oversimplified models. In this project we propose an extension of current SMC models that accounts for stochastic variation of parameters along the genome. The spatial heterogeneity is modeled as a Markov process, which, when combined with the intrinsic Markov property of the SMC, results in a Markov-modulated sequentially Markov model. The project will establish the formal properties of such a Markov-modulated SMC (MMSMC) both analytically and through simulation. Biological applications are proposed for both primate and fungal data sets.
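As a rough illustration of how combining two Markov processes yields a single Markov-modulated chain, the sketch below builds a transition matrix over joint states (parameter class, genealogy state). It is not the project's model: the function name `joint_transition`, the two-class and two-genealogy toy matrices, and the convention that the parameter class switches first and the genealogy then moves under the new class's matrix are all assumptions made for this example.

```python
def joint_transition(modulator, genealogy_by_class):
    """Build a transition matrix over joint states (class r, genealogy g).

    modulator[r][r2]            : probability that the parameter class moves r -> r2
    genealogy_by_class[r][g][g2]: probability that the genealogy moves g -> g2
                                  when the current parameter class is r
    Joint states are indexed as r * n_gen + g.
    """
    n_classes = len(modulator)
    n_gen = len(genealogy_by_class[0])
    size = n_classes * n_gen
    joint = [[0.0] * size for _ in range(size)]
    for r in range(n_classes):
        for g in range(n_gen):
            row = r * n_gen + g
            for r2 in range(n_classes):
                for g2 in range(n_gen):
                    col = r2 * n_gen + g2
                    # Assumption: the class switches first, then the genealogy
                    # changes according to the matrix of the new class r2.
                    joint[row][col] = modulator[r][r2] * genealogy_by_class[r2][g][g2]
    return joint


# Hypothetical toy values: class 0 has rare genealogy changes (low
# recombination), class 1 has frequent ones (high recombination).
modulator = [
    [0.99, 0.01],
    [0.02, 0.98],
]
genealogy_by_class = [
    [[0.999, 0.001], [0.001, 0.999]],
    [[0.9, 0.1], [0.1, 0.9]],
]

joint = joint_transition(modulator, genealogy_by_class)
row_sums = [sum(row) for row in joint]
```

Because each row of the joint matrix sums to one, the combined process is again a Markov chain over a larger state space, so standard hidden Markov model algorithms would in principle still apply to it.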
DFG Programme
Priority Programmes
Subproject of
SPP 1590: Probabilistic Structures in Evolution
International Connection
Denmark
Cooperation Partners
Professor Asger Hobolth, Ph.D.; Professorin Eva Holtgrewe-Stukenbrock, Ph.D.