Project Details

Computational and mathematical methods for population genetics analysis of multi-locus data under selection and strong recombination

Subject Area: Mathematics
Term: 2010 to 2013
Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 191584514
 
Final Report Year: 2013

Final Report Abstract

In this project, I intended to apply and extend the conditional sampling distribution (CSD) developed by my host and his Ph.D. student. The CSD describes the distribution of an additionally sampled DNA sequence, given that a certain set of sequences has already been observed. This quantity can be used in importance sampling schemes or composite likelihood frameworks to approximate the probability of observing a set of sequences under a given population genetic model. My host and his student developed a CSD that approximates the true distribution more accurately than previously developed models. I planned to apply their CSD to the phasing of haplotypes and the imputation of missing sequence data via importance sampling on incomplete histories. However, because the CSD is defined recursively, its computational cost prohibits the analysis of large datasets. Thus, I first focused on deriving a more efficient approximation that can be implemented as a hidden Markov model (HMM). To handle structured populations, I further introduced the exchange of migrants between subpopulations into the model. Subsequently, I extended the CSD to more general demographic scenarios, in which population sizes and migration rates can change over time and subpopulations originate from splits of ancestral populations. I demonstrated the applicability of this CSD in a composite likelihood approach to infer ancient demographic parameters from DNA sequence data.

Furthermore, I planned to study the dynamics of beneficial genetic material in a population. I intended to explore, empirically and via simulation, the epistatic interaction between two advantageous alleles at different loci and, in particular, its impact on statistical measures of correlation across loci. These investigations were hindered by the fact that most efficient existing simulation tools allow for only a very limited range of selection models, whereas tools using more general models are very time-consuming. In order to develop more flexible simulation methods, I derived an algorithm to explicitly compute the transition density function (TDF) governing the evolution of the population allele frequencies in a single-locus di-allelic diffusion model under general diploid selection. I extended this approach to derive the TDF for a diffusion with an arbitrary number of alleles under a parent-independent mutation model and general selection. Recently, inferring the strength of selection from temporal allele frequency data, that is, samples obtained from a population at different points in time, has received a lot of attention. Because of this interest and the availability of temporal datasets, I decided to develop an efficient dynamic program, using the TDF, to compute likelihoods from temporal data, thus enabling this type of inference.

Moreover, I proposed to investigate the asymptotic sampling formula (ASF) developed by my host and his postdoc. The ASF approximates the probability of observing a certain set of two-locus haplotypes when the probability of a recombination event between the loci is large. I planned to derive a process that approximates the genealogy of such samples when recombination is strong. Though theoretically appealing, the ASF is only applicable to two-locus haplotypes sampled from a single panmictic population. Modern population genetic datasets, however, commonly consist of full genomic or exomic sequence data. Furthermore, there is strong interest in unravelling the demographic history of contemporary populations, as well as in inferring selection from temporal allele frequency data. For these reasons, and because of the steady progress in the research programs described in the previous two paragraphs, I decided to focus on those projects.
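
The idea of a dynamic program that computes likelihoods from temporal allele frequency data can be illustrated with a minimal sketch. This is not the method developed in the project: in place of the diffusion TDF, it uses the one-generation transition matrix of a discrete Wright-Fisher model with diploid selection as a stand-in, and it runs a standard HMM forward recursion over the hidden allele frequency. The function names, the uniform prior on the initial frequency, the fitness parametrization (1 + s, 1 + h*s, 1), and the example data are all assumptions made for illustration.

```python
import numpy as np
from math import comb


def wf_transition_matrix(N, s, h=0.5):
    """One-generation Wright-Fisher transition matrix for 2N gene copies.

    Assumed genotype fitnesses: 1 + s (AA), 1 + h*s (Aa), 1 (aa).
    Entry [i, j] is the probability of moving from i to j copies of A.
    """
    M = 2 * N
    x = np.arange(M + 1) / M
    # Deterministic change of the frequency of A due to selection.
    w_bar = (1 + s) * x**2 + (1 + h * s) * 2 * x * (1 - x) + (1 - x) ** 2
    x_sel = ((1 + s) * x**2 + (1 + h * s) * x * (1 - x)) / w_bar
    # Binomial resampling of 2N gene copies models genetic drift.
    k = np.arange(M + 1)
    coeff = np.array([comb(M, j) for j in k], dtype=float)
    return coeff[None, :] * x_sel[:, None] ** k[None, :] \
        * (1 - x_sel[:, None]) ** (M - k[None, :])


def temporal_log_likelihood(samples, N, s, h=0.5):
    """Log-likelihood of temporal samples via the HMM forward recursion.

    samples: list of (generation, sample_size, derived_count),
             sorted by generation.
    """
    M = 2 * N
    x = np.arange(M + 1) / M
    P = wf_transition_matrix(N, s, h)
    # Assumption: uniform prior on the allele count at the first time point.
    alpha = np.full(M + 1, 1.0 / (M + 1))
    loglik = 0.0
    t_prev = samples[0][0]
    for t, n, d in samples:
        if t > t_prev:
            # Propagate the hidden frequency forward by t - t_prev generations.
            alpha = alpha @ np.linalg.matrix_power(P, t - t_prev)
        # Binomial emission: d derived alleles among n sampled gene copies.
        alpha = alpha * comb(n, d) * x**d * (1 - x) ** (n - d)
        # Rescale to avoid underflow and accumulate the log-likelihood.
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c
        t_prev = t
    return loglik


if __name__ == "__main__":
    # Hypothetical data: the derived allele rises in frequency over time.
    data = [(0, 20, 2), (10, 20, 6), (20, 20, 12), (30, 20, 17)]
    for s in (0.0, 0.1, 0.2):
        print(s, temporal_log_likelihood(data, N=50, s=s))
```

Evaluating the log-likelihood on a grid of selection coefficients, as in the usage example, gives a crude way to compare candidate values of s. A discrete transition matrix is used here only because it is easy to write down exactly; it scales poorly with population size, which is one motivation for working with a diffusion TDF instead.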

Publications

  • (2012) A sequentially Markov conditional sampling distribution for structured populations with migration and recombination. Theor. Popul. Biol.
    Steinrücken, M., Paul, J. S., and Song, Y. S.
    (See online at https://doi.org/10.1016/j.tpb.2012.08.004)
  • (2012) A simple method for finding explicit analytic transition densities of diffusion processes with general diploid selection. Genetics, 190(3), 1117–1129
    Song, Y. S. and Steinrücken, M.
  • (2013) An explicit transition density expansion for a multi-allelic Wright-Fisher diffusion with general diploid selection. Theor. Popul. Biol., 83, 1–14
    Steinrücken, M., Wang, Y. X. R., and Song, Y. S.
  • (2013) Analysis of DNA sequence variation within marine species using Beta-coalescents. Theor. Popul. Biol.
    Steinrücken, M., Birkner, M., and Blath, J.
    (See online at https://doi.org/10.1016/j.tpb.2013.01.007)
 
 
