Mathematical and Computational Methods for the Population Genetic Analysis of Multi-Locus Data under Selection and Strong Recombination

Subject area: Mathematics
Funding: Funded from 2010 to 2013
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 191584514
 
Year of creation: 2013

Summary of Project Results

In this project, I intended to apply and extend the conditional sampling distribution (CSD) developed by my host and his Ph.D. student. The CSD describes the distribution of an additionally sampled DNA sequence, given that a certain set of sequences has already been observed. This quantity can be used in importance sampling schemes or composite likelihood frameworks to approximate the probability of observing a set of sequences under a given population genetic model. My host and his student developed a CSD that approximates the true distribution more accurately than previously developed models. I planned to apply their CSD to the phasing of haplotypes and the imputation of missing sequence data via importance sampling on incomplete histories. However, due to its recursive nature, the computational complexity of the CSD prohibits analyzing large datasets. Thus, I first focused on deriving a more efficient approximation that can be implemented as a hidden Markov model (HMM). To handle structured populations, I further introduced the exchange of migrants between subpopulations into the model. Subsequently, I extended the CSD to more general demographic scenarios, where population sizes and migration rates can change over time, and subpopulations originate from splits in ancestral populations. I demonstrated the applicability of this CSD in a composite likelihood approach to infer ancient demographic parameters from DNA sequence data.

Furthermore, I planned to study the dynamics of beneficial genetic material in a population. I intended to explore, empirically and via simulation, the epistatic interaction between two advantageous alleles at different loci, and in particular, the impact on statistical measures of correlation across loci. These investigations were hindered because most existing efficient simulation tools allow for only a very limited range of selection models, whereas tools using more general models are very time-consuming.
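The idea of evaluating a conditional sampling probability with an HMM, as mentioned above for the CSD, can be illustrated with a small sketch. The code below is not the CSD derived in the project: it assumes a much simpler haplotype-copying HMM in which the hidden state is only the index of the panel haplotype being copied. The function name and the parameters `switch_prob` and `mismatch_prob` are hypothetical and chosen for illustration.

```python
def copying_hmm_likelihood(new_hap, panel, switch_prob, mismatch_prob):
    """Forward algorithm for a simplified haplotype-copying HMM.

    Assumptions (illustrative, not the project's CSD):
      - alleles are the characters '0' and '1';
      - the hidden state at each site is the panel haplotype being copied;
      - between adjacent sites the copied haplotype is redrawn uniformly
        with probability `switch_prob`, otherwise it stays the same;
      - the copied allele is flipped with probability `mismatch_prob`.
    Returns the probability of `new_hap` given `panel` under this model.
    """
    n = len(panel)

    def emit(k, site):
        # Probability of the observed allele when copying haplotype k.
        if panel[k][site] == new_hap[site]:
            return 1.0 - mismatch_prob
        return mismatch_prob

    # Initial site: each panel haplotype is copied with probability 1/n.
    forward = [emit(k, 0) / n for k in range(n)]

    for site in range(1, len(new_hap)):
        total = sum(forward)
        # Stay on haplotype k, or switch to a uniformly chosen haplotype.
        forward = [
            emit(k, site)
            * ((1.0 - switch_prob) * forward[k] + switch_prob * total / n)
            for k in range(n)
        ]

    return sum(forward)


panel = ["0101", "0011", "1100"]
print(copying_hmm_likelihood("0101", panel, switch_prob=0.1, mismatch_prob=0.01))
```

Because the sum over hidden states is carried along site by site, the cost grows linearly in the number of sites and in the panel size, which is the kind of saving an HMM formulation offers over a fully recursive computation.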
In order to develop more flexible simulation methods, I derived an algorithm to explicitly compute the transition density function (TDF) governing the evolution of the population allele frequencies in a single-locus di-allelic diffusion model under general diploid selection. I extended this approach further to derive the TDF for a diffusion with an arbitrary number of alleles under a parent-independent mutation model and general selection. Recently, inferring the strength of selection from temporal allele frequency data, that is, samples obtained from a population at different points in time, has received a lot of attention. Because of the high interest and the availability of temporal datasets, I decided to develop an efficient dynamic program, using the TDF, to compute likelihoods from temporal data, thus enabling this type of inference.

Moreover, I proposed to investigate the asymptotic sampling formula (ASF) developed by my host and his postdoc. The ASF approximates the probability of observing a certain set of two-locus haplotypes when the probability of a recombination event between the loci is large. I planned to derive a process that approximates the genealogy of such samples when recombination is strong. Though theoretically appealing, the ASF is only applicable to two-locus haplotypes sampled from a single panmictic population. Modern population genetic datasets commonly consist of full genomic or exomic sequence data. Furthermore, there is a strong interest in unravelling the demographic history of contemporary populations, as well as in inferring selection from temporal allele frequency data. For these reasons, and because of the steady progress in the first two research programs described above, I decided to focus on those projects.
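The structure of a dynamic program for likelihoods from temporal allele frequency data, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's method: a discrete haploid Wright-Fisher chain with a small population size stands in for the diffusion TDF, the initial allele count is assumed to be uniformly distributed, and all function and parameter names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def wf_transition_matrix(pop_size, sel_coeff):
    """One-generation transition matrix of a haploid Wright-Fisher chain.

    Stand-in for the TDF: the state is the allele count 0..pop_size, and
    the allele has relative fitness 1 + sel_coeff (illustrative assumption).
    """
    matrix = []
    for i in range(pop_size + 1):
        freq = i / pop_size
        # Allele frequency after selection, before binomial resampling.
        freq_sel = freq * (1.0 + sel_coeff) / (1.0 + sel_coeff * freq)
        matrix.append(
            [binom_pmf(j, pop_size, freq_sel) for j in range(pop_size + 1)]
        )
    return matrix


def temporal_likelihood(samples, pop_size, sel_coeff):
    """Likelihood of temporal samples via a forward recursion.

    `samples` is a list of (generation, sample_size, allele_count) tuples
    sorted by generation. The initial allele count is assumed uniform.
    """
    matrix = wf_transition_matrix(pop_size, sel_coeff)
    states = range(pop_size + 1)
    forward = [1.0 / (pop_size + 1)] * (pop_size + 1)
    prev_gen = samples[0][0]
    for gen, size, count in samples:
        # Propagate the allele count distribution to the sampling time.
        for _ in range(gen - prev_gen):
            forward = [
                sum(forward[i] * matrix[i][j] for i in states) for j in states
            ]
        # Weight each state by the probability of the observed sample.
        forward = [
            forward[i] * binom_pmf(count, size, i / pop_size) for i in states
        ]
        prev_gen = gen
    return sum(forward)


data = [(0, 10, 2), (10, 10, 8)]
for s in (-0.2, 0.0, 0.2):
    print(s, temporal_likelihood(data, pop_size=20, sel_coeff=s))
```

Evaluating the likelihood on a grid of selection coefficients, as in the loop at the end, gives a crude way to compare candidate values. In a diffusion-based version, the matrix powers would be replaced by the TDF evaluated over the time between samples.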

Project-Related Publications (Selection)

  • (2012) A sequentially Markov conditional sampling distribution for structured populations with migration and recombination. Theor. Popul. Biol.
    Steinrücken, M., Paul, J. S., and Song, Y. S.
    (See online at https://doi.org/10.1016/j.tpb.2012.08.004)
  • (2012) A simple method for finding explicit analytic transition densities of diffusion processes with general diploid selection. Genetics, 190(3), 1117–1129
    Song, Y. S. and Steinrücken, M.
  • (2013) An explicit transition density expansion for a multi-allelic Wright-Fisher diffusion with general diploid selection. Theor. Popul. Biol., 83, 1–14
    Steinrücken, M., Wang, Y. X. R., and Song, Y. S.
  • (2013) Analysis of DNA sequence variation within marine species using Beta-coalescents. Theor. Popul. Biol.
    Steinrücken, M., Birkner, M., and Blath, J.
    (See online at https://doi.org/10.1016/j.tpb.2013.01.007)
 
 
