November 17, 2012
Population histories with a diffusion process formulation
Mol Biol Evol (2012) doi: 10.1093/molbev/mss257
Inferring population histories using genome-wide allele frequency data
Mathieu Gautier and Renaud Vitalis
The recent development of high-throughput genotyping technologies has revolutionized the collection of data in a wide range of both model and non-model species. These data generally contain huge amounts of information about the past demographic history of populations.
In this study we introduce a new method to estimate divergence times on a diffusion time-scale from large SNP datasets, conditionally on a population history which is represented as a tree. We further assume that all the observed polymorphisms originate from the most ancestral (root) population, i.e. we neglect mutations that occur after the split of the most ancestral population. This method relies on a hierarchical-Bayesian model, based on Kimura's time-dependent diffusion approximation of genetic drift. We implemented a Metropolis–Hastings within Gibbs sampler to estimate the posterior distribution of the parameters of interest in this model, which we refer to as the Kimura model. Evaluating the Kimura model on simulated population histories, we found that it provides accurate estimates of divergence time. Assessing model fit using the deviance information criterion (DIC) proved efficient for retrieving the correct tree topology among a set of competing histories. We show that this procedure is robust to low-to-moderate gene flow, as well as to ascertainment bias, provided that the most distantly related populations are represented in the discovery panel. As an illustrative example, we finally analyzed published human data consisting of genotypes for 452,198 SNPs from individuals belonging to four populations worldwide.
Our results suggest that the Kimura model may be helpful to characterize the demographic history of differentiated populations, using genome-wide allele frequency data.
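To give a feel for what a divergence time "on a diffusion time-scale" means, here is a minimal Python sketch. It is not the authors' hierarchical Bayesian sampler. It assumes that the diffusion time is tau = t / (2N) for t generations of pure drift in a diploid population of size N, simulates Wright-Fisher drift at independent SNPs, and recovers tau with a simple moment estimator based on the decay of expected heterozygosity, which under pure drift is roughly exp(-tau). All function names and parameter values are illustrative assumptions, and the sketch unrealistically treats the ancestral allele frequencies as known and ignores sampling noise.

```python
import math
import random


def drift_one_snp(x0, two_n, generations, rng):
    """Wright-Fisher drift: resample 2N gene copies binomially each generation."""
    x = x0
    for _gen in range(generations):
        copies = sum(1 for _copy in range(two_n) if rng.random() < x)
        x = copies / two_n
    return x


def mean_heterozygosity(freqs):
    """Average of 2x(1-x) over SNPs."""
    return sum(2.0 * x * (1.0 - x) for x in freqs) / len(freqs)


def estimate_tau(ancestral, current):
    """Moment estimator (an illustrative stand-in, not the Kimura model):
    under pure drift E[H_t] is about H_0 * exp(-tau), so tau ~ -log(H_t / H_0)."""
    return -math.log(mean_heterozygosity(current) / mean_heterozygosity(ancestral))


# Hypothetical settings: 2N = 100 gene copies, 20 generations, so tau = 0.2.
rng = random.Random(42)
two_n, generations = 100, 20
true_tau = generations / two_n

# Ancestral frequencies are assumed known here; in real data they are not observed.
ancestral = [rng.uniform(0.1, 0.9) for _snp in range(2000)]
current = [drift_one_snp(x, two_n, generations, rng) for x in ancestral]

tau_hat = estimate_tau(ancestral, current)
print(f"true tau = {true_tau:.3f}, estimated tau = {tau_hat:.3f}")
```

Only the ratio t / (2N) is identifiable in this setting: a small population drifting for a few generations and a large population drifting for many generations give the same tau, which is why the divergence times are reported on the diffusion time-scale rather than in generations.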