Wednesday, November 14, 2012

MULTIMIX

Introduction:
(link)

MULTIMIX is a program that fits a statistical model to linked genome-wide SNP data from admixed individuals to learn about their ancestral origins. It is applied to a dense set of biallelic SNPs and estimates the ancestral population of origin at each SNP, known as the local ancestry.

To infer the population of origin at each locus along the admixed chromosome, MULTIMIX uses panels of either phased haplotypes or unphased genotypes sampled from each of the candidate ancestral populations. The MULTIMIX model is applicable to any number of ancestral source populations, as well as any combination of phased or unphased admixed samples and source panel data.

The user has a choice of three statistical methods with which to implement the model: MCMC sampling, the EM algorithm, or the Classification-EM (CEM) algorithm. Each of these methods infers local ancestry over windows of SNPs, meaning that switches in ancestry are restricted to occur only at the boundaries between these windows. To refine the estimated location of switches, an additional step that resolves the boundaries between chunks of differing ancestry can be run.
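To make the window-based idea concrete, here is a toy sketch of posterior ancestry inference with a hidden Markov model whose states are ancestries and whose time steps are windows. It is not MULTIMIX's code or interface. The function names, the single `switch_prob` parameter, and the uniform switch model are all assumptions for illustration, and the per-window log-likelihoods are taken as given rather than computed from source panels.

```python
import math


def window_posteriors(log_lik, switch_prob, prior=None):
    """Forward-backward over windows (illustrative only, not MULTIMIX itself).

    log_lik[t][k] -- log-likelihood of window t under ancestry k, assumed
                     to be precomputed elsewhere (e.g. from source panels).
    switch_prob   -- hypothetical probability (0 < p < 1) that ancestry
                     changes between two consecutive windows.
    Returns post[t][k], the posterior probability of ancestry k in window t.
    """
    T, K = len(log_lik), len(log_lik[0])
    if prior is None:
        prior = [1.0 / K] * K

    def trans(i, j):
        # Assumed switch model: stay with prob 1 - switch_prob,
        # otherwise move uniformly to one of the other ancestries.
        if K == 1:
            return 1.0
        return 1.0 - switch_prob if i == j else switch_prob / (K - 1)

    # Rescale each window's likelihoods so the largest is 1 (avoids underflow).
    emit = [[math.exp(v - max(row)) for v in row] for row in log_lik]

    def normalise(vec):
        total = sum(vec)
        return [x / total for x in vec]

    # Forward pass, normalised at every window.
    alpha = [normalise([prior[k] * emit[0][k] for k in range(K)])]
    for t in range(1, T):
        alpha.append(normalise([
            emit[t][j] * sum(alpha[t - 1][i] * trans(i, j) for i in range(K))
            for j in range(K)
        ]))

    # Backward pass, also normalised at every window.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = normalise([
            sum(trans(i, j) * emit[t + 1][j] * beta[t + 1][j] for j in range(K))
            for i in range(K)
        ])

    # Combine into per-window posteriors.
    return [normalise([alpha[t][k] * beta[t][k] for k in range(K)])
            for t in range(T)]


def ancestry_calls(post):
    """Most probable ancestry index in each window."""
    return [max(range(len(row)), key=row.__getitem__) for row in post]


# Made-up log-likelihoods: four windows, two candidate ancestries.
toy_log_lik = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
toy_post = window_posteriors(toy_log_lik, switch_prob=0.1)
toy_calls = ancestry_calls(toy_post)
```

Because the states are defined per window, a change in the called ancestry can only fall between two windows, which is the restriction described above and the reason a separate boundary-refinement step is useful.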

The paper:

Claire Churchhouse, Jonathan Marchini
Genetic Epidemiology
7 Nov 2012
DOI: 10.1002/gepi.21692
(link)

Abstract:  "We describe a novel method for inferring the local ancestry of admixed individuals from dense genome-wide single nucleotide polymorphism data. The method, called MULTIMIX, allows multiple source populations, models population linkage disequilibrium between markers and is applicable to datasets in which the sample and source populations are either phased or unphased. The model is based upon a hidden Markov model of switches in ancestry between consecutive windows of loci. We model the observed haplotypes within each window using a multivariate normal distribution with parameters estimated from the ancestral panels. We present three methods to fit the model—Markov chain Monte Carlo sampling, the Expectation Maximization algorithm, and a Classification Expectation Maximization algorithm. The performance of our method on individuals simulated to be admixed with European and West African ancestry shows it to be comparable to HAPMIX, the ancestry calls of the two methods agreeing at 99.26% of loci across the three parameter groups. In addition to it being faster than HAPMIX, it is also found to perform well over a range of extent of admixture in a simulation involving three ancestral populations. In an analysis of real data, we estimate the contribution of European, West African and Native American ancestry to each locus in the Mexican samples of HapMap, giving estimates of ancestral proportions that are consistent with those previously reported."


Related paper:

A Mathematical Theory of Communication
Claude Shannon
The Bell System Technical Journal
Vol. XXVII, No. 3, July 1948
(link)
