This time I took the full set of ~1.5M SNPs and ran ADMIXTURE on two HapMap-3 European populations: CEU (Utah Whites) and TSI (Tuscans from Italy). I sorted the individuals by their ancestry proportions, yet the cutoff between the two populations is still quite obvious, and the separation is perfect.
On average, CEU individuals belong 93.7% to the blue cluster, and TSI individuals 94.9% to the red one.
The reddest Utah White has 31.4% "southern" ancestry, while the bluest Tuscan has 21.2% "northern" ancestry, i.e. 78.8% "southern". Thus, the cutoff "jump" in "southern" ancestry between the two populations, at the middle of the figure, is 78.8% − 31.4% = 47.4%.
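The "jump" arithmetic can be written as a small function. This is a sketch that assumes two clusters (K=2), so that "northern" = 1 − "southern" for every individual; the function name `cutoff_jump` is hypothetical, and the usage plugs in only the two extreme values quoted above.

```python
def cutoff_jump(southern_ceu, southern_tsi):
    """Gap in "southern" ancestry between the most southern CEU individual
    and the most northern TSI individual.

    Assumes K=2, so each individual's "northern" share is 1 - "southern".
    """
    most_southern_ceu = max(southern_ceu)  # reddest Utah White
    most_northern_tsi = min(southern_tsi)  # bluest Tuscan
    return most_northern_tsi - most_southern_ceu


# Only the extremes matter: 31.4% southern in CEU, 21.2% northern in TSI.
jump = cutoff_jump([0.314], [1 - 0.212])
print(f"cutoff jump: {jump:.1%}")
```

Only the two most extreme individuals enter the calculation, which is why the jump is a measure of separation rather than of average differentiation.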
The standard deviation of the "southern" component among CEU individuals is 6.9%, and the standard deviation of the "northern" component among Tuscans is 5.7%.
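For readers who want to reproduce this kind of summary, here is a minimal sketch that reads ADMIXTURE's `.Q` output (one row per individual, K whitespace-separated ancestry fractions, in the same order as the input genotype file) and computes the mean and standard deviation of one component within one population. Assumptions: population labels come from a separate list (the `.Q` file has none), which column is "blue" or "red" must be checked by hand, and the sample standard deviation is used. The helper names and the toy input are made up for illustration, not the data behind the figures.

```python
import statistics


def parse_q(text):
    """Parse ADMIXTURE .Q output: one row per individual, K fractions per row."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def component_summary(q_rows, labels, population, column):
    """Mean and sample SD (in %) of one ancestry column within one population."""
    values = [row[column] * 100 for row, label in zip(q_rows, labels) if label == population]
    return statistics.mean(values), statistics.stdev(values)


# Made-up toy input: two CEU and two TSI individuals, K = 2.
toy_q = "0.90 0.10\n0.95 0.05\n0.20 0.80\n0.10 0.90\n"
toy_labels = ["CEU", "CEU", "TSI", "TSI"]

mean, sd = component_summary(parse_q(toy_q), toy_labels, "CEU", column=0)
print(f"CEU, column 0: mean {mean:.1f}%, SD {sd:.1f}%")
```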
How real is this "admixture"?
As I have mentioned before on the blog, apparent "mixedness" between populations decreases as the number of markers increases. So the question arises: is the apparent "admixture" between Tuscans and Utahns a real effect, with individuals genuinely leaning toward the other population, or an artefact of a limited number of markers?
Note that increasing the number of markers has diminishing returns. Most new markers are in linkage disequilibrium with existing ones and hence provide little additional information. Going from 10 to 110 markers has a huge effect, but going from 1000 to 1100 a trivial one.
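The 10 → 110 versus 1000 → 1100 contrast can be illustrated with a toy model. This sketch assumes markers are fully independent, so noise in an estimate shrinks roughly as 1/√n; real SNPs are in linkage disequilibrium, which makes the effective number of markers smaller still and the returns even more diminishing. The function names are hypothetical.

```python
import math


def relative_noise(n_markers):
    """Noise of an estimate relative to one marker, assuming independent markers (~1/sqrt(n))."""
    return 1 / math.sqrt(n_markers)


def noise_reduction(n_before, n_after):
    """Factor by which noise shrinks when going from n_before to n_after independent markers."""
    return relative_noise(n_before) / relative_noise(n_after)


for before, after in [(10, 110), (1000, 1100)]:
    print(f"{before} -> {after} markers: noise shrinks {noise_reduction(before, after):.2f}-fold")
```

Under this assumption the first step cuts noise by a factor of about 3.3, while the second barely changes it, because what matters is the relative, not the absolute, increase in markers.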
To study this question, I took a random sample of 1/5 of the markers (about 300K SNPs) and repeated the ADMIXTURE run:
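The post doesn't say how the random sample was drawn; one possible way is sketched below. It reads SNP IDs from the second column of PLINK `.bim` lines and keeps a seeded random 1/5 of them, in their original order. The function name, the seed, and the toy `.bim` lines are hypothetical. The resulting IDs, written one per line, could then be passed to PLINK's `--extract` option to build the reduced dataset.

```python
import random


def sample_snp_ids(bim_lines, fraction=0.2, seed=1):
    """Randomly keep `fraction` of SNP IDs (column 2 of PLINK .bim lines), without replacement.

    The seed makes the subsample reproducible; original marker order is preserved.
    """
    ids = [line.split()[1] for line in bim_lines if line.strip()]
    k = round(len(ids) * fraction)
    rng = random.Random(seed)
    kept_indices = sorted(rng.sample(range(len(ids)), k))
    return [ids[i] for i in kept_indices]


# Hypothetical toy .bim lines: chromosome, SNP ID, cM, bp position, allele 1, allele 2.
toy_bim = [f"1\trs{i}\t0\t{1000 * (i + 1)}\tA\tG" for i in range(10)]
kept = sample_snp_ids(toy_bim)
print(len(kept), kept)
```

Sorting the sampled indices keeps the markers in genomic order, which is what downstream tools expect.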
Now, CEU are 93.9% in the blue cluster (vs. 93.7% in the 1.5M run), and the standard deviation of the red component among CEU individuals is 6.8% (vs. 6.9% in the 1.5M run).
Tuscans are 94.0% in the red cluster (vs. 94.9% in the 1.5M run), and the standard deviation of the blue component among TSI individuals is 6.5% (vs. 5.7% in the 1.5M run).
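To put the two runs side by side, here is a simple tabulation of the numbers reported above, showing the change from the 1.5M run to the 300K run in percentage points. It involves no re-analysis; the dictionary labels are my own shorthand.

```python
# (1.5M-SNP run, 300K-SNP run), in percent, as reported in the text.
reported = {
    "CEU mean, blue cluster": (93.7, 93.9),
    "CEU SD, red component": (6.9, 6.8),
    "TSI mean, red cluster": (94.9, 94.0),
    "TSI SD, blue component": (5.7, 6.5),
}


def differences(stats):
    """Change (300K minus 1.5M), in percentage points, for each statistic."""
    return {name: round(small - full, 1) for name, (full, small) in stats.items()}


for name, delta in differences(reported).items():
    print(f"{name}: {delta:+.1f} points")
```

No statistic moves by more than about one percentage point between the two runs.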
The conclusion is clear: the 5-fold increase in markers, from 300K to 1.5M, had no noticeable effect on the apparent mixedness of populations or individuals.