Admixture analyses
From ISOGG Wiki
Admixture analysis (more properly known as biogeographical ancestry analysis) is a method of inferring someone's geographical origins based on an analysis of their genetic ancestry. An admixture analysis is one of the components of an autosomal DNA test. Companies which offer such tests include 23andMe, AncestryDNA, Family Tree DNA, MyHeritage DNA and Living DNA.
Contents
- 1 Admixture calculations
- 2 DTC providers admixture analysis
- 3 Analysis projects / Admixture Calculators
- 3.1 Comparison of SNPs coverage/overlap for admixture calculators of common autosomal/X-Tests
- 3.2 Eurogenes analysis by David Wesolowski
- 3.3 Admixture analysis for Scandinavians by Anders Pålsen
- 3.4 L M Genetics by Lukasz Macuga
- 3.5 Magnus Ducatus Lituaniae Project by Verenich, Kull
- 3.6 McDonald's BGA project by Doug McDonald
- 3.7 Dodecad Ancestry Project by Dienekes Pontikos
- 4 Analysis projects: Do it yourself (DIY)
- 5 Blog posts and articles
- 6 Scientific papers
- 7 Videos
- 8 Further reading
- 9 See also
- 10 References
Admixture calculations
Admixture calculations provide genetic ancestry analysis for individuals who have been tested for high-density single-nucleotide polymorphism (SNP) data. The different SNP extraction methods (mostly SNP chips) need a substantial overlap of extracted SNPs to allow meaningful comparisons. An admixture analysis usually derives ancestral components, also called clusters, by comparing the samples in a reference dataset. Both the datasets (regional, continental or worldwide) and the ancestral components (their number and age) vary widely depending on the setup and analysis method used. A new sample that is not part of the dataset is normally compared to the ancestral components by calculating the percentage of its ancestry attributed to each component. Additional tools also allow the prediction of ancestral populations. The analysis is strongly limited by the diversity and accuracy of the dataset. For example, analysing an Asian individual with an admixture tool based on a European dataset will not give meaningful results.
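The comparison of a new sample against fixed ancestral components can be illustrated with a small sketch. It assumes the simplest form of the model used by programs such as ADMIXTURE: each component has a known allele frequency at every SNP, and the sample's percentages are found by a standard expectation-maximisation (EM) update. The function name, the toy frequencies and the genotype coding are illustrative assumptions and do not reproduce any particular calculator.

```python
def estimate_proportions(genotypes, freqs, iterations=200):
    """Estimate ancestry proportions for one sample (illustrative sketch).

    genotypes: per-SNP count (0, 1 or 2) of a chosen allele in the sample.
    freqs:     freqs[k][j] is the assumed frequency of that allele at SNP j
               in ancestral component k.
    Returns one proportion per component; the proportions sum to 1.
    """
    k_count = len(freqs)
    n_snps = len(genotypes)
    q = [1.0 / k_count] * k_count  # start from equal proportions
    for _ in range(iterations):
        totals = [0.0] * k_count
        for j, g in enumerate(genotypes):
            # Chance of drawing the chosen allele under the current proportions,
            # clamped to avoid division by zero.
            p_ref = sum(q[k] * freqs[k][j] for k in range(k_count))
            p_ref = min(max(p_ref, 1e-9), 1.0 - 1e-9)
            p_alt = 1.0 - p_ref
            for k in range(k_count):
                # Share of each observed allele copy attributed to component k.
                totals[k] += g * q[k] * freqs[k][j] / p_ref
                totals[k] += (2 - g) * q[k] * (1.0 - freqs[k][j]) / p_alt
        q = [t / (2.0 * n_snps) for t in totals]
    return q


# Toy data (made up): two components with opposite allele frequencies.
toy_freqs = [[0.9, 0.1, 0.8, 0.2],
             [0.1, 0.9, 0.2, 0.8]]
print(estimate_proportions([2, 0, 2, 0], toy_freqs))  # first component close to 1
print(estimate_proportions([1, 1, 1, 1], toy_freqs))  # an even split
```

Real calculators work with tens of thousands of SNPs and many components, which is why the SNP overlap between a test and a calculator matters.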
Accuracy and sophistication
Most calculators use a shared subset of the up to 0.7 million SNPs provided by Family Finder, AncestryDNA, 23andMe, etc. These are compared with publicly available datasets and the companies' own proprietary datasets. As can be seen from the Autosomal DNA testing comparison chart the accuracy and sophistication vary greatly and have not yet reached the quality desired for accurate genetic genealogy research. The public dbSNP (Build 137) database contains ca. 45 million human SNPs, and comprehensive whole-genome sequencing (WGS) of all human populations could substantially increase that number and allow much better calculators.[1]
DTC providers admixture analysis
Admixture analysis is included for everyone who has been tested by the following companies. For further details see the Autosomal DNA testing comparison chart.
23andMe - Ancestry Composition
The Ancestry Composition feature offers a map view which displays one's ancestral components from various regions of the world as of 500 years ago, a split view for those who also have one or both parents who have been tested by 23andMe, and a breakdown by chromosome. Three settings are available: conservative, standard, and speculative. Overall accuracy is reasonably good, but predictions in Europe are still not optimal, particularly in the speculative mode. Ancestry Finder provides a breakdown of one's ancestry by country.
Family Tree DNA - Population Finder
Population Finder was the first incarnation of the admixture analysis provided with Family Tree DNA's "Family Finder" test. It was replaced by a new feature known as MyOrigins in May 2014. Population Finder used principal component analysis (PCA) to estimate biogeographical percentages of autosomal DNA. The population samples used in the analysis were continental groups (Africa, America, East Asia, Europe, Middle Eastern, Oceania, and South Asia). The analysis did not include the X-chromosome. For historical details of the test see Understanding results: Population Finder in the Internet Archive. The Population Finder analysis was relatively non-specific, particularly for people with European Ancestry.
For an explanation of the workings of Population Finder and the meaning of the Middle Eastern percentages seen in many Population Finder results, see the guest blog post by Doug McDonald on biogeographical analysis.
AncestryDNA - Genetic Ethnicity
For background on the AncestryDNA Ethnicity Estimates see the AncestryDNA Ethnicity Estimates White Paper 2018.
Genographic Project - Who Am I
Since a relatively limited number of autosomal SNPs are available in the Geno 2.0 data for analysis, the biogeographical ancestry analysis is somewhat limited relative to other similar tools, particularly relative to Ancestry Composition. The two closest reference populations are given for each person who is tested. However, these predictions, particularly the second closest reference population, are frequently inaccurate.
Analysis projects / Admixture Calculators
These analyses are provided by various sites. Some require you to send in your raw data, some are online tools, and some can be run on your own PC.
Comparison of SNPs coverage/overlap for admixture calculators of common autosomal/X-Tests
See also Autosomal SNP comparison chart and Autosomal DNA testing comparison chart
at/X Test | G25 sim | AncestryDNA | 23andMe | MyHeritage | FamilyTreeDNA | Living DNA | 1240k capture array | WGS Extract (30x) | YSEQ WGS 30x |
---|---|---|---|---|---|---|---|---|---|
Version, in use since | Nganasankhan | v2, May 2016 | v5, Aug. 2017 | v2, Nov. 2016? | v2 | v2, 2016? | Mathieson, Reich et al. 2015 | 23andMe CombinedKit 2023 | 23andMe all_hg19 2023 |
Number of autosomal/X SNPs tested/defined | Correlation | 637,639/28,892 | 630,132/16,530 | 576,157/29,694 | 612,272/16,271 | 683,503/15,028 | ~1,240,000 | 2,010,232/51,970 | 1,450,113/41,984 |
Eurogenes G25 ~300,000 | best in 2017[2] | ||||||||
LM Genetics K47 76,267 | 0.9990 | 67,703; 89% | 21,477; 28% | 28,960; 38% | 27,015; 35% | 76,149; 99.9% | 73,550; 96.4% | ||
MDLP K27 118,536 | 0.9990 | 107,753; 91% | 33,708; 28% | 47,824; 40% | 44,513; 38% | 118,349; 99.8% | 114,457; 96.6% | ||
HarappaWorld K16 188,173 | 0.9988 | 171,503; 91% | 52,910; 28% | 73,065; 39% | 68,610; 36% | 187,890; 99.9% | 187,597; 99.7% | ||
Eurogenes K36 165,688 | 0.9981 | 155,228; 94% | 52,532; 32% | 72,407; 44% | 68,158; 41% | 165,401; 99.8% | 165,110; 99.7% | ||
Dodecad Globe13 166,255 | 0.9982 | 152,175; 92% | 47,411; 29% | 66,026; 40% | 62,231; 37% | 166,014; 99.9% | 165,758; 99.7% | ||
Eurogenes K13 182,705 | 0.9971 | 172,294; 94% | 59,077; 32% | 78,222; 43% | 73,805; 40% | 182,402; 99.8% | 182,109; 99.7% |
Based on single results, from either GEDmatch or Admixture Studio. Please edit or expand missing values, or send them to ChrisR et al.
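The coverage figures in the table are simple set overlaps: the number of a calculator's SNPs that also appear in a test's raw data, expressed as a percentage of the calculator's SNPs. A minimal sketch, assuming SNPs are identified by rsID strings (the function names and the placeholder rsIDs are hypothetical):

```python
def snp_coverage(calculator_snps, kit_snps):
    """Return (shared count, percent of calculator SNPs present in the kit)."""
    calculator = set(calculator_snps)
    shared = calculator & set(kit_snps)
    percent = 100.0 * len(shared) / len(calculator) if calculator else 0.0
    return len(shared), percent


def coverage_percent(shared_count, calculator_total):
    """Percentage form used in the table, from two counts."""
    return 100.0 * shared_count / calculator_total


# Placeholder rsIDs, for illustration only.
calculator = ["rs0001", "rs0002", "rs0003", "rs0004"]
kit = ["rs0002", "rs0004", "rs0009"]
print(snp_coverage(calculator, kit))  # 2 of the 4 calculator SNPs are covered

# First coverage entry in the LM Genetics K47 row: 67,703 of 76,267 SNPs.
print(round(coverage_percent(67703, 76267)))  # 89, as shown in the table
```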
Eurogenes analysis by David Wesolowski
David does free analysis of raw data files from both 23andMe and FTDNA's Family Finder using the programs ADMIXTURE, BEAGLE, PLINK and ADMIXMAP. Results are distributed as Excel spreadsheets and as .png files. See http://eurogenes.blogspot.com/ and http://www.bga101.blogspot.com for background. Also see http://www.23andme.com/you/community/thread/5182. Information on how to interpret the results may be found in an archived copy of http://bga101.blogspot.com/2010/10/brief-guide-to-output-youre-seeing.html. If you are interested in participating in his project, contact him directly.
Admixture analysis for Scandinavians by Anders Pålsen
Anders does a free analysis of admixture for people of Scandinavian ancestry who have been tested by 23andMe. Participants must have their primary ancestry from Norway, Sweden or Finland. The raw 23andMe data files are analyzed using the program ADMIXTURE and the ancestry is presented in a STRUCTURE-like graph. For additional background see http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2010-10/1286348059 and http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2010-11/1289143880. If you are interested in participating in his project, contact him directly.
L M Genetics by Lukasz Macuga
Lukasz provides a detailed report based on the Eurogenes K36 calculator on GEDmatch. The report includes a correlation map of your ancestral regions, population estimations and ancestry statistics in the form of multidimensional plots and dendrograms. For further details see the L M Genetics website.
Magnus Ducatus Lituaniae Project by Verenich, Kull
A biogeographical analysis project for the territories of the former Grand Duchy of Lithuania. Admin: Vadim Verenich. Co-admin: Leon Kull. See the Magnus Ducatus Lituaniae Project blog for further details.
McDonald's BGA project by Doug McDonald
Doug McDonald does two types of free tests. One is like 23andMe's "Advanced Global Similarity", except that he does more "dimensions". For people with ancestry outside Europe four of these are shown. For pure Europeans his world graph is essentially identical to 23andMe's, so instead he shows a European graph, which includes (at lower right) the Adygei, a tribe living on the eastern shores of the Black Sea. The higher dimensions do not give additional information for pure Europeans, so they are not shown. The results are sent to participants as graphs in .png files. Doug also does quantitative tests. These come in three flavors: first without South Asia (represented by Pakistan) and the Middle East, second with South Asia, and finally with all three, as comparison panels. See the ISOGG Wiki page on McDonald's BGA project for the qualifying criteria.
Dodecad Ancestry Project by Dienekes Pontikos
See http://dodecad.blogspot.com for details. Also see the summary written on November 7, 2010 on his anthropology blog. This analysis is currently closed to participants, but Dienekes says that he "may or may not process data from relatives, or non-target groups that was already sent to me and that was not assigned a DOD number." Contact him directly to see if he might be willing to accept your data at some future point in time.
Analysis projects: Do it yourself (DIY)
GEDmatch online admixture applications
This free online service was created by John Olson and Curtis Rogers at www.gedmatch.com. The large data transfers and heavy usage sometimes lead to server problems; donations are welcome to help fund the service. Various admixture (ethnicity or deep ancestry) tools are included:
- Dodecad
- Eurogenes
- 4-Ancestors Oracle December 2012
- MDLP
- World 22 showcase, component maps World22, September 2012
- etc.
DIYDodecad
Dienekes Pontikos published the Do-It-Yourself Dodecad tool free of charge for non-commercial use. DIYDodecad can do admixture analysis on Windows or Linux 32-bit/64-bit machines. The analysis is carried out using calculator files and appropriately standardized autosomal SNP raw data, and it reports percentages for the different population clusters.
Versions
- v1.0 July 2011: Dodecad v3 calculator included, Dodecad Oracle possible
- v2.0 August 2011: new features including by-chromosome and by-segment ancestry analysis, etc.
- v2.1 September 2011: allows incomplete genotype files to be used and not only the Illumina platforms
Standardize raw data
To convert your data from the company-specific format to a common format, the R software is required; it can be downloaded and installed from http://www.r-project.org/. Follow the instructions in the DIYDodecad readme.txt.
- Geno 2.0 patch: new standardize.r and hgdp.base.txt, November 2012
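To give an idea of what standardizing raw data involves, here is a minimal sketch in Python. It is not the DIYDodecad standardize.r script. It assumes the four-column, tab-separated layout used in 23andMe raw data downloads (rsID, chromosome, position, genotype, with comment lines starting with #); the function names and the sample lines are made up for illustration.

```python
def parse_raw_genotypes(lines):
    """Read 23andMe-style raw data lines into {rsid: (chromosome, position, genotype)}.

    No-calls ("--") and single-letter haploid calls are skipped, and the two
    alleles are sorted so that "GA" and "AG" are stored identically.
    """
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 4 or fields[0] == "rsid":
            continue  # malformed line or uncommented header row
        rsid, chromosome, position, genotype = fields
        if len(genotype) != 2 or "-" in genotype:
            continue  # no-call or haploid call
        genotypes[rsid] = (chromosome, int(position), "".join(sorted(genotype)))
    return genotypes


def allele_count(genotype, allele):
    """Number of copies (0, 1 or 2) of one allele in a two-letter genotype."""
    return genotype.count(allele)


# Made-up sample lines with placeholder rsIDs and positions.
sample_lines = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0001\t1\t1000\tAG",
    "rs0002\t1\t2000\t--",
    "rs0003\tX\t3000\tA",
    "rs0004\t2\t4000\tTC",
]
parsed = parse_raw_genotypes(sample_lines)
print(parsed)
print(allele_count(parsed["rs0001"][2], "A"))
```

Other companies use different column layouts, so a real converter needs one reader per format before the data can be matched against a calculator's SNP list.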
Calculator files
Different calculator files from various projects are published regularly. The number in a calculator's name (for example K12) usually gives the number of population clusters. Check the projects' blogs for new versions:
- Dodecad Project: Admixture, Oracle
- globe13, globe13 participant results, globe13 files, globe10, globe 10 files, October 2012
- weac2 (West Eurasian cline) - weac2 files, K10a - K10a files, June 2012
- K7b, K12b, Oracle K12b - K7b files, K12b files, Oracle K12b file, January 2012
- K12a, world9, Oracle K12a, Euro-DNA, Eurasia7, Africa9, weac, BAT, Euro7, Oracle v1, 2011
- Eurogenes Project:
- K36 and K35 deep regional ancestry, K36 Download: K36 files and K35 (no South Chinese cluster) files, March 2013
- Jtest K14 - Ashkenazi ancestry, Jtest K14 files, EUtest K13 files, all calc files, September 2012
- EUtest K9, K10, K11, K12, K12b, K13 May 2012
- K=14, Eurasian K=10, 2011
- MDLP Project:
- Oracle_AdMix4 for World-22
- K=5 to K=15 April 2012
- K=7 2011
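Several of the calculators listed above come with an "Oracle" that suggests which reference populations a result most resembles. A minimal sketch of that idea, assuming the comparison is a plain Euclidean distance between the sample's cluster percentages and each population's average percentages. The population names and numbers below are invented, and the actual Oracle tools may weight or combine populations differently.

```python
import math


def closest_populations(sample, references, top=2):
    """Rank reference populations by Euclidean distance to the sample.

    sample:     list of cluster percentages for one person.
    references: {population name: list of average cluster percentages}.
    Returns the `top` nearest populations as (name, distance) pairs.
    """
    ranked = sorted(
        (math.dist(sample, profile), name) for name, profile in references.items()
    )
    return [(name, distance) for distance, name in ranked[:top]]


# Invented three-cluster profiles, for illustration only.
reference_profiles = {
    "Population A": [50.0, 30.0, 20.0],
    "Population B": [40.0, 40.0, 20.0],
    "Population C": [0.0, 0.0, 100.0],
}
print(closest_populations([50.0, 30.0, 20.0], reference_profiles))
```

A small distance only means the percentages are similar under that calculator's clusters; it does not by itself show descent from the listed population.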
SPatial Ancestry analysis (SPA)
Method for predicting ancestry or where an individual is from.
- SPA homepage cs.ucla.edu Version 1.13 April 2013, Eurogenes review November 2012 and March 2012
- Eurogenes SPA "model" files November 2012
SnpMap
A small program to view SNP data and see how the data compares to other populations and regions of the world.
- SnpMap version 1.0.4, June 2011
ADMIXTURE and PLINK
Razib Khan has provided tutorials for users who wish to perform DIY analyses on their autosomal DNA results using the software programs ADMIXTURE and PLINK:
- Eurasia ADMIXTURE supervised and unsupervised, 16 March 2011
- Analyzing ancestry with ADMIXTURE step by step, 14 March 2011
- Using your 23andMe data in PLINK, 7 January 2013
- Using your 23andMe data: exploring with MDS, 8 January 2013.
Blog posts and articles
- Kampourakis, K. Who do you think you are? Genetics and identity. OUP blog, 20 March 2024.
- Booth T. Am I related to a Viking? The reliability of genetic ancestry testing. DigIt (Undated. Accessed 5 August 2022).
- Introduction to ethnicity admixture by Paul Woodbury, LegacyTree Genealogists blog, June 2022.
- What do DNA test results mean? by Debbie Kennett, Who Do You Think You Are? Magazine, May 2020.
- What is ancestry? by Joe Pickrell, The Gencove Blog, 18 January 2018.
- Mohammed is Palestinian. Why does 23andMe think he's Egyptian? by Jacklin Kwan, Wired, 15 September 2021.
- Ancestry DNA: You may not be from where you think you are! by Xcode. Dec 2017
- Why do percentage estimates of “ancestry” vary so much? by Razib Khan. Gene Expression, 29 August 2017
- Ancestry inference won't tell you things you don't care about but could by Razib Khan, Gene Expression, 23 March 2017.
- Ancestry inference is precise and accurateish by Razib Khan, Gene Expression, 23 March 2017.
- Six siblings Part 1 Origins by Israel Pickholtz, All My Foreparents, 2 February 2017. A comparison of FTDNA MyOrigins results for six siblings from an Ashkenazi Jewish family.
- How to look at population structure by Razib Khan. GeneExpression Blog, 3 October 2016.
- Those percentages, if you must by Judy G Russell, The Legal Genealogist, 14 August 2016.
- Ancestry ethnicity estimate by Elizabeth Kipp, English Research from Canada, 1 August 2016.
- Exploring ethnicity with DNA. Part II: autosomal DNA testing by Paul Woodbury, Legacy Tree Genealogists, 6 May 2016.
- Those percentages revisited by Judy G Russell, The Legal Genealogist, 1 May 2016.
- Results may vary - one family's DNA ethnicity percentages by Diahan Southard, Lisa Louise Cooke's Genealogy Gems, 28 February 2016.
- Making the best of what's not so good by Judy G Russell, The Legal Genealogist, 22 February 2015.
- Understanding Patterns of Inheritance: Where Did My DNA Come From? (And Why It Matters) by Anna Swayne, Ancestry blog, 5 March 2014. Includes results for four siblings tested at AncestryDNA.
- Racing to the wrong conclusion Genealogy for the Everyman blog, 9 February 2013. The article provides a good summary of the problem of assigning arbitrary labels to "races".
- Understanding correlations and debunking misconceptions in DNA genealogy by Steve Handy. DNA Genealogical Experiences and Tutorials blog, 29 May 2013.
- Ethnicity results - true or not? by Roberta Estes, DNAeXplained, 4 October 2013.
- Understanding BGA testing DNA Genealogical Experiences and Tutorials blog, 3 November 2012.
Scientific papers
- Mathieson I, Scally A (2020). What is ancestry? PLoS Genetics 16(3): e1008624. An essay discussing the meaning of the term ancestry.
- Schraiber JG, Akey JM (2015). Methods and models for unravelling human evolutionary history. Nature Reviews Genetics 16: 727–740. An excellent review article summarising the different methods used to make inferences about populations from genetic data.
Videos
- Ancestry reimagined: dismantling the myth of genetic ethnicities by Kostas Kampourakis.
- What can DNA tests really tell us about our ancestry? A short tutorial from Prosanta Chakrabarty.
- Ethnicity percentages demystified. A lecture given by Debbie Kennett at Family Tree Live in April 2019.
Further reading
- Wikipedia article on human genetic clustering
- What our DNA can tell us about the history of humans by Leo Speidel and Clare Bycroft. Frontiers for Young Minds, 10 September 2020.
- DNA testing can bring families together, but gives mixed answers on ethnicity by Tina Hesman Saey, Science News, 13 June 2018
- Roots: origin stories by Paul Jones. Canada's History Magazine, October/November 2016.
- How long ago did African ancestry enter my family tree? by Henry Louis Gates Jr and Kasia Bryc. The Root, 10 July 2015.
- The rise of the genome bloggers Nature, 15 December 2010, 468, pp 880-881.
See also
References
- [1] Figure 1 Venn diagram, Francioli et al 2014, Whole-genome sequence variation, population structure and demographic history of the Dutch population, Nature Genetics, http://dx.doi.org/10.1038/ng.3021
- [2] https://eurogenes.blogspot.com/2017/10/genetic-ancestry-online-store-to-be.html