Evaluation of the performance of commonly applied global ancestry algorithms in complex spatial demographic scenarios
Author
Other authors
Publication date
2016-09
Abstract
The development of new methods for inferring ancestral origins in human populations has attracted
renewed interest among human population geneticists, both for better understanding recent human
evolutionary history and for correcting for hidden population substructure in genome-wide
association studies (GWAS). The algorithms for detecting population substructure present several
problems, such as their dependency on the assumptions of the algorithm, the type and number of
DNA markers considered, the underlying demographic relationships among the populations considered,
and the sample size of the target populations.
With this concern in mind, we have constructed an experimental model for testing the performance of
algorithms currently applied for estimating population substructure. It starts by designing two ideal
prototypes of spatially structured populations (2D stepping stone and anisotropic). From each model
we have generated a pool of 78 experimental datasets by simulating genomic molecular diversity
with Fastsimcoal2 under various migration rate conditions, sampling individuals and populations,
and applying different filtering strategies: Minor Allele Frequency (MAF) and Linkage
Disequilibrium (LD). These 78 datasets (PLINK bed files) have been processed to evaluate the response
of algorithms commonly applied to SNP data for quantifying individual population substructure:
Principal Components Analysis (smartPCA), Multidimensional Scaling (MDS-PLINK), Spatial Ancestry
Analysis (SPA), ADMIXTURE and SNMF. For those algorithms whose output is a set of coordinates (PCA,
MDS and SPA), we have evaluated the correlation (via Mantel and Procrustes tests) of the estimated
coordinates with the geographic sampling coordinates of individuals in our original ideal artifacts. For
ADMIXTURE and SNMF we have applied different algorithms for assessing the best number of
ancestries K, and we have used the CLUMPP software to compare their output matrices.
This ideal prototype has enabled us to establish the robustness of the five algorithms, identify the
best-performing algorithms and determine the impact of the imposed conditions on the results of these
programs.
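The coordinate comparison mentioned in the abstract (Mantel and Procrustes tests between estimated and geographic coordinates) can be illustrated with a minimal Python sketch. It is not the thesis pipeline: the 5x5 grid, the rotated and noisy "estimated" coordinates, the helper name `mantel` and the permutation count are all assumptions made for illustration, and the thesis may have used different software for these tests.

```python
import numpy as np
from scipy.spatial import procrustes
from scipy.spatial.distance import pdist, squareform


def mantel(x, y, permutations=999, seed=0):
    """One-sided permutation Mantel test between two coordinate sets.

    x, y: arrays of shape (n_individuals, n_dims). Hypothetical helper,
    written for this sketch only.
    """
    rng = np.random.default_rng(seed)
    dx = squareform(pdist(x))  # pairwise Euclidean distances in x
    dy = squareform(pdist(y))  # pairwise Euclidean distances in y
    iu = np.triu_indices_from(dx, k=1)  # upper triangle, no diagonal
    r_obs = np.corrcoef(dx[iu], dy[iu])[0, 1]
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of one distance matrix together
        p = rng.permutation(dx.shape[0])
        r_perm = np.corrcoef(dx[p][:, p][iu], dy[iu])[0, 1]
        hits += r_perm >= r_obs
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value


# Toy "geographic" coordinates: individuals on a 5x5 grid (assumed layout)
gx, gy = np.meshgrid(np.arange(5), np.arange(5))
geo = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)

# Toy "estimated" coordinates: the grid rotated by 30 degrees plus noise,
# standing in for the first two axes of a PCA, MDS or SPA output
rng = np.random.default_rng(1)
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
est = geo @ rot.T + rng.normal(scale=0.1, size=geo.shape)

# Procrustes: superimpose est onto geo; disparity is the residual sum of
# squares after standardization, so sqrt(1 - disparity) acts as a correlation
_, _, disparity = procrustes(geo, est)
procrustes_corr = np.sqrt(1.0 - disparity)

mantel_r, mantel_p = mantel(est, geo)
print(f"Procrustes correlation: {procrustes_corr:.3f}")
print(f"Mantel r: {mantel_r:.3f} (p = {mantel_p:.3f})")
```

In this sketch the Procrustes fit absorbs the rotation, so both statistics reflect only how well the relative positions of individuals are recovered, which is the sense in which an ancestry method can be said to recover the sampling geography.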
Document type
Master's thesis
Document version
Supervisor: Oscar Lao
Language
English
Keywords
Algorithms
Human population genetics
Pages
83 p.
Note
Academic year 2015-2016
Rights
This document is subject to this Creative Commons license
Except where otherwise noted, the item's license is described as http://creativecommons.org/licenses/by-nc-nd/3.0/es/