Inferring population structure and relationship using minimal independent evolutionary markers in Y-chromosome: a hybrid approach of recursive feature selection for hierarchical clustering.
Ontology highlight
ABSTRACT: Inundation of evolutionary markers expedited in Human Genome Project and 1000 Genome Consortium has necessitated pruning of redundant and dependent variables. Various computational tools based on machine-learning and data-mining methods like feature selection/extraction have been proposed to escape the curse of dimensionality in large datasets. Incidentally, evolutionary studies, primarily based on sequentially evolved variations have remained un-facilitated by such advances till date. Here, we present a novel approach of recursive feature selection for hierarchical clustering of Y-chromosomal SNPs/haplogroups to select a minimal set of independent markers, sufficient to infer population structure as precisely as deduced by a larger number of evolutionary markers. To validate the applicability of our approach, we optimally designed MALDI-TOF mass spectrometry-based multiplex to accommodate independent Y-chromosomal markers in a single multiplex and genotyped two geographically distinct Indian populations. An analysis of 105 world-wide populations reflected that 15 independent variations/markers were optimal in defining population structure parameters, such as FST, molecular variance and correlation-based relationship. A subsequent addition of randomly selected markers had a negligible effect (close to zero, i.e. 1 × 10(-3)) on these parameters. The study proves efficient in tracing complex population structures and deriving relationships among world-wide populations in a cost-effective and expedient manner.
SUBMITTER: Srivastava AK
PROVIDER: S-EPMC4150763 | biostudies-literature | 2014 Sep
REPOSITORIES: biostudies-literature
ACCESS DATA