Dataset Information

Inferring population structure and relationship using minimal independent evolutionary markers in Y-chromosome: a hybrid approach of recursive feature selection for hierarchical clustering.


ABSTRACT: The flood of evolutionary markers generated by the Human Genome Project and the 1000 Genome Consortium has necessitated pruning of redundant and dependent variables. Various computational tools based on machine-learning and data-mining methods, such as feature selection/extraction, have been proposed to escape the curse of dimensionality in large datasets. Evolutionary studies, which are primarily based on sequentially evolved variations, have so far not benefited from such advances. Here, we present a novel approach of recursive feature selection for hierarchical clustering of Y-chromosomal SNPs/haplogroups to select a minimal set of independent markers, sufficient to infer population structure as precisely as a larger number of evolutionary markers does. To validate the applicability of our approach, we designed an optimized MALDI-TOF mass spectrometry-based multiplex that accommodates the independent Y-chromosomal markers in a single assay and genotyped two geographically distinct Indian populations. An analysis of 105 world-wide populations showed that 15 independent variations/markers were optimal for defining population structure parameters such as FST, molecular variance and correlation-based relationship. Subsequent addition of randomly selected markers had a negligible effect (close to zero, i.e. 1 × 10⁻³) on these parameters. The approach proves efficient in tracing complex population structures and deriving relationships among world-wide populations in a cost-effective and expedient manner.
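
The idea of stopping once additional markers change the population-structure parameters by less than about 1 × 10⁻³ can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a simple greedy forward selection, uses Euclidean distances between per-population marker frequencies in place of FST, and scores a marker subset by its Pearson correlation with the full-panel distances. The function names, the `tol` parameter, and the frequency values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_distances(freqs, markers):
    """Euclidean distance between every pair of populations,
    computed only on the given marker indices."""
    return [
        sqrt(sum((a[m] - b[m]) ** 2 for m in markers))
        for a, b in combinations(freqs, 2)
    ]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def select_markers(freqs, tol=1e-3):
    """Greedy forward selection (an assumption, not the published method):
    repeatedly add the marker that best improves agreement with the
    full-panel distances, and stop when the gain is at most `tol`."""
    all_markers = list(range(len(freqs[0])))
    full = pairwise_distances(freqs, all_markers)
    chosen, score = [], 0.0
    while len(chosen) < len(all_markers):
        best_marker, best_score = None, score
        for m in all_markers:
            if m in chosen:
                continue
            s = pearson(pairwise_distances(freqs, chosen + [m]), full)
            if s > best_score:
                best_marker, best_score = m, s
        if best_marker is None or best_score - score <= tol:
            break
        chosen.append(best_marker)
        score = best_score
    return chosen, score


# Hypothetical frequencies: rows are populations, columns are markers.
# Markers 1 and 3 are constant across populations, so they carry no signal.
example_freqs = [
    [0.9, 0.5, 0.1, 0.5],
    [0.8, 0.5, 0.7, 0.5],
    [0.1, 0.5, 0.2, 0.5],
    [0.2, 0.5, 0.9, 0.5],
]

if __name__ == "__main__":
    markers, agreement = select_markers(example_freqs)
    print("selected markers:", markers)
    print("agreement with full panel:", agreement)
```

In this toy panel the two constant markers are never selected, because adding them leaves every pairwise distance unchanged. A real analysis would replace the distance and scoring functions with the population-genetic statistics named in the abstract.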

SUBMITTER: Srivastava AK 

PROVIDER: S-EPMC4150763 | biostudies-literature | 2014 Sep

REPOSITORIES: biostudies-literature

Publications

Inferring population structure and relationship using minimal independent evolutionary markers in Y-chromosome: a hybrid approach of recursive feature selection for hierarchical clustering.

Amit Kumar Srivastava, Rupali Chopra, Shafat Ali, Shweta Aggarwal, Lovekesh Vig, Rameshwar Nath Koul Bamezai

Nucleic Acids Research, 2014-07-16, issue 15


Similar Datasets

| S-EPMC6887481 | biostudies-literature
| S-EPMC1620014 | biostudies-literature
| S-EPMC10471895 | biostudies-literature
2022-11-19 | E-MTAB-8173 | biostudies-arrayexpress
| S-EPMC2686685 | biostudies-literature
| S-EPMC7335086 | biostudies-literature
2008-08-30 | GSE12627 | GEO
| S-EPMC4989243 | biostudies-literature
| S-EPMC9878829 | biostudies-literature
| S-EPMC5751574 | biostudies-literature