Dataset Information

Population clustering based on copy number variations detected from next generation sequencing data.

ABSTRACT: Copy number variations (CNVs) can be used as significant biomarkers, and next generation sequencing (NGS) provides high-resolution detection of these CNVs. However, how to extract features from CNVs and apply them to genomic studies such as population clustering has become a major challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs, one can differentiate samples from different ethnic groups, i.e., perform population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data sets from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results of the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results of the second real data analysis show that the proposed method can be applied to whole-genome data with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
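
The decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy feature matrix, the function names (`make_toy_features`, `nmf`, `cluster_samples`), the choice of Lee-Seung multiplicative updates as the NMF solver, and the rule of assigning each sample to the source with the largest weight are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def make_toy_features(seed=0):
    """Build a hypothetical bins-by-samples CNV feature matrix for two groups."""
    rng = np.random.default_rng(seed)
    n_bins = 40
    sources = np.zeros((n_bins, 2))
    sources[5:11, 0] = 1.0   # CNV region shared by group A (assumed)
    sources[25:33, 1] = 1.0  # CNV region shared by group B (assumed)
    weights = np.zeros((2, 6))
    weights[0, :3] = rng.uniform(0.8, 1.2, size=3)  # samples 0-2: group A
    weights[1, 3:] = rng.uniform(0.8, 1.2, size=3)  # samples 3-5: group B
    noise = 0.02 * rng.random((n_bins, 6))          # small non-negative noise
    return sources @ weights + noise


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor V ~= S @ A using multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    S = rng.random((V.shape[0], rank)) + eps  # source matrix: bins x rank
    A = rng.random((rank, V.shape[1])) + eps  # weight matrix: rank x samples
    for _ in range(n_iter):
        A *= (S.T @ V) / (S.T @ S @ A + eps)
        S *= (V @ A.T) / (S @ A @ A.T + eps)
    # Rescale so each source has maximum 1; this makes weights comparable
    # across sources without changing the product S @ A.
    scale = S.max(axis=0) + eps
    return S / scale, A * scale[:, None]


def cluster_samples(A):
    """Assign each sample to the source with the largest weight."""
    return np.argmax(A, axis=0)


V = make_toy_features()
S, A = nmf(V, rank=2)
# Samples from the same group should share a label; which group is
# numbered 0 or 1 depends on the random initialization.
print(cluster_samples(A))
```

In this sketch the columns of `S` play the role of the common CNV profiles and the columns of `A` give each sample's level of those profiles. On real data the rank would have to be chosen to match the expected number of groups.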

SUBMITTER: Duan J

PROVIDER: S-EPMC4504183 | biostudies-literature | 2014 Aug

REPOSITORIES: biostudies-literature

Publications

Population clustering based on copy number variations detected from next generation sequencing data.

Junbo Duan, Ji-Gang Zhang, Mingxi Wan, Hong-Wen Deng, Yu-Ping Wang

Journal of Bioinformatics and Computational Biology, 2014-08-19, issue 4

PMID: 25152046

Similar Datasets

Detection of common copy number variation with application to population clustering from next generation sequencing data.
| S-EPMC4154475 | biostudies-literature

MFCNV: A New Method to Detect Copy Number Variations From Next-Generation Sequencing Data.
| S-EPMC7243272 | biostudies-literature

A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data.
| S-EPMC7838601 | biostudies-literature

Detection of copy number variations based on a local distance using next-generation sequencing data.
| S-EPMC10556732 | biostudies-literature

HBOS-CNV: A New Approach to Detect Copy Number Variations From Next-Generation Sequencing Data.
| S-EPMC8215577 | biostudies-literature

A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data.
| S-EPMC8273656 | biostudies-literature

SeqCNV: a novel method for identification of copy number variations in targeted next-generation sequencing data.
| S-EPMC5335817 | biostudies-literature

A Pipeline for Reconstructing Somatic Copy Number Alternation's Subclonal Population-Based Next-Generation Sequencing Data.
| S-EPMC7058119 | biostudies-literature

Next-generation sequencing identifies recurrent copy number variations in invasive breast carcinomas from Ghana.
| S-EPMC7390688 | biostudies-literature

Assessing copy number alterations in targeted, amplicon-based next-generation sequencing data.
| S-EPMC5707205 | biostudies-literature