
Dataset Information


Estimating copy numbers of alleles from population-scale high-throughput sequencing data.


ABSTRACT:

Background

With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci.

Results

We propose a novel approach that simultaneously infers copy unit alleles and their copy numbers in each sample from population-scale HTS data. The approach applies variational Bayesian inference to a generative probabilistic model inspired by latent Dirichlet allocation, a well-studied model for document classification. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy-number datasets; precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data from 1123 samples at the highly variable salivary amylase gene locus and at a pseudogene locus, and confirmed the consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring.
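The analogy with latent Dirichlet allocation can be made concrete with a toy generative sketch: a sample plays the role of a document, copy unit alleles play the role of topics, and the bases observed at variant sites play the role of words. The code below is an illustration under simplifying assumptions and is not the authors' model. The function name `simulate_reads`, the single-site reads, the uniform choice of site, and the uniform error model are all hypothetical choices made for this example.

```python
import random


def simulate_reads(alleles, copy_numbers, n_reads, error_rate=0.01, seed=0):
    """Toy LDA-style generator (illustrative assumption, not the paper's model).

    alleles      -- equal-length strings, one base per variant site
    copy_numbers -- copies of each allele carried by this sample
    Returns a list of (site, base) observations.
    """
    rng = random.Random(seed)
    total = sum(copy_numbers)
    # The mixing proportion of each allele is its share of all copies.
    weights = [c / total for c in copy_numbers]
    n_sites = len(alleles[0])
    reads = []
    for _ in range(n_reads):
        # Pick the source allele (the "topic") for this observation.
        k = rng.choices(range(len(alleles)), weights=weights)[0]
        # Pick a variant site uniformly, then emit that allele's base.
        site = rng.randrange(n_sites)
        base = alleles[k][site]
        # With probability error_rate, replace the base with a different one.
        if rng.random() < error_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        reads.append((site, base))
    return reads


# Hypothetical sample carrying three copies of one allele and one of another.
example = simulate_reads(["ACGT", "ACTT"], [3, 1], n_reads=20)
```

Inference runs in the opposite direction: given only the observed bases from many samples, the approach described above estimates both the allele sequences and the per-sample copy numbers.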

Conclusions

Our proposed approach enables detailed analysis of copy number variations, such as association studies between copy unit alleles and phenotypes or biological features, including human diseases.

SUBMITTER: Mimori T 

PROVIDER: S-EPMC4331703 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature


Publications

Estimating copy numbers of alleles from population-scale high-throughput sequencing data.

Takahiro Mimori, Naoki Nariai, Kaname Kojima, Yukuto Sato, Yosuke Kawai, Yumi Yamaguchi-Kabata, Masao Nagasaki

BMC Bioinformatics, 2015-01-21



Similar Datasets

| S-EPMC7205152 | biostudies-literature
| S-EPMC5909048 | biostudies-other
| S-EPMC6250075 | biostudies-other
| S-EPMC3105418 | biostudies-literature
| S-EPMC4117933 | biostudies-literature
| S-EPMC3077050 | biostudies-literature
| S-EPMC4345584 | biostudies-literature
| S-EPMC3083417 | biostudies-literature
| S-EPMC4125401 | biostudies-literature
| S-EPMC3284879 | biostudies-other