
Dataset Information

Estimating copy numbers of alleles from population-scale high-throughput sequencing data.


ABSTRACT: BACKGROUND: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci. RESULTS: We propose a novel approach to infer copy unit alleles and their numbers in each sample simultaneously from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, which is a well-studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy-number datasets, in which precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data of 1123 samples at the highly variable salivary amylase gene locus and a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring. CONCLUSIONS: Our proposed approach enables detailed analysis of copy number variations, such as association studies between copy unit alleles and phenotypes or biological features including human diseases.
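The concordance measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes that precision and recall are computed over sets of allele sequences, which the abstract does not specify; the function name `precision_recall` and the toy sequences are hypothetical and are not taken from the paper or its software.

```python
def precision_recall(inferred, true):
    """Return (precision, recall) of inferred alleles against true alleles.

    Assumption: alleles are compared as exact sequence strings.
    """
    inferred, true = set(inferred), set(true)
    true_positives = len(inferred & true)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(inferred) if inferred else 0.0
    recall = true_positives / len(true) if true else 0.0
    return precision, recall


# Toy example (made-up sequences): one spurious allele is inferred.
true_alleles = {"ACGT", "ACGA", "TCGA"}
inferred_alleles = {"ACGT", "ACGA", "TCGA", "ACCC"}
precision, recall = precision_recall(inferred_alleles, true_alleles)
print(precision, recall)
```

In this toy case every true allele is recovered, so recall is 1.0, while the extra inferred allele lowers precision to 0.75.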

SUBMITTER: Mimori T 

PROVIDER: S-EPMC4331703 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

Estimating copy numbers of alleles from population-scale high-throughput sequencing data.

Mimori T, Nariai N, Kojima K, Sato Y, Kawai Y, Yamaguchi-Kabata Y, Nagasaki M

BMC Bioinformatics, 2015-01-21


Similar Datasets

| S-EPMC7205152 | biostudies-literature
| S-EPMC5909048 | biostudies-other
| S-EPMC6250075 | biostudies-other
| S-EPMC3105418 | biostudies-literature
| S-EPMC4117933 | biostudies-literature
| S-EPMC3077050 | biostudies-literature
| S-EPMC4345584 | biostudies-literature
| S-EPMC3083417 | biostudies-literature
| S-EPMC4125401 | biostudies-literature
| S-EPMC2911117 | biostudies-literature