Dataset Information

Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis.


ABSTRACT: Copy number variants (CNVs) are alterations of the DNA of a genome that result in the cell having fewer or more than two copies of segments of the DNA. CNVs correspond to relatively large regions of the genome, ranging from about one kilobase to several megabases, that are deleted or duplicated. Motivated by CNV analysis based on next-generation sequencing data, we consider the problem of detecting and identifying sparse short segments hidden in a long linear sequence of data with an unspecified noise distribution. We propose a computationally efficient method that provides a robust and near-optimal solution for segment identification over a wide range of noise distributions. We theoretically quantify the conditions for detecting the segment signals and show that the method near-optimally estimates the signal segments whenever it is possible to detect their existence. Simulation studies are carried out to demonstrate the efficiency of the method under different noise distributions. We present results from a CNV analysis of a HapMap Yoruban sample to further illustrate the theory and the methods.

SUBMITTER: Cai TT 

PROVIDER: S-EPMC3563068 | biostudies-literature | 2012 Nov

REPOSITORIES: biostudies-literature

Publications

Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis.

Cai TT, Jeng XJ, Li H

Journal of the Royal Statistical Society. Series B, Statistical Methodology, 2012-11-01, issue 5


Similar Datasets

S-EPMC4834947 | biostudies-literature
S-EPMC4219371 | biostudies-literature
S-EPMC6587877 | biostudies-literature
S-EPMC6553498 | biostudies-literature
S-EPMC9068763 | biostudies-literature
S-EPMC6612810 | biostudies-literature
S-EPMC3924722 | biostudies-literature
S-EPMC4224119 | biostudies-literature
S-EPMC2861314 | biostudies-literature
S-EPMC6953292 | biostudies-literature