Dataset Information

An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes.


ABSTRACT: Detection of copy number variation (CNV) in DNA has recently become an important method for understanding the pathogenesis of cancer. While existing algorithms for extracting CNV from microarray data have worked reasonably well, the trend towards ever larger sample sizes and higher resolution microarrays has vastly increased the challenges they face. Here, we present Segmentation analysis of DNA (SAD), a clustering algorithm constructed with a strategy in which all operational decisions are based on simple and rigorous applications of statistical principles, measurement theory and precise mathematical relations. Compared with existing packages, SAD is simpler in formulation, more user friendly, much faster and less thirsty for memory, offers higher accuracy and supplies quantitative statistics for its predictions. Unique among such algorithms, SAD's running time scales linearly with array size; on a typical modern notebook, it completes high-quality CNV analyses for a 250 thousand-probe array in ~1 s and a 1.8 million-probe array in ~8 s.

SUBMITTER: Chen CH 

PROVIDER: S-EPMC3141250 | biostudies-literature | 2011 Jul

REPOSITORIES: biostudies-literature

Publications

An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes.

Chen Chih-Hao, Lee Hsing-Chung, Ling Qingdong, Chen Hsiao-Rong, Ko Yi-An, Tsou Tsong-Shan, Wang Sun-Chong, Wu Li-Ching, Lee H C

Nucleic Acids Research, 2011-05-16, issue 13



Similar Datasets

| S-EPMC9903802 | biostudies-literature
| S-EPMC4410664 | biostudies-literature
| S-EPMC3317159 | biostudies-literature
| S-EPMC3563612 | biostudies-literature
| S-EPMC2495074 | biostudies-literature
| S-EPMC2822765 | biostudies-literature
| S-EPMC4345604 | biostudies-literature
2012-01-25 | GSE31018 | GEO
| S-EPMC3511991 | biostudies-literature
| S-EPMC8984164 | biostudies-literature