Dataset Information

An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes.

ABSTRACT: Detection of copy number variation (CNV) in DNA has recently become an important method for understanding the pathogenesis of cancer. While existing algorithms for extracting CNV from microarray data have worked reasonably well, the trend towards ever larger sample sizes and higher resolution microarrays has vastly increased the challenges they face. Here, we present Segmentation analysis of DNA (SAD), a clustering algorithm constructed with a strategy in which all operational decisions are based on simple and rigorous applications of statistical principles, measurement theory and precise mathematical relations. Compared with existing packages, SAD is simpler in formulation, more user friendly, much faster and less thirsty for memory, offers higher accuracy and supplies quantitative statistics for its predictions. Unique among such algorithms, SAD's running time scales linearly with array size; on a typical modern notebook, it completes high-quality CNV analyses for a 250 thousand-probe array in ~1 s and a 1.8 million-probe array in ~8 s.

SUBMITTER: Chen CH

PROVIDER: S-EPMC3141250 | biostudies-literature | 2011 Jul

REPOSITORIES: biostudies-literature


Publications

An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes.

Chen Chih-Hao, Lee Hsing-Chung, Ling Qingdong, Chen Hsiao-Rong, Ko Yi-An, Tsou Tsong-Shan, Wang Sun-Chong, Wu Li-Ching, Lee H C

Nucleic Acids Research, 2011-05-16, issue 13


PMID: 21576227

Similar Datasets

Project description: Objectives: Copy number variants (CNVs) are believed to be a potential genetic cause of pregnancy loss. However, CNVs smaller than 3 Mb in euploid products of conception (POCs) remain largely unexplored. The aim of this study was to investigate the features of CNVs smaller than 3 Mb in POCs and their potential clinical significance in pregnancy loss/fetal death.
Methods: CNV data were extracted from a cohort in our institution and 19 peer-reviewed publications, and only CNVs smaller than 3 Mb detected in euploid pregnancy loss/fetal death were included. We constructed a CNV map to analyze the distribution of CNVs across chromosomes using the R package karyoploteR 1.10.5. Gene names and annotated gene types covered by those CNVs were mined from the human Release 19 reference genome file and the GENCODE database. We assessed the expression patterns of those genes and the consequences of their knock-out in mice using the TiGER and Mouse Genome Informatics (MGI) databases. Functional enrichment and pathway analysis for genes in CNVs were performed using clusterProfiler v3.12.0.
Results: Breakpoints of 564 CNVs smaller than 3 Mb were obtained from 442 euploid POCs, with 349 gains and 185 losses. The CNV map showed that CNVs were distributed across all chromosomes, with the highest frequency in chromosome 22 and the lowest in chromosome Y, and that CNV density was higher in the pericentromeric and sub-telomeric regions. A total of 5,414 genes were mined from the CNV regions (CNVRs); Gene Ontology (GO) and pathway analysis showed that these genes were significantly enriched in multiple terms, especially sensory perception, membrane region, and tight junction. A total of 995 protein-coding genes have reported mammalian phenotypes in MGI, and knock-out of 276 of them leads to embryonic lethality or an abnormal embryo/placenta in mouse models. The CNV at 19p13.3 was the most common CNV across all POCs.
Conclusion: CNVs smaller than 3 Mb in euploid POCs are distributed unevenly across all chromosomes, with a higher density in the pericentromeric and sub-telomeric regions. The genes in those CNVRs are significantly enriched in biological processes and pathways that are important to embryonic/fetal development. The CNV at 19p13.3 and variations of ARID3A and FSTL3 might contribute to pregnancy loss.

