ABSTRACT:
Motivation
Copy number variation (CNV) is a type of structural variation, usually defined as genomic segments of 1 kb or larger that present variable copy numbers when compared with a reference genome. The screening and ranking algorithm (SaRa) was recently proposed as an efficient approach for multiple change-point detection, which can be applied to CNV detection. However, some practical issues arise when SaRa is applied to single nucleotide polymorphism data.
Results
In this study, we propose a modified SaRa for CNV detection to address these issues. First, we apply quantile normalization to the original intensities so that SaRa, which is based on a normal mean model, remains robust. Second, we propose a novel normal mixture model, coupled with a modified Bayesian information criterion, for selecting candidate change-points and for clustering the potential CNV segments into copy number states. Simulations showed that the modified SaRa is robust in identifying change-points and achieves better performance than the circular binary segmentation (CBS) method. By applying the modified SaRa to real data from the HapMap project, we illustrated its performance in detecting CNV segments. In conclusion, our modified SaRa method improves on SaRa both theoretically and numerically for identifying CNVs from high-throughput genotyping data.
Availability and implementation
The modSaRa package is implemented in R and is freely available at http://c2s2.yale.edu/software/modSaRa.
Supplementary information
Supplementary data are available at Bioinformatics online.
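As a rough illustration of the two ideas named in the abstract, the sketch below shows a rank-based quantile normalization onto standard normal quantiles and a SaRa-style screening step that keeps local maxima of a windowed mean difference. It is not the modSaRa implementation: the function names, the window width `h`, the threshold, and the exact form of the normalization are all assumptions made for illustration, and the mixture-model and modified-BIC selection step is not covered.

```python
from statistics import NormalDist


def quantile_normalize(values):
    """Replace each value by the standard-normal quantile of its rank.

    Assumption: a rank-based mapping, one common reading of "quantile
    normalization". Ties are broken by input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(order):
        out[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


def local_diagnostic(y, h):
    """D[x] = mean(y[x:x+h]) - mean(y[x-h:x]); 0 where a window does not fit."""
    n = len(y)
    d = [0.0] * n
    for x in range(h, n - h + 1):
        left = sum(y[x - h:x]) / h
        right = sum(y[x:x + h]) / h
        d[x] = right - left
    return d


def screen_change_points(y, h, threshold):
    """Return positions x where |D[x]| reaches the threshold and is the
    maximum of |D| over the window [x - h, x + h]."""
    n = len(y)
    d = [abs(v) for v in local_diagnostic(y, h)]
    candidates = []
    for x in range(h, n - h + 1):
        lo, hi = max(0, x - h), min(n, x + h + 1)
        if d[x] >= threshold and d[x] == max(d[lo:hi]):
            candidates.append(x)
    return candidates


# Noise-free toy signal with a single jump between positions 19 and 20.
signal = [0.0] * 20 + [1.0] * 20
print(screen_change_points(signal, h=5, threshold=0.5))  # [20]
```

In this sketch a reported position is the first index of the new segment. On real intensity data the normalization would be applied before screening, and both `h` and the threshold would need to be chosen with the noise level in mind.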
SUBMITTER: Xiao F
PROVIDER: S-EPMC4410664 | biostudies-literature | 2015 May
REPOSITORIES: biostudies-literature
Bioinformatics (Oxford, England) 20141225 9