Project description:Copy number variants (CNVs) are currently defined as genomic sequences that are polymorphic in copy number and range in length from 1,000 to several million base pairs. Among current array-based CNV detection platforms, long-oligonucleotide arrays promise the highest resolution. However, the performance of currently available analytical tools suffers when applied to these data because of the lower signal:noise ratio inherent in oligonucleotide-based hybridization assays. We have developed wuHMM, an algorithm for mapping CNVs from array comparative genomic hybridization (aCGH) platforms comprised of 385,000 to more than 3 million probes. wuHMM is unique in that it can utilize sequence divergence information to reduce the false positive rate (FPR). We apply wuHMM to 385K-aCGH, 2.1M-aCGH, and 3.1M-aCGH experiments comparing the 129X1/SvJ and C57BL/6J inbred mouse genomes. We assess wuHMM’s performance on the 385K platform by comparison to the higher resolution platforms and we independently validate 10 CNVs. The method requires no training data and is robust with respect to changes in algorithm parameters. At a FPR of less than 10%, the algorithm can detect CNVs with five probes on the 385K platform and three on the 2.1M and 3.1M platforms, resulting in effective resolutions of 24 kb, 2-5 kb, and 1 kb, respectively. Keywords: CNV detection algorithm development and assessment
Project description:The extent to which differences in germ line DNA copy number contribute to natural phenotypic variation is unknown. We analyzed the copy number content of the mouse genome to a sub-10 kb resolution. We identified over 1,300 copy number variant regions (CNVRs), most of which are < 10 kb in length, are found in more than one strain, and, in total, span 3.2% (85 Mb) of the genome. To assess the potential functional impact of copy number variation, we mapped expression profiles of purified hematopoietic stem and progenitor cells, adipose tissue and hypothalamus to CNVRs in cis. Of the more than 600 significant associations between CNVRs and expression profiles, most map to CNVRs outside of the transcribed regions of genes. In hematopoietic stem/progenitor cells, up to 28% of strain-dependent expression variation is associated with copy number variation, supporting the role of germ line CNVs as major contributors to natural phenotypic variation in the laboratory mouse. To map the CNV content of the mouse genome, we selected 17 Tier 1-3 Mouse Phenome Project strains and three additional strains of biomedical interest, representing all major inbred lineages. We performed comparative genomic hybridization using a long-oligonucleotide array containing 2,149,887 probes evenly spaced across the reference genome with a median inter-probe spacing of 1,015 bases. Labeling, hybridization, washing and array imaging were performed as previously described (PMID:16075461). We performed segmentation using wuHMM, a Hidden Markov Model algorithm that utilizes sequence-level information and can detect CNVs less than 5 kb in length (fewer than five probes) at a low false positive rate (PMID:18334530). To estimate the overall impact of CNV on gene expression in vivo, we performed expression profiling of hematopoietic stem/progenitor cells using the Illumina Mouse Beadchip-6v1 platform. See manuscript for further details.
Project description:Copy number variants (CNVs) are currently defined as genomic sequences that are polymorphic in copy number and range in length from 1,000 to several million base pairs. Among current array-based CNV detection platforms, long-oligonucleotide arrays promise the highest resolution. However, the performance of currently available analytical tools suffers when applied to these data because of the lower signal:noise ratio inherent in oligonucleotide-based hybridization assays. We have developed wuHMM, an algorithm for mapping CNVs from array comparative genomic hybridization (aCGH) platforms comprised of 385,000 to more than 3 million probes. wuHMM is unique in that it can utilize sequence divergence information to reduce the false positive rate (FPR). We apply wuHMM to 385K-aCGH, 2.1M-aCGH, and 3.1M-aCGH experiments comparing the 129X1/SvJ and C57BL/6J inbred mouse genomes. We assess wuHMM’s performance on the 385K platform by comparison to the higher resolution platforms and we independently validate 10 CNVs. The method requires no training data and is robust with respect to changes in algorithm parameters. At a FPR of less than 10%, the algorithm can detect CNVs with five probes on the 385K platform and three on the 2.1M and 3.1M platforms, resulting in effective resolutions of 24 kb, 2-5 kb, and 1 kb, respectively. Keywords: CNV detection algorithm development and assessment. All four samples in this series are hybridizations of genomic DNA from inbred mouse strains 129X1/SvJ versus C57BL/6J. The experiments were performed at increasing resolutions (one 385K, two 2.1M, and one 3.1M).
Project description:We saw a patient who presented with respiratory distress from birth due to interstitial lung disease. Before the age of three months a diagnosis of nephrotic syndrome was made. Lung biopsy revealed pulmonary interstitial glycogenosis. Despite extensive investigations, no known genetic or infectious cause was found for the congenital nephrotic syndrome. The patient died at the age of 8 months due to respiratory failure. A 20 Mb homozygous region was identified on chromosome 17 in the patient’s DNA, revealing a novel homozygous missense variant in the ITGA3 gene. Genomic DNA was obtained from peripheral blood samples of the patient with interstitial lung fibrosis and nephrotic syndrome. Copy number variation (CNV) screening by means of microarray analyses was carried out on the Affymetrix GeneChip 250k (NspI) SNP array platform (Affymetrix, Inc., Santa Clara, CA, USA), which contains 25-mer oligonucleotides representing a total of 262,264 SNPs. Hybridizations were performed according to the manufacturer’s protocols. Copy numbers and “long contiguous stretches of homozygosity” (LCSH/LOH) were determined using the 2.0 version of the CNAG (Copy Number Analyzer for Affymetrix GeneChip mapping) software package (Nannya Y, Sanada M, et al. (2005) "A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays." Cancer Res; 65: 6071–6079.). The average resolution of this array platform, as described by McMullan et al., is 150–200 kb (McMullan DJ, Bonin M, et al. (2009), “Molecular karyotyping of patients with unexplained mental retardation by SNP arrays: a multicenter study” Hum Mutat. Jul;30(7):1082-92).
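To illustrate what a "long contiguous stretch of homozygosity" means in terms of SNP array calls, the following is a minimal Python sketch. It is not the CNAG algorithm. The function name `homozygous_stretches`, the `AA`/`AB`/`BB` genotype encoding, and the `min_length_bp` threshold are all hypothetical choices made for illustration, and the sketch ignores no-calls and genotyping error.

```python
def homozygous_stretches(snps, min_length_bp=1_000_000):
    """Return (start, end) spans of consecutive homozygous SNP calls.

    snps: list of (position, genotype) tuples for ONE chromosome,
          sorted by position; genotype is 'AA', 'AB' or 'BB'
          (an assumed encoding, not taken from the description).
    min_length_bp: hypothetical minimum span for reporting a stretch.
    """
    stretches = []
    run_start = run_end = None

    def close_run():
        # Record the current run if it is long enough.
        if run_start is not None and run_end - run_start >= min_length_bp:
            stretches.append((run_start, run_end))

    for pos, genotype in snps:
        if genotype in ("AA", "BB"):
            # Homozygous call: open a run or extend the current one.
            if run_start is None:
                run_start = pos
            run_end = pos
        else:
            # Heterozygous call ends the current run.
            close_run()
            run_start = run_end = None

    close_run()  # Handle a run that reaches the end of the chromosome.
    return stretches


if __name__ == "__main__":
    demo = [(100, "AA"), (200, "AB"), (300, "BB"), (1500, "AA"),
            (2500, "BB"), (2600, "AB"), (2700, "AA")]
    print(homozygous_stretches(demo, min_length_bp=1000))
```

A production tool would typically tolerate occasional heterozygous or missing calls inside a stretch; this sketch deliberately does not.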
Project description:Copy number variations (CNVs) can create new genes, change gene dosage, reshape gene structures, and modify elements regulating gene expression. As with all types of genetic variation, CNVs may influence phenotypic variation and gene expression. CNVs are thus considered major sources of genetic variation. Little is known, however, about their contribution to genetic variation in rice. To detect CNVs, we used a set of NimbleGen whole-genome comparative genomic hybridization arrays containing 715,851 oligonucleotide probes with a median probe spacing of 500 bp. We compiled a high-resolution map of CNVs in the rice genome, showing 641 CNVs between the genomes of the rice cultivars ‘Nipponbare’ (from O. sativa ssp. japonica) and ‘Guang-lu-ai 4’ (from O. sativa ssp. indica). These CNVs contain some known genes. They are linked to variation among rice varieties, and are likely to contribute to subspecific characteristics. Genomic DNA was isolated from Nipponbare and Guang-lu-ai 4. Cy5-labeled DNA from Nipponbare, used as the reference, and Cy3-labeled DNA from Guang-lu-ai 4 were hybridized to the 720K rice tiling array in three replicates. Fluorescence intensity data were normalized with the qspline algorithm, and ratio data were analyzed with the circular binary segmentation algorithm. Copy number variation calls were made if the averaged log2 ratio of a segment was shifted by 1.0 from the baseline.
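The calling rule in the last sentence above can be sketched in a few lines of Python. This is an illustrative sketch only, under stated assumptions: segments are taken as already produced by a segmentation step; the function name `call_cnv_segments`, the tuple layout, and the baseline of 0.0 are hypothetical; "shifted by 1.0" is read as a shift of at least 1.0 in either direction; and the "gain"/"loss" labels depend on how the log2 ratio is oriented (test over reference is assumed here).

```python
from statistics import mean


def call_cnv_segments(segments, baseline=0.0, shift=1.0):
    """Call CNVs from segment-level log2 ratios.

    segments: list of (chrom, start, end, replicate_log2_ratios),
              where replicate_log2_ratios holds one segment mean per
              replicate array (hypothetical layout).
    baseline: assumed log2 ratio of unchanged copy number.
    shift:    minimum absolute deviation from baseline for a call.
    """
    calls = []
    for chrom, start, end, ratios in segments:
        avg = mean(ratios)          # average across replicates
        delta = avg - baseline
        if delta >= shift:
            state = "gain"          # assumes ratio = test / reference
        elif delta <= -shift:
            state = "loss"
        else:
            continue                # within the baseline band: no call
        calls.append((chrom, start, end, state))
    return calls


if __name__ == "__main__":
    demo = [
        ("chr1", 1000, 5000, [1.2, 1.1, 0.9]),
        ("chr1", 8000, 9000, [0.2, 0.1, -0.1]),
        ("chr2", 100, 900, [-1.3, -1.0, -1.1]),
    ]
    for call in call_cnv_segments(demo):
        print(call)
```

Averaging across the three replicates before thresholding means a single noisy array is less likely to trigger a call on its own.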