Supervised Lowess normalization of comparative genome hybridization data: application to lactococcal strain comparisons.
Ontology highlight
ABSTRACT: Background Array-based comparative genome hybridization (aCGH) is commonly used to determine the genomic content of bacterial strains. In aCGH data, systematic errors are comparable to those occurring in transcriptome data. However, especially for microbes, an additional source of variation exists: differences in hybridization due to gene sequence divergence between the strains hybridized. Current normalization methods do not take this source of variation into consideration. Results We present Supervised Lowess, or S-Lowess, an application of the subset Lowess normalization method that does take difference in genomic content into account. The performance of S-Lowess was assessed on aCGH experiments in which differentially labeled genomic DNA fragments of Lactococcus lactis IL1403 and L. lactis MG1363 strains were hybridized to IL1403 microarrays. Since both genome sequences are known (they have only on average 85 % sequence identity), the success rate in detecting deletions in the MG1363 genome of different aCGH normalization methods can be compared. S-Lowess detects 97% of the deletions, whereas other aCGH normalization methods detect up to only 60% of the deletions. Conclusions S-Lowess removes systematic errors from dual-dye aCGH data in two steps: (i) determination of likely homologous genes (LHG); and (ii) estimation of correction factors for systematic errors from spots of LHG and subset Lowess normalization of the remaining spots using these correction factors. It is implemented in a user-friendly web-tool accessible from http://bioinformatics.biol.rug.nl/websoftware/s-lowess. We demonstrate that it outperforms existing normalization methods and maximizes the number of detectable genomic deletions or duplications from microbial aCGH data. Keywords: comparative genome hybridisation In this study, 4 aCGH comparisons (slides) between L. lactis MG1363 and L. lactis IL1403 were performed (including dye swap with biological replicates; see also supplementary materials). The resulting aCGH slide signals were normalized using the different 'likely homologous gene' (LHG) sets (see above) yielding ratios of signals of labelled DNA of MG1363 over those of IL1403. A maximum of 8 ratios per amplicon (gene) were obtained for the 4 hybridized slides (each with 2 replicate spots per amplicon). Only genes with at least 5 measurements were used in this study. The normalization methods evaluated in this study are: a) no normalization, b) total signal normalization, c) grid-based Lowess (implemented in PreP; f = 0.7) [GarcM-CM--a de la Nava, J et al. 2003. Bioinformatics 19:2328-2329], and d) S-Lowess using different subsets of conserved lactococcal genes (for details see above). Results with the MANOR R package (spatial normalization; standard parameters) [Neuvial P, et al. 2006. BMC Bioinformatics 7:264; Liva S, et al. 2006. Nucleic Acids Res 34:W477-W481] are shown in our supplementary materials.
ORGANISM(S): Lactococcus lactis subsp. cremoris MG1363
SUBMITTER: Sacha van Hijum
PROVIDER: E-GEOD-9039 | biostudies-arrayexpress |
REPOSITORIES: biostudies-arrayexpress
ACCESS DATA