
Dataset Information


ArrayOme: a program for estimating the sizes of microarray-visualized bacterial genomes.


ABSTRACT: ArrayOme is a new program that calculates the size of genomes represented by microarray-based probes and facilitates recognition of key bacterial strains carrying large numbers of novel genes. Protein-coding sequences (CDS) that are contiguous on annotated reference templates and classified as 'Present' in the test strain by hybridization to microarrays are merged into intact clusters (ICs). These ICs are then extended to account for flanking intergenic sequences. Finally, the lengths of all extended ICs are summed to yield the 'microarray-visualized genome (MVG)' size. We tested and validated ArrayOme using both experimental and in silico-generated genomic hybridization data. MVG sizing of five sequenced Escherichia coli and Shigella strains gave an accuracy of 97-99% relative to the true genome sizes when the comprehensive ShE.coli meta-array gene sequences (6239 CDS) were used for in silico hybridization analysis. However, the E. coli CFT073 genome size was underestimated by 14% because this meta-array lacked probes for many CFT073 CDS. ArrayOme permits rapid recognition of discordances between PFGE-measured genome sizes and MVG sizes, thereby enabling high-throughput identification of strains rich in novel genes. Gene discovery studies focused on these strains will greatly facilitate characterization of the global gene pool accessible to individual bacterial species.
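The three steps described in the abstract (merge contiguous 'Present' CDS, extend for flanking intergenic sequence, sum the lengths) can be illustrated with a minimal sketch. This is not ArrayOme's code. The names `CDS`, `merge_into_ics`, `extended_ic_length` and `mvg_size` are hypothetical, the template is treated as linear, and the extension rule (each cluster gains half of the intergenic gap on either side) is an assumption chosen for illustration, since the abstract does not state the exact rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CDS:
    start: int      # 1-based inclusive start on the reference template
    end: int        # 1-based inclusive end on the reference template
    present: bool   # 'Present' call from (real or in silico) hybridization


def merge_into_ics(cds_list):
    """Group runs of adjacent 'Present' CDS into (first_index, last_index) pairs."""
    ics = []
    run_start = None
    for i, cds in enumerate(cds_list):
        if cds.present:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            ics.append((run_start, i - 1))
            run_start = None
    if run_start is not None:
        ics.append((run_start, len(cds_list) - 1))
    return ics


def extended_ic_length(cds_list, first, last):
    """Length of one cluster plus half of each flanking intergenic gap.

    ASSUMPTION: the half-gap rule is illustrative, not taken from the source.
    """
    left = cds_list[first].start
    right = cds_list[last].end
    if first > 0:
        gap = max(0, left - cds_list[first - 1].end - 1)
        left -= gap // 2
    if last < len(cds_list) - 1:
        gap = max(0, cds_list[last + 1].start - right - 1)
        right += gap // 2
    return right - left + 1


def mvg_size(cds_list):
    """Sum the lengths of all extended clusters on one linear template."""
    ordered = sorted(cds_list, key=lambda c: c.start)
    return sum(
        extended_ic_length(ordered, first, last)
        for first, last in merge_into_ics(ordered)
    )


if __name__ == "__main__":
    template = [
        CDS(101, 400, True),
        CDS(501, 900, True),
        CDS(1001, 1300, False),
        CDS(1401, 1800, True),
    ]
    print(mvg_size(template))
```

In this toy template the first two CDS form one cluster and the last CDS forms another; the 'Absent' CDS between them contributes nothing beyond the half-gaps assigned to its neighbours. A size computed this way could then be compared with a PFGE-measured genome size to flag strains where the two disagree.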

SUBMITTER: Ou HY 

PROVIDER: S-EPMC546176 | biostudies-literature | 2005

REPOSITORIES: biostudies-literature


Publications

ArrayOme: a program for estimating the sizes of microarray-visualized bacterial genomes.

Ou HY, Smith R, Lucchini S, Hinton J, Chaudhuri RR, Pallen M, Barer MR, Rajakumar K

Nucleic Acids Research, 2005-01-07, issue 1



Similar Datasets

| S-EPMC10797056 | biostudies-literature
| S-EPMC3697970 | biostudies-literature
| S-EPMC3601407 | biostudies-other
| S-EPMC2262894 | biostudies-literature
| S-EPMC3923086 | biostudies-literature
| S-EPMC6375794 | biostudies-literature
| S-EPMC4821278 | biostudies-literature
| PRJNA700858 | ENA
| S-EPMC6442598 | biostudies-literature
| S-EPMC6635912 | biostudies-literature