
Dataset Information


Comparing vertebrate whole-genome shotgun reads to the human genome.


ABSTRACT: Multi-species sequence comparisons are a very efficient way to reveal conserved genes. Because sequence finishing is expensive and time consuming, many genome sequences are likely to stay incomplete. A challenge is to use these fragmented data for understanding the human genome. Methods for using cross-species whole-genome shotgun sequence (WGS) for genome annotation are described in this paper. About one-half million high-quality rat WGS reads (covering 7.5% of the rat genome) generated at the Baylor College of Medicine Human Genome Sequencing Center were compared with the human genome. Using computer-generated random reads as a negative control, a set of parameters was determined for reliable interpretation of BLAST search results. About 10% of the rat reads contain regions that are conserved in the human genomic sequence and about one-third of these include known gene-coding regions. Mapping the conserved regions to human chromosomes showed a 23-fold enrichment for coding regions compared with noncoding regions. This approach can also be applied to other mammalian genomes for gene finding. These data predicted approximately 42,500 genes in the human, slightly more than reported previously.
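The abstract mentions two quantitative steps: choosing BLAST parameters from a negative control of random reads, and reporting a fold enrichment of conserved regions in coding versus noncoding sequence. The following is a minimal Python sketch of how such calculations could look. It is not the authors' pipeline. The function names, the uniform base composition of the random reads, the false-positive-rate criterion, and the per-base normalization of the enrichment are all assumptions made for illustration. Running BLAST itself is outside the sketch, so the control scores are assumed to be supplied by the caller.

```python
import random


def random_read(length, rng):
    """Return a uniformly random DNA string.

    Assumption: uniform base composition. The paper's computer-generated
    control reads may have been produced differently.
    """
    return "".join(rng.choice("ACGT") for _ in range(length))


def pick_score_cutoff(control_scores, max_false_positive_rate):
    """Return the lowest cutoff at which the fraction of negative-control
    scores at or above the cutoff does not exceed the allowed rate.

    Hypothetical criterion: the abstract only says that parameters were
    determined using random reads as a negative control.
    """
    n = len(control_scores)
    if n == 0:
        raise ValueError("need at least one control score")
    # Candidate cutoffs: every observed score, plus one above the maximum
    # so that a cutoff rejecting all control hits always exists.
    candidates = sorted(set(control_scores))
    candidates.append(candidates[-1] + 1)
    for cutoff in candidates:
        passing = sum(1 for s in control_scores if s >= cutoff)
        if passing / n <= max_false_positive_rate:
            return cutoff


def fold_enrichment(hits_coding, bases_coding, hits_noncoding, bases_noncoding):
    """Ratio of conserved-hit density in coding versus noncoding sequence.

    Assumption: enrichment is normalized per base of each class.
    """
    coding_density = hits_coding / bases_coding
    noncoding_density = hits_noncoding / bases_noncoding
    return coding_density / noncoding_density


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_read(50, rng))
    # Made-up control scores, purely illustrative.
    scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    print(pick_score_cutoff(scores, 0.1))
    print(fold_enrichment(46, 2, 1, 1))
```

In practice the control scores would come from searching the random reads against the human genome with the same BLAST settings used for the real rat reads, so that the chosen cutoff reflects the background hit rate of that search.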

SUBMITTER: Chen R 

PROVIDER: S-EPMC311156 | biostudies-literature | 2001 Nov

REPOSITORIES: biostudies-literature


Publications

Comparing vertebrate whole-genome shotgun reads to the human genome.

Chen R, Bouck JB, Weinstock GM, Gibbs RA

Genome Research, 2001 Nov 1; 11



Similar Datasets

| S-EPMC4120091 | biostudies-literature
| S-EPMC357027 | biostudies-literature
| S-EPMC1232128 | biostudies-literature
| S-EPMC3232206 | biostudies-literature
| S-EPMC3953531 | biostudies-literature
| S-EPMC8601920 | biostudies-literature
| S-EPMC4865240 | biostudies-literature
| S-EPMC3510651 | biostudies-literature
| S-EPMC3457223 | biostudies-literature
| S-EPMC151187 | biostudies-literature