Dataset Information

Computational inference of homologous gene structures in the human genome.


ABSTRACT: With the human genome sequence approaching completion, a major challenge is to identify the locations and encoded protein sequences of all human genes. To address this problem we have developed a new gene identification algorithm, GenomeScan, which combines exon-intron and splice signal models with similarity to known protein sequences in an integrated model. Extensive testing shows that GenomeScan can accurately identify the exon-intron structures of genes in finished or draft human genome sequence with a low rate of false-positives. Application of GenomeScan to 2.7 billion bases of human genomic DNA identified at least 20,000-25,000 human genes out of an estimated 30,000-40,000 present in the genome. The results show an accurate and efficient automated approach for identifying genes in higher eukaryotic genomes and provide a first-level annotation of the draft human genome.

SUBMITTER: Yeh RF 

PROVIDER: S-EPMC311055 | biostudies-literature | 2001 May

REPOSITORIES: biostudies-literature

Publications

Computational inference of homologous gene structures in the human genome.

Yeh RF, Lim LP, Burge CB

Genome Research, 2001-05-01, issue 5


Similar Datasets

S-EPMC2998322 | biostudies-literature
S-EPMC470742 | biostudies-literature
S-EPMC11217675 | biostudies-literature
S-EPMC7384676 | biostudies-literature
S-EPMC7730797 | biostudies-literature
S-EPMC10835048 | biostudies-literature
S-EPMC7842877 | biostudies-literature
S-EPMC5766185 | biostudies-literature
S-EPMC2065750 | biostudies-literature
S-EPMC8429276 | biostudies-literature