Dataset Information

Pairagon+N-SCAN_EST: a model-based gene annotation pipeline.

ABSTRACT:

Background

This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci of putative homologs. Trans alignments contain a high proportion of mismatches, gaps, and/or apparently unspliceable introns, compared to alignments of cDNA sequences to their native loci. The Pairagon+N-SCAN_EST pipeline's first stage is Pairagon, a cDNA-to-genome alignment program based on a PairHMM probability model. This model relies on prior knowledge, such as the fact that introns must begin with GT, GC, or AT and end with AG or AC. It produces very precise alignments of high-quality cDNA sequences. In the genomic regions between Pairagon's cDNA alignments, the pipeline combines EST alignments with de novo gene prediction by using N-SCAN_EST. N-SCAN_EST is based on a generalized HMM probability model augmented with a phylogenetic conservation model and EST alignments. It can predict complete transcripts by extending or merging EST alignments, but it can also predict genes in regions without EST alignments. Because they are based on probability models, both Pairagon and N-SCAN_EST can be trained automatically for new genomes and data sets.
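The splice-site constraint mentioned above can be illustrated with a minimal Python sketch. It is not Pairagon's code: the function names, the 0-based half-open coordinate convention, and the minimum-length check are assumptions made for illustration. It applies only the rule stated in the abstract (an intron begins with GT, GC, or AT and ends with AG or AC) and does not enforce any particular donor/acceptor pairing, since the abstract does not state one.

```python
# Allowed intron boundaries, taken from the abstract's statement of prior knowledge.
DONOR_SITES = {"GT", "GC", "AT"}      # first two bases of the intron
ACCEPTOR_SITES = {"AG", "AC"}         # last two bases of the intron


def intron_is_spliceable(genome: str, start: int, end: int) -> bool:
    """Check one candidate intron, given as genome[start:end].

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    intron = genome[start:end].upper()
    if len(intron) < 4:
        # Too short to hold separate donor and acceptor dinucleotides.
        return False
    return intron[:2] in DONOR_SITES and intron[-2:] in ACCEPTOR_SITES


def unspliceable_introns(genome: str, introns: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the introns of an alignment that violate the boundary rule."""
    return [(s, e) for s, e in introns if not intron_is_spliceable(genome, s, e)]


if __name__ == "__main__":
    toy_genome = "AAAGTCCCCAGTTT"
    # (3, 11) spans "GTCCCCAG"; (2, 11) spans "AGTCCCCAG" and starts with AG.
    print(unspliceable_introns(toy_genome, [(3, 11), (2, 11)]))
```

A filter of this kind only flags "apparently unspliceable" introns in an existing alignment; a PairHMM aligner such as the one described here would build the constraint into the model itself rather than check it afterwards.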

Results

On the ENCODE regions of the human genome, Pairagon+N-SCAN_EST was as accurate as any other system tested in the EGASP assessment, including ENSEMBL and ExoGean.

Conclusion

With sufficient mRNA/EST evidence, genome annotation without trans alignments can compete successfully with systems like ENSEMBL and ExoGean, which use trans alignments.

SUBMITTER: Arumugam M

PROVIDER: S-EPMC1810554 | biostudies-literature | 2006

REPOSITORIES: biostudies-literature

Publications

Pairagon+N-SCAN_EST: a model-based gene annotation pipeline.

Arumugam M, Wei C, Brown RH, Brent MR

Genome Biology, 2006-08-07

PMID: 16925839

Similar Datasets

AASRA: an anchor alignment-based small RNA annotation pipeline.
| S-EPMC8256102 | biostudies-literature

NCBI prokaryotic genome annotation pipeline.
| S-EPMC5001611 | biostudies-literature

FertilityOnline: A Straightforward Pipeline for Functional Gene Annotation and Disease Mutation Discovery.
| S-EPMC9801063 | biostudies-literature

RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.
| S-EPMC7779008 | biostudies-literature

Bio301: A Web-Based EST Annotation Pipeline That Facilitates Functional Comparison Studies.
| S-EPMC4407203 | biostudies-literature

GAAP: A Genome Assembly + Annotation Pipeline.
| S-EPMC6617929 | biostudies-literature

TransAnnot-a fast transcriptome annotation pipeline.
| S-EPMC11530227 | biostudies-literature

DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data.
| S-EPMC3738164 | biostudies-literature

A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data.
| S-EPMC2817484 | biostudies-literature

LASCA: loop and significant contact annotation pipeline.
| S-EPMC7973524 | biostudies-literature