Dataset Information

AntiFam: a tool to help identify spurious ORFs in protein annotation.

ABSTRACT: As the deluge of genomic DNA sequence grows, the fraction of protein sequences that have been manually curated falls. In turn, as the number of laboratories able to sequence genomes in a high-throughput manner grows, those labs may often lack the informatics capability to accurately identify and annotate all genes within a genome. These issues have raised fears that transitive annotation errors are making sequence databases less reliable. During the lifetime of the Pfam protein families database, a number of protein families were built that were later found to consist solely of spurious open reading frames (ORFs), lying either on the opposite strand or in a different, overlapping reading frame with respect to the true protein-coding or non-coding RNA gene. These families were deleted and are no longer available in Pfam. However, we realized that they could be useful for identifying new spurious ORFs. We have collected these families together in AntiFam, along with additional custom-made families of spurious ORFs. This resource currently contains 23 families, which identified 1310 spurious proteins in UniProtKB and a further 4119 spurious proteins in a collection of metagenomic sequences. UniProt has adopted AntiFam as part of the UniProtKB quality control process and will investigate these spurious proteins for exclusion.
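The abstract's distinction between a spurious ORF on the opposite strand and one in a different, overlapping reading frame can be made concrete in coordinate terms. The Python sketch below is illustrative only and is not how AntiFam works (AntiFam identifies spurious proteins with curated protein families). The names `Feature`, `six_frame_orfs` and `spurious_candidates` are hypothetical, and the sketch assumes an uppercase ACGT sequence, the standard stop codons, and stop-to-stop ORFs with no start-codon requirement. It scans all six frames and flags ORFs that overlap a known gene without sharing that gene's strand and frame.

```python
from typing import Iterator, NamedTuple

# Assumption: standard genetic code stop codons only.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Feature(NamedTuple):
    """Hypothetical interval type: 0-based, half-open, forward-strand coords."""
    start: int
    end: int
    strand: str  # "+" or "-"

    def phase(self) -> int:
        # Same-strand features with equal phase share a reading frame.
        # Plus-strand codons are anchored at `start`, minus-strand ones at `end`.
        return self.start % 3 if self.strand == "+" else self.end % 3


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def six_frame_orfs(seq: str, min_codons: int = 20) -> Iterator[Feature]:
    """Yield stop-to-stop open frames from all six reading frames.

    No start codon is required; each ORF ends with (and includes) a stop
    codon, and open runs that reach the sequence end are skipped.
    """
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            for i in range(frame, n - 2, 3):
                if s[i:i + 3] not in STOP_CODONS:
                    continue
                if (i - run_start) // 3 >= min_codons:
                    a, b = run_start, i + 3
                    if strand == "+":
                        yield Feature(a, b, "+")
                    else:
                        # Map reverse-complement coordinates back to forward.
                        yield Feature(n - b, n - a, "-")
                run_start = i + 3


def spurious_candidates(seq: str, genes: list[Feature],
                        min_codons: int = 20) -> list[Feature]:
    """ORFs overlapping a known gene on the other strand or in another frame."""
    hits = []
    for orf in six_frame_orfs(seq, min_codons):
        for gene in genes:
            overlaps = orf.start < gene.end and gene.start < orf.end
            same_frame = orf.strand == gene.strand and orf.phase() == gene.phase()
            if overlaps and not same_frame:
                hits.append(orf)
                break
    return hits


# Toy example: a plus-strand gene hiding a shifted-frame ORF and an antisense ORF.
seq = "ATG" + "TTA" + "GGC" * 6 + "TTA" + "AAA" + "TAA"
gene = Feature(0, len(seq), "+")
print(spurious_candidates(seq, [gene], min_codons=5))
# [Feature(start=7, end=28, strand='+'), Feature(start=3, end=24, strand='-')]
```

The true gene is itself found as an ORF but is excluded because it shares the gene's strand and phase; the two overlapping ORFs it contains are exactly the two kinds of spurious ORF the abstract describes.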

SUBMITTER: Eberhardt RY

PROVIDER: S-EPMC3308159 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature

Publications

AntiFam: a tool to help identify spurious ORFs in protein annotation.

Eberhardt RY, Haft DH, Punta M, Martin M, O'Donovan C, Bateman A

Database: the journal of biological databases and curation (2012-03-20)

PMID: 22434837

Similar Datasets

Gene Unprediction with Spurio: A tool to identify spurious protein sequences.
| S-EPMC5897793 | biostudies-literature

orfipy: a fast and flexible tool for extracting ORFs.
| S-EPMC8479652 | biostudies-literature

sitePath: a visual tool to identify polymorphism clades and help find fixed and parallel mutations.
| S-EPMC9701067 | biostudies-literature

ePIANNO: ePIgenomics ANNOtation tool.
| S-EPMC4747527 | biostudies-literature

ProFAT: a web-based tool for the functional annotation of protein sequences.
| S-EPMC1636073 | biostudies-literature

Pharmacogenomics Clinical Annotation Tool (PharmCAT).
| S-EPMC6977333 | biostudies-literature

STRAP PTM: Software Tool for Rapid Annotation and Differential Comparison of Protein Post-Translational Modifications.
| S-EPMC4240648 | biostudies-other

Profiling mouse brown and white adipocytes to identify metabolically relevant small ORFs and functional microproteins.
| S-EPMC9889109 | biostudies-literature

Pharokka: a fast scalable bacteriophage annotation tool.
| S-EPMC9805569 | biostudies-literature

GLANET: genomic loci annotation and enrichment tool.
| S-EPMC6355098 | biostudies-literature