Dataset Information

The Booly aliasing resource: a database of grouped biological identifiers.

ABSTRACT: Redundancy among sequence identifiers is a recurring problem in bioinformatics. Here, we present a rapid and efficient method of fingerprinting identifiers to ascertain whether two or more aliases are identical. A number of tools and approaches have been developed to resolve differing names for the same genes and proteins; however, each of these methods has limitations associated with its particular goals. We have taken a different approach to the aliasing problem by simplifying the way aliases are stored and curated, with the objective of simultaneously achieving speed and flexibility. Our approach (Booly-hashing) is to link identifiers with their corresponding hash keys derived from unique fingerprints such as gene or protein sequences. Compared with other aliasing techniques, the Booly-hashing methodology provides 1) reduced run-time complexity, 2) increased flexibility (aliasing of other data types, e.g. pharmaceutical drugs), 3) no required assumptions regarding gene clusters or hierarchies, and 4) simplicity in data addition, updating, and maintenance. The Booly-hashing aliasing model has been incorporated as a central component of the recently developed Booly data integration platform, where it has proven invaluable, and should be broadly applicable to other situations in which a dedicated, efficient, streamlined aliasing system is required. AVAILABILITY: The aliasing tool and database, which allow users to quickly group the same genes and proteins together, are available for free at http://booly.ucsd.edu/alias.
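
The idea described in the abstract, linking each identifier to a hash key derived from its sequence fingerprint, can be illustrated with a minimal sketch. This is not the Booly implementation: the abstract does not name a hash function or a normalization step, so the use of SHA-256, the whitespace and case normalization, the function names, and the identifiers `ID_A`, `ID_B`, and `ID_C` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def fingerprint(sequence: str) -> str:
    """Return a hash key for a sequence (SHA-256 is an assumption)."""
    # Assumed normalization: drop all whitespace and ignore case,
    # so formatting differences do not change the key.
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def build_alias_index(records):
    """Group identifiers by the hash key of their sequence.

    records: iterable of (identifier, sequence) pairs.
    Returns (key_to_ids, id_to_key).
    """
    key_to_ids = defaultdict(set)
    id_to_key = {}
    for identifier, sequence in records:
        key = fingerprint(sequence)
        key_to_ids[key].add(identifier)
        id_to_key[identifier] = key
    return dict(key_to_ids), id_to_key


def same_entity(id_to_key, first_id, second_id):
    """True if both identifiers are known and share a hash key."""
    first_key = id_to_key.get(first_id)
    return first_key is not None and first_key == id_to_key.get(second_id)


if __name__ == "__main__":
    # Hypothetical identifiers and toy sequences, for illustration only.
    records = [
        ("ID_A", "MKTAYIAK"),
        ("ID_B", "mktay iak\n"),  # same sequence, different formatting
        ("ID_C", "MKTAYIAR"),     # differs in the last residue
    ]
    key_to_ids, id_to_key = build_alias_index(records)
    print(same_entity(id_to_key, "ID_A", "ID_B"))
    print(same_entity(id_to_key, "ID_A", "ID_C"))
```

In this sketch, checking whether two identifiers are aliases is two dictionary lookups and a string comparison, and adding a record touches only its own key, with no cluster or hierarchy to recompute. Fingerprinting a non-sequence data type would only require a different canonical string to hash.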

SUBMITTER: Do LH

PROVIDER: S-EPMC3082858 | biostudies-other | 2011 Mar

REPOSITORIES: biostudies-other


Publications

The Booly aliasing resource: a database of grouped biological identifiers.

Do LH, Bier E

Bioinformation 2011-03-26 2


PMID: 21544171

Similar Datasets

NPBS database: a chemical data resource with relational data between natural products and biological sources.
| S-EPMC7731925 | biostudies-literature

Amaranth Genomic Resource Database: an integrated database resource of Amaranth genes and genomics.
| S-EPMC10337998 | biostudies-literature

DsTRD: Danshen Transcriptional Resource Database.
| S-EPMC4765898 | biostudies-literature

The Chinchilla Research Resource Database: resource for an otolaryngology disease model.
| S-EPMC4865329 | biostudies-literature

Virus pathogen database and analysis resource (ViPR): a comprehensive bioinformatics database and analysis resource for the coronavirus research community.
| S-EPMC3509690 | biostudies-literature

EuPathDB: The Eukaryotic Pathogen Genomics Database Resource.
| S-EPMC7124890 | biostudies-literature

PCAS--a precomputed proteome annotation database resource.
| S-EPMC293463 | biostudies-literature

Immune epitope database analysis resource (IEDB-AR).
| S-EPMC2447801 | biostudies-literature

Medicago truncatula transporter database: a comprehensive database resource for M. truncatula transporters.
| S-EPMC3298476 | biostudies-literature

bioDBnet: the biological database network.
| S-EPMC2642638 | biostudies-literature