Dataset Information

Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology.

ABSTRACT: Orthology inference and other sequence analyses across multiple genomes typically start by performing exhaustive pairwise sequence comparisons, a process referred to as "all-against-all". As this process scales quadratically in the number of sequences analysed, this step can become a bottleneck, thus limiting the number of genomes that can be analysed simultaneously. Here, we explored ways of speeding up the all-against-all step while maintaining its sensitivity. By exploiting the transitivity of homology and, crucially, ensuring that homology is defined in terms of consistent protein subsequences, our proof-of-concept resulted in a 4× speedup while recovering >99.6% of all homologs identified by the full all-against-all procedure on empirical sequence sets. In comparison, state-of-the-art k-mer approaches are orders of magnitude faster but only recover 3-14% of all homologous pairs. We also outline ideas to further improve the speed and recall of the new approach. An open source implementation is provided as part of the OMA standalone software at http://omabrowser.org/standalone.
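
The idea of using transitivity only where the homologous subsequences are consistent can be illustrated with a toy sketch. This is not the OMA standalone implementation; it assumes that each protein has already been aligned to a representative sequence, and that two proteins are worth comparing only if their aligned regions on the same representative overlap. The names `Hit`, `candidate_pairs`, and the `min_overlap` threshold are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from typing import Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """Alignment of one protein to a representative (hypothetical record)."""
    member: str  # protein identifier
    rep: str     # representative identifier
    start: int   # start of aligned region on the representative (inclusive)
    end: int     # end of aligned region on the representative (exclusive)


def overlap(a: Hit, b: Hit) -> int:
    """Number of representative residues covered by both aligned regions."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def candidate_pairs(hits: Iterable[Hit], min_overlap: int = 20) -> Set[Tuple[str, str]]:
    """Pairs of proteins that hit the same representative on overlapping regions.

    Assumption: transitivity is only trusted when both proteins are
    homologous to a shared subsequence of the representative.
    """
    by_rep = {}
    for h in hits:
        by_rep.setdefault(h.rep, []).append(h)

    pairs = set()
    for group in by_rep.values():
        for a, b in combinations(group, 2):
            if a.member != b.member and overlap(a, b) >= min_overlap:
                pairs.add(tuple(sorted((a.member, b.member))))
    return pairs


def all_against_all_count(n: int) -> int:
    """Number of unordered pairs in a full all-against-all comparison."""
    return n * (n - 1) // 2


hits = [
    Hit("p1", "r1", 0, 100),
    Hit("p2", "r1", 50, 150),
    Hit("p3", "r1", 120, 200),  # shares r1 with p1, but on a disjoint region
    Hit("p4", "r2", 0, 80),
]
pairs = candidate_pairs(hits)
print(sorted(pairs), "of", all_against_all_count(4), "possible pairs")
```

In this toy input, `p1` and `p3` both match `r1` but on non-overlapping regions (for example, different domains), so they are not proposed as a candidate pair; only pairs sharing a region of the representative are passed on for full alignment.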

SUBMITTER: Wittwer LD

PROVIDER: S-EPMC4193403 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology.

Lucas D. Wittwer, Ivana Piližota, Adrian M. Altenhoff, Christophe Dessimoz

PeerJ, 2014-10-07

PMID: 25320677

Similar Datasets

Speeding up MadGraph5_aMC@NLO.
| S-EPMC8136271 | biostudies-literature

GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.
| S-EPMC4970815 | biostudies-literature

Speeding Up Gait in Parkinson's Disease.
| S-EPMC7304052 | biostudies-literature

Speeding up research with the Semantic Web
| S-EPMC3504542 | biostudies-literature

Speeding up biomolecular interactions by molecular sledding.
| S-EPMC4762599 | biostudies-literature

Speeding up quantum perceptron via shortcuts to adiabaticity.
| S-EPMC7952456 | biostudies-literature

Micro Magnetic Gyromixer for Speeding up Reactions in Droplets.
| S-EPMC3374403 | biostudies-literature

SparkEC: speeding up alignment-based DNA error correction tools.
| S-EPMC9639292 | biostudies-literature

Speeding up direct (15)N detection: hCaN 2D NMR experiment.
| S-EPMC3338130 | biostudies-literature

Posterior-based proposals for speeding up Markov chain Monte Carlo.
| S-EPMC6894579 | biostudies-literature