ABSTRACT:
SUBMITTER: Steinegger M
PROVIDER: S-EPMC6026198 | biostudies-other | 2018 Jun
REPOSITORIES: biostudies-other
Martin Steinegger, Johannes Söding
Nature Communications, 2018 Jun 29, issue 1
Metagenomic datasets contain billions of protein sequences that could greatly enhance large-scale functional annotation and structure prediction. Utilizing this enormous resource would require reducing its redundancy by similarity clustering. However, clustering hundreds of millions of sequences is impractical using current algorithms because their runtimes scale as the input set size N times the number of clusters K, which is typically of similar order as N, resulting in runtimes that increase ...
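
The N times K scaling mentioned in the abstract can be illustrated with a minimal sketch of generic greedy incremental clustering, in which each sequence is compared against the current cluster representatives until a match is found. This is not the method of the paper; the function names, the k-mer overlap similarity, and the 0.5 threshold are hypothetical choices made only for illustration.

```python
def kmer_set(seq, k=3):
    """Return the set of k-mers of a sequence (toy stand-in for an alignment)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Assumed toy similarity: shared k-mers divided by the smaller k-mer set."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def greedy_cluster(seqs, threshold=0.5):
    """Greedy incremental clustering that also counts pairwise comparisons.

    Each sequence is compared with existing representatives in order and
    joins the first one that is similar enough; otherwise it becomes a new
    representative. In the worst case every sequence is compared with all
    K representatives, giving on the order of N * K comparisons.
    """
    reps = []          # indices of cluster representatives
    assignment = []    # assignment[i] = index of the representative of seqs[i]
    comparisons = 0
    for i, seq in enumerate(seqs):
        for r in reps:
            comparisons += 1
            if similarity(seq, seqs[r]) >= threshold:
                assignment.append(r)
                break
        else:
            # No representative matched: seq starts a new cluster.
            reps.append(i)
            assignment.append(i)
    return reps, assignment, comparisons


if __name__ == "__main__":
    # Four mutually dissimilar toy sequences: K equals N, so the number of
    # comparisons grows quadratically, N * (N - 1) / 2.
    seqs = ["AAAAAA", "CCCCCC", "GGGGGG", "TTTTTT"]
    reps, assignment, comparisons = greedy_cluster(seqs)
    print(len(reps), comparisons)
```

When K is of the same order as N, as the abstract says is typical, the comparison count in this sketch grows roughly quadratically with the input size, which is the behavior that makes such approaches impractical for hundreds of millions of sequences.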