Dataset Information

Cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly.


ABSTRACT: MOTIVATION: Many bioinformatics areas require us to assign domain matches onto stretches of a query protein. Starting with a set of candidate matches, we want to identify the optimal subset that has limited/no overlap between matches. This may be further complicated by discontinuous domains in the input data. Existing tools are increasingly facing very large datasets for which they require prohibitive amounts of CPU time and memory. RESULTS: We present cath-resolve-hits (CRH), a new tool that uses a dynamic-programming algorithm implemented in open-source C++ to handle large datasets quickly (up to ~1 million hits/second) and in reasonable amounts of memory. It accepts multiple input formats and provides its output in plain text, JSON or graphical HTML. We describe a benchmark against an existing algorithm, which shows CRH delivers very similar or slightly improved results and very much improved CPU/memory performance on large datasets. AVAILABILITY AND IMPLEMENTATION: CRH is available at https://github.com/UCLOrengoGroup/cath-tools; documentation is available at http://cath-tools.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
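To make the problem statement concrete, the following is a minimal sketch of the simplest version of the task: choosing the highest-scoring subset of candidate matches with no overlap, using the classic weighted-interval-scheduling dynamic programme. It is not CRH's implementation (which is in C++ and not described in detail here). It assumes contiguous matches only, so discontinuous domains are not handled, and it assumes inclusive integer residue ranges and a score where higher is better. The names `Hit` and `resolve_hits` are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple


class Hit(NamedTuple):
    """A candidate domain match on the query (hypothetical structure)."""
    start: int    # first residue covered, inclusive
    stop: int     # last residue covered, inclusive
    score: float  # assumed: higher is better
    label: str


def resolve_hits(hits: list[Hit]) -> tuple[float, list[Hit]]:
    """Return the best total score and the non-overlapping hits that achieve it."""
    ordered = sorted(hits, key=lambda h: h.stop)
    stops = [h.stop for h in ordered]
    n = len(ordered)

    # best[i]: best total score using only the first i hits of `ordered`
    best = [0.0] * (n + 1)
    # prev[i]: how many of the earlier hits end strictly before hit i starts
    prev = [0] * (n + 1)

    for i, hit in enumerate(ordered, start=1):
        prev[i] = bisect_right(stops, hit.start - 1, 0, i - 1)
        # Either skip this hit, or take it plus the best compatible prefix.
        best[i] = max(best[i - 1], best[prev[i]] + hit.score)

    # Walk back through the table to recover which hits were taken.
    chosen = []
    i = n
    while i > 0:
        if best[i] > best[i - 1]:
            chosen.append(ordered[i - 1])
            i = prev[i]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


if __name__ == "__main__":
    candidates = [
        Hit(1, 50, 10.0, "a"),
        Hit(40, 100, 12.0, "b"),
        Hit(51, 100, 5.0, "c"),
        Hit(101, 150, 7.0, "d"),
    ]
    total, kept = resolve_hits(candidates)
    print(total, [h.label for h in kept])
```

Sorting by end position and using a binary search to find compatible earlier hits gives O(n log n) time and O(n) memory in this simplified setting.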

SUBMITTER: Lewis TE 

PROVIDER: S-EPMC6513158 | biostudies-literature | 2019 May

REPOSITORIES: biostudies-literature

Publications

cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly.

Lewis TE, Sillitoe I, Lees JG

Bioinformatics (Oxford, England), 2019-05-01, issue 10



Similar Datasets

| S-EPMC7585182 | biostudies-literature
| S-EPMC4094424 | biostudies-literature
| S-EPMC3998138 | biostudies-literature
| S-EPMC7211344 | biostudies-literature
| S-EPMC6253844 | biostudies-literature
| S-EPMC4612221 | biostudies-literature
| S-EPMC29791 | biostudies-literature
| S-EPMC2366924 | biostudies-literature
| S-EPMC155287 | biostudies-literature
| S-EPMC3525972 | biostudies-literature