Dataset Information

Tally-2.0: upgraded validator of tandem repeat detection in protein sequences.


ABSTRACT:

Motivation

Proteins containing tandem repeats (TRs) are abundant, frequently fold into elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. Because the boundary between imperfect TR motifs and non-repetitive sequences is blurred, the detected TRs need to be validated.

Results

Tally-2.0 is a scoring tool based on a machine learning (ML) approach that validates the results of TR detection. It was upgraded with improved training datasets and additional ML features. Tally-2.0 achieves 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%.
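
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity and the area under the ROC curve are conventionally computed for a binary validator. It is not Tally-2.0 code: the function names, the labels (1 = true TR, 0 = non-repetitive) and the toy scores are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 1/0.

    Assumes at least one positive and one negative in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true TRs accepted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # true TRs rejected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-TRs rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-TRs accepted
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive scores
    higher than a randomly chosen negative; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.6]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical cutoff

sens, spec = sensitivity_specificity(labels, predictions)
auc = roc_auc(labels, scores)
```

Sensitivity and specificity depend on the chosen score cutoff, whereas the area under the curve summarizes performance over all cutoffs.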

Availability and implementation

Tally-2.0 is available as a web tool and as a standalone application, published under the Apache License 2.0, at https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=27. It is supported on Linux. Source code is available upon request.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Perovic V 

PROVIDER: S-EPMC7214015 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC9252815 | biostudies-literature
| S-EPMC4393523 | biostudies-literature
| S-EPMC3488214 | biostudies-literature
| S-EPMC2628660 | biostudies-literature
| S-EPMC3402919 | biostudies-literature
| S-EPMC4034141 | biostudies-literature
| S-EPMC6425644 | biostudies-literature
| S-EPMC7274563 | biostudies-literature
| S-EPMC3964956 | biostudies-literature