
Dataset Information


Analysis of several key factors influencing deep learning-based inter-residue contact prediction.


ABSTRACT: Motivation: Deep learning has become the dominant technology for protein contact prediction. However, the factors that affect the performance of deep learning in contact prediction have not been systematically investigated.

Results: We analyzed the results of our three deep learning-based contact prediction methods (MULTICOM-CLUSTER, MULTICOM-CONSTRUCT and MULTICOM-NOVEL) in the CASP13 experiment and identified several key factors that influenced contact prediction accuracy: the deep learning technique, the multiple sequence alignment (MSA), distance distribution prediction and domain-based contact integration. We compared our convolutional neural network (CNN)-based contact prediction methods with three coevolution-based methods on 75 CASP13 targets consisting of 108 domains. The CNN-based multi-distance approach was able to leverage global coevolutionary coupling patterns composed of multiple correlated contacts, and so predicted contacts more accurately than the local coevolution-based methods, raising precision by 19.2 percentage points. We also tested different alignment methods and domain-based contact prediction with the deep learning contact predictors. Comparing the three methods showed that deeper sequence alignments, and the integration of domain-based contact prediction with full-length contact prediction, both improved contact prediction performance. Moreover, domain-based contact prediction using a novel ab initio approach, which parses domains from MSAs alone without using known protein structures, proved to be a simple and fast way to improve contact prediction. Finally, we showed that predicting the distribution of inter-residue distances over multiple distance intervals could capture more structural information and improve binary contact prediction.

Availability and implementation: https://github.com/multicom-toolbox/DNCON2/.

Supplementary information: Supplementary data are available at Bioinformatics online.
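To make two ideas from the Results paragraph concrete, the following is a minimal sketch, not the MULTICOM or DNCON2 implementation. It shows how a predicted distribution over distance intervals could be collapsed into a binary contact probability, and how the precision of the top-ranked predictions might be computed. The function names, the bin edges, the 8 Å contact threshold and the minimum sequence separation of 24 residues are assumptions chosen for illustration; the record above does not specify them.

```python
# Illustrative sketch only; all names and parameter values are hypothetical.

def contact_prob_from_bins(bin_probs, bin_upper_edges, threshold=8.0):
    """Collapse a distance-bin distribution into one contact probability.

    Sums the probability mass of every bin whose upper edge (in angstroms)
    is at or below the assumed contact threshold.
    """
    return sum(p for p, edge in zip(bin_probs, bin_upper_edges) if edge <= threshold)


def top_k_precision(scores, true_contacts, k, min_sep=24):
    """Precision of the k highest-scoring residue pairs.

    scores: dict mapping (i, j) with i < j to a predicted contact probability.
    true_contacts: set of (i, j) pairs that are in contact in the native structure.
    Pairs closer than min_sep positions in sequence are ignored.
    """
    eligible = [pair for pair in scores if pair[1] - pair[0] >= min_sep]
    ranked = sorted(eligible, key=lambda pair: scores[pair], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy example with four assumed distance bins: <=6, 6-8, 8-10 and >10 angstroms.
bin_edges = [6.0, 8.0, 10.0, float("inf")]
scores = {
    (0, 30): contact_prob_from_bins([0.5, 0.3, 0.1, 0.1], bin_edges),
    (2, 40): contact_prob_from_bins([0.1, 0.1, 0.3, 0.5], bin_edges),
    (5, 10): contact_prob_from_bins([0.6, 0.3, 0.05, 0.05], bin_edges),  # too close in sequence
}
true_contacts = {(0, 30)}
precision_top1 = top_k_precision(scores, true_contacts, k=1)
precision_top2 = top_k_precision(scores, true_contacts, k=2)
```

In this toy case the pair (5, 10) is excluded by the separation filter, so only the two long-range pairs are ranked. A difference between two such precision values, expressed as percentages, is what a gain in "percentage points" refers to.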

SUBMITTER: Wu T 

PROVIDER: S-EPMC7703788 | biostudies-literature | 2020 Feb

REPOSITORIES: biostudies-literature


Publications

Analysis of several key factors influencing deep learning-based inter-residue contact prediction.

Tianqi Wu, Jie Hou, Badri Adhikari, Jianlin Cheng

Bioinformatics (Oxford, England), 2020 Feb 1, issue 4



Similar Datasets

| S-EPMC3164537 | biostudies-literature
| S-EPMC2929137 | biostudies-literature
| S-EPMC6324825 | biostudies-literature
| S-EPMC5346958 | biostudies-literature
| S-EPMC8027171 | biostudies-literature
| S-EPMC5860164 | biostudies-literature
| S-EPMC7831258 | biostudies-literature
| S-EPMC6821021 | biostudies-literature
| S-EPMC6172270 | biostudies-literature
| S-EPMC3823628 | biostudies-literature