Dataset Information

DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction.


ABSTRACT:

Motivation

Automated function prediction (AFP) of proteins is a large-scale multi-label classification problem. Most network-based methods for AFP have two limitations: (i) a separate model must be trained for each species and (ii) protein sequence information is ignored entirely. As a result, these methods perform worse than sequence-based methods. The challenge is therefore to develop a powerful network-based method for AFP that overcomes both limitations.

Results

We propose DeepGraphGO, an end-to-end, multispecies graph neural network-based method for AFP that uses both protein sequence and high-order protein network information. Our multispecies strategy allows a single model to be trained for all species, which gives it more training samples than existing methods have. Extensive experiments with a large-scale dataset show that DeepGraphGO significantly outperforms a number of competing state-of-the-art methods, including DeepGOPlus and three representative network-based methods: GeneMANIA, deepNF and clusDCA. We further confirm the effectiveness of our multispecies strategy and the advantage of DeepGraphGO on so-called difficult proteins. Finally, we integrate DeepGraphGO as a component into the state-of-the-art ensemble method NetGO and achieve a further performance improvement.

Availability and implementation

https://github.com/yourh/DeepGraphGO.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: You R 

PROVIDER: S-EPMC8294856 | biostudies-literature | 2021 Jul

REPOSITORIES: biostudies-literature

Publications

DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction.

You Ronghui, Yao Shuwei, Mamitsuka Hiroshi, Zhu Shanfeng

Bioinformatics (Oxford, England), 2021-07-01, Suppl_1



Similar Datasets

| S-EPMC8808544 | biostudies-literature
| S-EPMC6602452 | biostudies-literature
| S-EPMC11252844 | biostudies-literature
| S-EPMC11810639 | biostudies-literature
| S-EPMC8388039 | biostudies-literature
| S-EPMC11302905 | biostudies-literature
| S-EPMC10319785 | biostudies-literature
| S-EPMC10243863 | biostudies-literature
| S-EPMC6221071 | biostudies-literature
| S-EPMC6311942 | biostudies-literature