
Dataset Information


A unified multitask architecture for predicting local protein properties.


ABSTRACT: A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single task neural network approach, and that the resulting model achieves state-of-the-art performance.
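The architecture outlined in the abstract (one shared network, several per-residue labeling tasks, plus a natural-versus-synthetic discrimination task) can be illustrated with a minimal, untrained sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the task names and label counts, the window size, the layer sizes, the ReLU nonlinearity, and the choice to build "synthetic" windows by swapping the centre residue. The abstract specifies none of these details.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for positions beyond the sequence ends

# Hypothetical task names and label counts, chosen only for illustration.
TASKS = {"secondary_structure": 3, "solvent_accessibility": 2, "transmembrane": 2}


def windows(seq, half_width=2):
    """Yield one fixed-size window of residues centred on each position."""
    padded = PAD * half_width + seq + PAD * half_width
    for i in range(len(seq)):
        yield padded[i:i + 2 * half_width + 1]


def init_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class MultitaskTagger:
    """Shared embedding and hidden layer feeding one output head per task."""

    def __init__(self, embed_dim=4, hidden_dim=8, half_width=2, seed=0):
        rng = random.Random(seed)
        self.half_width = half_width
        # Parameters shared by every task.
        self.embedding = {
            aa: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for aa in AMINO_ACIDS + PAD
        }
        window_len = 2 * half_width + 1
        self.hidden = init_matrix(hidden_dim, window_len * embed_dim, rng)
        # Task-specific output layers.
        self.heads = {t: init_matrix(n, hidden_dim, rng) for t, n in TASKS.items()}

    def shared_features(self, window):
        # Unknown symbols fall back to the padding embedding (an assumption).
        x = [v for aa in window
             for v in self.embedding.get(aa, self.embedding[PAD])]
        return [max(0.0, h) for h in matvec(self.hidden, x)]  # ReLU (assumed)

    def predict(self, seq):
        """Return, for each task, one predicted label index per residue."""
        out = {task: [] for task in self.heads}
        for window in windows(seq, self.half_width):
            features = self.shared_features(window)
            for task, head in self.heads.items():
                scores = matvec(head, features)
                out[task].append(scores.index(max(scores)))
        return out


def corrupt_centre(window, rng):
    """Build a 'synthetic' window by swapping the centre residue (assumed scheme)."""
    mid = len(window) // 2
    choices = [aa for aa in AMINO_ACIDS if aa != window[mid]]
    return window[:mid] + rng.choice(choices) + window[mid + 1:]
```

Calling `MultitaskTagger().predict("MKTAYIAK")` returns one list of label indices per task, each as long as the input sequence. Because the weights are random and untrained, the labels carry no biological meaning; the sketch only shows how a single shared representation can serve several output heads, and how natural windows could be paired with corrupted ones for a discrimination objective.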

SUBMITTER: Qi Y 

PROVIDER: S-EPMC3312883 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature


Publications

A unified multitask architecture for predicting local protein properties.

Yanjun Qi, Merja Oja, Jason Weston, William Stafford Noble

PLoS ONE, 2012-03-26, issue 3



Similar Datasets

| S-EPMC5480319 | biostudies-other
| S-EPMC9045985 | biostudies-literature
| S-EPMC3386866 | biostudies-literature
| S-EPMC8515573 | biostudies-literature
| S-EPMC3694681 | biostudies-literature
| S-EPMC10246592 | biostudies-literature
| S-EPMC3317885 | biostudies-literature
| S-EPMC8381874 | biostudies-literature
| S-EPMC6794641 | biostudies-literature
| S-EPMC3492357 | biostudies-literature