Dataset Information

Machine learning in computational biology to accelerate high-throughput protein expression.


ABSTRACT:

Motivation

The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility.

Results

Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation.
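
As an illustration of the sequence-derived properties named above, the following minimal sketch computes two of them for a protein fragment: aromaticity and mean hydropathy. It is not taken from the authors' notebooks. It assumes aromaticity is the fraction of Phe, Trp and Tyr residues and that hydropathy is the average Kyte-Doolittle value (GRAVY); the function names and the example sequence are hypothetical. Isoelectric point is omitted because its value depends on the pKa set chosen.

```python
# Sketch only: feature definitions are assumptions, not the authors' code.

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = frozenset("FWY")


def _clean(seq):
    """Upper-case the sequence and reject empty or non-standard input."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return seq


def aromaticity(seq):
    """Fraction of residues that are Phe, Trp or Tyr (assumed definition)."""
    seq = _clean(seq)
    return sum(res in AROMATIC for res in seq) / len(seq)


def gravy(seq):
    """Mean Kyte-Doolittle hydropathy over all residues."""
    seq = _clean(seq)
    return sum(KYTE_DOOLITTLE[res] for res in seq) / len(seq)


def fragment_features(seq):
    """Collect the features into a dict, e.g. as one row of a feature table."""
    return {"aromaticity": aromaticity(seq), "gravy": gravy(seq)}


if __name__ == "__main__":
    # Hypothetical fragment, used only to show the call pattern.
    print(fragment_features("MKWVTFISLLFLFSSAYS"))
```

Feature dictionaries of this kind could then be assembled into a table and passed to a classifier of the reader's choice to rank candidate fragments.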

Availability and implementation

We present the machine learning workflow as a series of IPython notebooks hosted on GitHub (https://github.com/SBRG/Protein_ML). The workflow can be used as a template for analysis of further expression and solubility datasets.

Contact

ebrunk@ucsd.edu or johanr@biotech.kth.se.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Sastry A 

PROVIDER: S-EPMC5870730 | biostudies-literature | 2017 Aug

REPOSITORIES: biostudies-literature

Publications

Machine learning in computational biology to accelerate high-throughput protein expression.

Anand Sastry, Jonathan Monk, Hanna Tegel, Mathias Uhlen, Bernhard O Palsson, Johan Rockberg, Elizabeth Brunk

Bioinformatics (Oxford, England), 2017-08-01, issue 16


Similar Datasets

| S-EPMC9284150 | biostudies-literature
| 2684555 | ecrin-mdr-crc
| S-EPMC8148342 | biostudies-literature
| S-EPMC10567565 | biostudies-literature
| S-EPMC5891835 | biostudies-literature
| S-EPMC10176946 | biostudies-literature
| S-EPMC8706962 | biostudies-literature
| S-EPMC7205464 | biostudies-literature
| S-EPMC10019373 | biostudies-literature
| S-EPMC4305044 | biostudies-literature