Dataset Information

Improving the prediction accuracy of protein abundance in Escherichia coli using mRNA accessibility.

ABSTRACT: RNA secondary structure around translation initiation sites strongly affects the abundance of expressed proteins in Escherichia coli. However, detailed secondary structural features governing protein abundance remain elusive. Recent advances in high-throughput DNA synthesis and experimental systems enable us to obtain large amounts of data. Here, we evaluated six types of structural features using two large-scale datasets. We found that accessibility, which is the probability that a given region around the start codon has no base-paired nucleotides, showed the highest correlation with protein abundance in both datasets. Accessibility showed a significantly higher correlation (Spearman's ρ = 0.709) than the widely used minimum free energy (0.554) in one of the datasets. Interestingly, accessibility showed the highest correlation only when it was calculated by a log-linear model, indicating that the RNA structural model and how to utilize it are important. Furthermore, by combining the accessibility and activity of the Shine-Dalgarno sequence, we devised a method for predicting protein abundance more accurately than existing methods. We inferred that the log-linear model has a broader probabilistic distribution than the widely used Turner energy model, which contributed to more accurate quantification of ribosome accessibility to translation initiation sites.

SUBMITTER: Terai G

PROVIDER: S-EPMC7641306 | biostudies-literature | 2020 Aug

REPOSITORIES: biostudies-literature

Publications

Improving the prediction accuracy of protein abundance in Escherichia coli using mRNA accessibility.

Goro Terai, Kiyoshi Asai

Nucleic Acids Research, 2020 Aug 1, issue 14

PMID: 32504488

Similar Datasets

Project description: BACKGROUND: Knowledge about the abundance of molecular components is an important prerequisite for building quantitative predictive models of cellular behavior. Proteins are central components of these models, since they carry out most of the fundamental processes in the cell. Thus far, protein concentrations have been difficult to measure on a large scale, but proteomic technologies have now advanced to a stage where this information becomes readily accessible. RESULTS: Here, we describe an experimental scheme to maximize the coverage of proteins identified by mass spectrometry of a complex biological sample. Using a combination of LC-MS/MS approaches with protein and peptide fractionation steps we identified 1103 proteins from the cytosolic fraction of the Escherichia coli strain MC4100. A measure of abundance is presented for each of the identified proteins, based on the recently developed emPAI approach which takes into account the number of sequenced peptides per protein. The values of abundance are within a broad range and accurately reflect independently measured copy numbers per cell. As expected, the most abundant proteins were those involved in protein synthesis, most notably ribosomal proteins. Proteins involved in energy metabolism as well as those with binding function were also found in high copy number while proteins annotated with the terms metabolism, transcription, transport, and cellular organization were rare. The barrel-sandwich fold was found to be the structural fold with the highest abundance. Highly abundant proteins are predicted to be less prone to aggregation based on their length, pI values, and occurrence patterns of hydrophobic stretches. We also find that abundant proteins tend to be predominantly essential. Additionally we observe a significant correlation between protein and mRNA abundance in E. coli cells. CONCLUSION: Abundance measurements for more than 1000 E. coli proteins presented in this work represent the most complete study of protein abundance in a bacterial cell so far. We show significant associations between the abundance of a protein and its properties and functions in the cell. In this way, we provide both data and novel insights into the role of protein concentration in this model organism.

Project description: Engineering microbial hosts for the production of fungible fuels requires mitigation of limitations posed on the production capacity. One such limitation arises from the inherent toxicity of solvent-like biofuel compounds to production strains, such as Escherichia coli. Here we show the importance of host engineering for the production of short-chain alcohols by studying the overexpression of genes upregulated in response to exogenous isopentenol. Using systems biology data, we selected 40 genes that were upregulated following isopentenol exposure and subsequently overexpressed them in E. coli. Overexpression of several of these candidates improved tolerance to exogenously added isopentenol. Genes conferring isopentenol tolerance phenotypes belonged to diverse functional groups, such as oxidative stress response (soxS, fpr, and nrdH), general stress response (metR, yqhD, and gidB), heat shock-related response (ibpA), and transport (mdlB). To determine if these genes could also improve isopentenol production, we coexpressed the tolerance-enhancing genes individually with an isopentenol production pathway. Our data show that expression of 6 of the 8 candidates improved the production of isopentenol in E. coli, with the methionine biosynthesis regulator MetR improving the titer for isopentenol production by 55%. Additionally, expression of MdlB, an ABC transporter, facilitated a 12% improvement in isopentenol production. To our knowledge, MdlB is the first example of a transporter that can be used to improve production of a short-chain alcohol and provides a valuable new avenue for host engineering in biogasoline production. IMPORTANCE: The use of microbial host platforms for the production of bulk commodities, such as chemicals and fuels, is now a focus of many biotechnology efforts. Many of these compounds are inherently toxic to the host microbe, which in turn places a limit on production despite efforts to optimize the bioconversion pathways. In order to achieve economically viable production levels, it is also necessary to engineer production strains with improved tolerance to these compounds. We demonstrate that microbial tolerance engineering using transcriptomics data can also identify targets that improve production. Our results include an exporter and a methionine biosynthesis regulator that improve isopentenol production, providing a starting point to further engineer the host for biogasoline production.
