Dataset Information

Genome-wide analysis of fitness data and its application to improve metabolic models.

ABSTRACT:
BACKGROUND: Synthetic biology and related techniques enable genome-scale, high-throughput investigation of how gene knock-downs, knock-outs, and other modifications of genomic sequence affect organism fitness.
RESULTS: We develop statistical and computational pipelines and frameworks for analyzing high-throughput fitness data over a genome-scale set of sequence variants. Analyzing data from a high-throughput knock-down/knock-out bacterial study, we investigate differences in, and determinants of, the effect on fitness in different conditions. Comparing fitness vectors of genes across tens of conditions, we observe that fitness consequences depend strongly on genomic location and more weakly on gene sequence similarity and on functional relationships. In analyzing promoter sequences, we identified motifs associated with conditions studied in bacterial media such as Casaminos, D-glucose, Sucrose, and other sugars and amino-acid sources. We also use fitness data to infer genes associated with orphan metabolic reactions in the iJO1366 E. coli metabolic model. To do this, we developed a new computational method that integrates gene fitness and gene expression profiles within a given reaction network neighborhood to associate the reaction with a set of genes that potentially encode the catalyzing proteins. We then apply this approach to predict candidate genes for 107 orphan reactions in iJO1366. Furthermore, we validate our methodology on known reactions using a leave-one-out approach. Specifically, using the top-20 candidates selected from the combined fitness and expression datasets, we correctly reconstruct 39.7% of the reactions, compared with 33% based on fitness alone, 26% based on expression alone, and 4.02% for a random baseline. Our model improvement results include a novel association of a gene to an orphan cytosine nucleosidation reaction.
CONCLUSION: Our pipeline for metabolic modeling shows a clear benefit of using fitness data for predicting genes of orphan reactions. Along with the analysis pipelines we developed, it can be used to analyze similar high-throughput data.
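
The candidate-ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function, so everything here is an assumption made for illustration: similarity is taken as Pearson correlation, fitness and expression similarities are averaged with equal weight, and the names `pearson`, `rank_candidates`, and the toy gene identifiers are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_candidates(neighbor_genes, fitness, expression, top_k=20):
    """Rank genes outside the reaction neighborhood by their mean
    similarity to the genes inside it (assumed scoring, not the paper's)."""
    scores = {}
    for gene in fitness:
        if gene in neighbor_genes:
            continue
        sims = []
        for nb in neighbor_genes:
            f = pearson(fitness[gene], fitness[nb])
            e = pearson(expression[gene], expression[nb])
            sims.append((f + e) / 2)  # assumption: equal weighting
        scores[gene] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy profiles: one value per condition (made-up numbers).
fitness = {
    "nbA": [1.0, 2.0, 3.0, 4.0],
    "nbB": [2.0, 4.0, 6.0, 8.5],
    "g1": [1.1, 2.1, 2.9, 4.2],
    "g2": [4.0, 3.0, 2.0, 1.0],
    "g3": [1.0, 1.0, 2.0, 1.0],
}
expression = {
    "nbA": [0.1, 0.2, 0.3, 0.4],
    "nbB": [0.2, 0.3, 0.5, 0.6],
    "g1": [0.1, 0.25, 0.3, 0.45],
    "g2": [0.9, 0.7, 0.4, 0.1],
    "g3": [0.5, 0.1, 0.4, 0.2],
}
ranking = rank_candidates({"nbA", "nbB"}, fitness, expression, top_k=2)
print(ranking)
```

In a leave-one-out check of the kind the abstract mentions, a known reaction's gene would be hidden, and the reaction counted as reconstructed if that gene reappears among the top-k candidates.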

SUBMITTER: Vitkin E

PROVIDER: S-EPMC6180484 | biostudies-literature | 2018 Oct

REPOSITORIES: biostudies-literature


Publications

Genome-wide analysis of fitness data and its application to improve metabolic models.

Vitkin Edward, Solomon Oz, Sultan Sharon, Yakhini Zohar

BMC Bioinformatics, 2018-10-10, issue 1


PMID: 30305012

Similar Datasets

Project description: Saccharomyces cerevisiae undergoes robust oscillations to regulate its physiology for adaptation and survival under nutrient-limited conditions. Environmental cues can induce rhythmic metabolic alterations in order to facilitate the coordination of dynamic metabolic behaviors. Among such metabolic processes, the yeast metabolic cycle enables adaptation of the cells to varying nutritional status through oscillations in gene expression and metabolite production levels. In this process, yeast metabolism alternates between distinct cellular states based on changing oxygen consumption levels: quiescent (reductive charging [RC]), growth (oxidative [OX]), and proliferation (reductive building [RB]) phases. We characterized metabolic alterations during the yeast metabolic cycle using a variety of approaches. Gene expression levels are widely used for condition-specific metabolic simulations, whereas the use of epigenetic information in metabolic modeling is still limited despite the clear relationship between epigenetics and metabolism. This prompted us to investigate the contribution of epigenomic information to metabolic predictions for progression of the yeast metabolic cycle. In this regard, we determined altered pathways through the prediction of regulated reactions and corresponding model genes relying on differential chromatin accessibility levels. The predicted metabolic alterations were confirmed via data analysis and literature. We subsequently utilized RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin using sequencing (ATAC-seq) data sets in the contextualization of the yeast model. The use of ATAC-seq data considerably enhanced the predictive capability of the model. To the best of our knowledge, this is the first attempt to use genome-wide chromatin accessibility data in metabolic modeling. The preliminary results showed that epigenomic data sets can pave the way for more accurate metabolic simulations.
IMPORTANCE: Dynamic chromatin organization mediates the emergence of condition-specific phenotypes in eukaryotic organisms. Saccharomyces cerevisiae can alter its metabolic profile via regulation of genome accessibility and robust transcriptional oscillations under nutrient-limited conditions. Thus, both epigenetic information and transcriptomic information are crucial in the understanding of condition-specific metabolic behavior in this organism. Based on genome-wide alterations in chromatin accessibility and transcription, we investigated the yeast metabolic cycle, which is a remarkable example of coordinated and dynamic yeast behavior. In this regard, we assessed the use of ATAC-seq and RNA-seq data sets in condition-specific metabolic modeling. To our knowledge, this is the first attempt to use chromatin accessibility data in the reconstruction of context-specific metabolic models, despite the extensive use of transcriptomic data. As a result of comparative analyses, we propose that the incorporation of epigenetic information is a promising approach in the accurate prediction of metabolic dynamics.

Project description:
Background: Genome-wide reconstructions of metabolism opened the way to thorough investigations of cell metabolism for health care and industrial purposes. However, the predictions offered by Flux Balance Analysis (FBA) can be strongly affected by the choice of flux boundaries, with particular regard to the flux of reactions that sink nutrients into the system. To mitigate possible errors introduced by a poor selection of such boundaries, a rational approach is to focus the modeling effort on the pivotal ones.
Methods: In this work, we present a methodology for the automatic identification of the key fluxes in genome-wide constraint-based models, by means of variance-based sensitivity analysis. The goal is to identify the parameters for which a small perturbation entails a large variation of the model outcomes, also referred to as sensitive parameters. Due to the high number of FBA simulations that are necessary to assess sensitivity coefficients on genome-wide models, our method exploits a master-slave methodology that distributes the computation on massively multi-core architectures. We performed the following steps: (1) we determined the putative parameterizations of the genome-wide metabolic constraint-based model, using Saltelli's method; (2) we applied FBA to each parameterized model, distributing the massive amount of calculations over multiple nodes by means of MPI; (3) we then collected the results of all FBA runs and used them to perform a global sensitivity analysis.
Results: We show a proof-of-concept of our approach on the latest genome-wide reconstructions of human metabolism, Recon2.2 and Recon3D. We report that the most sensitive parameters are mainly associated with the intake of essential amino acids in Recon2.2, whereas in Recon3D they are associated largely with phospholipids. We also illustrate that in most cases there is a significant contribution of higher-order effects.
Conclusion: Our results indicate that interaction effects between different model parameters exist, which should be taken into account especially at the stage of calibration of genome-wide models, supporting the importance of a global strategy of sensitivity analysis.
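
The variance-based sensitivity analysis described above can be illustrated with a small sketch. It makes several assumptions that are not in the description: a toy linear function `model` stands in for an FBA objective value, the two sample matrices are drawn with a plain pseudo-random generator rather than a quasi-random sequence, and the first-order indices use the common A/B/AB estimator associated with Saltelli's scheme. The name `first_order_indices` is hypothetical, and there is no MPI distribution here.

```python
import random

def model(x):
    # Toy stand-in for an FBA objective; x plays the role of flux bounds.
    return 4.0 * x[0] + 1.0 * x[1]

def first_order_indices(f, n_params, n_samples, seed=0):
    """Estimate first-order Sobol indices with an A/B/AB sampling design."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    fA = [f(row) for row in A]
    fB = [f(row) for row in B]
    # Output variance estimated from all base evaluations.
    all_y = fA + fB
    mean = sum(all_y) / len(all_y)
    var = sum((y - mean) ** 2 for y in all_y) / (len(all_y) - 1)
    indices = []
    for i in range(n_params):
        total = 0.0
        for a_row, b_row, ya, yb in zip(A, B, fA, fB):
            # AB_i: row of A with parameter i taken from B.
            ab_row = a_row[:i] + [b_row[i]] + a_row[i + 1:]
            total += yb * (f(ab_row) - ya)
        indices.append(total / n_samples / var)
    return indices

S = first_order_indices(model, n_params=2, n_samples=20000)
print(S)
```

For this toy function with independent uniform inputs, the analytic first-order indices are 16/17 and 1/17, so the first parameter should dominate. In a real setting each call to `model` would be one FBA solve, which is the part the description distributes across nodes.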

Project description: Genome-scale metabolic models have been utilized extensively in the study and engineering of the organisms they describe. Here we present the analysis of a published dataset from pooled transposon mutant fitness experiments as an approach for improving the accuracy and gene-reaction associations of a metabolic model for Zymomonas mobilis ZM4, an industrially relevant ethanologenic organism with extremely high glycolytic flux and low biomass yield. Gene essentiality predictions made by the draft model were compared to data from individual pooled mutant experiments to identify areas of the model requiring deeper validation. Subsequent experiments showed that some of the discrepancies between the model and dataset were caused by polar effects, mis-mapped barcodes, or mutants carrying both wild-type and transposon-disrupted gene copies, highlighting potential limitations inherent to data from individual mutants in these high-throughput datasets. Therefore, we analyzed correlations in fitness scores across all 492 experiments in the dataset in the context of functionally related metabolic reaction modules identified within the model via flux coupling analysis. These correlations were used to identify candidate genes for a reaction in histidine biosynthesis lacking an annotated gene and to highlight metabolic modules with poorly correlated gene fitness scores. Additional genes for reactions involved in biotin, ubiquinone, and pyridoxine biosynthesis in Z. mobilis were identified and confirmed using mutant complementation experiments. These discovered genes were incorporated into the final model, iZM4_478, which contains 747 metabolic and transport reactions (of which 612 have gene-protein-reaction associations), 478 genes, and 616 unique metabolites, making it one of the most complete models of Z. mobilis ZM4 to date. The methods of analysis that we applied here to the Z. mobilis transposon mutant dataset could easily be used to improve future genome-scale metabolic reconstructions for organisms where these, or similar, high-throughput datasets are available.
