Dataset Information

Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA.

ABSTRACT:

BACKGROUND: RNA editing is the process whereby an RNA sequence is modified from the sequence of the corresponding DNA template. In the mitochondria of land plants, some cytidines are converted to uridines before translation. Despite substantial study, the molecular biological mechanism by which C-to-U RNA editing proceeds remains relatively obscure, although several experimental studies have implicated a role for cis-recognition. A highly non-random distribution of nucleotides is observed in the immediate vicinity of edited sites (within 20 nucleotides 5' and 3'), but no precise consensus motif has been identified.

RESULTS: Data for analysis were derived from the complete mitochondrial genomes of Arabidopsis thaliana, Brassica napus, and Oryza sativa; additionally, a combined data set of observations across all three genomes was generated. We selected datasets based on the 20 nucleotides 5' and the 20 nucleotides 3' of edited sites and an equivalently sized and appropriately constructed null-set of non-edited sites. We used tree-based statistical methods and random forests to generate models of C-to-U RNA editing based on the nucleotides surrounding the edited/non-edited sites and on the estimated folding energies of those regions. Tree-based statistical methods based on primary sequence data surrounding edited/non-edited sites and estimates of free energy of folding yield models with optimistic re-substitution-based estimates of approximately 0.71 accuracy, approximately 0.64 sensitivity, and approximately 0.88 specificity. Random forest analysis yielded better models and more exact performance estimates with approximately 0.74 accuracy, approximately 0.72 sensitivity, and approximately 0.81 specificity for the combined observations.

CONCLUSIONS: Simple models do moderately well in predicting which cytidines will be edited to uridines, and provide the first quantitative predictive models for RNA edited sites in plant mitochondria. Our analysis shows that the identity of the nucleotide at position -1 relative to the edited C and the estimated free energy of folding for a 41 nt region surrounding the edited C are the most important variables that distinguish most edited from non-edited sites. However, the results suggest that primary sequence data and simple free energy of folding calculations alone are insufficient to make highly accurate predictions.
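The abstract describes two mechanical steps that are easy to illustrate: taking the 20 nucleotides 5' and 3' of a candidate cytidine as predictors, and summarizing a classifier by accuracy, sensitivity, and specificity. The Python sketch below shows one way this might look. It is not the authors' code; the function names (`window_around`, `site_features`, `performance`) are hypothetical, the sequence is assumed to be an uppercase DNA string, and the free energy of folding predictor is left out because it needs an external folding tool.

```python
from typing import Dict, List, Optional, Tuple

FLANK = 20  # nucleotides taken on each side of the candidate C (41 nt window)


def window_around(seq: str, pos: int, flank: int = FLANK) -> Optional[str]:
    """Return the (2*flank + 1)-nt window centred on seq[pos].

    Returns None when the centre is not a C or the window would run
    past either end of the sequence (an illustrative choice).
    """
    if pos - flank < 0 or pos + flank >= len(seq) or seq[pos] != "C":
        return None
    return seq[pos - flank : pos + flank + 1]


def site_features(window: str) -> Dict[int, str]:
    """Map each offset (-flank..-1, 1..flank) to the nucleotide at that offset."""
    flank = len(window) // 2
    return {
        offset: window[flank + offset]
        for offset in range(-flank, flank + 1)
        if offset != 0  # the centre is always C, so it carries no information
    }


def performance(actual: List[bool], predicted: List[bool]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity); True means 'edited'.

    Assumes both classes occur in `actual`, so no denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of edited sites that were found
    specificity = tn / (tn + fp)  # fraction of non-edited sites correctly rejected
    return accuracy, sensitivity, specificity
```

The dictionary returned by `site_features` could be passed, after categorical encoding, to any tree-based or random forest learner. The -1 entry corresponds to the position the abstract reports as the most informative sequence variable.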

SUBMITTER: Cummings MP

PROVIDER: S-EPMC521485 | biostudies-literature | 2004 Sep

REPOSITORIES: biostudies-literature

Publications

Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA.

Cummings MP, Myers DS

BMC Bioinformatics, 2004 Sep 16

PMID: 15373947

Similar Datasets

Project description: Objective: To predict a woman's risk of postpartum hemorrhage at labor admission using machine learning and statistical models. Methods: Predictive models were constructed and compared using data from 10 of 12 sites in the U.S. Consortium for Safe Labor Study (2002-2008) that consistently reported estimated blood loss at delivery. The outcome was postpartum hemorrhage, defined as an estimated blood loss of at least 1,000 mL. Fifty-five candidate risk factors routinely available on labor admission were considered. We used logistic regression with and without lasso regularization (lasso regression) as the two statistical models, and random forest and extreme gradient boosting as the two machine learning models to predict postpartum hemorrhage. Model performance was measured by C statistics (i.e., concordance index), calibration, and decision curves. Models were constructed from the first phase (2002-2006) and externally validated (i.e., temporally) in the second phase (2007-2008). Further validation was performed combining both temporal and site-specific validation. Results: Of the 152,279 assessed births, 7,279 (4.8%, 95% CI 4.7-4.9) had postpartum hemorrhage. All models had good-to-excellent discrimination. The extreme gradient boosting model had the best discriminative ability to predict postpartum hemorrhage (C statistic: 0.93; 95% CI 0.92-0.93), followed by random forest (C statistic: 0.92; 95% CI 0.91-0.92). The lasso regression model (C statistic: 0.87; 95% CI 0.86-0.88) and logistic regression (C statistic: 0.87; 95% CI 0.86-0.87) had lower-but-good discriminative ability. The above results held with validation across both time and sites. Decision curve analysis demonstrated that, although all models provided superior net benefit when clinical decision thresholds were between 0% and 80% predicted risk, the extreme gradient boosting model provided the greatest net benefit. Conclusion: Postpartum hemorrhage on labor admission can be predicted with excellent discriminative ability using machine learning and statistical models. Further clinical application is needed, which may assist health care providers to be prepared and triage at-risk women.

Project description: Upon infection of a new host, human immunodeficiency virus (HIV) replicates in the mucosal tissues and is generally undetectable in circulation for 1-2 weeks post-infection. Several interventions against HIV including vaccines and antiretroviral prophylaxis target virus replication at this earliest stage of infection. Mathematical models have been used to understand how HIV spreads from mucosal tissues systemically and what impact vaccination and/or antiretroviral prophylaxis has on viral eradication. Because predictions of such models have been rarely compared to experimental data, it remains unclear which processes included in these models are critical for predicting early HIV dynamics. Here we modified the "standard" mathematical model of HIV infection to include two populations of infected cells: cells that are actively producing the virus and cells that are transitioning into virus production mode. We evaluated the effects of several poorly known parameters on infection outcomes in this model and compared model predictions to experimental data on infection of non-human primates with variable doses of simian immunodeficiency virus (SIV). First, we found that the mode of virus production by infected cells (budding vs. bursting) has a minimal impact on the early virus dynamics for a wide range of model parameters, as long as the parameters are constrained to provide the observed rate of SIV load increase in the blood of infected animals. Interestingly and in contrast with previous results, we found that the bursting mode of virus production generally results in a higher probability of viral extinction than the budding mode of virus production. Second, this mathematical model was not able to accurately describe the change in experimentally determined probability of host infection with increasing viral doses. Third and finally, the model was also unable to accurately explain the decline in the time to virus detection with increasing viral dose. These results suggest that, in order to appropriately model early HIV/SIV dynamics, additional factors must be considered in the model development. These may include variability in monkey susceptibility to infection, within-host competition between different viruses for target cells at the initial site of virus replication in the mucosa, innate immune response, and possibly the inclusion of several different tissue compartments. The sobering news is that while an increase in model complexity is needed to explain the available experimental data, testing and rejection of more complex models may require more quantitative data than is currently available.
