Dataset Information

Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation.

ABSTRACT: The worldwide coronavirus disease (COVID-19) pandemic progressed dramatically and rapidly in 2020, requiring an urgent global effort to accelerate the development of a vaccine to stop daily infections and deaths. Several types of vaccine have been designed to teach the immune system how to fight off certain kinds of pathogens. mRNA vaccines are the most important candidate vaccines because of their capacity for rapid development, high potency, safe administration and potential for low-cost manufacture. An mRNA vaccine works by training the body to recognize and respond to the proteins produced by disease-causing organisms such as viruses or bacteria. This type of vaccine is the fastest candidate to treat COVID-19, but it currently faces several limitations. In particular, designing stable mRNA molecules is challenging because of the inefficient in vivo delivery of mRNA, its tendency toward spontaneous degradation and its low protein expression levels. This work designed and implemented a deep sequence model based on bidirectional GRU and LSTM models, applied to the Stanford COVID-19 mRNA vaccine dataset, to predict mRNA degradation by estimating five reactivity values for every position in the sequence. Four of these values give the likelihood of degradation with or without magnesium at high pH (pH 10) and at high temperature (50 °C), and the fifth reactivity value is used to determine the likely secondary structure of the RNA sample. The model relies on two types of features, numerical and categorical. The categorical features are extracted from the mRNA sequence, its structure and its predicted loop type; they are encoded as numbers, and features are then learned from them by an embedding layer. The five numerical features are derived from the pairing likelihood of each pair of nucleotides in the RNA.
The model gives promising results: it predicts the five reactivity values with a validation mean columnwise root mean square error (MCRMSE) of 0.125 using the LSTM model with augmentation and the codon encoding method. With the LSTM model, codon encoding outperforms Base encoding in validation MCRMSE, whereas Base encoding shows less over-fitting, with a difference of only 0.008 between the training and validation loss.
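
The categorical pipeline described in the abstract (characters of the sequence, structure and predicted loop type turned into numbers before an embedding layer) and the two encodings it compares can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the vocabularies (`ACGU`, dot-bracket `().`, loop-type letters `SMIBHEX`) are assumed, and `codon_encode`, which maps the three-nucleotide window starting at each position to one of 64 codon indices plus a padding index, is a hypothetical reading of "codon encoding". All function and constant names are illustrative.

```python
from itertools import product
from typing import Dict, List

# Assumed vocabularies (hypothetical; the paper's exact token sets are not given here).
VOCABS: Dict[str, str] = {
    "sequence": "ACGU",      # RNA nucleotides
    "structure": "().",      # dot-bracket pairing notation
    "loop_type": "SMIBHEX",  # predicted loop-type letters
}

# Hypothetical codon vocabulary: all 64 nucleotide triplets, plus one padding index.
CODON_INDEX: Dict[str, int] = {
    "".join(triplet): i for i, triplet in enumerate(product("ACGU", repeat=3))
}
CODON_PAD = len(CODON_INDEX)  # 64


def base_encode(text: str, vocab: str) -> List[int]:
    """Base encoding: map each character to its index in `vocab`."""
    index = {ch: i for i, ch in enumerate(vocab)}
    return [index[ch] for ch in text]


def codon_encode(seq: str) -> List[int]:
    """Hypothetical codon encoding: one token per position, taken from the
    three-nucleotide window starting there. Windows that would run past the
    end get CODON_PAD so the tokens stay aligned with per-position targets."""
    return [
        CODON_INDEX[seq[i:i + 3]] if i + 3 <= len(seq) else CODON_PAD
        for i in range(len(seq))
    ]


def encode_sample(sample: Dict[str, str]) -> List[List[int]]:
    """Per-position [sequence, structure, loop_type] tokens for one RNA sample,
    ready to be looked up in (assumed) separate embedding tables."""
    columns = [base_encode(sample[name], vocab) for name, vocab in VOCABS.items()]
    if len({len(col) for col in columns}) != 1:
        raise ValueError("sequence, structure and loop_type must have equal length")
    return [list(tokens) for tokens in zip(*columns)]


if __name__ == "__main__":
    toy = {"sequence": "GGAAAUCC", "structure": "((....))", "loop_type": "SSHHHHSS"}
    print(encode_sample(toy))
    print(codon_encode(toy["sequence"]))
```

For the toy hairpin, the first position encodes as `[2, 0, 0]` (G, opening bracket, stem). Keeping one token per position in both encodings lets either representation feed the same per-position bidirectional GRU or LSTM; a codon vocabulary of 64 entries gives the embedding layer more context per token than the 4-letter base vocabulary, which is one plausible reason the two encodings trade validation error against over-fitting.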

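The MCRMSE reported above is the root mean square error computed separately for each target column and then averaged over the columns: MCRMSE = (1/N_t) Σ_j sqrt((1/n) Σ_i (y_ij − ŷ_ij)²), where N_t is the number of target columns and n the number of scored positions. A minimal plain-Python sketch follows, assuming targets and predictions are given as rows of per-position values with one column per reactivity target; the function name `mcrmse` is illustrative, not taken from the paper.

```python
from math import sqrt
from typing import Sequence


def mcrmse(y_true: Sequence[Sequence[float]], y_pred: Sequence[Sequence[float]]) -> float:
    """Mean columnwise root mean square error.

    Each row is one scored position; each column is one reactivity target.
    The RMSE is computed per column and the column RMSEs are averaged.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty with the same number of rows")
    n_rows, n_cols = len(y_true), len(y_true[0])
    column_rmse = []
    for j in range(n_cols):
        squared_error = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows))
        column_rmse.append(sqrt(squared_error / n_rows))
    return sum(column_rmse) / n_cols


if __name__ == "__main__":
    targets = [[0.0, 0.0], [0.0, 0.0]]
    predictions = [[1.0, 2.0], [1.0, 2.0]]
    print(mcrmse(targets, predictions))  # column RMSEs 1.0 and 2.0 -> 1.5
```

Because every column's RMSE carries equal weight, a single poorly predicted reactivity target raises MCRMSE as much as any other, so the metric rewards models that are balanced across all targets.
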
SUBMITTER: Qaid TS

PROVIDER: S-EPMC8237341 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation.

Qaid TS, Mazaar HH, Alqahtani MS, Raweh AA, Alakwaa W

PeerJ Computer Science, 2021-06-22

PMID: 34239977

Similar Datasets

Project description: In the last two years, the coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been a scientific and social challenge worldwide. Vaccines have been the most effective intervention for reducing virus transmission and disease severity. However, genetic variants of the virus still circulate among vaccinated individuals, causing disease with differing symptomatology. Understanding the protective or disease-associated mechanisms in vaccinated individuals is relevant to advancing vaccine development and implementation. To address this objective, serum protein profiles were characterized by quantitative proteomics and data analysis algorithms in four cohorts of vaccinated individuals: uninfected, and SARS-CoV-2-infected with asymptomatic, nonsevere or severe disease. The results showed that immunoglobulins were the most overrepresented proteins in infected cohorts when compared to PCR-negative individuals. The immunoglobulin profile varied between the infected cohorts and correlated with protective or disease-associated capacity. In asymptomatic cases, immunoglobulins overrepresented in PCR-positive individuals correlated with a protective response against SARS-CoV-2, other viruses and thrombosis. In nonsevere cases, correlates of protection against SARS-CoV-2 and HBV were observed, together with risk of myasthenia gravis and allergy, and autoantibodies. Patients with severe symptoms presented risk of allergy and chronic idiopathic thrombocytopenic purpura, and autoantibodies. The analysis of immunoglobulins underrepresented in PCR-positive compared to PCR-negative individuals identified vaccine-induced protective epitopes in various coronavirus proteins, including the Spike receptor-binding domain (RBD). Non-immunoglobulin proteins were associated with COVID-19 symptoms and biological processes.
These results provide evidence of host-associated differences in the response to vaccination and of the possibility of improving vaccine efficacy against SARS-CoV-2.

