Dataset Information

Genomic Surveillance of COVID-19 Variants With Language Models and Machine Learning.


ABSTRACT: Global efforts to control COVID-19 are threatened by the rapid emergence of novel SARS-CoV-2 variants that may display undesirable characteristics such as immune escape, increased transmissibility, or increased pathogenicity. Early prediction of the emergence of new strains with these features is critical for pandemic preparedness. We present Strainflow, a supervised and causally predictive model that uses unsupervised latent-space features of SARS-CoV-2 genome sequences. Strainflow was trained and validated on 0.9 million sequences from December 2019 to June 2021, and the frozen model was prospectively validated from July 2021 to December 2021. Strainflow captured the rise in cases two months ahead of the Delta and Omicron surges in most countries, including the prediction of a surge in India as early as the beginning of November 2021. Entropy analysis of Strainflow's unsupervised embeddings reveals explore-exploit cycles in genomic feature space, adding interpretability to the deep-learning-based model. We also conducted a codon-level analysis of the model to assess the interpretability and biological validity of the unsupervised features. The Strainflow application is openly available as an interactive web application for prospective genomic surveillance of COVID-19 across the globe.
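The abstract mentions entropy analysis of the unsupervised embeddings but does not say which estimator was used. As an illustration only, the sketch below computes a histogram-based Shannon entropy for each latent dimension across a batch of embedding vectors. The function names, the equal-width binning, and the choice of Shannon entropy are all assumptions made for this example and are not taken from the Strainflow paper or code.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=10):
    """Shannon entropy (in bits) of an equal-width histogram over `values`.

    Hypothetical helper: the binning scheme is an assumption, not the
    estimator used by Strainflow.
    """
    if not values:
        raise ValueError("values must be non-empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values identical: no uncertainty
    width = (hi - lo) / bins
    # Map each value to a bin index; clamp the maximum into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_per_dimension(embeddings, bins=10):
    """Entropy of each latent dimension across a batch of embedding vectors."""
    dims = zip(*embeddings)  # transpose: one tuple of values per dimension
    return [shannon_entropy(list(d), bins) for d in dims]
```

Under these assumptions, calling `entropy_per_dimension` on the embeddings collected for successive time windows would give one entropy series per latent dimension, which is the kind of signal in which rises and falls could be inspected.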

SUBMITTER: Nagpal S 

PROVIDER: S-EPMC9024110 | biostudies-literature | 2022

REPOSITORIES: biostudies-literature

Publications

Genomic Surveillance of COVID-19 Variants With Language Models and Machine Learning.

Sargun Nagpal, Ridam Pal, Ashima, Ananya Tyagi, Sadhana Tripathi, Aditya Nagori, Saad Ahmad, Hara Prasad Mishra, Rishabh Malhotra, Rintu Kutum, Tavpritesh Sethi

Frontiers in Genetics, 2022-04-08


Similar Datasets

| S-EPMC10392838 | biostudies-literature
| S-EPMC11333698 | biostudies-literature
| S-EPMC8357494 | biostudies-literature
| S-EPMC8771811 | biostudies-literature
| S-EPMC10360290 | biostudies-literature
| S-EPMC9890886 | biostudies-literature
| S-EPMC8420483 | biostudies-literature
| S-EPMC10881983 | biostudies-literature
| S-EPMC10576163 | biostudies-literature
| S-EPMC8034680 | biostudies-literature