Dataset Information

Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning.


ABSTRACT: The COVID-19 pandemic caused by SARS-CoV-2 has become a major threat across the globe. Here, we developed machine learning approaches to identify key pathogenic regions in coronavirus genomes. We trained and evaluated 7,562,625 models on 3,665 genomes including SARS-CoV-2, MERS-CoV, SARS-CoV and other coronaviruses of human and animal origins to return quantitative and biologically interpretable signatures at nucleotide and amino acid resolutions. We identified hotspots across the SARS-CoV-2 genome including previously unappreciated features in spike, RdRp and other proteins. Finally, we integrated pathogenicity genomic profiles with B cell and T cell epitope predictions for enrichment of sequence targets to help guide vaccine development. These results provide a systematic map of predicted pathogenicity in SARS-CoV-2 that incorporates sequence, structural and immunological features, providing an unbiased collection of genetic elements for functional studies. This metavirome-based framework can also be applied for rapid characterization of new coronavirus strains or emerging pathogenic viruses.

SUBMITTER: Park JJ 

PROVIDER: S-EPMC8598947 | biostudies-literature | 2021 Nov

REPOSITORIES: biostudies-literature

Publications

Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning.

Park JJ, Chen S

Patterns (New York, N.Y.) 20211118 2



Similar Datasets

S-EPMC11217540 | biostudies-literature
S-EPMC9821958 | biostudies-literature
S-EPMC8294595 | biostudies-literature
S-EPMC10892746 | biostudies-literature
S-EPMC11773898 | biostudies-literature
S-EPMC7997310 | biostudies-literature
S-EPMC9936926 | biostudies-literature
S-EPMC11687024 | biostudies-literature
S-EPMC10525290 | biostudies-literature
S-EPMC10798494 | biostudies-literature