
Dataset Information


Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone.


ABSTRACT: Coronaviruses infect many animals, including humans, due to interspecies transmission. Three of the known human coronaviruses: MERS, SARS-CoV-1, and SARS-CoV-2, the pathogen for the COVID-19 pandemic, cause severe disease. Improved methods to predict host specificity of coronaviruses will be valuable for identifying and controlling future outbreaks. The coronavirus S protein plays a key role in host specificity by attaching the virus to receptors on the cell membrane. We analyzed 1238 spike sequences for their host specificity. Spike sequences readily segregate in t-SNE embeddings into clusters of similar hosts and/or virus species. Machine learning with SVM, Logistic Regression, Decision Tree, Random Forest gave high average accuracies, F1 scores, sensitivities and specificities of 0.95-0.99. Importantly, sites identified by Decision Tree correspond to protein regions with known biological importance. These results demonstrate that spike sequences alone can be used to predict host specificity.
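The abstract reports accuracy, F1 score, sensitivity, and specificity for the host classifiers. As a minimal sketch of how those four metrics relate to one another, the block below computes them for one host class against all others. The function name `binary_metrics`, the host labels, and the predictions are hypothetical illustrations, not data or code from the study.

```python
def binary_metrics(y_true, y_pred, positive):
    """Score one host class against all others (one-vs-rest).

    Returns accuracy, sensitivity, specificity and F1 as a dict.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical host labels and classifier output, for illustration only.
y_true = ["human", "human", "bat", "camel", "bat", "human"]
y_pred = ["human", "bat", "bat", "camel", "bat", "human"]
scores = binary_metrics(y_true, y_pred, positive="human")
```

Averaging such per-class scores over all host classes is one common way to obtain a single figure per metric; whether the study averaged in this way is not stated in this record.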

SUBMITTER: Kuzmin K 

PROVIDER: S-EPMC7500881 | biostudies-literature | 2020 Dec

REPOSITORIES: biostudies-literature


Publications

Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone.

Kuzmin, Kiril; Adeniyi, Ayotomiwa Ezekiel; DaSouza, Arthur Kevin; Lim, Deuk; Nguyen, Huyen; Molina, Nuria Ramirez; Xiong, Lanqiao; Weber, Irene T; Harrison, Robert W

Biochemical and Biophysical Research Communications, 2020 Sep 18, issue 3



Similar Datasets

| S-EPMC5695212 | biostudies-literature
| S-EPMC7433773 | biostudies-literature
| S-EPMC7815805 | biostudies-literature
| S-EPMC3846850 | biostudies-literature
| S-EPMC6884624 | biostudies-literature
| S-EPMC7125587 | biostudies-literature
| S-EPMC9962193 | biostudies-literature
| S-EPMC3358902 | biostudies-other
| S-EPMC8443755 | biostudies-literature
| S-EPMC3292016 | biostudies-literature