Dataset Information

Identification of Geographic Specific SARS-Cov-2 Mutations by Random Forest Classification and Variable Selection Methods.


ABSTRACT: RNA viral genomes have very high mutation rates. As infection spreads in host populations, different viral lineages emerge and acquire independent mutations, which can lead to varied infection and death rates in different parts of the world. By applying Random Forest classification and feature selection methods, we developed an analysis pipeline for identifying geography-specific mutations and classifying different viral lineages, focusing on the missense variants that alter the function of the encoded proteins. We applied the pipeline to publicly available SARS-CoV-2 datasets and demonstrated that it accurately identified country- or region-specific viral lineages and the specific mutations that discriminate between lineages. The results presented here can help in designing country-specific diagnostic strategies and in prioritizing mutations for functional interpretation and experimental validation.
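
The general idea in the abstract, classifying samples by region and then ranking mutations by how much they help the classifier, can be illustrated with a minimal sketch. This is not the authors' pipeline: the use of scikit-learn, the helper name `rank_mutations`, the synthetic presence/absence matrix, and the region labels are all assumptions made for illustration, and plain importance ranking stands in here for whatever variable selection method the paper actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_mutations(X, y, names, n_trees=200, seed=0):
    """Fit a Random Forest on a samples-by-mutations matrix and
    return (mutation name, importance) pairs, most important first.

    Hypothetical helper; importance ranking is a simple stand-in
    for a dedicated variable selection procedure.
    """
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]


# Synthetic toy data (NOT real SARS-CoV-2 data): rows are genomes,
# columns are missense variants coded as 1 (present) or 0 (absent).
rng = np.random.default_rng(0)
n_per_region, n_mut = 100, 20
regions = np.repeat(["region_A", "region_B"], n_per_region)

# Background: every variant appears in about 5% of genomes.
X = (rng.random((2 * n_per_region, n_mut)) < 0.05).astype(int)

# Plant two region-specific variants, each present in about 90%
# of the genomes from one region.
X[:n_per_region, 3] = (rng.random(n_per_region) < 0.9).astype(int)
X[n_per_region:, 7] = (rng.random(n_per_region) < 0.9).astype(int)

names = [f"mut_{i}" for i in range(n_mut)]
ranking = rank_mutations(X, regions, names)
top_two = {name for name, _ in ranking[:2]}

for name, score in ranking[:5]:
    print(f"{name}\t{score:.3f}")
```

In this toy setting the two planted variants should rise to the top of the ranking, which is the behaviour a geography-specific mutation screen relies on. Real data would additionally need held-out evaluation and some handling of class imbalance between regions.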

SUBMITTER: Kandpal M 

PROVIDER: S-EPMC7514111 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature

Publications

Identification of Geographic Specific SARS-Cov-2 Mutations by Random Forest Classification and Variable Selection Methods.

Manoj Kandpal, Ramana V Davuluri

Statistics and applications 20200630 1


Similar Datasets

S-EPMC7508310 | biostudies-literature
S-EPMC10503461 | biostudies-literature
S-EPMC3083704 | biostudies-literature
S-EPMC6049094 | biostudies-literature
S-EPMC3897925 | biostudies-literature
S-EPMC3218317 | biostudies-literature
S-EPMC1363357 | biostudies-literature
S-EPMC6433899 | biostudies-literature
S-EPMC7387429 | biostudies-literature
S-EPMC5384728 | biostudies-literature