Dataset Information

Phylo-geo-network and haplogroup analysis of 611 novel coronavirus (SARS-CoV-2) genomes from India.

ABSTRACT: The novel coronavirus (SARS-CoV-2), discovered in Wuhan, China, in December 2019, has since developed into a global epidemic. Here, we constructed and analyzed the phylo-geo-network of SARS-CoV-2 genomes from across India to understand the viral evolution in the country. A total of 611 full-length genomes from different states of India were extracted from the EpiCov repository of the GISAID initiative on 6 June 2020. Their alignment with the reference sequence (Wuhan, NCBI accession number NC_045512.2) uncovered 270 parsimony-informative sites. Furthermore, 339 genomes were divided into 51 haplogroups. The network revealed the core haplogroup as that of the reference sequence NC_045512.2 (Haplogroup A1), with 157 identical sequences present across 16 states. The remaining haplogroups had <10 identical sequences each, across a maximum of three states. Some states with fewer samples had more haplogroups. Forty-one haplogroups were localized exclusively to a single state. The two most common lineages are B6 and B1 (Pangolin), whereas clade A2a (Covidex) appears to be the most predominant in India. Because the pandemic is still emerging, these observations need to be monitored.
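The two quantities the abstract reports, parsimony-informative sites and groups of identical sequences, can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy alignment, the sequence names, and the helper functions `parsimony_informative_sites` and `group_identical` are hypothetical. It uses the standard definition of a parsimony-informative site (at least two character states, each present in at least two sequences) and, as a simplifying assumption, treats a haplotype as a set of sequences that are identical across the whole alignment.

```python
from collections import Counter, defaultdict


def parsimony_informative_sites(alignment):
    """Return 0-based column indices that are parsimony informative.

    A column qualifies when at least two different bases each occur
    in at least two sequences. Gaps and ambiguous bases are ignored.
    """
    sites = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in "ACGT")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(col)
    return sites


def group_identical(named_seqs):
    """Group sequence names by identical aligned sequence, largest group first."""
    groups = defaultdict(list)
    for name, seq in named_seqs.items():
        groups[seq].append(name)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical toy alignment (not data from the study).
toy = {
    "ref": "ACGTAC",
    "s1": "ACGTAC",
    "s2": "ACATAC",
    "s3": "ACATAT",
    "s4": "ACGTAC",
}

informative = parsimony_informative_sites(list(toy.values()))
haplotypes = group_identical(toy)
print(informative)  # column 2 (G in three sequences, A in two)
print(haplotypes)   # largest group holds the sequences identical to "ref"
```

In the toy data the last column is variable but not informative, because the alternative base occurs in only one sequence.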

SUBMITTER: Laskar R

PROVIDER: S-EPMC7994317 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: The evolutionary process of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) development remains inconclusive. This study compared the genome sequences of severe acute respiratory syndrome coronavirus (SARS-CoV), bat coronavirus RaTG13, and SARS-CoV-2. In total, the genomes of SARS-CoV-2 and RaTG13 were 77.9% and 77.7% identical to the genome of SARS-CoV, respectively. A total of 3.6% (1,068 bases) of the SARS-CoV-2 genome was derived from insertion and/or deletion (indel) mutations, and 18.6% (5,548 bases) was from point mutations relative to the genome of SARS-CoV. At least 35 indel sites were confirmed in the genome of SARS-CoV-2, of which 17 were ≥10 consecutive bases long. Ten of these relatively long indels were located in the spike (S) gene, five in the nonstructural protein 3 (Nsp3) gene of open reading frame (ORF) 1a, and one each in ORF8 and a noncoding region. Seventeen (48.6%) of the 35 indels were based on insertion-and-deletion mutations with exchanged gene sequences of 7-325 consecutive bases. Almost the complete ORF8 gene was replaced by a single indel of 325 consecutive bases. The distribution of these indels was roughly in accordance with the distribution of the point mutation rate around the indels. The genome sequence of SARS-CoV-2 was 96.0% identical to that of RaTG13. There was no long insertion-and-deletion mutation between the genomes of RaTG13 and SARS-CoV-2. The findings of the uneven distribution of multiple indels and the presence of multiple long insertion-and-deletion mutations with exchanged consecutive base sequences in the viral genome may provide insights into SARS-CoV-2 development.

IMPORTANCE The developmental mechanism of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remains inconclusive. This study compared the base sequences one by one between severe acute respiratory syndrome coronavirus (SARS-CoV) or bat coronavirus RaTG13 and SARS-CoV-2. The genomes of SARS-CoV-2 and RaTG13 were 77.9% and 77.7% identical to the genome of SARS-CoV, respectively. Seventeen of the 35 sites with insertion and/or deletion mutations between SARS-CoV-2 and SARS-CoV were based on insertion-and-deletion mutations with the replacement of 7-325 consecutive bases. Most of these long insertion-and-deletion sites were concentrated in the nonstructural protein 3 (Nsp3) gene of open reading frame (ORF) 1a, the S1 domain of the spike protein, and the ORF8 gene. Such long insertion-and-deletion mutations were not observed between the genomes of RaTG13 and SARS-CoV-2. The presence of multiple long insertion-and-deletion mutations in the genome of SARS-CoV-2 and their uneven distribution may provide further insights into the development of the virus.
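The percentages quoted for indel and point-mutation bases can be checked with simple arithmetic. The sketch below assumes a SARS-CoV-2 genome length of 29,903 bases, which is the length of the reference NC_045512.2; the description does not say which sequence was used as the denominator, so this is an assumption, and `percent_of_genome` is a hypothetical helper.

```python
# Assumption: genome length of the reference NC_045512.2 (29,903 bases).
GENOME_LENGTH = 29_903


def percent_of_genome(bases, genome_length=GENOME_LENGTH):
    """Express a base count as a percentage of the genome, to one decimal."""
    return round(100 * bases / genome_length, 1)


indel_pct = percent_of_genome(1_068)   # bases attributed to indels
point_pct = percent_of_genome(5_548)   # bases attributed to point mutations
print(indel_pct, point_pct)  # 3.6 18.6
```

Under that assumption, both rounded values agree with the 3.6% and 18.6% stated in the description.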


