Dataset Information

An integrated approach to determine the abundance, mutation rate and phylogeny of the SARS-CoV-2 genome.

ABSTRACT: The analysis of SARS-CoV-2 genome datasets has significantly advanced our understanding of the biology and genomic adaptability of the virus. However, the plurality of advanced sequencing datasets, such as short and long reads, presents a formidable computational challenge to performing quantitative, variant or phylogenetic analysis uniformly, thus limiting the application of such analyses in public health laboratories engaged in studying epidemic outbreaks. We present a computational tool, Infectious Pathogen Detector (IPD), to perform integrated analysis of diverse genomic datasets, with a customized analytical module for the SARS-CoV-2 virus. The IPD pipeline quantifies individual occurrences of 1060 pathogens and performs mutation and phylogenetic analysis on heterogeneous sequencing datasets. Using IPD, we demonstrate a varying burden of SARS-CoV-2 transcripts (5.055 to 999,655.7 fragments per million) across 1500 short- and long-read SARS-CoV-2 sequencing datasets, and we identify 4634 SARS-CoV-2 variants (~3.05 variants per sample) across the genome, including 449 novel variants, with distinct hotspot mutations in the ORF1ab and S genes. We also determine their phylogenetic relationships, establishing the utility of IPD in tracing genome isolates from genomic data (as accessed on 11 June 2020). IPD predicts the occurrence and dynamics of variability among infectious pathogens, with potential direct utility in the COVID-19 pandemic and beyond, helping to automate sequencing-based pathogen analysis and to respond efficiently to public health threats. A desktop application with a graphical user interface (GUI) is freely available for download to academic users at http://www.actrec.gov.in/pi-webpages/AmitDutt/IPD/IPD.html, and web-based processing is available at http://ipd.actrec.gov.in/ipdweb/; both generate an automated report without requiring any prior computational know-how.
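IPD reports pathogen burden in fragments per million (FPM). Below is a minimal sketch of the standard FPM normalization, assuming the denominator is the total number of sequenced fragments in the sample; IPD's exact denominator and counting rules are not described on this page, and the function name and counts are hypothetical. Under this definition FPM cannot exceed 1,000,000, so a value near the upper end of the reported range (999,655.7) would mean that nearly all fragments in that sample were assigned to SARS-CoV-2.

```python
def fragments_per_million(pathogen_fragments: dict[str, int],
                          total_fragments: int) -> dict[str, float]:
    """Scale raw per-pathogen fragment counts to fragments per million (FPM).

    Hypothetical helper: assumes total_fragments is every fragment sequenced
    in the sample, not only those that mapped to a pathogen.
    """
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    return {name: count * 1_000_000 / total_fragments
            for name, count in pathogen_fragments.items()}


# Hypothetical sample with 2,000,000 sequenced fragments in total
counts = {"SARS-CoV-2": 150_000, "Human_rhinovirus": 40}
fpm = fragments_per_million(counts, total_fragments=2_000_000)
print(fpm)  # {'SARS-CoV-2': 75000.0, 'Human_rhinovirus': 20.0}
```

Normalizing by sequencing depth is what makes burdens comparable across short- and long-read runs of very different sizes.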

SUBMITTER: Desai S

PROVIDER: S-EPMC7929363 | biostudies-literature | 2021 Jan

REPOSITORIES: biostudies-literature


Publications

An integrated approach to determine the abundance, mutation rate and phylogeny of the SARS-CoV-2 genome.

Sanket Desai, Sonal Rashmi, Aishwarya Rane, Bhasker Dharavath, Aniket Sawant, Amit Dutt

Briefings in Bioinformatics, 2021-03-01, issue 2


PMID: 33479725

Similar Datasets

Project description: BACKGROUND: The outbreak of severe acute respiratory syndrome (SARS) caused a severe global epidemic in 2003, which led to hundreds of deaths and many thousands of hospitalizations. The virus causing SARS was identified as a novel coronavirus (SARS-CoV), and multiple genomic sequences have been determined since mid-April 2003. After a quiet summer and fall in 2003, newly emerged SARS cases in Asia, particularly the latest cases in China, are reinforcing a widespread belief that the SARS epidemic could return. With the understanding that SARS-CoV might be with humans for years to come, knowledge of its evolutionary mechanism, including its mutation rate and emergence time, is fundamental to battling this deadly pathogen. To date, the speed at which the virus evolved in nature and the time that elapsed before it was transmitted to humans remain poorly understood. RESULTS: Sixteen complete genomic sequences with available clinical histories from the SARS outbreak were analyzed. After careful examination of a multiple-sequence alignment, 114 single nucleotide variations were identified. To minimize the effects of sequencing errors and of additional mutations acquired during cell culture, three strategies were applied to estimate the mutation rate: 1) using closely related sequences as background controls; 2) adjusting the divergence time for cell culture; or 3) using only the common variants. The mutation rate in the SARS-CoV genome was estimated to be 0.80 to 2.38 × 10⁻³ nucleotide substitutions per site per year, which is of the same order of magnitude as in other RNA viruses. The non-synonymous and synonymous substitution rates were estimated to be 1.16 to 3.30 × 10⁻³ and 1.67 to 4.67 × 10⁻³ per site per year, respectively. The most recent common ancestor of the 16 sequences was inferred to have existed as early as the spring of 2002.
CONCLUSIONS: The mutation rates estimated for SARS-CoV using multiple strategies were not unusual among coronaviruses and were moderate compared with those of other RNA viruses. All estimates led to the inference that SARS-CoV could have been present in humans in the spring of 2002 without causing a severe epidemic.
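A substitution rate per site per year and a most-recent-common-ancestor date can both be read off a regression of divergence against sampling time. The sketch below uses root-to-tip regression, a common technique swapped in here for illustration; it is not necessarily any of the three strategies used in the study. The function name, the decimal sampling dates, the substitution counts and the round genome length of 30,000 nt are all hypothetical, and the sketch assumes a strict molecular clock and ignores the phylogenetic non-independence of samples.

```python
def root_to_tip_rate(sample_years, root_to_tip_subs, genome_length):
    """Least-squares regression of divergence from the root on sampling date.

    Returns (substitutions per site per year, estimated root date), assuming
    a strict molecular clock.
    """
    n = len(sample_years)
    if n < 2 or n != len(root_to_tip_subs):
        raise ValueError("need at least two paired observations")
    mean_t = sum(sample_years) / n
    mean_d = sum(root_to_tip_subs) / n
    sxx = sum((t - mean_t) ** 2 for t in sample_years)
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sample_years, root_to_tip_subs))
    if sxx == 0:
        raise ValueError("sampling dates must differ")
    slope = sxy / sxx                  # substitutions per genome per year
    if slope <= 0:
        raise ValueError("no positive temporal signal in the data")
    intercept = mean_d - slope * mean_t
    root_date = -intercept / slope     # date at which divergence extrapolates to zero
    return slope / genome_length, root_date


# Hypothetical toy data: decimal sampling dates and substitutions from the root
years = [2002.5, 2003.0, 2003.5]
subs = [6, 36, 66]
rate, root = root_to_tip_rate(years, subs, genome_length=30_000)
print(f"{rate:.2e} subs/site/year, root ~ {root:.1f}")  # 2.00e-03 subs/site/year, root ~ 2002.4
```

The slope gives the rate and the x-intercept gives the root date, which is why a single regression informs both the mutation rate and the emergence time.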

Project description: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has resulted in 92 million cases in a span of 1 year. This study focuses on understanding the population-specific variations that may account for the high rate of infection in specific geographical regions, particularly the United States. Rigorous phylogenomic network analysis of 245 complete SARS-CoV-2 genomes inferred five central clades, named a (ancestral), b, c, d, and e (subtypes e1 and e2). Clade d and subclade e2 were found to consist exclusively of U.S. strains. Clades were distinguished by 10 co-mutational combinations in Nsp3, ORF8, Nsp13, S, Nsp12, Nsp2, and Nsp6. Our analysis revealed that only 67.46% of single nucleotide polymorphism (SNP) mutations resulted in amino acid changes. The T1103P mutation in Nsp3 was predicted to increase protein stability in 238 strains, with the exception of 6 strains marked as the ancestral type, whereas the Nsp13 co-mutation pair P409L and Y446C was found in 64 genomes from the United States, highlighting their 100% co-occurrence. Docking indicated that the D614G mutation reduced binding of the spike protein to angiotensin-converting enzyme 2 (ACE2) but improved its interaction with the TMPRSS2 receptor, which may contribute to high transmissibility among U.S. strains. We also found that the host proteins MYO5A, MYO5B, and MYO5C had maximal interaction with viral proteins (nucleocapsid [N], spike [S], and membrane [M]). Thus, blocking the internalization pathway by inhibiting MYO5 proteins could be an effective strategy for coronavirus disease 2019 (COVID-19) treatment. The functional annotations of the host-pathogen interaction (HPI) network were closely associated with hypoxia and thrombotic conditions, confirming the vulnerability and severity of infection.
We also screened CpG islands in Nsp1 and N that confer on SARS-CoV-2 the ability to enter the host cell and trigger zinc antiviral protein (ZAP) activity. IMPORTANCE: In the current study, we present a global view of the mutational pattern observed in SARS-CoV-2 transmission, providing a who-infects-whom geographical model since the early pandemic. To date, this is the most comprehensive comparative genomics analysis of full-length genomes for co-mutations across geographical regions, especially in U.S. strains. Compositional structural biology results suggested that mutations exert a balance of opposing forces on pathogenicity, such that only a few mutations are effective at the translation level. Novel HPI analysis and CpG predictions provide proof of concept for the hypoxia and thrombotic conditions seen in several patients. Thus, the current study focuses on understanding the population-specific variations that may account for the high rate of SARS-CoV-2 infection in specific geographical regions, which may eventually be vital for the most severely affected countries and regions in rapidly developing custom-made intervention strategies.
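The description reports 100% co-occurrence of the Nsp13 mutations P409L and Y446C. The sketch below is one way to quantify co-occurrence from per-genome mutation calls. It uses the Jaccard index (genomes carrying both mutations divided by genomes carrying either), which is an assumed metric and not necessarily the study's. The function name, the `protein:change` label convention and the toy genomes are hypothetical.

```python
def co_occurrence(genomes, mut_a, mut_b):
    """Fraction of genomes carrying either mutation that carry both (Jaccard index).

    genomes maps a genome ID to the set of mutations called in that genome.
    A value of 1.0 means the two mutations are never observed apart.
    """
    with_a = {gid for gid, muts in genomes.items() if mut_a in muts}
    with_b = {gid for gid, muts in genomes.items() if mut_b in muts}
    either = with_a | with_b
    if not either:
        return 0.0  # neither mutation observed in any genome
    return len(with_a & with_b) / len(either)


# Hypothetical toy calls (not data from the study)
genomes = {
    "g1": {"Nsp13:P409L", "Nsp13:Y446C", "S:D614G"},
    "g2": {"Nsp13:P409L", "Nsp13:Y446C"},
    "g3": {"S:D614G"},
}
print(co_occurrence(genomes, "Nsp13:P409L", "Nsp13:Y446C"))  # 1.0
print(co_occurrence(genomes, "Nsp13:P409L", "S:D614G"))      # 0.3333333333333333
```

Dividing by genomes with either mutation, rather than by all genomes, keeps a rare but perfectly linked pair at 1.0 instead of reporting its low overall prevalence.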

Project description: Background and objectives: To understand how organisms evolve, it is fundamental to study how mutations emerge and become established. Here, we estimated the rate of mutation accumulation of SARS-CoV-2 in vitro and investigated the repeatability of its evolution when facing a new cell type but no immune or drug pressures. Methodology: We performed experimental evolution with two strains of SARS-CoV-2, one carrying the originally described spike protein (CoV-2-D) and another carrying the D614G mutation that has spread worldwide (CoV-2-G). After 15 passages in Vero cells and whole-genome sequencing, we characterized the spectrum and rate of the emerging mutations and looked for evidence of selection across the genomes of both strains. Results: From the frequencies of the accumulated mutations, and excluding the genes with signals of selection, we estimate a spontaneous mutation rate of 1.3 × 10⁻⁶ ± 0.2 × 10⁻⁶ per base per infection cycle (mean across both lineages of SARS-CoV-2 ± 2 SEM). We further show that mutation accumulation is larger in the CoV-2-D lineage and heterogeneous along the genome, consistent with the action of positive selection on the spike protein, which accumulated five times more mutations than the corresponding genomic average. We also observe the emergence of mutators in the CoV-2-G background, likely linked to mutations in the RNA-dependent RNA polymerase and/or in the error-correcting exonuclease protein. Conclusions and implications: These results provide valuable information on how spontaneous mutations emerge in SARS-CoV-2 and on how selection can shape its genome toward adaptation to new environments. Lay Summary: Each time a virus replicates inside a cell, errors (mutations) occur. Here, via laboratory propagation in cells originally isolated from the kidney epithelium of African green monkeys, we estimated the rate at which the SARS-CoV-2 virus mutates, an important parameter for understanding how it can evolve within and across humans.
We also confirm the potential of its spike protein to adapt to a new environment and report the emergence of mutators: viral populations in which mutations occur at a significantly faster rate.
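The reported rate is a per-base, per-infection-cycle estimate summarized as a mean across lineages ± 2 SEM. The sketch below shows the shape of such a calculation under a simplified mutation-accumulation model: it assumes neutral mutations whose population frequencies, summed over the genome, grow linearly with the number of infection cycles, so the per-cycle rate is that sum divided by (sites × cycles). This is an assumed simplification, not the study's exact estimator. The function names, the summed frequencies, the site count and the number of cycles are hypothetical.

```python
import statistics


def per_cycle_rate(summed_mutation_freq, sites, cycles):
    """Simplified mutation-accumulation estimate (mutations per base per cycle).

    Assumes neutral mutations whose population frequencies, summed over all
    sites, grow linearly with the number of infection cycles.
    """
    return summed_mutation_freq / (sites * cycles)


def mean_and_2sem(values):
    """Return the mean and twice the standard error of the mean."""
    sem = statistics.stdev(values) / len(values) ** 0.5
    return statistics.mean(values), 2 * sem


# Hypothetical inputs for two lineages (not data from the study)
lineage_rates = [
    per_cycle_rate(0.36, sites=30_000, cycles=10),
    per_cycle_rate(0.48, sites=30_000, cycles=10),
]
mean, two_sem = mean_and_2sem(lineage_rates)
print(f"{mean:.2e} ± {two_sem:.2e} per base per cycle")  # 1.40e-06 ± 4.00e-07 per base per cycle
```

Excluding genes under selection, as the study describes, matters for this kind of estimator: positively selected mutations rise in frequency faster than neutral ones and would inflate the summed frequency.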
