Unknown

Dataset Information

0

Coding potential and sequence conservation of SARS-CoV-2 and related animal viruses.


ABSTRACT: In December 2019, a novel human-infecting coronavirus (SARS-CoV-2) was recognized in China. In a few months, SARS-CoV-2 has caused thousands of disease cases and deaths in several countries. Phylogenetic analyses indicated that SARS-CoV-2 clusters with SARS-CoV in the Sarbecovirus subgenus and viruses related to SARS-CoV-2 were identified from bats and pangolins. Coronaviruses have long and complex genomes with high plasticity in terms of gene content. To date, the coding potential of SARS-CoV-2 remains partially unknown. We thus used available sequences of bat and pangolin viruses to determine the selective events that shaped the genome structure of SARS-CoV-2 and to assess its coding potential. By searching for signals of significantly reduced variability at synonymous sites (dS), we identified six genomic regions, one of these corresponding to the programmed -1 ribosomal frameshift. The most prominent signal of dS reduction was observed within the E gene. A genome-wide analysis of conserved RNA structures indicated that this region harbors a putative functional RNA element that is shared with the SARS-CoV lineage. Additional signals of reduced dS indicated the presence of internal ORFs. Whereas the presence ORF9a (internal to N) was previously proposed by homology with a well characterized protein of SARS-CoV, ORF3h (for hypothetical, within ORF3a) was not previously described. The predicted product of ORF3h has 90% identity with the corresponding predicted product of SARS-CoV and displays features suggestive of a viroporin. Finally, analysis of the putative ORF10 revealed high dN/dS (3.82) in SARS-CoV-2 and related coronaviruses. In the SARS-CoV lineage, the ORF is predicted to encode a truncated protein and is neutrally evolving. These data suggest that ORF10 encodes a functional protein in SARS-CoV-2 and that positive selection is driving its evolution. Experimental analyses will be necessary to validate and characterize the coding and non-coding functional elements we identified.

SUBMITTER: Cagliani R 

PROVIDER: S-EPMC7199688 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

altmetric image

Publications

Coding potential and sequence conservation of SARS-CoV-2 and related animal viruses.

Cagliani Rachele R   Forni Diego D   Clerici Mario M   Sironi Manuela M  

Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases 20200505


In December 2019, a novel human-infecting coronavirus (SARS-CoV-2) was recognized in China. In a few months, SARS-CoV-2 has caused thousands of disease cases and deaths in several countries. Phylogenetic analyses indicated that SARS-CoV-2 clusters with SARS-CoV in the Sarbecovirus subgenus and viruses related to SARS-CoV-2 were identified from bats and pangolins. Coronaviruses have long and complex genomes with high plasticity in terms of gene content. To date, the coding potential of SARS-CoV-2  ...[more]

Similar Datasets

| S-EPMC7217285 | biostudies-literature
| S-EPMC7373990 | biostudies-literature
| S-EPMC3639795 | biostudies-literature
| EMPIAR-10999 | biostudies-other
| PRJNA1128642 | ENA
| S-EPMC7750375 | biostudies-literature
2020-05-08 | GSE149973 | GEO
| S-BSST379 | biostudies-other
| S-EPMC7348026 | biostudies-literature
| S-EPMC8104055 | biostudies-literature