Dataset Information

Pitfalls of barcodes in the study of worldwide SARS-CoV-2 variation and phylodynamics.


ABSTRACT: Analysis of SARS-CoV-2 genome variation using a minimal set of selected informative sites that constitute a genetic barcode has several drawbacks. We show that purely mathematical procedures for site selection should be supervised by the known phylogeny (i) to ensure that solid tree branches are represented instead of mutational hotspots with poor phylogeographic properties, and (ii) to avoid phylogenetic redundancy. We propose a procedure that prevents information redundancy in site selection by considering the cumulative informativeness of previously selected sites (as a proxy for phylogeny-based criteria). This procedure demonstrates that, for short barcodes (e.g., 11 sites), there are thousands of informative site combinations that improve on previous proposals. We also show that barcodes based on worldwide databases inevitably prioritize variants located at the basal nodes of the phylogeny, such that most representative genomes in these ancestral nodes are no longer in circulation. Consequently, coronavirus phylodynamics cannot be properly captured by universal genomic barcodes, because most SARS-CoV-2 variation is generated in geographically restricted areas through the continuous introduction of domestic variants.
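The idea of scoring each candidate site by its cumulative informativeness, rather than by its informativeness in isolation, can be illustrated with a minimal sketch. It assumes that informativeness is measured as the Shannon entropy of the joint haplotypes defined by the selected sites and that sites are added greedily; this is an illustrative stand-in, not necessarily the authors' criterion, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def joint_entropy(seqs, sites):
    """Shannon entropy (bits) of the haplotypes formed by the given sites.

    Assumption: entropy stands in for 'informativeness'; the study's own
    measure may differ.
    """
    counts = Counter(tuple(s[i] for i in sites) for s in seqs)
    n = len(seqs)
    return -sum(c / n * log2(c / n) for c in counts.values())


def naive_barcode(seqs, k):
    """Hypothetical baseline: rank sites by their individual entropy."""
    sites = range(len(seqs[0]))
    return sorted(sites, key=lambda i: joint_entropy(seqs, [i]), reverse=True)[:k]


def greedy_barcode(seqs, k):
    """Hypothetical cumulative selection: each new site must add information
    to the sites already chosen, so redundant sites are skipped."""
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(len(seqs[0])) if i not in chosen]
        best = max(candidates, key=lambda i: joint_entropy(seqs, chosen + [i]))
        chosen.append(best)
    return chosen


# Toy alignment: sites 0 and 1 always co-vary (redundant); site 2 is independent.
seqs = ["AAC", "AAC", "GGC", "GGT"]
print(naive_barcode(seqs, 2))   # [0, 1]: two sites carrying the same information
print(greedy_barcode(seqs, 2))  # [0, 2]: the second site adds new information
```

In this toy case the per-site ranking picks two perfectly redundant sites (joint entropy 1 bit), whereas the cumulative procedure picks a complementary pair (joint entropy 1.5 bits).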

SUBMITTER: Pardo-Seco J 

PROVIDER: S-EPMC7840454 | biostudies-literature | 2021 Jan

REPOSITORIES: biostudies-literature

Publications

Pitfalls of barcodes in the study of worldwide SARS-CoV-2 variation and phylodynamics.

Pardo-Seco J, Gómez-Carballa A, Bello X, Martinón-Torres F, Salas A

Zoological Research, 2021 Jan 1; issue 1

Similar Datasets

| S-EPMC8241501 | biostudies-literature
| S-EPMC7720652 | biostudies-literature
| S-EPMC7605265 | biostudies-literature
| S-BSST379 | biostudies-other
2017-08-01 | GSE101431 | GEO
| S-EPMC8548624 | biostudies-literature
| S-EPMC7584920 | biostudies-literature
| S-EPMC8323504 | biostudies-literature
| S-EPMC7444586 | biostudies-literature
| S-SCDT-10_1038-S44318-024-00061-0 | biostudies-other