
Dataset Information


Geographically-stratified HIV-1 group M pol subtype and circulating recombinant form sequences.


ABSTRACT: Accurate classification of HIV-1 group M lineages, henceforth referred to as subtyping, is essential for understanding global HIV-1 molecular epidemiology. Because most HIV-1 sequencing is done for genotypic resistance testing of the pol gene, we sought to develop a set of geographically-stratified pol sequences that represent HIV-1 group M sequence diversity. Representative pol sequences differ from representative complete genome sequences because not all CRFs have pol recombination points and because complete genome sequences may not faithfully reflect HIV-1 pol diversity. We developed a software pipeline that compiled 6,034 one-per-person complete HIV-1 pol sequences, annotated by country and year, belonging to 11 pure subtypes and 70 CRFs. For each subtype/CRF and country, the pipeline selected a set of sequences whose average distance to the remaining sequences is minimized, generating a Geographically-Stratified set of 716 Pol Subtype/CRF (GSPS) reference sequences. We provide extensive data on pol diversity within each subtype/CRF and country combination. The GSPS reference set will also be useful for HIV-1 pol subtyping.
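
The selection step described in the abstract (choosing, within each subtype/CRF and country, the sequences with the smallest average distance to the rest) can be illustrated with a minimal sketch. This is not the authors' pipeline: the distance measure (a simple proportion of differing sites between aligned sequences), the record layout, the function names, and the `n_per_group` parameter are all assumptions made for illustration, and the abstract does not state how many sequences are kept per group.

```python
from collections import defaultdict
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences.

    Assumption: the abstract does not name the distance measure used.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def select_representatives(records, n_per_group=1):
    """Pick the sequence(s) closest, on average, to the rest of each group.

    records: iterable of (seq_id, subtype, country, aligned_sequence) tuples
             (a hypothetical layout, not the dataset's actual format).
    Returns a dict mapping (subtype, country) to a list of selected seq_ids.
    """
    groups = defaultdict(list)
    for seq_id, subtype, country, seq in records:
        groups[(subtype, country)].append((seq_id, seq))

    chosen = {}
    for key, members in groups.items():
        # Sum each sequence's distance to every other member of its group.
        totals = {seq_id: 0.0 for seq_id, _ in members}
        for (id_a, seq_a), (id_b, seq_b) in combinations(members, 2):
            d = p_distance(seq_a, seq_b)
            totals[id_a] += d
            totals[id_b] += d
        # Within a group, ranking by total distance is the same as ranking
        # by average distance; ties are broken by seq_id for determinism.
        ranked = sorted(totals, key=lambda sid: (totals[sid], sid))
        chosen[key] = ranked[:n_per_group]
    return chosen


# Toy data, invented for illustration only.
records = [
    ("s1", "B", "US", "ACGTACGT"),
    ("s2", "B", "US", "ACGTACGA"),
    ("s3", "B", "US", "ACGTTCGA"),
    ("s4", "C", "ZA", "ACGTACGT"),
]
representatives = select_representatives(records)
```

In the toy data, `s2` differs from each of `s1` and `s3` at one site while `s1` and `s3` differ from each other at two, so `s2` is the sequence with the smallest average distance in the ("B", "US") group.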

SUBMITTER: Rhee SY 

PROVIDER: S-EPMC6067049 | biostudies-literature | 2018 Jul

REPOSITORIES: biostudies-literature


Publications

Geographically-stratified HIV-1 group M pol subtype and circulating recombinant form sequences.

Rhee SY, Shafer RW

Scientific Data, 2018-07-31



Similar Datasets

| S-EPMC3569284 | biostudies-literature
| S-EPMC3147966 | biostudies-literature
| S-EPMC2794860 | biostudies-literature
| S-EPMC4743977 | biostudies-literature
| S-EPMC3695425 | biostudies-literature
| S-EPMC2964586 | biostudies-literature
| S-EPMC4929356 | biostudies-literature
| S-EPMC3675511 | biostudies-literature
| S-EPMC7262644 | biostudies-literature
| S-EPMC3457179 | biostudies-literature