Dataset Information

Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database.


ABSTRACT: BACKGROUND: Hepatitis B virus (HBV) DNA sequence data from thousands of samples are present in the public sequence databases. However, no up-to-date multiple sequence alignments containing full-length and subgenomic fragments per genotype are publicly available. Such alignments are useful in many analysis applications, including data-mining and phylogenetic analyses. RESULTS: By issuing a query, all HBV sequence data in the GenBank public database were downloaded (67,893 sequences). Full-length and subgenomic sequences that had been genotyped by the submitters (30,852 sequences) were placed into a multiple sequence alignment for each genotype (genotype A: 5868 sequences, B: 4630, C: 7820, D: 8300, E: 2043, F: 985, G: 189, H: 108, I: 23), according to the results of offline BLAST searches against a custom reference library of full-length sequences. Further curation was performed to improve the alignments. CONCLUSIONS: The algorithm described in this paper generates, for each of the nine HBV genotypes, a multiple sequence alignment containing full-length and subgenomic fragments. The alignments can be updated as new sequences become available in the online public sequence databases. The alignments are available at http://hvdr.bioinf.wits.ac.za/alignments.
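The placement step in the abstract (positioning each genotyped fragment in its genotype's alignment according to a BLAST hit against a full-length reference) can be illustrated with a minimal sketch. This is not the authors' implementation: the record fields (`id`, `genotype`, `seq`, `ref_start`), the function names, and the toy reference lengths are all hypothetical, and the sketch assumes ungapped hits whose 1-based start coordinate on the reference is already known.

```python
from collections import defaultdict

GAP = "-"


def place_fragment(seq, ref_start, ref_length):
    """Pad a fragment with gaps so it occupies reference columns
    ref_start .. ref_start + len(seq) - 1 (1-based, ungapped hit assumed)."""
    if ref_start < 1 or ref_start - 1 + len(seq) > ref_length:
        raise ValueError("fragment does not fit within the reference coordinates")
    left = ref_start - 1
    right = ref_length - left - len(seq)
    return GAP * left + seq + GAP * right


def build_alignments(records, ref_lengths):
    """Group records by submitter-assigned genotype and place each one
    in that genotype's alignment. All field names are hypothetical."""
    alignments = defaultdict(list)
    for rec in records:
        length = ref_lengths[rec["genotype"]]
        row = place_fragment(rec["seq"], rec["ref_start"], length)
        alignments[rec["genotype"]].append((rec["id"], row))
    return dict(alignments)


# Toy data only: real HBV genomes are roughly 3.2 kb, and real BLAST hits
# may contain indels that this ungapped sketch does not handle.
records = [
    {"id": "seq1", "genotype": "A", "seq": "ACGTACGTAC", "ref_start": 1},
    {"id": "seq2", "genotype": "A", "seq": "GTAC", "ref_start": 3},
    {"id": "seq3", "genotype": "D", "seq": "TTGCA", "ref_start": 4},
]
ref_lengths = {"A": 10, "D": 8}

alignments = build_alignments(records, ref_lengths)
for genotype, rows in sorted(alignments.items()):
    for seq_id, row in rows:
        print(genotype, seq_id, row)
```

Every row in a genotype's alignment ends up with the same length as that genotype's reference, so full-length and subgenomic sequences share one coordinate system. Handling insertions and deletions, as the further curation mentioned in the abstract presumably does, would need more than gap padding.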

SUBMITTER: Bell TG 

PROVIDER: S-EPMC5084120 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

Publications

Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database.

Bell TG, Yousif M, Kramvis A

SpringerPlus, 2016-10-28, 1


Similar Datasets

| S-EPMC8359073 | biostudies-literature
| S-EPMC9882026 | biostudies-literature
| S-EPMC7426930 | biostudies-literature
| S-EPMC5580456 | biostudies-literature
| S-EPMC6036255 | biostudies-literature
| S-EPMC5572866 | biostudies-literature
2024-10-10 | PXD050548 | Pride
| S-EPMC147227 | biostudies-other
| S-EPMC31274 | biostudies-literature
| S-EPMC3898260 | biostudies-literature