
Dataset Information


Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database.


ABSTRACT:

Background

Hepatitis B virus (HBV) DNA sequence data from thousands of samples are present in the public sequence databases. However, no up-to-date multiple sequence alignments containing both full-length and subgenomic fragments for each genotype are publicly available. Such alignments are useful in many analysis applications, including data mining and phylogenetic analyses.

Results

A query was issued to download all HBV sequence data from the GenBank public database (67,893 sequences). Full-length and subgenomic sequences that had been genotyped by the submitters (30,852 sequences) were placed into a multiple sequence alignment for each genotype (genotype A: 5868 sequences, B: 4630, C: 7820, D: 8300, E: 2043, F: 985, G: 189, H: 108, I: 23), according to the results of offline BLAST searches against a custom reference library of full-length sequences. Further curation was then performed to improve the alignments.
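The step of assigning each sequence to a genotype alignment on the basis of offline BLAST searches can be illustrated with a minimal sketch. It is not the authors' code. It assumes BLAST+ tabular output (`-outfmt 6`, where column 2 is the subject ID and column 12 is the bit score) and a hypothetical naming convention in which each reference sequence ID is prefixed with its genotype (e.g. `A_REF001`). Each query is placed with the genotype of its highest-scoring reference hit.

```python
from collections import defaultdict


def best_hits(blast_lines):
    """Return {query_id: (subject_id, bitscore)}, keeping the top bit score per query.

    Assumes BLAST+ tabular output (-outfmt 6): column 1 is the query ID,
    column 2 the subject (reference) ID and column 12 the bit score.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def group_by_genotype(best):
    """Group query IDs by the genotype of their best reference hit.

    Hypothetical convention: reference IDs look like 'A_REF001', so the text
    before the first underscore is the genotype label.
    """
    groups = defaultdict(list)
    for query, (subject, _score) in best.items():
        genotype = subject.split("_", 1)[0]
        groups[genotype].append(query)
    return dict(groups)
```

Keeping the maximum bit score explicitly, rather than trusting the order of hits in the file, keeps the sketch correct even if the BLAST output has been re-sorted or concatenated from several runs.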

Conclusions

For each of the nine HBV genotypes, the algorithm described in this paper generates a multiple sequence alignment containing both full-length and subgenomic fragments. The alignments can be updated as new sequences become available in the online public sequence databases. The alignments are available at http://hvdr.bioinf.wits.ac.za/alignments.
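The record does not describe how updates are performed. One plausible first step is to find which newly downloaded sequences are not yet in an alignment, and a minimal sketch of that step follows. It assumes, hypothetically, that sequences are tracked by GenBank accession in the first word of each FASTA header and that a change of version suffix (e.g. `.1` to `.2`) does not count as a new sequence. The function names are illustrative only.

```python
def parse_fasta(text):
    """Parse FASTA text into {accession: sequence}, using the first word of each header."""
    records = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # store the previous record
                records[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:  # store the final record
        records[header] = "".join(chunks)
    return records


def new_sequences(downloaded_fasta, aligned_accessions):
    """Return downloaded records whose accession (version suffix ignored) is not yet aligned."""
    def base(accession):
        return accession.split(".", 1)[0]

    aligned = {base(a) for a in aligned_accessions}
    return {
        acc: seq
        for acc, seq in parse_fasta(downloaded_fasta).items()
        if base(acc) not in aligned
    }
```

Only the records returned here would then need to be genotyped and added to the existing alignments. The full download does not need to be reprocessed.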

SUBMITTER: Bell TG 

PROVIDER: S-EPMC5084120 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature


Publications

Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database.

Trevor G. Bell, Mukhlid Yousif, Anna Kramvis

SpringerPlus, 28 October 2016, issue 1



Similar Datasets

| S-EPMC8359073 | biostudies-literature
| S-EPMC9882026 | biostudies-literature
| S-EPMC7426930 | biostudies-literature
| S-EPMC5580456 | biostudies-literature
| S-EPMC6036255 | biostudies-literature
| S-EPMC5572866 | biostudies-literature
2024-10-10 | PXD050548 | Pride
| S-EPMC147227 | biostudies-other
| S-EPMC31274 | biostudies-literature
| S-EPMC3898260 | biostudies-literature