The Variability of Amino Acid Sequences in Hepatitis B Virus.
ABSTRACT: Hepatitis B virus (HBV) is an important human pathogen belonging to the genus Orthohepadnavirus in the family Hepadnaviridae. Over 240 million people are infected with HBV worldwide. Reverse transcription during genome replication results in low-fidelity DNA synthesis, which is the source of variability in the viral proteins. To investigate this variability quantitatively, we retrieved 5,167 amino acid sequence records covering all available HBV genotypes (A-J) from the GenBank database. The amino acid sequences encoded by the open reading frames (ORFs) S, C, P, and X in the HBV genome were extracted and aligned. We analyzed the variability of the protein lengths and sequences as well as the amino acid frequencies. This analysis comprehensively characterized the variability and conservation of HBV proteins at the amino acid level. In particular, the structural proteins, the hepatitis B surface antigens (HBsAg), contain potential sites critical for virus assembly and immune recognition. Interestingly, the preS1 domain of HBsAg was variable at some residue positions, which provides a potential mechanism of immune escape for HBV, whereas the preS2 and S domains were conserved in sequence length. In the S domain, the cysteine residues and the alpha-helix and beta-sheet secondary structures were likely critical for the stable folding of all HBsAg components. In addition, the preC domain and the C-terminal domain of the core protein were highly conserved. In contrast, the polymerase (HBpol) and HBx were highly variable at the amino acid level. Our research provides a basis for understanding the conserved and important domains of HBV proteins, which could be potential targets for antiviral therapy.
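The abstract does not state which variability measure was used. As an illustration only, the sketch below shows one common way to quantify per-position variability and amino acid frequencies in an alignment: counting residues per column and computing the Shannon entropy of each column. The choice of Shannon entropy, the function names, and the toy sequences are all assumptions made for this example and are not taken from the study.

```python
from collections import Counter
from math import log2


def column_frequencies(alignment, gap="-"):
    """Return one Counter of residue counts per alignment column, ignoring gaps."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        Counter(seq[i] for seq in alignment if seq[i] != gap)
        for i in range(length)
    ]


def shannon_entropy(counts):
    """Shannon entropy (bits) of a residue count table; 0 means fully conserved."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # column contains only gaps
    return -sum((n / total) * log2(n / total) for n in counts.values())


def variability_profile(alignment):
    """Per-column entropy for a list of equal-length aligned sequences."""
    return [shannon_entropy(c) for c in column_frequencies(alignment)]


# Hypothetical toy alignment (not real HBV data): four aligned 5-residue fragments.
toy = ["MGQNL", "MGQNL", "MGTNL", "MG-SL"]
profile = variability_profile(toy)
for position, entropy in enumerate(profile, start=1):
    print(f"position {position}: entropy = {entropy:.3f} bits")
```

Columns with an entropy of zero are fully conserved in the toy input, while higher values mark more variable positions. A real analysis would read the aligned ORF translations from a file rather than a hard-coded list.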
SUBMITTER: Cao J
PROVIDER: S-EPMC6420546 | biostudies-literature | 2019 Feb
REPOSITORIES: biostudies-literature