Computational Evaluation of the Strict Master and Random Template Models of Endogenous Retrovirus Evolution.
ABSTRACT: Transposable elements (TEs) are DNA sequences that are able to replicate and move within and between host genomes. Their mechanism of replication is shared with endogenous retroviruses (ERVs), a type of TE that represents ancient retroviral infections of animal genomes. Two models have been proposed to explain TE proliferation in host genomes: the strict master model (SMM) and the random template (or transposon) model (TM). In SMM, only a single copy of a given TE lineage is able to replicate, and all other genomic copies are derived from that master copy. In TM, any element of a given family is able to replicate in the host genome. In this paper, we simulated ERV phylogenetic trees under variations of SMM and TM. To test whether current phylogenetic programs can recover the simulated ERV phylogenies, DNA sequence alignments were simulated, and maximum likelihood trees were reconstructed and compared to the simulated phylogenies. Results indicate that visual inspection of phylogenetic trees alone can be misleading. However, if a set of statistical summaries is calculated, we are able to distinguish between models with high accuracy by using a data mining algorithm that we introduce here. We also demonstrate the use of our data mining algorithm with empirical data for the porcine endogenous retrovirus (PERV), an ERV that is able to replicate in human and pig cells in vitro.
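The difference between the two models can be illustrated with a minimal Python sketch. It grows a tree topology only (no branch lengths, no sequence evolution) and is not the simulation procedure used in the paper. The function names and the choice of the Colless imbalance index as a summary statistic are assumptions made for illustration; the abstract does not say which statistical summaries were calculated.

```python
import random

def simulate_tree(n_copies, model, seed=0):
    """Grow a binary tree with n_copies leaves (hypothetical helper).

    model="SMM": only the master lineage replicates.
    model="TM":  any existing copy, chosen uniformly, replicates.
    Returns a dict mapping each internal node to its two children; node 0 is the root.
    """
    rng = random.Random(seed)
    children = {}
    leaves = [0]
    master = 0
    next_id = 1
    while len(leaves) < n_copies:
        if model == "SMM":
            parent = master
        elif model == "TM":
            parent = rng.choice(leaves)
        else:
            raise ValueError("model must be 'SMM' or 'TM'")
        a, b = next_id, next_id + 1
        next_id += 2
        children[parent] = (a, b)
        leaves.remove(parent)
        leaves.extend([a, b])
        if parent == master:
            master = a  # the master lineage continues as child a; b is a new copy
    return children

def colless(children, root=0):
    """Colless imbalance: sum over internal nodes of |leaves(left) - leaves(right)|."""
    total = 0

    def count_leaves(node):
        nonlocal total
        if node not in children:
            return 1
        left, right = children[node]
        n_left, n_right = count_leaves(left), count_leaves(right)
        total += abs(n_left - n_right)
        return n_left + n_right

    count_leaves(root)
    return total

smm_imbalance = colless(simulate_tree(10, "SMM"))
tm_imbalance = colless(simulate_tree(10, "TM", seed=1))
```

Under this sketch the SMM tree is always a fully unbalanced (comb-shaped) tree, which has the largest possible Colless value for its size, while TM trees vary from run to run. Trees reconstructed from real or simulated alignments may not show such a clean contrast, consistent with the abstract's caution about visual inspection.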
Project description:Background: Endogenous retroviruses (ERVs), which blur the boundary between virus and transposable element, are genetic material derived from retroviruses and have important implications for evolution. This study examines the diversity and evolution of human endogenous retroviruses (HERVs) of the HERVL family, which has long terminal repeats (LTRs) named MLT2. Results: By probability-based sequence comparison, we uncover systematic annotation errors that conceal the true complexity and diversity of transposable elements (TEs) in the human genome. Our analysis identifies new subfamilies within the MLT2 group, proposes a refined classification scheme, and constructs new consensus sequences. We present an evolutionary analysis including phylogenetic trees that elucidate the relationships between these subfamilies and their contributions to human evolution. The results underscore the significance of accurate TE annotation in understanding genome evolution, highlighting the potential for misclassified TEs to impact interpretations of genomic studies. Availability and implementation: Not applicable.
Project description:Retroviruses manifest a very rich ensemble of genome structures. The evolution of retroviruses varies enormously, with fixation rates varying by as much as a million fold. The emergence of novel genome structures follows remorselessly with the fixation of point mutations and is most apparent for the lentivirus subgroup that has burst on the scene recently. Accordingly, bio-logic suggests that new genome structures will emerge among the lentiviruses, most notably HIV-1.
Project description:Here, we present the complete genome sequence of a porcine endogenous retrovirus determined by Pacific Biosciences sequencing. A comparison of the genome of this isolate with those of other strains revealed the operation of a mechanism resulting in the selective accumulation of G and C bases in the viral DNA.
Project description:The neuronal gene Arc is essential for long-lasting information storage in the mammalian brain and has been implicated in various neurological disorders. However, little is known about Arc's evolutionary origins. Recent studies suggest that mammalian Arc originated from a vertebrate lineage of Ty3/gypsy retrotransposons, which are also ancestral to retroviruses. In particular, Arc contains homology to the Gag polyprotein that forms the viral capsid and is essential for viral infectivity. This surprising connection raises the intriguing possibility that Arc may share molecular characteristics of retroviruses.
Project description:The human genome harbors many distinct families of human endogenous retroviruses (HERVs) that stem from exogenous retroviruses that infected the germ line millions of years ago. Many HERV families remain to be investigated. We report in the present study the detailed characterization of the HERV-K14I and HERV-K14CI families as they are represented in the human genome. Most of the 68 HERV-K14I and 23 HERV-K14CI proviruses are severely mutated, frequently displaying uniform deletions of retroviral genes and long terminal repeats (LTRs). Both HERV families entered the germ line approximately 39 million years ago, as evidenced by homologous sequences in hominoids and Old World primates and calculation of evolutionary ages based on a molecular clock. Proviruses of both families were formed during a brief period. A majority of HERV-K14CI proviruses on the Y chromosome mimic a higher evolutionary age, showing that LTR-LTR divergence data can indicate false ages. Fully translatable consensus sequences encoding major retroviral proteins were generated. Most HERV-K14I loci lack an env gene and are structurally reminiscent of LTR retrotransposons. A minority of HERV-K14I variants display an env gene. HERV-K14I proviruses are associated with three distinct LTR families, while HERV-K14CI is associated with a single LTR family. Hybrid proviruses consisting of HERV-K14I and HERV-W sequences that appear to have produced provirus progeny in the genome were detected. Several HERV-K14I proviruses harbor TRPC6 mRNA portions, exemplifying mobilization of cellular transcripts by HERVs. Our analysis contributes essential information on two more HERV families and on the biology of HERV sequences in general.
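The LTR-LTR divergence ages mentioned above are commonly estimated with a simple molecular clock: the two LTRs of a provirus are identical at integration and then diverge independently, so the age is the divergence divided by twice the substitution rate. The sketch below assumes that standard formula; the abstract does not state the exact calculation or rate used, and the function name and example values are hypothetical.

```python
def ltr_age_years(divergence, rate_per_site_per_year):
    """Estimate provirus age from LTR-LTR divergence.

    divergence: substitutions per site between the 5' and 3' LTR.
    rate_per_site_per_year: assumed neutral substitution rate (a user-supplied assumption).
    Both LTRs accumulate changes independently, hence the factor of 2.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2.0 * rate_per_site_per_year)

# Illustrative values only, not taken from the study.
example_age = ltr_age_years(divergence=0.1, rate_per_site_per_year=1e-9)
```

Because the estimate scales inversely with the assumed rate, loci in regions with an atypical substitution rate will yield misleading ages, which is in line with the abstract's remark about proviruses on the Y chromosome.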
Project description:The human endogenous retrovirus type-H (HERVH) family is expressed in the preimplantation embryo. A subset of these elements are specifically transcribed in pluripotent stem cells where they appear to exert regulatory activities promoting self-renewal and pluripotency. How HERVH elements achieve such transcriptional specificity remains poorly understood. To uncover the sequence features underlying HERVH transcriptional activity, we performed a phyloregulatory analysis of the long terminal repeats (LTR7) of the HERVH family, which harbor its promoter, using a wealth of regulatory genomics data. We found that the family includes at least eight previously unrecognized subfamilies that have been active at different timepoints in primate evolution and display distinct expression patterns during human embryonic development. Notably, nearly all HERVH elements transcribed in ESCs belong to one of the youngest subfamilies we dubbed LTR7up. LTR7 sequence evolution was driven by a mixture of mutational processes, including point mutations, duplications, and multiple recombination events between subfamilies, that led to transcription factor binding motif modules characteristic of each subfamily. Using a reporter assay, we show that one such motif, a predicted SOX2/3 binding site unique to LTR7up, is essential for robust promoter activity in induced pluripotent stem cells. Together these findings illuminate the mechanisms by which HERVH diversified its expression pattern during evolution to colonize distinct cellular niches within the human embryo.
Project description:Human endogenous retrovirus type K (HERV-K) transcripts are upregulated in the plasma of HIV-infected individuals and have been considered as targets for an HIV vaccine. We evaluated cynomolgus macaque endogenous retrovirus (CyERV) mRNA expression by RT-qPCR in PBMCs isolated from a cohort of animals previously utilized in a live attenuated SIV vaccine trial. CyERV env transcript levels decreased following vaccination (control and vaccine groups), and CyERV env and gag mRNA expression decreased following acute SIV infection, whereas during chronic SIV infection, CyERV transcript levels were indistinguishable from baseline. Reduced susceptibility to initial SIV infection, as measured by the number of SIV challenges required for infection, was associated with increased CyERV transcript levels in PBMCs. In vitro analysis revealed that SIV infection of purified CD4+ T cells did not alter CyERV gene expression. This study represents the first evaluation of ERV expression in cynomolgus macaques following SIV infection, in an effort to assess the utility of cynomolgus macaques as an animal model to evaluate ERVs as a target for an HIV/SIV vaccine. This non-human primate model system does not recapitulate what has been observed to date in the plasma of HIV-infected humans, suggesting that further investigation at the cellular level is required to elucidate the impact of HIV/SIV infection on endogenous retrovirus expression.
Project description:Human Endogenous Retrovirus type K (HERV-K) is the only HERV known to be insertionally polymorphic; not all individuals have a retrovirus at a specific genomic location. It is possible that HERV-Ks contribute to human disease because people differ in both number and genomic location of these retroviruses. Indeed viral transcripts, proteins, and antibody against HERV-K are detected in cancers, auto-immune, and neurodegenerative diseases. However, attempts to link a polymorphic HERV-K with any disease have been frustrated in part because population prevalence of HERV-K provirus at each polymorphic site is lacking and it is challenging to identify closely related elements such as HERV-K from short read sequence data. We present an integrated and computationally robust approach that uses whole genome short read data to determine the occupation status at all sites reported to contain a HERV-K provirus. Our method estimates the proportion of fixed length genomic sequence (k-mers) from whole genome sequence data matching a reference set of k-mers unique to each HERV-K locus and applies mixture model-based clustering of these values to account for low depth sequence data. Our analysis of 1000 Genomes Project Data (KGP) reveals numerous differences among the five KGP super-populations in the prevalence of individual and co-occurring HERV-K proviruses; we provide a visualization tool to easily depict the proportion of the KGP populations with any combination of polymorphic HERV-K provirus. Further, because HERV-K is insertionally polymorphic, the genome burden of known polymorphic HERV-K is variable in humans; this burden is lowest in East Asian (EAS) individuals. Our study identifies population-specific sequence variation for HERV-K proviruses at several loci. We expect these resources will advance research on HERV-K contributions to human diseases.
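The k-mer matching step described above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' tool: the function names are hypothetical, reverse complements and sequencing errors are ignored, and the mixture model-based clustering that follows in the described method is omitted.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def locus_kmer_proportion(reads, locus_kmers, k):
    """Proportion of a locus's reference k-mers observed in a set of reads.

    reads: iterable of read sequences (strings).
    locus_kmers: set of k-mers assumed to be unique to one HERV-K locus.
    """
    if not locus_kmers:
        raise ValueError("locus_kmers must not be empty")
    observed = set()
    for read in reads:
        observed |= kmers(read, k) & locus_kmers
    return len(observed) / len(locus_kmers)

# Toy example: a short made-up "locus" and a single read covering part of it.
reference = kmers("ACGTACGGA", 4)
proportion = locus_kmer_proportion(["ACGTAC"], reference, 4)
```

A proportion near 1 would suggest the provirus is present at that locus and a value near 0 that it is absent. Intermediate values from low-depth data are the case the described clustering step is meant to handle.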
Project description:More than eight percent of the human genome consists of human endogenous retroviruses (HERVs). Typically, the expression of HERVs is repressed, but varying activities of HERVs have been observed in diseases ranging from cancer to neuro-degeneration. Such activities can include the transcription of HERV-derived open reading frames, which can be translated into proteins. However, as a consequence of mutations that disrupt open reading frames, most HERV-like sequences have lost their protein-coding capacity. Nevertheless, these loci can still influence the expression of adjacent genes and, hence, mediate biological effects. Here, we present WebHERV (http://calypso.informatik.uni-halle.de/WebHERV/), a web server that enables the computational prediction of active HERV-like sequences in the human genome based on a comparison of genome coordinates of expressed sequences uploaded by the user and genome coordinates of HERV-like sequences stored in the specialized key-value store DRUMS. Using WebHERV, we predicted putative candidates of active HERV-like sequences in Hodgkin lymphoma (HL) cell lines, validated one of them by a modified SMART (switching mechanism at 5' end of RNA template) technique, and identified a new alternative transcription start site for cytochrome P450, family 4, subfamily Z, polypeptide 1 (CYP4Z1).
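The coordinate comparison described above amounts to an interval overlap test. The following Python sketch shows the idea under stated assumptions: intervals are half-open (chromosome, start, end) tuples, the function name is hypothetical, and nothing here reflects the actual WebHERV or DRUMS implementation.

```python
def overlapping_loci(expressed, herv_loci):
    """Return HERV-like loci that overlap at least one expressed interval.

    Both arguments are lists of (chrom, start, end) tuples with half-open
    coordinates, so intervals that merely touch do not count as overlapping.
    """
    hits = []
    for chrom, start, end in herv_loci:
        if any(c == chrom and s < end and start < e for c, s, e in expressed):
            hits.append((chrom, start, end))
    return hits

# Made-up coordinates for illustration.
expressed = [("chr1", 100, 200)]
loci = [("chr1", 150, 300), ("chr1", 200, 250), ("chr2", 150, 300)]
candidates = overlapping_loci(expressed, loci)
```

This linear scan is adequate for small inputs; a genome-scale comparison would normally use a sorted or indexed interval structure instead.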