The parasite Trichomonas vaginalis expresses thousands of pseudogenes and long non-coding RNAs independently from functional neighbouring genes.
Ontology highlight
ABSTRACT: The human pathogen Trichomonas vaginalis is a parabasalian flagellate that is estimated to infect 3% of the world's population annually. With a 160 megabase genome and up to 60,000 genes residing in six chromosomes, the parasite has the largest genome among sequenced protists. Although it is thought that the genome size and unusual large coding capacity is owed to genome duplication events, the exact reason and its consequences are less well studied.Among transcriptome data we found thousands of instances, in which reads mapped onto genomic loci not annotated as genes, some reaching up to several kilobases in length. At first sight these appear to represent long non-coding RNAs (lncRNAs), however, about half of these lncRNAs have significant sequence similarities to genomic loci annotated as protein-coding genes. This provides evidence for the transcription of hundreds of pseudogenes in the parasite. Conventional lncRNAs and pseudogenes are expressed in Trichomonas through their own transcription start sites and independently from flanking genes in Trichomonas. Expression of several representative lncRNAs was verified through reverse-transcriptase PCR in different T.?vaginalis strains and case studies exclude the use of alternative start codons or stop codon suppression for the genes analysed.Our results demonstrate that T.?vaginalis expresses thousands of intergenic loci, including numerous transcribed pseudogenes. In contrast to yeast these are expressed independently from neighbouring genes. Our results furthermore illustrate the effect genome duplication events can have on the transcriptome of a protist. The parasite's genome is in a steady state of changing and we hypothesize that the numerous lncRNAs could offer a large pool for potential innovation from which novel proteins or regulatory RNA units could evolve.
SUBMITTER: Woehle C
PROVIDER: S-EPMC4223856 | biostudies-literature | 2014 Oct
REPOSITORIES: biostudies-literature
ACCESS DATA