Dataset Information

Long-read cDNA sequencing identifies functional pseudogenes in the human transcriptome

ABSTRACT: Pseudogenes are gene copies presumed to mainly be functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. When transcribed, pseudogenes may encode proteins or enact RNA-intrinsic regulatory mechanisms. However, the extent, characteristics and functional relevance of the human pseudogene transcriptome are unclear. Short-read sequencing platforms have limited power to resolve and accurately quantify pseudogene transcripts owing to the high sequence similarity of pseudogenes and their parent genes. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify here hundreds of novel transcribed pseudogenes. Pseudogene transcripts are expressed in tissue-specific patterns, exhibit complex splicing patterns and contribute to the coding sequences of known genes. We survey pseudogene transcripts encoding intact open reading frames (ORFs), representing potential unannotated protein-coding genes, and demonstrate their efficient translation in cultured cells. To assess the impact of noncoding pseudogenes on the cellular transcriptome, we delete the nucleus-enriched pseudogene PDCL3P4 transcript from HAP1 cells and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the transcriptional landscape underpinning human biology and disease.

ORGANISM(S): Homo sapiens

PROVIDER: GSE160383 | GEO | 2021/04/26

REPOSITORIES: GEO

Similar Datasets

Project description: Approximately half of M. leprae’s transcriptome consists of inactive gene products. This has an impact on overall energy and resource consumption without potential benefit to this organism. However, multiple translational ‘silencing’ mechanisms are present, reducing additional energy and resource expenditure required for protein production from these transcripts. The Mycobacterium leprae genome has less than 50% coding capacity and 1,133 pseudogenes. Preliminary evidence suggests that some pseudogenes are expressed. Therefore, defining pseudogene transcriptional and translational potentials should increase our understanding of their impact on M. leprae physiology. To address this, M. leprae was purified from the granulomatous hind footpad tissue of four individual nu/nu nude mice six months post-infection. M. leprae whole genome DNA microarrays representing the 1,614 annotated ORFs and 1,133 identified pseudogenes were obtained from the Leprosy Research Support and Maintenance of an Armadillo Colony Post-Genome Era, Part I: Leprosy Research Support Contract (NO1 AI-25469) at Colorado State University. To validate 20% of genes positive by microarray analysis, RT-PCR was performed. Results of this study: gene expression analysis identified transcripts from 49% of all M. leprae genes, including 57% of all ORFs and 43% of all pseudogenes in the genome. Pseudogenes were randomly distributed throughout the chromosome. Factors resulting in pseudogene transcription included: 1) co-orientation of transcribed pseudogenes with transcribed ORFs within or exclusive of operon-like structures; 2) the paucity of intrinsic stem-loop transcriptional terminators between transcribed ORFs and downstream pseudogenes; and 3) predicted pseudogene promoters. Mechanisms for translational silencing of pseudogene transcripts included the lack of both translational start codons and strong Shine-Dalgarno sequences. 
Transcribed pseudogenes also contained multiple in-frame stop codons and high Ka/Ks ratios compared with those of homologs in M. tuberculosis and ORFs in M. leprae. A pseudogene transcript containing an active promoter, a strong SD site and a start codon, but also two in-frame stop codons, yielded a protein product when expressed in E. coli. Approximately half of M. leprae's transcriptome consists of inactive gene products consuming energy and resources without potential benefit to M. leprae. Presently it is unclear what additional detrimental effect(s) this large number of inactive mRNAs has on the functional capability of this organism. Translation of these pseudogenes may play an important role in overall energy consumption and resultant pathophysiological characteristics of M. leprae. However, this study also demonstrated that multiple translational silencing mechanisms are present, reducing additional energy and resource expenditure required for protein production from the vast majority of these transcripts.

Project description: Approximately half of M. leprae’s transcriptome consists of inactive gene products. This has an impact on overall energy and resource consumption without potential benefit to this organism. However, multiple translational ‘silencing’ mechanisms are present, reducing additional energy and resource expenditure required for protein production from these transcripts. The Mycobacterium leprae genome has less than 50% coding capacity and 1,133 pseudogenes. Preliminary evidence suggests that some pseudogenes are expressed. Therefore, defining pseudogene transcriptional and translational potentials should increase our understanding of their impact on M. leprae physiology. To address this, M. leprae was purified from the granulomatous hind footpad tissue of four individual nu/nu nude mice six months post-infection. M. leprae whole genome DNA microarrays representing the 1,614 annotated ORFs and 1,133 identified pseudogenes were obtained from the Leprosy Research Support and Maintenance of an Armadillo Colony Post-Genome Era, Part I: Leprosy Research Support Contract (NO1 AI-25469) at Colorado State University. To validate 20% of genes positive by microarray analysis, RT-PCR was performed. Results of this study: gene expression analysis identified transcripts from 49% of all M. leprae genes, including 57% of all ORFs and 43% of all pseudogenes in the genome. Pseudogenes were randomly distributed throughout the chromosome. Factors resulting in pseudogene transcription included: 1) co-orientation of transcribed pseudogenes with transcribed ORFs within or exclusive of operon-like structures; 2) the paucity of intrinsic stem-loop transcriptional terminators between transcribed ORFs and downstream pseudogenes; and 3) predicted pseudogene promoters. Mechanisms for translational silencing of pseudogene transcripts included the lack of both translational start codons and strong Shine-Dalgarno sequences. 
Transcribed pseudogenes also contained multiple in-frame stop codons and high Ka/Ks ratios compared with those of homologs in M. tuberculosis and ORFs in M. leprae. A pseudogene transcript containing an active promoter, a strong SD site and a start codon, but also two in-frame stop codons, yielded a protein product when expressed in E. coli. Approximately half of M. leprae's transcriptome consists of inactive gene products consuming energy and resources without potential benefit to M. leprae. Presently it is unclear what additional detrimental effect(s) this large number of inactive mRNAs has on the functional capability of this organism. Translation of these pseudogenes may play an important role in overall energy consumption and resultant pathophysiological characteristics of M. leprae. However, this study also demonstrated that multiple translational silencing mechanisms are present, reducing additional energy and resource expenditure required for protein production from the vast majority of these transcripts. The overall design of this study was to identify the transcriptome of M. leprae in the granulomatous tissue of the mouse hind footpad 6 months post-infection.

