Project description:Genomic sequences with high sequence similarity, such as parent-pseudogene pairs, cause short sequencing reads to align to multiple locations, thus complicating genomic analyses. However, their impact on transcriptomic analyses, including the estimation of gene expression and transcript annotation, has been less studied. Here, we investigated the impact of pseudogenes on transcriptomic analyses.
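The multi-mapping problem described above can be shown with a toy exact-match search. This is a minimal sketch, not an aligner and not the authors' pipeline. The two reference sequences are hypothetical, and real aligners tolerate mismatches and report mapping quality. A short read from the region that a parent gene shares with its pseudogene matches both copies. A read that spans the single distinguishing substitution maps uniquely.

```python
# Toy illustration of multi-mapping: sequences are hypothetical, and exact
# matching stands in for a real aligner (which also tolerates mismatches).

def exact_hits(read: str, references: dict[str, str]) -> list[tuple[str, int]]:
    """Return every (reference name, 0-based offset) where `read` matches exactly."""
    hits = []
    for name, seq in references.items():
        pos = seq.find(read)
        while pos != -1:
            hits.append((name, pos))
            pos = seq.find(read, pos + 1)  # keep searching for further occurrences
    return hits


# Hypothetical parent gene and pseudogene differing by one substitution (offset 25).
references = {
    "parent_gene": "ATGGCTAGCTTGACCGATCGTACGGATCCA",
    "pseudogene": "ATGGCTAGCTTGACCGATCGTACGGTTCCA",
}

shared_read = "GCTAGCTTGACCGATC"                 # lies in the identical region
spanning_read = "ATGGCTAGCTTGACCGATCGTACGGATCCA"  # covers the distinguishing base

for read in (shared_read, spanning_read):
    hits = exact_hits(read, references)
    if not hits:
        status = "unmapped"
    elif len(hits) == 1:
        status = "unique"
    else:
        status = "multi-mapping"
    print(f"{status}: {hits}")
```

Longer reads, or reads that happen to cover a distinguishing site, are what let a parent-gene read be told apart from a pseudogene read.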
Project description:The human genome encodes >14,000 pseudogenes, evolutionary relics that have long been considered nonfunctional genomic elements. Emerging evidence suggests that pseudogenes can exert important regulatory functions. However, the function of most pseudogenes remains unknown. To fill this gap, we developed an integrated computational pipeline and performed the first pseudogene-focused CRISPRi screens in human cells to date. Our screens identified >100 pseudogenes that are important for cell fitness, with more cell-type-specific functions than their parent genes. In addition, we discovered a cancer-testis unitary pseudogene, MGAT4EP, that interacts with FOXA1, a key regulator in luminal A breast cancer.
Project description:We analyzed transcriptomic data from infected and uninfected T-cells to identify pseudogenes and their parent genes showing differential expression in HIV-1 infection. The H9 T-cell line was infected with the NL4-3 strain of HIV-1, obtained by transfection of 293T cells. RNA from infected and uninfected cells was extracted 7 days post-infection.
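As a rough illustration of comparing infected with uninfected cells, the sketch below computes a log2 fold change on counts-per-million (CPM) values with a pseudocount. The gene names and counts are hypothetical. The description does not state which normalization or statistical test was used. Dedicated tools such as DESeq2 or edgeR also model replicate variance and test significance. This sketch does neither and only reports an effect size.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts to counts per million reads in the sample."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(
    infected: dict[str, int],
    uninfected: dict[str, int],
    pseudocount: float = 1.0,
) -> dict[str, float]:
    """log2(infected / uninfected) on CPM values; the pseudocount avoids log(0).

    Assumes both samples report the same set of genes.
    """
    a, b = cpm(infected), cpm(uninfected)
    return {g: math.log2((a[g] + pseudocount) / (b[g] + pseudocount)) for g in a}


# Hypothetical counts for a parent gene, one of its pseudogenes, and all other genes.
infected = {"PARENT_X": 300, "PSEUDO_X1": 100, "other": 600}
uninfected = {"PARENT_X": 300, "PSEUDO_X1": 25, "other": 675}

for gene, lfc in log2_fold_change(infected, uninfected).items():
    print(f"{gene}\t{lfc:+.2f}")
```

For pseudogenes, the counts themselves depend on how multi-mapping reads were assigned between parent and pseudogene. That choice can shift fold changes as much as the normalization does.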
Project description:Pseudogenes are gene copies presumed to mainly be functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. When transcribed, pseudogenes may encode proteins or enact RNA-intrinsic regulatory mechanisms. However, the extent, characteristics and functional relevance of the human pseudogene transcriptome are unclear. Short-read sequencing platforms have limited power to resolve and accurately quantify pseudogene transcripts owing to the high sequence similarity of pseudogenes and their parent genes. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify here hundreds of novel transcribed pseudogenes. Pseudogene transcripts are expressed in tissue-specific patterns, exhibit complex splicing patterns and contribute to the coding sequences of known genes. We survey pseudogene transcripts encoding intact open reading frames (ORFs), representing potential unannotated protein-coding genes, and demonstrate their efficient translation in cultured cells. To assess the impact of noncoding pseudogenes on the cellular transcriptome, we delete the nucleus-enriched pseudogene PDCL3P4 transcript from HAP1 cells and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the transcriptional landscape underpinning human biology and disease.
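An intact ORF can be made concrete with a simple scan for the longest ATG-initiated reading frame that ends in an in-frame stop codon. This is a hypothetical, minimal sketch. It examines only the forward strand and uses the standard stop codons TAA, TAG and TGA. It does not reproduce the ORF criteria used in the study, such as minimum length, strand handling and start-codon choice, which the description does not specify.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest forward-strand ORF that begins with ATG
    and ends with an in-frame stop codon; `end` is exclusive and includes the stop.
    Returns None if no complete ORF is found."""
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ATG after this stop
    return best


# Hypothetical transcript fragment: ATG AAA TTT GGG TAA in reading frame 2.
transcript = "CCATGAAATTTGGGTAACC"
orf = longest_orf(transcript)
if orf is not None:
    start, end = orf
    print(f"ORF at {start}-{end}: {transcript[start:end]}")
```

An ATG without a downstream in-frame stop is not reported, because it does not define a complete ORF.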
Project description:Background: Canonical nonsense-mediated decay (NMD) is an important splicing-dependent process for mRNA surveillance in mammals. However, processed pseudogenes cannot trigger NMD because they lack introns. It is largely unknown whether they have evolved other surveillance mechanisms. Results: Here, we find that the RNAs of pseudogenes, especially processed pseudogenes, have dramatically higher m6A levels than their cognate protein-coding genes, associated with de novo m6A peaks and motifs in human cells. Furthermore, pseudogenes have rapidly accumulated m6A motifs during evolution. The m6A sites of pseudogenes are evolutionarily younger than neutral sites and their m6A levels are increasing, supporting the idea that m6A on the RNAs of pseudogenes is under positive selection. We then find that m6A RNA modification of processed, rather than unprocessed, pseudogenes promotes cytosolic RNA degradation and attenuates their interference with the RNAs of their cognate protein-coding genes. We experimentally validate the m6A RNA modification of two processed pseudogenes, DSTNP2 and NAP1L4P1, which promotes the RNA degradation of both the pseudogenes and their cognate protein-coding genes, DSTN and NAP1L4. In addition, the m6A-dependent regulation of DSTN by DSTNP2 is partially dependent on the miRNA miR-362-5p. Conclusions: Our discovery reveals a novel evolutionary role of m6A RNA modification in clearing unnecessary processed pseudogene transcripts to attenuate their interference with the regulatory network of protein-coding genes.
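The description does not name the m6A motif it analyzed. The widely used consensus for m6A sites is DRACH (D = A/G/U, R = A/G, H = A/C/U), which is assumed here. The following is a minimal sketch for counting candidate sites, for example to compare motif density between a pseudogene and its parent gene. Motif presence only marks a candidate site. It does not show that the adenosine is actually methylated.

```python
import re

# DRACH consensus (assumed; see lead-in), written with T in place of U so it also
# applies to cDNA. The lookahead lets overlapping motifs be reported separately.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def drach_sites(seq: str) -> list[int]:
    """0-based positions of the central (candidate m6A) adenosine in each DRACH match."""
    seq = seq.upper().replace("U", "T")
    return [m.start() + 2 for m in DRACH.finditer(seq)]


def drach_per_kb(seq: str) -> float:
    """Number of DRACH motifs per kilobase of sequence."""
    return 1000 * len(drach_sites(seq)) / len(seq) if seq else 0.0


# Hypothetical sequences, for illustration only.
parent = "AUGGCUAGCUUGACCGAUCGUA"
pseudogene = "AUGGACUAGCUGGACAGACUUA"

for name, s in (("parent", parent), ("pseudogene", pseudogene)):
    print(name, drach_sites(s), round(drach_per_kb(s), 1))
```

A lookahead pattern is used so that overlapping matches are all counted. Two DRACH sites can share a nucleotide, as in `GGACAGACT`, and a plain non-overlapping search would report only the first.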
Project description:Pseudogenes are defined as regions of the genome that resemble functional genes but contain disabling mutations and lack the regulatory elements needed for transcription or translation. They are excellent markers of genome evolution and are emerging as crucial regulators of development and disease, especially cancer. However, the function and evolution of pseudogenes remain largely unexplored at a systematic level. In particular, the contribution of pseudogenes to organ development is still unknown. Meanwhile, studies of pseudogene transcription, the first step in generating functional RNA, are hampered by the limited capacity of short-read sequencing. To address these issues, we systematically inferred the origin time and characterized the evolutionary patterns of pseudogenes. By combining PacBio full-length sequencing with deep Illumina data and public developmental time-course RNA-seq, we dramatically expanded the number of analyzed samples and profiled genome-wide pseudogene expression. Additionally, we prioritized functional pseudogenes at multiple regulatory layers and determined their implications in disease and cancer biology.
Project description:The majority of bacterial genomes have high coding efficiencies, but a few intracellular bacteria have genomes with low gene density. The genome of the endosymbiont Sodalis glossinidius contains almost 50% pseudogenes, which carry mutations that putatively silence them at the genomic level. We applied multiple omic strategies, combining single-molecule DNA sequencing and annotation, stranded RNA sequencing, and proteome analysis, to better understand the transcriptional and translational landscape of Sodalis pseudogenes and potential mechanisms for their control. Between 53% and 74% of the Sodalis transcriptome remains active in cell-free culture. Mean sense transcription from Coding Domain Sequences (CDS) is four times greater than that from pseudogenes. Core-genome analysis of six Illumina-sequenced Sodalis isolates from different host Glossina species shows that pseudogenes make up ~40% of the 2,729 genes in the core genome, suggesting that they are stable and/or that Sodalis is a recent introduction across the Glossina genus as a facultative symbiont. These data further shed light on the importance of transcriptional and translational control in deciphering host-microbe interactions, and demonstrate that pseudogenes are more complex than simple degrading DNA sequences. More broadly, we show that combining genomics, transcriptomics and proteomics represents an important resource for studying prokaryotic genomes with a view to elucidating evolutionary adaptation to novel environmental niches.