
Dataset Information


Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome.


ABSTRACT: Processed pseudogenes were created by reverse-transcription of mRNAs; they provide snapshots of ancient genes that existed in the genome millions of years ago. To find them in the present-day human genome, we developed a pipeline using features such as intron-absence, frame-disruption, polyadenylation, and truncation. This has enabled us to identify approximately 8000 processed pseudogenes in recent genome drafts (distributed from http://pseudogene.org). Overall, processed pseudogenes are very similar to their closest corresponding human gene, being 94% complete in coding regions, with sequence similarity of 75% for amino acids and 86% for nucleotides. Their chromosomal distribution appears random and dispersed, with the number on each chromosome proportional to its length, suggesting sustained "bombardment" over evolution. However, the distribution does vary with GC content: processed pseudogenes occur mostly in regions of intermediate GC content. This is similar to Alus but contrasts with functional genes and L1 repeats. Pseudogenes, moreover, have age profiles similar to those of Alus. The number of pseudogenes associated with a given gene follows a power-law relationship, with a few genes giving rise to many pseudogenes and most giving rise to few. The prevalence of processed pseudogenes agrees well with germ-line gene expression. Highly expressed ribosomal proteins account for approximately 20% of the total. Other notable sources include cyclophilin-A, keratin, GAPDH, and cytochrome c.
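To make the feature-based idea concrete, here is a minimal sketch of how the four features named in the abstract (intron-absence, frame-disruption, polyadenylation, truncation) might be combined to label a candidate genomic hit, together with a simple percent-identity calculation of the kind behind the similarity figures. This is not the authors' pipeline: the `Candidate` record, its fields, the `TRUNCATION_CUTOFF` value, and the decision rule are all hypothetical, chosen only to illustrate the logic.

```python
from dataclasses import dataclass

# Hypothetical threshold: a hit covering less than 95% of the parent
# coding region is treated as truncated. Not taken from the paper.
TRUNCATION_CUTOFF = 0.95


@dataclass
class Candidate:
    """Hypothetical record for a genomic hit aligned to a parent gene."""
    has_introns: bool        # does the hit retain the parent's intron structure?
    premature_stops: int     # in-frame stop codons inside the aligned region
    frameshifts: int         # indels whose length is not a multiple of 3
    polya_downstream: bool   # poly(A) tract found 3' of the hit
    coverage: float          # fraction of the parent coding region covered (0-1)


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


def classify(c: Candidate) -> str:
    """Toy decision rule combining the four features (hypothetical)."""
    if c.has_introns:
        # Intron structure retained: not a reverse-transcribed copy.
        return "not processed"
    disabled = c.premature_stops > 0 or c.frameshifts > 0
    truncated = c.coverage < TRUNCATION_CUTOFF
    if disabled or truncated:
        return "processed pseudogene"
    if c.polya_downstream:
        # Intronless and intact; only the poly(A) tract suggests retrotransposition.
        return "possible processed pseudogene"
    return "ambiguous"


if __name__ == "__main__":
    print(percent_identity("MKV-LA", "MRVALA"))  # 4 of 5 ungapped columns match -> 80.0
    print(classify(Candidate(False, 1, 0, True, 0.9)))
```

The rule checks intron-absence first because, under this sketch's assumptions, a hit that keeps the parent's introns cannot be a reverse-transcribed copy; disablements and truncation then separate decayed copies from intact intronless ones.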

SUBMITTER: Zhang Z 

PROVIDER: S-EPMC403796 | biostudies-literature | 2003 Dec

REPOSITORIES: biostudies-literature


Publications

Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome.

Zhaolei Zhang, Paul M. Harrison, Yin Liu, Mark Gerstein

Genome Research, 1 December 2003, issue 12



Similar Datasets

| S-EPMC540038 | biostudies-literature
| S-EPMC4805607 | biostudies-literature
| S-EPMC1087782 | biostudies-literature
| S-EPMC7020885 | biostudies-literature
| S-EPMC5016936 | biostudies-literature
| S-EPMC3996531 | biostudies-literature
| S-EPMC1456392 | biostudies-literature
| S-EPMC3296674 | biostudies-literature
| S-EPMC403797 | biostudies-literature
| S-EPMC2764836 | biostudies-literature