Dataset Information

Analysis of human mRNAs with the reference genome sequence reveals potential errors, polymorphisms, and RNA editing.

ABSTRACT: The NCBI Reference Sequence (RefSeq) project and the NIH Mammalian Gene Collection (MGC) together define a set of approximately 30,000 nonredundant human mRNA sequences with identified coding regions representing 17,000 distinct loci. These high-quality mRNA sequences allow for the identification of transcribed regions in the human genome sequence, and many researchers accept them as the correct representation of each defined gene sequence. Computational comparison of these mRNA sequences with the recently published, essentially finished human genome sequence reveals several thousand undocumented nonsynonymous substitution and frameshift discrepancies between the two resources. Additional analysis verifies that the euchromatic human genome is sufficiently complete, containing nearly the whole mRNA collection, to allow a comprehensive comparison. Many of the discrepancies will prove to be genuine polymorphisms in the human population, somatic cell genomic variants, or examples of RNA editing. The genome sequence variant has significant additional support from other mRNAs and ESTs almost four times more often than the mRNA variant does, suggesting that the genome sequence is more accurate. In approximately 15% of these cases, there is substantial support for both variants, suggestive of an undocumented polymorphism. An initial screening against a 24-individual genomic DNA diversity panel verified 60% of a small set of potential single nucleotide polymorphisms from which successful results could be obtained. We also find statistical evidence that a few of these discrepancies are due to RNA editing. Overall, these results suggest that the mRNA collections may contain a substantial number of errors. For current and future mRNA collections, it may be prudent to fully reconcile each genome sequence discrepancy, classifying each as a polymorphism, site of RNA editing or somatic cell variation, or genome sequence error.
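
The support comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Discrepancy` record, the `classify` function, the category labels, and the `MIN_SUPPORT` threshold are all hypothetical, chosen only to show how counts of supporting mRNAs and ESTs for each variant might sort a discrepancy into "genome favored", "mRNA favored", or "candidate polymorphism".

```python
from dataclasses import dataclass

# Hypothetical threshold: number of other transcripts (mRNAs/ESTs) needed
# before support counts as "substantial". This value is an assumption and
# does not come from the paper.
MIN_SUPPORT = 3


@dataclass
class Discrepancy:
    locus: str            # illustrative identifier for the mRNA/genome site
    genome_support: int   # transcripts agreeing with the genome base
    mrna_support: int     # transcripts agreeing with the mRNA base


def classify(d: Discrepancy, min_support: int = MIN_SUPPORT) -> str:
    """Sort one discrepancy by which variant the other transcripts support."""
    genome_ok = d.genome_support >= min_support
    mrna_ok = d.mrna_support >= min_support
    if genome_ok and mrna_ok:
        # Both variants are well supported: suggestive of a polymorphism.
        return "candidate polymorphism"
    if genome_ok:
        return "genome variant favored"
    if mrna_ok:
        return "mRNA variant favored"
    return "unresolved"


if __name__ == "__main__":
    for d in [Discrepancy("site-1", 6, 0), Discrepancy("site-2", 5, 4)]:
        print(d.locus, classify(d))
```

A site labeled "mRNA variant favored" or "unresolved" would still need further evidence, such as the diversity-panel screening mentioned above, before being called a genome error, RNA editing, or somatic variation.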

SUBMITTER: Furey TS

PROVIDER: S-EPMC528917 | biostudies-literature | 2004 Oct

REPOSITORIES: biostudies-literature

Publications

Analysis of human mRNAs with the reference genome sequence reveals potential errors, polymorphisms, and RNA editing.

Furey TS, Diekhans M, Lu Y, Graves TA, Oddy L, Randall-Maher J, Hillier LW, Wilson RK, Haussler D

Genome Research, 2004-10-01, issue 10B

PMID: 15489323

Similar Datasets

Genome sequence-independent identification of RNA editing sites.
| S-EPMC4382388 | biostudies-literature

Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence.
| S-EPMC154576 | biostudies-literature

RNA-programmed genome editing in human cells.
| S-EPMC3557905 | biostudies-other

RNA-guided genome editing a la carte.
| S-EPMC3674385 | biostudies-other

RCARE: RNA Sequence Comparison and Annotation for RNA Editing.
| S-EPMC4460956 | biostudies-literature

Engineering CRISPR-Cpf1 crRNAs and mRNAs to maximize genome editing efficiency.
| S-EPMC5562407 | biostudies-literature

Genome-wide identification and expression analysis of peach multiple organellar RNA editing factors reveals the roles of RNA editing in plant immunity.
| S-EPMC9746024 | biostudies-literature

Mapping of Micro-Tom BAC-End Sequences to the Reference Tomato Genome Reveals Possible Genome Rearrangements and Polymorphisms.
| S-EPMC3514829 | biostudies-literature

Refined Pichia pastoris reference genome sequence.
| S-EPMC5089815 | biostudies-literature

Genome-Wide Characterization of RNA Editing in Chicken Embryos Reveals Common Features among Vertebrates.
| S-EPMC4449034 | biostudies-literature