Unknown,Transcriptomics,Genomics,Proteomics

Dataset Information

RNA Subcellular Localization by Paired End diTag Sequencing from ENCODE/GIS


ABSTRACT: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Yijun Ruan, ruanyj@gis.a-star.edu.sg). If you have questions about the Genome Browser track associated with this data, contact ENCODE (genome@soe.ucsc.edu). This track is produced as part of the ENCODE Transcriptome Project. It shows the starts and ends of full-length mRNA transcripts determined by GIS paired-end ditag (PET) sequencing using RNA extracts (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=rnaExtract) from different subcellular localizations (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=localization) in different cell lines (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=cellType). The RNA-PET data in this track come in two PET length versions, depending on how the PETs were extracted. The cloning-based PET (18 bp and 16 bp) is the earlier version; details can be found in Ng et al. (2006). The cloning-free PET (25 bp and 25 bp) is a recently modified version that uses the Type III restriction enzyme EcoP15I to generate longer PETs (unpublished), which significantly improves both library construction and mapping efficiency. Both versions of PET templates were sequenced on the Illumina platform with 2 x 36 bp paired-end sequencing. See the Methods and References sections below for more details. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell).
Two different GIS RNA-PET protocols were used to generate the full-length transcriptome PETs: one is a cloning-free RNA-PET library construction and sequencing strategy (unpublished), and the other is a cloning-based library construction (Ng et al. 2005) combined with more recent Illumina paired-end sequencing.

Cloning-free RNA-PET (50 bp reads, one 25 bp tag from each of the 5' and 3' ends)

Method: The cloning-free RNA-PET libraries were generated from polyA mRNA samples and constructed using a recently modified GIS protocol (unpublished). Good-quality total RNA was used as starting material and purified through a MACs polyT column to obtain full-length polyA mRNAs. Approximately 5 micrograms of enriched polyA mRNA were used for reverse transcription to convert polyA mRNA to full-length cDNA. The full-length cDNA was modified and ligated with specific linker sequences, then circularized by ligation to generate circular cDNA molecules. The 25 bp tag from each end of the full-length cDNA was extracted by digestion with the Type III restriction enzyme EcoP15I. The resulting PETs were ligated with sequencing adaptors at both ends, amplified by PCR, and further purified as complex templates for paired-end (PE) sequencing on Illumina platforms.

Data: The sequenced RNA-PETs have a uniform length of 25 bp from each end of a cDNA. After redundant and noise tags are filtered out, the unique PETs proceed to the analysis pipeline. First, the orientation of each tag is determined from the barcode built into the sequencing template, and the tags are then paired into an orientation-defined PET. Each orientation-defined RNA-PET is mapped onto the reference genome, allowing up to two mismatches. The majority of PETs map to known transcripts or splice variants.
A small portion of misaligned PETs, defined as discordant PETs, map with their two tags too far apart, in the wrong orientation, or on different chromosomes. These indicate that some transcription variations exist, which could be caused by genome structural variations (such as fusion, deletion, insertion, inversion, tandem repeat and translocation) or by RNA trans-splicing.

Cloning-based RNA-PET (34 bp reads, 18 bp and 16 bp tags from the 5' and 3' ends, respectively)

Method: The cloning-based RNA-PET (GIS-PET) libraries were generated from polyA RNA samples and constructed using the protocol described by Ng et al. (2005). Good-quality total RNA was used as starting material and further purified through a MACs polyT column to enrich polyA mRNA and remove contaminants (e.g., rRNA, tRNA, DNA, protein). Approximately 10 micrograms of polyA mRNA were then used for reverse transcription to convert polyA mRNA into full-length cDNA. The full-length cDNA was modified with specific linker sequences and then ligated to a GIS-developed vector (pGIS4) to form a complex full-length cDNA library, which was cloned into E. coli. Plasmid DNA was then isolated from the library and digested with MmeI (a type II enzyme) to generate ditags with a final length of 18 bp/16 bp from each end of the full-length cDNA. Single ditags (also called PETs) were then ligated to form a diPET structure (a concatemer of two unrelated PETs joined by a linker sequence) to facilitate Illumina paired-end sequencing.

Data: The cloning-based RNA-PETs have uniform lengths of 18 bp and 16 bp, extracted from the 5' and 3' ends of each cDNA, respectively. Redundant reads were filtered out first, and the unique reads were included for analysis. PET sequences were then mapped to the reference genome (GRCh37, hg19, excluding mitochondrion, haplotypes, randoms and chromosome Y) using the following specific criteria (Ruan et al.
2007): A minimal continuous 16 bp match must exist for the 5' signature; the 3' signature must have a minimal continuous 14 bp match. Both 5' and 3' signatures must be present on the same chromosome. Their 5' to 3' orientation must be correct (5' signature followed by 3' signature). The maximal genomic span of a PET genomic alignment must be less than one million bp. PETs mapping to 2-10 locations are also included and may represent duplicated genes or pseudogenes in the genome. The majority of PETs mapped to known transcripts or splice variants. A small portion of misaligned PETs, defined as discordant PETs, mapped with their two tags too far from each other, in the wrong orientation, or on different chromosomes. These indicate that some transcription variations exist, which could be caused by genome structural variations (such as fusion, deletion, insertion, inversion, tandem repeat and translocation) or by RNA trans-splicing.

Clusters: To cluster the PETs, the following procedure was applied. The mapping locations of the 5' and 3' tags of a given PET were each extended by 100 bp in both directions, creating 5' and 3' search windows. If the 5' and 3' tags of a second PET mapped within the 5' and 3' search windows of the first PET, the two PETs were clustered and the search windows were adjusted so that they contained the tag extensions of the second PET. PETs whose 5' and 3' tags subsequently mapped within the adjusted 5' and 3' search windows, respectively, were also assigned to this cluster, and the search windows were readjusted. This iterative process continued until no new PET fell within the search windows, at which stage all the PETs found were classified as belonging to a single cluster. The process was repeated until all PETs were assigned to a cluster.

Verification: To assess overall PET quality and mapping specificity, the top ten most abundant PET clusters that mapped to well-characterized known genes were examined.
Over 99% of the PETs represented full-length transcripts, and the majority fell within 10 bp of the known 5' and 3' boundaries of these transcripts. The PET mapping was further verified by confirming the existence of physical cDNA clones represented by the ditags. PCR primers were designed based on the PET sequences and used to amplify the corresponding cDNA inserts, either from the full-length cDNA library (cloning-based PET) or from the total RNA isolate (cloning-free PET), for sequencing confirmation.
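
The mapping criteria for cloning-based PETs described above can be illustrated with a minimal Python sketch. This is not the GIS pipeline. The `TagHit` record, its field names, the strand-based reading of "correct orientation", and the use of the distance between tag start coordinates as the genomic span are all assumptions made for illustration. Only the thresholds (16 bp, 14 bp, one million bp) come from the description.

```python
from dataclasses import dataclass

# Thresholds quoted in the track description.
MIN_5P_MATCH = 16        # minimal continuous match for the 5' signature (bp)
MIN_3P_MATCH = 14        # minimal continuous match for the 3' signature (bp)
MAX_SPAN = 1_000_000     # genomic span must be less than this (bp)


@dataclass(frozen=True)
class TagHit:
    """One tag alignment (hypothetical record, not a pipeline format)."""
    chrom: str
    pos: int         # start coordinate of the tag on the reference
    strand: str      # "+" or "-"
    match_len: int   # longest continuous match in bp


def classify_pet(five: TagHit, three: TagHit) -> str:
    """Return "concordant" or the first criterion that the PET fails."""
    if five.match_len < MIN_5P_MATCH or three.match_len < MIN_3P_MATCH:
        return "weak_match"
    if five.chrom != three.chrom:
        return "different_chromosome"
    # Assumed reading of "5' signature followed by 3' signature": both tags
    # lie on the same strand, with the 5' tag upstream along that strand.
    if five.strand != three.strand:
        return "wrong_orientation"
    if five.strand == "+":
        upstream = five.pos < three.pos
    else:
        upstream = five.pos > three.pos
    if not upstream:
        return "wrong_orientation"
    # Span approximated by the distance between the two tag starts.
    if abs(three.pos - five.pos) >= MAX_SPAN:
        return "too_far"
    return "concordant"
```

In this sketch, any label other than "concordant" corresponds to what the description calls a discordant PET, apart from "weak_match", which simply fails the minimal match lengths.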
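
The iterative clustering procedure described under "Clusters" can be sketched in Python as follows. This is an illustrative reading of the text, not the code used to produce the track. The `PET` record, the helper names, the interval representation of a tag, the requirement that clustered PETs share a chromosome, and the choice of the first unassigned PET as the seed are assumptions. Only the 100 bp extension is taken from the description.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int]   # (start, end) on the reference, start <= end


class PET(NamedTuple):
    """A mapped PET (hypothetical record used only in this sketch)."""
    name: str
    chrom: str
    five: Interval    # mapping location of the 5' tag
    three: Interval   # mapping location of the 3' tag


def extend(tag: Interval, ext: int) -> Interval:
    """Extend a tag location by `ext` bp in both directions."""
    return (tag[0] - ext, tag[1] + ext)


def within(tag: Interval, window: Interval) -> bool:
    """True if the tag lies entirely inside the search window."""
    return window[0] <= tag[0] and tag[1] <= window[1]


def merge(a: Interval, b: Interval) -> Interval:
    """Smallest interval that contains both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def cluster_pets(pets: List[PET], ext: int = 100) -> List[List[PET]]:
    unassigned = list(pets)
    clusters: List[List[PET]] = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed]
        win5, win3 = extend(seed.five, ext), extend(seed.three, ext)
        grew = True
        while grew:                      # rescan until no PET joins
            grew = False
            remaining = []
            for pet in unassigned:
                if (pet.chrom == seed.chrom
                        and within(pet.five, win5)
                        and within(pet.three, win3)):
                    members.append(pet)
                    # Adjust the windows to cover the new tag extensions.
                    win5 = merge(win5, extend(pet.five, ext))
                    win3 = merge(win3, extend(pet.three, ext))
                    grew = True
                else:
                    remaining.append(pet)
            unassigned = remaining
        clusters.append(members)
    return clusters
```

Because the search windows grow as PETs join, a PET that falls outside the seed's original windows can still be absorbed on a later pass. That is why the inner loop rescans the unassigned PETs until a full pass adds nothing.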

ORGANISM(S): Homo sapiens

SUBMITTER: UCSC ENCODE DCC 

PROVIDER: E-GEOD-33600 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Publications

Landscape of transcription in human cells.

Djebali Sarah, Davis Carrie A, Merkel Angelika, Dobin Alex, Lassmann Timo, Mortazavi Ali, Tanzer Andrea, Lagarde Julien, Lin Wei, Schlesinger Felix, Xue Chenghai, Marinov Georgi K, Khatun Jainab, Williams Brian A, Zaleski Chris, Rozowsky Joel, Röder Maik, Kokocinski Felix, Abdelhamid Rehab F, Alioto Tyler, Antoshechkin Igor, Baer Michael T, Bar Nadav S, Batut Philippe, Bell Kimberly, Bell Ian, Chakrabortty Sudipto, Chen Xian, Chrast Jacqueline, Curado Joao, Derrien Thomas, Drenkow Jorg, Dumais Erica, Dumais Jacqueline, Duttagupta Radha, Falconnet Emilie, Fastuca Meagan, Fejes-Toth Kata, Ferreira Pedro, Foissac Sylvain, Fullwood Melissa J, Gao Hui, Gonzalez David, Gordon Assaf, Gunawardena Harsha, Howald Cedric, Jha Sonali, Johnson Rory, Kapranov Philipp, King Brandon, Kingswood Colin, Luo Oscar J, Park Eddie, Persaud Kimberly, Preall Jonathan B, Ribeca Paolo, Risk Brian, Robyr Daniel, Sammeth Michael, Schaffer Lorian, See Lei-Hoon, Shahab Atif, Skancke Jorgen, Suzuki Ana Maria, Takahashi Hazuki, Tilgner Hagen, Trout Diane, Walters Nathalie, Wang Huaien, Wrobel John, Yu Yanbao, Ruan Xiaoan, Hayashizaki Yoshihide, Harrow Jennifer, Gerstein Mark, Hubbard Tim, Reymond Alexandre, Antonarakis Stylianos E, Hannon Gregory, Giddings Morgan C, Ruan Yijun, Wold Barbara, Carninci Piero, Guigó Roderic, Gingeras Thomas R

Nature, 2012-09-01, issue 7414


Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification ...

Similar Datasets

2011-11-10 | GSE33600 | GEO
2008-12-01 | GSE6694 | GEO
2008-11-30 | E-GEOD-6694 | biostudies-arrayexpress
2018-05-15 | PXD009266 | Pride
| PRJNA13186 | ENA
2021-01-01 | GSE160028 | GEO
2011-02-14 | E-GEOD-27221 | biostudies-arrayexpress
| PRJNA13970 | ENA
| PRJNA71829 | ENA
| PRJDB4472 | ENA