Dataset Information

Incorporating RNA-seq data into the zebrafish Ensembl genebuild.

ABSTRACT: Ensembl gene annotation provides a comprehensive catalog of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species-specific component that can be cost-effectively achieved using RNA-seq. Two zebrafish gene annotations are presented in Ensembl version 62 built on the Zv9 reference sequence. Firstly, RNA-seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3'-end capture and sequencing protocol was developed to predict the 3' ends of transcripts, and 46.1% of the original models were subsequently refined. Secondly, a standard Ensembl genebuild, incorporating carefully filtered elements from the RNA-seq-only build, followed by a merge with the manually curated VEGA database, produced a comprehensive annotation of 26,152 genes represented by 51,569 transcripts. The RNA-seq-only and the Ensembl/VEGA genebuilds contribute contrasting elements to the final genebuild. The RNA-seq genebuild was used to adjust intron/exon boundaries of orthologous defined models, confirm their expression, and improve 3' untranslated regions. Importantly, the inferred protein alignments within the Ensembl genebuild conferred proof of model contiguity for the RNA-seq models. The zebrafish gene annotation has been enhanced by the incorporation of RNA-seq data and the pipeline will be used for other organisms. Organisms with little species-specific cDNA data will generally benefit the most.
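The counts quoted in the abstract imply two derived figures that are not stated outright: roughly how many RNA-seq gene models were refined using the 3'-end data, and the average number of transcripts per gene in the merged annotation. The sketch below works these out from the published numbers only. The constant and function names are illustrative, and because the 46.1% figure is rounded in the abstract, the refined-model count is an approximation rather than a reported result.

```python
# Figures quoted in the abstract above; the names are illustrative only.
RNASEQ_GENE_MODELS = 25_748   # gene models in the RNA-seq-only build
REFINED_FRACTION = 0.461      # share refined with 3'-end data (rounded in the abstract)
FINAL_GENES = 26_152          # genes in the merged Ensembl/VEGA annotation
FINAL_TRANSCRIPTS = 51_569    # transcripts in the merged Ensembl/VEGA annotation


def summarize() -> dict:
    """Derive approximate figures implied by the abstract's counts."""
    return {
        # Approximate, since the published percentage is itself rounded.
        "refined_models_approx": round(RNASEQ_GENE_MODELS * REFINED_FRACTION),
        # Mean transcripts per gene in the final merged annotation.
        "transcripts_per_gene": round(FINAL_TRANSCRIPTS / FINAL_GENES, 2),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these inputs, the refined set comes to about 11,870 models and the merged annotation averages just under two transcripts per gene.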

SUBMITTER: Collins JE

PROVIDER: S-EPMC3460200 | biostudies-literature | 2012 Oct

REPOSITORIES: biostudies-literature

Publications

Incorporating RNA-seq data into the zebrafish Ensembl genebuild.

Collins JE, White S, Searle SMJ, Stemple DL

Genome Research. 2012 Jul 12; issue 10

PMID: 22798491

Similar Datasets

Covering all your bases: incorporating intron signal from RNA-seq data.
| S-EPMC7671406 | biostudies-literature

The Ensembl REST API: Ensembl Data for Any Language.
| S-EPMC4271150 | biostudies-literature

Improving CLIP-seq data analysis by incorporating transcript information.
| S-EPMC7745353 | biostudies-literature

Disease and phenotype data at Ensembl.
| S-EPMC3099348 | biostudies-literature

SCRABBLE: single-cell RNA-seq imputation constrained by bulk RNA-seq data.
| S-EPMC6501316 | biostudies-literature

An Efficient and Flexible Method for Deconvoluting Bulk RNA-Seq Data with Single-Cell RNA-Seq Data.
| S-EPMC6830085 | biostudies-literature

RNA-Seq Data: A Complexity Journey.
| S-EPMC4232570 | biostudies-literature

eQTL Mapping Using RNA-seq Data.
| S-EPMC3650863 | biostudies-literature

ARH-seq: identification of differential splicing in RNA-seq data.
| S-EPMC4132698 | biostudies-literature

Integrating Bacterial ChIP-seq and RNA-seq Data With SnakeChunks.
| S-EPMC7302399 | biostudies-literature