Dataset Information

Structural annotation of equine protein-coding genes determined by mRNA sequencing

ABSTRACT: The horse, like a majority of animal species, has a limited amount of species-specific expressed sequence data available in public databases. As a result, structural models for a majority of genes defined in the equine genome are predictions based on ab initio sequence analysis or the projection of gene structures from other mammalian species. The current study used Illumina-based sequencing of messenger RNA (RNA-seq) to help refine structural annotation of equine protein-coding genes and to make a preliminary assessment of gene expression patterns. Sequencing of mRNA from eight equine tissues generated 293,758,105 sequence tags of 35 bases each, equaling 10.28 gigabase pairs of total sequence data. The tag alignments represent approximately 208X coverage of the equine mRNA transcriptome and confirmed transcriptional activity for roughly 90% of the protein-coding gene structures predicted by Ensembl and NCBI. Tag coverage was sufficient to define structural annotation for 11,356 genes, while also identifying an additional 456 transcripts with exon/intron features that are not listed by either Ensembl or NCBI. Genomic locus data and intervals for the protein-coding genes predicted by the Ensembl and NCBI annotation pipelines were combined with 75,116 RNA-seq derived transcriptional units to generate a consensus equine protein-coding gene set of 20,302 defined loci. Gene ontology annotation was used to compare the functional and structural categories of genes expressed in either a tissue-restricted pattern or broadly across all tissue samples.
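The total sequence yield quoted in the abstract can be checked with simple arithmetic. The following is a minimal sketch, not part of the original study; the function name `total_bases` and the constant names are illustrative assumptions, and only the tag count and tag length are taken from the abstract.

```python
# Figures quoted in the abstract.
TAG_COUNT = 293_758_105   # number of sequence tags generated
TAG_LENGTH = 35           # bases per tag


def total_bases(tag_count: int, tag_length: int) -> int:
    """Return the total number of sequenced bases (hypothetical helper)."""
    return tag_count * tag_length


bases = total_bases(TAG_COUNT, TAG_LENGTH)
gigabases = bases / 1e9

# Report the raw total and the value in gigabase pairs, rounded to two decimals.
print(f"{bases:,} bases = {gigabases:.2f} Gbp")
```

The product is 10,281,533,675 bases, which rounds to the 10.28 gigabase pairs stated above. The 208X coverage figure cannot be reproduced the same way, because the abstract does not state the aligned fraction of tags or the transcriptome size used as the denominator.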

ORGANISM(S): Equus caballus

PROVIDER: GSE21925 | GEO | 2010/11/10

REPOSITORIES: GEO

Similar Datasets

Structural annotation of equine protein-coding genes determined by mRNA sequencing

Project description: The horse, like a majority of animal species, has a limited amount of species-specific expressed sequence data available in public databases. As a result, structural models for a majority of genes defined in the equine genome are predictions based on ab initio sequence analysis or the projection of gene structures from other mammalian species. The current study used Illumina-based sequencing of messenger RNA (RNA-seq) to help refine structural annotation of equine protein-coding genes and to make a preliminary assessment of gene expression patterns. Sequencing of mRNA from eight equine tissues generated 293,758,105 sequence tags of 35 bases each, equaling 10.28 gigabase pairs of total sequence data. The tag alignments represent approximately 208X coverage of the equine mRNA transcriptome and confirmed transcriptional activity for roughly 90% of the protein-coding gene structures predicted by Ensembl and NCBI. Tag coverage was sufficient to define structural annotation for 11,356 genes, while also identifying an additional 456 transcripts with exon/intron features that are not listed by either Ensembl or NCBI. Genomic locus data and intervals for the protein-coding genes predicted by the Ensembl and NCBI annotation pipelines were combined with 75,116 RNA-seq derived transcriptional units to generate a consensus equine protein-coding gene set of 20,302 defined loci. Gene ontology annotation was used to compare the functional and structural categories of genes expressed in either a tissue-restricted pattern or broadly across all tissue samples. Overall design: examination of 8 equine RNA samples representing 6 distinct tissues.

2010-11-10 | E-GEOD-21925 | biostudies-arrayexpress

Analysis of Unannotated Equine Transcripts Identified by mRNA Sequencing

Project description: Horse-specific genes are not readily identified from available equine EST/cDNA resources due to relatively limited coverage. In addition, equine gene sets predicted in silico by Ensembl and NCBI will not identify horse-specific genes since they rely on homology-based projection of gene structure annotation from other species. In this study, RNA-seq of 8 equine RNA samples representing 6 distinct tissues was performed and used to improve and refine equine gene structure annotation. The samples and RNA were collected as part of the related study E-GEOD-21925 and are described in Coleman et al. 2010, Anim Genet 41 Suppl 2: 121-30 (PMID: 21070285). The RNA from these samples was re-sequenced in this experiment. The tissues were: i) articular cartilage and synovial membrane samples from a 3-year-old male pony; the left carpal joints received four LPS injections (0.5 ng) over 8 days, while the right carpal joints received control injections of PBS; ii) a cerebellum sample collected from a 2-year-old female thoroughbred; iii) a testis sample from a 4-year-old thoroughbred; iv) a placental villous sample collected immediately post-partum from a full-term female thoroughbred foal; v) a whole embryo sample obtained from a 34-day-old male thoroughbred conceptus. The embryo, cerebellum, testis and placental samples were of apparent normal gross morphology.

2013-07-30 | E-GEOD-46858 | biostudies-arrayexpress

Analysis of Unannotated Equine Transcripts Identified by mRNA Sequencing

Project description: Sequencing of equine mRNA (RNA-seq) identified 428 putative transcripts which do not map to any previously annotated or predicted horse genes. Most of these encode the equine homologs of known protein-coding genes described in other species, yet the potential exists to identify novel and perhaps equine-specific gene structures. A set of 36 transcripts was prioritized for further study by filtering for levels of expression (depth of RNA-seq read coverage), distance from annotated features in the equine genome, the number of putative exons, and patterns of gene expression between tissues. From these, four were selected for further investigation based on predicted open reading frames of greater than or equal to 50 amino acids and lack of detectable homology to known genes across species. Sanger sequencing of RT-PCR amplicons from additional equine samples confirmed expression and structural annotation of each transcript. Functional predictions were made by conserved domain searches. A single transcript, expressed in the cerebellum, contains a putative Krüppel-associated box (KRAB) domain, suggesting a potential function associated with zinc finger proteins and transcriptional regulation. Overall levels of conserved synteny and sequence conservation across a 1 Mb region surrounding each transcript were approximately 73% compared to the human, canine, and bovine genomes; however, the four loci display some areas of low conservation and sequence inversion in regions that immediately flank these previously unannotated equine transcripts. Taken together, the evidence suggests that these four transcripts are likely to be equine-specific.

2013-07-30 | GSE46858 | GEO

NimbleGen 42M data for the HuRef individual

Project description: The ideal genome sequence for medical interpretation is complete and diploid, capturing the full spectrum of genetic variation. Toward this end, there has been progress in discovery of single nucleotide polymorphisms (SNPs) and small (<10bp) insertions/deletions (indels), but annotation of larger structural variation (SV) including copy number variation (CNV) has been less comprehensive, even with available diploid sequence assemblies. We applied a multi-step sequence and microarray-based analysis to identify numerous previously unknown SVs within the first genome sequence reported from an individual. The HuRef genomic DNA (from lymphoblastoid cell line) was co-hybridized with the female sample NA15510 (from lymphoblastoid cell line) from the Polymorphism Discovery Resource. The NimbleGen platform consists of 20 NimbleGen HD2 chips, each containing 2.1M probes, and each chip is further subdivided into 3 equal-sized subarrays containing about 726K probes. The probes target the NCBI Build 36 genome, with the exception that no homology filter was applied, allowing coverage of segmentally duplicated regions.

2013-02-12 | E-GEOD-20289 | biostudies-arrayexpress

Membrane enriched proteome of transfected SW480 cells

Project description: This project used membrane-enriched proteomics to compare the membrane proteome of SW480 human colorectal cancer subclones that had been transfected with the full length coding sequence of the beta-6 integrin subunit under an overexpression promoter (SW480OE) against SW480 cells that had been transfected with an ‘empty’ mock vector (SW480M). Cell lysate samples were enriched for membrane proteins using a modified sodium carbonate stripping method before undergoing peptide immobilised pH gradient isoelectric focusing (IPG-IEF). Two IPG-IEF pH ranges were used to increase proteomic coverage: pH 3-10 (broad range; BR) and pH 3.5-4.5 (narrow range; NR). Each IPG-IEF strip was cut into 24 equal fractions and the peptides eluted. IPG-IEF of each pH range for both cell lines was performed in triplicate, resulting in four triplicate sets of 24 fractions. Data processing and bioinformatics: Spectra were identified against the *Homo sapiens* protein database (derived from Ensembl, Swiss-Prot and NCBI) using X!Tandem and the Global Proteome Machine (GPM) Tornado XE (build date 2010.12.01). GPM searches were set to tryptic peptides under GPM’s default fragment ion mass tolerance of 0.4 Da and did not include semi-cleavage. Searches included single amino acid polymorphisms and default GPM residue modifications (oxidation, dioxidation, carbamidomethyl, deamidation, acetylation).

2013-05-30 | PXD000230 | Pride

Global Transcriptome Characterization and Assembly of thermophilic ascomycete Chaetomium thermophilum

Project description: A correct genome annotation is fundamental for research in the field of molecular and structural biology. The annotation of the reference genome of Chaetomium thermophilum has been reported previously, but it is limited to open reading frames (ORFs) of genes and contains only a few noncoding transcripts. In this study, we identified and annotated full-length transcripts of C. thermophilum by deep RNA sequencing. We identified 7044 coding genes and a large number of noncoding genes (n=4567). Astonishingly, 23% of the coding genes are alternatively spliced. We identified 679 novel coding genes and corrected the structural organization of more than 50% of the previously annotated genes. Furthermore, we substantially extended the Gene Ontology (GO) and Enzyme Commission (EC) lists, which provide comprehensive search tools for potential industrial applications and basic research. The identified novel transcripts and improved annotation will help in understanding the gene regulatory landscape in C. thermophilum. The analysis pipeline developed here can be used to build transcriptome assemblies and identify coding and noncoding RNAs of other species. The new genome annotation of the GTF file can be found here.

2019-12-31 | GSE116834 | GEO

RECURRENT SETBP1 MUTATIONS IN ATYPICAL CHRONIC MYELOID LEUKEMIA

Project description: RNA-Seq analysis of atypical chronic myeloid leukemia samples. We sequenced leukemic mRNA from 13 atypical chronic myeloid leukemia (aCML) samples by Illumina GAIIx. Transcriptomic profiles, differentially expressed genes and pathway enrichment analysis were obtained by comparing 7 SETBP1-mutated samples and 6 non-mutated (WT) samples using the TopHat aligner and the SAMMate gene expression quantifier. We focused on the gene expression profile of known coding transcripts. A dataset of 20,907 protein-coding Ensembl genes was obtained from the RNA-Seq data by using the human Ensembl GTF annotation file, release 54, downloaded from ftp://ftp.ensembl.org/pub/release-54/gtf/homo_sapiens/.

2012-12-07 | E-GEOD-42146 | biostudies-arrayexpress

Agilent custom 244K array CGH data for the HuRef individual

Project description: The ideal genome sequence for medical interpretation is complete and diploid, capturing the full spectrum of genetic variation. Toward this end, there has been progress in discovery of single nucleotide polymorphisms (SNPs) and small (<10bp) insertions/deletions (indels), but annotation of larger structural variation (SV) including copy number variation (CNV) has been less comprehensive, even with available diploid sequence assemblies. We applied a multi-step sequence and microarray-based analysis to identify numerous previously unknown SVs within the first genome sequence reported from an individual. CGH experiments were performed with genomic DNA extracted from the HuRef and six HapMap lymphoblastoid cell lines, hybridized against the reference NA10851. A dye-swap experiment was performed for each experiment. The custom CGH microarray contains probes that target novel sequences that are not on the NCBI reference build 35. Instead, the probes target scaffold sequences that are unique to the Celera R27C assembly.

2013-02-12 | E-GEOD-20287 | biostudies-arrayexpress

Agilent custom 24M array CGH data for the HuRef individual

Project description: The ideal genome sequence for medical interpretation is complete and diploid, capturing the full spectrum of genetic variation. Toward this end, there has been progress in discovery of single nucleotide polymorphisms (SNPs) and small (<10bp) insertions/deletions (indels), but annotation of larger structural variation (SV) including copy number variation (CNV) has been less comprehensive, even with available diploid sequence assemblies. We applied a multi-step sequence and microarray-based analysis to identify numerous previously unknown SVs within the first genome sequence reported from an individual. The Agilent array CGH experiment was performed according to the manufacturer's directions on DNA extracted from lymphoblastoid cell lines. HuRef genomic DNA was co-hybridized with female sample NA15510 from the Polymorphism Discovery Resource. Neither a replicate nor a dye swap was done. The Agilent 24 million feature CGH array set was designed with 23.5 million 60-mer oligonucleotide probes tiled along the NCBI Build 36 assembly.

2013-02-12 | E-GEOD-20288 | biostudies-arrayexpress

LongSAGE analysis significantly improves genome annotation

Project description: Owing to their increased tag length, LongSAGE tags are expected to be more reliable in direct assignment to genome sequences. Therefore, we evaluated the use of LongSAGE data in genome annotation by using our LongSAGE dataset of 202 015 tags (consisting of 41 718 unique tags), experimentally generated from mouse embryonic tail libraries. RESULTS: A fraction of LongSAGE tags could not be unambiguously assigned to a gene, due to the presence of widely conserved sequences downstream of particular CATG anchor sites. The presence of alternative forms of transcripts was confirmed in 45% of all detected genes. Surprisingly, a large fraction of LongSAGE tags with hits to the genome (66%) could not be assigned to any gene annotated in EnsEMBL. Among such cases, 2098 LongSAGE tags fell into a region containing a putative gene predicted by GenScan, providing experimental evidence for the presence of real genes, while 9112 genes were found to be omitted or wrongly annotated by the EnsEMBL pipeline. CONCLUSIONS: LongSAGE transcriptome data can significantly improve genome annotation by identifying novel genes and alternative transcripts, even in the case of the thus far best-characterized organisms like the mouse. Keywords: other

2005-07-20 | E-GEOD-2967 | biostudies-arrayexpress