Dataset Information

Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

ABSTRACT: High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

SUBMITTER: Chen M

PROVIDER: S-EPMC4676012 | biostudies-literature | 2015 Dec

REPOSITORIES: biostudies-literature

Publications

Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

Meili Chen, Yibo Hu, Jingxing Liu, Qi Wu, Chenglin Zhang, Jun Yu, Jingfa Xiao, Fuwen Wei, Jiayan Wu

Scientific Reports, 2015-12-11

PMID: 26658305

Similar Datasets

Full Genome Sequence of Giant Panda Rotavirus Strain CH-1.
S-EPMC3587948 | biostudies-literature

Proteome validation for novel full-length genes in the giant panda
2015-12-23 | PXD002917 | Pride

Full-length transcriptome assembly from RNA-Seq data without a reference genome.
S-EPMC3571712 | biostudies-literature

The sequence and de novo assembly of the giant panda genome.
S-EPMC3951497 | biostudies-literature

A chromosome-level reference genome assembly and a full-length transcriptome assembly of the giant freshwater prawn (Macrobrachium rosenbergii).
S-EPMC11373640 | biostudies-literature

Chromosome-level genome assembly for giant panda provides novel insights into Carnivora chromosome evolution.
S-EPMC6898958 | biostudies-literature

Synthesis and assembly of full-length cyanophage A-4L genome.
S-EPMC9803696 | biostudies-literature

Assembly of full-length macaque gene models
2021-06-01 | GSE132413 | GEO

An RNA-Seq strategy to detect the complete coding and non-coding transcriptome including full-length imprinted macro ncRNAs.
S-EPMC3213133 | biostudies-literature

Cloud accelerated alignment and assembly of full-length single-cell RNA-seq data using Falco.
S-EPMC6936136 | biostudies-literature