Dataset Information

CAARS: comparative assembly and annotation of RNA-Seq data.

ABSTRACT: MOTIVATION: RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction. RESULTS: We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees, and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed that CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses. AVAILABILITY AND IMPLEMENTATION: CAARS is implemented in Python and OCaml and is freely available at https://github.com/carinerey/caars. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

SUBMITTER: Rey C

PROVIDER: S-EPMC6596894 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

CAARS: comparative assembly and annotation of RNA-Seq data.

Carine Rey, Philippe Veber, Bastien Boussau, Marie Sémon

Bioinformatics (Oxford, England), 2019-07-01, issue 13

PMID: 30452539

Similar Datasets

Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study.
| S-EPMC3287467 | biostudies-literature

Deep annotation of long noncoding RNAs by assembling RNA-seq and small RNA-seq data.
| S-EPMC10498003 | biostudies-literature

Improving the Annotation of Arabidopsis lyrata Using RNA-Seq Data.
| S-EPMC4575116 | biostudies-literature

Transcript annotation of Chinese sturgeon (Acipenser sinensis) using Iso-seq and RNA-seq data.
| S-EPMC9950146 | biostudies-literature

SCSA: A Cell Type Annotation Tool for Single-Cell RNA-seq Data.
| S-EPMC7235421 | biostudies-literature

Improved annotation of the domestic pig genome through integration of Iso-Seq and RNA-seq data.
| S-EPMC6505119 | biostudies-literature

Transcriptome assembly and quantification from Ion Torrent RNA-Seq data.
| S-EPMC4120146 | biostudies-literature

De novo assembly of bacterial transcriptomes from RNA-seq data.
| S-EPMC4316799 | biostudies-literature

Accurate assembly of multi-end RNA-seq data with Scallop2.
| S-EPMC9879047 | biostudies-literature

Evaluation of Cell Type Annotation R Packages on Single-cell RNA-seq Data.
| S-EPMC8602772 | biostudies-literature