Dataset Information

SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification.

ABSTRACT: High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that an important number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.

SUBMITTER: Tardaguila M

PROVIDER: S-EPMC5848618 | biostudies-literature | 2018 Feb

REPOSITORIES: biostudies-literature

Publications

SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification.

Tardaguila Manuel, de la Fuente Lorena, Marti Cristina, Pereira Cécile, Pardo-Palacios Francisco Jose, Del Risco Hector, Ferrell Marc, Mellado Maravillas, Macchietto Marissa, Verheggen Kenneth, Edelmann Mariola, Ezkurdia Iakes, Vazquez Jesus, Tress Michael, Mortazavi Ali, Martens Lennart, Rodriguez-Navarro Susana, Moreno-Manzano Victoria, Conesa Ana

Genome Research, 2018-03-01, issue 3

PMID: 29440222

Similar Datasets

Short-read and long-read full-length transcriptome of mouse neural stem cells across neurodevelopmental stages.
| S-EPMC8891264 | biostudies-literature

Time-Course Transcriptome Profiling of a Poxvirus Using Long-Read Full-Length Assay.
| S-EPMC8398953 | biostudies-literature

Long-read RNA-Seq analysis of the full-length F8 transcript in iPSCs
2023-04-24 | GSE229621 | GEO

Single-molecule long-read sequencing of the full-length transcriptome of Rhododendron lapponicum L.
| S-EPMC7174332 | biostudies-literature

Uncovering full-length transcript isoforms of sugarcane cultivar Khon Kaen 3 using single-molecule long-read sequencing.
| S-EPMC6214230 | biostudies-literature

High-throughput manual-quality annotation of full-length long noncoding RNAs with Capture Long-Read Sequencing (CLS)
2017-01-20 | GSE93848 | GEO

Long read reference genome-free reconstruction of a full-length transcriptome from Astragalus membranaceus reveals transcript variants involved in bioactive compound biosynthesis.
| S-EPMC5573880 | biostudies-literature

Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts.
| S-EPMC5737654 | biostudies-literature

Full-Length Transcriptome Analysis of Plasmodium falciparum by Single-Molecule Long-Read Sequencing.
| S-EPMC7942025 | biostudies-literature

Full Length Transcriptome Highlights the Coordination of Plastid Transcript Processing.
| S-EPMC8537030 | biostudies-literature