Dataset Information

StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics.

ABSTRACT: Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. "provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month". The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages.
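
The example query quoted in the abstract (metrics for libraries with barcode X, on sequencer A, within the last month) can be sketched as a single filtered aggregate over a table of run metrics. This is a minimal illustration only: the table `run_metric`, its columns, the metric name `percent_gc`, and the functions `build_demo_db` and `mean_metric` are all hypothetical and are not StatsDB's actual schema or API. SQLite is used purely to keep the sketch self-contained.

```python
import sqlite3
from datetime import date, timedelta


def build_demo_db():
    """Create an in-memory database with a HYPOTHETICAL one-table schema.

    This is not the StatsDB schema; it only mimics the idea of storing
    per-run metrics alongside the instrument, barcode and run date.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE run_metric (
               run_id     TEXT,
               instrument TEXT,
               barcode    TEXT,
               run_date   TEXT,   -- ISO 8601 date, so string comparison sorts correctly
               metric     TEXT,
               value      REAL
           )"""
    )
    rows = [
        ("run1", "sequencer_A", "X", "2013-11-01", "percent_gc", 48.0),
        ("run2", "sequencer_A", "X", "2013-11-10", "percent_gc", 52.0),
        ("run3", "sequencer_A", "Y", "2013-11-10", "percent_gc", 40.0),  # other barcode
        ("run4", "sequencer_B", "X", "2013-11-12", "percent_gc", 60.0),  # other instrument
        ("run5", "sequencer_A", "X", "2013-09-01", "percent_gc", 30.0),  # too old
    ]
    conn.executemany("INSERT INTO run_metric VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def mean_metric(conn, metric, barcode, instrument, today, days=30):
    """Return (mean value, number of rows) for one metric, restricted to a
    barcode, an instrument and the `days` days up to and including `today`.

    The mean is None when no rows match.
    """
    since = (today - timedelta(days=days)).isoformat()
    return conn.execute(
        """SELECT AVG(value), COUNT(*)
             FROM run_metric
            WHERE metric = ?
              AND barcode = ?
              AND instrument = ?
              AND run_date BETWEEN ? AND ?""",
        (metric, barcode, instrument, since, today.isoformat()),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_metric(conn, "percent_gc", "X", "sequencer_A", date(2013, 11, 15)))
```

Passing the reference date explicitly, rather than reading the system clock inside the function, keeps the query reproducible when it is re-run against historical data.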

SUBMITTER: Ramirez-Gonzalez RH

PROVIDER: S-EPMC3938176 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics.

Ramirez-Gonzalez, Ricardo H; Leggett, Richard M; Waite, Darren; Thanki, Anil; Drou, Nizar; Caccamo, Mario; Davey, Robert

F1000Research, 2013-11-15

PMID: 24627795

Similar Datasets

Next-generation sequencing in understanding complex neurological disease.
S-EPMC3836167 | biostudies-literature

Next Generation Protein Sequencing
2017-04-03 | PXD003804 | Pride

Peptide Synthesis on a Next-Generation DNA Sequencing Platform.
S-EPMC5183537 | biostudies-literature

Determining Performance Metrics for Targeted Next-Generation Sequencing Panels Using Reference Materials.
S-EPMC6172655 | biostudies-literature

miREvo: an integrative microRNA evolutionary analysis platform for next-generation sequencing experiments.
S-EPMC3410788 | biostudies-other

A microfluidic DNA library preparation platform for next-generation sequencing.
S-EPMC3718812 | biostudies-literature

PEGR: a management platform for ChIP-based next generation sequencing pipelines.
S-EPMC9161112 | biostudies-literature

Understanding primary aldosteronism: impact of next generation sequencing and expression profiling.
S-EPMC4285708 | biostudies-literature

SeqVItA: Sequence Variant Identification and Annotation Platform for Next Generation Sequencing Data.
S-EPMC6247818 | biostudies-literature

Clinical analysis of genome next-generation sequencing data using the Omicia platform.
S-EPMC3828661 | biostudies-literature