Dataset Information

Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer.

ABSTRACT: Background: The first step of virtually all next generation sequencing analysis involves the splitting of the raw sequencing data into separate files using sample-specific barcodes, a process known as "demultiplexing". However, we found that existing software for this purpose was either too inflexible or too computationally intensive for fast, streamlined processing of raw, single-end FASTQ files containing combinatorial barcodes. Results: Here, we introduce a fast and uniquely flexible demultiplexer, named Ultraplex, which splits a raw FASTQ file containing barcodes either at a single end or at both 5' and 3' ends of reads, trims the sequencing adaptors and low-quality bases, and moves unique molecular identifiers (UMIs) into the read header, allowing subsequent removal of PCR duplicates. Ultraplex is able to perform such single or combinatorial demultiplexing on both single- and paired-end sequencing data, and can process an entire Illumina HiSeq lane, consisting of nearly 500 million reads, in less than 20 minutes. Conclusions: Ultraplex greatly reduces computational burden and pipeline complexity for the demultiplexing of complex sequencing libraries, such as those produced by various CLIP and ribosome profiling protocols, and is also very user friendly, enabling streamlined, robust data processing. Ultraplex is available on PyPI and Conda and via GitHub.

SUBMITTER: Wilkins OG

PROVIDER: S-EPMC8287537 | biostudies-literature |

REPOSITORIES: biostudies-literature
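
The abstract describes splitting reads by barcode and moving the UMI into the read header. The Python sketch below illustrates that general idea only; it is not Ultraplex's implementation or interface. The read layout (UMI, then sample barcode, then insert, all at the 5' end), the exact-match rule, the `_UMI` header suffix, and the names `parse_fastq` and `demultiplex` are all assumptions made for illustration. Adaptor and quality trimming, 3' barcodes, and paired-end handling are left out.

```python
from collections import defaultdict


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def demultiplex(records, barcodes, umi_len):
    """Group reads by exact 5' barcode match and move the UMI into the header.

    Assumed (hypothetical) read layout: UMI + sample barcode + insert.
    Reads whose barcode matches no sample are kept unmodified under "no_match".
    """
    bins = defaultdict(list)
    for header, seq, qual in records:
        umi = seq[:umi_len]
        rest = seq[umi_len:]
        for sample, barcode in barcodes.items():
            if rest.startswith(barcode):
                trim = umi_len + len(barcode)
                # Append the UMI to the read name, before any header comment.
                name, _, comment = header.partition(" ")
                new_header = f"{name}_{umi}" + (f" {comment}" if comment else "")
                # Trim sequence and quality by the same amount to keep them aligned.
                bins[sample].append((new_header, seq[trim:], qual[trim:]))
                break
        else:
            bins["no_match"].append((header, seq, qual))
    return dict(bins)


# Toy input: three reads with a 3-nt UMI followed by a 4-nt sample barcode.
fastq = [
    "@read1", "ACGAAAATTTTGGGG", "+", "IIIIIIIIIIIIIII",
    "@read2", "TTTCCCCGGGGAAAA", "+", "IIIIIIIIIIIIIII",
    "@read3", "GGGTTTTACGTACGT", "+", "IIIIIIIIIIIIIII",
]
barcodes = {"sample_A": "AAAA", "sample_B": "CCCC"}
result = demultiplex(parse_fastq(fastq), barcodes, umi_len=3)

for sample, reads in result.items():
    print(sample, len(reads))
```

Carrying the UMI in the read name means it survives alignment, so a downstream tool can later collapse reads that share both a mapping position and a UMI as PCR duplicates.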

Similar Datasets

Ultraplex: An ultra-fast, flexible, all-in-one fastq demultiplexer

Project description:An Illumina sequencing lane for testing our demultiplexer, named Ultraplex, which splits a raw FASTQ file containing barcodes either at a single end or at both 5’ and 3’ ends of reads, trims the sequencing adaptors and low-quality bases, and moves unique molecular identifiers (UMIs) into the read header, allowing subsequent removal of PCR duplicates. Ultraplex is able to perform such single or combinatorial demultiplexing on both single- and paired-end sequencing data, and can process an entire Illumina HiSeq lane, consisting of nearly 500 million reads, in less than twenty minutes.

2021-05-06 | E-MTAB-10349 | biostudies-arrayexpress

Ultraplex: An ultra-fast, flexible, all-in-one fastq demultiplexer

Project description:Ultraplex: An ultra-fast, flexible, all-in-one fastq demultiplexer

| PRJEB44738 | ENA

Molecular demultiplexer as a terminator automaton.

Project description:Molecular logic gates are expected to play an important role on the way to information processing therapeutic agents, especially considering the wide variety of physical and chemical responses that they can elicit in response to the inputs applied. Here, we show that a 1:2 demultiplexer based on a Zn2+-terpyridine-Bodipy conjugate with a quenched fluorescent emission, is efficient in photosensitized singlet oxygen generation as inferred from trap compound experiments and cell culture data. However, once the singlet oxygen generated by photosensitization triggers apoptotic response, the Zn2+ complex then interacts with the exposed phosphatidylserine lipids in the external leaflet of the membrane bilayer, autonomously switching off singlet oxygen generation, and simultaneously switching on a bright emission response. This is the confirmatory signal of the cancer cell death by the action of molecular automaton and the confinement of unintended damage by excessive singlet oxygen production.

| S-EPMC5824880 | biostudies-literature

Rapid Stencil Mask Fabrication Enabled One-Step Polymer-Free Graphene Patterning and Direct Transfer for Flexible Graphene Devices.

Project description:We report a one-step polymer-free approach to patterning graphene using a stencil mask and oxygen plasma reactive-ion etching, with a subsequent polymer-free direct transfer for flexible graphene devices. Our stencil mask is fabricated via a subtractive, laser cutting manufacturing technique, followed by lamination of stencil mask onto graphene grown on Cu foil for patterning. Subsequently, micro-sized graphene features of various shapes are patterned via reactive-ion etching. The integrity of our graphene after patterning is confirmed by Raman spectroscopy. We further demonstrate the rapid prototyping capability of a stretchable, crumpled graphene strain sensor and patterned graphene condensation channels for potential applications in sensing and heat transfer, respectively. We further demonstrate that the polymer-free approach for both patterning and transfer to flexible substrates allows the realization of cleaner graphene features as confirmed by water contact angle measurements. We believe that our new method promotes rapid, facile fabrication of cleaner graphene devices, and can be extended to other two dimensional materials in the future.

| S-EPMC4846816 | biostudies-literature

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.

Project description:FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.

| S-EPMC2847217 | biostudies-literature

MZPAQ: a FASTQ data compression tool.

Project description:Background: Due to the technological progress in Next Generation Sequencing (NGS), the amount of genomic data that is produced daily has seen a tremendous increase. This increase has shifted the bottleneck of genomic projects from sequencing to computation and specifically storing, managing and analyzing the large amount of NGS data. Compression tools can reduce the physical storage used to save large amount of genomic data as well as the bandwidth used to transfer this data. Recently, DNA sequence compression has gained much attention among researchers. Results: In this paper, we study different techniques and algorithms used to compress genomic data. Most of these techniques take advantage of some properties that are unique to DNA sequences in order to improve the compression rate, and usually perform better than general-purpose compressors. By exploring the performance of available algorithms, we produce a powerful compression tool for NGS data called MZPAQ. Results show that MZPAQ outperforms state-of-the-art tools on all benchmark datasets obtained from a recent survey in terms of compression ratio. MZPAQ offers the best compression ratios regardless of the sequencing platform or the size of the data. Conclusions: Currently, MZPAQ's strength is its higher compression ratio as well as its compatibility with all major sequencing platforms. MZPAQ is more suitable when the size of compressed data is crucial, such as long-term storage and data transfer. More efforts will be made in the future to target other aspects such as compression speed and memory utilization.

| S-EPMC6547476 | biostudies-literature

DigestiFlow: from BCL to FASTQ with ease

Project description:Summary: Management of raw-sequencing data and its pre-processing (conversion into sequences and demultiplexing) remains a challenging topic for groups running sequencing devices. They face many challenges in such efforts and solutions ranging from manual management of spreadsheets to very complex and customized laboratory information management systems handling much more than just sequencing raw data. In this article, we describe the software package DigestiFlow that focuses on the management of Illumina flow cell sample sheets and raw data. It allows for automated extraction of information from flow cell data and management of sample sheets. Furthermore, it allows for the automated and reproducible conversion of Illumina base calls to sequences and the demultiplexing thereof using bcl2fastq and Picard Tools, followed by quality control report generation. Availability and implementation: The software is available under the MIT license at https://github.com/bihealth/digestiflow-server. The client software components are available via Bioconda. Supplementary information: Supplementary data are available at Bioinformatics online.

| S-EPMC7703778 | biostudies-literature

Frequency-division multiplexer and demultiplexer for terahertz wireless links.

Project description:The development of components for terahertz wireless communications networks has become an active and growing research field. However, in most cases these components have been studied using a continuous or broadband-pulsed terahertz source, not using a modulated data stream. This limitation may mask important aspects of the performance of the device in a realistic system configuration. We report the characterization of one such device, a frequency multiplexer, using modulated data at rates up to 10 gigabits per second. We also demonstrate simultaneous error-free transmission of two signals at different carrier frequencies, with an aggregate data rate of 50 gigabits per second. We observe that the far-field spatial variation of the bit error rate is different from that of the emitted power, due to a small nonuniformity in the angular detection sensitivity. This is likely to be a common feature of any terahertz communication system in which signals propagate as diffracting beams, not omnidirectional broadcasts. There is growing interest in the development of components to facilitate wireless communications in the terahertz range, but the characterization of these systems involves an unmodulated input. Here the authors demonstrate multiplexing and demultiplexing of data streams in the terahertz range using a real data link.

| S-EPMC5620079 | biostudies-literature

One-step optogenetics with multifunctional flexible polymer fibers.

Project description:Optogenetic interrogation of neural pathways relies on delivery of light-sensitive opsins into tissue and subsequent optical illumination and electrical recording from the regions of interest. Despite the recent development of multifunctional neural probes, integration of these modalities in a single biocompatible platform remains a challenge. We developed a device composed of an optical waveguide, six electrodes and two microfluidic channels produced via fiber drawing. Our probes facilitated injections of viral vectors carrying opsin genes while providing collocated neural recording and optical stimulation. The miniature (<200 μm) footprint and modest weight (<0.5 g) of these probes allowed for multiple implantations into the mouse brain, which enabled opto-electrophysiological investigation of projections from the basolateral amygdala to the medial prefrontal cortex and ventral hippocampus during behavioral experiments. Fabricated solely from polymers and polymer composites, these flexible probes minimized tissue response to achieve chronic multimodal interrogation of brain circuits with high fidelity.

| S-EPMC5374019 | biostudies-literature

Light-weight reference-based compression of FASTQ data.

Project description:Background: The exponential growth of next generation sequencing (NGS) data has posed big challenges to data storage, management and archive. Data compression is one of the effective solutions, where reference-based compression strategies can typically achieve superior compression ratios compared to the ones not relying on any reference. Results: This paper presents a lossless light-weight reference-based compression algorithm named LW-FQZip to compress FASTQ data. The three components of any given input, i.e., metadata, short reads and quality score strings, are first parsed into three data streams in which the redundant information is identified and eliminated independently. In particular, well-designed incremental and run-length-limited encoding schemes are utilized to compress the metadata and quality score streams, respectively. To handle the short reads, LW-FQZip uses a novel light-weight mapping model to rapidly map them against external reference sequence(s) and produce concise alignment results for storage. The three processed data streams are then packed together with some general purpose compression algorithms like LZMA. LW-FQZip was evaluated on eight real-world NGS data sets and achieved compression ratios in the range of 0.111-0.201. This is comparable or superior to other state-of-the-art lossless NGS data compression algorithms. Conclusions: LW-FQZip is a program that enables efficient lossless FASTQ data compression. It contributes to the state-of-the-art applications for NGS data storage and transmission. LW-FQZip is freely available online at: http://csse.szu.edu.cn/staff/zhuzx/LWFQZip.

| S-EPMC4459677 | biostudies-literature
