Dataset Information

Bartender: a fast and accurate clustering algorithm to count barcode reads.

ABSTRACT:

Motivation

Barcode sequencing (bar-seq) is a high-throughput and cost-effective method to assay large numbers of cell lineages or genotypes in complex cell pools. Because of its advantages, applications of bar-seq are growing quickly, from using neutral random barcodes to study the evolution of microbes or cancer to using pseudo-barcodes, such as shRNAs or sgRNAs, to simultaneously screen large numbers of cell perturbations. However, the computational pipelines for bar-seq clustering are not well developed. Available methods often yield a high frequency of under-clustering artifacts that result in spurious barcodes, or over-clustering artifacts that group distinct barcodes together. Here, we developed Bartender, an accurate clustering algorithm to detect barcodes and their abundances from raw next-generation sequencing data.

Results

In contrast with existing methods that cluster based on sequence similarity alone, Bartender uses a modified two-sample proportion test that also considers cluster size. This modification results in higher accuracy and lower rates of under- and over-clustering artifacts. Additionally, Bartender includes unique molecular identifier handling and a 'multiple time point' mode that matches barcode clusters between different clustering runs for seamless handling of time course data. Bartender is a set of simple-to-use command line tools that can be run on a laptop, with run times comparable to those of existing methods.
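
To illustrate why cluster size carries information that sequence similarity alone does not, the following is a minimal sketch of the standard textbook pooled two-sample proportion z-test. It is not Bartender's modified test, whose details are not given in this abstract; the function names and the counts are hypothetical and chosen only for illustration.

```python
import math


def two_sample_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 == p2 (textbook form, not Bartender's variant).

    x1, x2 are success counts; n1, n2 are the corresponding sample sizes.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)
    if variance == 0.0:
        # Both samples are all-success or all-failure: no evidence of a difference.
        return 0.0
    return (p1 - p2) / math.sqrt(variance)


def two_sided_p_value(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts: identical observed proportions (10% vs 20%),
# but sample sizes that differ by a factor of 100.
z_small = two_sample_proportion_z(2, 20, 4, 20)
z_large = two_sample_proportion_z(200, 2000, 400, 2000)

print(f"small samples: z = {z_small:.3f}, p = {two_sided_p_value(z_small):.3g}")
print(f"large samples: z = {z_large:.3f}, p = {two_sided_p_value(z_large):.3g}")
```

With the small samples the difference is not significant at the conventional 5% level, while the same proportions observed in the large samples are. This is the general sense in which a size-aware test can separate cases that look identical by proportions alone.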

Availability and implementation

Bartender is available at no charge for non-commercial use at https://github.com/LaoZZZZZ/bartender-1.1.

Contact

sasha.levy@stonybrook.edu or song.wu@stonybrook.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Zhao L

PROVIDER: S-EPMC6049041 | biostudies-literature | 2018 Mar

REPOSITORIES: biostudies-literature

Publications

Bartender: a fast and accurate clustering algorithm to count barcode reads.

Zhao L, Liu Z, Levy SF, Wu S

Bioinformatics (Oxford, England), 2018 Mar 1, issue 5

PMID: 29069318

Similar Datasets

IterCluster: a barcode clustering algorithm for long fragment read analysis.
| S-EPMC7100596 | biostudies-literature

ASElux: an ultra-fast and accurate allelic reads counter.
| S-EPMC5905663 | biostudies-literature

NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads.
| S-EPMC4403973 | biostudies-literature

SpoTyping: fast and accurate in silico Mycobacterium spoligotyping from sequence reads.
| S-EPMC4756441 | biostudies-literature

Genometa--a fast and accurate classifier for short metagenomic shotgun reads.
| S-EPMC3424124 | biostudies-literature

Fast and Accurate Classification of Meta-Genomics Long Reads With deSAMBA.
| S-EPMC8127778 | biostudies-literature

Fast and accurate de novo genome assembly from long uncorrected reads.
| S-EPMC5411768 | biostudies-literature

FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads.
| S-EPMC3367211 | biostudies-literature

SPICi: a fast clustering algorithm for large biological networks.
| S-EPMC2853685 | biostudies-literature

FctClus: A Fast Clustering Algorithm for Heterogeneous Information Networks.
| S-EPMC4474961 | biostudies-literature