Project description:Barcode swapping results in the mislabeling of sequencing reads between multiplexed samples on the new patterned flow-cell Illumina sequencing machines. This may compromise the validity of numerous genomic assays, especially for single-cell studies where many samples are routinely multiplexed together. The severity and consequences of barcode swapping for single-cell transcriptomic studies remain poorly understood. We have used two statistical approaches to robustly quantify the fraction of swapped reads in each of two plate-based single-cell RNA sequencing datasets. We found that approximately 2.5% of reads were mislabeled between samples on the HiSeq 4000 machine, which is lower than previous reports. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we have demonstrated that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA sequencing studies. To eliminate these artefacts, we have developed an algorithm to exclude individual molecules that have swapped between samples in 10X Genomics experiments, exploiting the combinatorial complexity present in the data. This permits the continued use of cutting-edge sequencing machines for droplet-based experiments while avoiding the confounding effects of barcode swapping. This data repository contains the sequencing files associated with the droplet-based scRNA-seq dataset in Griffiths et al. (2018). The data presented here should be used purely for technical analysis; the biological motivation is nonetheless briefly described as follows: The mammary gland is a unique organ as it undergoes most of its development during puberty and adulthood. Characterising the hierarchy of the various mammary epithelial cells and how they are regulated in response to gestation, lactation and involution is important for understanding how breast cancer develops. 
Recent studies have used numerous markers to enrich, isolate and characterise the different epithelial cell compartments within the adult mammary gland. However, in all of these studies only a handful of markers were used to define and trace cell populations. Therefore, there is a need for an unbiased and comprehensive description of mammary epithelial cells within the gland at different developmental stages. To this end we used single-cell RNA sequencing (scRNAseq) to determine the gene expression profile of individual mammary epithelial cells across four adult developmental stages: nulliparous, mid-gestation, lactation and post-weaning (full natural involution).
Project description:Barcode swapping results in the mislabelling of sequencing reads between multiplexed samples on patterned flow-cell Illumina sequencing machines. This may compromise the validity of numerous genomic assays; however, the severity and consequences of barcode swapping remain poorly understood. We have used two statistical approaches to robustly quantify the fraction of swapped reads in two plate-based single-cell RNA-sequencing datasets. We found that approximately 2.5% of reads were mislabelled between samples on the HiSeq 4000, which is lower than previous reports. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we have demonstrated that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA-sequencing studies. To eliminate these artefacts, we have developed an algorithm to exclude individual molecules that have swapped between samples in 10x Genomics experiments, allowing the continued use of cutting-edge sequencing machines for these assays.
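The description above does not spell out how swapped molecules are excluded, so the following is only a minimal sketch of one plausible approach, not the authors' algorithm. It assumes that a molecule is identified by its (cell barcode, UMI, gene) combination, that seeing the same combination in more than one multiplexed sample is very unlikely by chance, and that a molecule is kept only when one sample holds a clear majority of its reads. The function name, the input layout and the `min_frac=0.8` threshold are all hypothetical.

```python
from collections import defaultdict


def remove_swapped_molecules(reads, min_frac=0.8):
    """Assign each molecule to a single sample or drop it.

    reads: iterable of (sample, cell_barcode, umi, gene, n_reads) tuples.
    min_frac: hypothetical threshold; a molecule is kept only if one sample
        holds at least this fraction of its reads.
    Returns a dict mapping (cell_barcode, umi, gene) -> assigned sample.
    """
    # Tally reads per sample for every (cell barcode, UMI, gene) combination.
    by_molecule = defaultdict(dict)
    for sample, cell, umi, gene, n in reads:
        key = (cell, umi, gene)
        by_molecule[key][sample] = by_molecule[key].get(sample, 0) + n

    kept = {}
    for key, per_sample in by_molecule.items():
        total = sum(per_sample.values())
        best_sample, best_n = max(per_sample.items(), key=lambda kv: kv[1])
        # Keep the molecule only where one sample clearly dominates.
        if total > 0 and best_n / total >= min_frac:
            kept[key] = best_sample
    return kept


if __name__ == "__main__":
    example = [
        ("A", "AAAC", "UMI1", "Gene1", 90),  # dominant sample for this molecule
        ("B", "AAAC", "UMI1", "Gene1", 10),  # likely swapped copy
        ("A", "CCGT", "UMI2", "Gene2", 5),   # ambiguous: split evenly
        ("B", "CCGT", "UMI2", "Gene2", 5),
        ("B", "GGTA", "UMI3", "Gene3", 7),   # seen in one sample only
    ]
    print(remove_swapped_molecules(example))
```

In this toy input the first molecule is assigned to sample A, the evenly split molecule is discarded, and the molecule seen only in sample B is kept as-is.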
Project description:Here we test the impact of barcode swapping in prime-seq. To this end we isolated RNA from human iPSCs and mouse ESCs, processed them separately using prime-seq but pooled them for cDNA amplification.
Project description:Here we test the impact of over-/under-amplification on barcode swapping in prime-seq. To this end we isolated RNA from human iPSCs and mouse ESCs, processed them separately using prime-seq but pooled them for cDNA amplification.
Project description:T cells from OT-I mice were stimulated with 4 different peptide ligands for 6 hours and sorted into two 96-well plates. Cells on each plate were barcoded using a mutually exclusive 8-by-12 set of indexes, such that indexes present on plate 1 were completely absent from plate 2 and vice versa. Libraries from each plate were pooled in equimolar quantities and sequenced on an Illumina HiSeq 4000. Libraries were demultiplexed allowing for all pairs of indexes, including the expected combinations (pairs of barcodes used within each plate) and unexpected combinations (pairs containing one barcode from each plate). This was repeated after sequencing the same pool of libraries on the HiSeq 2500.
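The design described above can be illustrated with a short sketch. It assumes each plate uses 8 row indexes and 12 column indexes, enumerates the expected pairs (both indexes from the same plate) and the unexpected pairs (one index from each plate), and then computes the share of reads that land on unexpected pairs. The index names and read counts are made up, and this naive fraction is not the statistical estimate used in the study: it ignores swaps that happen to land on another expected combination.

```python
from itertools import product


def classify_index_pairs(p1_rows, p1_cols, p2_rows, p2_cols):
    """Return (expected, unexpected) sets of (row_index, column_index) pairs."""
    # Expected: both indexes come from the same plate.
    expected = set(product(p1_rows, p1_cols)) | set(product(p2_rows, p2_cols))
    # All pairs seen when demultiplexing with every row/column combination.
    all_pairs = set(
        product(list(p1_rows) + list(p2_rows), list(p1_cols) + list(p2_cols))
    )
    # Unexpected: one index from each plate.
    return expected, all_pairs - expected


def unexpected_read_fraction(read_counts, unexpected):
    """Fraction of all reads assigned to unexpected index pairs."""
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    mixed = sum(n for pair, n in read_counts.items() if pair in unexpected)
    return mixed / total


if __name__ == "__main__":
    # Hypothetical index names: plate 1 uses R1-R8 / C1-C12,
    # plate 2 uses R9-R16 / C13-C24.
    p1_rows = [f"R{i}" for i in range(1, 9)]
    p1_cols = [f"C{i}" for i in range(1, 13)]
    p2_rows = [f"R{i}" for i in range(9, 17)]
    p2_cols = [f"C{i}" for i in range(13, 25)]

    expected, unexpected = classify_index_pairs(p1_rows, p1_cols, p2_rows, p2_cols)
    print(len(expected), len(unexpected))

    # Made-up read counts for two index pairs.
    counts = {("R1", "C1"): 975, ("R1", "C13"): 25}
    print(unexpected_read_fraction(counts, unexpected))
```

With two 96-well plates this yields 192 expected and 192 unexpected pairs, so every unexpected pair provides a direct readout of reads carrying a mixed-plate index combination.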
Project description:Recent spatial transcriptomics experiments utilize slides containing thousands of spots with spot-specific barcodes that bind mRNA. Ideally, unique molecular identifiers at a spot measure spot-specific expression, but this is often not the case due to bleed from nearby spots, an artifact we refer to as spot swapping. We conduct chimeric experiments to evaluate the spot swapping effect in the 10x Visium spatial transcriptomics protocol. We propose SpotClean to adjust for spot swapping and, in doing so, to increase the sensitivity and precision with which downstream analyses are conducted.
Project description:The detection of hypermethylation markers on cell-free DNA (cfDNA) in biological fluids is a promising and non-invasive approach for early diagnosis and monitoring of human diseases. However, it is challenging to detect hypermethylation markers in a high-throughput, sensitive, and cost-effective manner. Here we presented a multiplex 5-methylcytosine marker barcode counting (MMBC-seq) technique and reported its clinical application for cfDNA from peripheral plasma samples. We identified an MMBC cancer detection panel and developed a scoring system to differentiate cancer cases from healthy controls. In a multiple-cancer case-control study, the panel achieved a sensitivity and specificity of 80.2% and 95.7%, respectively (AUC 0.906, 95% CI 0.846-0.948). The results suggest that MMBC-seq has great potential to realize non-invasive, flexible and clinically applicable cancer detection.