Single Nucleotide Polymorphism (SNP) and Antibody-based Cell Sorting (SNACS): A tool for demultiplexing single-cell DNA sequencing data
ABSTRACT: Motivation Recently, single-cell DNA sequencing (scDNA-seq) and multi-modal profiling with the addition of cell-surface antibodies (scDAb-seq) have provided key insights into cancer heterogeneity. Scaling these technologies across large patient cohorts, however, is frequently cost- and time-prohibitive. Multiplexing, in which cells from different patients are pooled into a single experiment, offers a possible solution. While multiplexing methods are available for scRNA-seq, accurate demultiplexing of scDNA-seq data remains an unmet need. Results Here, we introduce SNACS: Single-Nucleotide Polymorphism (SNP) and Antibody-based Cell Sorting. SNACS relies on a combination of patient-level cell-surface identifiers and natural variation in genetic polymorphisms to demultiplex scDNA-seq data. We demonstrated the performance of SNACS on a dataset of multi-sample experiments from patients with leukemia, for which ground truth was known from single-sample experiments on the same patients. Relative to demultiplexing methods derived from the scRNA-seq literature, SNACS offered superior accuracy. Availability and Implementation SNACS is available at https://github.com/olshena/SNACS.
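SNACS combines two signals: cell-surface antibody identifiers and SNP genotypes. The sketch below is a hypothetical illustration of how such signals can complement each other, not the SNACS algorithm itself (the repository above holds the actual implementation). It assumes cells have already been clustered by SNP genotype elsewhere and that each cell has an antibody-based patient call (or `None` when the antibody signal is ambiguous); each genotype cluster is then named after the patient whose antibody call is most frequent among its cells. The function name `label_clusters_by_hashtag` and its input layout are assumptions made for this example.

```python
from collections import Counter


def label_clusters_by_hashtag(snp_cluster, hashtag_call):
    """Hypothetical helper: name each SNP-defined cluster by majority antibody call.

    snp_cluster:  dict cell_id -> cluster id from genotype-based clustering
    hashtag_call: dict cell_id -> patient id from antibody counts, or None
    Returns dict cell_id -> patient id (None if the cluster has no calls).
    """
    votes = {}
    for cell, cluster in snp_cluster.items():
        call = hashtag_call.get(cell)
        if call is not None:
            # Tally antibody calls separately for each genotype cluster.
            votes.setdefault(cluster, Counter())[call] += 1
    cluster_to_patient = {
        cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()
    }
    # Every cell inherits its cluster's label, including cells whose own
    # antibody call was missing or disagreed with the cluster majority.
    return {
        cell: cluster_to_patient.get(cluster)
        for cell, cluster in snp_cluster.items()
    }


clusters = {"c1": 0, "c2": 0, "c3": 0, "c4": 1, "c5": 1, "c6": 1, "c7": 2}
calls = {"c1": "patientA", "c2": "patientA", "c3": None,
         "c4": "patientB", "c5": "patientB", "c6": "patientA"}
result = label_clusters_by_hashtag(clusters, calls)
print(result)
```

Here cell `c3` (no antibody call) and cell `c6` (an antibody call that disagrees with its genotype cluster) both inherit their cluster's label, and cluster 2, which has no antibody calls at all, stays unlabeled. A real method would also need to handle doublets and noisy clusters, which this sketch does not model.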
Project description: Single-cell RNA sequencing has enabled unprecedented insights into the molecular cues and cellular heterogeneity underlying human disease. However, the high cost and complexity of single-cell methods remain a major obstacle to generating large-scale human cohorts. Here we compare current state-of-the-art single-cell multiplexing technologies and present a new, widely applicable demultiplexing method, SNP-Fishing, which enables simple, robust, high-throughput multiplexing by leveraging the genetic variability of patients.
Project description: Barcode-based multiplexing methods can be used to increase throughput and reduce batch effects in large single-cell genomics studies. To evaluate methods for demultiplexing barcode-multiplexed data, we generated a dataset by labeling samples separately with barcode-tagged antibodies, mixing those samples, and progressively overloading a droplet-based scRNA-seq system.
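Demultiplexing barcode-multiplexed data comes down to deciding, for each droplet, which antibody barcode (hashtag) it carries. Progressively overloading the system raises the fraction of droplets with cells from more than one sample, so a doublet class is essential. As a hedged sketch of the general idea, the function below applies fixed count-fraction thresholds to one cell's hashtag counts. The thresholds (`min_total`, `singlet_frac`, `doublet_frac`) and barcode names are hypothetical. Published demultiplexing tools typically fit statistical models to the count distributions instead of using fixed cutoffs.

```python
def classify_hashtags(counts, min_total=20, singlet_frac=0.7, doublet_frac=0.25):
    """Classify one cell from its hashtag counts (dict barcode -> UMI count).

    Returns ("negative", None), ("singlet", barcode),
    ("doublet", (barcode1, barcode2)) or ("ambiguous", None).
    All thresholds are illustrative assumptions, not recommended values.
    """
    total = sum(counts.values())
    if total < min_total:
        # Too little antibody signal to call anything.
        return ("negative", None)
    # Rank barcodes from most to least abundant (stable for ties).
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top_name, top = ranked[0]
    if top / total >= singlet_frac:
        return ("singlet", top_name)
    if len(ranked) > 1 and ranked[1][1] / total >= doublet_frac:
        # Two barcodes each carry a substantial share of the signal.
        return ("doublet", (top_name, ranked[1][0]))
    return ("ambiguous", None)


cells = [
    {"HTO1": 95, "HTO2": 3, "HTO3": 2},
    {"HTO1": 50, "HTO2": 45, "HTO3": 5},
    {"HTO1": 5, "HTO2": 3},
    {"HTO1": 50, "HTO2": 20, "HTO3": 20, "HTO4": 10},
]
calls = [classify_hashtags(c) for c in cells]
for call in calls:
    print(call)
```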
Project description: We first validated the robustness and high-throughput capability of the GTO protocol in multiple cell lines. The key step to the success of GTO is determining the optimal number of cycles for the first amplification of cDNA. This step must achieve a balance between cDNA and gDNA reads, so that the gene expression profiles are of good quality without compromising the quality of the CNA profiles. This validation was done in the SKBR3, A375 and BT549 cell lines, which were then compared with the 315A control diploid cell line, also processed by the GTO method. The scRNA-seq and scDNA-seq data generated by GTO were also compared with bulk RNA-seq and DNA-seq data generated in triplicate for each cell line by traditional methods. After the method was tested in cell lines, we applied it to mouse models to study tumor biology. The mouse model used here is an orthotopic transplantation model of pancreatic cancer in syngeneic mice. EGFP-positive KPC 1199 tumor cells were injected into the pancreas of adult wild-type mice to generate primary tumors in the pancreas and metastases in the liver, spleen, peritoneum and kidneys. These primary tumors and metastases were then extracted and dissociated into single-cell suspensions. The single-cell suspensions were analyzed by flow cytometry with antibodies against CD45 and EpCAM, and for endogenous GFP, to isolate single stromal and tumor cells. These cells were then processed with the GTO protocol to obtain CNV profiles from gDNA and read counts from RNA in the same cell. To validate our data and provide controls, we generated scRNA-seq gene expression data and scDNA-seq CNA profiles from the 2D-cultured cells at the same passage number at which they were injected into the mice. The scRNA-seq data were generated on the Fluidigm C1 system using its medium IFC for mRNA-seq, according to the user manual.
The scDNA-seq data for CNA profiles were generated by sorting single nuclei into PCR plates and then performing whole-genome amplification with the SeqXE kit from Sigma-Aldrich, following the manufacturer's instructions. The data were analyzed in the same way as the scRNA-seq and scDNA-seq data from GTO.
Project description: Single-cell RNA sequencing (scRNA-seq) has emerged as an essential technique in biology. Given the complexity and cost of these experiments, it is critical that they are well planned and executed. An essential component of scRNA-seq is the inclusion of biological replicates. Replicates add to the cost and complexity of an experiment but are essential, because false discoveries have been shown to arise when information on cell origin is lacking. One option to overcome the financial and experimental constraints of biological replicates in scRNA-seq is to pool samples. After pooling, the sample origin of each cell must then be determined. Experiments in humans have shown that when multiple genetically distinct individuals are pooled into one sample, their genetic diversity (i.e., single nucleotide polymorphisms (SNPs)) can be used to assign cells to their sample of origin. This approach is called SNP-based demultiplexing. Whether laboratory species such as Pleurodeles waltl, and various other non-traditional model species, harbor enough genetic diversity to enable such approaches is unclear. To formally test whether SNP-based demultiplexing is possible across species, we designed this experiment, in which spleens were collected from three Pleurodeles waltl animals, each expressing unique fluorescent genes: one female tgTol2(CAG:Nucbow CAG:Cytbow)Simon (5.67 g weight, 10.8 cm snout-to-tail length), one male tgSceI(CAG:loxP-GFP-loxP-Cherry)Simon (5.36 g, 11.1 cm), and one female tgTol2(CAG:loxP-Cherry-loxP-H2B::YFP)Simon (6.25 g, 10.8 cm). Spleens were filtered through 70 µm nylon filters, followed by FACS gating on fluorescence-positive cells and excluding erythrocytes. After FACS, cells were pooled into one tube and subjected to 10x Genomics 3' v3.1 scRNA-seq, targeting 10,000 cells.
Samples were prepared and sequenced according to 10x Genomics recommendations. After sequencing, we used reads mapping to the fluorescent genes to identify each cell's animal of origin, and then assessed how accurately SNP-based demultiplexing assigned cells to the correct animal.
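When reference genotypes for each pooled individual are available, SNP-based demultiplexing can be framed as a likelihood comparison. The sketch below is a generic illustration, not the specific tool used in this experiment. It scores a cell's reference/alternative read counts at each SNP against each candidate individual's genotype under a binomial model and assigns the cell to the best-scoring individual. The genotype encoding (alternative-allele dosage 0, 1 or 2), the error floor `error=0.01`, and the SNP and animal names are assumptions. When genotypes are not known in advance, reference-free approaches instead cluster cells by the alleles they share.

```python
import math


def snp_log_likelihood(cell_counts, genotype, error=0.01):
    """Binomial log-likelihood of one cell's reads given one individual's genotype.

    cell_counts: dict snp_id -> (ref_reads, alt_reads) for the cell
    genotype:    dict snp_id -> alt-allele dosage (0, 1 or 2)
    The binomial coefficient is omitted because it is identical for
    every candidate individual and so does not affect the ranking.
    """
    ll = 0.0
    for snp, (ref, alt) in cell_counts.items():
        if snp not in genotype:
            continue  # SNP not genotyped for this individual: skip it
        p_alt = genotype[snp] / 2
        # Keep probabilities away from 0 and 1 so a single sequencing
        # error cannot drive the log-likelihood to minus infinity.
        p_alt = min(max(p_alt, error), 1 - error)
        ll += alt * math.log(p_alt) + ref * math.log(1 - p_alt)
    return ll


def assign_cell(cell_counts, genotypes, error=0.01):
    """Return the best-scoring individual and the score for every individual."""
    scores = {
        name: snp_log_likelihood(cell_counts, genotype, error)
        for name, genotype in genotypes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


genotypes = {
    "animal1": {"snp1": 0, "snp2": 2, "snp3": 1},
    "animal2": {"snp1": 2, "snp2": 0, "snp3": 1},
    "animal3": {"snp1": 1, "snp2": 1, "snp3": 0},
}
# Alt reads at snp1 and ref reads at snp2 point to animal2.
cell = {"snp1": (0, 4), "snp2": (5, 0), "snp3": (2, 2)}
best, scores = assign_cell(cell, genotypes)
print(best, scores)
```

In a design like the one above, the animal-of-origin labels from the fluorescent-gene reads would serve as the ground truth against which such assignments are scored.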
Project description: Cell atlas projects and high-throughput perturbation screens require single-cell sequencing at a scale that is challenging with current technology. To enable cost-effective single-cell sequencing for millions of individual cells, we developed “single-cell combinatorial fluidic indexing” (scifi). The scifi-RNA-seq assay combines one-step combinatorial pre-indexing of entire transcriptomes inside permeabilized cells with subsequent single-cell RNA-seq using microfluidics. Pre-indexing allows us to load multiple cells per droplet and bioinformatically demultiplex their individual expression profiles. Thereby, scifi-RNA-seq massively increases the throughput of droplet-based single-cell RNA-seq, and it provides a straightforward way of multiplexing thousands of samples in a single experiment. Compared to multi-round combinatorial indexing, scifi-RNA-seq provides an easier, faster, and more efficient workflow. In contrast to cell hashing methods, which flag and discard droplets containing more than one cell, scifi-RNA-seq resolves and retains individual transcriptomes from overloaded droplets.
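The central idea of pre-indexing is that a cell's identity is the combination of its pre-index and its droplet barcode, so several cells sharing one droplet can still be told apart. The following minimal sketch illustrates only that bookkeeping and is not the published scifi-RNA-seq pipeline. It assumes reads have already been parsed into hypothetical `(droplet_barcode, pre_index, umi, gene)` tuples and that each pre-index was used for a single sample (`well_to_sample` is an illustrative layout).

```python
from collections import defaultdict


def demultiplex_scifi(reads, well_to_sample):
    """Group reads into cells keyed by (droplet_barcode, pre_index).

    reads:          iterable of (droplet_barcode, pre_index, umi, gene) tuples
    well_to_sample: dict pre_index -> sample name (hypothetical layout)
    Returns dict (droplet_barcode, pre_index) -> {"sample": ..., "counts": {gene: n_umis}}.
    """
    umis = defaultdict(set)
    for droplet, pre_index, umi, gene in reads:
        # Collapse PCR duplicates: each distinct UMI counts once per gene.
        umis[(droplet, pre_index, gene)].add(umi)
    cells = {}
    for (droplet, pre_index, gene), umi_set in umis.items():
        cell = cells.setdefault(
            (droplet, pre_index),
            {"sample": well_to_sample.get(pre_index), "counts": {}},
        )
        cell["counts"][gene] = len(umi_set)
    return cells


reads = [
    ("D1", "P1", "AAA", "GAPDH"),
    ("D1", "P1", "AAA", "GAPDH"),  # PCR duplicate of the read above
    ("D1", "P1", "CCC", "GAPDH"),
    ("D1", "P2", "GGG", "ACTB"),   # second cell in the same droplet
    ("D2", "P1", "TTT", "CD3E"),
]
cells = demultiplex_scifi(reads, {"P1": "sampleA", "P2": "sampleB"})
for key, cell in cells.items():
    print(key, cell)
```

Droplet `D1` yields two separate cells because their pre-indexes differ, which is the point of loading multiple cells per droplet. Two cells that share both a droplet and a pre-index would be merged by this sketch; how often that happens depends on the number of pre-indexes relative to the number of cells per droplet.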