Project description:Read counting and unique molecular identifier (UMI) counting are the principal gene expression quantification schemes used in single-cell RNA-sequencing (scRNA-seq) analysis. By using multiple scRNA-seq datasets, we reveal distinct distribution differences between these schemes and conclude that the negative binomial model is a good approximation for UMI counts, even in heterogeneous populations. We further propose a novel differential expression analysis algorithm based on a negative binomial model with independent dispersions in each group (NBID). Our results show that this properly controls the FDR and achieves better power for UMI counts when compared to other recently developed packages for scRNA-seq analysis.
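As a rough illustration of what "independent dispersions in each group" means, the sketch below fits a negative binomial mean and dispersion separately to each group of UMI counts. It assumes the common parameterization var = mu + phi * mu^2 and uses a simple method-of-moments estimate. The description does not state how NBID estimates its parameters, so this estimator, the function names, and the toy counts are illustrative assumptions rather than the NBID implementation.

```python
import math
from statistics import mean, variance


def nb_logpmf(k, mu, phi):
    """Log-probability of count k under a negative binomial with
    mean mu > 0 and dispersion phi > 0 (variance = mu + phi * mu**2)."""
    r = 1.0 / phi  # "size" parameter
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def moment_estimates(counts, floor=1e-8):
    """Method-of-moments (mu, phi) for one group of counts.

    phi is floored at a small positive value when the sample variance
    does not exceed the mean (no detectable overdispersion)."""
    mu = mean(counts)
    if mu == 0:
        return 0.0, floor
    phi = (variance(counts) - mu) / mu ** 2
    return mu, max(phi, floor)


# Hypothetical UMI counts for one gene in two groups of cells.
group_a = [0, 2, 4, 10]
group_b = [1, 1, 2, 0]

# Each group gets its own dispersion instead of sharing a pooled value.
mu_a, phi_a = moment_estimates(group_a)
mu_b, phi_b = moment_estimates(group_b)
```

Here the first group is clearly overdispersed while the second is not, which is the situation a single shared dispersion would describe poorly.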
Project description:Single cell RNA-seq of the human alveolar rhabdomyosarcoma cell line Rh41. We also include a bulk RNA-seq study of unsorted and sorted cells using CD44 as a marker.
Project description:Purpose: To analyze the sensitivity and specificity of the AmpFISH method, we sequenced the NIH3T3 cell line via UMI-RNAseq experiments. Methods: NIH3T3 cells were grown in DMEM (Dulbecco’s Modified Eagle Medium, Gibco) supplemented with 10% FBS (Fetal Bovine Serum, Sigma), 50 U/ml penicillin and 50 mg/ml streptomycin (Gibco, cat. no. 15070) at 37 °C with 5% CO2. Cells were treated with 0.25% trypsin solution (HyClone, No. SH42605.01) when they reached ~10^6 cells/ml. The cells were then washed with 1X PBS, mixed with 1 ml TRIzol solution (ThermoFisher, No. 15596029), and snap-frozen on dry ice. Total RNA was qualitatively and quantitatively evaluated as follows: (1) the RNA sample was initially evaluated qualitatively using 1% agarose gel electrophoresis for possible contamination and degradation; (2) RNA purity and concentration were then examined using a NanoPhotometer spectrophotometer; (3) RNA integrity and quantity were finally measured using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system. After library preparation and pooling of different samples, the samples were subjected to Illumina sequencing. The libraries were sequenced on the Illumina NovaSeq 6000 platform, yielding 6G of raw data as 150-nt paired-end reads. UMI sequences on each read were identified by UMI-tools (1.0.0), and reads with UMIs were used for the subsequent analysis. To identify duplicated reads, UMIs were initially removed from the UMI reads, and the remaining part of each read was mapped to the reference genome using Hisat2. Reads that mapped to the same location on the reference genome were identified as duplicated reads. Then, the UMIs on each read were recalled, and duplicated reads with the same UMI were identified as non-natural duplications, which were subsequently removed from the processed data. HTSeq v0.6.1 was used to count the number of reads mapped to each gene.
Then, the FPKM of each gene was calculated based on the length of the gene and the number of reads mapped to it. Conclusions: AmpFISH provides convenient and versatile tools for sensitive RNA/DNA detection and for gaining useful information on cellular molecules using simple workflows.
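The deduplication and FPKM steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline actually run for this dataset (which used UMI-tools, Hisat2 and HTSeq): the read tuples, gene names and lengths are hypothetical, strand is included in the "same location" key as an assumption, and the library size is taken to be the sum of the per-gene counts.

```python
def deduplicate(reads):
    """Keep the first read seen for each (chrom, pos, strand, UMI) key.

    `reads` is an iterable of (read_id, chrom, pos, strand, umi) tuples.
    Reads sharing a mapping location AND a UMI are treated as non-natural
    (PCR) duplicates; reads sharing only the location are kept.
    """
    seen = set()
    kept = []
    for read_id, chrom, pos, strand, umi in reads:
        key = (chrom, pos, strand, umi)
        if key not in seen:
            seen.add(key)
            kept.append(read_id)
    return kept


def fpkm(gene_counts, gene_lengths_bp):
    """FPKM = count * 1e9 / (gene length in bp * total mapped fragments).

    The total is assumed here to be the sum of all per-gene counts.
    """
    total = sum(gene_counts.values())
    return {gene: count * 1e9 / (gene_lengths_bp[gene] * total)
            for gene, count in gene_counts.items()}


# Hypothetical reads: r1 and r2 share location and UMI, so r2 is dropped.
reads = [
    ("r1", "chr1", 100, "+", "AACG"),
    ("r2", "chr1", 100, "+", "AACG"),
    ("r3", "chr1", 100, "+", "TTGC"),
    ("r4", "chr2", 100, "+", "AACG"),
]
kept = deduplicate(reads)

# Hypothetical counts and lengths: geneB has 3x the reads but is 3x longer.
values = fpkm({"geneA": 500, "geneB": 1500}, {"geneA": 1000, "geneB": 3000})
```

In the toy FPKM call both genes end up with the same value, which shows how the length normalization offsets the higher raw count of the longer gene.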
Project description:Accounting for batch effects, especially latent batch effects, in differential expression (DE) analysis is critical for identifying true biological effects. Single-cell RNA sequencing (scRNA-seq) is a powerful tool for quantifying cell-to-cell variation in transcript abundance and characterizing cellular dynamics. Although many scRNA-seq DE analysis methods accommodate known batch variables, their performance has not been systematically evaluated. Moreover, the challenge of accounting for latent batch variables in scRNA-seq DE analysis is largely unmet. In contrast, many methods have been developed to account for batch variables (either known or latent) in other high-dimensional data, especially bulk RNA-seq. We extensively evaluate 11 methods for handling batch variables in different scRNA-seq DE analysis scenarios, with a primary focus on latent batch variables. We demonstrate that for known batch variables, incorporating them as covariates into a regression model outperforms approaches using a batch-corrected matrix. For latent batches, fixed effects models have inflated FDRs, whereas aggregation-based methods and mixed effects models have significant power loss. Surrogate variable based methods generally control the FDR well while achieving good power with small group effects. However, their performance (except that of SVA) deteriorates substantially in scenarios involving large group effects and/or group label impurity. In these settings, SVA achieves relatively good performance despite an occasionally inflated FDR (up to 0.2). Finally, we make the following recommendations for scRNA-seq DE analysis: 1) incorporate known batch variables as covariates instead of using batch-corrected data; and 2) employ SVA for latent batch correction. However, better methods are still needed to fully unleash the power of scRNA-seq.
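To make the first recommendation concrete, the sketch below shows the mechanics of including a known batch variable as a covariate in a regression model. It is a toy example under loud assumptions: an ordinary least-squares fit on hypothetical log-expression values for one gene, with made-up group and batch labels. It is not one of the 11 evaluated methods and does not reproduce the comparison against batch-corrected matrices.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting (a is a small square matrix)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(design, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical cells: group and batch are partially confounded.
group = [0, 0, 0, 0, 1, 1, 1, 1]
batch = [0, 0, 0, 1, 0, 1, 1, 1]
# Simulated log-expression: baseline 1, true group effect 0.5, batch effect 2.
y = [1 + 0.5 * g + 2 * b for g, b in zip(group, batch)]

# Model with batch as a covariate: columns are intercept, group, batch.
adjusted = ols([[1, g, b] for g, b in zip(group, batch)], y)
# Model that ignores batch: columns are intercept, group.
naive = ols([[1, g] for g in group], y)

group_effect_adjusted = adjusted[1]
group_effect_naive = naive[1]
```

Because most cells of the second group come from the second batch, the model without the covariate attributes part of the batch shift to the group, while the model with the covariate recovers the simulated group effect.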
Project description:We sequenced single cells from three developmental stages of the chicken forelimb. We identified different cell populations with distinct transcriptional profiles. The supplementary file contains processed UMI count matrices, which also include metadata for each cell, e.g. cluster.
Project description:Comparison of single cell RNAseq and single nucleus RNAseq on four healthy human liver caudate lobes, with cell-types validated using one slice of a fifth healthy human for VISIUM Spatial Transcriptomics. Raw UMI count tables can be found here: https://figshare.com/projects/Human_Liver_SC_vs_SN_paper/98981 Processed Seurat Objects can be found here: https://www.dropbox.com/sh/sso15ehqmrrh6mk/AACKHOsSlZW0_Zy9cbCkOmMfa?dl=0
Project description:We developed a targeted chromosome conformation capture (4C) approach that uses unique molecular identifiers (UMI) to derive high complexity quantitative chromosome contact profiles with controlled signal to noise ratios. We demonstrate that the method improves the sensitivity and specificity for detection of long-range chromosomal interactions, and that it allows the design of interaction screens with predictable statistical power. UMI-4C robustly quantifies contact intensity changes between cell types and conditions, opening the way toward incorporation of long-range interactions in quantitative models of gene regulation. We constructed UMI-4C profiles of 13 different genomic loci (viewpoints) in five different cell lines, in order to study the 3D chromatin contact maps of these selected loci. The coordinates for these viewpoints are: G1p1 chrX:48646542; baitG1_3_5kb chrX:48641393; bait_50kb chrX:48595987; bait_165kb chrX:48476525; ANK1 chr8:41654693; hbb_3HS chr11:5221346; hbb_HBB chr11:5248714; hbb_HBBP1_G1 chr11:5266532; HBB_HBE chr11:5292159; HBB_HS2 chr11:5301345; HBB_HS3 chr11:5306690; HBB_HS5 chr11:5313539; HBB_HBD chr11:5256597