Project description:The integration of lineage tracing with scRNA-seq has transformed our understanding of gene expression dynamics during development, regeneration, and disease. However, lineage tracing is technically demanding, and most existing scRNA-seq datasets lack lineage information. By analyzing our own (mouse embryonic stem cells; mESCs) and public lineage-annotated scRNA-seq datasets, we identified and characterized genes displaying conserved expression levels over cell divisions in multiple cell types. This resulted in the development of Gene Expression Memory-based Lineage Inference (GEMLI), a computational pipeline that predicts cell lineages over several cell divisions solely from scRNA-seq datasets.
Project description:The detection of hypermethylation markers on cell-free DNA (cfDNA) in biological fluids is a promising, non-invasive approach for early diagnosis and monitoring of human diseases. However, it is challenging to detect hypermethylation markers in a high-throughput, sensitive, and cost-effective manner. Here we present a multiplex 5-methylcytosine marker barcode counting (MMBC-seq) technique and report its clinical application to cfDNA from peripheral plasma samples. We identified an MMBC cancer detection panel and developed a scoring system to differentiate cancer patients from healthy controls. In a multiple-cancer case-control study, the panel achieved a sensitivity of 80.2% and a specificity of 95.7% (AUC 0.906, 95% CI 0.846-0.948). The results suggest that MMBC-seq has great potential to realize non-invasive, flexible, and clinically applicable cancer detection.
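As an illustration of how sensitivity and specificity follow from a per-sample score and a cutoff, here is a minimal Python sketch. The scores, labels, and threshold are made-up values for illustration only; they are not data from the study, and the function is not the MMBC scoring system itself.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify a sample as cancer when its score is >= threshold.

    labels: 1 = cancer, 0 = healthy control (hypothetical encoding).
    Returns (sensitivity, specificity).
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)  # fraction of cancer samples detected
    specificity = tn / (tn + fp)  # fraction of controls correctly called negative
    return sensitivity, specificity


# Hypothetical panel scores: four cancer samples followed by four controls.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.6, 0.3]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 - specificity gives the ROC curve whose area is the reported AUC.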
Project description:Barcode swapping results in the mislabeling of sequencing reads between multiplexed samples on the new patterned flow cell Illumina sequencing machines. This may compromise the validity of numerous genomic assays, especially for single-cell studies where many samples are routinely multiplexed together. The severity and consequences of barcode swapping for single-cell transcriptomic studies remain poorly understood. We have used two statistical approaches to robustly quantify the fraction of swapped reads in each of two plate-based single-cell RNA sequencing datasets. We found that approximately 2.5% of reads were mislabeled between samples on the HiSeq 4000 machine, which is lower than previous reports. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we have demonstrated that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA sequencing studies. To eliminate these artefacts, we have developed an algorithm to exclude individual molecules that have swapped between samples in 10X Genomics experiments, exploiting the combinatorial complexity present in the data. This permits the continued use of cutting-edge sequencing machines for droplet-based experiments while avoiding the confounding effects of barcode swapping. This data repository contains the sequencing files associated with the droplet-based scRNA-seq dataset in Griffiths et al. (2018). The data presented here should be used purely for technical analysis; the biological motivation is nonetheless briefly described in the following: The mammary gland is a unique organ as it undergoes most of its development during puberty and adulthood. Characterising the hierarchy of the various mammary epithelial cells and how they are regulated in response to gestation, lactation and involution is important for understanding how breast cancer develops. 
Recent studies have used numerous markers to enrich, isolate and characterise the different epithelial cell compartments within the adult mammary gland. However, in all of these studies only a handful of markers were used to define and trace cell populations. Therefore, there is a need for an unbiased and comprehensive description of mammary epithelial cells within the gland at different developmental stages. To this end, we used single-cell RNA sequencing (scRNA-seq) to determine the gene expression profile of individual mammary epithelial cells across four adult developmental stages: nulliparous, mid-gestation, lactation and post-weaning (full natural involution).
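The swapped-molecule exclusion described above relies on the observation that the same combination of cell barcode, UMI, and gene is very unlikely to arise independently in two multiplexed samples. A minimal Python sketch of that idea follows. It is not the authors' implementation: the record layout, the function name `remove_swapped`, the `min_frac` threshold of 0.8, and the example reads are all assumptions made for illustration.

```python
from collections import defaultdict


def remove_swapped(reads, min_frac=0.8):
    """Keep each molecule only in the sample that clearly dominates its reads.

    reads: iterable of (sample, cell_barcode, umi, gene, n_reads) tuples
           (hypothetical layout).
    min_frac: assumed minimum share of a molecule's reads that one sample
              must hold for the molecule to be kept there.
    """
    per_sample = defaultdict(lambda: defaultdict(int))
    for sample, barcode, umi, gene, n in reads:
        per_sample[(barcode, umi, gene)][sample] += n

    kept = []
    for key, counts in per_sample.items():
        total = sum(counts.values())
        top_sample, top_reads = max(counts.items(), key=lambda kv: kv[1])
        if top_reads / total >= min_frac:
            # Molecule is assigned to the dominant sample; minor copies
            # in other samples are treated as swapped and discarded.
            kept.append((top_sample, *key, top_reads))
        # Otherwise the origin is ambiguous and the molecule is dropped.
    return kept


# Made-up example: the first molecule is dominated by sample A, the second
# is split evenly (ambiguous), and the third occurs only in sample B.
reads = [
    ("A", "AAAC", "UMI1", "Krt18", 95),
    ("B", "AAAC", "UMI1", "Krt18", 5),
    ("A", "AAAG", "UMI2", "Csn2", 10),
    ("B", "AAAG", "UMI2", "Csn2", 10),
    ("B", "TTTG", "UMI3", "Krt5", 7),
]
kept = remove_swapped(reads)
```

A stricter threshold removes more ambiguous molecules at the cost of discarding some genuine ones, so the cutoff should be chosen with the observed read-share distribution in mind.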
Project description:<p><strong>BACKGROUND:</strong> Genomic prediction (GP) based on single nucleotide polymorphisms (SNP) has become a broadly used tool to increase the gain of selection in plant breeding. However, using predictors that are biologically closer to the phenotypes, such as the transcriptome and metabolome, may increase the prediction ability in GP. The objectives of this study were to (i) assess the prediction ability for three phenotypic traits using different omic datasets, including sequence variants (SV), deleterious SV (dSV), tolerant SV (tSV), expression presence/absence variation (ePAV), gene expression (GE), transcript expression (TE), and metabolites (M), as single predictors in comparison to those using a SNP array; (ii) investigate the improvement in prediction ability when combining information from multiple omic datasets to predict phenotypic variation in barley breeding programs; (iii) explore the relationship between genes and metabolites to unravel the metabolic pathways of the three above-mentioned phenotypic traits.</p><p><strong>RESULTS:</strong> The prediction ability from genomic best linear unbiased prediction (GBLUP) for the three traits using dSV information was higher than when using tSV, all SV information, or the SNP array. All predictors from the transcriptome (GE, TE, and ePAV) and metabolome provided higher prediction abilities than the SNP array and SV on average across the three traits. In addition, some (dis)similarity existed between different omic datasets, which therefore provided complementary biological perspectives on phenotypic variation. Optimally combining the information of dSV, TE, ePAV, and metabolites in GP models could improve the prediction ability over that of the single predictors alone.</p><p><strong>CONCLUSIONS:</strong> The use of integrated omic datasets in GP models is highly recommended. 
Furthermore, we evaluated a cost-effective approach that generates transcriptome data from seedlings by 3’-end mRNA sequencing without losing prediction ability in comparison to full-length mRNA sequencing, paving the way for the use of such prediction methods in commercial breeding programs.</p>
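For readers unfamiliar with GBLUP, the following Python sketch shows one common formulation: a relationship matrix is built from a standardized predictor matrix, and phenotypes of untested lines are predicted by a ridge-type solve. It is a sketch under stated assumptions, not the study's pipeline: the data are simulated, the function names are hypothetical, and the shrinkage parameter `lam` (the residual-to-genetic variance ratio) is fixed at an assumed value rather than estimated.

```python
import numpy as np


def relationship_matrix(X):
    """Linear-kernel relationship matrix from a (lines x features) matrix."""
    Z = X - X.mean(axis=0)            # center each feature
    sd = Z.std(axis=0)
    Z = Z[:, sd > 0] / sd[sd > 0]     # drop constant features, scale the rest
    return Z @ Z.T / Z.shape[1]


def gblup_predict(K, y, train_idx, test_idx, lam=1.0):
    """Predict test phenotypes as mu + K_test,train (K_train + lam*I)^-1 (y_train - mu)."""
    mu = y[train_idx].mean()
    K_tt = K[np.ix_(train_idx, train_idx)]
    K_st = K[np.ix_(test_idx, train_idx)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + K_st @ alpha


# Simulated example: 120 lines, 500 marker-like features coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(120, 500)).astype(float)
beta = rng.normal(0.0, 0.1, size=500)
y = X @ beta + rng.normal(0.0, 1.0, size=120)

train_idx = np.arange(90)
test_idx = np.arange(90, 120)
K = relationship_matrix(X)
pred = gblup_predict(K, y, train_idx, test_idx, lam=1.0)

# Prediction ability: Pearson correlation between predicted and observed values.
ability = np.corrcoef(pred, y[test_idx])[0, 1]
```

The same function accepts a relationship matrix built from any omic layer (for example gene expression or metabolite levels), and one simple way to combine layers is to pass a weighted average of their relationship matrices as `K`.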
Project description:Background: One of the main fields of lung cancer research is identifying patients who are at high risk of post-resection recurrence. Individual recurrence risk evaluation by an accurate yet simple and reproducible method is needed in clinical practice. Results: The log-rank test, followed by further selection using our criteria of assayability, yielded 87 genes from microarray data at a 5% significance level. Of these, the expression of the 18 most significant genes was measured by PTQ-PCR. Using this gene expression information and clinical parameters, a recurrence prediction model comprising 6 genes (CALB1, MMP7, SLC1A7, GSTA1, CCL19, IFI44), pStage, and cell differentiation was developed by stepwise variable selection. Validation in two independent cohorts showed good performance of the proposed model (p=0.0314 and 0.0305, respectively). The predicted median recurrence-free survival times for each patient reflected the observed ones well. Conclusions: Our method of individualized recurrence risk prediction is accurate, technically simple, and reproducible enough to be used in clinical practice. Therefore, it would be useful in customizing lung cancer management strategies. Keywords: Recurrence Free Survival Analysis
Project description:Immune checkpoint inhibitors (ICIs) drastically improve therapeutic outcomes for lung cancer, but accurate prediction of individual patient responses to ICIs remains a challenge. We performed a genome-wide analysis of 5-hydroxymethylcytosine (5hmC) in plasma cell-free DNA (cfDNA) samples from 83 lung cancer patients. Using machine learning approaches, we developed a 5hmC signature to predict ICI treatment response and calculated a weighted-predictive score (wp-score) based on the 5hmC levels of signature genes in each sample. A low wp-score was significantly correlated with longer progression-free survival across three independent patient sample sets, and demonstrated superior predictive capability to tumor programmed death-ligand 1. Moreover, we identified novel 5hmC-associated genes and signaling pathways integral to ICI treatment response in lung cancer. Our study suggests that cfDNA 5hmC analysis is a minimally invasive, innovative strategy for guiding treatment selection in lung cancer patients.