Project description:Single-cell transcriptomics enables the definition of diverse human immune cell types across multiple tissues and disease contexts. A deeper biological understanding requires comprehensive integration of multiple single-cell omics (transcriptomic, proteomic, and cell-receptor repertoire). To improve the identification of diverse cell types and the accuracy of cell-type classification in multi-omics single-cell datasets, we developed SuPERR, a novel analysis workflow that increases the resolution and accuracy of clustering and allows for the discovery of previously hidden cell subsets. In addition, SuPERR accurately removes cell doublets and prevents widespread cell-type misclassification by incorporating information from cell-surface proteins and immunoglobulin transcript counts. This approach uniquely improves the identification of heterogeneous cell types and states in the human immune system, including rare subsets of antibody-secreting cells in the bone marrow.
Project description:Many DDB1-CUL4 associated factors (DCAFs) have been identified and serve as substrate receptors. Although the oncogenic role of CUL4A has been well established, the specific DCAFs involved in cancer development remain largely unknown. Here we infer the potential impact of 19 well-defined DCAFs in human lung adenocarcinomas (LuADCs) using integrative omics analyses, and discover that mRNA levels of DTL, DCAF4, 12 and 13 are consistently elevated whereas VPRBP is reduced in LuADCs compared to normal lung tissues. The transcriptional levels of DCAFs are significantly correlated with their gene copy number variations. SKIP2, DTL, DCAF6, 7, 8, 13 and 17 are frequently gained whereas VPRBP, PHIP, DCAF10, 12 and 15 are frequently lost. We find that only the transcriptional level of DTL is robustly, significantly and negatively correlated with overall survival across independent datasets. Moreover, DTL-correlated genes are enriched in cell cycle and DNA repair pathways. We also identified 25 proteins whose levels were significantly associated with DTL overexpression in LuADCs, including significant decreases in the protein levels of tumor suppressor genes such as PDCD4, NKX2-1 and PRKAA1. Our results suggest that different CUL4-DCAF axes play distinct roles in LuADC development, with possible relevance for therapeutic target development.
Project description:Small cell lung cancer (SCLC) is highly invasive and lethal. Here we performed RNA-Seq and whole-exome sequencing (WES) on 19 Chinese SCLC clinical tumor specimens.
Project description:The integrative personal omics profile (iPOP) is a pioneering study that combines genomics, transcriptomics, proteomics, metabolomics and autoantibody profiles from a single individual over a 14-month period. The observation period includes two episodes of viral infection: a human rhinovirus and a respiratory syncytial virus. The profile studies give an informative snapshot into the biological functioning of an organism. We hypothesize that pathway expression levels are associated with disease status. To test this hypothesis, we use biological pathways to integrate metabolomics and proteomics iPOP data. The approach computes the pathways' differential expression levels at each time point, while taking into account the pathway structure and the longitudinal design. The resulting pathway levels show strong association with the disease status. Further, we identify temporal patterns in metabolite expression levels. The changes in metabolite expression levels also appear to be consistent with the disease status. The results of the integrative analysis suggest that changes in biological pathways may be used to predict and monitor the disease. The iPOP experimental design, data acquisition and analysis issues are discussed within the broader context of personal profiling.
Project description:Background: Lung adenocarcinoma (LUAD) contains a variety of genomic and epigenomic abnormalities; the effective tumor markers related to these abnormalities need to be further explored. Methods: Clustering analysis was performed based on DNA methylation (MET), DNA copy number variation (CNV), and mRNA expression data, and the differences in survival and tumor immune microenvironment (TIME) between subtypes were compared. Further, we evaluated the signatures in terms of both prognostic value and immunological characteristics. Results: There was a positive correlation between MET and CNV in LUAD. Integrative analysis of multi-omics data from 443 samples determined molecular subtypes, iC1 and iC2. The fractions of CD8+ T cells and activated CD4+ T cells were higher, the fraction of Tregs was lower, and the expression level of programmed death-ligand 1 (PD-L1) was higher in iC2 with a poor prognosis showing a higher TIDE score. We selected PTTG1, SLC2A1, and FAM83A as signatures of molecular subtypes to build a prognostic risk model and divided patients into high-risk group and low-risk group representing poor prognosis and good prognosis, respectively, which were validated in 180 patients with LUAD. Further, the low-risk group with lower TIDE score had more infiltrating immune cells. In 100 patients with LUAD, the high-risk group with an immunosuppressive state had a higher expression of PD-L1 and lower counts of CD8+ T cells and dendritic cells. Conclusions: These findings demonstrated that combined multi-omics data could determine molecular subtypes with significant differences of prognosis and TIME in LUAD and suggested potent utility of the signatures to guide immunotherapy.
Project description:Motivation: Integrative analysis of multi-omics data from different high-throughput experimental platforms provides valuable insight into regulatory mechanisms associated with complex diseases, and gains statistical power to detect markers that are otherwise overlooked by single-platform omics analysis. In practice, a significant portion of samples may not be measured completely due to insufficient tissues or restricted budget (e.g. gene expression profiles are measured but not methylation). Current multi-omics integrative methods require complete data. A common practice is to ignore samples with any missing platform and perform complete case analysis, which leads to substantial loss of statistical power. Methods: In this article, inspired by the popular Integrative Bayesian Analysis of Genomics data (iBAG), we propose a full Bayesian model that allows incorporation of samples with missing omics data. Results: Simulation results show improvement of the new full Bayesian approach in terms of outcome prediction accuracy and feature selection performance when sample size is limited and the proportion of missingness is large. When sample size is large or the proportion of missingness is low, incorporating samples with missingness may introduce extra inference uncertainty and generate worse prediction and feature selection performance. To determine whether and how to incorporate samples with missingness, we propose a self-learning cross-validation (CV) decision scheme. Simulations and a real application on a child asthma dataset demonstrate superior performance of the CV decision scheme when various types of missing mechanisms are evaluated. Availability and implementation: Freely available on GitHub at https://github.com/CHPGenetics/FBM. Supplementary information: Supplementary data are available at Bioinformatics online.
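To make the cost of complete case analysis concrete, the following minimal Python sketch drops every sample that lacks any platform and reports the fraction retained. It is an illustration only and not part of the authors' Bayesian model; the function name `complete_cases`, the platform names, and the toy samples are all hypothetical.

```python
def complete_cases(samples, platforms):
    """Keep only samples that have a measurement for every platform."""
    return [s for s in samples if all(s.get(p) is not None for p in platforms)]


# Hypothetical toy cohort: None marks a platform that was not measured.
platforms = ["expression", "methylation"]
samples = [
    {"id": "A", "expression": [0.2, 1.1], "methylation": [0.7, 0.4]},
    {"id": "B", "expression": [0.9, 0.3], "methylation": None},
    {"id": "C", "expression": [1.4, 0.8], "methylation": [0.1, 0.9]},
    {"id": "D", "expression": [0.5, 0.6], "methylation": None},
    {"id": "E", "expression": [1.0, 1.2], "methylation": [0.3, 0.5]},
]

kept = complete_cases(samples, platforms)
retained_fraction = len(kept) / len(samples)
print([s["id"] for s in kept], retained_fraction)
```

In this toy cohort two of five samples are discarded even though their expression profiles were measured, which is the loss of information that motivates modeling the incomplete samples instead.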
Project description:Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling error is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets.
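The matching idea can be sketched in a few lines of Python: if two data types are linked (for example, measured expression and expression predicted from cis-acting variants), a sample's profile in one type should correlate best with the profile carrying the same label in the other type. The sketch below is a simplified illustration under that assumption and is not the MODMatcher algorithm itself; the function names and the toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def best_matches(type_a, type_b):
    """Map each label in type_a to the type_b label with the most correlated profile."""
    return {
        label: max(type_b, key=lambda other: pearson(profile, type_b[other]))
        for label, profile in type_a.items()
    }


def find_mislabeled(type_a, type_b):
    """Return labels whose best match in the other data type is a different label."""
    matches = best_matches(type_a, type_b)
    return {label: match for label, match in matches.items() if match != label}


# Hypothetical toy data in which labels S2 and S3 were swapped in the second data type.
expression = {
    "S1": [1.0, 2.0, 3.0, 4.0],
    "S2": [4.0, 3.0, 2.0, 1.0],
    "S3": [1.0, 3.0, 2.0, 4.0],
}
cis_predicted = {
    "S1": [1.1, 2.1, 2.9, 4.2],
    "S2": [0.9, 3.2, 1.8, 4.1],
    "S3": [3.9, 3.1, 2.2, 0.8],
}

print(find_mislabeled(expression, cis_predicted))
```

A real pipeline would need many more features per sample and a significance threshold before relabeling anything; the sketch only shows why cross-type correlation can expose a swap.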
Project description:Motivation: It is more and more common to perform multi-omics analyses to explore the genome at diverse levels and not only at a single level. Through integrative statistical methods, multi-omics data have the power to reveal new biological processes, potential biomarkers and subgroups in a cohort. Matrix factorization (MF) is an unsupervised statistical method that allows a clustering of individuals, but also reveals relevant omics variables from the various blocks. Results: Here, we present PIntMF (Penalized Integrative Matrix Factorization), an MF model with sparsity, positivity and equality constraints. To induce sparsity in the model, we used a classical Lasso penalization on variable and individual matrices. For the matrix of samples, sparsity helps in the clustering, while normalization (matching an equality constraint) of inferred coefficients is added to improve interpretation. Moreover, we added an automatic tuning of the sparsity parameters using the famous glmnet package. We also proposed three criteria to help the user to choose the number of latent variables. PIntMF was compared with other state-of-the-art integrative methods including feature selection techniques in both synthetic and real data. PIntMF succeeds in finding relevant clusters as well as variables in two types of simulated data (correlated and uncorrelated). Next, PIntMF was applied to two real datasets (Diet and cancer), and it revealed interpretable clusters linked to available clinical data. Our method outperforms the existing ones on two criteria (clustering and variable selection). We show that PIntMF is an easy, fast and powerful tool to extract patterns and cluster samples from multi-omics data. Availability and implementation: An R package is available at https://github.com/mpierrejean/pintmf. Supplementary information: Supplementary data are available at Bioinformatics online.
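The three kinds of constraint named above can be illustrated on a single row of sample coefficients. The Python sketch below applies Lasso-style soft-thresholding (sparsity), clips negative values to zero (positivity), and rescales the row to sum to one. Reading the equality constraint as "each row sums to one" is an assumption made here for illustration, the penalty value is arbitrary, and none of this code is taken from the PIntMF R package.

```python
def soft_threshold(value, penalty):
    """Lasso proximal step: shrink a value toward zero by `penalty`."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def constrain_row(row, penalty):
    """Apply sparsity, positivity, and a sum-to-one normalization to one row."""
    shrunk = [max(soft_threshold(v, penalty), 0.0) for v in row]
    total = sum(shrunk)
    # An all-zero row cannot be normalized, so it is returned unchanged.
    return [v / total for v in shrunk] if total > 0 else shrunk


# Hypothetical coefficients of one sample on four latent variables.
weights = constrain_row([0.9, 0.2, -0.4, 0.5], penalty=0.3)
print(weights)
```

After the step, small and negative coefficients are exactly zero and the remaining ones read as proportions, which is what makes the sample matrix easier to interpret as soft cluster membership.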
Project description:Non-small cell lung cancer (NSCLC) is one of the most common malignancies worldwide. The development of high-throughput single-cell RNA-sequencing (RNA-seq) technology and the advent of multi-omics have provided a solid basis for a systematic understanding of the heterogeneity in cancers. Although numerous studies have revealed the molecular features of NSCLC, it is important to identify and validate the molecular biomarkers related to specific NSCLC phenotypes at single-cell resolution. In this study, we analyzed and validated single-cell RNA-seq data by integrating multi-level omics data to identify key metabolic features and prognostic biomarkers in NSCLC. High-throughput single-cell RNA-seq data, including 4887 cellular gene expression profiles from NSCLC tissues, were analyzed. After pre-processing, the cells were clustered into 12 clusters using the t-SNE clustering algorithm, and the cell types were defined according to the marker genes. Malignant epithelial cells exhibit individual differences in molecular features and intra-tissue metabolic heterogeneity. We found that oxidative phosphorylation (OXPHOS) and glycolytic pathway activity are major contributors to intra-tissue metabolic heterogeneity of malignant epithelial cells and T cells. Furthermore, we constructed T-cell differentiation trajectories and identified several key genes that regulate the cellular phenotype. By screening for genes associated with T-cell differentiation using the Lasso algorithm and Cox risk regression, we identified four prognostic marker genes for NSCLC. In summary, our study revealed metabolic features and prognostic markers of NSCLC at single-cell resolution, which provides novel findings on molecular biomarkers and signatures of cancers.