Dataset Information

Proteome-wide copy-number estimation from transcriptomics

ABSTRACT: Protein copy numbers constrain systems-level properties of regulatory networks, but proportional proteomic data remain scarce compared to RNA-seq. We related mRNA to protein statistically using best-available data from quantitative proteomics-transcriptomics for 4366 genes in 369 cell lines. The approach starts with a protein's median copy number and hierarchically appends mRNA-protein and mRNA-mRNA dependencies to define an optimal gene-specific model linking mRNAs to protein. For dozens of cell lines and primary samples, these protein inferences from mRNA outmatch stringent null models, a count-based protein-abundance repository, empirical protein-to-mRNA ratios, and a proteogenomic DREAM challenge winner. The optimal mRNA-to-protein relationships capture biological processes along with hundreds of known protein-protein complexes, suggesting mechanistic relationships. We use the method to identify a viral-receptor abundance threshold for coxsackievirus B3 susceptibility from 1489 systems-biology infection models parameterized by protein inference. When applied to 796 RNA-seq profiles of breast cancer, inferred copy-number estimates collectively reclassify 26-29% of luminal tumors. By adopting a gene-centered perspective of mRNA-protein covariation across different biological contexts, we achieve accuracies comparable to the technical reproducibility of contemporary proteomics.
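The abstract outlines the gene-specific model only at a high level: start from the protein's median copy number, then add mRNA-protein and mRNA-mRNA terms in a hierarchy. The Python sketch below illustrates that idea. It assumes ordinary least squares, k-fold cross-validated RMSE as the selection criterion, and greedy forward addition of other genes' mRNAs. These choices, the stopping rule, and the names `cv_rmse` and `select_gene_model` are hypothetical and are not the authors' implementation.

```python
import numpy as np

def cv_rmse(y, X=None, k=5, seed=0):
    """k-fold cross-validated RMSE (hypothetical helper).
    X is None -> predict the training-fold median (baseline level).
    Otherwise -> ordinary least squares with an intercept on the columns of X."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        if X is None:
            pred = np.full(len(test), np.median(y[train]))
        else:
            A = np.column_stack([np.ones(len(train)), X[train]])
            beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
            pred = np.column_stack([np.ones(len(test)), X[test]]) @ beta
        sq_err.append((pred - y[test]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_err))))

def select_gene_model(protein, own_mrna, other_mrnas, k=5):
    """Assumed hierarchical selection for one gene:
    level 0 = median copy number, level 1 = + the gene's own mRNA,
    level 2+ = + other genes' mRNAs, added greedily while CV error drops."""
    terms = ["median"]
    best = cv_rmse(protein, None, k)
    X = own_mrna.reshape(-1, 1)
    err = cv_rmse(protein, X, k)
    if err >= best:
        # Assumption: if the gene's own mRNA does not help, stop at the median.
        return terms, best
    terms.append("own_mrna")
    best = err
    remaining = dict(other_mrnas)
    while remaining:
        # Score every remaining mRNA as an additional predictor.
        scores = {name: cv_rmse(protein, np.column_stack([X, v]), k)
                  for name, v in remaining.items()}
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break  # no candidate improves cross-validated error
        X = np.column_stack([X, remaining.pop(name)])
        best = scores[name]
        terms.append(name)
    return terms, best
```

Cross-validated error, rather than in-sample fit, serves as the criterion here so that extra mRNA terms are kept only when they improve prediction on held-out samples.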

SUBMITTER: Dr Andrew J. Sweatt

PROVIDER: S-SCDT-10_1038-S44320-024-00064-3 | biostudies-other

REPOSITORIES: biostudies-other


Similar Datasets

Project description: Carbapenemase production is one of the leading mechanisms of carbapenem resistance in Gram-negative bacteria. An increase in carbapenemase gene (blaCarb) copy number is an important mechanism of carbapenem resistance. No currently available bioinformatics tools allow for reliable detection and reporting of carbapenemase gene copy numbers. Here, we describe the carbapenemase-encoding gene copy number estimator (CCNE), a ready-to-use bioinformatics tool that was developed to estimate blaCarb copy numbers from whole-genome sequencing data. Its performance on Klebsiella pneumoniae carbapenemase gene (blaKPC) copy number estimation was evaluated by simulation and quantitative PCR (qPCR), and the results were compared with available algorithms. CCNE has two components, CCNE-acc and CCNE-fast. CCNE-acc estimates blaCarb copy number comprehensively and with high accuracy, while CCNE-fast rapidly screens blaCarb copy numbers. CCNE-acc achieved the best accuracy (100%) and the lowest root mean squared error (RMSE; 0.07) in simulated noise data sets, compared with the assembly-based method (23.4% accuracy, 1.697 RMSE) and the OrthologsBased method (78.9% accuracy, 0.395 RMSE). qPCR validation showed high consistency between the blaKPC copy numbers determined by qPCR and those determined with CCNE. Reverse transcription-qPCR transcriptional analysis of 40 isolates showed that blaKPC expression was positively correlated with the blaKPC copy numbers detected by CCNE (P < 0.001). An association study of 357 KPC-producing K. pneumoniae isolates and their antimicrobial susceptibility identified a significant association between the estimated blaKPC copy number and MICs of imipenem (P < 0.001) and ceftazidime-avibactam (P < 0.001). Overall, CCNE is a useful genomic tool for the analysis of antimicrobial resistance gene copy numbers; it is available at https://github.com/biojiang/ccne.
IMPORTANCE: Globally disseminated carbapenem-resistant Enterobacterales is an urgent threat to public health. The most common carbapenem resistance mechanism is the production of carbapenemases. Carbapenemase-producing isolates often exhibit a wide range of carbapenem MICs. Higher carbapenem MICs have been associated with treatment failure. An increase in carbapenemase gene (blaCarb) copy number contributes to increased carbapenem MICs. However, blaCarb gene copy number detection is not routinely conducted during genomic analysis, in part due to the lack of optimal bioinformatics tools. In this study, we describe a ready-to-use tool we developed and designated the carbapenemase-encoding gene copy number estimator (CCNE) that can be used to estimate the blaCarb copy number directly from whole-genome sequencing data, and we extended the data to support the analysis of all known blaCarb genes and some other antimicrobial resistance genes. Furthermore, CCNE can be used to interrogate the correlations between genotypes and susceptibility phenotypes and to improve our understanding of antimicrobial resistance mechanisms.
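The description does not state how CCNE computes copy number. A common read-depth approach, assumed here and not taken from CCNE's code, divides the mean sequencing depth over the resistance gene by the typical depth of single-copy chromosomal reference genes. The RMSE figures quoted above are the root mean square of the differences between estimated and true copy numbers. The function names and input layout below are hypothetical.

```python
from math import sqrt
from statistics import mean, median

def estimate_copy_number(target_depths, reference_depths):
    """Read-depth-ratio copy number estimate (assumed approach, not CCNE's).

    target_depths: per-base depths across the resistance gene (e.g. blaKPC).
    reference_depths: dict mapping each assumed single-copy chromosomal gene
    to its per-base depths.
    """
    # Typical single-copy depth: median of per-gene mean depths, which is
    # less sensitive to one reference gene that happens to be duplicated.
    single_copy_depth = median(mean(d) for d in reference_depths.values())
    return mean(target_depths) / single_copy_depth

def rmse(estimates, truths):
    """Root mean squared error between estimated and true copy numbers."""
    return sqrt(mean((e - t) ** 2 for e, t in zip(estimates, truths)))
```

For example, a gene covered at depth 60 against hypothetical reference genes at depths 20, 22, and 18 gives `estimate_copy_number(...) == 3.0`, i.e. about three copies per chromosome.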

Project description: Genomic instability is a hallmark of cancer often associated with poor patient outcome and resistance to targeted therapy. Assessment of genomic instability in bulk tumor or biopsy can be complicated by limited sample availability, contamination from surrounding tissue, or tumor heterogeneity. The Epic Sciences circulating tumor cell (CTC) platform uses a non-enrichment-based approach for the detection and characterization of rare tumor cells in clinical blood samples. Genomic profiling of individual CTCs could provide a portrait of cancer heterogeneity, identify clonal and sub-clonal drivers, and monitor disease progression. To that end, we developed a single-cell Copy Number Variation (CNV) Assay to evaluate genomic instability and CNVs in patient CTCs. For proof of concept, prostate cancer cell lines (LNCaP, PC3, and VCaP) were spiked into healthy donor blood to create mock patient-like samples for downstream single-cell genomic analysis. In addition, samples from seven metastatic castration-resistant prostate cancer (mCRPC) patients were included to evaluate clinical feasibility. CTCs were enumerated and characterized using the Epic Sciences CTC Platform. Identified single CTCs were recovered, whole-genome amplified, and sequenced using an Illumina NextSeq 500. CTCs were then analyzed for genome-wide copy number variations, followed by genomic instability analyses. Large-scale state transitions (LSTs) were measured as surrogates of genomic instability. Genomic instability scores were determined reproducibly for LNCaP, PC3, and VCaP, and were higher than those of white blood cell (WBC) controls from healthy donors. A wide range of LST scores was observed within and among the seven mCRPC patient samples. At the gene level, loss of the PTEN tumor suppressor was observed in PC3 cells and in 5/7 (71%) patients. Amplification of the androgen receptor (AR) gene was observed in VCaP cells and in 5/7 (71%) mCRPC patients.
Using an in silico down-sampling approach, we determined that DNA copy number and genomic instability can be detected with as few as 350K sequencing reads. The data shown here demonstrate the feasibility of detecting genomic instabilities at the single-cell level using the Epic Sciences CTC Platform. Understanding CTC heterogeneity has great potential for patient stratification prior to treatment with targeted therapies and for monitoring disease evolution during treatment.
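In silico down-sampling means randomly subsampling the reads from a sequenced cell to a smaller total and recomputing the copy-number profile to check whether it is preserved. The sketch below is a minimal illustration under simplifying assumptions: reads are represented only by their mapped genomic positions, bins have a fixed width, and the median bin is taken as the diploid baseline. The function names and these simplifications are hypothetical and are not the Epic Sciences pipeline.

```python
import random
from statistics import median

def downsample_reads(positions, n, seed=0):
    """Randomly keep n reads without replacement (all reads if n >= total)."""
    if n >= len(positions):
        return list(positions)
    return random.Random(seed).sample(positions, n)

def relative_copy_profile(positions, bin_size, n_bins, ploidy=2):
    """Bin read positions and scale counts so the median bin equals `ploidy`."""
    counts = [0] * n_bins
    for p in positions:
        b = p // bin_size
        if 0 <= b < n_bins:  # ignore reads outside the binned region
            counts[b] += 1
    med = median(counts)
    if med == 0:
        raise ValueError("median bin count is zero; too few reads for this bin size")
    return [ploidy * c / med for c in counts]
```

Comparing `relative_copy_profile` on the full read set with profiles from repeated down-sampled draws at decreasing `n` shows the depth at which bin-level calls stop being stable.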


