Project description: Recent progress in unbiased metagenomic next-generation sequencing (mNGS) allows microbial and host genetic material to be examined simultaneously in a single test. Using affordable bronchoalveolar lavage fluid (BALF) mNGS data, we applied machine learning to develop a diagnostic approach that distinguishes lung cancer from pulmonary infections, two conditions that are prone to misdiagnosis in clinical settings. This prospective study analyzed BALF-mNGS data from patients with lung cancer or pulmonary infection and characterized differences in DNA/RNA microbial composition, bacteriophage abundance, and host responses, including gene expression, transposable element levels, immune cell composition, and tumor fraction derived from copy number variation (CNV). A host/microbe metagenomics-driven machine learning model integrating these metrics (Model VI) distinguished lung cancer from pulmonary infections with an AUC of 0.87 (95% CI = 0.857-0.883), sensitivity of 73.8%, and specificity of 84.5% in the training cohort, and an AUC of 0.831 (95% CI = 0.819-0.843), sensitivity of 67.1%, and specificity of 94.4% in the validation cohort. A composite predictive model based on a rule-in and rule-out strategy significantly improved accuracy (ACC) in distinguishing lung cancer from tuberculosis (ACC = 0.913), fungal infection (ACC = 0.955), and bacterial infection (ACC = 0.836). These findings highlight the potential of cost-effective mNGS-based analysis as a tool for early differentiation between lung cancer and pulmonary infections using a single comprehensive test.
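As an illustration of how a rule-in and rule-out strategy and the reported metrics (sensitivity, specificity, ACC) relate, the following is a minimal sketch in Python. It is not the study's model: the two-threshold scheme, the function names, the thresholds `low` and `high`, and the toy scores are all assumptions made for this example. Scores at or above `high` are called lung cancer (rule in), scores at or below `low` are called infection (rule out), and scores in between are left undecided.

```python
from typing import Dict, List, Optional


def rule_in_rule_out(score: float, low: float, high: float) -> Optional[int]:
    """Hypothetical two-threshold call: 1 = rule in, 0 = rule out, None = undecided."""
    if score >= high:
        return 1
    if score <= low:
        return 0
    return None


def confusion_metrics(y_true: List[int], y_pred: List[Optional[int]]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy over the samples that received a call."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    def ratio(num: int, den: int) -> float:
        # Avoid division by zero when a class has no decided samples.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "undecided": float(len(y_true) - len(pairs)),
    }


# Toy data (invented): 1 = lung cancer, 0 = pulmonary infection.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.90, 0.80, 0.55, 0.20, 0.10, 0.30, 0.45, 0.85]
calls = [rule_in_rule_out(s, low=0.35, high=0.70) for s in scores]
metrics = confusion_metrics(y_true, calls)
print(calls)
print(metrics)
```

Reporting the number of undecided samples alongside accuracy matters in a scheme like this, because accuracy is computed only over the samples that were ruled in or out.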
Project description: In this study we developed metaproteomics-based methods for quantifying the taxonomic composition of microbiomes (microbial communities). We also compared metaproteomics-based quantification with two other quantification methods, metagenomics and 16S rRNA gene amplicon sequencing. The metagenomic and 16S rRNA data are available in the European Nucleotide Archive (study accession: PRJEB19901). For method development and comparison, we analyzed three types of mock communities with all three methods. The communities contain 28 to 32 species and strains of bacteria, archaea, eukaryotes, and bacteriophages. For each community type, four biological replicate communities were generated. All four replicates were analyzed by 16S rRNA sequencing and metaproteomics; three replicates of each community type were analyzed by metagenomics. The "C" type communities (C1 to C4) have the same cell/phage particle number for all community members. The "P" type communities (P1 to P4) have the same protein content for all community members. The "U" (uneven) type communities (U1 to U4) cover a large range of protein amounts and cell numbers. We also generated proteomic data for four pure cultures to test the specificity of the protein inference method; these data are also included in this submission.
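To make the difference between the "C" (equal cell number) and "P" (equal protein content) designs concrete, here is a minimal sketch in Python. The organism names and per-cell protein amounts are invented for illustration and are not taken from the submitted data; the function names are likewise hypothetical. The sketch only shows that when members differ in protein per cell, equal cell numbers give unequal protein shares, and equal protein shares require unequal cell numbers.

```python
from typing import Dict


def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values so they sum to 1 (relative abundances)."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


def protein_fractions_at_equal_cells(protein_per_cell: Dict[str, float]) -> Dict[str, float]:
    """"C"-like design: one cell of each member, so protein share follows protein per cell."""
    return normalize(protein_per_cell)


def cell_fractions_at_equal_protein(protein_per_cell: Dict[str, float]) -> Dict[str, float]:
    """"P"-like design: equal protein per member, so cell share follows 1 / protein per cell."""
    return normalize({name: 1.0 / amount for name, amount in protein_per_cell.items()})


# Invented per-cell protein amounts in arbitrary units (assumption, not measured values).
protein_per_cell = {"small_bacterium": 1.0, "large_bacterium": 4.0, "yeast": 15.0}

c_like = protein_fractions_at_equal_cells(protein_per_cell)
p_like = cell_fractions_at_equal_protein(protein_per_cell)
print(c_like)
print(p_like)
```

Under these assumed values, the member with the most protein per cell dominates the protein share in the equal-cell design, while the member with the least protein per cell dominates the cell count in the equal-protein design.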