Project description:Recent progress in unbiased metagenomic next-generation sequencing (mNGS) allows simultaneous examination of microbial and host genetic material in a single test. Leveraging affordable bronchoalveolar lavage fluid (BALF) mNGS data, we employed machine learning to create a diagnostic approach distinguishing lung cancer from pulmonary infections, conditions prone to misdiagnosis in clinical settings. This prospective study analyzed BALF-mNGS data from lung cancer and pulmonary infection patients, delineating differences in DNA/RNA microbial composition, bacteriophage abundances, and host responses, including gene expression, transposable element levels, immune cell composition, and tumor fraction derived from copy number variation (CNV). Integrating these metrics into a host/microbe metagenomics-driven machine learning model (Model VI) demonstrated robustness, achieving an AUC of 0.87 (95% CI = 0.857-0.883), sensitivity = 73.8%, and specificity = 84.5% in the training cohort, and an AUC of 0.831 (95% CI = 0.819-0.843), sensitivity = 67.1%, and specificity = 94.4% in the validation cohort for distinguishing lung cancer from pulmonary infections. Applying a composite predictive model based on a rule-in and rule-out strategy significantly enhances accuracy (ACC) in distinguishing lung cancer from tuberculosis (ACC=0.913), fungal infection (ACC=0.955), and bacterial infection (ACC=0.836). These findings highlight the potential of cost-effective mNGS-based analysis as a valuable tool for early differentiation between lung cancer and pulmonary infections, offering significant benefits through a single comprehensive test.
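As a rough illustration of the metrics and the rule-in/rule-out idea mentioned above, the following Python sketch computes sensitivity and specificity at a single cutoff and then applies two cutoffs to a model score. The description does not say how the composite model is built, so the function names, the score scale (0 to 1), and both thresholds are hypothetical.

```python
from typing import List, Tuple


def sensitivity_specificity(
    labels: List[int], scores: List[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold predicts cancer.

    labels: 1 = lung cancer, 0 = pulmonary infection (assumed encoding).
    """
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def rule_in_rule_out(score: float, rule_out: float = 0.2, rule_in: float = 0.8) -> str:
    """Two-threshold call; both cutoffs are hypothetical placeholders.

    Scores at or above rule_in are called lung cancer, scores at or below
    rule_out are called infection, and everything in between is left
    indeterminate for further work-up.
    """
    if score >= rule_in:
        return "lung cancer"
    if score <= rule_out:
        return "infection"
    return "indeterminate"
```

In practice the two cutoffs would be chosen on a training cohort, for example by fixing a target specificity for the rule-in threshold and a target sensitivity for the rule-out threshold.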
Project description:Despite improved diagnostics, pulmonary pathogens in immunocompromised children frequently evade detection, leading to significant mortality. In this study, we performed RNA and DNA-based metagenomic next-generation sequencing (mNGS) on 41 lower respiratory samples collected from 34 children. We identified a rich cross-domain pulmonary microbiome containing bacteria, fungi, RNA viruses, and DNA viruses in each patient. Potentially pathogenic bacteria were ubiquitous among samples but could be distinguished as possible causes of disease by parsing for outlier organisms. Potential pathogens were detected in half of the samples previously negative by clinical diagnostics. Ongoing investigation is needed to determine the pathogenic significance of outlier microbes in the lungs of immunocompromised children with pulmonary disease. Metatranscriptomic (RNA) sequencing libraries are reported in the manuscript and are included for this release.
Project description:The short length of miRNAs results in a high dynamic range of melting temperatures and therefore impedes proper selection of detection probes or optimized PCR primers. While miRNA microarrays allow for massively parallel and accurate relative measurement of all known miRNAs, they have so far been less useful as an assay for absolute quantification. Here we developed a new method that does not rely on hybridization alone, which is subject to the limits described above, but couples hybridization to an enzymatic reaction. Moreover, we introduced spike-ins into the hybridization-enzymatic reaction, allowing miRNAs to be quantified relative to them and canceling biases related to sequence, labeling, or hybridization. An alternative method for absolute miRNA quantification was recently proposed by Bissels (Absolute quantification of microRNAs by using a universal reference. RNA). It is based on a universal reference consisting of 954 synthetic human, mouse, rat, and viral miRNAs, with each individual oligoribonucleotide present in equimolar concentrations. Thereby, any single miRNA detected on a microarray can be quantified by directly comparing its signal intensity with that obtained from the same miRNA sequence present in the universal reference, adjusting for biases related to sequence, labeling, hybridization, or signal detection. Our method detected a comparable range of miRNA amounts (10^-18 to 10^-14 moles, in a linear range) (see Figure), but it also allows hybridization quality and reproducibility to be controlled based on the interpolation of the spike-in-dependent curve. Moreover, our method is not influenced by labeling differences attributable to different sequences, because labeling occurs only through the incorporation of biotin-d(A) when the hybridized miRNA acts as a primer for the Klenow enzyme.
This method allowed us to examine miRNA gene expression in 14 different tissues and to relate it to the anatomical proximity and functional similarity of the tissues.
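The spike-in interpolation described above can be sketched as a log-log standard curve: fit a line to the spike-in signals at known amounts, then read unknown miRNA amounts off that line. This is a minimal sketch assuming a linear relationship between log10(signal) and log10(amount) within the stated range; the function names are hypothetical and the description does not specify the actual fitting procedure.

```python
import math
from typing import List, Tuple


def fit_spike_in_curve(
    amounts_mol: List[float], signals: List[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of log10(signal) = slope * log10(amount) + intercept.

    Returns (slope, intercept, r_squared). The r_squared value can serve as
    a simple check on hybridization quality for a given array.
    """
    xs = [math.log10(a) for a in amounts_mol]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


def quantify(signal: float, slope: float, intercept: float) -> float:
    """Interpolate the amount (in moles) of a miRNA from its signal."""
    return 10 ** ((math.log10(signal) - intercept) / slope)
```

A signal should only be interpolated when it falls between the lowest and highest spike-in signals; outside that interval the fit is an extrapolation and the linear-range assumption may not hold.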
Project description:With an ability to compromise genome integrity, transposable elements (TEs) have significant associations with human diseases. Short-read sequencing has been used to study the expression of TEs; however, the highly repetitive nature of these elements makes multimapping a critical issue. Here we implement LocusMasterTE, an improved quantification method that integrates long-read sequencing. By introducing transcripts per million (TPM) values computed from long-read sequencing as a prior distribution in the Expectation-Maximization (EM) model for short-read TE quantification, LocusMasterTE reassigns multi-mapped reads to correct expression values. Based on simulated short reads, LocusMasterTE outperforms current quantification approaches and is significantly better at capturing newly inserted TEs. We also verified that TEs quantified by LocusMasterTE are clearly related to euchromatin and heterochromatin in cell line samples. With LocusMasterTE, we anticipate that more accurate quantification can be performed, allowing novel functions of TEs to be uncovered.
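To make the idea of an EM model with a long-read prior concrete, here is a small Python sketch of a generic EM for multi-mapped reads in which normalized long-read TPM values act as pseudocounts in the M-step. It is not LocusMasterTE's implementation: the function name, the `prior_weight` and `eps` parameters, and the pseudocount form of the prior are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def em_with_longread_prior(
    reads: List[Sequence[str]],
    longread_tpm: Dict[str, float],
    prior_weight: float = 1.0,
    n_iter: int = 50,
    eps: float = 1e-6,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Estimate locus abundances from multi-mapped reads.

    reads: one entry per read, listing the candidate loci it aligns to
           (every locus must be a key of longread_tpm).
    longread_tpm: long-read TPM per locus, used as the prior (assumed form).
    Returns (theta, counts): relative abundances and fractional read counts.
    """
    loci = sorted(longread_tpm)
    # Normalize the prior; eps keeps every locus strictly positive.
    total = sum(longread_tpm.values()) + eps * len(loci)
    prior = {l: (longread_tpm[l] + eps) / total for l in loci}
    theta = {l: 1.0 / len(loci) for l in loci}
    counts = {l: 0.0 for l in loci}
    for _ in range(n_iter):
        # E-step: split each read across its candidate loci by current theta.
        counts = {l: 0.0 for l in loci}
        for candidates in reads:
            z = sum(theta[l] for l in candidates)
            for l in candidates:
                counts[l] += theta[l] / z
        # M-step: combine fractional counts with prior pseudocounts.
        denom = len(reads) + prior_weight
        theta = {l: (counts[l] + prior_weight * prior[l]) / denom for l in loci}
    return theta, counts
```

With `prior_weight` set close to zero the update reduces to a plain EM on the short reads alone, so the parameter controls how strongly the long-read evidence steers ambiguous reads.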
Project description:Primary outcome(s): Comparative evaluation of this method and the existing cfDNA quantification method using clinical samples
Study Design: Observational Study Model: Cohort, Time Perspective: Prospective, Enrollment: 20, Biospecimen Retention: Collect & Archive - Sample without DNA, Biospecimen Description: serum
Project description:Purpose: The goal of this study is to determine the best method of gene expression quantification (RNA-seq, Microarray, NanoString) and the amplification kits best adapted to low-input and/or low-quality RNA samples (FFPE samples). Methods: A mouse bladder cancer cell line (BC57) and normal mouse urothelium were fixed in formalin and embedded in paraffin (FFPE), and fresh frozen (FF) in liquid nitrogen. The total RNA of these 4 samples was tested by 3 technologies (NanoString, RNA-seq and Microarray), and the results were compared to their reference (high-quality and high-input RNA of the mouse bladder cancer cell line and normal mouse urothelium). For NanoString with low-input RNA samples, each sample was tested by NanoString quantification after amplification by the SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian and the Ovation SoLo NuGEN RNA-seq System, and by NanoString based on a PCR approach, with three input quantities: 50 pg, 250 pg and 2 ng of total RNA, except for NanoString quantification after amplification by the SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian, for which the minimum recommended quantity was 250 pg of total RNA. NanoString direct quantification was also done for FF and FFPE samples at a high amount (50 ng of total RNA), and results obtained from FF samples were considered as the reference. To determine which method is best suited to NanoString technology with low-input and low-quality RNA samples, we evaluated NanoString quality control metrics, principal component analysis, and a differential analysis between the mouse bladder cancer cell line and the normal mouse urothelium for each input quantity, amplification method and method of sample preservation (FF or FFPE). Results: The NanoString PCR-based approach is recommended for quantification of gene expression in FFPE and FF samples from 250 pg of total RNA.
However, NanoString quantification after amplification by the SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian or the Ovation SoLo NuGEN RNA-seq System is not recommended for low-input FF and FFPE samples.