Dataset Information

Systematic evaluation of supervised classifiers for fecal microbiota-based prediction of colorectal cancer.

ABSTRACT: Predicting colorectal cancer (CRC) from fecal microbiota is a promising approach to non-invasive CRC screening, but how best to optimize the classification models remains unaddressed. The purpose of this study was to systematically evaluate how well different supervised machine-learning models predict CRC in two independent populations, one eastern and one western. The structure of the fecal intestinal microflora in a Chinese population (N = 141) was determined by 454 FLX pyrosequencing, and different supervised classifiers were employed to predict CRC based on fecal microbiota operational taxonomic units (OTUs). Bayes Net and Random Forest achieved higher accuracies than the other algorithms in both populations, while Bayes Net had a lower false negative rate than Random Forest. Gut microbiota-based prediction was more accurate than the standard fecal occult blood test (FOBT), and combining the two approaches further improved prediction accuracy. Moreover, when unclassified OTUs were used as input, the BayesDMNB text algorithm achieved higher accuracy in the Chinese population (AUC = 0.994). Taken together, our results suggest that a Bayes Net classification model combined with unclassified OTUs may offer an accurate method for predicting CRC from gut microbiota composition.
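The abstract compares classifiers on two kinds of figures: a ranking metric (AUC) and an error rate at a fixed decision rule (the false negative rate, i.e. the share of CRC cases missed). The sketch below is a minimal pure-Python illustration, not the authors' pipeline, of how both can be computed from a classifier's predicted CRC probabilities. The function names, the 0.5 decision threshold, and the example scores are hypothetical assumptions; the study's own classifiers (Bayes Net, Random Forest, BayesDMNB text) are not reproduced here.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen case (label 1) scores higher than a randomly chosen control (label 0).
    Tied case/control scores count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def false_negative_rate(scores: Sequence[float], labels: Sequence[int],
                        threshold: float = 0.5) -> float:
    """Share of true cases (label 1) that the classifier calls negative,
    i.e. whose score falls below the (assumed) decision threshold."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not cases:
        raise ValueError("need at least one case")
    missed = sum(1 for s in cases if s < threshold)
    return missed / len(cases)


if __name__ == "__main__":
    # Hypothetical predicted CRC probabilities; not data from the study.
    scores = [0.9, 0.8, 0.4, 0.3, 0.2]
    labels = [1, 1, 1, 0, 0]
    print(roc_auc(scores, labels))              # 1.0
    print(false_negative_rate(scores, labels))  # 0.3333333333333333
```

In this toy example every case outranks every control, so the AUC is perfect, yet one case in three still falls below the 0.5 threshold. This is one reason a false negative rate can distinguish two classifiers whose accuracies or AUCs look similar.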

SUBMITTER: Ai L

PROVIDER: S-EPMC5354752 | biostudies-literature | 2017 Feb

REPOSITORIES: biostudies-literature


Publications

Systematic evaluation of supervised classifiers for fecal microbiota-based prediction of colorectal cancer.

Ai Luoyan, Tian Haiying, Chen Zhaofei, Chen Huimin, Xu Jie, Fang Jing-Yuan

Oncotarget 20170201 6


PMID: 28061434

Similar Datasets

Project description: Background and aims: Investigation of microbe-metabolite relationships in the gut is needed to understand and potentially reduce colorectal cancer (CRC) risk. Methods: Microbiota and metabolomics profiling were performed on lyophilized feces from 42 CRC cases and 89 matched controls. Multivariable logistic regression was used to identify statistically independent associations with CRC. First principal coordinate-component pair (PCo1-PC1) and false discovery rate (0.05)-corrected P-values were calculated for 116,000 Pearson correlations between 530 metabolites and 220 microbes in a sex*case/control meta-analysis. Results: Overall microbe-metabolite PCo1-PC1 was more strongly correlated in cases than in controls (Rho 0.606 vs 0.201, P = 0.01). CRC was independently associated with lower levels of Clostridia, Lachnospiraceae, p-aminobenzoate and conjugated linoleate, and with higher levels of Fusobacterium, Porphyromonas, p-hydroxy-benzaldehyde, and palmitoyl-sphingomyelin. Through postulated effects on cell shedding (palmitoyl-sphingomyelin), inflammation (conjugated linoleate), and innate immunity (p-aminobenzoate), metabolites mediated the CRC association with Fusobacterium and Porphyromonas by 29% and 34%, respectively. Overall, palmitoyl-sphingomyelin correlated directly with abundances of Enterobacteriaceae (Gammaproteobacteria), three Actinobacteria and five Firmicutes. Only Parabacteroides correlated inversely with palmitoyl-sphingomyelin. Other lipids correlated inversely with Alcaligenaceae (Betaproteobacteria). Six Bonferroni-significant correlations were found, including low indolepropionate and threnoylvaline with Actinobacteria and high erythronate and an uncharacterized metabolite with Enterobacteriaceae. Conclusions: Feces from CRC cases had very strong microbe-metabolite correlations that were predominated by Enterobacteriaceae and Actinobacteria. Metabolites mediated a direct CRC association with Fusobacterium and Porphyromonas, but not an inverse association with Clostridia and Lachnospiraceae. This study identifies complex microbe-metabolite networks that may provide insights on neoplasia and targets for intervention.
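Testing every metabolite against every microbe produces a very large number of simultaneous tests (530 × 220 = 116,600 pairs), which is why the description reports false discovery rate (FDR)-corrected P-values. The description does not name the correction procedure; the sketch below assumes the common Benjamini-Hochberg method and shows how adjusted P-values are obtained and thresholded at an FDR of 0.05. Function names and the example P-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so each adjusted value
    # is the minimum of p * m / rank over all ranks at or above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues: Sequence[float], fdr: float = 0.05) -> List[int]:
    """Indices of tests whose BH-adjusted p-value is at or below the FDR level."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= fdr]


if __name__ == "__main__":
    # Hypothetical p-values from four microbe-metabolite correlations.
    p = [0.01, 0.04, 0.03, 0.20]
    print(benjamini_hochberg(p))
    print(significant(p))  # [0]
```

Here two raw P-values (0.04 and 0.03) fall below 0.05 but do not survive correction, illustrating how FDR control trims nominally significant correlations when many pairs are tested.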

Project description: Background: High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility that chRNAs expressed specifically in diseases could be used as biomarkers for both diagnosis and prognosis. Results: Discriminating true chRNAs from false ones poses an interesting machine learning (ML) challenge. First, the sequencing data may contain false reads due to technical artifacts, and during analysis, bioinformatics tools may generate false positives due to methodological biases. Moreover, even if we obtain a proper set of observations (enough sequencing data) about true chRNAs, the resulting model may not generalize beyond it. As with any other machine learning problem, the first major issue is finding good data with which to build models. To our knowledge, no common benchmark data are available for chRNA detection, and a classification baseline is also lacking in the related literature. In this work we move toward benchmark data and an evaluation of the fidelity of supervised classifiers in predicting chRNAs. Conclusions: We proposed a modeling strategy, based on a simulated data generator that permits continuous integration of new complex chimeric events, which can be used to improve tool performance in chRNA classification. The pipeline incorporated a genome mutation process and simulated RNA-seq data. Reads at different depths were aligned and analysed by CRAC, which integrates genomic location and local coverage, allowing biological predictions at the read scale. Additionally, these reads were functionally annotated and aggregated to form chRNA events, making it possible to evaluate the performance of ML methods (classifiers) at both the read and event levels. Ensemble learning strategies proved more robust for this classification problem, providing an average AUC of 95% (ACC = 94%, Kappa = 0.87). The resulting classification models were also tested on real RNA-seq data from twenty-seven patients with acute myeloid leukemia (AML).
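Accuracy and Cohen's kappa are reported together above because kappa corrects the observed agreement between predictions and true labels for the agreement expected by chance; it is a unitless ratio (at most 1), not a percentage. The minimal sketch below computes both from a binary confusion matrix using the standard kappa formula. The function name and the counts are hypothetical illustrations, not values from the study.

```python
from typing import Tuple


def accuracy_and_kappa(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Accuracy and Cohen's kappa for a binary confusion matrix."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("empty confusion matrix")
    observed = (tp + tn) / n
    # Chance agreement: product of predicted and actual marginal rates, per class
    chance_pos = ((tp + fp) / n) * ((tp + fn) / n)
    chance_neg = ((fn + tn) / n) * ((fp + tn) / n)
    expected = chance_pos + chance_neg
    if expected == 1.0:
        raise ValueError("kappa undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical counts, not results from the study.
    acc, kappa = accuracy_and_kappa(tp=45, fp=5, fn=5, tn=45)
    print(f"accuracy={acc:.2f} kappa={kappa:.2f}")  # accuracy=0.90 kappa=0.80
```

With balanced classes, half of all agreement is expected by chance, so a 90% accuracy corresponds to a kappa of only 0.80.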

Project description: Background: Colorectal cancer (CRC) is one of the most common cancers. Recent studies have reported that the gut microbiota may be involved in aggravating or favoring CRC development. However, little is known about the microbiota composition in CRC patients after treatment. In this study, we explored the fecal microbiota composition to obtain a periscopic view of gut microbial communities. We analyzed microbial 16S rRNA genes from 107 fecal samples of Chinese individuals in three groups: 33 normal controls (NC), 38 CRC patients (Fa), and 36 CRC post-surgery patients (Fb). Results: Species richness and diversity were lower in the Fa and Fb groups than in the NC group. Partial least squares discriminant analysis showed clustering of samples according to disease, with an obvious separation between the Fa and NC groups and between the Fb and NC groups, as well as a partial separation between the Fa and Fb groups. Based on linear discriminant analysis effect size (LEfSe) analysis and a receiver operating characteristic model, Fusobacterium was suggested as a potential biomarker for CRC screening. Additionally, we found that surgery greatly reduced the bacterial diversity of the microbiota in CRC patients. Some beneficial commensal bacteria of the intestinal tract, such as Faecalibacterium and Prevotella, were decreased, whereas the drug-resistant Enterococcus was visibly increased in the CRC post-surgery group. Meanwhile, we observed a declining tendency of Fusobacterium in the majority of follow-up CRC patients who were still alive approximately 3 years after surgery. We also observed that beneficial bacteria decreased dramatically in CRC patients who relapsed or died after surgery, suggesting that certain bacteria might be associated with prognosis. Conclusions: Fecal bacterial diversity was diminished in CRC patients compared with NC. Enrichment and depletion of several bacterial strains associated with carcinomas and inflammation were detected in CRC samples. Fusobacterium might be a potential biomarker for early CRC screening in Chinese or Asian populations. In summary, this study indicates that fecal microbiome-based approaches could be a feasible method for detecting CRC and monitoring prognosis after surgery.
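The description reports reduced species richness and diversity but does not say which indices were used. As an assumption, the sketch below uses two common choices: observed richness (the number of taxa present) and the Shannon index, H = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i in one sample. The function names and OTU counts are hypothetical.

```python
import math
from typing import Sequence


def observed_richness(counts: Sequence[int]) -> int:
    """Number of taxa (e.g. OTUs) with a nonzero count in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa present in the sample."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    # Hypothetical OTU counts for two samples with the same richness.
    even = [10, 10, 10, 10]
    skewed = [37, 1, 1, 1]
    print(observed_richness(even), observed_richness(skewed))  # 4 4
    print(shannon_index(even) > shannon_index(skewed))          # True
```

Both toy samples contain four taxa, so richness alone cannot tell them apart; the Shannon index is lower for the sample dominated by a single taxon because it also reflects evenness.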
