Dataset Information

Classification Accuracy of Hepatitis C Virus Infection Outcome: Data Mining Approach.

ABSTRACT:

Background

The dataset from genes used to predict hepatitis C virus outcome was evaluated in a previous study using a conventional statistical methodology.

Objective

The aim of this study was to reanalyze this same dataset using the data mining approach in order to find models that improve the classification accuracy of the genes studied.

Methods

We built predictive models using different subsets of factors, selected according to their importance in predicting patient classification. We then evaluated each independent model and also a combination of them, leading to a better predictive model.
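The workflow described above (rank factors by importance, fit one model per factor subset, then combine the models) can be sketched roughly as follows. This is only an illustrative sketch, not the study's implementation: the toy data, the mean-difference importance score, the nearest-centroid models, and all function names are assumptions introduced here, and a simple majority vote stands in for whatever combination scheme the authors used.

```python
from collections import Counter

# Hypothetical toy data (NOT from the study): each row is a patient,
# each column a binary genetic factor; labels are the two outcome classes.
X = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]


def importance(X, y, j):
    """Score factor j by the gap between its class-conditional means (assumed metric)."""
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def rank_factors(X, y):
    """Return factor indices sorted from most to least important."""
    return sorted(range(len(X[0])), key=lambda j: importance(X, y, j), reverse=True)


def train_centroid(X, y, subset):
    """Fit a nearest-centroid model that only looks at the given factor subset."""
    centroids = {}
    for label in set(y):
        rows = [[row[j] for j in subset] for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        point = [row[j] for j in subset]

        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(point, centroids[label]))

        return min(sorted(centroids), key=dist)

    return predict


def majority_vote(models, row):
    """Combine the independent models by taking the most common prediction."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(predict, X, y):
    """Fraction of rows whose predicted class matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


ranked = rank_factors(X, y)
# One model per nested subset of the ranked factors: top-1, top-2, top-3.
models = [train_centroid(X, y, ranked[:k]) for k in (1, 2, 3)]


def ensemble(row):
    return majority_vote(models, row)


print("ranking:", ranked)
print("ensemble training accuracy:", accuracy(ensemble, X, y))
```

Accuracy is measured here on the training rows only to keep the sketch short; a real comparison would need held-out data or cross-validation.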

Results

Our data mining approach identified genetic patterns that escaped detection using conventional statistics. More specifically, the partial decision trees and ensemble models increased the classification accuracy of hepatitis C virus outcome compared with conventional methods.

Conclusions

Data mining can be used more extensively in biomedicine, facilitating knowledge building and management of human diseases.

SUBMITTER: Frias M

PROVIDER: S-EPMC7946589 | biostudies-literature | 2021 Feb

REPOSITORIES: biostudies-literature

Publications

Classification Accuracy of Hepatitis C Virus Infection Outcome: Data Mining Approach.

Frias M, Moyano JM, Rivero-Juarez A, Luna JM, Camacho Á, Fardoun HM, Machuca I, Al-Twijri M, Rivero A, Ventura S

Journal of Medical Internet Research, 2021-02-24, issue 2

PMID: 33624609

Similar Datasets

Project description: Several studies have demonstrated that chronic hepatitis delta virus (HDV) infection is associated with a worsening of hepatitis B virus (HBV) infection and increased risk of hepatocellular carcinoma (HCC). However, there is limited data on the role of HDV in the oncogenesis of HCC. This study aimed to assess the potential mechanisms of HDV-associated hepatocarcinogenesis, especially to screen and identify key genes and pathways possibly involved in the pathogenesis of HCC. We selected three datasets: the microarray dataset GSE55092, which contains 39 cancer specimens and 81 paracancer specimens from 11 HBV-associated HCC patients; the microarray dataset GSE98383, which contains 11 cancer specimens and 24 paracancer specimens from 5 HDV-associated HCC patients; and RNA-sequencing data with matched clinical data for 371 HCC patients from the Cancer Genome Atlas (TCGA). Afterwards, 948 differentially expressed genes (DEGs) closely related to HDV-associated HCC were obtained using the R package and filtering with a Venn diagram. We then performed gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis to determine the biological processes (BP), cellular components (CC), molecular functions (MF), and KEGG signaling pathways most enriched for DEGs. Additionally, we performed Weighted Gene Coexpression Network Analysis (WGCNA) and protein-to-protein interaction (PPI) network construction with the 948 DEGs, from which one module was identified by WGCNA and three modules were identified by the PPI network. Subsequently, we validated the expression of 52 hub genes from the PPI network with an independent HCC dataset stored in the Gene Expression Profiling Interactive Analysis (GEPIA) database. Finally, seven potential key genes were identified by intersecting with key modules from WGCNA, including 3 reported genes, namely, CDCA5, CENPH, and MCM7, and 4 novel genes, namely, CDC6, CDC45, CDCA8, and MCM4, which are associated with nucleoplasm, cell cycle, DNA replication, and mitotic cell cycle. CDCA8 and HCC stage were independent factors associated with overall survival in HDV-associated HCC. The findings on these genes can help build a better understanding of the role of HDV in the underlying mechanism of HCC carcinogenesis.

Project description: Background: In Uzbekistan, routine serologic testing has not been available to differentiate etiologies of acute viral hepatitis (AVH). To determine the age groups most affected by hepatitis E virus (HEV) during documented AVH epidemics, trends in AVH-associated mortality rate (MR) per 100,000 over a 15-year period and reported incidence of AVH over a 35-year period were examined. Methods: Reported AVH incidence data from 1971 to 2005 and AVH-associated mortality data from 1981 to 1995 were examined. Serologic markers for infection with hepatitis viruses A, B, D, and E were determined from a sample of hospitalized patients with AVH from an epidemic period (1987) and from a sample of pregnant women with AVH from a non-epidemic period (1992). Results: Two multi-year AVH outbreaks were identified: one during 1975-1976, and one during 1985-1987. During 1985-1987, AVH-associated MRs were 12.3-17.8 per 100,000 for the general population. Highest AVH-associated MRs occurred among children in the first 3 years of life (40-190 per 100,000) and among women aged 20-29 (15-21 per 100,000). During 1988-1995 when reported AVH morbidity was much lower in the general population, AVH-associated MRs were markedly lower among these same age groups. In 1988, AVH-associated MRs were higher in rural (21 per 100,000) than in urban (8 per 100,000) populations (RR 2.6; 95% CI 1.16-5.93; p < 0.05). Serologic evidence of acute HEV infection was found in 280 of 396 (71%) patients with AVH in 1987 and 12 of 99 (12%) pregnant patients with AVH in 1992. Conclusion: In the absence of the availability of confirmatory testing, inferences regarding probable hepatitis epidemic etiologies can sometimes be made using surveillance data, comparing AVH incidence with AVH-associated mortality with an eye to population-based viral hepatitis control measures. Data presented here implicate HEV as the probable etiology of high mortality observed in pregnant women and in children less than 3 years of age in Uzbekistan during 1985-1987. High mortality among pregnant women but not among children less than 3 years has been observed in previous descriptions of epidemic hepatitis E. The high mortality among younger children observed in an AVH outbreak associated with hepatitis E merits corroboration in future outbreaks.

OmicsDI is part of the ELIXIR infrastructure
