Dataset Information

Incorporating External Information in Tissue Subtyping: A Topic Modeling Approach.

ABSTRACT: Probabilistic topic models have been widely deployed for various applications such as learning disease or tissue subtypes. Yet, learning the parameters of such models is usually an ill-posed problem and may result in losing valuable information about disease severity. A common approach is to add a discriminative loss term to the generative model's loss in order to learn a representation that is also predictive of disease severity. However, finding a balance between these two losses is not straightforward. We propose an alternative in this paper. We develop a framework that allows external covariates to be incorporated into the generative model's approximate posterior. These covariates can have more discriminative power for disease severity than the representation that we extract from the posterior distribution. For instance, they can be features extracted from a neural network that predicts disease severity from CT images. Effectively, we constrain the generative model's approximate posterior to reside in the subspace of these discriminative covariates. We illustrate our method's application on a large-scale lung CT study of Chronic Obstructive Pulmonary Disease (COPD), a highly heterogeneous disease. We aim to identify tissue subtypes by using a variant of a topic model as the generative model. We quantitatively evaluate the predictive performance of the inferred subtypes and demonstrate that our method outperforms or performs on par with some reasonable baselines. We also show that some of the discovered subtypes are correlated with genetic measurements, suggesting that the identified subtypes may characterize the disease's underlying etiology.

SUBMITTER: Saeedi A

PROVIDER: S-EPMC8797254 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Incorporating External Information in Tissue Subtyping: A Topic Modeling Approach.

Ardavan Saeedi, Payman Yadollahpour, Sumedha Singla, Brian Pollack, William Wells, Frank Sciurba, Kayhan Batmanghelich

Proceedings of Machine Learning Research, 2021-01-01

PMID: 35098143

Similar Datasets

Project description: BACKGROUND: High Content Screening (HCS) has become an important tool for toxicity assessment, partly due to its advantage of handling multiple measurements simultaneously. This approach has provided insight and contributed to the understanding of systems biology at the cellular level. To fully realize this potential, the multiple endpoints measured simultaneously from a live cell should be considered in a probabilistic relationship to assess the cell's condition in response to stress from a treatment, which makes it a great challenge to extract hidden knowledge and relationships from these measurements. METHOD: In this work, we applied a text mining method, Latent Dirichlet Allocation (LDA), to analyze cellular endpoints from in vitro HCS assays and related the findings to in vivo histopathological observations. We measured multiple HCS assay endpoints for 122 drugs. Since LDA requires the data to be represented in document-term format, we first converted the continuous values of the measurements to word frequencies that can be processed by the text mining tool. For each of the drugs, we generated a document for each of the 4 time points. Thus, we ended up with 488 documents (drug-hour), each having different values for the 10 endpoints, which are treated as words. We extracted three topics using LDA and examined these to identify diagnostic topics for 45 common drugs located in in vivo experiments from the Japanese Toxicogenomics Project (TGP), observing their necrosis findings at 6 and 24 hours after treatment. RESULTS: We found that assay endpoints assigned to particular topics were in concordance with the histopathology observed. Drugs showing necrosis at 6 hours were linked to severe damage events such as Steatosis, DNA Fragmentation, Mitochondrial Potential, and Lysosome Mass. DNA Damage and Apoptosis were associated with drugs causing necrosis at 24 hours, suggesting an interplay of the two pathways in these drugs. Drugs with no sign of necrosis were related to the Cell Loss and Nuclear Size assays, which is suggestive of hepatocyte regeneration. CONCLUSIONS: The evidence from this study suggests that topic modeling with LDA can enable us to interpret relationships of endpoints of in vitro assays along with an in vivo histological finding, necrosis. Effectiveness of this approach may add substantially to our understanding of systems biology.
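
The document-term conversion described above can be illustrated with a minimal sketch. The description does not say how continuous values were mapped to word frequencies, so the scaling scheme here (values already normalized to [0, 1], multiplied by a hypothetical `max_count` and rounded) is an assumption, as are the function name `to_document`, the drug name, the time labels, and all measurement values.

```python
from collections import Counter

def to_document(endpoints, max_count=10):
    """Turn endpoint values in [0, 1] into word counts (assumed scheme)."""
    doc = Counter()
    for name, value in endpoints.items():
        clipped = min(max(value, 0.0), 1.0)   # guard against out-of-range values
        count = round(clipped * max_count)    # continuous value -> word frequency
        if count > 0:
            doc[name] = count
    return doc

# Hypothetical measurements: one document per (drug, time point) pair.
measurements = {
    ("drug_A", "t1"): {"steatosis": 0.82, "dna_damage": 0.10, "cell_loss": 0.0},
    ("drug_A", "t2"): {"steatosis": 0.32, "dna_damage": 0.64, "cell_loss": 0.20},
}
corpus = {key: to_document(values) for key, values in measurements.items()}

# Document count stated in the description: 122 drugs x 4 time points.
n_documents = 122 * 4
```

Each resulting `Counter` plays the role of one document whose words are endpoint names; a topic model such as LDA would then be fitted on the full set of count vectors.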

Project description: Background: Social media platforms allow individuals to openly gather, communicate, and share information about their interactions with health care services, becoming an essential supplemental means of understanding patient experience. Objective: We aimed to identify common discussion topics related to health care experience from the public's perspective and to determine areas of concern from patients' perspectives that health care providers should act on. Methods: This study conducted a spatiotemporal analysis of the volume, sentiment, and topic of patient experience-related posts on the Weibo platform developed by Sina Corporation. We applied a supervised machine learning approach including human annotation and machine learning-based models for topic modeling and sentiment analysis of the public discourse. A multiclassifier voting method based on logistic regression, multinomial naïve Bayes, and random forest was used. Results: A total of 4008 posts were manually classified into patient experience topics. A patient experience theme framework was developed. The accuracy, precision, recall, and F-measure of the method integrating logistic regression, multinomial naïve Bayes, and random forest for patient experience themes were 0.93, 0.95, 0.80, 0.77, and 0.84, respectively, indicating a satisfactory prediction. The sentiment analysis revealed that negative sentiment posts constituted the highest proportion (3319/4008, 82.81%). Twenty patient experience themes were discussed on the social media platform. The majority of the posts described the interpersonal aspects of care (2947/4008, 73.53%); the five most frequently discussed topics were "health care professionals' attitude," "access to care," "communication, information, and education," "technical competence," and "efficacy of treatment." Conclusions: Hospital administrators and clinicians should consider the value of social media and pay attention to what patients and their family members are communicating on social media. To increase the utility of these data, a machine learning algorithm can be used for topic modeling. The results of this study highlighted the interpersonal and functional aspects of care, especially the interpersonal aspects, which are often the "moment of truth" during a service encounter in which patients make a critical evaluation of hospital services.
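
The multiclassifier voting step mentioned above can be sketched as follows. The description does not state whether the vote was hard (majority of labels) or soft (averaged probabilities); this sketch assumes a hard majority vote, and the per-classifier predictions are hypothetical stand-ins for the outputs of the three models. The last two lines simply recompute the proportions reported in the text.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical labels from three classifiers (e.g. logistic regression,
# multinomial naive Bayes, random forest) for two posts.
votes_per_post = [
    ("access to care", "access to care", "technical competence"),
    ("efficacy of treatment", "efficacy of treatment", "efficacy of treatment"),
]
final_labels = [majority_vote(votes) for votes in votes_per_post]

# Proportions reported in the description, recomputed from the raw counts.
negative_share = round(3319 / 4008 * 100, 2)
interpersonal_share = round(2947 / 4008 * 100, 2)
```

With three voters and many possible themes, a three-way disagreement is possible; this sketch resolves it in favor of the first classifier's label, which is one of several reasonable tie-breaking choices.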
