Dataset Information

Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation.

ABSTRACT:

BACKGROUND: Automatic text summarization (ATS) enables users to retrieve meaningful evidence from big data of biomedical repositories to make complex clinical decisions. Deep neural and recurrent networks outperform traditional machine-learning techniques in areas of natural language processing and computer vision; however, they are yet to be explored in the ATS domain, particularly for medical text summarization.

OBJECTIVE: Traditional approaches in ATS for biomedical text suffer from fundamental issues such as an inability to capture clinical context, quality of evidence, and purpose-driven selection of passages for the summary. We aimed to circumvent these limitations by achieving precise, succinct, and coherent information extraction from credible published biomedical resources, and to construct a simplified summary containing the most informative content that can offer a review particular to clinical needs.

METHODS: In our proposed approach, we introduce a novel framework, termed Biomed-Summarizer, that provides quality-aware Patient/Problem, Intervention, Comparison, and Outcome (PICO)-based intelligent and context-enabled summarization of biomedical text. Biomed-Summarizer integrates the prognosis quality recognition model with a clinical context-aware model to locate text sequences in the body of a biomedical article for use in the final summary. First, we developed a deep neural network binary classifier for quality recognition to acquire scientifically sound studies and filter out others. Second, we developed a bidirectional long short-term memory recurrent neural network as a clinical context-aware classifier, which was trained on semantically enriched features generated using a word-embedding tokenizer for identification of meaningful sentences representing PICO text sequences. Third, we calculated the similarity between query and PICO text sequences using Jaccard similarity with semantic enrichments, where the semantic enrichments are obtained using medical ontologies. Last, we generated a representative summary from the high-scoring PICO sequences aggregated by study type, publication credibility, and freshness score.

RESULTS: Evaluation of the prognosis quality recognition model using a large dataset of biomedical literature related to intracranial aneurysm showed an accuracy of 95.41% (2562/2686) in terms of recognizing quality articles. The clinical context-aware multiclass classifier outperformed the traditional machine-learning algorithms, including support vector machine, gradient boosted tree, linear regression, K-nearest neighbor, and naïve Bayes, by achieving 93% (16127/17341) accuracy for classifying five categories: aim, population, intervention, results, and outcome. The semantic similarity algorithm achieved a significant Pearson correlation coefficient of 0.61 (0-1 scale) on the well-known BIOSSES dataset (100 sentence pairs) after semantic enrichment, representing an improvement of 8.9% over baseline Jaccard similarity. Finally, we found a highly positive correlation among the evaluations performed by three domain experts concerning different metrics, suggesting that the automated summarization is satisfactory.

CONCLUSIONS: By employing the proposed method Biomed-Summarizer, high accuracy in ATS was achieved, enabling seamless curation of research evidence from the biomedical literature to use for clinical decision-making.
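
The similarity step described in the Methods (Jaccard similarity between a query and a PICO text sequence, with semantic enrichment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tokenizer, and the `related_terms` lookup are assumptions made for illustration, and the toy dictionary only stands in for the medical ontologies mentioned in the abstract.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def enrich(tokens, related_terms):
    """Add related terms for each token (hypothetical stand-in for an ontology lookup)."""
    enriched = set(tokens)
    for tok in tokens:
        enriched.update(related_terms.get(tok, ()))
    return enriched


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def enriched_jaccard(query, sentence, related_terms):
    """Jaccard similarity computed on the enriched token sets of both texts."""
    q = enrich(tokenize(query), related_terms)
    s = enrich(tokenize(sentence), related_terms)
    return jaccard(q, s)


# Toy mapping, invented for this example only.
RELATED = {"coiling": {"embolization"}, "embolization": {"coiling"}}

query = "aneurysm coiling outcome"
sentence = "embolization improved outcome"

baseline = jaccard(tokenize(query), tokenize(sentence))   # 1 shared / 5 total = 0.2
enriched = enriched_jaccard(query, sentence, RELATED)     # 3 shared / 5 total = 0.6
print(baseline, enriched)
```

In this toy case the enrichment raises the score because "coiling" and "embolization" are treated as related, which is the general effect the abstract attributes to ontology-based enrichment; the actual ontologies and weighting used by Biomed-Summarizer are not specified in this record.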

SUBMITTER: Afzal M

PROVIDER: S-EPMC7647812 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature

Publications

Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation.

Muhammad Afzal, Fakhare Alam, Khalid Mahmood Malik, Ghaus M Malik

Journal of Medical Internet Research, 2020 Oct 23, issue 10

PMID: 33095174

Similar Datasets

Project description:Preeclampsia (PE) is a hypertensive complication affecting 8-10% of US pregnancies annually. While there is no cure for PE, aspirin may reduce complications for those at high risk for PE. Furthermore, PE disproportionately affects racial minorities, with a higher burden of morbidity and mortality. Previous studies have shown early prediction of PE would allow for prevention. We approached the prediction of PE using a new method based on a cost-sensitive deep neural network (CSDNN) by considering the severe imbalance and sparse nature of the data, as well as racial disparities. We validated our model using large extant rich data sources that represent a diverse cohort of minority populations in the US. These include Texas Public Use Data Files (PUDF), Oklahoma PUDF, and the Magee Obstetric Medical and Infant (MOMI) databases. We identified the most influential clinical and demographic features (predictor variables) relevant to PE for both general populations and smaller racial groups. We also investigated the effectiveness of multiple network architectures using three hyperparameter optimization algorithms: Bayesian optimization, Hyperband, and random search. Our proposed models equipped with focal loss function yield superior and reliable prediction performance compared with the state-of-the-art techniques with an average area under the curve (AUC) of 66.3% and 63.5% for the Texas and Oklahoma PUDF respectively, while the CSDNN model with weighted cross-entropy loss function outperforms with an AUC of 76.5% for the MOMI data. Furthermore, our CSDNN model equipped with focal loss function leads to an AUC of 66.7% for Texas African American and 57.1% for Native American. 
The best results are obtained with 62.3% AUC with CSDNN with weighted cross-entropy loss function for Oklahoma African American, 58% AUC with DNN and balanced batch for Oklahoma Native American, and 72.4% AUC using either CSDNN with weighted cross-entropy loss function or CSDNN with focal loss with balanced batch method for MOMI African American dataset. Our results provide the first evidence of the predictive power of clinical databases for PE prediction among minority populations.

Project description:Intracranial hemorrhage (ICH) occurs when a blood vessel ruptures in the brain. This leads to significant morbidity and mortality, the likelihood of which is predicated on the size of the bleeding event. X-ray computed tomography (CT) scans allow clinicians and researchers to qualitatively and quantitatively diagnose hemorrhagic stroke, guide interventions and determine inclusion criteria of patients in clinical trials. There is no currently available open source, validated tool to quickly segment hemorrhage. Using an automated pipeline and 2D and 3D deep neural networks, we show that we can quickly and accurately estimate ICH volume with high agreement with time-consuming manual segmentation. The training and validation datasets include significant heterogeneity in terms of pathology, such as the presence of intraventricular (IVH) or subdural hemorrhages (SDH) as well as variable image acquisition parameters. We show that deep neural networks trained with an appropriate anatomic context in the network receptive field, can effectively perform ICH segmentation, but those without enough context will overestimate hemorrhage along the skull and around calcifications in the ventricular system. We trained with all data from a multi-center phase II study (n = 112) achieving a best mean and median Dice coefficient of 0.914 and 0.919, a volume correlation of 0.979 and an average volume difference of 1.7 ml and root mean squared error of 4.7 ml in 500 out-of-sample scans from the corresponding multi-center phase III study. 3D networks with appropriate anatomic context outperformed both 2D and random forest models. Our results suggest that deep neural network models, when carefully developed can be incorporated into the workflow of an ICH clinical trial series to quickly and accurately segment ICH, estimate total hemorrhage volume and minimize segmentation failures. 
The model, weights and scripts for deployment are located at https://github.com/msharrock/deepbleed . This is the first publicly available neural network model for segmentation of ICH, the only model evaluated with the presence of both IVH and SDH and the only model validated in the workflow of a series of clinical trials.
