Dataset Information

Automatically determining cause of death from verbal autopsy narratives.

ABSTRACT:

Background

A verbal autopsy (VA) is a post-hoc written interview report of the symptoms preceding a person's death in cases where no official cause of death (CoD) was determined by a physician. Current leading automated VA coding methods primarily use structured data from VAs to assign a CoD category. We present a method to automatically determine CoD categories from VA free-text narratives alone.

Methods

After preprocessing and spelling correction, our method extracts word frequency counts from the narratives and uses them as input to four different machine learning classifiers: naïve Bayes, random forest, support vector machines, and a neural network.
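
As a rough illustration of the pipeline described above, the following sketch builds word frequency counts from narratives and feeds them to a multinomial naive Bayes classifier with add-one smoothing. It is a minimal, standard-library-only sketch and not the authors' implementation: the tokenizer, the toy narratives, the category labels, and the function names (`tokenize`, `train_nb`, `predict_nb`) are all assumptions made for illustration, and the spelling-correction step is omitted.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a narrative and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(narratives, labels):
    """Collect per-class word frequency counts for multinomial naive Bayes."""
    word_counts = defaultdict(Counter)  # class -> word -> count
    class_counts = Counter(labels)      # class -> number of narratives
    vocab = set()
    for text, label in zip(narratives, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log posterior for one narrative."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for illustration only.
train_texts = [
    "he had chest pain and sweating before he collapsed",
    "sudden chest pain and breathlessness",
    "high fever with chills and vomiting for many days",
    "fever and chills every evening",
]
train_labels = ["cardiac", "cardiac", "infection", "infection"]

model = train_nb(train_texts, train_labels)
prediction = predict_nb(model, "severe chest pain at night")
```

The same count features could in principle be passed to the other classifier families named in the abstract (random forest, support vector machine, neural network); naive Bayes is used here only because it is short enough to write out in full.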

Results

For individual CoD classification, our best classifier achieves a sensitivity of .770 for adult deaths for 15 CoD categories (as compared to the current best reported sensitivity of .57), and .662 with 48 WHO categories. When predicting the CoD distribution at the population level, our best classifier achieves .962 cause-specific mortality fraction accuracy for 15 categories and .908 for 48 categories, which is on par with leading CoD distribution estimation methods.
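
The abstract does not spell out how cause-specific mortality fraction (CSMF) accuracy is computed. The sketch below assumes the definition commonly used in the VA literature: one minus the total absolute error in the cause fractions, divided by the largest error possible, 2 × (1 − the smallest true fraction). The function name `csmf_accuracy` and the example labels are hypothetical.

```python
from collections import Counter


def csmf_accuracy(true_labels, predicted_labels):
    """CSMF accuracy under the commonly used definition (an assumption here):
    1 - sum_j |true_j - pred_j| / (2 * (1 - min_j true_j)).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    causes = set(true_counts) | set(pred_counts)
    # Fraction of deaths attributed to each cause, true and predicted.
    true_frac = {c: true_counts[c] / n for c in causes}
    pred_frac = {c: pred_counts[c] / n for c in causes}
    abs_error = sum(abs(true_frac[c] - pred_frac[c]) for c in causes)
    # The denominator is the largest possible total absolute error.
    worst_case = 2 * (1 - min(true_frac.values()))
    return 1 - abs_error / worst_case


# Hypothetical labels: one death is moved from cause "a" to cause "b".
true_cod = ["a", "a", "b", "c"]
pred_cod = ["a", "b", "b", "c"]
score = csmf_accuracy(true_cod, pred_cod)
```

Because the metric compares only the distributions of causes, individual misclassifications that cancel out at the population level do not lower it, which is why it is reported separately from individual-level sensitivity.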

Conclusions

Our narrative-based machine learning classifier performs as well as classifiers based on structured data at the individual level. Moreover, our method demonstrates that VA narratives provide important information that can be used by a machine learning system for automated CoD classification. Unlike the structured questionnaire-based methods, this method can be applied to any verbal autopsy dataset, regardless of the collection process or country of origin.

SUBMITTER: Jeblee S

PROVIDER: S-EPMC6617656 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

Automatically determining cause of death from verbal autopsy narratives.

Serena Jeblee, Mireille Gomes, Prabhat Jha, Frank Rudzicz, Graeme Hirst

BMC Medical Informatics and Decision Making, 2019 Jul 9, issue 1

PMID: 31288814

Similar Datasets

Project description:

Background

The South African national cause of death validation (NCODV 2017/18) project collected a national sample of verbal autopsies (VA) with cause of death (COD) assignment by physician-coded VA (PCVA) and computer-coded VA (CCVA).

Objective

The performance of three CCVA algorithms (InterVA-5, InSilicoVA and Tariff 2.0) in assigning a COD was compared with PCVA (reference standard).

Methods

Seven performance metrics assessed individual and population level agreement of COD assignment by age, sex and place of death subgroups. Positive predictive value (PPV), sensitivity, overall agreement, kappa, and chance corrected concordance (CCC) assessed individual level agreement. Cause-specific mortality fraction (CSMF) accuracy and Spearman's rank correlation assessed population level agreement.

Results

A total of 5386 VA records were analysed. PCVA and CCVAs all identified HIV/AIDS as the leading COD. CCVA PPV and sensitivity, based on confidence intervals, were comparable except for HIV/AIDS, TB, maternal, diabetes mellitus, other cancers, and some injuries. CCVAs performed well for identifying perinatal deaths, road traffic accidents, suicide and homicide but poorly for pneumonia, other infectious diseases and renal failure. Overall agreement between CCVAs and PCVA for the top single cause (48.2-51.6) indicated comparable weak agreement between methods. Overall agreement for the top three causes showed moderate agreement for InterVA (70.9) and InSilicoVA (73.8). Agreement based on kappa (-0.05 to 0.49) and CCC (0.06 to 0.43) was weak to none for all algorithms and groups. CCVAs had moderate to strong agreement for CSMF accuracy, with InterVA-5 highest for neonates (0.90), Tariff 2.0 highest for adults (0.89) and males (0.84), and InSilicoVA highest for females (0.88), elders (0.83) and out-of-facility deaths (0.85). Rank correlation indicated moderate agreement for adults (0.75-0.79).

Conclusions

Whilst CCVAs identified HIV/AIDS as the leading COD, consistent with PCVA, there is scope for improving the algorithms for use in South Africa.

Project description:

Background

Verbal autopsy (VA) has been a crucial tool in ascertaining population-level cause of death (COD) estimates, specifically in countries where medical certification of COD is relatively limited. The World Health Organization has released an updated instrument (Verbal Autopsy Instrument 2022) that supports electronic data collection methods along with analytical software for assigning COD. This questionnaire encompasses the primary signs and symptoms associated with prevalent diseases across all age groups. Traditional methods have primarily involved paper-based questionnaires and physician-coded approaches for COD assignment, which are time-consuming and resource-intensive. Although computer-coded algorithms have advanced the COD assignment process, data collection in densely populated countries like India remains a logistical challenge.

Objective

This study aimed to develop an Android-based mobile app specifically tailored for streamlining VA data collection by leveraging the existing Indian public health workforce. The app has been designed to integrate real-time data collection by frontline health workers and seamless data transmission and digital reporting of COD by physicians. This process aimed to enhance the efficiency and accuracy of COD assignment through VA.

Methods

The app was developed using Android Studio, the primary integrated development environment for developing Android apps using Java. The front-end interface was developed using XML, while SQLite and MySQL were employed for data storage on the local and server databases, respectively. The communication between the app and the server was facilitated through a PHP application programming interface to synchronize data from the local to the server database. The complete prototype was specifically built to reduce manual intervention and automate VA data collection.

Results

The app was developed to align with the current Indian public health system for district-level COD estimation. By leveraging this mobile app, the average duration from VA data collection to ascertainment of COD, which typically ranges from 6 to 8 months, is expected to decrease by approximately 80%, reducing it to about 1-2 months. Based on annual caseload projections, the smallest administrative public health units, health and wellness centers, are anticipated to handle 35-40 VA cases annually, while medical officers at primary health centers are projected to manage 150-200 physician-certified VAs each year. The app's data collection and transmission efficiency were further improved based on feedback from users and subject-area experts.

Conclusions

The development of a unified mobile app could streamline the VA process, enabling the generation of accurate national and subnational COD estimates. This mobile app can be further piloted and scaled to different regions to integrate the automated VA model into the existing public health system for generating comprehensive mortality statistics in India.
