Dataset Information

Active learning: a step towards automating medical concept extraction.

ABSTRACT:

Objective

This paper presents an automatic, active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, it determines (1) the contribution of active learning to reducing the annotation effort and (2) the robustness of the incremental active learning framework across different selection criteria and data sets.

Materials and methods

The performance of an active learning framework was compared with that of a fully supervised approach to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional random fields were used as the supervised method, and least confidence and information density were used as the 2 selection criteria for the active learning framework. The effect of incremental learning vs standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. The following 2 clinical data sets were used for evaluation: the Informatics for Integrating Biology and the Bedside/Veteran Affairs (i2b2/VA) 2010 natural language processing challenge and the Shared Annotated Resources/Conference and Labs of the Evaluation Forum (ShARe/CLEF) 2013 eHealth Evaluation Lab.
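The two selection criteria named above can be illustrated with a minimal sketch. It is not the authors' implementation: the conditional random field is replaced by precomputed probabilities of each sequence's most likely labeling, and the function names, feature vectors, and numbers are hypothetical. Least confidence scores a sequence as 1 minus that probability; information density, in its commonly used form, multiplies the least confidence score by the sequence's average similarity to the unlabeled pool (cosine similarity is assumed here), raised to a weight beta.

```python
import math


def least_confidence(best_path_prob):
    """Uncertainty of a sequence: 1 - P(most likely labeling | sequence)."""
    return 1.0 - best_path_prob


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def information_density(best_path_probs, vectors, beta=1.0):
    """Least confidence weighted by average similarity to the pool (self included)."""
    n = len(vectors)
    scores = []
    for i in range(n):
        avg_sim = sum(cosine(vectors[i], vectors[j]) for j in range(n)) / n
        scores.append(least_confidence(best_path_probs[i]) * avg_sim ** beta)
    return scores


def select_batch(scores, batch_size):
    """Indices of the highest-scoring sequences, i.e. the next ones to annotate."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical pool of three unlabeled sequences (toy values, not from the paper).
probs = [0.9, 0.4, 0.5]                          # P(best labeling) per sequence
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # toy feature vectors

lc_scores = [least_confidence(p) for p in probs]
id_scores = information_density(probs, vectors)

print(select_batch(lc_scores, 1))  # [1]: the least confident sequence
print(select_batch(id_scores, 1))  # [2]: uncertain and representative of the pool
```

In this toy pool, least confidence picks the sequence the model is least sure about even though it resembles nothing else in the pool, while information density prefers a slightly more confident sequence that is similar to others. In an active learning loop, the selected sequences would be annotated, added to the training set, and the model updated before the next round of selection.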

Results

The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared with the random sampling baseline, the saving is at least doubled.

Conclusion

Incremental active learning is a promising approach for building effective and robust medical concept extraction models while significantly reducing the burden of manual annotation.

SUBMITTER: Kholghi M

PROVIDER: S-EPMC7784313 | biostudies-literature | 2016 Mar

REPOSITORIES: biostudies-literature

Publications

Active learning: a step towards automating medical concept extraction.

Mahnoosh Kholghi, Laurianne Sitbon, Guido Zuccon, Anthony Nguyen

Journal of the American Medical Informatics Association (JAMIA), 2015-08-07, issue 2

PMID: 26253132

Similar Datasets

Project description: Quantitative susceptibility mapping (QSM) estimates the underlying tissue magnetic susceptibility from MRI gradient-echo phase signal and typically requires several processing steps. These steps involve phase unwrapping, brain volume extraction, background phase removal and solving an ill-posed inverse problem relating the tissue phase to the underlying susceptibility distribution. The resulting susceptibility map is known to suffer from inaccuracy near the edges of the brain tissues, in part due to imperfect brain extraction, edge erosion of the brain tissue and the lack of phase measurement outside the brain. This inaccuracy has thus hindered the application of QSM for measuring susceptibility of tissues near the brain edges, e.g., quantifying cortical layers and generating superficial venography. To address these challenges, we propose a learning-based QSM reconstruction method that directly estimates the magnetic susceptibility from total phase images without the need for brain extraction and background phase removal, referred to as autoQSM. The neural network has a modified U-net structure and is trained using QSM maps computed by a two-step QSM method. 209 healthy subjects with ages ranging from 11 to 82 years were employed for patch-wise network training. The network was validated on data dissimilar to the training data, e.g., in vivo mouse brain data and brains with lesions, which suggests that the network generalized and learned the underlying mathematical relationship between magnetic field perturbation and magnetic susceptibility. Quantitative and qualitative comparisons were performed between autoQSM and other two-step QSM methods. AutoQSM was able to recover magnetic susceptibility of anatomical structures near the edges of the brain including the veins covering the cortical surface, spinal cord and nerve tracts near the mouse brain boundaries. The advantages of high-quality maps, no need for brain volume extraction, and high reconstruction speed demonstrate autoQSM's potential for future applications.

Project description: Background: The IUCN Red List of Threatened Species™ (hereafter the Red List) is an important global resource for conservation that supports conservation planning, safeguarding critical habitat and monitoring biodiversity change (Rodrigues et al. 2006). However, a major shortcoming of the Red List is that most of the world's described species have not yet been assessed and published on the Red List (Bachman et al. 2019; Eisenhauer et al. 2019). Conservation efforts can be better supported if the Red List is expanded to achieve greater coverage of mega-diverse groups of organisms such as plants, fungi and invertebrates. There is, therefore, an urgent need to speed up the Red List assessment and documentation workflow. One reason for this lack of species coverage is that a manual and relatively time-consuming procedure is usually employed to assess and document species. A recent update of Red List documentation standards (IUCN 2013) reduced the data requirements for publishing non-threatened or 'Least Concern' species on the Red List. The majority of the required fields for Least Concern plant species can be found in existing open-access data sources or can be easily calculated. There is an opportunity to consolidate these data and analyses into a simple application to fast-track the publication of Least Concern assessments for plants. There could be as many as 250,000 species of plants (60%) likely to be categorised as Least Concern (Bachman et al. 2019), for which automatically generated assessments could considerably reduce the outlay of time and valuable resources for Red Listing, allowing attention and resources to be dedicated to the assessment of those species most likely to be threatened. New information: We present a web application, Rapid Least Concern, that addresses the challenge of accelerating the generation and documentation of Least Concern Red List assessments. Rapid Least Concern utilises open-source datasets, such as the Global Biodiversity Information Facility (GBIF) and Plants of the World Online (POWO) through a simple web interface. Initially, the application is intended for use on plants, but it could be extended to other groups, depending on the availability of equivalent datasets for these groups. Rapid Least Concern users can assess a single species or upload a list of species that are assessed in a batch operation. The batch operation can either utilise georeferenced occurrence data from GBIF or occurrence data provided by the user. The output includes a series of CSV files and a point map file that meet the minimum data requirements for a Least Concern Red List assessment (IUCN 2013). The CSV files are compliant with the IUCN Red List SIS Connect system that transfers the data files to the IUCN database and, pending quality control checks and review, publication on the Red List. We outline the knowledge gap this application aims to fill and describe how the application works. We demonstrate a use-case for Rapid Least Concern as part of an ongoing initiative to complete a global Red List assessment of all native species for the United Kingdom Overseas Territory of Bermuda.

Project description: Within computational neuroscience, the algorithmic and neural basis of structure learning remains poorly understood. Concept learning is one primary example, which requires both a type of internal model expansion process (adding novel hidden states that explain new observations), and a model reduction process (merging different states into one underlying cause and thus reducing model complexity via meta-learning). Although various algorithmic models of concept learning have been proposed within machine learning and cognitive science, many are limited to various degrees by an inability to generalize, the need for very large amounts of training data, and/or insufficiently established biological plausibility. Using concept learning as an example case, we introduce a novel approach for modeling structure learning (and specifically state-space expansion and reduction) within the active inference framework and its accompanying neural process theory. Our aim is to demonstrate its potential to facilitate a novel line of active inference research in this area. The approach we lay out is based on the idea that a generative model can be equipped with extra (hidden state or cause) "slots" that can be engaged when an agent learns about novel concepts. This can be combined with a Bayesian model reduction process, in which any concept learning, associated with these slots, can be reset in favor of a simpler model with higher model evidence. We use simulations to illustrate this model's ability to add new concepts to its state space (with relatively few observations) and increase the granularity of the concepts it currently possesses. We also simulate the predicted neural basis of these processes. We further show that it can accomplish a simple form of "one-shot" generalization to new stimuli. Although deliberately simple, these simulation results highlight ways in which active inference could offer useful resources in developing neurocomputational models of structure learning. They provide a template for how future active inference research could apply this approach to real-world structure learning problems and assess the added utility it may offer.
