Dataset Information

Using natural language processing to extract mammographic findings.

ABSTRACT:

Objective

Structured data on mammographic findings are difficult to obtain without manual review. We developed and evaluated a rule-based natural language processing (NLP) system to extract mammographic findings from free-text mammography reports.

Materials and methods

The NLP system extracted four mammographic findings (mass, calcification, asymmetry, and architectural distortion) using a dictionary look-up method on 93,705 mammography reports from Group Health. Status annotations and anatomical location annotations were associated with each NLP-detected finding through association rules. After excluding negated, uncertain, and historical findings, affirmative mentions of detected findings were summarized. Confidence flags were developed to denote reports with highly confident NLP results and reports with possible NLP errors. A random sample of 100 reports was manually abstracted to evaluate the accuracy of the system.
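The dictionary look-up and status filtering described above can be illustrated with a minimal Python sketch. This is not the authors' SAS implementation: the term lists, the cue patterns, and the function names `sentence_status` and `extract_findings` are all assumptions made for illustration, and the sketch omits the anatomical location annotations and confidence flags.

```python
import re

# Hypothetical term dictionary; the study's actual library is not given in the abstract.
FINDING_TERMS = {
    "mass": ["mass", "masses"],
    "calcification": ["calcification", "calcifications", "microcalcifications"],
    "asymmetry": ["asymmetry", "asymmetries"],
    "architectural distortion": ["architectural distortion"],
}

# Hypothetical status cues, checked in order; the first match decides the status.
STATUS_CUES = [
    ("negated", re.compile(r"\b(no|without|negative for)\b")),
    ("historical", re.compile(r"\b(previously|prior|history of)\b")),
    ("uncertain", re.compile(r"\b(possible|possibly|questionable|cannot exclude)\b")),
]


def sentence_status(sentence):
    """Return the status of a lower-cased sentence based on simple cue words."""
    for status, pattern in STATUS_CUES:
        if pattern.search(sentence):
            return status
    return "affirmed"


def extract_findings(report):
    """Return the set of findings that are affirmatively mentioned in a report."""
    affirmed = set()
    # Naive sentence splitting on periods and semicolons.
    for sentence in re.split(r"[.;]\s*", report.lower()):
        if sentence_status(sentence) != "affirmed":
            continue  # skip negated, uncertain, and historical mentions
        for finding, terms in FINDING_TERMS.items():
            if any(re.search(r"\b" + re.escape(t) + r"\b", sentence) for t in terms):
                affirmed.add(finding)
    return affirmed


report = (
    "There is a 1 cm mass in the left breast. "
    "No suspicious calcifications. "
    "Previously noted asymmetry is stable."
)
print(extract_findings(report))
```

In this sketch a status cue applies to the whole sentence, which is a simplification; association rules in a real system would typically tie each cue to a specific finding mention.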

Results

The NLP system correctly coded 96 to 99 of the 100 sampled reports, depending on the finding. Sensitivity, specificity, and negative predictive value exceeded 0.92 for all findings. Positive predictive values were relatively low for some findings because of their low prevalence.
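A short worked example shows why positive predictive value can be low for a rare finding even when the other measures are high. The counts below are hypothetical and are not taken from the study; `diagnostic_metrics` is an illustrative helper name.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among reports with the finding
        "specificity": tn / (tn + fp),  # true negatives among reports without the finding
        "ppv": tp / (tp + fp),          # true positives among NLP-positive reports
        "npv": tn / (tn + fn),          # true negatives among NLP-negative reports
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical sample of 100 reports in which only 3 truly contain the finding.
metrics = diagnostic_metrics(tp=3, fp=2, fn=0, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, 98 of 100 reports are coded correctly and sensitivity, specificity, and negative predictive value are all above 0.97, yet the positive predictive value is only 0.6, because two false positives weigh heavily against three true positives.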

Discussion

Our NLP system was implemented entirely in SAS Base, which makes it portable and easy to implement. It performed reasonably well and supports multiple applications, such as using confidence flags as a filter to improve the efficiency of manual review. Refinements of the library and association rules, and testing on more diverse samples, may further improve its performance.

Conclusion

Our NLP system successfully extracts clinically useful information from mammography reports. Moreover, SAS is a feasible platform for implementing NLP algorithms.

SUBMITTER: Gao H

PROVIDER: S-EPMC4408241 | biostudies-literature | 2015 Apr

REPOSITORIES: biostudies-literature


Publications

Using natural language processing to extract mammographic findings.

Hongyuan Gao, Erin J. Aiello Bowles, David Carrell, Diana S. M. Buist

Journal of Biomedical Informatics, 2015-02-03


PMID: 25661260

Similar Datasets

Project description:

Introduction: Common goals for procedural sedation are to control pain and to ensure that the patient is not moving to an extent that impedes safe progress or completion of the procedure. Clinicians regularly assess the adequacy of procedural sedation against these goals, both to inform their decisions about sedation titration and to document the care provided. Natural language processing could be applied to real-time transcriptions of audio recordings made during procedures in order to classify sedation states that involve movement and pain, which could then be integrated into clinical documentation systems. The aim of this study was to determine whether natural language processing algorithms can detect sedation states during procedural sedation with sufficient accuracy.

Design: A prospective observational study was conducted.

Methods: Audio recordings from consenting participants undergoing elective procedures performed in the interventional radiology suite at a large academic hospital were transcribed using an automated speech recognition model. Sentences of transcribed text were used to train and evaluate several different NLP pipelines for a text classification task. The NLP pipelines we evaluated included a simple Bag-of-Words (BOW) model, an ensemble architecture combining a linear BOW model and a "token-to-vector" (Tok2Vec) component, and a transformer-based architecture using the RoBERTa pre-trained model.

Results: A total of 15,936 sentences from transcriptions of 82 procedures were included in the analysis. The RoBERTa model achieved the highest performance among the three models, with an area under the ROC curve (AUC-ROC) of 0.97, an F1 score of 0.87, a precision of 0.86, and a recall of 0.89. The Ensemble model showed a similarly high AUC-ROC of 0.96, but a lower F1 score of 0.79, precision of 0.83, and recall of 0.77. The BOW approach achieved an AUC-ROC of 0.97, an F1 score of 0.7, a precision of 0.83, and a recall of 0.66.

Conclusion: The transformer-based architecture using the RoBERTa pre-trained model achieved the best classification performance. Further research is required to confirm that this natural language processing pipeline can accurately perform text classifications with real-time audio data to allow for automated sedation state assessments.

Clinical relevance: Automating sedation state assessments using natural language processing pipelines would allow for more timely documentation of the care received by sedated patients and, at the same time, decrease the documentation burden for clinicians. Downstream applications can also be generated from the classifications, including, for example, real-time visualizations of sedation state, which may improve communication about the adequacy of sedation between clinicians, some of whom may be supervising remotely. In addition, accumulating sedation state assessments from multiple procedures may reveal insights into the efficacy of particular sedative medications or identify procedures where the current approach to sedation and analgesia is not optimal (i.e., a significant amount of time spent in "pain" or "movement" sedation states).

Project description:

Background: Depression is a prevalent global mental health disorder with substantial individual and societal impact. Natural language processing (NLP), a branch of artificial intelligence, offers the potential for improving depression screening by extracting meaningful information from textual data, but there are challenges and ethical considerations.

Objective: This literature review aims to explore existing NLP methods for detecting depression, discuss successes and limitations, address ethical concerns, and highlight potential biases.

Methods: A literature search was conducted using Semantic Scholar, PubMed, and Google Scholar to identify studies on depression screening using NLP. Keywords included "depression screening," "depression detection," and "natural language processing." Studies were included if they discussed the application of NLP techniques for depression screening or detection. Studies were screened and selected for relevance, with data extracted and synthesized to identify common themes and gaps in the literature.

Results: NLP techniques, including sentiment analysis, linguistic markers, and deep learning models, offer practical tools for depression screening. Supervised and unsupervised machine learning models and large language models like transformers have demonstrated high accuracy in a variety of application domains. However, ethical concerns arise related to privacy, bias, interpretability, and the lack of regulations to protect individuals. Furthermore, cultural and multilingual perspectives highlight the need for culturally sensitive models.

Conclusions: NLP presents opportunities to enhance depression detection, but considerable challenges persist. Ethical concerns must be addressed, governance guidance is needed to mitigate risks, and cross-cultural perspectives must be integrated. Future directions include improving interpretability and personalization and increasing collaboration with domain experts, such as data scientists and machine learning engineers. NLP's potential to enhance mental health care remains promising, depending on overcoming obstacles and continuing innovation.

Project description:

Background: The medical problem list is an important part of the electronic medical record in development in our institution. To serve the functions it is designed for, the problem list has to be as accurate and timely as possible. However, the current problem list is usually incomplete and inaccurate, and is often totally unused. To alleviate this issue, we are building an environment where the problem list can be easily and effectively maintained.

Methods: For this project, 80 medical problems were selected for their frequency of use in our future clinical field of evaluation (cardiovascular). We have developed an Automated Problem List system composed of two main components: a background and a foreground application. The background application uses Natural Language Processing (NLP) to harvest potential problem list entries from the list of 80 targeted problems detected in the multiple free-text electronic documents available in our electronic medical record. These proposed medical problems drive the foreground application designed for management of the problem list. Within this application, the extracted problems are proposed to the physicians for addition to the official problem list.

Results: The set of 80 targeted medical problems selected for this project covered about 5% of all possible diagnoses coded in ICD-9-CM in our study population (cardiovascular adult inpatients), but about 64% of all instances of these coded diagnoses. The system contains algorithms to detect first document sections, then sentences within these sections, and finally potential problems within the sentences. The initial evaluation of the section and sentence detection algorithms demonstrated a sensitivity and positive predictive value of 100% when detecting sections, and a sensitivity of 89% and a positive predictive value of 94% when detecting sentences.

Conclusion: The global aim of our project is to automate the process of creating and maintaining a problem list for hospitalized patients and thereby help to guarantee the timeliness, accuracy and completeness of this information.

