Dataset Information

Textractor: a hybrid system for medications and reason for their prescription extraction from clinical text documents.

ABSTRACT:

Objective

To describe a new medication information extraction system, Textractor, developed for the 'i2b2 medication extraction challenge'. The development, functionalities, and official evaluation of the system are detailed.

Design

Textractor is based on the Apache Unstructured Information Management Architecture (UIMA) framework and uses a hybrid of machine learning and pattern matching methods. Two modules in the system are based on machine learning algorithms; the other modules use regular expressions, rules, and dictionaries, and one module embeds MetaMap Transfer.
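To make the hybrid, modular design concrete, here is a minimal sketch of how pattern-matching and dictionary modules can be chained over a shared document, loosely in the spirit of a UIMA aggregate engine writing to a common analysis structure. It is not Textractor's code: every name here (`Annotation`, `Document`, `MED_DICTIONARY`, `PATTERNS`, `dictionary_annotator`, `regex_annotator`, `run_pipeline`) is hypothetical, the lexicon and regular expressions are toy assumptions, and the machine learning and MetaMap Transfer modules are left out.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    label: str  # e.g. "medication", "dosage", "route", "frequency"
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
    text: str


@dataclass
class Document:
    """Loosely mimics a UIMA CAS: the text plus every annotation added so far."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# Hypothetical toy lexicon; a real system would load a full drug dictionary.
MED_DICTIONARY = {"aspirin", "lasix", "metoprolol"}

# Hypothetical patterns for a few medication attributes.
PATTERNS = {
    "dosage": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|units?)\b", re.IGNORECASE),
    "route": re.compile(r"\b(?:po|iv|by mouth)\b", re.IGNORECASE),
    "frequency": re.compile(r"\b(?:daily|bid|tid|qid|twice a day)\b", re.IGNORECASE),
}


def dictionary_annotator(doc: Document) -> None:
    """Dictionary module: tags tokens that appear in the medication lexicon."""
    for m in re.finditer(r"[A-Za-z]+", doc.text):
        if m.group().lower() in MED_DICTIONARY:
            doc.annotations.append(Annotation("medication", m.start(), m.end(), m.group()))


def regex_annotator(doc: Document) -> None:
    """Pattern-matching module: tags dosage, route, and frequency mentions."""
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(doc.text):
            doc.annotations.append(Annotation(label, m.start(), m.end(), m.group()))


Annotator = Callable[[Document], None]


def run_pipeline(text: str, annotators: List[Annotator]) -> Document:
    """Runs each module in order over one shared document, then sorts by offset."""
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)
    doc.annotations.sort(key=lambda a: a.start)
    return doc


# Machine learning and MetaMap Transfer modules would be further annotators
# in this list; they are not sketched here.
doc = run_pipeline("Aspirin 81 mg po daily; Lasix 40mg iv bid.",
                   [dictionary_annotator, regex_annotator])
for a in doc.annotations:
    print(a.label, repr(a.text))
```

Keeping each module as a function over the same document object means modules can be added, removed, or reordered independently, which is the main reason frameworks like UIMA separate annotators from the data they share.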

Measurements

The official evaluation was based on a reference standard of 251 discharge summaries annotated by all teams participating in the challenge. The metrics were recall, precision, and the F1-measure, calculated for both exact and inexact matches and averaged at both the system level and the document level.
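The abstract does not define the two averaging levels. A common reading, and the assumption behind this sketch, is that system-level scores are micro-averaged (true positive, false positive, and false negative counts pooled across all summaries before computing the metrics) and document-level scores are macro-averaged (metrics computed per summary, then averaged). The sketch also treats an inexact match as any character overlap between spans, a simplification of the challenge's actual matching rules; all function names and toy spans are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]         # (start, end) character offsets, end exclusive
Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def count_matches(gold: List[Span], pred: List[Span], exact: bool) -> Counts:
    """Counts TP/FP/FN for one document with greedy one-to-one matching.

    exact=True requires identical offsets; exact=False accepts any overlap,
    a simplified stand-in for inexact matching."""
    def hit(g: Span, p: Span) -> bool:
        return g == p if exact else (p[0] < g[1] and g[0] < p[1])

    unmatched = list(gold)
    tp = 0
    for p in pred:
        for g in unmatched:
            if hit(g, p):
                unmatched.remove(g)  # each gold span can be matched only once
                tp += 1
                break
    return tp, len(pred) - tp, len(unmatched)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def system_level(counts: List[Counts]) -> Tuple[float, float, float]:
    """Micro-average: pool counts over all documents, then score once."""
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return prf(tp, fp, fn)


def document_level(counts: List[Counts]) -> Tuple[float, float, float]:
    """Macro-average: score each document, then average the scores."""
    scores = [prf(*c) for c in counts]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Hypothetical toy data: two documents with gold and predicted spans.
gold = [[(0, 7), (8, 13)], [(0, 5)]]
pred = [[(0, 7), (8, 12)], [(0, 5), (10, 14)]]

for exact in (True, False):
    counts = [count_matches(g, p, exact) for g, p in zip(gold, pred)]
    print("exact" if exact else "inexact",
          "system:", system_level(counts), "document:", document_level(counts))
```

With these toy spans, the two averaging schemes already disagree under exact matching (system-level F1 of 4/7 versus document-level F1 of 7/12), and relaxing to inexact matching raises the scores because the span (8, 12) now counts as a hit.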

Results

The reference metric for this challenge, the system-level overall F1-measure for exact matches, reached about 77%, with a recall of 72% and a precision of 83%. Performance was best for route information (F1-measure of about 86%) and good for dosage and frequency information (F1-measures of about 82-85%). Results were weaker for durations (F1-measures of 36-39%) and for reasons (F1-measures of 24-27%).
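As a consistency check, the F1-measure is the harmonic mean of precision P and recall R. Plugging in the rounded exact-match figures reported above reproduces the overall F1-measure of about 77%; because the inputs are rounded, this only shows the three values agree to within rounding.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.83 \times 0.72}{0.83 + 0.72}
    = \frac{1.1952}{1.55}
    \approx 0.771
```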

Conclusion

The official evaluation of Textractor for the i2b2 medication extraction challenge demonstrated satisfactory performance; the system was among the 10 best-performing systems in the challenge.

SUBMITTER: Meystre SM

PROVIDER: S-EPMC2995680 | biostudies-literature | 2010 Sep-Oct

REPOSITORIES: biostudies-literature


Publications

Textractor: a hybrid system for medications and reason for their prescription extraction from clinical text documents.

Meystre SM, Thibault J, Shen S, Hurdle JF, South BR

Journal of the American Medical Informatics Association (JAMIA), 2010 Sep 1, issue 5


PMID: 20819864

Similar Datasets

Project description:
Background: For ecological studies, it is crucial to have adequate descriptions of the environments and samples being studied. Such descriptions must be given in terms of their physicochemical characteristics, allowing direct comparisons between environments that would otherwise be difficult. The characterization must also include the precise geographical location, to make it possible to study geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles), which so far has required manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind.
Results: EnvMine retrieves the physicochemical variables cited in a text by accurately identifying their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. For the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. Identifying a location also includes determining its exact coordinates (latitude and longitude), which allows the distance between individual locations to be calculated.
Conclusion: EnvMine is a very efficient method for extracting contextual information from different text sources, such as published articles or web pages. This tool can help determine the precise location and physicochemical variables of sampling sites, thus facilitating ecological analyses. EnvMine can also help in the development of standards for the annotation of environmental features.
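The EnvMine summary does not say how distances between located sites are computed. One standard option, assumed here, is the haversine great-circle formula on a spherical Earth with a mean radius of 6371 km; the function name `haversine_km` is hypothetical, and a geodesic on an ellipsoid would be more accurate for precise work.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Equator to North Pole along a meridian: a quarter circle, about 10,008 km.
print(haversine_km(0.0, 0.0, 90.0, 0.0))
```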

Project description:
Background: A lack of safety data on postpartum medication use presents a potential barrier to breastfeeding and may result in infant exposure to medications in breast milk. The type and extent of medication use by lactating women require investigation.
Methods: Data were collected from the CHILD Cohort Study, which enrolled pregnant women across Canada between 2008 and 2012. Participants completed questionnaires on their use of prescription and non-prescription medications and on breastfeeding status at 3, 6 and 12 months postpartum. Medications, along with self-reported reasons for medication use, were categorized by ontologies (hierarchical controlled vocabularies) as part of a large-scale curation effort to enable more robust investigations of reasons for medication use.
Results: A total of 3542 mother-infant dyads were recruited to the CHILD study. Breastfeeding rates were 87.4%, 75.3% and 45.5% at 3, 6 and 12 months, respectively. About 40% of women who were breastfeeding at 3 months used at least one prescription medication during the first three months postpartum; this proportion was lower at later time points, at 29.5% at 6 months and 32.8% at 12 months. The most commonly used prescription medication among breastfeeding women was domperidone at 3 months (9.0%, n = 229/2540) and 6 months (5.6%, n = 109/1948), and norethisterone at 12 months (4.1%, n = 48/1180). The vast majority of domperidone use by breastfeeding women (97.3%) was for lactation purposes, which is off-label (signifying unapproved use of an approved medication). Non-prescription medications were more often used by breastfeeding than by non-breastfeeding women (67.6% versus 48.9% at 3 months, p < 0.0001). The most commonly used non-prescription medications were multivitamins and vitamin D at 3, 6 and 12 months postpartum.
Conclusions: In Canada, medication use is common postpartum; 40% of breastfeeding women use prescription medications in the first 3 months postpartum. A diverse range of medications was used, with many women taking more than one prescription or non-prescription medicine. The most commonly used prescription medication among breastfeeding women was domperidone for off-label lactation support, signalling a need for more data on the efficacy of domperidone for this indication. These data should inform research priorities and communication strategies developed to optimize care during lactation.

