Integrating existing natural language processing tools for medication extraction from discharge summaries

Son Doan, Lisa Bastarache, Sergio Klimkowski, Joshua C. Denny, Hua Xu

Research output: Contribution to journalArticle

38 Citations (Scopus)

Abstract

Objective: To develop an automated system to extract medications and related information from discharge summaries as part of the 2009 i2b2 natural language processing (NLP) challenge. This task required accurate recognition of medication name, dosage, mode, frequency, duration, and reason for drug administration. Design: We developed an integrated system using several existing NLP components developed at Vanderbilt University Medical Center, which included MedEx (to extract medication information), SecTag (a section identification system for clinical notes), a sentence splitter, and a spell checker for drug names. Our goal was to achieve good performance with minimal to no specific training for this document corpus; thus, evaluating the portability of those NLP tools beyond their home institution. The integrated system was developed using 17 notes that were annotated by the organizers and evaluated using 251 notes that were annotated by participating teams. Measurements: The i2b2 challenge used standard measures, including precision, recall, and F-measure, to evaluate the performance of participating systems. There were two ways to determine whether an extracted textual finding is correct or not: exact matching or inexact matching. The overall performance for all six types of medication-related findings across 251 annotated notes was considered as the primary metric in the challenge. Results: Our system achieved an overall F-measure of 0.821 for exact matching (0.839 precision; 0.803 recall) and 0.822 for inexact matching (0.866 precision; 0.782 recall). The system ranked second out of 20 participating teams on overall performance at extracting medications and related information. Conclusions: The results show that the existing MedEx system, together with other NLP components, can extract medication information in clinical text from institutions other than the site of algorithm development with reasonable performance.

Original languageEnglish (US)
Pages (from-to)528-531
Number of pages4
JournalJournal of the American Medical Informatics Association
Volume17
Issue number5
DOIs
StatePublished - Sep 1 2010
Externally publishedYes

Fingerprint

Natural Language Processing
Names
Pharmaceutical Preparations

ASJC Scopus subject areas

  • Health Informatics

Cite this

Integrating existing natural language processing tools for medication extraction from discharge summaries. / Doan, Son; Bastarache, Lisa; Klimkowski, Sergio; Denny, Joshua C.; Xu, Hua.

In: Journal of the American Medical Informatics Association, Vol. 17, No. 5, 01.09.2010, p. 528-531.

Research output: Contribution to journalArticle

@article{0d8407aa6fcb432b82da5674c4a53f78,
title = "Integrating existing natural language processing tools for medication extraction from discharge summaries",
abstract = "Objective: To develop an automated system to extract medications and related information from discharge summaries as part of the 2009 i2b2 natural language processing (NLP) challenge. This task required accurate recognition of medication name, dosage, mode, frequency, duration, and reason for drug administration. Design: We developed an integrated system using several existing NLP components developed at Vanderbilt University Medical Center, which included MedEx (to extract medication information), SecTag (a section identification system for clinical notes), a sentence splitter, and a spell checker for drug names. Our goal was to achieve good performance with minimal to no specific training for this document corpus; thus, evaluating the portability of those NLP tools beyond their home institution. The integrated system was developed using 17 notes that were annotated by the organizers and evaluated using 251 notes that were annotated by participating teams. Measurements: The i2b2 challenge used standard measures, including precision, recall, and F-measure, to evaluate the performance of participating systems. There were two ways to determine whether an extracted textual finding is correct or not: exact matching or inexact matching. The overall performance for all six types of medication-related findings across 251 annotated notes was considered as the primary metric in the challenge. Results: Our system achieved an overall F-measure of 0.821 for exact matching (0.839 precision; 0.803 recall) and 0.822 for inexact matching (0.866 precision; 0.782 recall). The system ranked second out of 20 participating teams on overall performance at extracting medications and related information. Conclusions: The results show that the existing MedEx system, together with other NLP components, can extract medication information in clinical text from institutions other than the site of algorithm development with reasonable performance.",
author = "Son Doan and Lisa Bastarache and Sergio Klimkowski and Denny, {Joshua C.} and Hua Xu",
year = "2010",
month = "9",
day = "1",
doi = "10.1136/jamia.2010.003855",
language = "English (US)",
volume = "17",
pages = "528--531",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "5",

}

TY - JOUR

T1 - Integrating existing natural language processing tools for medication extraction from discharge summaries

AU - Doan, Son

AU - Bastarache, Lisa

AU - Klimkowski, Sergio

AU - Denny, Joshua C.

AU - Xu, Hua

PY - 2010/9/1

Y1 - 2010/9/1

N2 - Objective: To develop an automated system to extract medications and related information from discharge summaries as part of the 2009 i2b2 natural language processing (NLP) challenge. This task required accurate recognition of medication name, dosage, mode, frequency, duration, and reason for drug administration. Design: We developed an integrated system using several existing NLP components developed at Vanderbilt University Medical Center, which included MedEx (to extract medication information), SecTag (a section identification system for clinical notes), a sentence splitter, and a spell checker for drug names. Our goal was to achieve good performance with minimal to no specific training for this document corpus; thus, evaluating the portability of those NLP tools beyond their home institution. The integrated system was developed using 17 notes that were annotated by the organizers and evaluated using 251 notes that were annotated by participating teams. Measurements: The i2b2 challenge used standard measures, including precision, recall, and F-measure, to evaluate the performance of participating systems. There were two ways to determine whether an extracted textual finding is correct or not: exact matching or inexact matching. The overall performance for all six types of medication-related findings across 251 annotated notes was considered as the primary metric in the challenge. Results: Our system achieved an overall F-measure of 0.821 for exact matching (0.839 precision; 0.803 recall) and 0.822 for inexact matching (0.866 precision; 0.782 recall). The system ranked second out of 20 participating teams on overall performance at extracting medications and related information. Conclusions: The results show that the existing MedEx system, together with other NLP components, can extract medication information in clinical text from institutions other than the site of algorithm development with reasonable performance.

AB - Objective: To develop an automated system to extract medications and related information from discharge summaries as part of the 2009 i2b2 natural language processing (NLP) challenge. This task required accurate recognition of medication name, dosage, mode, frequency, duration, and reason for drug administration. Design: We developed an integrated system using several existing NLP components developed at Vanderbilt University Medical Center, which included MedEx (to extract medication information), SecTag (a section identification system for clinical notes), a sentence splitter, and a spell checker for drug names. Our goal was to achieve good performance with minimal to no specific training for this document corpus; thus, evaluating the portability of those NLP tools beyond their home institution. The integrated system was developed using 17 notes that were annotated by the organizers and evaluated using 251 notes that were annotated by participating teams. Measurements: The i2b2 challenge used standard measures, including precision, recall, and F-measure, to evaluate the performance of participating systems. There were two ways to determine whether an extracted textual finding is correct or not: exact matching or inexact matching. The overall performance for all six types of medication-related findings across 251 annotated notes was considered as the primary metric in the challenge. Results: Our system achieved an overall F-measure of 0.821 for exact matching (0.839 precision; 0.803 recall) and 0.822 for inexact matching (0.866 precision; 0.782 recall). The system ranked second out of 20 participating teams on overall performance at extracting medications and related information. Conclusions: The results show that the existing MedEx system, together with other NLP components, can extract medication information in clinical text from institutions other than the site of algorithm development with reasonable performance.

UR - http://www.scopus.com/inward/record.url?scp=78149492053&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78149492053&partnerID=8YFLogxK

U2 - 10.1136/jamia.2010.003855

DO - 10.1136/jamia.2010.003855

M3 - Article

VL - 17

SP - 528

EP - 531

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 5

ER -