Research Paper
Affiliations of the authors: Department of Biomedical Informatics, Columbia University (GBM, GH); and Medical Informatics Services, NewYork-Presbyterian Hospital (GH), New York, NY.
Correspondence and reprints: George Hripcsak, MD, MS, Department of Biomedical Informatics, Columbia University, 622 West 168th Street, Vanderbilt Clinic, 5th Floor, New York, NY 10032; e-mail: <hripcsak{at}columbia.edu>.
Received for publication: 01/19/05; accepted for publication: 03/20/05.
| Abstract |
|---|
Design: An adverse event detection system for discharge summaries using the NLP system MedLEE was constructed to identify 45 NYPORTS event types. The system was first applied to a random sample of 1,000 manually reviewed charts. The system then processed all inpatient cases with electronic discharge summaries for two years. All system-identified events were reviewed, and performance was compared with traditional reporting.
Measurements: System sensitivity, specificity, and predictive value, with manual review serving as the gold standard.
Results: The system correctly identified 16 of 65 events in 1,000 charts. Of 57,452 total electronic discharge summaries, the system identified 1,590 events in 1,461 cases, and manual review verified 704 events in 652 cases, resulting in an overall sensitivity of 0.28 (95% confidence interval [CI]: 0.17–0.42), specificity of 0.985 (CI: 0.984–0.986), and positive predictive value of 0.45 (CI: 0.42–0.47) for detecting cases with events and an average specificity of 0.9996 (CI: 0.9996–0.9997) per event type. Traditional event reporting detected 322 events during the period (sensitivity 0.09), of which the system identified 110 as well as 594 additional events missed by traditional methods.
Conclusion: NLP is an effective technique for detecting a broad range of adverse events in text documents and outperformed traditional and previous automated adverse event detection methods.
| Introduction |
|---|
Health care policy makers and practitioners will need information technology coupled with improved data collection to improve patient safety.9 Computerized systems for event detection rely on signals suggestive of adverse events both in the case of impending events (for prevention) and of events that have occurred (for management).10 For example, a discharge diagnosis of myocardial infarction in a patient with an unrelated surgical admission diagnosis might indicate an adverse event. Event detection systems reduce the cost of chart review by identifying those cases that are most appropriate for review.6 Successful systems require sufficient positive predictive value to avoid needless chart review and sufficient sensitivity to gather a meaningful number of events.
Most adverse event detection systems exploit numeric or coded data derived from patient registration, pharmacy orders, admission and discharge diagnoses, clinical laboratory results, and ancillary information systems.11,12,13,14,15,16 Investigators have studied adverse event detection from the perspective of adverse drug events, dangerous laboratory values, failure to follow critical paths, and other events. Although these adverse event detection systems often perform well, they are limited because they require clinical data in coded format.
Unfortunately, most institutions lack a detailed record of their patients' care in coded electronic format. Symptoms, physical findings, and clinical reasoning are recorded as narrative text in notes but are unavailable in coded form. The lack of coded information limits the performance of event detection systems and limits the breadth of events that they can detect.
Narrative clinical notes such as discharge summaries, operative reports, clinic notes, and nursing notes are increasingly available in electronic form either through transcription or direct data entry. Investigators have begun to exploit these documents for event detection by looking for notes with relevant words ("trigger words") such as "iatrogenic," "error," or "perforation."17,18 This technique helps, but its predictive value remains low, largely because it is difficult to distinguish whether a clinician is saying that a condition is present, is absent, or was present in the past. Natural language processing is an automated technique that converts narrative documents into a coded form that is appropriate for computer-based analysis.
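The weakness of trigger-word searching can be illustrated with a minimal sketch. The trigger words are those quoted above; the example sentences and the helper name `has_trigger_word` are invented for illustration and do not come from the study data or from any of the cited systems.

```python
import re

# Trigger words quoted in the text above.
TRIGGER_WORDS = ("iatrogenic", "error", "perforation")


def has_trigger_word(note: str) -> bool:
    """Return True if any trigger word appears as a whole word, ignoring case."""
    pattern = r"\b(" + "|".join(TRIGGER_WORDS) + r")\b"
    return re.search(pattern, note, flags=re.IGNORECASE) is not None


# Invented example sentences (hypothetical, not from the study).
notes = {
    "present": "Colonoscopy was complicated by perforation of the sigmoid colon.",
    "absent": "There was no evidence of perforation on imaging.",
    "past": "History of bowel perforation in 1998, repaired surgically.",
    "unrelated": "The patient was discharged home in stable condition.",
}

# The search flags the condition whether it is present, absent, or historical.
flags = {label: has_trigger_word(text) for label, text in notes.items()}
for label, flagged in flags.items():
    print(f"{label}: flagged={flagged}")
```

In this sketch, only the first note describes a current event, yet three notes are flagged. A keyword match carries no information about negation or timing, which is the gap that coded output from natural language processing is meant to close.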
Natural language processing has been used successfully for several specific domains of medicine19,20,21,22,23,24 and for the detection of specific adverse events, such as falls and nosocomial infections.10,25 It is unclear, however, whether natural language processors can detect a wide range of complex adverse events accurately enough to assist health care institutions meaningfully. In this study, we built an event detection system for electronic discharge summaries using an existing, noncommercial natural language processor, MedLEE,26 in an effort to detect a broad range of adverse events.
| Background |
|---|
The certainty and status fields indicate that the diagnosis is unsure ("moderate" certainty) and that if the myocardial infarction did occur, it occurred in the past. A detailed overview of MedLEE has been published.21,26
The New York Patient Occurrence Reporting and Tracking System (NYPORTS) is a mandatory adverse event reporting framework instituted in 1996 for all health care institutions in New York State.27 We used the criteria for each of the 45 patient-related hospital-based adverse event types defined in NYPORTS (Appendix 1); they represent a broad range of adverse events.
Many NYPORTS adverse event types are complex. For example, NYPORTS event type 751 includes falls in the hospital resulting in an x-ray–proven fracture, a subdural or epidural hematoma, cerebral contusion, traumatic subarachnoid hemorrhage, or internal organ trauma. The event type excludes falls that occur outside of the institution or that result in only soft tissue injuries. NYPORTS event type 604 includes perioperative myocardial infarction within 48 hours of an operative procedure. The procedure must not be cardiac related, birth related, an abdominal aortic aneurysm rupture, or a multiple trauma.
| Methods |
|---|
The adverse event detection system28 comprised the MedLEE natural language processor21,26 and a set of criteria that mapped each MedLEE-coded discharge summary to the adverse events that occurred during the admission. The inclusion and exclusion criteria for each event were implemented as a computer query, which is a short program that includes logic and terms from MedLEE's vocabulary. MedLEE converted each discharge summary to a coded form, and the 45 computer queries converted that coded form to a list of events that appeared to have occurred during each admission. The computer queries were developed iteratively; we tested them on discharge summaries from the years 1990 to 1995 (before implementation of NYPORTS), modified the queries to improve performance, and retested them on the cohort.
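The general shape of such a query can be sketched as follows. This is a simplified illustration only: the `Finding` structure, its field values other than "moderate," and the function names are assumptions made for this sketch, not MedLEE's actual output format or the query language used in the study. The sketch loosely follows the inclusion criteria of event type 751 and ignores the requirement that the fall occur in the hospital.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One coded finding (hypothetical structure, not MedLEE's real output)."""
    concept: str    # normalized term, e.g. "fall"
    certainty: str  # assumed values: "high", "moderate", "no"
    status: str     # assumed values: "current", "past"


# Injuries that qualify a fall for event type 751, per the criteria in the text.
QUALIFYING_INJURIES = {
    "fracture",
    "subdural hematoma",
    "epidural hematoma",
    "cerebral contusion",
    "traumatic subarachnoid hemorrhage",
    "internal organ trauma",
}


def is_asserted_now(finding: Finding) -> bool:
    """Keep findings that are neither negated nor confined to the past."""
    return finding.certainty != "no" and finding.status != "past"


def query_fall_with_injury(findings: list[Finding]) -> bool:
    """Simplified inclusion logic: an asserted fall plus a qualifying injury."""
    asserted = {f.concept for f in findings if is_asserted_now(f)}
    return "fall" in asserted and bool(asserted & QUALIFYING_INJURIES)


summary = [
    Finding("fall", "high", "current"),
    Finding("fracture", "high", "current"),
    Finding("myocardial infarction", "moderate", "past"),
]
print(query_fall_with_injury(summary))
```

A soft tissue injury alone does not satisfy the query because it is not in the qualifying set, which mirrors the exclusion stated in the event definition.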
System Evaluation
Manual chart review served as the gold standard. We assessed the reliability of the reviewers on 100 cases as follows. Two reviewers, a physician coauthor (GBM) and an informatician independent of this study, identified NYPORTS events in 100 cases selected randomly but stratified so that about 40% had events. The reviewers' raw agreement was 0.97, and chance-corrected agreement (kappa) was 0.94. This high agreement justified the use of a single reviewer per case.
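For readers unfamiliar with the two agreement statistics, the following sketch computes raw agreement and Cohen's kappa from a two-by-two table. The cell counts are hypothetical: they were chosen only to be consistent with the reported figures (100 cases, about 40% with events, raw agreement 0.97, kappa 0.94) and are not the study's actual table.

```python
def raw_agreement(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Fraction of cases on which the two reviewers gave the same answer."""
    total = both_yes + a_only + b_only + both_no
    return (both_yes + both_no) / total


def cohen_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Chance-corrected agreement for two raters and a yes/no judgment."""
    total = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / total
    a_yes = (both_yes + a_only) / total  # reviewer A's rate of "event present"
    b_yes = (both_yes + b_only) / total  # reviewer B's rate of "event present"
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # agreement by chance
    return (observed - expected) / (1 - expected)


# Hypothetical counts consistent with the reported statistics.
counts = dict(both_yes=39, a_only=2, b_only=1, both_no=58)
agreement = raw_agreement(**counts)
kappa = cohen_kappa(**counts)
print(f"raw agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Because the sample was stratified so that events were common, chance agreement is near 0.5, and kappa stays close to the raw agreement.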
Reliability of the data sources was assessed on 1,000 randomly selected cases in which the physician identified NYPORTS events using (1) the discharge summaries alone, (2) the full electronic chart, and (3) for a subset of 100, the combined electronic and paper charts. Electronic charts included discharge summaries, operative reports, pathology reports, laboratory results, radiology results, registration data including coded diagnoses and procedures, residents' transfer of service notes, and other ancillary notes, but they contained few admission notes, progress notes, or nursing notes. The paper chart supplied the latter missing notes. We calculated the agreement among the three data sources.
Performance of the system was assessed with the same 1,000 random cases from 1996 and 2000 used for the full data reliability dataset. These cases were used to obtain an unbiased and direct estimate of sensitivity and specificity of the system for identifying cases that had NYPORTS events. The system identified apparent events based on discharge summaries. The physician manually reviewed the electronic chart for each case and determined which NYPORTS events had clearly occurred in the case.
System performance was then assessed using all electronic discharge summaries from 1996 and 2000 to get a more precise estimate of the positive predictive value and performance on individual event types. The physician reviewed those discharge summaries that the system identified as having events. An identification was considered correct only if the system selected the correct NYPORTS event type.
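As a worked illustration of the positive predictive value estimate, the sketch below uses the counts reported in the abstract (652 verified cases among 1,461 system-identified cases, and 704 verified events among 1,590 system-identified events). The Wilson score interval is an assumption made for this sketch; the text does not state which interval method was used, so the result need not match the reported intervals exactly.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract.
case_ppv = 652 / 1461   # verified cases / system-identified cases
event_ppv = 704 / 1590  # verified events / system-identified events
ppv_low, ppv_high = wilson_interval(652, 1461)

print(f"case PPV  = {case_ppv:.2f} (95% CI {ppv_low:.2f} to {ppv_high:.2f})")
print(f"event PPV = {event_ppv:.2f}")
```

Reviewing only system-identified summaries yields the predictive value directly, but it cannot yield sensitivity, which is why the fully reviewed 1,000-case sample was needed for that estimate.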
Finally, to assess how the system might work in practice, we compared the events that were detected by the system and confirmed by the physician reviewer with the events that were actually detected during those years using traditional event detection techniques. In 1996 and 2000, hospital personnel reported candidate NYPORTS events in one of three ways: (1) direct phone calls from practitioners, patients, and other hospital personnel; (2) incident reports from practitioners; and (3) report forms completed by case management personnel in conjunction with utilization review. Hospital personnel then determined the veracity of candidate NYPORTS events by manual screening of the electronic chart and, if needed, the paper chart.
The institutional review board approved the study and waived informed consent for this retrospective review.
| Results |
|---|
System Performance on 1,000 Cases
Table 1 shows the performance of the system for detecting cases with at least one adverse event, based on the 1,000 case set. "True events" are those identified by manual review of the electronic chart, and "apparent events" are those identified by the system. The system correctly identified 15 of 53 cases with events. Table 2 shows the performance of the system for detecting individual events, based on the 1,000 case set. The system correctly identified 16 of 65 true events and incorrectly identified 49. Event specificity (0.9996 in Table 2) exceeds case specificity (0.982 in Table 1) because case specificity is subject to the sum of the false-positive rates of all the event types, whereas event specificity represents the average specificity expected for an investigator interested in a single NYPORTS event type.
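The relationship between the two specificities can be checked with a short calculation. The sketch below makes two simplifying assumptions that are not stated in the text: every one of the 45 event types has the average specificity of 0.9996, and false positives for different event types occur independently.

```python
EVENT_TYPES = 45            # number of NYPORTS event types queried
EVENT_SPECIFICITY = 0.9996  # average per-event specificity (Table 2)

event_false_positive_rate = 1 - EVENT_SPECIFICITY

# Additive approximation: a case is a false positive if any query fires wrongly,
# so the case false-positive rate is at most the sum of the per-event rates.
case_specificity_sum = 1 - EVENT_TYPES * event_false_positive_rate

# Under independence, a case is a true negative only if all 45 queries stay silent.
case_specificity_independent = EVENT_SPECIFICITY ** EVENT_TYPES

print(f"additive approximation: {case_specificity_sum:.3f}")
print(f"independence assumption: {case_specificity_independent:.3f}")
```

Both approximations come out at about 0.982, the case specificity reported in Table 1, which supports the explanation that small per-event false-positive rates accumulate across event types.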
| Discussion |
|---|
The current system, when compared with other adverse event detection systems using text documents, is unique in its ability to both recognize a broad range of events and identify the specific event type in each case. Thus, it enables highly focused manual review to detect a significant fraction of events at minimal cost.
Most previous studies of automated adverse event detection from narrative documents used simple text search techniques and achieved limited success. In two studies of adverse drug event detection in the outpatient setting using automated text searching in clinic notes, the text search method performed well compared with other automated methods but achieved positive predictive values of only 7%13 and 12%.5 In a different study, text searching in discharge summaries, residents' transfer of service notes, and outpatient visit notes using the search terms "mistake," "error," "incorrect," and "iatrogenic" to find medical errors identified a broad range of medical errors and had positive predictive values ranging from 3.4% to 24.4%.17 The system did not distinguish among the event types, however, and its sensitivity was less than 4%. In a study of text searching on discharge summaries to identify a broad range of events, the system returned 59% of discharge summaries with a predictive value of 52%.18 Because the prevalence of these nonspecific events in the underlying sample was 45%, however, the predictive value was only moderately higher than would be achieved by random sampling. Our system identified specific event types, with average prevalence per event type of less than 1%, and it still achieved a positive predictive value of 44% per event.
In addition, a recent report by Forster et al.29 described the validation of an adverse event detection instrument for discharge summaries using term searching. In contrast to the current study, which contains a direct reliability study, that report used an established instrument. The authors reported a positive predictive value of 0.41, a sensitivity of 0.23, and a specificity of 0.92. The predictive value of 0.41 must be interpreted in light of the high underlying prevalence of adverse events, which was 20% (48 of 245) in the reported case sample using a broad definition of adverse events. In addition to achieving a comparable predictive value with rare and specific events, our system achieved a better specificity and identified the exact event type.
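The comparisons above rest on the ratio of predictive value to prevalence, that is, how much better a flagged case is than a randomly sampled one. The sketch below computes that ratio from the figures quoted in the two preceding paragraphs. The name `enrichment` is ours, and for the current system the prevalence is set to its stated upper bound of 1%, so the result is a lower bound.

```python
def enrichment(ppv: float, prevalence: float) -> float:
    """Ratio of positive predictive value to the yield of random sampling."""
    return ppv / prevalence


# Figures quoted in the text.
text_search = enrichment(ppv=0.52, prevalence=0.45)     # broad text-search study
forster = enrichment(ppv=0.41, prevalence=48 / 245)     # Forster et al. instrument
this_system = enrichment(ppv=0.44, prevalence=0.01)     # prevalence < 1%: lower bound

print(f"text search:    {text_search:.2f}x")
print(f"Forster et al.: {forster:.2f}x")
print(f"this system:    at least {this_system:.0f}x")
```

Similar predictive values therefore represent very different amounts of screening benefit once the underlying prevalence is taken into account.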
Our reliability studies, which were conducted to verify the rater and data sources, revealed that NYPORTS events were straightforward for clinicians to identify with manual review and that discharge summaries contain most NYPORTS adverse events. Although the raters had little difficulty with manual review, query development for these events was a long and intricate task for system developers. Queries were developed iteratively, often over many rounds, to decrease both false negatives and false positives. Because the inclusion and exclusion criteria of these adverse event definitions are complex, however, mimicking the natural reasoning of a clinician within an automated query was difficult.
For example, reasoning with respect to time, an area being actively investigated by others,30 was particularly difficult in this project. While MedLEE does have some time representations for dates and other simple time structures, its current capabilities with respect to these issues are limited. Certain time reasoning could be inferred, such as an event occurring after another event using collocation information in the text. Many other time-reasoning issues, however, were not easily modeled in the queries. For instance, five postoperative NYPORTS events require that the event occurred within 48 hours of the procedure (events 601 to 605, see Appendix 1). Modeling a time difference of 48 hours with the coded data from MedLEE was difficult. Augmenting the system with other data sources, as well as other text documents, could improve time reasoning and overall data modeling for the event detection system.
Although the system was successful in detecting NYPORTS events, there are important adverse event types that the NYPORTS structure does not include or sometimes explicitly excludes. For instance, the NYPORTS adverse event criteria for iatrogenic pneumothorax include solely those pneumothoraces due to an intravascular catheter and exclude other iatrogenic causes, including thoracocentesis or lung biopsy. For this reason, the system would need modification if the goal were to obtain all possible adverse events of potential interest.
While the overall performance of the system was excellent compared with that of other text-processing adverse event detection systems, system performance at the event or query level varied somewhat by event type. Many event types had a low event prevalence (Appendix 1), so the performance for individual event types could not be determined accurately. Nevertheless, certain queries were more difficult to implement in an automated fashion than others, resulting in variable system performance. Another central issue, in addition to issues with time reasoning, was handling event criteria not typically contained explicitly in the discharge summary. This required indirect modeling in the query (e.g., the use of conscious sedation was indirectly modeled by detecting procedures that typically use conscious sedation). The addition of other data sources could potentially enhance system performance by directly supplying this inferred information.
One potential source of bias in this study was that only patients with electronic discharge summaries were included. Patients who stayed less than 48 hours did not require a discharge summary, and sometimes summaries were simply missing from the record. This group may have had a different event rate than those included in the study.
An important aspect of this technology is its straightforward transferability to other institutions. Previous experience using the MedLEE natural language processor at other institutions suggests that performance should be comparable and that adjusting the computer queries should reduce any loss of performance.31 For patients with electronic discharge summaries, the overhead of using the system should be minimal. There are minor formatting requirements, and standardized section headings are helpful but not mandatory. Transferability is limited in two ways: (1) not all patients have discharge summaries, typically due to short hospital stays or lack of clinician compliance, and (2) some institutions do not currently have discharge summaries in electronic form. The MedLEE natural language processing component can process a broad range of documents, and extending the adverse event detection system to progress notes, operative reports, consult notes, and ancillary reports would likely result in the detection of additional adverse events.
Moreover, system specificity is high enough to make nationwide screening feasible. For example, if electronic discharge summaries were available for all inpatients, then an investigator interested in wound dehiscence (event 805) could run the system on the 30 million admissions expected per year32 and produce about 11,000 cases with about 11,000 false positives (from Appendix 1, event positive predictive value of 0.51 with approximately one case returned by the system for every 1,350 discharge summaries).
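The arithmetic behind this estimate is shown below, using only the figures quoted in the preceding paragraph (30 million admissions, about one flagged case per 1,350 discharge summaries, and an event positive predictive value of 0.51). The variable names are ours.

```python
ADMISSIONS_PER_YEAR = 30_000_000  # expected admissions per year
SUMMARIES_PER_FLAG = 1_350        # about one flagged case per 1,350 summaries
EVENT_PPV = 0.51                  # positive predictive value for event 805

flagged_cases = ADMISSIONS_PER_YEAR / SUMMARIES_PER_FLAG
true_positives = flagged_cases * EVENT_PPV
false_positives = flagged_cases * (1 - EVENT_PPV)

print(f"flagged cases:   {flagged_cases:,.0f}")
print(f"true positives:  {true_positives:,.0f}")
print(f"false positives: {false_positives:,.0f}")
```

Roughly 22,000 flagged summaries split almost evenly between true and false positives, so a reviewer would confirm about one event for every two charts examined.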
Natural language processing may revolutionize adverse event reporting and may play a significant role in adverse event prevention and other forms of intervention. The described system tripled the number of detected events without impeding the clinicians' workflow or adding to their workload, as the operation of our system on discharge summaries was completely automated and transparent to clinicians. As health care moves from simple detection to actual intervention and prevention, the system may become even more important. Processing takes only about a second per document, and MedLEE processes documents at our institution as they are created. In contrast to retrospective manual detection and to voluntary reporting, in which clinicians must know about and decide to report an event, natural language processing can provide immediate feedback to clinicians about issues of which they may be unaware. For example, MedLEE processing of chest radiograph reports reduced the rate of erroneously assigning patients with active tuberculosis to nonprivate rooms by almost one half.33
| Conclusion |
|---|
| Appendix 1. Events Identified by the Automated Adverse Event Detection System and by Traditional Event Detection on 1,000 Cases and on 57,452 Cases |
|---|
| Footnotes |
|---|
The authors thank Carol Friedman for the use of the natural language processor MedLEE (National Library of Medicine grant support R01 LM06274 and R01 LM07659), Sue West for her assistance with institutional NYPORTS reporting, and Karina Tulipano for serving as a case reviewer.
| References |
|---|