Natural language processing and IMO clinical terminology: A pilot study

In October, the Journal of Healthcare Informatics Research published a study that explored how IMO’s terminology – coupled with natural language processing tools – could improve clinical documentation at the point of care and harvest information from the terabytes of data stored in electronic health records. Below, Roger Gildersleeve, MD, Senior Clinical Terminologist at IMO, explains the research, its findings, and future implications.

When I started practicing medicine in the mid-1990s, I admitted patients I had never met before to the hospital, typically for severe problems. A central activity of the admission was creating the history and physical, or “H&P”, which told the story of how the patient went from their usual life to a place of serious illness and what we were going to do about it.

The ideas that form in a clinician’s mind are born of the story she and the patient create about the illness. It is narrative and evolving, often containing twists, turns, and uncertainties. Before electronic health records (EHRs), I would dictate an H&P that a transcriptionist then turned into a document within a paper chart. The healthcare team built on the story as care proceeded, and when the patient left the hospital I created a discharge summary. In a few minutes, a person could discern what we knew, what we didn’t know, how the patient was doing, and what to do next.

Electronic health records have evolved considerably since I started practicing and now offer great benefits. They have also brought the problem list, the hub of knowledge about the patient’s conditions, into focus, although its potential has not yet been fully realized. Meanwhile, physicians spend a great deal of time keying data into templated screens and inserting canned text responses to meet documentation requirements. The patient’s narrative has become diluted in a sea of structured and semi-structured data.

EHRs: The next frontier

Technology that accurately transcribes dictated clinical notes into text already exists. But electronically processing these notes to find mentions of the patient’s problems is a step beyond. Presenting those problems as IMO lexicals (i.e., terms) that a clinician can insert into the problem list with a single click would be yet another enhancement. A few years ago, IMO, Nuance, and MEDITECH collaborated on a pilot project called Fact Finder to create that very system.

Beyond this point-of-care application, we also imagined the power of harvesting structured data from millions of archived text documents. This knowledge sits dormant without the person-power to read and translate it into computable data. So IMO, in partnership with the UK’s Cambridge University Hospitals and the natural language processing (NLP) company Linguamatics, created a system to do just that.

Testing the technology

In October, the Journal of Healthcare Informatics Research published the findings from a study of the technology, which showed that the pilot project reliably extracted structured problems in the form of IMO lexicals from dictated free-text notes.

During our research, we loaded over 330,000 IMO problem terms into the Linguamatics system. Text-based notes from five specialties were fed into Linguamatics’ NLP engine and parsed into analyzable phrases. A customized algorithm then matched the phrases to IMO terms and flagged whether each statement was negated, uncertain, or asserted. As a reference standard, clinicians carefully analyzed 60 notes and annotated places where the language was synonymous with IMO problem terms, also noting expressions of uncertainty, certainty, or negation. This “gold standard” was then compared with the matches made by the automated system.
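The following minimal Python sketch illustrates the matching step: dictionary-based term matching with simple assertion flags. It is not the customized Linguamatics-IMO algorithm, which is not described in detail here. Instead, it assumes a tiny hypothetical term list and short, hand-picked negation and uncertainty cue lists in the spirit of rule-based approaches such as NegEx. The names `match_sentence`, `has_cue`, and `Match` are illustrative only. A term counts as present when all its words appear somewhere in the sentence, even if they are not adjacent.

```python
import re
from dataclasses import dataclass

# Hypothetical cue lists for illustration only; rule-based systems such as
# NegEx rely on much larger, curated lists and on cue scope rules.
NEGATION_CUES = ("no", "denies", "without", "negative for")
UNCERTAINTY_CUES = ("possible", "probable", "suspected", "rule out")


@dataclass
class Match:
    term: str
    assertion: str  # "negated", "uncertain", or "asserted"


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_cue(tokens: list[str], cue: str) -> bool:
    """True if the (possibly multi-word) cue appears as a contiguous token run."""
    cue_tokens = cue.split()
    n = len(cue_tokens)
    return any(tokens[i:i + n] == cue_tokens for i in range(len(tokens) - n + 1))


def match_sentence(sentence: str, terms: list[str]) -> list[Match]:
    """Match hypothetical problem terms in one sentence and flag their assertion."""
    tokens = tokenize(sentence)
    token_set = set(tokens)

    # Simplification: one assertion flag applies to the whole sentence.
    if any(has_cue(tokens, c) for c in NEGATION_CUES):
        assertion = "negated"
    elif any(has_cue(tokens, c) for c in UNCERTAINTY_CUES):
        assertion = "uncertain"
    else:
        assertion = "asserted"

    # A term matches if all its words occur anywhere in the sentence,
    # in any order, not necessarily next to each other.
    found = [t for t in terms if set(tokenize(t)) <= token_set]

    # Keep only the most specific terms: drop any term whose words are a
    # strict subset of another matched term's words.
    specific = [
        t for t in found
        if not any(set(tokenize(t)) < set(tokenize(o)) for o in found)
    ]
    return [Match(t, assertion) for t in specific]


print(match_sentence(
    "Diabetes mellitus, type 2, poorly controlled.",
    ["type 2 diabetes mellitus", "diabetes mellitus", "hypertension"],
))
# -> [Match(term='type 2 diabetes mellitus', assertion='asserted')]
print(match_sentence("Patient denies chest pain.", ["chest pain"]))
# -> [Match(term='chest pain', assertion='negated')]
```

A design note: because the cue check covers the whole sentence, a sentence such as "No fever, but cough present" would mark both terms as negated. Real systems limit each cue to a narrower scope.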

What we learned

In one analysis, we set a strict standard: the matched term had to be exactly synonymous with the source text, negation had to be captured accurately, and the engine had to piece together terms whose words were spread out across an entire sentence. Even with these demanding criteria, the results were remarkable: 81% precision and 70% recall. These values are comparable to those reported in other NLP publications, and the system exceeds many of those efforts in several ways.
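Precision is the share of the system’s matches that appear in the gold standard. Recall is the share of gold-standard mentions that the system found. The sketch below computes both, assuming each annotation is a hypothetical (note ID, term, assertion) tuple. Under that assumption, a match counts as correct only when both the term and its negation or uncertainty flag agree, which echoes the strict criteria described above. The function name and the toy data are invented for illustration and do not come from the study.

```python
def precision_recall(system: set, gold: set) -> tuple[float, float]:
    """Strict precision and recall of system annotations against a gold standard."""
    true_pos = len(system & gold)   # found by the system and in the gold standard
    false_pos = len(system - gold)  # found by the system but not in the gold standard
    false_neg = len(gold - system)  # in the gold standard but missed by the system
    precision = true_pos / (true_pos + false_pos) if system else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    return precision, recall


# Toy annotations: (note ID, term, assertion). Invented for illustration.
gold = {
    ("n1", "chest pain", "negated"),
    ("n1", "hypertension", "asserted"),
    ("n2", "pneumonia", "uncertain"),
    ("n2", "asthma", "asserted"),
}
system = {
    ("n1", "chest pain", "negated"),
    ("n1", "hypertension", "asserted"),
    ("n2", "pneumonia", "asserted"),  # wrong assertion flag, so it does not count
}

p, r = precision_recall(system, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Treating the assertion flag as part of the tuple means a correctly identified term with the wrong flag counts as a false positive and also leaves a false negative. That is one reasonable reading of "required accurate negation," not necessarily the study’s exact scoring rule.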

For starters, the Linguamatics-IMO system extracted essentially the entire universe of clinical conditions, not just a single concept area. Furthermore, we did very little tuning of the NLP engine, and we identified easy adjustments that could reduce false negatives. We also noticed that the system worked better for disorders and other clinical findings than for social or procedural history.

These results demonstrate the value of IMO’s terminology for extracting structured data from clinical text at the point of care, giving clinicians more time to describe the patient’s story instead of managing the problem list. The same approach also supports mining data from archived notes. The project also shows the value of strong collaboration between organizations in making the most of highly specialized expertise developed over decades.

The full article is available in the Journal of Healthcare Informatics Research.
