Free text clinical narratives in healthcare: A multi-lingual challenge

Healthcare has hundreds of languages that require hundreds of algorithms to understand them. Learn how clinical terminology fits into the mix.

I went to the movie theater a couple of weeks ago to watch a popular and much-talked-about Indian movie with my family. The movie’s title was Ponniyin Selvan. You may be able to pronounce that phrase correctly since it’s written with letters from the English alphabet, but clearly, it has no meaning for someone who speaks only English. Now, if you are from the south of India and speak a regional language called Tamil (in addition to English), then that term would make immediate sense even though it’s written in English.

In a way, this English title for a Tamil movie encapsulates the fundamental problem with applying natural language processing (NLP) techniques to free-text clinical narratives. These narratives can take the form of various documents, including pathology and radiology reports; history and physicals; discharge summaries; and progress notes. On the surface, they all seem to be written in English. But they are not. Often, the sentences in these documents do not conform to the rules of English grammar and are filled with abbreviations, acronyms, and jargon that is unintelligible to the untrained, English-speaking eye.

Each of these clinical documents represents a unique language that can only be understood by a “tribe” that has had formal training in a particular medical discipline. Replace the trained human eye with an NLP algorithm used to read and derive meaning from these narratives, and you are faced with the same problem. An NLP algorithm that has only been trained on well-formed English documents will perform poorly when it encounters real-world free-text clinical narratives. Likewise, an NLP algorithm trained on a steady diet of pathology reports will probably perform poorly when it encounters a radiology report.
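One rough way to see this mismatch is to count how many tokens in a sentence fall outside a model's known vocabulary. The sketch below is only an illustration: the tiny vocabulary, the two example sentences, and the helper names `tokenize` and `oov_rate` are all made up for this purpose, and a real model's vocabulary and tokenizer would be far more sophisticated.

```python
import re

# Hypothetical toy vocabulary standing in for what a model trained only on
# well-formed English might know. A real vocabulary would be far larger.
GENERAL_ENGLISH_VOCAB = {
    "the", "patient", "has", "a", "history", "of", "high", "blood",
    "pressure", "and", "was", "given", "medicine", "for", "pain",
    "with", "no", "is", "in", "chest",
}


def tokenize(text):
    """Lowercase the text and split it into tokens, keeping forms like 'c/o' whole."""
    return re.findall(r"[a-z0-9/]+", text.lower())


def oov_rate(text, vocab):
    """Return the fraction of tokens in `text` that are not in `vocab`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    unknown = [token for token in tokens if token not in vocab]
    return len(unknown) / len(tokens)


# Illustrative sentences (invented, not taken from real records).
plain = "The patient has a history of high blood pressure."
note = "Pt c/o SOB, hx HTN, s/p CABG, r/o MI."

print(f"plain English: {oov_rate(plain, GENERAL_ENGLISH_VOCAB):.0%} unknown tokens")
print(f"clinical note: {oov_rate(note, GENERAL_ENGLISH_VOCAB):.0%} unknown tokens")
```

With this toy vocabulary, every token in the plain sentence is known and every token in the clinical shorthand is not, even though both describe overlapping clinical facts.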

Training algorithms on robust clinical terminology

“Health care has hundreds of languages,” says David Talby, the CTO of John Snow Labs, each with its own language model. If that is true, and we at IMO Health believe that it is, then there is no “one-size-fits-all” clinical NLP algorithm. We need hundreds of algorithms that are trained specifically to read and understand these multi-lingual clinical narratives. Furthermore, these algorithms need the benefit of clinical terminology that captures the multifarious ways of describing clinical concepts as part of their training regimen.

IMO Health recognized the multi-lingual and fragmented nature of clinical terminology more than 25 years ago. The company has painstakingly worked to curate millions of commonly used variations of terms for problems, diagnoses, procedures, medications, labs, vaccinations, and much more. These terms, when combined with sophisticated NLP models and arranged in carefully thought-through pipelines, have the potential to mitigate healthcare’s “Tower of Babel” problem.
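To make the idea of term variations concrete, here is a minimal sketch of how a variant table might map different surface forms to one canonical concept. It is not IMO Health's terminology or pipeline: the handful of entries in `VARIANT_TO_CONCEPT` and the `find_concepts` helper are hypothetical and written by hand for illustration.

```python
import re

# Hypothetical, hand-written variant table for illustration only.
# A curated clinical terminology would hold vastly more entries.
VARIANT_TO_CONCEPT = {
    "htn": "hypertension",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
    "mi": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "sob": "shortness of breath",
    "dyspnea": "shortness of breath",
}

# Try longer variants first so multi-word phrases win over shorter ones.
_VARIANTS = sorted(VARIANT_TO_CONCEPT, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(variant) for variant in _VARIANTS) + r")\b",
    flags=re.IGNORECASE,
)


def find_concepts(text):
    """Return canonical concepts found in `text`, in order of first appearance."""
    concepts = []
    for match in _PATTERN.finditer(text):
        concept = VARIANT_TO_CONCEPT[match.group(1).lower()]
        if concept not in concepts:
            concepts.append(concept)
    return concepts


print(find_concepts("Pt with hx HTN, c/o SOB; r/o MI."))
print(find_concepts("History of high blood pressure and a prior heart attack."))
```

A plain lookup like this ignores context, and short abbreviations can be ambiguous ("MI" may also stand for mitral insufficiency, for example). That is one reason the terms are described above as being combined with NLP models rather than used alone.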
