Modernizing public health reporting with NLP

Electronic case reporting systems can help speed disease surveillance, but only if they can integrate unstructured EHR data – at scale.

Prompted by the response to the COVID-19 pandemic, the Centers for Disease Control and Prevention (CDC) launched its Data Modernization Initiative (DMI) to enhance the ability of the nation’s public health agencies (PHAs) to monitor and respond to emerging public health threats. Yet to achieve the goals of this ambitious project, providers will need to harness the latest informatics tools – such as natural language processing (NLP) – to meet the data demands of a modernized public health infrastructure.

Part of the modernization process involves digitizing the case reporting system. Case reports are the critical first step in disease surveillance: a clinician reports to the PHA upon encountering a patient with a suspected or confirmed case of a reportable disease. Currently, many providers rely on manual communication methods – such as fax, phone, or email – to submit case reports to state or local PHAs. Unfortunately, this manual process delays the public health response, because data trickles in slowly from upstream.

This is why the DMI is pushing the adoption of electronic case reporting (eCR). eCR is the automated, real-time exchange of case report information between the electronic health record (EHR) and PHAs. When relevant data – such as diagnoses or lab results – are entered into the EHR, an eCR platform can determine whether this information warrants a case report and, if so, automatically prepare one.

The challenge of codifying unstructured EHR data

An eCR is initiated when the platform recognizes specific trigger codes. These codes come from value sets published in the Value Set Authority Center (VSAC), a central repository for value sets maintained by the National Library of Medicine. These value sets contain codes from common standardized terminologies (e.g., SNOMED CT®, ICD-10-CM, LOINC®) that are used to describe suspected and confirmed diagnoses, lab orders and results, and medications.
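To make the trigger step concrete, here is a minimal Python sketch of how a platform might check coded EHR entries against a trigger value set. It assumes a tiny, hypothetical value set; it is not the actual trigger code list published in VSAC, and names such as `CodedEntry`, `matching_triggers`, and `should_initiate_ecr` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical, illustrative trigger value set (NOT the real VSAC-published list).
# Each entry is a (terminology, code) pair.
TRIGGER_CODES: frozenset[tuple[str, str]] = frozenset({
    ("ICD-10-CM", "U07.1"),    # COVID-19
    ("ICD-10-CM", "Z20.822"),  # Contact with and (suspected) exposure to COVID-19
})


@dataclass(frozen=True)
class CodedEntry:
    """A coded data element from the EHR, such as a diagnosis or lab result."""
    system: str        # terminology name, e.g. "ICD-10-CM", "LOINC", "SNOMED CT"
    code: str
    display: str = ""  # optional human-readable label


def matching_triggers(entries: list[CodedEntry]) -> list[CodedEntry]:
    """Return the entries whose (system, code) pair appears in the trigger set."""
    return [e for e in entries if (e.system, e.code) in TRIGGER_CODES]


def should_initiate_ecr(entries: list[CodedEntry]) -> bool:
    """In this sketch, an eCR is initiated if at least one entry is a trigger code."""
    return bool(matching_triggers(entries))


# Usage: one non-trigger diagnosis and one trigger diagnosis.
encounter = [
    CodedEntry("ICD-10-CM", "J06.9", "Acute upper respiratory infection, unspecified"),
    CodedEntry("ICD-10-CM", "U07.1", "COVID-19"),
]
print(should_initiate_ecr(encounter))  # True
```

Matching on the (terminology, code) pair rather than the code string alone keeps a code from one terminology from accidentally matching an identical-looking code in another.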

For this process to work, data entered into the EHR must be properly codified. While this is comparatively easy for structured data elements entered into discrete fields, it is much trickier for unstructured documents such as clinical notes. Yet clinical notes often contain a wealth of information relevant to public health surveillance – such as travel history, pregnancy status, and occupation – that is typically absent from common clinical documents like C-CDAs. The challenge, then, is to integrate these unstructured data elements with other pieces of information to prepare a more complete report for PHAs.

NLP, clinical terminology, and unstructured data

NLP can relieve the burden of manually identifying and entering critical information from free text into the EHR by automatically extracting relevant terms from phrases such as “lives with mother who patient suspects has COVID-19.” When coupled with a comprehensive clinical terminology containing concepts pre-mapped to standardized codes, these extracted terms can be automatically linked to the right code – for example, ICD-10-CM code Z20.822, “Contact with and (suspected) exposure to COVID-19.” Once codified, these data can be fed into decision support engines to determine whether an initial electronic case report should be submitted, freeing providers from the time-consuming case reporting process so that they can devote their attention to their patients.
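The extract-then-codify flow described above can be sketched in Python as follows. This is a deliberately simple stand-in: a regular-expression matcher replaces a real NLP engine, the one-entry `TERMINOLOGY` is hypothetical, and the trigger set is illustrative rather than an official value set. Names such as `extract_concepts` and `needs_initial_case_report` are made up for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-terminology: each entry pairs a pattern over note text with a
# pre-mapped standardized code (terminology, code, display).
TERMINOLOGY = [
    (
        re.compile(r"\b(lives with|exposed to|contact with)\b.*\bcovid-19\b", re.IGNORECASE),
        ("ICD-10-CM", "Z20.822", "Contact with and (suspected) exposure to COVID-19"),
    ),
]

# Illustrative trigger codes (not an actual VSAC value set).
TRIGGER_CODES = {("ICD-10-CM", "Z20.822"), ("ICD-10-CM", "U07.1")}


@dataclass(frozen=True)
class ExtractedConcept:
    span: str     # the matched text from the note
    system: str
    code: str
    display: str


def extract_concepts(note: str) -> list[ExtractedConcept]:
    """Find every terminology pattern in the note and attach its mapped code."""
    concepts = []
    for pattern, (system, code, display) in TERMINOLOGY:
        for match in pattern.finditer(note):
            concepts.append(ExtractedConcept(match.group(0), system, code, display))
    return concepts


def needs_initial_case_report(note: str) -> bool:
    """Decision step: report if any extracted concept carries a trigger code."""
    return any((c.system, c.code) in TRIGGER_CODES for c in extract_concepts(note))


note = "Pt reports cough x3 days; lives with mother who patient suspects has COVID-19."
for c in extract_concepts(note):
    print(c.code, "-", c.display)  # Z20.822 - Contact with and (suspected) exposure to COVID-19
print(needs_initial_case_report(note))  # True
```

A pattern matcher like this cannot tell “denies contact with anyone with COVID-19” from a true exposure; both would match. Handling negation and context is a large part of what a production NLP engine and clinical terminology add on top of simple concept matching.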

With an NLP engine driven by a robust clinical terminology, providers can leverage critical information contained in free-text documents to quickly determine whether a case should be reported to PHAs. Coupled with automatic eCR generation, case reporting can be transformed from a time-consuming, error-prone, and unreliable process into a quick, streamlined, and reliable way to deliver the time-sensitive information that disease surveillance depends on, advancing the DMI’s goal of digitizing the nation’s public health infrastructure.


SNOMED and SNOMED CT are registered trademarks of SNOMED International.
