Ask an expert: How do we normalize data in healthcare?

Healthcare data standardization is important, but that doesn’t mean it’s easy. So, what do we mean when we talk about normalization in healthcare?

For those in the health IT world, the concept of healthcare data standardization – often called data normalization – is nothing new. But what this idea means isn’t always clear, particularly in our field. Indeed, data normalization in general and data normalization in healthcare aren’t exactly the same thing. So, what’s unique about healthcare data standardization? Greg Aldin, IMO’s Director of Operational Analytics, breaks it down.

IMO: Why is healthcare data standardization – or normalization – so important?

Greg Aldin: When data is not normalized, it doesn’t have a unified representation – meaning that information from different sources can’t work together to support analytics and other important secondary uses of the data. Appropriately normalized data ensures patients are represented as accurately as possible and helps existing systems deliver better outcomes at lower cost in areas like clinical decision support, population health, and interoperability.

IMO: Can organizations just use data that isn’t normalized?

GA: In short – no, not really. When healthcare organizations aim to use data collected from a number of disparate sources, that data is extremely difficult and time-consuming to query if it’s not represented consistently. We have seen mature health systems think they are doing well as long as all of their data connects to at least some sort of standardized code, only to find that their patients’ diagnosis data is spread across systems like ICD-9-CM, SNOMED CT®, and ICD-10-CM. Querying it then becomes not only time-consuming, but also inaccurate. This deficiency can pose real patient safety risks for healthcare organizations when inconsistent inputs drive clinical decision support and best practice workflows.
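
To make the querying problem concrete, here is a minimal Python sketch. It assumes a handful of made-up patient rows and a hand-written toy crosswalk; the record layout, the `type_2_diabetes` label, and both function names are hypothetical, and a real crosswalk would come from a maintained terminology map rather than a hard-coded dictionary.

```python
# Illustrative only: rows and crosswalk are toy data, not a real terminology map.

# Hypothetical diagnosis rows as they might arrive from three source systems.
records = [
    {"patient_id": "p1", "system": "ICD-10-CM", "code": "E11.9"},
    {"patient_id": "p2", "system": "ICD-9-CM", "code": "250.00"},
    {"patient_id": "p3", "system": "SNOMED CT", "code": "44054006"},
    {"patient_id": "p4", "system": "ICD-10-CM", "code": "I10"},
]

# Toy crosswalk from (code system, code) to one shared concept label (assumed).
CROSSWALK = {
    ("ICD-10-CM", "E11.9"): "type_2_diabetes",
    ("ICD-9-CM", "250.00"): "type_2_diabetes",
    ("SNOMED CT", "44054006"): "type_2_diabetes",
    ("ICD-10-CM", "I10"): "hypertension",
}


def naive_query(rows, system, code):
    """Find patients by a single code in a single code system."""
    return sorted(
        r["patient_id"] for r in rows
        if r["system"] == system and r["code"] == code
    )


def normalized_query(rows, concept):
    """Find patients by shared concept, whatever code system the row used."""
    return sorted(
        r["patient_id"] for r in rows
        if CROSSWALK.get((r["system"], r["code"])) == concept
    )


naive = naive_query(records, "ICD-10-CM", "E11.9")
normalized = normalized_query(records, "type_2_diabetes")
print("single-system query:", naive)
print("normalized query:", normalized)
```

In this toy data the single-system query finds only one of the three patients whose records point to the same condition, which is the kind of silent undercount described above.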

IMO: How is data normalized in healthcare?

GA: All of the sources of data in the healthcare ecosystem represent their information differently – from pharmacies to payers to labs. An effective normalization process takes into consideration the nuances of those sources; leverages a normalization application specific to that content and domain; and outputs data in a unified structure. An effective strategy also leverages clinical terminologies in those outputs, so the data is represented with more detail than the least common denominator of a standardized code system.
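
One way to picture that process is a set of source-specific normalizers that all emit the same output shape. The sketch below is an assumption-laden illustration, not a description of any particular product: the feed layouts, field names, `NormalizedRecord` class, and the small abbreviation table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedRecord:
    """Unified output structure shared by every source (hypothetical shape)."""
    patient_id: str
    domain: str   # e.g. "medication" or "lab"
    term: str     # unified term; placeholder strings in this sketch
    source: str


def normalize_pharmacy(row):
    # Assumed pharmacy feed: drug names arrive upper-cased and padded.
    term = row["drug_name"].strip().lower()
    return NormalizedRecord(row["pt"], "medication", term, "pharmacy")


def normalize_lab(row):
    # Assumed lab feed: test names use local abbreviations.
    local_to_term = {"HBA1C": "hemoglobin a1c", "GLU": "glucose"}
    name = row["test"].strip().upper()
    term = local_to_term.get(name, name.lower())
    return NormalizedRecord(row["patient"], "lab", term, "lab")


# One normalizer per source, so each can handle that source's quirks.
NORMALIZERS = {"pharmacy": normalize_pharmacy, "lab": normalize_lab}


def normalize(source, row):
    """Dispatch to the normalizer built for this source; refuse unknown sources."""
    if source not in NORMALIZERS:
        raise ValueError(f"no normalizer registered for source {source!r}")
    return NORMALIZERS[source](row)


unified = [
    normalize("pharmacy", {"pt": "p1", "drug_name": "  METFORMIN "}),
    normalize("lab", {"patient": "p1", "test": "hba1c"}),
]
for record in unified:
    print(record)
```

Keeping the per-source logic separate while sharing one output type means downstream queries only ever deal with a single structure.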

IMO: What are some challenges of data normalization unique to healthcare?

GA: Much of it comes down to the initial data entry. When information isn’t entered in a complete, specific way at the point of care, it can be difficult to retroactively add the needed details to a specific record. And patients often receive care from providers at different locations, over extended periods of time, and within multidisciplinary teams of specialists. Varying data formats and idiosyncrasies in vocabulary, terminology, and abbreviations at different institutions and among individual clinicians also contribute to a data pool rife with inconsistencies. Such differences must be transformed and translated into a unified clinical terminology that is matched and mapped to the appropriate standardized codes. For instance, if a CPT® code is captured in a patient record but the accompanying text description describes a completely different procedure, which do you trust?
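
A code and description that disagree can at least be detected and routed for review instead of being silently trusted. The following sketch assumes a tiny hypothetical catalog; the `PROC-001` style codes are placeholders rather than real CPT codes, and the word-overlap comparison is a deliberately crude stand-in for real terminology matching.

```python
import re

# Hypothetical procedure catalog; codes are placeholders, not real CPT codes.
CATALOG = {
    "PROC-001": "knee arthroscopy",
    "PROC-002": "appendectomy",
}


def _words(text):
    """Lower-case a description and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def check_procedure(code, description):
    """Return 'consistent', 'conflict', or 'unknown_code' for a code/text pair."""
    expected = CATALOG.get(code)
    if expected is None:
        return "unknown_code"
    # Consistent only if every word of the catalog term appears in the free text.
    return "consistent" if _words(expected) <= _words(description) else "conflict"


print(check_procedure("PROC-001", "Arthroscopy, knee, left"))
print(check_procedure("PROC-001", "Appendectomy"))
print(check_procedure("PROC-999", "Knee arthroscopy"))
```

The sketch does not answer which value to trust. It only surfaces the conflict so that someone with clinical context can decide.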

IMO: How can data normalization fail in healthcare?

GA: Data normalization can fail when the initial information is not complete enough and cannot be translated into a common representation. Additionally, building natural language processing (NLP) engines for normalization purposes while lacking a sufficient understanding of how clinicians use data – and without a team of clinical informaticists to refine the outputs – can easily introduce errors. In this case, matching data to a best-fit standardized term may assume information, correctly or incorrectly, that the healthcare provider did not explicitly state. This, in turn, can lead to downstream decisions about patient care being made based on faulty or inaccurate information.

We also see normalization engines built without a deep understanding of healthcare that repurpose tech from other fields, leading to dangerous results. For example, with these engines the number of patients diagnosed with square decimeter (abbreviated dm2) instead of diabetes mellitus type 2 is staggering – and that happens just because the output domains of the normalization engine weren’t properly bounded for a DM2 input.
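
Bounding the output domain can be sketched in a few lines of Python. The abbreviation table, domain names, and `expand` function here are hypothetical and exist only to show the idea: the caller states which kinds of output are acceptable for a given field, and anything outside those domains is never returned.

```python
# Toy abbreviation table: each expansion is tagged with the domain it belongs to.
EXPANSIONS = {
    "dm2": [
        {"domain": "diagnosis", "term": "diabetes mellitus type 2"},
        {"domain": "unit_of_measure", "term": "square decimeter"},
    ],
}


def expand(abbreviation, allowed_domains):
    """Return expansions of an abbreviation, restricted to the allowed domains."""
    candidates = EXPANSIONS.get(abbreviation.strip().lower(), [])
    return [c["term"] for c in candidates if c["domain"] in allowed_domains]


# Without bounding, a diagnosis field can receive a unit of measure.
unbounded = expand("DM2", {"diagnosis", "unit_of_measure"})
# Bounded to diagnoses, only the clinically sensible expansion remains.
bounded = expand("DM2", {"diagnosis"})
print(unbounded)
print(bounded)
```

An input with no expansion in the allowed domains returns an empty list, which is safer to handle downstream than a confident but wrong match.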

IMO: What does the future of data normalization look like?

GA: A large part of the future of data normalization in healthcare will come from capitalizing on technological advances, like NLP. Since many providers document care in sections of the electronic health record (EHR) that are difficult to codify – particularly progress notes – using a tool like NLP allows data analysts to extract information from those unstructured documents. This effort is critical to the goal of full data normalization in healthcare and is one of the most active areas of development right now.


CPT is a registered trademark of the American Medical Association. All rights reserved.

SNOMED and SNOMED CT are registered trademarks of SNOMED International.
