Figure it out or fail: Extracting the value from unstructured data

Organizations may assume that more healthcare data means more value, but extracting that value is complex. Learn why.

In most industries other than healthcare, handling data is usually straightforward; continuous, discrete, and categorical values can be neatly organized in cells and readily loaded into spreadsheet tools that enable data analysis. However, healthcare data is significantly more complex than data from other fields, making it far more challenging to manipulate and manage.

For companies working with large volumes of healthcare data, figuring out how to extract its full value can be daunting. Unless a solution is implemented, this problem can ultimately cause businesses, especially startups, to fail.

Healthcare data is plentiful but not pretty

Whether you work for a health system looking to expand reporting capabilities, an organization that focuses on precision medicine, or a network of hospitals creating a health information exchange (HIE), it is imperative to recognize that healthcare data operates under a unique set of rules and regulations that must be strictly adhered to.

Healthcare data is typically categorized into two major types: 

  • Structured: Patient diagnoses, medications, immunization dates, allergies, and laboratory and test results
  • Unstructured: Clinical notes, treatment plans, imaging studies, and genomic information

Structured data is generally the more straightforward of the two to use and operationalize, but it often still requires standardization to fill in gaps. That's because as patient data is extracted from and exchanged among sites and systems, it can become incomplete and inconsistent, making it less useful for analytics. Unstructured data, which holds a wealth of information reflecting the care provider's original assessment, faces the same challenge, and then some.

Finding the value in unstructured data

With an estimated 70 to 80% of healthcare data being unstructured, unlocking its value takes time and effort. Basic tools, such as out-of-the-box natural language processing (NLP) solutions or free researcher-developed solutions, are not enough. More sophisticated methods are needed to turn this unstructured data from a mere aggregate into curated, usable, evidence-generating assets.

Typically, unstructured information is housed within electronic health records (EHRs) or extracted via an Extract, Transform, Load (ETL) process for secondary use. The challenge lies in pulling meaning from this data.

Imagine having to find a book in a library that doesn't follow the Dewey Decimal System. Finding it is doable but not simple, and searching through unstructured data poses much the same kind of challenge.

Unfortunately, extracting valuable information from healthcare data often requires manual intervention or interpretation, which can introduce subjective judgment or diverge from the care provider's original clinical intent. And unless you leverage both structured and unstructured data, the semantics, or original clinical intent, of the data will be lost in translation, preventing you from ever realizing its full value.

Regardless of the scenario, leveraging both structured and unstructured clinical data is a daunting process with no quick solution. Most organizations and their leaders underestimate both the complexity and the investment required to make their data usable at all. And at the end of the day, if you can't leverage your data, you can't monetize it.

Learn how IMO solutions can help make unstructured data usable while retaining its accuracy.
