Every day, a wealth of clinical patient information is documented in electronic health records (EHRs) – information that holds great promise for groundbreaking research. But thus far, that potential hasn’t been realized due to poor data quality caused by inconsistent data entry, the loss of essential detail, and systems that are built around data capture for billing.
In the white paper, High quality data enables medical research, these challenges – baked into the processes of electronic clinical documentation and data exchange – are explored alongside measures that can be taken to solve them. Strategies for data standardization and refinement are discussed to help structure EHR data and make it truly usable for a variety of purposes.
What follows is an excerpt from the white paper, developed by MIT Technology Review Insights in association with IMO.
Researchers were excited when the government committed to underwrite the cost of converting to EHRs, because the investment held the promise of access to an enormous amount of real-world data for clinical trials and research. This access had the potential to dramatically reduce the cost of medical research, expand its scope, and accelerate the achievement of results.
Unfortunately, we didn’t implement EHRs with standard terminology for describing patients, their treatment, and their outcomes. In addition, commercially available EHR systems are primarily designed to document a single patient encounter and bill an insurance company. And while EHR systems do support care to some extent, differences in how providers use them make it challenging to aggregate their data and use it for secondary purposes such as analytics and discovery.
Most data sources need to be cleaned before they are fit for analysis, but EHR data is particularly vulnerable to quality problems. Two of the most serious are inconsistent entry and loss of essential detail.
Inconsistent data entry makes for inconsistent data
Health records aren’t tidy like banking records. They collect a mixture of data types (lab results, clinicians’ notes, diagnostic images), and when EHRs try to replicate the organization of paper charts, information often ends up stored in a disorganized or disparate fashion. To add to the problem, different institutions and clinicians use their own preferred shorthand, acronyms, and documentation conventions, making it difficult to readily combine their data with data from other institutions, even when both are using the same software. Differing EHR platforms, and how individual providers use them, increase variability as well. And like other software systems, EHR platforms evolve with each new release.
Many aspects of medical data are intrinsically fuzzy, adds Gianna Zuccotti, MD, Vice President for Digital Health Transformation and Chief Medical Information Officer at Mass General Brigham. “Anything that’s non-numeric is subject to judgment,” she says. A patient’s record might note “uncontrolled” diabetes, but the meaning of that word varies by clinician: it might be based not only on blood tests but on whether the patient complies with the clinician’s instructions, and whether treatment has improved their test results. A clinical trial recruiting “uncontrolled” patients would have to dig deeper to identify those who fit its own definition.
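To make the shorthand problem concrete, here is a minimal Python sketch of what pooling data from two sites involves. Everything in it is an assumption for illustration: the site names, the shorthand entries, and the functions `normalize` and `pool` are hypothetical and do not come from the white paper or from any real terminology standard.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shorthand tables. The entries are illustrative only and are
# not drawn from any real institution or standard terminology.
SITE_SHORTHAND: Dict[str, Dict[str, str]] = {
    "site_a": {"htn": "hypertension", "dm2": "type 2 diabetes mellitus"},
    "site_b": {"high bp": "hypertension", "t2dm": "type 2 diabetes mellitus"},
}


def normalize(site: str, raw_term: str) -> Optional[str]:
    """Map one site's shorthand to a shared term, or None if it is unmapped."""
    key = raw_term.strip().lower()
    return SITE_SHORTHAND.get(site, {}).get(key)


def pool(
    records: List[Tuple[str, str]]
) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Count shared terms across sites and collect entries that need review."""
    counts: Dict[str, int] = {}
    unmapped: List[Tuple[str, str]] = []
    for site, raw_term in records:
        term = normalize(site, raw_term)
        if term is None:
            # The same abbreviation may be unknown at a different site.
            unmapped.append((site, raw_term))
        else:
            counts[term] = counts.get(term, 0) + 1
    return counts, unmapped


counts, unmapped = pool(
    [("site_a", "HTN"), ("site_b", "High BP"), ("site_b", "T2DM"), ("site_a", "t2dm")]
)
```

In this sketch, the abbreviation "t2dm" is recognized at one site but lands in the review list at the other, which is the kind of inconsistency that makes combined data hard to use without a shared vocabulary.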
Losing details along the way
Even if all clinicians enter all data for every patient encounter with perfect consistency, details are lost when EHR information travels elsewhere—to generate a bill, to keep the public health department abreast of disease trends, or to be added to disease-specific registries maintained by professional medical associations. Those use cases demand that the data be put into tidy boxes, even if it won’t all fit. And researchers often tap those data sources, even knowing their imperfections, because the data has already been cleaned and standardized.
When EHR data is used for billing, for example, detail that is not necessary to justify the payment is stripped out. The provider’s billing office maps the EHR data onto a set of numerical codes that explain their charges and determine how much the provider will be paid. This may omit, however, additional detail about the particular patient’s condition or other apparently unrelated concerns the patient raises in the same visit. The payer data may show that a lab test has been done, so the lab can collect its payment, but it may not connect that fact to the test result. So while large databases of insurance claims are frequently used for research, this loss of detail limits the types of questions they can be used to answer.
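The loss of detail described above can be sketched as a lossy mapping from an encounter to a claim. This is an illustration under stated assumptions, not a description of any real billing system: the `Encounter` fields, the `to_claim` function, and the codes in `BILLING_CODES` are all made up and are not real billing codes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Encounter:
    """A simplified, hypothetical view of one documented patient visit."""
    patient_id: str
    diagnosis: str
    clinician_note: str
    lab_test: str
    lab_result: str


# Made-up placeholder codes; these are not real diagnosis or procedure codes.
BILLING_CODES: Dict[str, str] = {
    "type 2 diabetes mellitus": "DX-001",
    "hba1c panel": "LAB-100",
}


def to_claim(encounter: Encounter) -> Dict[str, str]:
    """Keep only what justifies payment; the note and lab result are dropped."""
    return {
        "patient_id": encounter.patient_id,
        "diagnosis_code": BILLING_CODES[encounter.diagnosis],
        "procedure_code": BILLING_CODES[encounter.lab_test],
    }


# Illustrative values only, not real patient data.
visit = Encounter(
    patient_id="p1",
    diagnosis="type 2 diabetes mellitus",
    clinician_note="Patient also reports poor sleep.",
    lab_test="hba1c panel",
    lab_result="8.9 %",
)
claim = to_claim(visit)
```

The resulting claim records that a lab test was performed but carries neither its result nor the clinician's note, so a researcher working only from claims cannot ask questions that depend on either.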
Similarly, a cardiovascular disease registry might ask a provider to supply information directly relevant to its interests—for example, data on the performance of a certain type of anti-clotting drug—but doesn’t gather enough information to answer questions about cardiovascular disease’s relationship to other diseases or specific patient characteristics. “If you could intermingle that registry with, say, a hypertension registry and a diabetes registry, you could see that in patients with a certain constellation of issues, you should use this anticoagulant and not that one,” says John Lee, MD, an emergency physician and clinical informaticist. But each registry usually stands alone.
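The kind of intermingling Lee describes can be sketched as a join across registries. The sketch below assumes, purely for illustration, that all three registries share a common patient identifier, which real registries often do not; the registry contents, field names, and the `link_registries` function are hypothetical.

```python
from typing import Any, Dict

Registry = Dict[str, Dict[str, Any]]

# Hypothetical registries keyed by an assumed shared patient identifier.
cardio_registry: Registry = {
    "p1": {"anticoagulant": "drug_a"},
    "p2": {"anticoagulant": "drug_b"},
    "p3": {"anticoagulant": "drug_a"},
}
hypertension_registry: Registry = {
    "p1": {"on_treatment": True},
    "p3": {"on_treatment": False},
}
diabetes_registry: Registry = {
    "p1": {"diabetes_type": 2},
    "p2": {"diabetes_type": 1},
}


def link_registries(**registries: Registry) -> Dict[str, Dict[str, Dict[str, Any]]]:
    """Return combined records for patients who appear in every registry."""
    if not registries:
        return {}
    shared_ids = set.intersection(*(set(reg) for reg in registries.values()))
    return {
        patient_id: {name: reg[patient_id] for name, reg in registries.items()}
        for patient_id in sorted(shared_ids)
    }


linked = link_registries(
    cardio=cardio_registry,
    hypertension=hypertension_registry,
    diabetes=diabetes_registry,
)
```

Only patients present in all three registries survive the join, and each linked record then carries the "constellation of issues" alongside the anticoagulant used. In practice, matching patients across standalone registries is itself a hard problem that this sketch sidesteps.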