Observational Health Data Sciences and Informatics (OHDSI) is a multi-stakeholder, interdisciplinary collaborative of professionals who participate in collective research, and its annual symposium brings that community together. OHDSI maintains an international network of databases dedicated to the secondary, or observational, use of health data for medical decision-making informed by large-scale analysis.
At OHDSI’s recent October symposium, methodology to ensure confidence in the evidence produced by observational research was a primary theme. Both the Food and Drug Administration (FDA) and the European Medicines Agency (EMA) use data generated through observational research to better understand the uses, safety, and efficacy of medicines.
Challenges to ensuring data quality in healthcare
Reproducibility and data quality are critical to research informed by secondary, or observational, data. Reproducibility is the ability of independent researchers to obtain the same findings when applying the same design and operational choices to the same data source. It contributes to the ultimate quality of healthcare data, which is key when mapping primary data to the OHDSI Observational Medical Outcomes Partnership (OMOP) common data model (CDM) for analysis. In short, confidence in real-world evidence relies on confidence in the data itself.
But confidence in data isn’t always a given. Common issues in observational research that can lead to conflicting results include confounding, publication bias, and p-hacking. Confounding occurs when what appears to be a causal relationship between a treatment and an outcome is actually driven by a third variable that is not accounted for. Publication bias is the tendency to publish only results that are statistically or clinically significant. P-hacking occurs when researchers select data or alter an analysis until they obtain the desired result.
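Confounding is easy to see in a toy simulation. In this hypothetical sketch (not from any OHDSI study), age drives both treatment assignment and the outcome, while the treatment itself has no causal effect at all; a naive comparison of outcome rates still shows a large difference:

```python
import random

random.seed(0)

# Hypothetical simulation: age (the confounder) drives both treatment
# assignment and the outcome; the treatment has NO causal effect.
n = 100_000
outcomes_treated, outcomes_untreated = [], []
for _ in range(n):
    age = random.random()              # confounder, uniform on [0, 1)
    treated = random.random() < age    # older patients are treated more often
    outcome = random.random() < age    # older patients have worse outcomes
    (outcomes_treated if treated else outcomes_untreated).append(outcome)

rate_treated = sum(outcomes_treated) / len(outcomes_treated)
rate_untreated = sum(outcomes_untreated) / len(outcomes_untreated)
# The naive comparison suggests the treatment harms patients, even though
# it does nothing: the entire difference is confounding by age.
print(f"treated: {rate_treated:.2f}, untreated: {rate_untreated:.2f}")
```

Adjusting for age (for example, by stratification or matching) would make the spurious difference disappear, which is exactly why unmeasured confounders are so dangerous.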
OHDSI’s role in the solution
To address these issues, OHDSI launched the Large-Scale Evidence Generation and Evaluation across a Network of Databases (LEGEND) initiative, which was designed to generate evidence from observational health data. LEGEND describes best practices, including the application of a systematic causal-effect estimation procedure that characterizes the direction and strength of the relationship between a treatment and an outcome. By defining control questions with known answers and running the analysis locally at multiple sites for comparison, researchers can estimate systematic error and correct for data biases.
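The control-question idea can be sketched in a few lines. In this hypothetical example (the numbers and the normal-error model are illustrative, not taken from any LEGEND study), each negative control is a question whose true effect is known to be null, so the spread of estimates around zero measures systematic error, and a new estimate is judged against that fitted error distribution rather than a perfect null:

```python
import statistics
from math import erf, sqrt

# Hypothetical estimates (log hazard ratios) for negative-control questions
# whose true effect is known to be null (true log HR = 0). Any spread
# around zero reflects systematic error in the data and design.
negative_control_log_hrs = [0.12, -0.05, 0.20, 0.08, 0.15, -0.02, 0.18, 0.10]

# Model the systematic error as a normal distribution fitted to the controls.
bias_mean = statistics.mean(negative_control_log_hrs)
bias_sd = statistics.stdev(negative_control_log_hrs)

def calibrated_p_value(log_hr: float, se: float) -> float:
    """Two-sided p-value computed against the fitted systematic-error
    distribution instead of a perfect null at zero."""
    total_sd = sqrt(bias_sd ** 2 + se ** 2)
    z = (log_hr - bias_mean) / total_sd
    # Standard-normal two-sided tail probability via erf.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# An estimate of log HR 0.18 (HR ~ 1.2) might look significant against a
# perfect null; against the error seen in the controls it may not be.
print(f"calibrated p = {calibrated_p_value(0.18, se=0.05):.3f}")
```

OHDSI's own tooling for this approach is considerably more sophisticated, but the principle is the same: let the control questions tell you how much error to expect before you trust a positive finding.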
While LEGEND addresses the validity of study design, addressing issues of data quality is another matter. EHR data is collected for clinical care and billing purposes, not research. It can be incomplete or inaccurate, lack validity or plausibility, be of insufficient granularity, or lack conformance. If study data is of poor quality, the integrity of study cohorts can be compromised. OHDSI assesses data quality through the Automated Characterization of Health Information at Large-scale Longitudinal Evidence System (ACHILLES) tool and the newly developed Data Quality Dashboard (DQD). These tools compute a set of summary statistics on characteristics of data quality that include conformance, completeness, and plausibility.
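To make those three categories concrete, here is a minimal sketch of such checks run against a toy OMOP-style person table. The field names follow the OMOP CDM, but the rows, thresholds, and check functions are illustrative inventions, not the actual ACHILLES or DQD implementation:

```python
# Toy OMOP-style person records; concept IDs 8507/8532 are the standard
# OMOP concepts for male/female. Rows and checks are illustrative only.
persons = [
    {"person_id": 1, "gender_concept_id": 8507, "year_of_birth": 1975},
    {"person_id": 2, "gender_concept_id": 8532, "year_of_birth": None},
    {"person_id": 3, "gender_concept_id": 0,    "year_of_birth": 1892},
]

VALID_GENDER_CONCEPTS = {8507, 8532}

def check_completeness(rows, field):
    """Completeness: fraction of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def check_conformance(rows):
    """Conformance: values drawn from the expected standard vocabulary."""
    return sum(r["gender_concept_id"] in VALID_GENDER_CONCEPTS for r in rows) / len(rows)

def check_plausibility(rows, lo=1900, hi=2025):
    """Plausibility: populated birth years fall in a believable range."""
    known = [r for r in rows if r["year_of_birth"] is not None]
    return sum(lo <= r["year_of_birth"] <= hi for r in known) / len(known)

for name, score in [("completeness", check_completeness(persons, "year_of_birth")),
                    ("conformance", check_conformance(persons)),
                    ("plausibility", check_plausibility(persons))]:
    print(f"{name}: {score:.0%}")
```

A dashboard like the DQD aggregates thousands of such checks across all CDM tables and compares each score against a pass/fail threshold, giving data holders a concrete picture of where their source data falls short.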
Data that has been normalized will be more complete and accurate, an improvement in source data quality. Tools that assess data quality and identify potential issues are essential to ensuring confidence in evidence generated from observational research, but beginning with better-quality data has the potential to significantly improve the quality of the evidence itself.