Linking sequence and patient data for clinical research

In Part II of our four-part blog series about COVID-19 variants, we explained the importance of data harmonization, which must occur before sequencing, clinical, and epidemiologic data can be combined. In Part III, we look at how to maintain the link between patients and their sequencing data.
Published September 17, 2021
Written by IMO Health Staff

Data siloization challenges 

Researchers need access to detailed, patient-level information linked to variant sequencing data to determine the impact of a variant on patient care. Unfortunately, the process for sequencing SARS-CoV-2 variants in the US is not set up to collect such data. Highly detailed patient information is often siloed within systems, such as the electronic health record (EHR) or laboratory information management system (LIMS), that can be hard to access. In addition, many healthcare systems are reluctant to share identifiable patient data with public health departments due to privacy concerns. As a result, by the time a lab specimen reaches a health department, critical patient information may be missing, which severely limits the usefulness of this data for clinical research.

Maintaining the link

Instead of relying on data from public health departments, researchers need a protected database or registry where they can access identifiable patient data. This could be similar to biobanking registries, which store specimen-derived data alongside patient information. From the registry, this data could then be moved into a secure research environment and anonymized to protect patient privacy.
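To make "maintaining the link" more concrete, here is a minimal Python sketch of one common approach, keyed pseudonymization, in which direct identifiers are replaced with a stable study ID before data leaves the registry. This is an illustrative assumption, not a description of any particular registry: the secret key, field names, and records are all hypothetical, and real de-identification involves far more than dropping three fields. Because the same patient always yields the same study ID, clinical and sequencing records can still be joined in the research environment.

```python
import hashlib
import hmac

# Hypothetical secret held only by the registry; never exported with the data.
REGISTRY_KEY = b"replace-with-a-securely-stored-secret"

# Fields treated as direct identifiers in this sketch (an assumption, not a
# complete de-identification standard such as HIPAA Safe Harbor).
DIRECT_IDENTIFIERS = {"mrn", "name", "date_of_birth"}


def pseudonym(mrn: str, key: bytes = REGISTRY_KEY) -> str:
    """Derive a stable study ID from a medical record number via HMAC-SHA256."""
    return hmac.new(key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def export_for_research(record: dict) -> dict:
    """Copy a registry record, swapping direct identifiers for a pseudonym."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["study_id"] = pseudonym(record["mrn"])
    return out


# Hypothetical registry rows: clinical data and sequencing results both keyed by MRN.
clinical = [{"mrn": "A123", "name": "Jane Doe", "date_of_birth": "1970-01-01",
             "outcome": "hospitalized"}]
sequences = [{"mrn": "A123", "pango_lineage": "B.1.617.2"}]

research_clinical = [export_for_research(r) for r in clinical]
research_sequences = [export_for_research(r) for r in sequences]

# The link survives: rows for the same patient share the same study_id.
linked = {r["study_id"]: dict(r) for r in research_clinical}
for s in research_sequences:
    linked.setdefault(s["study_id"], {}).update(s)
print(list(linked.values()))
```

Using a keyed hash rather than a plain hash means someone outside the registry cannot regenerate study IDs from known record numbers without the key.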

However, storing sequencing data within such a registry could also pose problems. SARS-CoV-2 sequences can be documented in a number of ways, ranging from the specific sequence of nucleotides to more general labels such as the World Health Organization's Greek letter labeling system. Systems will need to support a high degree of granularity so that data can be stored at the appropriate level of specificity and linked to clinical outcomes. This could be done through terminological harmonization (see Part II), which maps the various elements of variant sequence data to a common term or concept.
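The sketch below shows what terminological harmonization might look like in code, under simplifying assumptions: a small, hypothetical concept table in which each concept records its broader parent, plus a synonym map that sends labels from different systems (WHO Greek letter labels and Pango lineage names) to a single concept. The concept IDs are invented for illustration and are not IMO Health or standard terminology codes. Keeping the parent links means a result stored at the most specific level, such as the Delta sublineage AY.4, can still be rolled up to the broader WHO label when needed.

```python
from typing import Dict, List, Optional

# Hypothetical concept table: each concept points to its broader parent concept.
# IDs are invented for illustration, not real terminology codes.
CONCEPTS: Dict[str, Dict[str, Optional[str]]] = {
    "VOC-ALPHA": {"display": "Alpha (WHO label)", "parent": None},
    "LIN-B.1.1.7": {"display": "Pango lineage B.1.1.7", "parent": "VOC-ALPHA"},
    "VOC-DELTA": {"display": "Delta (WHO label)", "parent": None},
    "LIN-B.1.617.2": {"display": "Pango lineage B.1.617.2", "parent": "VOC-DELTA"},
    "LIN-AY.4": {"display": "Pango lineage AY.4", "parent": "LIN-B.1.617.2"},
}

# Labels from different naming systems, normalized to lowercase, mapped to one concept.
SYNONYMS: Dict[str, str] = {
    "alpha": "VOC-ALPHA",
    "b.1.1.7": "LIN-B.1.1.7",
    "delta": "VOC-DELTA",
    "b.1.617.2": "LIN-B.1.617.2",
    "ay.4": "LIN-AY.4",
}


def to_concept(label: str) -> Optional[str]:
    """Map a free-text variant label to a common concept ID, if known."""
    return SYNONYMS.get(label.strip().lower())


def lineage_path(concept_id: str) -> List[str]:
    """Walk from a concept up to its broadest ancestor."""
    path: List[str] = []
    current: Optional[str] = concept_id
    while current is not None:
        path.append(current)
        current = CONCEPTS[current]["parent"]
    return path


for raw in ["Delta", " B.1.617.2 ", "AY.4", "Alpha"]:
    concept = to_concept(raw)
    print(repr(raw), "->", lineage_path(concept) if concept else "unmapped")
```

Storing the most specific concept and deriving broader ones on demand avoids losing detail that would be needed later to link a sublineage to clinical outcomes.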

Terminological solutions for unlocking insights

Tools built around this common terminology could then be used to unlock critical insights. For example, highly granular value sets could be used to identify specific patient cohorts based on variant sequence and clinical outcome. Data could also be stratified by patient characteristics to give researchers a clearer picture of how variants spread across, and affect, specific populations.
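As a rough illustration of value-set-driven cohort building, the following Python sketch expands a hypothetical variant concept into a value set containing all of its descendants, selects patients with that variant and a given outcome, and then stratifies the cohort by age group. The concept IDs, parent links, and patient records are made up for demonstration and do not represent real data.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical child -> parent links between variant concepts.
PARENT: Dict[str, str] = {
    "LIN-AY.4": "LIN-B.1.617.2",
    "LIN-B.1.617.2": "VOC-DELTA",
    "LIN-B.1.1.7": "VOC-ALPHA",
}


def expand_value_set(root: str) -> Set[str]:
    """Return the root concept plus every concept that descends from it."""
    members = {root}
    changed = True
    while changed:  # repeat until no new descendants are found
        changed = False
        for child, parent in PARENT.items():
            if parent in members and child not in members:
                members.add(child)
                changed = True
    return members


# Made-up, already-pseudonymized records for illustration only.
patients = [
    {"study_id": "p1", "variant": "LIN-AY.4", "outcome": "hospitalized", "age_group": "65+"},
    {"study_id": "p2", "variant": "LIN-B.1.617.2", "outcome": "hospitalized", "age_group": "18-64"},
    {"study_id": "p3", "variant": "LIN-B.1.1.7", "outcome": "hospitalized", "age_group": "65+"},
    {"study_id": "p4", "variant": "VOC-DELTA", "outcome": "outpatient", "age_group": "65+"},
    {"study_id": "p5", "variant": "LIN-AY.4", "outcome": "hospitalized", "age_group": "65+"},
]

delta = expand_value_set("VOC-DELTA")
cohort = [p for p in patients if p["variant"] in delta and p["outcome"] == "hospitalized"]
by_age = Counter(p["age_group"] for p in cohort)
print(sorted(delta))
print([p["study_id"] for p in cohort], dict(by_age))
```

Here the Delta value set picks up records coded at the WHO label, the B.1.617.2 lineage, or the AY.4 sublineage, so the cohort is p1, p2, and p5, split as two patients aged 65+ and one aged 18-64.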

Conclusion

The current system for sequencing SARS-CoV-2 and reporting variants provides only a high-level overview for public health monitoring, resulting in data that lacks the patient-specific granularity needed for research.

Researchers need comprehensive patient-level data. Capturing SARS-CoV-2 variant data alongside patient data in a secure registry could give them a way to access this information. However, the benefits of linking SARS-CoV-2 variant data with clinical information are not limited to research. In Part IV of our series, we will look at how to use SARS-CoV-2 variant information for better patient care and management.
