What’s in a variant? Part III – Linking sequence and patient data for clinical research

In part two of our four-part blog series about COVID-19 variants, we explained the importance of data harmonization, which must occur before sequencing, clinical, and epidemiologic data can be combined. In part three, we look at how to maintain the link between patients and their sequence.

Data siloization challenges 

Researchers need access to detailed, patient-level information linked to variant sequencing data to determine the impact of a variant on patient care. Unfortunately, the process for sequencing SARS-CoV-2 variants in the US is not set up to collect such data. Highly detailed patient information is often siloed within systems that can be hard to access, such as the electronic health record (EHR) or laboratory information management system (LIMS). In addition, many healthcare systems are reluctant to share identifiable patient data with public health departments due to privacy concerns. As a result, by the time the lab specimen reaches a health department, critical patient information may be missing, which severely limits the usefulness of the data for clinical research.

Maintaining the link

Instead of relying on the data from public health departments, researchers need a protected database or registry where they can access identifiable patient data. This could be similar to biobanking registries, which store specimen-derived data alongside patient information. Once in the registry, this data can then be moved into a secure research environment and anonymized to protect patient privacy.
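As a rough illustration of the last step, the sketch below shows one way a registry record might be prepared for a research environment while keeping the link between a patient and their sequence. It is a minimal sketch under stated assumptions, not a description of any existing registry: the field names (`patient_id`, `name`, `date_of_birth`), the `to_research_record` helper, and the choice of a keyed hash are all hypothetical. Note that replacing an identifier with a keyed hash is pseudonymization rather than full anonymization, so a real system would need further safeguards.

```python
import hashlib
import hmac

# Hypothetical set of direct identifiers to strip before research use.
DIRECT_IDENTIFIERS = {"patient_id", "name", "date_of_birth"}


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable research ID from a patient ID using HMAC-SHA256.

    The same patient always maps to the same research ID, so sequence and
    clinical records stay linked, but the original ID is not exposed.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def to_research_record(registry_record: dict, secret_key: bytes) -> dict:
    """Copy a registry record, dropping direct identifiers and adding a research ID."""
    research = {
        field: value
        for field, value in registry_record.items()
        if field not in DIRECT_IDENTIFIERS
    }
    research["research_id"] = pseudonymize(registry_record["patient_id"], secret_key)
    return research
```

In this sketch the secret key would stay inside the protected registry, so that only the registry can reproduce the mapping from patients to research IDs.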

However, storing sequencing data within such a registry could also become problematic. SARS-CoV-2 sequences can be documented in a number of ways, from the specific sequence of base pairs to more general terms like the World Health Organization’s Greek letter labeling system. Systems will need to support a high degree of granularity so that data is stored at the level of specificity needed to link it to clinical outcomes. This could be done using terminological harmonization (see Part II), which would map various elements of variant sequence data to a common term or concept.
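To make the mapping idea concrete, here is a minimal sketch of how differently documented variant terms might be resolved to a common concept while the original, more granular term is preserved. The `VariantConcept` structure and `harmonize` function are hypothetical names chosen for illustration, and the lookup table covers only a handful of well-known Pango lineages and their WHO labels; a production terminology would be far larger and would also handle sublineages and mutation-level detail.

```python
from dataclasses import dataclass
from typing import Optional

# Small illustrative table: Pango lineage -> WHO Greek-letter label.
PANGO_TO_WHO = {
    "B.1.1.7": "Alpha",
    "B.1.351": "Beta",
    "P.1": "Gamma",
    "B.1.617.2": "Delta",
    "B.1.1.529": "Omicron",
}
# Case-insensitive lookup for terms already recorded as WHO labels.
WHO_LABELS = {label.lower(): label for label in PANGO_TO_WHO.values()}


@dataclass(frozen=True)
class VariantConcept:
    source_term: str              # the term exactly as it was documented
    pango_lineage: Optional[str]  # most granular level known, if any
    who_label: Optional[str]      # common concept used for grouping


def harmonize(term: str) -> VariantConcept:
    """Map a documented variant term to a common concept, keeping the original."""
    cleaned = term.strip()
    lineage = cleaned.upper()
    if lineage in PANGO_TO_WHO:
        return VariantConcept(term, lineage, PANGO_TO_WHO[lineage])
    who_label = WHO_LABELS.get(cleaned.lower())
    if who_label is not None:
        # Only the general label was recorded; the lineage stays unknown.
        return VariantConcept(term, None, who_label)
    # Unrecognized terms are kept unmapped rather than guessed at.
    return VariantConcept(term, None, None)
```

Keeping `source_term` and `pango_lineage` alongside the common label is the design choice that matters here: grouping can happen at the general level without discarding the specificity needed to link a sequence to clinical outcomes.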

Terminological solutions for unlocking insights

Tools built around this common terminology could then be used to unlock critical insights. For example, highly granular value sets could be used to identify specific patient cohorts based on variant sequence and clinical outcome. Data could also be stratified by patient characteristics to give researchers a clearer picture of how variants spread across, and affect, particular populations.
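The cohort and stratification ideas can be sketched in a few lines. This is an assumption-laden example rather than a real tool: the record fields (`who_label`, `age_group`, `hospitalized`), the `select_cohort` and `stratify` helpers, and the sample records are all invented for illustration, and the value set here is simply a set of harmonized variant labels.

```python
from collections import Counter

# Made-up, de-identified records for illustration only.
EXAMPLE_RECORDS = [
    {"research_id": "a1", "who_label": "Delta", "age_group": "18-49", "hospitalized": True},
    {"research_id": "b2", "who_label": "Delta", "age_group": "65+", "hospitalized": True},
    {"research_id": "c3", "who_label": "Alpha", "age_group": "65+", "hospitalized": True},
    {"research_id": "d4", "who_label": "Delta", "age_group": "65+", "hospitalized": False},
]


def select_cohort(records, variant_value_set, outcome_field):
    """Return records whose variant is in the value set and whose outcome flag is true."""
    return [
        record
        for record in records
        if record["who_label"] in variant_value_set and record[outcome_field]
    ]


def stratify(cohort, characteristic):
    """Count cohort members by a patient characteristic such as age group."""
    return Counter(record[characteristic] for record in cohort)


if __name__ == "__main__":
    delta_value_set = {"Delta"}  # hypothetical value set
    cohort = select_cohort(EXAMPLE_RECORDS, delta_value_set, "hospitalized")
    print(stratify(cohort, "age_group"))
```

A more granular value set, for example one defined on specific lineages rather than WHO labels, would narrow the cohort without changing the selection logic.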


The current system for sequencing and reporting SARS-CoV-2 sequences provides only a high-level overview for public health monitoring – resulting in data without the patient-specific granularity needed for research.

Researchers need comprehensive patient-level data. Capturing SARS-CoV-2 variant data alongside patient data in a secure registry could provide them with a way to access this information. However, the benefits of linking SARS-CoV-2 variant data with clinical information are not limited to research. In Part IV of our series, we will look at how to use SARS-CoV-2 variant information for better patient care and management.
