What’s in a variant? Part III – Linking sequence and patient data for clinical research

Data siloization challenges

Researchers need access to detailed, patient-level information linked to variant sequencing data to determine the impact of a variant on patient care. Unfortunately, the process for sequencing SARS-CoV-2 variants in the US is not set up to collect such data. Highly detailed patient information is often siloed within systems – such as the electronic health record (EHR) or laboratory information management system (LIMS) – that can be hard to access. In addition, many healthcare systems are reluctant to share identifiable patient data with public health departments due to privacy concerns. As a result, by the time the lab specimen reaches a health department, critical patient information may be missing, which severely limits the usefulness of the data for clinical research.

In the US, the process for sequencing a SARS-CoV-2 strain starts when a patient gets a COVID-19 test that uses a lab technique called polymerase chain reaction (PCR). If the result is positive, the test sample can then be sent for sequencing. Depending on where the test was given – for example, at the provider’s office or at a mass testing site – this sample could be sent to a university lab or a local or state public health department. Data about the patient – such as age, gender, and residential zip code – is submitted along with the sample. The sequence is then published in a publicly accessible registry, such as GISAID, and reported to the Centers for Disease Control and Prevention (CDC). Health departments use the de-identified data to monitor the spread of variants across the population.

Maintaining the link

Instead of relying on the data from public health departments, researchers need a protected database or registry where they can access identifiable patient data. This could be similar to biobanking registries, which store specimen-derived data alongside patient information. Once the data is in the registry, it can be moved into a secure research environment and anonymized to protect patient privacy.
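To make the registry idea concrete, here is a minimal Python sketch of the two steps described above: linking a sequencing result to the patient it came from, then stripping direct identifiers before the data reaches researchers. Everything in it is an assumption for illustration: the record fields, the `link_records` and `deidentify` helpers, and the salted-hash pseudonym are hypothetical, and the generalization shown (age bands, three-digit zip codes) is not a compliant de-identification method on its own.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Identifiable patient data as it might sit in an EHR (hypothetical fields)."""
    patient_id: str
    age: int
    zip_code: str
    hospitalized: bool


@dataclass(frozen=True)
class SequenceRecord:
    """Sequencing result that still carries the patient identifier."""
    specimen_id: str
    patient_id: str
    lineage: str


def link_records(patients, sequences):
    """Pair each sequence with its patient; sequences with no match are dropped."""
    by_id = {p.patient_id: p for p in patients}
    return [(s, by_id[s.patient_id]) for s in sequences if s.patient_id in by_id]


def deidentify(seq, patient, salt):
    """Build a research row: pseudonymous key, coarsened age and zip, no direct IDs."""
    digest = hashlib.sha256((salt + patient.patient_id).encode("utf-8")).hexdigest()
    decade = (patient.age // 10) * 10
    return {
        "pseudonym": digest[:12],
        "age_band": f"{decade}-{decade + 9}",
        "zip3": patient.zip_code[:3],
        "lineage": seq.lineage,
        "hospitalized": patient.hospitalized,
    }


# Made-up example records, not real data.
patients = [PatientRecord("MRN-001", 47, "60614", True)]
sequences = [
    SequenceRecord("SPEC-9", "MRN-001", "B.1.617.2"),
    SequenceRecord("SPEC-10", "MRN-404", "B.1.1.7"),  # no matching patient
]

linked = link_records(patients, sequences)
research_rows = [deidentify(s, p, salt="registry-secret") for s, p in linked]
```

The point of the sketch is the ordering: the link is made while identifiers are still available inside the registry, and only the linked, coarsened row leaves it.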

However, storing sequencing data within such a registry could also become problematic. SARS-CoV-2 sequences can be documented in a number of ways, from the specific sequence of base pairs to more general terms like the World Health Organization’s Greek letter labeling system. Systems will need to support a high degree of granularity in order to store data at the appropriate level of specificity so it can be linked to clinical outcomes. This could be done using terminological harmonization (see Part II), which would map the various elements of variant sequence data to a common term or concept.
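As a rough illustration of what such a mapping might look like, the Python sketch below normalizes a few ways a variant could be written down (a WHO Greek letter label or a Pango lineage name) into one common structure. The `harmonize_variant` function and its output fields are hypothetical, the lookup table covers only four well-known variants, and treating every `AY.` lineage as Delta is a deliberate simplification. A real terminology would be far larger and centrally maintained.

```python
from typing import Optional

# Pango lineage -> WHO label for four variants of concern.
WHO_LABEL_BY_LINEAGE = {
    "B.1.1.7": "Alpha",
    "B.1.351": "Beta",
    "P.1": "Gamma",
    "B.1.617.2": "Delta",
}

# Lowercased WHO labels, used to recognize label-only documentation.
KNOWN_WHO_LABELS = {label.lower() for label in WHO_LABEL_BY_LINEAGE.values()}


def harmonize_variant(raw: str) -> Optional[dict]:
    """Map a free-text variant description to a common concept (hypothetical shape).

    The result records the most specific level the source text supports, so a
    bare WHO label is not silently upgraded to a particular lineage.
    """
    text = raw.strip()
    upper = text.upper()
    lowered = text.lower()
    if lowered.endswith(" variant"):
        lowered = lowered[: -len(" variant")]

    if upper in WHO_LABEL_BY_LINEAGE:
        return {"who_label": WHO_LABEL_BY_LINEAGE[upper], "lineage": upper,
                "granularity": "lineage"}
    if upper.startswith("AY."):  # simplification: AY.* are Delta sublineages
        return {"who_label": "Delta", "lineage": upper, "granularity": "lineage"}
    if lowered in KNOWN_WHO_LABELS:
        return {"who_label": lowered.capitalize(), "lineage": None,
                "granularity": "who_label"}
    return None  # unrecognized; leave for manual review


examples = [harmonize_variant(s) for s in ("b.1.617.2", " Delta variant ", "AY.4")]
```

All three example inputs resolve to the same WHO label, while the `granularity` field keeps track of how specific the original documentation actually was.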

Terminological solutions for unlocking insights

Tools built around this common terminology could then be used to unlock critical insights. For example, highly granular value sets could be used to identify specific patient cohorts based on variant sequence and clinical outcome. Data could also be stratified based on certain patient characteristics to provide researchers with a clearer picture of how variants both spread across and impact certain populations.
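A small Python sketch can show how a value set might drive cohort selection and stratification. The value set contents, the record fields, the `select_cohort` and `stratify` helpers, and the sample rows are all invented for illustration. They are assumptions, not real data or an actual product API.

```python
from collections import Counter

# Hypothetical value set: lineages a study chooses to treat as "Delta".
DELTA_VALUE_SET = frozenset({"B.1.617.2", "AY.4"})

# Made-up, already de-identified research rows.
records = [
    {"lineage": "B.1.617.2", "outcome": "hospitalized", "age_band": "40-49"},
    {"lineage": "AY.4", "outcome": "hospitalized", "age_band": "60-69"},
    {"lineage": "AY.4", "outcome": "recovered_at_home", "age_band": "20-29"},
    {"lineage": "B.1.1.7", "outcome": "hospitalized", "age_band": "60-69"},
    {"lineage": "B.1.617.2", "outcome": "hospitalized", "age_band": "60-69"},
]


def select_cohort(rows, lineage_value_set, outcome):
    """Keep rows whose lineage is in the value set and whose outcome matches."""
    return [
        r for r in rows
        if r["lineage"] in lineage_value_set and r["outcome"] == outcome
    ]


def stratify(cohort, key):
    """Count cohort members per value of one patient characteristic."""
    return Counter(r[key] for r in cohort)


delta_hospitalized = select_cohort(records, DELTA_VALUE_SET, "hospitalized")
by_age_band = stratify(delta_hospitalized, "age_band")
```

Because the cohort is defined by a value set rather than a hard-coded string, adding a newly designated sublineage would mean updating the set, not rewriting the query.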

Conclusion

The current system for sequencing and reporting SARS-CoV-2 variants provides only a high-level overview for public health monitoring, so the resulting data lacks the patient-specific granularity needed for research.

Researchers need comprehensive patient-level data. Capturing SARS-CoV-2 variant data alongside patient data in a secure registry could provide them with a way to access this information. However, the benefits of linking SARS-CoV-2 variant data with clinical information are not limited to research. In Part IV of our series, we will look into how to use SARS-CoV-2 variant information for better patient care and management.

In case you missed the first two posts in our What’s in a variant series, catch up with:

What’s in a variant? Clinical documentation, epidemiology, and genomic sequencing

August 20, 2021

What’s in a variant? Part II – Making sense of data through harmonization

September 3, 2021
