How experts are tackling the tedium of data cleansing and preparation

Achieving data quality in healthcare is tricky. Learn how data scientists are streamlining data cleansing to free up time for more meaningful work.

A recent report [1] found that data preparation and cleansing consume nearly 40% of data scientists’ time, leaving less room for the more engaging and significant work of analysis and modeling. With so much time going to these laborious tasks, a shift is clearly needed: refine processes, focus individual and team efforts, and broaden the range of work data scientists can take on.

To better understand how healthcare and health IT experts are addressing this issue, we engaged with data science leaders in the field. We asked them to share how they are tackling data quality challenges or what goals they would turn their attention to if they had more time on their hands.

Curious about what they said? Below are a couple of excerpts from the insight brief.

The importance of data governance

“Ensuring accurate and reliable insights from data requires maintaining data cleanliness. Data teams can address data cleanliness through restructuring, intensive profiling, and automated data quality checks. They can also establish data governance policies and practices while providing training and support to data users.

The most impactful measure that our team implemented to increase data quality was the establishment of a Data Governance Committee (DGC). The DGC includes business-facing personnel from different departments who use the data. This program includes assessing the data for meaningfulness and usefulness; conducting regular quality assessments; implementing metrics to measure data quality; establishing processes to address quality issues; and fostering a culture of data quality throughout the organization. Prioritizing data quality and maintaining it through established processes allows our team to produce trustworthy and accurate insights from data, leading to better business outcomes and informed decision-making.”

– Syed Ferdous, Data Analyst at Summit Health
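
The quote above mentions automated data quality checks and metrics for measuring data quality, but it does not describe how Summit Health implements them. As one possible illustration, here is a minimal Python sketch that assumes patient records arrive as dictionaries. The field names (`patient_id`, `birth_date`), the sample rows, and the three metrics are hypothetical choices made for this example, not taken from the brief.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative only.
records = [
    {"patient_id": "P001", "birth_date": "1980-04-12"},
    {"patient_id": "P002", "birth_date": ""},            # missing value
    {"patient_id": "P002", "birth_date": ""},            # exact duplicate row
    {"patient_id": "P003", "birth_date": "2090-01-01"},  # date in the future
]


def completeness(rows, field):
    """Share of rows where `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if str(r.get(field, "")).strip())
    return filled / len(rows)


def duplicate_count(rows):
    """Number of rows that exactly repeat an earlier row."""
    seen = set()
    dupes = 0
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes


def invalid_birth_dates(rows, today):
    """Rows whose birth_date is present but unparseable or after `today`."""
    bad = []
    for r in rows:
        raw = r.get("birth_date", "").strip()
        if not raw:
            continue  # missing values are already counted by completeness()
        try:
            parsed = date.fromisoformat(raw)
        except ValueError:
            bad.append(r)
            continue
        if parsed > today:
            bad.append(r)
    return bad


def quality_report(rows, today):
    """Collect the individual checks into one summary of metrics."""
    return {
        "birth_date_completeness": completeness(rows, "birth_date"),
        "duplicate_rows": duplicate_count(rows),
        "invalid_birth_dates": len(invalid_birth_dates(rows, today)),
    }


# A fixed reference date keeps the result reproducible.
report = quality_report(records, today=date(2024, 1, 1))
print(report)
```

A report like this could be produced on a schedule and reviewed alongside the regular quality assessments the quote describes. Which fields and thresholds matter would depend on the organization's own governance policies.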

What data scientists wish people knew

“Data scientists love solving problems and creating insights, but this doesn’t happen in a vacuum. There is a misconception that data science can provide value in isolation and without context and collaboration. But we need to ask big questions! Iterate and challenge. Lay out the complex problems. Provide stretch goals and aspirations for the product, department, or organization and provide contextual support that converts inert data into meaningful insights when combined with data science’s thinking, techniques, and tooling.”

– Brenden McGlinchey, Information Insight Service Lead at Kenway Consulting

Ready to meet the rest of the experts? They appear in the full insight brief, “Moving beyond data cleansing: How data scientists are reclaiming their time.”

[1] No author given. 2022 State of Data Science. Anaconda. Accessed via:
