Best Practices for Transforming Real-World Data into a Quality Asset for Healthcare

Author: Aracelis Torres, PhD, MPH
February 2022

The healthcare industry has the potential to reap extraordinary benefits from real-world data (RWD), but transforming that data into a valuable, quality asset that can help clinicians improve care and lower costs has proven challenging. Harnessing the power of RWD will require industry-wide collaboration among key stakeholders such as payers, providers, researchers, and technology developers.

Published Paper: Best Practices in the Real-world Data Lifecycle

In a recent paper published in PLOS Digital Health, I collaborated with a number of experts from a diverse set of organizations across the healthcare ecosystem, including King’s College Hospital, AstraZeneca, Mayo Clinic, and the Harvard T. H. Chan School of Public Health. My fellow authors and I describe seven best practices for developing a data infrastructure to realize value from RWD and expand resulting capabilities.

Traditionally, RWD has been used primarily by life science companies and regulators to assess drug safety or therapeutic outcomes and to inform coverage and payment. However, with the widespread adoption of electronic health records (EHRs), the industry’s ability to capture RWD for a growing number of use cases has greatly expanded.

Today, providers, health systems, academic institutes, and other organizations use RWD in many ways, including artificial intelligence (AI)-assisted clinical decision making, clinical operations management, and population health. However, much of the RWD in EHRs remains untapped because it is unstructured and not easily usable or searchable through traditional means. As a result, this data must undergo a process of aggregation and enrichment that involves simplifying raw data into essential components (abstraction), as well as conversion into suitable formats (transformation) and standard terminologies (harmonization).
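
As a rough illustration only, the three steps named above (abstraction, transformation, harmonization) might look like the following Python sketch. The field names (`pid`, `dx`, `dx_date`), the local diagnosis labels, and the lookup table are all hypothetical and chosen purely for illustration; they are not taken from the published paper. A production pipeline would rely on a maintained terminology service rather than a hand-written mapping.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical site-specific diagnosis labels mapped to ICD-10 codes.
# Illustrative only: a real pipeline would query a maintained terminology service.
LOCAL_TO_ICD10 = {
    "DM2": "E11.9",  # type 2 diabetes mellitus without complications
    "HTN": "I10",    # essential (primary) hypertension
}


@dataclass
class Diagnosis:
    patient_id: str
    icd10: Optional[str]  # None when the local label has no known mapping
    recorded_on: date


def abstract(raw: dict) -> dict:
    """Abstraction: keep only the essential components of a raw record."""
    return {key: raw[key] for key in ("pid", "dx", "dx_date")}


def transform(rec: dict) -> dict:
    """Transformation: convert values into suitable, consistent formats."""
    return {
        "patient_id": str(rec["pid"]),
        "dx": rec["dx"].strip().upper(),
        "recorded_on": datetime.strptime(rec["dx_date"], "%m/%d/%Y").date(),
    }


def harmonize(rec: dict) -> Diagnosis:
    """Harmonization: translate the local label into a standard terminology."""
    return Diagnosis(
        patient_id=rec["patient_id"],
        icd10=LOCAL_TO_ICD10.get(rec["dx"]),
        recorded_on=rec["recorded_on"],
    )


# A made-up raw record, including a field that is irrelevant to the use case.
raw_record = {"pid": 1042, "dx": " htn ", "dx_date": "02/14/2022", "ui_theme": "dark"}
diagnosis = harmonize(transform(abstract(raw_record)))
print(diagnosis)
```

Returning `None` for an unmapped label, rather than guessing a code, keeps gaps visible so they can be reviewed by a domain expert.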

To gain the full value of RWD, healthcare organizations should consider the following seven best practices:

1. Compatibility with internationally recognized data standards enables data aggregation at scale: To enable RWD aggregation, EHR source data must comply with internationally recognized standards, such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) and International Classification of Diseases (ICD). However, competing EHR systems across the industry often use proprietary, vendor-specific data formats, resulting in limited interoperability. To drive compliance with internationally recognized data standards, commercial stakeholders must collaborate with policymakers, providers, and standards organizations to reach consensus on data standards to ensure future RWD sources are interoperable at inception.

2. Quality assurance must be considered in advance, and tailored for use-case: Data will always be a less-than-perfect representation of what actually took place, due to a number of factors including: imperfect translation of the data, errors in data capture, human mistakes in data entry, and incomplete documentation. These factors do not stop data from being able to generate meaningful insight, but they do mean that stakeholders must pay close attention to data quality assurance (QA). As the industry evolves, different QA approaches will emerge from dynamic consensus and gain validation through use and deployment. This will be supplemented by AI-driven approaches, reducing, but not eliminating, the need for domain expert oversight.

3. Incentivize detailed data entry to maximize value of source data: In all scenarios, the best time to ensure RWD value is upstream at the point of data entry. However, incentivization of high-quality data entry is challenging, whether through financial or nonfinancial means. Regardless of method, adopting suitable incentivization may be the most efficient way of adding value to detailed RWD datasets.

4. Deploy natural language processing to mobilize unstructured data sources: Approximately 80% of RWD is unstructured, taking the form of free text, and requires significant processing. To unlock unstructured data, the industry has relied on manual transcription onto case-report forms, a time-consuming and costly effort that is impractical for large-scale curation. This challenge can be surmounted by adopting AI-based natural language processing (NLP) tools that enable mass mining of unstructured text. Integrating NLP into the RWD lifecycle offers sustainable data enrichment.

5. Rapid-cycle and flexible analytics can be enabled by platform solutions: The enterprise data warehouse (EDW) has been the usual data storage model employed by provider networks and research groups. However, an EDW can be time-consuming to implement and inflexible once populated, and it risks excluding data that might later be found relevant. In contrast, a flexible data platform is capable of handling multiple, varied solutions.

6. Return value to patients through transparency, engagement, and a focus on data privacy: Electronic health records are co-created by patients, care providers, and provider organizations. Given the patient’s contribution, it is important that RWD use cases treat benefiting patients and preventing harm as key goals. To preserve trust in healthcare systems, it is critical that patients are informed of their rights pertaining to their own health data.

7. Diversity in real-world data reduces bias and maintains equity: A lack of diverse populations in clinical trials is a well-recognized problem in the healthcare industry, but RWD represents an opportunity to expand diversity. To improve the situation, stakeholders should prioritize information gathering, such as capturing demographic data in EHRs. Additionally, policymakers may consider establishing incentives for the creation of RWD infrastructure in underserved communities.

In the next decade, the industry will uncover new use cases for RWD, such as reinforcing the quality of pharmaceutical real-world evidence, expanding real-world evidence to new disease areas, and improving AI research quality for clinician- and patient-facing devices. To arrive at that point, however, the industry will need to follow the above best practices. 

Verana Health remains committed to these best practices and to our important role as a data steward to the more than 20,000 healthcare providers who participate in our exclusive real-world data network.