Obtaining causal estimates from observational data often requires individual and sequential adjustment for multiple sources of bias. Multi-bias modeling typically adjusts for uncontrolled confounding, information bias, and selection bias one at a time, in an order informed by the sequence in which the biases arose.
In research presented at the Joint Statistical Meetings (JSM) 2021 virtual conference held Aug. 8-12, Dr. Paul Brendel, the study’s lead author and senior quantitative scientist at Verana Health, introduced a new approach to bias adjustment that allows for simultaneous adjustment of multiple biases. The approach combines individual-level data with bias parameters to obtain imputed values, a regression weight, or both.
The researchers verified the validity of this method and explored the sensitivity of effect estimates to misspecified bias parameters through a study that used Monte Carlo simulations to generate two data sets of 100,000 rows each. One data set had stronger biasing paths and the other had weaker ones. The data were simulated so that the effect of interest had a true value of 2.00, and the goal was to recover this true effect from the biased data. To facilitate the study, the team built a directed acyclic graph (DAG) depicting a multi-bias scenario.
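To make the idea of recovering a known effect from biased data more concrete, here is a minimal Python sketch. It is not the authors' code or their DAG. It is a deliberately simplified, single-bias illustration (uncontrolled confounding only) in which a binary unmeasured confounder is imputed from bias parameters and then included in the outcome regression. The variable names, the data-generating values other than the true effect of 2.00 and the 100,000 rows, and the helper functions `expit`, `ols_coefs`, and `run_simulation` are all assumptions made for illustration.

```python
import numpy as np

TRUE_EFFECT = 2.00   # effect of exposure X on outcome Y, as quoted in the article
N_ROWS = 100_000     # rows per simulated data set, as quoted in the article


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def ols_coefs(y, *columns):
    """Ordinary least squares coefficients; index 0 is the intercept."""
    design = np.column_stack([np.ones_like(y), *columns])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def run_simulation(seed=0, p_u=0.5, a0=-0.5, a1=1.0, b0=0.0, b_u=1.0):
    """Hypothetical single-bias scenario: U confounds the X -> Y effect.

    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Data-generating process: U -> X, U -> Y, X -> Y.
    u = rng.binomial(1, p_u, N_ROWS)
    x = rng.binomial(1, expit(a0 + a1 * u))
    y = b0 + TRUE_EFFECT * x + b_u * u + rng.normal(0.0, 1.0, N_ROWS)

    # Biased analysis: U is treated as unmeasured and left out of the model.
    naive = ols_coefs(y, x)[1]

    # Bias parameters: coefficients of logit P(U = 1 | X, Y). Here they are
    # derived analytically from the simulation settings (unit error variance).
    # In a real analysis they would be assumptions drawn from validation
    # data, prior studies, or expert judgment.
    g0 = (np.log(p_u / (1.0 - p_u))
          + np.log((1.0 + np.exp(a0)) / (1.0 + np.exp(a0 + a1)))
          - b_u * b0 - b_u ** 2 / 2.0)
    g_x = a1 - b_u * TRUE_EFFECT
    g_y = b_u

    # Impute U for every row from the bias model, then adjust for it.
    u_imputed = rng.binomial(1, expit(g0 + g_x * x + g_y * y))
    adjusted = ols_coefs(y, x, u_imputed)[1]
    return naive, adjusted


naive_estimate, adjusted_estimate = run_simulation(seed=0)
print(f"naive: {naive_estimate:.3f}  adjusted: {adjusted_estimate:.3f}")
```

Under these assumed settings, the naive estimate should sit noticeably above 2.00 and the adjusted estimate should land close to it. Changing `g0`, `g_x`, or `g_y` away from their derived values is one simple way to see how sensitive the adjusted estimate is to misspecified bias parameters.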
Brendel and his co-authors – Dr. Aracelis Torres, vice president of quantitative sciences at Verana Health, and Dr. Onyebuchi Arah, a professor of public health and epidemiology at UCLA – concluded that simultaneous multi-bias analysis is a useful tool to quantify how multiple biases could impact an observed effect estimate.
“Combining simultaneous multi-bias analysis with a sensitivity analysis of the underlying DAG and the values of the bias parameters is key to obtaining a thorough understanding of the bias in a system,” Brendel said. “Future work should expand on this method to include other variable types, model families, and measures of effect.”
Healthcare providers are increasingly relying on data to drive clinical decision-making. But to ensure data integrity, healthcare organizations must be aware of biases in data (such as those introduced by non-representative populations and incomplete data sets) that can result in invalid estimates of effect.
By understanding the context surrounding the collection and analysis of data sets – that is, by having “data empathy” – health data scientists can provide meaningful insights to improve patient and population care.