At a time when many of us are striving to take a more active stance against racial discrimination and bias, during a pandemic that has disproportionately cost the lives of African Americans and Hispanics no less, the topic of bias in machine learning has been on my mind.
This topic gained prominence in 2015, when Google Photos' image-tagging software notoriously labeled two African Americans as "gorillas". This was the result of representation bias: the dataset the software was trained on contained many white faces but few Black faces.
Bias is also a significant concern in machine learning algorithms used in healthcare. For example, it was recently discovered that a widely used risk stratification algorithm systematically assigns lower risk scores to African Americans. Risk scores are then used to target patients for high-risk care management programs, which provide additional care coordination resources, including greater attention from trained providers. The end result is racial discrimination in referrals to high-quality (and costly) care management programs.
Of course, this was not intentional. There wasn’t any rule specifically assigning lower risk scores to African Americans. In fact, standard practice is to explicitly exclude race from models to prevent bias.
Instead, the algorithm was using annual healthcare costs as a proxy for risk. This may seem reasonable, but it turns out that, due to unequal access to care, healthcare costs for a Black patient are on average about $1,800 lower than for a white patient with a similar disease burden. The study identifies several reasons for this disparity, some tied to socioeconomic factors such as competing demands from jobs or child care, and others documenting direct effects of race, e.g., physicians' differential perceptions of Black patients' pain tolerance. So even though the model doesn't use race explicitly, bias in historical treatment patterns can perpetuate, and even amplify, future bias.
Fortunately, in many cases this problem can be reduced substantially by ensuring that the proxies and predictors a model relies on are not associated with race in a biased way. In the risk model above, for example, adding a measure of disease burden as a second proxy for risk led to an 84% reduction in bias.
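To make the label-choice problem concrete, here is a small synthetic sketch (the data, variable names, and models are all hypothetical; this is not the actual algorithm or data from the study). Two groups have the same distribution of true disease burden, but one incurs lower costs at equal burden. A risk model trained to predict cost scores that group lower even among equally sick patients, while a model trained on a label that blends in disease burden closes most of that gap.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Entirely synthetic illustration of label-choice bias.
rng = np.random.default_rng(0)
n = 50_000
group = rng.integers(0, 2, size=n)                    # 0 / 1, never used as a feature
burden = rng.gamma(shape=2.0, scale=1.5, size=n)      # true disease burden, same for both groups

def annual_cost(burden, group):
    # Group 1 incurs ~$1,800 less per year at equal burden (unequal access to care).
    return 3_000 * burden - 1_800 * group + rng.normal(0, 1_000, size=len(burden))

# Features available to the model: prior-year cost (itself depressed for
# group 1) and a noisy comorbidity count. Race is excluded, as in practice.
X = np.column_stack([
    annual_cost(burden, group),                       # prior-year cost
    burden + rng.normal(0, 0.3, size=n),              # comorbidity count
])

def risk_gap(label):
    """Fit a risk model to `label` and return the mean standardized
    predicted-risk gap between groups at similar true disease burden."""
    risk = LinearRegression().fit(X, label).predict(X)
    risk = (risk - risk.mean()) / risk.std()          # standardize for comparability
    band = (burden > 2.5) & (burden < 3.5)            # hold burden roughly fixed
    return risk[band & (group == 0)].mean() - risk[band & (group == 1)].mean()

# Label 1: next year's cost -- the biased proxy for health need.
cost_label = annual_cost(burden, group)
# Label 2: cost blended with a direct measure of disease burden,
# along the lines of the fix described above.
blend_label = 0.5 * cost_label / cost_label.std() + 0.5 * burden / burden.std()

print(f"risk gap, cost label:    {risk_gap(cost_label):.3f}")
print(f"risk gap, blended label: {risk_gap(blend_label):.3f}")
```

The gap under the cost label arises entirely from prior utilization acting as a stand-in for race; no race variable ever enters the model, which is exactly why excluding race is not enough on its own.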
One of the complicating factors in healthcare is that for certain conditions, such as hypertension and heart failure, race considerations are part of the standard of care and therefore should not be excluded from the relevant models. In these situations, special care is needed to ensure that race isn't influencing the model in unexpectedly biased ways.
While this awareness is critical, it is not always enough to address bias in machine learning. Some machine learning algorithms, deep learning algorithms in particular, lack explicit, interpretable predictors like total healthcare cost whose correlation with race can be analyzed. They operate in a much more "black box" manner, which makes this kind of analysis far more difficult. This is one reason to be careful when implementing deep learning algorithms in healthcare, despite recent improvements in their performance.
Mathematical innovations in this area may help improve the situation. In the wake of its gorilla debacle, Google developed a new technique for analyzing bias in machine learning models, called TCAV ("testing with concept activation vectors"), which can quantify, and make sense of, questions like "which of these striped fabrics are most similar or dissimilar to the concept 'CEO'?" As the figure below (taken from Google's paper) shows, this approach can be used to detect bias in image recognition tasks (in this case gender bias, while also reinforcing the poor sartorial choices of certain corporate executives).
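To give a feel for how TCAV works under the hood, here is a heavily simplified sketch of its core idea (the activation and gradient inputs are hypothetical placeholders, not Google's released implementation): a linear classifier is trained to separate a layer's activations on concept examples from its activations on random examples; the classifier's normal vector is the concept activation vector (CAV); and the TCAV score for a class is the fraction of examples whose class logit increases when the activations move along the CAV.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Fit a linear classifier separating layer activations of concept
    examples from those of random examples; its normal vector is the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1_000).fit(X, y)
    cav = clf.coef_[0]
    return cav / np.linalg.norm(cav)

def tcav_score(cav, activation_gradients):
    """TCAV score for one class: the fraction of examples whose class logit
    increases when activations move in the CAV direction, i.e. whose
    directional derivative along the CAV is positive."""
    directional_derivs = activation_gradients @ cav
    return float(np.mean(directional_derivs > 0))

# Usage sketch (all inputs are placeholders for what a real pipeline would
# extract from a trained network):
#   concept_acts -- layer activations for curated "concept" images
#   random_acts  -- layer activations for random images
#   grads        -- d(logit of the target class) / d(layer activations), per image
# cav = concept_activation_vector(concept_acts, random_acts)
# print("TCAV score:", tcav_score(cav, grads))
```

The appeal of this approach is that the concepts are defined by example images chosen by humans, so a team can probe a trained model for sensitivity to concepts like gender or race without needing explicit, hand-engineered predictors.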
As is common in machine learning, TCAV and other novel bias-detection techniques are most mature for image processing and haven't yet been widely applied in healthcare. That will likely change soon, and these techniques will be instrumental in addressing bias in healthcare applications of machine learning. In the meantime, though, awareness among data science, clinical, and operational teams remains the most important tool for detecting and mitigating implicit bias.