After a two-year hiatus, I am excited to relaunch my newsletter, intended for non-technical (but data-curious) healthcare professionals. Each post will highlight a recent AI research article, explain why it's interesting, and discuss how it may apply in practice to healthcare companies, with their actual systems, teams, and needs.
This week's paper is about using deep learning to classify social determinants of health (SDoH) from electronic health records (paywall; email me for a copy).
TL;DR: The researchers extract SDoH from clinical notes using various machine learning tools, including off-the-shelf deep learning models. The latter outperformed other methods and achieved good accuracy.
Why is this important? SDoH are increasingly recognized as critical features for applications like risk stratification, predicting health outcomes, and evaluating programs for equity and bias. But SDoH are generally not coded in a structured way, so automatic extraction from EHRs would be useful.
Also interesting: The performance of off-the-shelf models on ad-hoc clinical annotation tasks is promising. That was generally not the case before the 2018 release, and subsequent broad adoption, of a model called “BERT,” developed by Google.
Who is this relevant for? Healthcare organizations with access to EHR data and an interest in incorporating SDoH into their work. Good results were achieved with off-the-shelf models, which are accessible to data science teams without specialized research capabilities.
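To make the task concrete, here is a minimal, hypothetical sketch of how SDoH extraction from a clinical note might be framed as a classification problem. The keyword baseline below stands in for the simpler methods that deep learning models are typically compared against; the categories and keywords are illustrative only, not the set used in the paper.

```python
# Hypothetical sketch: sentence-level SDoH classification with a keyword
# baseline. A deep learning model would be trained on annotated sentences
# instead of using a fixed keyword list.

SDOH_KEYWORDS = {
    "housing": ["homeless", "shelter", "eviction"],
    "food_insecurity": ["food bank", "skipping meals"],
    "transportation": ["no car", "missed the bus"],
}

def classify_sentence(sentence: str) -> list[str]:
    """Return the SDoH categories whose keywords appear in the sentence."""
    text = sentence.lower()
    return [
        category
        for category, keywords in SDOH_KEYWORDS.items()
        if any(kw in text for kw in keywords)
    ]

note = "Patient reports being homeless and relies on a local food bank."
print(classify_sentence(note))  # ['housing', 'food_insecurity']
```

A keyword baseline like this is cheap but brittle: it misses paraphrases ("staying in his car") and misfires on negations ("denies being homeless"). An off-the-shelf model such as BERT, fine-tuned on annotated sentences, learns from context rather than exact phrases, which is why it outperformed simpler methods in the paper.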
Caveats: The results were tested on data from a single hospital and based on a set of SDoH defined by the researchers. Real-world applications will require careful definition of the SDoH of interest, an investment in annotation, and a rigorous evaluation of the results.
The bottom line: With standardized SDoH definitions and studies across diverse datasets, it may be possible to extract SDoH from EHR at scale and with high accuracy. This is of particular importance for evaluating equity and bias of clinical programs. Meanwhile, off-the-shelf deep learning models show promising performance on ad-hoc clinical annotation tasks, and are within reach for most data science teams.