COVID-19 risk models: Getting started with production machine learning

In these extraordinary times, and with COVID-19 projected to impact us for at least a year, I am temporarily refocusing this newsletter to discuss how healthcare organizations can leverage data science and machine learning in their response to the virus. 

In the previous post, I outlined how improving data collection, organization, and reporting will help ensure that the most critical information is always available and up-to-date during the first stage of the crisis.

As the first wave of COVID-19 begins to subside, healthcare organizations will shift from reactive crisis management to a more proactive approach: managing the ongoing risk that the virus poses to vulnerable populations. Risk assessment capabilities will be central to these efforts, answering questions such as:

  • Which patients are most likely to develop complications if they contract the virus?

  • Which hospitalized patients are at the highest risk of mortality?

With limited budgets and thinly spread staff, broad risk categories like age and underlying conditions may not be precise enough to target interventions. Organizations with granular and accurate risk stratification capabilities will enjoy a strong advantage.

A few early studies have already identified some specific risk factors. For example, underlying heart conditions may present a higher mortality risk than underlying lung conditions. Savvy organizations can begin to adapt these studies to develop a simple rules-based risk model.
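As a rough illustration, a simple rules-based risk model can be as small as a handful of additive rules. The sketch below is hypothetical: the Patient fields, point values, age threshold, and tier cut-offs are placeholders invented for this example and are not clinically validated. The only thing borrowed from the discussion above is that a heart condition is weighted more heavily than a lung condition.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields; a real model would use whatever your data supports.
    age: int
    heart_condition: bool
    lung_condition: bool


# Placeholder values for illustration only, NOT derived from any study.
AGE_THRESHOLD = 65
AGE_POINTS = 2
HEART_POINTS = 3  # weighted above lung conditions, per the example in the text
LUNG_POINTS = 2


def risk_score(patient: Patient) -> int:
    """Add up the points for every rule the patient triggers."""
    score = 0
    if patient.age >= AGE_THRESHOLD:
        score += AGE_POINTS
    if patient.heart_condition:
        score += HEART_POINTS
    if patient.lung_condition:
        score += LUNG_POINTS
    return score


def risk_tier(score: int) -> str:
    """Map a score to a coarse tier; the cut-offs are assumptions."""
    if score >= 5:
        return "high"
    if score >= 2:
        return "medium"
    return "low"


if __name__ == "__main__":
    example = Patient(age=70, heart_condition=True, lung_condition=False)
    print(risk_tier(risk_score(example)))
```

Keeping the weights and thresholds as named constants makes it easy to revise them as new studies are published, without touching the scoring logic.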

However, these studies were conducted under emergency conditions, at an early stage in our understanding of the disease. Inevitably, they include only a small number of subjects and are not well controlled. Over the coming months, new studies will improve and refine our understanding of risk factors. So rather than relying on static risk models, it will be critical to adapt and evolve them. This includes the ability to:

  • Quickly implement and refine new risk models

  • Evaluate the performance of a new model on historical data and compare it to an existing model

  • Smoothly roll out a new model and manage any downstream impact on operational and clinical workflows 
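The second capability, comparing a new model against the existing one on historical data, might be sketched as follows. Everything here is an assumption made for illustration: the two models, the record layout, and the four historical records are made up, and precision and recall are just one reasonable choice of metrics.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, object]
# A historical record pairs patient features with the observed outcome
# (True = the adverse outcome occurred).
Record = Tuple[Features, bool]
# A model returns True when it flags a patient as high risk.
Model = Callable[[Features], bool]


def evaluate(model: Model, records: List[Record]) -> Dict[str, float]:
    """Compute precision and recall of a model's high-risk flags."""
    tp = fp = fn = 0
    for features, outcome in records:
        flagged = model(features)
        if flagged and outcome:
            tp += 1
        elif flagged and not outcome:
            fp += 1
        elif not flagged and outcome:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


def compare(models: Dict[str, Model], records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Evaluate every named model on the same historical records."""
    return {name: evaluate(model, records) for name, model in models.items()}


# Hypothetical existing and candidate models.
def current_model(f: Features) -> bool:
    return f["age"] >= 65


def candidate_model(f: Features) -> bool:
    return f["age"] >= 65 or bool(f["heart_condition"])


# Made-up records for demonstration only.
HISTORY: List[Record] = [
    ({"age": 72, "heart_condition": False}, True),
    ({"age": 58, "heart_condition": True}, True),
    ({"age": 45, "heart_condition": False}, False),
    ({"age": 80, "heart_condition": True}, False),
]

if __name__ == "__main__":
    results = compare({"current": current_model, "candidate": candidate_model}, HISTORY)
    for name, metrics in results.items():
        print(name, metrics)
```

Because every model is scored against the same records with the same function, the comparison can be rerun automatically whenever a new candidate is proposed or new outcomes are recorded.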

None of this is possible without organized and reliable data. This is where the effort invested today in data organization will pay off. Moreover, with minor extensions to the technical infrastructure required for data organization, you will also be able to automate large parts of the model testing and update workflow.

And even though your risk models may be simple rules-based models, at the end of this process you will have built the main components of a production machine learning system. In fact, the ability to test, deploy, and monitor new models in the real world is often the main bottleneck in developing production machine learning systems, much more so than the ability to use sophisticated machine learning techniques. Ironically, the tight operational focus imposed by COVID-19, together with the lack of historical data due to the novelty of the disease, can empower even large organizations to follow the playbook for building machine learning capabilities from scratch with clear and measurable value.

If you are not on the mailing list, please subscribe. And if you would like to discuss how data science can be a component of your response to COVID-19, please reach out.