Lessons from Three Covid-19 Disease Models

As the world continues to battle the coronavirus pandemic, policy makers, and the rest of us, are looking for reliable forecasts of how the epidemic will progress, especially when social distancing restrictions are relaxed. Several scientific teams have developed and shared predictive disease models, which together offer a real-time case study of the practical applications of predictive models for population health.

Given the novelty of the disease, all these models were necessarily built quickly and with limited amounts of data. Therefore, the teams developing them had to make some strong assumptions and balance competing tradeoffs. How do these considerations affect the models’ real-world applications?

The IHME model, developed by the Institute for Health Metrics and Evaluation at the University of Washington, attracted significant popular attention after being cited in a White House briefing. But it also drew unusually sharp criticism from the scientific community, partly because of genuine methodological weaknesses and partly because its approach departs from standard epidemiological methods. The IHME model assumes that the virus’s trajectory will be similar in areas with similar densities and social distancing policies, and relies on data from China and Europe, where the epidemic is further along, to estimate trajectories in the United States. Standard epidemiological practice, in contrast, relies on estimates of exposure and transmission rates to make these predictions. This disparity makes the IHME approach difficult to validate, given how little data is available. The tension between expert-driven and fully data-driven approaches is very common in healthcare. Often, the back-and-forth is healthy, ensuring that models are vetted by experts but remain flexible enough to discover new relationships in the data.
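To make the contrast concrete, standard epidemiological practice typically starts from a compartmental model driven by transmission and recovery rates. The sketch below is a minimal SIR (Susceptible–Infected–Recovered) simulation with purely hypothetical parameter values; it is an illustration of the general technique, not the IHME model or any other team’s actual implementation:

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Euler-integrate a basic SIR model.

    beta  -- transmission rate (new infections per infectious contact, per day)
    gamma -- recovery rate (1 / average infectious period in days)
    """
    s, i, r = float(s0), float(i0), 0.0
    history = [(s, i, r)]
    for _ in range(int(days / dt)):
        n = s + i + r
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# Hypothetical parameters: beta/gamma = 3 means each case infects roughly
# three others in a fully susceptible population.
history = simulate_sir(beta=0.3, gamma=0.1, s0=999_000, i0=1_000, days=180)
peak_infected = max(i for _, i, _ in history)
```

Models of this kind make predictions by estimating beta and gamma from case data, which is exactly the step that trajectory-matching approaches like IHME’s sidestep.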

Another concerning aspect of the IHME model is its lack of robustness: small changes in the input can lead to wildly different results. For example, the IHME team generated some frustration in Massachusetts by choosing to interpret the state’s stay-at-home “advisory” as a substantially weaker measure than an “order”, which doubled the number of predicted deaths. While there is room for argument over the semantics, the outcome is clearly implausible, reducing policy makers’ trust in the model and its practical usability.

A second model, developed at the University of Texas, addresses some of the flaws of the IHME model. It takes a similar trajectory matching approach, but rather than using only the timing and general category of each state’s social distancing guidelines, it uses aggregated data from mobile-phone GPS traces to quantify actual levels of social distancing. 

The UT model results in much more robust predictions, and it is helpful for short-term predictions of up to two or three weeks into the future. But the trajectory matching approach means that predictions can’t be extended farther out, since longer trajectories haven’t been observed yet. A longer-term model, for example one that forecasts how the disease will spread as current restrictions are relaxed, requires taking the underlying epidemiological dynamics into account, which neither model does at this point.

This is the starting point for another model, developed at MIT. Classical epidemiological methods rely on estimates of exposure and transmission rates. However, they don’t typically take into account the effect of social distancing, partly because social distancing is difficult to quantify. The MIT model augments epidemiological methods with a “quarantine” variable whose dynamics are inferred from the data using a simple artificial neural network.
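This idea can be sketched as an SIR model with an extra quarantine compartment: infected individuals are removed from circulation at a time-varying rate. In the sketch below, a hand-picked logistic ramp stands in for the neural-network-inferred quarantine dynamics, and all parameter values are hypothetical:

```python
import math

def simulate_sir_q(beta, gamma, q_strength, s0, i0, days, dt=0.1):
    """SIR augmented with a quarantine compartment: infected individuals
    leave circulation at rate q_strength(t). In the MIT model this rate is
    inferred from data by a neural network; here it is a placeholder."""
    s, i, r, q = float(s0), float(i0), 0.0, 0.0
    history = [(s, i, r, q)]
    for step in range(int(days / dt)):
        t = step * dt
        n = s + i + r + q
        new_inf = beta * s * i / n * dt
        new_rec = gamma * i * dt
        new_quar = q_strength(t) * i * dt
        s -= new_inf
        i += new_inf - new_rec - new_quar
        r += new_rec
        q += new_quar
        history.append((s, i, r, q))
    return history

# Hypothetical ramp: quarantine strength rises as restrictions take hold
# around day 30 (a stand-in for the learned term, not a fitted curve).
ramp = lambda t: 0.15 / (1.0 + math.exp(-(t - 30.0) / 5.0))
with_q = simulate_sir_q(0.3, 0.1, ramp, 999_000, 1_000, days=180)
no_q = simulate_sir_q(0.3, 0.1, lambda t: 0.0, 999_000, 1_000, days=180)
```

The appeal of the design is that the mechanistic SIR structure is preserved while only the hard-to-specify quarantine dynamics are learned from data.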

The MIT model achieves reasonably accurate predictions despite using very little data. Unlike the previous models, however, it is difficult to interpret: the factors driving the quarantine term aren’t modeled explicitly, which makes it impossible to answer questions like “how will the number of cases grow if schools are reopened?”

From the perspective of a healthcare organization, there is much to be learned from the public and scientific conversation around these models. Consider how nuances of robustness (IHME model) or interpretability (MIT model) can restrict the usefulness of epidemiological models for policy makers. The same themes arise constantly in machine learning models for healthcare. And just as in the epidemiological context, making them explicit and transparent to all stakeholders, rather than relegating them to technical teams alone, is one of the keys to their successful application in the real world.

Covid-19 has scientists around the world collaborating to develop better tests, treatments, and vaccines. Similarly, the pandemic may lead to the development of new epidemiological methodologies and a stronger partnership with policymakers. An equivalent process could facilitate a fuller, more collaborative adoption of machine learning in healthcare. Perhaps this could be one of the positive outcomes of this global crisis. 

If you are not on the mailing list, please subscribe. And if you would like to discuss how data science can be a component of your response to COVID-19, please reach out.