COVID-19 Insights: Can We Predict Where the Pandemic will Spread Next?

By automating data discovery, Metopio can surface similarities between places that are not readily apparent, which helps in developing better models.


Defining the Problem

Analyzing aggregate data through a geospatial lens helps find patterns across places that don’t appear to be the same on the surface, such as places with different levels of population density, modes of transportation and demographics.

As more granular COVID-19 data is released, we can understand the pandemic at a more local level. Each layer offers us an opportunity to refine an analysis, with the hope of getting to high-quality, neighborhood-level data across the country with demographic stratifications.

Metopio is built to find patterns across different populations and places quickly, and without having to spend hours chasing down and cleaning data.

This descriptive analysis tells us what commonalities are shared across the parts of Illinois that have the highest COVID-19 case rates. If there are patterns, can we use some of this local data to further understand the virus's progression?

Metopio is a start-to-finish analytics platform, so we can take the next step and see whether any data is significantly correlated with coronavirus infections. If strong correlations do exist, we can go one step further and build predictive models to anticipate where additional hotspots might appear. Here's how to uncover patterns.

Process for Finding Patterns

The initial step is to find the ZIP codes in Illinois with the highest COVID-19 case rates. In this case, that means ZIP codes with case rates above 500 per 100,000 residents (that is, more than 0.5% of residents have a confirmed positive test). From there, we create a custom region with our mapping tool.
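The selection step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Metopio's implementation: the ZipStats type, the function names and the sample numbers are all hypothetical, and the ZIP codes are placeholders rather than real Illinois ZIP codes.

```python
from dataclasses import dataclass


@dataclass
class ZipStats:
    """Hypothetical record for one ZIP code (not a Metopio data type)."""
    zip_code: str
    confirmed_cases: int
    population: int


def case_rate_per_100k(cases: int, population: int) -> float:
    """Confirmed cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


def select_hotspots(rows, threshold=500.0):
    """Keep the ZIP codes whose case rate is above the threshold."""
    return [
        r for r in rows
        if case_rate_per_100k(r.confirmed_cases, r.population) > threshold
    ]


def region_case_rate(rows):
    """Case rate of the combined region: pooled cases over pooled population."""
    total_cases = sum(r.confirmed_cases for r in rows)
    total_population = sum(r.population for r in rows)
    return case_rate_per_100k(total_cases, total_population)


# Made-up numbers, for illustration only.
sample = [
    ZipStats("00001", 300, 40_000),
    ZipStats("00002", 100, 50_000),
    ZipStats("00003", 180, 30_000),
]

hotspots = select_hotspots(sample)
print([r.zip_code for r in hotspots])
print(round(region_case_rate(hotspots), 1))
```

Note that the rate for the combined region is computed from pooled cases and pooled population, not by averaging the per-ZIP rates, so larger ZIP codes carry more weight.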

The top 22 ZIP codes in Illinois span hotspots such as the South and West Sides of Chicago, but also suburban locales including Melrose Park, southern Cook County, Lincolnwood and Waukegan.

When we create a custom region, Metopio immediately combs through 250 million data points in the data library to find out how this new “place” stands out.

After 10 seconds, we get these two lists of topics, specific to this new region of ZIP codes with high COVID-19 case rates:

[Image: Metopio's two lists of topics for the custom region]

What stands out?

Air quality is clearly a problem. Not only do these ZIP codes, which span urban and suburban, lakefront and inland areas, have high concentrations of diesel particulate matter, they also rank high on two EPA indices: respiratory hazard and diesel particulate matter.

The relationship between poor air quality and COVID-19 has been the focus of researchers at Harvard University. The New York Times reported on their study, which Metopio visualized here. It found that high levels of particulate matter were significantly correlated with poor COVID-19 outcomes.

Second, the ZIP codes are among the top 5% in the country for people commuting by bus, despite being a mix of urban and suburban areas. In aggregate, these ZIP codes also have a large percentage of homes without access to a vehicle, which means further reliance on public transportation for errands and could increase exposure. This raises another important question: is time on public transit a predictor of COVID-19 infection? Check out this scatterplot.

Third is the density of both population and housing, which has an understandable impact on virus spread.
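A statement like "among the top 5% in the country" can be read as a percentile rank: where a region's value falls within the distribution of the same measure across all other places. The sketch below shows one simple way to compute that. It is an assumption about the general technique, not a description of how Metopio ranks topics, and the function names and reference values are made up for illustration.

```python
from bisect import bisect_left


def percentile_rank(value, reference):
    """Percent of reference values that are strictly below `value`."""
    if not reference:
        raise ValueError("reference must not be empty")
    ordered = sorted(reference)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


def is_in_top_share(value, reference, top_percent=5.0):
    """True if `value` ranks within the top `top_percent` of the reference."""
    return percentile_rank(value, reference) >= 100.0 - top_percent


# Hypothetical national distribution of a measure (for example, the share
# of workers commuting by bus), one value per place. Illustrative only.
national_values = [float(i) for i in range(100)]

print(percentile_rank(97.5, national_values))
print(is_in_top_share(97.5, national_values))
print(is_in_top_share(50.0, national_values))
```

Running the same comparison for every topic in a data library and sorting by rank would give lists of the measures where a region is unusually high or unusually low.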

Taken together, these factors form a more complete picture of what conditions might lead a community to see higher rates of infection. We want to emphasize "might": this analysis does not prove causation, it only identifies significant relationships within the data. It can help focus research, direct community outreach and allocate resources.
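To make "significant relationship" concrete, here is a minimal sketch of one common check: a Pearson correlation with a permutation test for the p-value. It is not Metopio's method, the function names are hypothetical, and the data are invented. As noted above, even a strong correlation does not establish causation.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, trials=5000, seed=0):
    """Two-sided p-value: how often shuffled data correlates as strongly."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


# Invented values for eight hypothetical ZIP codes: percent of commuters
# riding the bus, and COVID-19 cases per 100,000 residents.
bus_share = [2, 4, 5, 7, 9, 12, 15, 18]
case_rate = [210, 260, 240, 390, 430, 520, 610, 700]

r = pearson_r(bus_share, case_rate)
p = permutation_p_value(bus_share, case_rate)
print(f"r = {r:.3f}, p = {p:.4f}")
```

A small p-value says the relationship is unlikely to be a fluke of these particular ZIP codes. It says nothing about whether bus ridership drives infections or whether both track some third factor, such as density.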

Can we help your organization understand the impact of COVID-19? Contact us