#BeyondEpicurves: Battling Infectious Diseases in the 20th Century: The Impact of Vaccines from the WSJ
I love these heatmaps created by Tynan DeBold and Dov Friedman at the Wall Street Journal using Project Tycho data. Each row is a state, and each column is a year. The intensity of the color represents how many cases were recorded in that place and time. Also marked on some of the plots is the year a vaccine was introduced. The visualizations have a bit of an interactive component - if you go to their site, you can mouse over each cell to see the data. The result is a stunning testament to the power of public health and vaccines.
I use heatmaps often because they are a clean way to visually organize a huge amount of data. Any time I have three dimensions to plot (shown here are state, year, and cases) I turn to heatmaps. It's almost like flattening 50 epicurves into a single graphic. And because the numeric data are represented by color sequences, it's easy to pick out patterns over time and space. Humans are still better at pattern recognition than computers, which is one reason why it's so important to always visualize your data.
The diverging color scheme of green to red on the WSJ plots is nice, if a bit misleading, since the jump from gray to green to yellow is quite sharp. I like how there are no strong axis lines or tick marks to distract your eye. That choice lets the "vaccine introduced" vertical bar stand out more prominently. The ytick labels are a mix of initialisms (N.D.), truncated abbreviations (Vt.), and shortened names (Calif.), which I don't love. There are also some states missing, which is not explained. The cell borders are white, which is much cleaner than black in my opinion.
This pertussis graph is interesting to me because there is so much missing data. Missing, incomplete, or otherwise suspicious data is an omnipresent problem in public health. The choices for handling it are 1) don't, 2) interpolate, or 3) make do. I think choice 3 is the only logical one for most situations. I like how the gap is displayed here as-is instead of being removed with an axis break.
When working in Python I create heatmaps using seaborn. Use pandas' pivot_table function to arrange the data exactly how you intend to plot it (with rows as the index, columns as columns, and numeric data as values). Then just pop the table into seaborn's heatmap function, and it's done. The default aesthetics are similar to those shown in the WSJ graphics.
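As a rough sketch of that pivot-then-plot workflow, here is one way it might look. The records are made-up toy numbers (not Project Tycho data), and the column names, the plot_heatmap helper, the colormap, and the white cell borders are all my own assumptions for illustration rather than what the WSJ team did.

```python
import pandas as pd

# Made-up long-format records, for illustration only (not Project Tycho data).
records = pd.DataFrame({
    "state": ["CA", "CA", "CA", "NY", "NY", "NY", "TX", "TX"],
    "year":  [1960, 1961, 1962, 1960, 1961, 1962, 1960, 1962],
    "cases": [120, 80, 10, 200, 150, 20, 90, 5],
})

# Rows become the index, columns become columns, numeric data become values.
# TX has no 1961 record, so that cell ends up as NaN (missing).
table = records.pivot_table(
    index="state", columns="year", values="cases", aggfunc="sum"
)


def plot_heatmap(table):
    """Hypothetical helper: draw a state-by-year table as a heatmap."""
    # Imported here so the reshaping step above works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 3))
    # seaborn leaves cells with missing values unpainted, so gaps stay visible.
    sns.heatmap(
        table,
        ax=ax,
        cmap="viridis",       # assumed colormap, not the WSJ palette
        linewidths=0.5,
        linecolor="white",    # white cell borders, as praised above
        cbar_kws={"label": "cases"},
    )
    ax.set_xlabel("")
    ax.set_ylabel("")
    return fig
```

Calling plot_heatmap(table).savefig("heatmap.png") should write the figure to disk. Note that the missing TX cell is left blank rather than interpolated, which is the "make do" option for missing data.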
In R you can use ggplot2's geom_tile function to do the same thing. Here's an (old) tutorial on making a heatmap with similar aesthetics to the ones shown above.