#BeyondEpicurves: Battling Infectious Diseases in the 20th Century: The Impact of Vaccines from the WSJ
I love these heatmaps created by Tynan DeBold and Dov Friedman at the Wall Street Journal using Project Tycho data. Each row is a state, and each column is a year. The intensity of the color represents how many cases were recorded in that place and time. Also marked on some of the plots is the year a vaccine was introduced. The visualizations have a bit of an interactive component - if you go to their site, you can mouse over each cell to see the data. The result is a stunning testament to the power of public health and vaccines.
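For anyone who wants to try a similar layout, here is a minimal Python sketch of the reshaping step behind a heatmap like this: one row per state, one column per year, each cell holding a case count. The records are made-up placeholder values, not Project Tycho data, and the function name `build_grid` is purely illustrative, not anything the WSJ team used.

```python
from collections import defaultdict

# Hypothetical (state, year, cases) records; placeholder numbers, not real data.
records = [
    ("Alabama", 1960, 120),
    ("Alabama", 1961, 95),
    ("Alaska", 1960, 10),
    ("Alaska", 1962, 4),
    ("Alabama", 1960, 30),  # a second report for the same cell gets summed
]

def build_grid(records):
    """Return (states, years, grid); grid[i][j] is total cases for states[i] in years[j]."""
    totals = defaultdict(int)
    for state, year, cases in records:
        totals[(state, year)] += cases
    states = sorted({s for s, _, _ in records})
    years = sorted({y for _, y, _ in records})
    # Cells with no report stay None so they can be drawn differently from a true zero.
    grid = [[totals.get((s, y)) for y in years] for s in states]
    return states, years, grid

states, years, grid = build_grid(records)
for state, row in zip(states, grid):
    print(state, row)
```

The resulting grid could then be handed to a plotting library (matplotlib's `imshow`, for example, after converting the missing cells to NaN), with a vertical line drawn at the year a vaccine was introduced.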
I came across a link recently to an interview with Richard Nisbett arguing against multiple regression analysis in social psychology. His arguments have great relevance to epidemiology - what he's arguing against is not just regression, but observational science in general. The punchline is that correlation is not causation, an observation popular enough to have spawned entire blogs.
"A huge range of science projects are done with multiple regression analysis. The results are often somewhere between meaningless and quite damaging." - Richard Nisbett
Research epidemiology gets around this trap by using case-control studies and randomized controlled trials. These studies take time and money, though, and for certain problems (including most in my subfield, which prioritizes speed), an interim solution is needed. As Alessandro Vespignani says, "start where you are, use what you have, do what you can." Sometimes observational studies (and regression) are part of that.
I can't give a total pass though. I see linear and logistic regression used as hammers for every nail. There are many other methods for regression and classification that need more attention in the epi world. This Quora post has a nice assembly of terms to google, and the Kaggle blog is a great resource for learning what others have used for different problems. I don't expect all epidemiologists to be statisticians or data scientists, but having a few extra tools at your disposal never hurts.
What do you think?
The blog FlowingData recently posted one of the best epidemiology visualizations I've seen in a while. It's animated, so you have to visit the site to get the full effect. A dropdown menu at the top lets you customize basic demographics. Then the chart animates with the distribution of causes of death for each year of life so far.
It's a beautiful graphic, and I love the interactivity. But what I really love about it is how easily it communicates the bridge between population health and individual health. Epidemiology is concerned with the former, but most people are really only interested in the latter.
Epidemiologists use aggregated data from populations to better understand patterns of health and disease, but it can be very hard to translate those findings into information that individuals can use to understand their own circumstances. I like that this graphic drills down far enough to let people see themselves in the data, while still capturing the range of possible outcomes. This style of graphic would work really well for communicating other scenarios with multiple dependencies and possible outcomes. It's almost like a dynamic decision tree.
Foodborne Chicago is an amazing initiative by the Chicago Department of Health to track food poisoning in the city using social media. Most foodborne illness is not reported to the public health department or traced back to the source because diarrheal illness is common and usually gets better without treatment. This makes it hard to identify restaurants or foods that are contaminated, so people continue to eat them and get sick.
The Chicago Department of Health partnered with a civic hacking organization, Smart Chicago Collaborative, to develop a set of online tools to collect reports of food poisoning. People who have fallen sick can go to the Foodborne Chicago website to submit a report. But what's even cooler is the team has set up a Twitter app to scan for people in Chicago complaining of food poisoning symptoms.
Epidemiologists then review the tweets and contact the sick people directly with a link to the online form submission. The Foodborne Chicago site reports that it has classified almost 4,000 tweets and replied to nearly 500 Chicagoans in the quest for better food safety. The website has collected almost 1,300 total reports of foodborne illness.
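The post doesn't describe how the Foodborne Chicago app actually decides which tweets to surface, so the following is only a rough keyword-based sketch of the general idea: filter a stream of tweets down to the ones worth a human look. The symptom phrases, the function name `flag_for_review`, and the sample tweets are all assumptions made up for illustration.

```python
import re

# Hypothetical symptom phrases; a real system would likely use a trained classifier.
SYMPTOM_PATTERNS = [
    r"food poisoning",
    r"stomach (bug|cramps|ache)",
    r"threw up",
    r"vomit\w*",
    r"diarrhea",
]
SYMPTOM_RE = re.compile("|".join(SYMPTOM_PATTERNS), re.IGNORECASE)

def flag_for_review(tweets):
    """Return the tweets that mention a symptom phrase, for a human to review."""
    return [t for t in tweets if SYMPTOM_RE.search(t)]

# Made-up sample tweets, not real data.
sample_tweets = [
    "Pretty sure that burrito gave me food poisoning",
    "Best deep dish in Chicago, hands down",
    "Been vomiting all night after dinner out",
]
for tweet in flag_for_review(sample_tweets):
    print(tweet)
```

A filter like this only narrows the pile. The human review step described above is still what decides who gets a reply.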
I love that this project uses technology to bring public health professionals closer to the community they serve. Most residents don't know to alert their local health departments about foodborne illness. In fact, I'd venture a guess that most people barely think about the role the health department plays in their lives at all. Foodborne Chicago helps epidemiologists identify and reach out to people who have important information about the health of their community.
This is the kind of project that makes me passionate about teaching Python instead of SAS or another stats-only language. Fluency in a general-purpose language like Python lets you go beyond data analysis (though I love that too) to creating websites, apps, bots, and a whole range of tools to improve public health. I see this as the future of public health, and it's what will make epidemiology great again.
Epidemiologists changing the future of public health.