An Interactive Visualization of the
1854 London Cholera outbreak
mreda2 at uic dot edu
Note: The applet might take a few minutes to load. Please wait. Safari users, please use the Up/Down arrow keys to zoom in/out of the map. Mouse wheel support in Safari is broken.
Source code (licensed under GPLv3): SnowCholera CholeraData CholeraDay ClusterGrid DeadPerson Grapher Slider TimeSeries TimelineWidget VectorMap Visualizer misc
In 1854, a cholera outbreak swept over the Soho district in London. The outbreak reached its peak in the first week with more than 150 deaths, causing widespread panic in the neighborhood, which led to the flight of its residents. At that time, the cause of cholera was not scientifically established, and people believed that toxic vapors resulting from the decay of organic matter were the cause of cholera, among other diseases. Dr. John Snow (1813 - 1858) was skeptical of this theory. To investigate other causes, Snow made his famous map of the Soho district, plotting the location of deaths alongside street water pumps in the neighborhood. The data acquisition and visualization tools at Snow's disposal were severely limited by today's standards. Nevertheless, Snow's visualization was arguably the first clear evidence linking cholera transmission to a contaminated water supply.
This project aims to create an interactive version of Dr. Snow's visualization. The data used here comes from Snow's original map. Additional data, such as the gender and age of victims, was randomly generated and added to the original data to make the visualization slightly more interesting.
The visualization is divided into two parts. The left side shows a map of the Soho district in London, plotting the location of deaths as well as the location of street water pumps. The right side shows three graphs: the total number of deaths per day, the percentage of male/female victims, and the percentage of various age groups. Underneath the graphs is a timeline with two sliders that allow the user to select a time-window (this is discussed in more detail later). A clustering slider allows the user to turn clustering on/off and vary its coarseness (more on this later). The bar separating the map from the graphs can be dragged to increase the size of one side at the expense of the other.
The map occupies the left side of the visualization. It plots the location of deaths along with the location of street pumps. The user can zoom in/out of the map using the mouse wheel (or the UP and DOWN arrow keys in Safari, which for some reason doesn't send mouse wheel events to the Java applet). The map can also be panned by dragging with the left mouse button.
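The zoom and pan behavior can be sketched as a simple view transform. This is only an illustration, not the code from VectorMap: the class name, the method names, and the choice to keep the point under the cursor fixed while zooming are all assumptions of mine.

```java
// Hypothetical sketch of a pan/zoom view transform (not the project's VectorMap code).
final class MapViewSketch {
    private double scale = 1.0;   // screen pixels per map unit
    private double offsetX = 0.0; // screen position of map origin, x
    private double offsetY = 0.0; // screen position of map origin, y

    // Dragging with the mouse shifts the map by the drag distance in pixels.
    void panBy(double dx, double dy) {
        offsetX += dx;
        offsetY += dy;
    }

    // Zooms by 'factor' while keeping the map point under screen position
    // (sx, sy) in place. Assumption: the real applet may zoom differently.
    void zoomAt(double sx, double sy, double factor) {
        offsetX = sx - (sx - offsetX) * factor;
        offsetY = sy - (sy - offsetY) * factor;
        scale *= factor;
    }

    double toScreenX(double worldX) { return worldX * scale + offsetX; }
    double toScreenY(double worldY) { return worldY * scale + offsetY; }
    double toWorldX(double screenX) { return (screenX - offsetX) / scale; }
    double toWorldY(double screenY) { return (screenY - offsetY) / scale; }
    double scale() { return scale; }
}
```

A mouse wheel handler (or the UP/DOWN key handler) would then call something like `zoomAt(cursorX, cursorY, 1.1)` to zoom in and `zoomAt(cursorX, cursorY, 1 / 1.1)` to zoom out.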
The icon for a victim is either a triangle for a male victim or a circle for a female victim. A circle differs significantly from a triangle in its visual features, so it should be easy for the analyst to distinguish between male and female victims, or to conduct a quick visual search based on gender. The triangles for male victims are isosceles and rotated by about -40 degrees, which, based on my subjective judgement, made them a bit easier to notice. Pumps are depicted as white diamonds with a thicker outline, which should make them pop out, as they differ significantly in shape and color from the victims.
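As a rough illustration of the two victim icons, here is a minimal sketch using java.awt.geom. The class name, the triangle proportions, and the use of exactly -40 degrees are assumptions; the actual drawing code lives in the DeadPerson source and may look quite different.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Ellipse2D;
import java.awt.geom.Path2D;

// Hypothetical sketch of the victim icons (not the project's DeadPerson code).
final class VictimIconSketch {
    // Returns a circle for a female victim, or an isosceles triangle rotated
    // by -40 degrees for a male victim, centered on (cx, cy).
    static Shape iconFor(boolean male, double cx, double cy, double size) {
        if (!male) {
            return new Ellipse2D.Double(cx - size / 2, cy - size / 2, size, size);
        }
        // Isosceles triangle around the origin: apex on top, base below.
        Path2D.Double triangle = new Path2D.Double();
        triangle.moveTo(0, -size / 2);
        triangle.lineTo(size / 3, size / 2);
        triangle.lineTo(-size / 3, size / 2);
        triangle.closePath();

        // Rotate about the origin first, then move the icon to (cx, cy).
        AffineTransform placement = new AffineTransform();
        placement.translate(cx, cy);
        placement.rotate(Math.toRadians(-40)); // assumed angle, per "about -40 degrees"
        return placement.createTransformedShape(triangle);
    }
}
```

The returned Shape can be passed to `Graphics2D.fill` and `Graphics2D.draw`, with the fill color chosen from the age scale.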
The color of a victim's icon is used to represent the age of the victim. The color scale goes as follows:
|10 - 20|
|21 - 40|
|41 - 60|
|61 - 80|
The color progresses from dark blue to cyan to yellow to red. The cyan color blends a bit of green and blue, which makes the color distinct even for people with some form of color-vision impairment. Here's what the scale looks like for people with deuteranopia (a form of red-green color blindness):
|10 - 20|
|21 - 40|
|41 - 60|
|61 - 80|
For me personally, conducting a visual search for a particular age group was relatively easy, though it was easier to search for yellow than for orange, for example. A limitation of the shape-color scheme I selected is that it is not easy to conduct a visual search based on both gender and age, for example for triangles (male victims) that are orange (61-80 years old). To overcome this limitation, a gender filter is provided that restricts the display to either males or females, which reduces the problem to a search for a particular color. A similar case can be made for an age filter that restricts the victims to one or more age groups. This filter was not implemented for the sake of simplicity.
Three graphs are rendered on the right side of the visualization.
An interactive version of the 1854 cholera outbreak visualization must support analytical tasks that the static version cannot if it is to be worth using by experts. In this instance, two main analytical goals were established:
The solution to this was to provide a timeline widget which allows the analyst to focus the analysis on a particular day or time period. Deaths that fall outside this time period are not visualized. Additionally, a gender filter allows the analyst to restrict the visualization to plotting only male or only female victims. The zoom and pan functions can be used to focus the view on a sub-region of the neighborhood to investigate the deaths that happened in a desired area.
The timeline widget has two sliders: one for the 'begin' timestep (blue), the other for the 'end' timestep (red). The two sliders can be manipulated independently to select an arbitrary time-window that falls anywhere within the observation period. Only deaths that fall within the specified time-window are plotted. The timeline widget is aligned with the X axis of the three graphs so that it can serve as a scale. Manipulating the timeline widget does not affect the graphs. The LEFT and RIGHT arrow keys can be used to control the 'end' slider.
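The time-window selection boils down to a simple filter over the day of each death. Here is a minimal sketch; the class and method names are made up for illustration, and treating both slider positions as inclusive is an assumption rather than something taken from the TimelineWidget source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of time-window filtering (not the project's TimelineWidget code).
final class TimeWindowSketch {
    // deathDays[i] is the day index (0 = first day of the observation period)
    // on which victim i died. Returns the indices of the victims to plot.
    // Assumption: both the 'begin' and the 'end' day are inclusive.
    static List<Integer> indicesInWindow(int[] deathDays, int beginDay, int endDay) {
        List<Integer> visible = new ArrayList<>();
        for (int i = 0; i < deathDays.length; i++) {
            if (deathDays[i] >= beginDay && deathDays[i] <= endDay) {
                visible.add(i);
            }
        }
        return visible;
    }
}
```

Setting both sliders to the same day narrows the window to a single day, which matches the "particular day" use case.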
The clustering slider is used to turn on clustering and specify the size of rectangles in the clustering grid. When turned on, the map visualization will show the total number of deaths in each rectangle using color variation instead of rendering individual victims.
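The idea behind the clustering grid can be sketched as counting deaths per grid cell. This is an illustration only: the names, the use of square cells, and the clamping of out-of-range points to the border cells are assumptions, and the real ClusterGrid class may work differently.

```java
// Hypothetical sketch of grid-based clustering (not the project's ClusterGrid code).
final class ClusterGridSketch {
    // xs[i], ys[i]: location of victim i in map units, with the map spanning
    // [0, width] x [0, height]. cellSize (> 0) is the side length of one cell,
    // as set by the clustering slider. Returns counts[row][col].
    static int[][] countPerCell(double[] xs, double[] ys,
                                double cellSize, double width, double height) {
        int cols = Math.max(1, (int) Math.ceil(width / cellSize));
        int rows = Math.max(1, (int) Math.ceil(height / cellSize));
        int[][] counts = new int[rows][cols];
        for (int i = 0; i < xs.length; i++) {
            // Clamp so that points on or beyond the map edge land in a border cell.
            int col = Math.min(cols - 1, Math.max(0, (int) Math.floor(xs[i] / cellSize)));
            int row = Math.min(rows - 1, Math.max(0, (int) Math.floor(ys[i] / cellSize)));
            counts[row][col]++;
        }
        return counts;
    }
}
```

Each count would then be mapped to a color, with darker rectangles for higher counts. A larger `cellSize` gives a coarser grid.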
This filter can be used to render female or male victims only. It also affects the clustering map.
This can be used to disable rendering of street names or map legend.
Here are some observations made after using the visualization:
1. Deaths clustered around Broad Street pump
This was the main finding that Snow came up with after analyzing his static visualization. Using the clustering feature of our interactive version, this is easier to notice. As one can see, the darkest colored rectangles are the ones closest to the Broad St. pump, indicating that most of the victims lived in close proximity to the Broad St. pump and, presumably, used to drink its water. As we get further away from the pump, the number of deaths decreases significantly.
2. More youngsters and seniors died than other age groups
This pattern is consistent with infectious diseases being more severe in youngsters and older people.
3. More men have died than women
One can see from the graph that the blue area occupies a larger portion of the graph. Since the gender of victims was picked at random (the original data set did not record the gender of victims), we can conclude that the pseudo-random number generator used is not up to the task :)
Last update: sep 14, 09