Week 5

Geospatial Visualization

A few examples to start  ...

NY Times - (flash based)  Mapping America

some stills from the interactive application:

here is a nice map of cell phone strength

What can we say about the people in an area

Visual contrasts established by manipulating perceptual qualities

the following are retinal variables - perceived immediately and effortlessly - fundamental units of visual communication

retinal variables

Information represented in a visual display is characterized by

Let's do an example. For each of the two projects below, write down in a word processing file what you think the length and scale are for each dimension. Print the file and add it to your gradescope submission for the week.

The data from Project 1 in 2020, which can be found at https://www.evl.uic.edu/aej/424/litterati challenge-65.csv , included the following 11 dimensions. Take a few minutes and decide on the length and scale of each of these dimensions.

The data from Project 1 in 2019, available at https://aqs.epa.gov/aqsweb/airdata/annual_aqi_by_county_2019.zip , included the following 19 dimensions. Take a few minutes and decide on the length and scale of each of these dimensions.

We can look back at the MicrobeScope example. Here all of the points have the same Size, but there are different Shapes (circles, diamonds, triangles) and different Hues (dark blue, light blue, pink, etc.) and they are in different Positions in the

Here microbe type (shape) and primary transmission method (hue) are nominal / categorical.
Deadliness, Contagiousness, and the other X and Y axis options are quantitative


Nominal - User interested in categorizing

In ordered perception the viewer must determine the relative ordering of values along a perceptual dimension. Given any two visual elements, a natural ordering must be clearly apparent so the element representing 'more' of the corresponding quality is immediately obvious

In quantitative perception the viewer must determine the amount of difference between two ordered values. The user does not need to refer to an index or key - the relative magnitudes must be immediately apparent
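One way to keep the three kinds of scale straight is to look at how R can store them. The sketch below uses made-up values, and the variable names (transmission, risk, deadliness) are only illustrative, not columns from the project data files.

```r
# Made-up values, loosely in the spirit of the MicrobeScope example
transmission <- factor(c("airborne", "bites", "airborne", "fecal-oral"))  # nominal
risk <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"), ordered = TRUE)       # ordered
deadliness <- c(0.1, 50, 2.5, 11)                                         # quantitative

nlevels(transmission)          # 3 categories, with no ordering between them
risk[1] < risk[2]              # TRUE: "low" comes before "high"
deadliness[2] - deadliness[4]  # 39: differences are meaningful for quantities
```

The number of levels gives a rough idea of the length a visual variable would need in order to show that dimension.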

Visual variables differ substantially in length:

here are some more examples from our textbook:

The Principles of Symbolization chapter from Thematic Cartography and Geovisualization, 3rd ed. by Slocum, McMaster, Kessler, and Howard gives a nice introduction on mapping data to symbols so we will use several examples from it below.

A lot of data today can be represented geographically, as the popularity of Google Maps / Earth attests, so it's a nice place to start looking at details.

Nature of Geographic Phenomena:

Spatial Dimension

Discrete vs Continuous and Abrupt vs Smooth Phenomena
    discrete - occur at distinct locations (and have a space between them)
    continuous - occur throughout a region of interest

    abrupt - can change suddenly
    smooth - change gradually

Discrete-Continuous / Abrupt-Smooth Phenomena
Figure 5.1 from Thematic Cartography showing phenomena and appropriate ways of representing them

Distinction between data that has been collected to represent a phenomenon and the phenomenon being mapped
we are typically collecting data at discrete sites (weather stations, well sites) or aggregating over small regions (counties, states) where the actual phenomenon being modeled (e.g. the temperature outside) is continuous. Other times we are collecting discrete data on a discrete phenomenon (e.g. electricity usage at this address).

Type of visualization used depends both on the nature of the underlying phenomenon and the purpose of the map

We often deal with continuous data represented by discrete sampling

A familiar example is a weather map showing the current temperature across the state or country. The data is only sampled at certain scattered stations and is then interpolated to cover the areas in between. You can click on the map to gain access to the data files and to see how the data is interpolated across the state. Here are the FAA sites in Illinois https://www.faa.gov/air_traffic/weather/asos/?state=IL

and the Weather underground https://www.wunderground.com/wundermap sites near campus

Interpolation is then used to predict the values in between, with a variety of possible methods.

Shepard's method is one way to perform that interpolation - https://en.wikipedia.org/wiki/Inverse_distance_weighting
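Here is a minimal sketch of inverse distance weighting in base R. The station coordinates and temperatures are made up, the column names x, y, and value are assumptions, and distances are treated as plain planar distances, which is only an approximation if the coordinates are latitude and longitude.

```r
# Inverse distance weighting (Shepard's method), base R only.
# stations: data frame with columns x, y, value (hypothetical names)
idw <- function(x0, y0, stations, power = 2) {
  d <- sqrt((stations$x - x0)^2 + (stations$y - y0)^2)
  # a query point exactly on a station returns that station's value
  if (any(d == 0)) return(stations$value[which(d == 0)[1]])
  w <- 1 / d^power                  # closer stations get larger weights
  sum(w * stations$value) / sum(w)  # weighted average of the samples
}

# four made-up stations at the corners of a 10 x 10 square
stations <- data.frame(
  x     = c(0, 10, 0, 10),
  y     = c(0, 0, 10, 10),
  value = c(60, 70, 50, 80)
)

idw(5, 5, stations)  # equidistant from all four stations, so the plain mean: 65
idw(0, 0, stations)  # on top of the first station: 60
idw(1, 1, stations)  # near the first station, so only slightly above 60
```

A larger power makes the nearest stations dominate more strongly, which gives a map with flatter areas around each station and sharper transitions between them.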

Visual Variables:

visual variables for qualitative phenomena should reflect only a nominal level of measurement - i.e. there shouldn't be a sense that one value is 'more' than another, just that they are different.

Visual Variables for Qualitative Phenomena
Figure 5.4 from Thematic Cartography

so for example as I look at the areal visualization I don't (and I shouldn't) get a sense of which area is higher or lower, or has more cows, etc.

Why are 2.5D representations not recommended for qualitative phenomena?

note that only different hues are used for qualitative data - not saturation or lightness which have an obvious ordering

much of this work was done at a time when a line plotter was the tool to make drawings like this, giving very high resolution vector black and white drawing capability. Today those maps still exist but more work is done on bitmapped displays with lower resolution but a greater use of colour.

Here is a more appropriate use of a wide variety of colors showing data from Facebook on the favorite American Football teams for various counties, though not without its problems.

On the other hand, visual variables for quantitative phenomena should reflect an ordinal, interval, or ratio level of measurement

Visual Variables for Quantitative Phenomena

so for example as I look at the areal visualization I do get a sense of which area is higher or lower.

here is an example of different ways of using colour to map life expectancy in the US which is quantitative. Which is more readable? the first from mapoftheunitedstates.org or the second from www.measureofamerica.org

So, let's try an in-class activity to make this more clear. For that we will take a look at mapping the data for the winning countries of the Eurovision Song Contest from 1956 to the present - https://en.wikipedia.org/wiki/Eurovision_Song_Contest

Here are the totals:

Wins Countries
7 Ireland
6 Sweden
5 France, Luxembourg, United Kingdom, Netherlands
4 Israel
3 Norway, Denmark
2 Spain, Switzerland, Italy, Germany, Austria, Ukraine
1 Monaco, Belgium, Yugoslavia, Estonia, Latvia, Turkey, Greece, Finland, Serbia, Russia, Azerbaijan, Portugal
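If you are doing the activity on a computer, the totals above can be turned into one row per country, which is the shape you need to join onto map data. This is just a sketch in base R; the names wins_table and eurovision are made up for illustration.

```r
# The totals from the table above, keyed by number of wins
wins_table <- list(
  "7" = c("Ireland"),
  "6" = c("Sweden"),
  "5" = c("France", "Luxembourg", "United Kingdom", "Netherlands"),
  "4" = c("Israel"),
  "3" = c("Norway", "Denmark"),
  "2" = c("Spain", "Switzerland", "Italy", "Germany", "Austria", "Ukraine"),
  "1" = c("Monaco", "Belgium", "Yugoslavia", "Estonia", "Latvia", "Turkey",
          "Greece", "Finland", "Serbia", "Russia", "Azerbaijan", "Portugal")
)

# One row per country, repeating each win count once per country in its group
eurovision <- data.frame(
  country = unlist(wins_table, use.names = FALSE),
  wins    = rep(as.integer(names(wins_table)), times = lengths(wins_table))
)

nrow(eurovision)      # 27 countries
sum(eurovision$wins)  # 67 wins in total
```

Note that a name like Yugoslavia will have no match on a current map, so it is worth checking the country names against whatever map data you use before joining.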

and there is a current map of Europe available here:

You can use a computer/tablet to create a visualization with the map, or print out the map and use colored pencils, or some other visualization primitives and then take a photo of the map, convert it to pdf and add it to your gradescope submission for the week.

You will find that even with small datasets it can be hard:
- some locations / countries / streets / buildings may no longer exist
- some locations may be very small and harder to shade, color, or add glyphs to
- some locations may be off the map

Comparison of choropleth, proportional symbol, isopleth, and dot mapping:

choropleth
  • commonly used to portray data collected for units such as counties or states
  • regions are shaded / colored based on the phenomenon
  • good when values change abruptly at unit boundaries, but hides variation within units, and the boundaries may be artificial in relation to the phenomenon

isopleth (contour map)
  • good when data collected was from a smooth continuous phenomenon
  • regions are shaded / colored based on the phenomena
  • interpolating set of isolines between sample points of known values

proportional symbol
  • scale symbols in proportion to the magnitude of the data
  • symbol might be a true point (located at a data collection point) or a conceptual point (at the center of a unit)
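As a rough sketch of what scaling in proportion can mean for circles: if the circle area is meant to carry the value, then the radius should grow with the square root of the value. The function name symbol_radius and the numbers are made up for illustration.

```r
# Scale circle radii so that circle AREA is proportional to the data value.
# max_radius is the radius given to the largest value (arbitrary drawing units).
symbol_radius <- function(value, max_radius = 10) {
  max_radius * sqrt(value / max(value))
}

values <- c(25, 100, 400)  # made-up data values
symbol_radius(values)      # 2.5 5.0 10.0
```

Scaling the radius directly by the value would make the largest circle here 16 times as wide and 256 times the area of the smallest, even though the value is only 16 times larger.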

dot mapping
  • one dot is set equal to a certain amount of the phenomenon
  • dots should be placed where the phenomenon occurs (much higher level of accuracy than other maps)

Thematic Mapping
Figure 5.10 from Thematic Cartography

R has a variety of nice libraries and available data to help with this kind of thing. The following code reads in data on counties in the US as well as a file of data to map onto those counties (in this case the number of electric vehicles, the population, and the number of passenger cars registered in each Illinois county). Since the county names match in both files it's pretty simple to join them together and then display the results as a couple of different choropleth maps.

Create a new Jupyter notebook for this activity and when you are done print out that notebook and add it to your gradescope submission for the week.

# example of mapping data onto illinois counties
# based on example from https://people.ohio.edu/ruhil/Rbook/maps-in-r.html

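# you should start by installing some packages: rgeos, ggmap, mapdata, maptools, ggthemes
# the calls used below come from ggplot2, maps, stringr, sp, plyr, and ggthemes,
# so those need to be installed and loaded

library(ggplot2)   # ggplot(), geom_polygon(), map_data()
library(maps)      # county outlines used by map_data()
library(stringr)   # str_to_title()
library(sp)        # Polygon()
library(plyr)      # join()
library(ggthemes)  # theme_map()
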
usa <- map_data("county")
il <- subset(usa, region == "illinois")

il$county = str_to_title(il$subregion)

#basic map with county boundaries and county names at the centroid of the county
getLabelPoint <- # Returns a county-named list of label points
    function(county) {Polygon(county[c('long', 'lat')])@labpt}
centroids = by(il, il$county, getLabelPoint)     # Returns list
centroids2 <- do.call("rbind.data.frame", centroids)  # Convert to Data Frame
centroids2$county = rownames(centroids)
names(centroids2) <- c('clong', 'clat', "county")

#simple map with county borders and county names
#ggplot() + geom_polygon(data = il, aes(x = long, y = lat, group = group), fill = "white", color = "gray") + coord_fixed(1.2)  + geom_text(data = centroids2, aes(x = clong, y = clat, label = county), color = "darkblue", size = 2.25)  + theme_map()

# read in data on the number of electric vehicles registered in each county
evs <- read.table(file="http://www.evl.uic.edu/aej/424/EVs%20in%20IL.csv", sep=",", header=TRUE)

# under windows the first column header gets corrupted - this fixes it
names(evs)[1] <- 'county'

# as usual here you should take a look at the data and see if the numbers make sense

#combine the county data and the EV data - they have the 'county' attribute in common
# join keeps the original ordering where merge does not
ilCountyPlusEV <- join(il, evs, by = "county")

#plot the population per county
ggplot() + geom_polygon(data = ilCountyPlusEV, aes(x = long, y = lat, group = group, fill = population), color = "black") + coord_fixed(1.2) +
    geom_text(data = centroids2, aes(x = clong, y = clat, label = county), color = "black", size = 2.25) + scale_fill_distiller(palette = "Oranges") +
    labs(fill = "population") + theme_map()

#plot the percentage of cars to people in each county
ggplot() + geom_polygon(data = ilCountyPlusEV, aes(x = long, y = lat, group = group, fill = percent_cars), color = "black") + coord_fixed(1.2) +
    geom_text(data = centroids2, aes(x = clong, y = clat, label = county), color = "black", size = 2.25) + scale_fill_distiller(palette = "Oranges") +
    labs(fill = "car %") + theme_map()

#plot the total number of EVs per county
ggplot() + geom_polygon(data = ilCountyPlusEV, aes(x = long, y = lat, group = group, fill = evs), color = "black") + coord_fixed(1.2) +
    geom_text(data = centroids2, aes(x = clong, y = clat, label = county), color = "black", size = 2.25) + scale_fill_distiller(palette = "Oranges") +
    labs(fill = "# of EVs") + theme_map()

#plot the percentage of cars that are EVs in the county
ggplot() + geom_polygon(data = ilCountyPlusEV, aes(x = long, y = lat, group = group, fill = percent_cars_evs), color = "black") + coord_fixed(1.2) +
    geom_text(data = centroids2, aes(x = clong, y = clat, label = county), color = "black", size = 2.25) + scale_fill_distiller(palette = "Oranges") +
    labs(fill = "EV %") + theme_map()

#There are better (more complicated) ways to use color here
#but this should give you a starting point for what you can do
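One of those other ways is to group the values into a small number of classes before colouring, so a few very large counties do not wash out everyone else. Below is a minimal sketch of quantile classing in base R; the population numbers are made up and are not the values in the EV file.

```r
# Made-up county populations, skewed the way real county populations tend to be
population <- c(5000, 12000, 15000, 30000, 52000, 180000, 5200000)

# Four classes with roughly equal numbers of counties in each
breaks    <- quantile(population, probs = seq(0, 1, length.out = 5))
pop_class <- cut(population, breaks = breaks, include.lowest = TRUE, dig.lab = 10)

table(pop_class)       # how many counties fall into each class
as.integer(pop_class)  # 1 1 2 2 3 4 4
```

A classed column like this could then be used as the fill in the maps above with a discrete scale such as scale_fill_brewer(palette = "Oranges") in place of scale_fill_distiller.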

Choropleth of car ownership in Illinois counties

Here are some other ways of displaying data onto geographic entities from Information Graphics -  A Comprehensive Illustrated Reference. There are many ways to do it, but all require having enough space.


There are also pictographic symbols

Here it's pretty easy to make rough comparisons, but it's tricky to make exact comparisons as it's hard to avoid the lie factor. Even avoiding that, we have to be careful about the conclusions drawn from an image like this. The states have very different populations (as we talked about last week comparing Wyoming to California) and population is not directly correlated to the size of the state. One issue directly related to the size of the states is the overlap of the icons in the northeast, which makes it hard to make any sense of the data there.

Here is an image showing crime statistics compared to the average over time for the US from CommonGIS via the Thematic Cartography and Geovisualization book showing why this can be tricky ...

Here are some variations of the typical red/blue election map for the 2008 presidential election

and a different way to view election data using dots by John Nelson

and a 2019 solution to the problem by making each US congressional district the same size and trying to keep the states in the correct shape and relative position to each other, though the locations of the districts themselves within each state are only correct relative to each other. In this case geography is distorted to make areas of roughly equal population more clear.

When looking at much larger regions of the planet the issues are a bit more complex. A 'flat' map can be a good way to see data from all over the planet simultaneously, but it does add in distortions as the earth is sphere-ish, and trying to represent a sphere, or even a portion of it, on a rectangular plane will generate errors. Other tools like Google Earth can be used to map data onto a spherical Earth model, but then there are issues of only being able to see part of the planet at one time.

XKCD has a nice overview of some of the most common map projection types

and Wikipedia has an extensive list - https://en.wikipedia.org/wiki/List_of_map_projections

As with many things in visualization, all of the options lie to some degree, and the 'best' solutions require some hand-crafted work to compromise between the various different 'mathematical' solutions.

First some definitions:

the Earth (which is close to being a sphere) rotates about its axis of rotation, which passes through the North and South Poles (and note that the geographic North Pole is not in the same place as the North Magnetic Pole)

We can place a plane halfway between the North Pole and the South Pole and perpendicular to that axis. Where that plane intersects the surface of the Earth we have the Equator allowing us to split the planet into the Northern Hemisphere and the Southern Hemisphere.

Any point on the Earth's surface can be given by its Latitude and Longitude. These two angles are measured in degrees, minutes, and seconds: each degree is divided into 60 minutes (') and each minute into 60 seconds (").

Lines of latitude (parallels) are parallel to each other and the equator. The North Pole is 90 degrees North or +90. The equator is 0. The South Pole is 90 degrees South or -90 degrees.

Lines of longitude (meridians) run from pole to pole, so they are not parallel to each other. Whereas the equator makes a nice 0 point for latitude, there is no obvious 0 point for longitude, so the Prime Meridian is declared to run through the Royal Observatory in Greenwich, England. On the opposite side of the planet from the Prime Meridian the longitude is 180 degrees east, or +180 degrees, which is the same line as 180 degrees west, or -180 degrees, and it is mostly where the International Date Line is chosen to exist. The US is west of the Prime Meridian, so longitudes here are given in degrees West (negative values).

Here in Chicago we are at 41 degrees, 52 minutes, 13 seconds North and 87 degrees, 38 minutes, and 51 seconds West
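Most software wants these angles as signed decimal degrees instead, with North and East positive and South and West negative. Here is a small sketch of the conversion in base R using the Chicago position above; the function name dms_to_decimal is made up.

```r
# Convert degrees / minutes / seconds plus a hemisphere letter to signed decimal degrees
dms_to_decimal <- function(degrees, minutes, seconds, hemisphere) {
  sign <- ifelse(hemisphere %in% c("S", "W"), -1, 1)  # South and West are negative
  sign * (degrees + minutes / 60 + seconds / 3600)
}

lat <- dms_to_decimal(41, 52, 13, "N")
lon <- dms_to_decimal(87, 38, 51, "W")
round(c(lat, lon), 4)  # 41.8703 -87.6475
```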

another common system in use is UTM https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system
and a big map of them here: https://upload.wikimedia.org/wikipedia/commons/e/ed/Utm-zones.jpg
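The UTM zones are 6 degrees of longitude wide and numbered 1 to 60 starting at 180 degrees west, so the zone number can be sketched from the longitude alone. This simple version ignores the special-case zones around Norway and Svalbard, and the function name utm_zone is made up.

```r
# UTM zone number from longitude in signed decimal degrees (-180 to +180)
utm_zone <- function(lon) {
  # pmin keeps +180 in zone 60 rather than a non-existent zone 61
  pmin(floor((lon + 180) / 6) + 1, 60)
}

utm_zone(-87.6475)            # Chicago: 16
utm_zone(c(-180, 0, 179.9))   # 1 31 60
```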

Data related to the planet also comes referenced in multiple ways. Some data will be in feet / meters / miles / kilometers from a known 0,0 point, some will be given as Latitude, Longitude, some in UTM coordinates. All of them may need to be combined to integrate the data.

issues of  different map projections - https://en.wikipedia.org/wiki/Map_projection

Here is a nice example from FlowingData showing the true size of Africa - https://flowingdata.com/2010/10/18/true-size-of-africa/

and a nice interactive page letting you move countries around - https://thetruesize.com
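To put a number on this kind of distortion, here is a sketch assuming the Mercator projection (an assumption on my part, as it is the projection these size comparisons are usually aimed at). On a Mercator map, lengths near latitude φ are stretched by 1/cos(φ), so areas are stretched by 1/cos(φ)^2. The function name mercator_area_factor is made up.

```r
# How many times larger a small area is drawn on a Mercator map
# compared with the same area drawn at the equator
mercator_area_factor <- function(lat_deg) {
  1 / cos(lat_deg * pi / 180)^2
}

mercator_area_factor(c(0, 60))  # 1 4
```

So a region at 60 degrees north is drawn about four times too large relative to one on the equator, which is why countries near the equator look smaller than they should next to countries far to the north.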

Today Google Maps and Google Earth are common platforms for distributing geospatial information about the earth.

Here is a map from the LA Times that was updated regularly during the Los Angeles 'Station Fire' in August 2009 to show where the fire was believed to be, where it seemed to be headed, and where important places in the news were located. It's not an overly professional job but it works really well to give current information about a fast changing news story.

During the LA 'Station Fire' this map was used to give hourly air quality reports showing how the effect of the fires reached far beyond their immediate area.

Cities like Chicago are making a fair amount of data available to the public that can be overlaid onto maps of the city

including crime

abandoned vehicles

grocery stores

Coming Next Time

Project 1 Presentations

last revision 12/15/2020