Week 5

Geospatial Visualization



A few examples to start  ...

The NY Times' Mapping America was a nice interactive that let you visualize census data using colored points.

https://www.nytimes.com/interactive/2015/07/08/us/census-race-map.html

In the case of the first image below each dot represented 200 people, the color gave racial information, and the dot was located within the census tract for those 200 people. Instead of coloring a census tract by the majority, this gives a more nuanced view of the data, making it easier to see areas that are more and less mixed, and areas that are more and less populated.
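To make the dot idea concrete, here is a minimal sketch in R. The tract names, populations, and group labels are invented for illustration, and each tract is treated as a simple rectangle; a real map would need to place the dots inside the actual tract polygons.

```r
# Hypothetical tracts: a bounding box plus population counts for two groups
tracts <- data.frame(
  tract = c("A", "B"),
  xmin = c(0, 1), xmax = c(1, 2),
  ymin = c(0, 0), ymax = c(1, 1),
  group1 = c(4000, 600),
  group2 = c(1000, 2400)
)

people_per_dot <- 200

# Build one row per dot: a random position inside the tract, labelled by group
make_dots <- function(tracts, groups, people_per_dot) {
  pieces <- list()
  for (i in seq_len(nrow(tracts))) {
    for (g in groups) {
      n <- round(tracts[i, g] / people_per_dot)
      if (n == 0) next
      pieces[[length(pieces) + 1]] <- data.frame(
        tract = tracts$tract[i],
        group = g,
        x = runif(n, tracts$xmin[i], tracts$xmax[i]),
        y = runif(n, tracts$ymin[i], tracts$ymax[i])
      )
    }
  }
  do.call(rbind, pieces)
}

set.seed(1)
dots <- make_dots(tracts, c("group1", "group2"), people_per_dot)

# Count of dots per tract and group, then a quick scatter of the dots
table(dots$tract, dots$group)
plot(dots$x, dots$y, pch = 16,
     col = ifelse(dots$group == "group1", "blue", "orange"))
```

Because the dots are placed at random within each tract, the exact positions carry no meaning; only the density and the mix of colors do.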

The application also lets you pan and zoom across all of the US, including Alaska, Hawaii, and Puerto Rico, giving you a sense of the population density and the racial makeup of different regions at different scales.


here is a nice map of cell phone strength
https://webcoveragemap.rootmetrics.com/us

What can we say about the people in an area - you can enter a zip code to explore an area using their tool.
https://www.esri.com/en-us/arcgis/products/tapestry-segmentation/zip-lookup


Visual contrasts established by manipulating perceptual qualities

the following are retinal variables - perceived immediately and effortlessly - fundamental units of visual communication

retinal variables


Information represented in a visual display is characterized by


Let's do an example. For each of the two projects below write what you think the length and scale are for each dimension in a word processing file. Print the file and add it to your gradescope submission for the week. Note that for many data visualization tasks the data will constantly be increasing as more is collected, so it's important to think not only of the values currently in the dataset, but also of the values that are likely to be in the dataset as it grows.


The data from Project 1 in 2020, which can be found at https://www.evl.uic.edu/aej/424/litterati challenge-65.csv , included the following 11 dimensions. Take a few minutes and decide on the length and scale of each of these dimensions.


The data from Project 1 in 2019, available at https://aqs.epa.gov/aqsweb/airdata/annual_aqi_by_county_2019.zip , included the following 19 dimensions. Take a few minutes and decide on the length and scale of each of these dimensions.
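Once a file is loaded into R you can get a rough first pass at this automatically. The sketch below takes 'length' to mean the number of distinct values a dimension currently has, and guesses the scale from how R stores the column. The data frame and its column names are invented stand-ins, not the columns of either project file, and the guess is only a starting point: a zip code is stored as a number but is really nominal.

```r
# Invented sample rows; the real project files have different columns
obs <- data.frame(
  tag = c("plastic", "paper", "plastic", "can", "paper"),
  rating = factor(c("low", "high", "medium", "low", "high"),
                  levels = c("low", "medium", "high"), ordered = TRUE),
  latitude = c(41.87, 41.88, 41.87, 41.86, 41.89),
  stringsAsFactors = FALSE
)

# First guess at the scale of a column from how R stores it
guess_scale <- function(column) {
  if (is.ordered(column)) {
    "ordinal"
  } else if (is.numeric(column)) {
    "quantitative"
  } else {
    "nominal"
  }
}

# One row per dimension: number of distinct values and guessed scale
summary_table <- data.frame(
  dimension = names(obs),
  length = sapply(obs, function(column) length(unique(column))),
  scale = sapply(obs, guess_scale),
  row.names = NULL
)
print(summary_table)
```

Remember that the counts only describe the data collected so far; the length you design for should allow for the values that will show up later.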


We can look back at the MicrobeScope example. Here all of the points have the same Size, but there are different Shapes (circles, diamonds, triangles) and different Hues (dark blue, light blue, pink, etc.) and they are in different Positions in the chart.

Here microbe type (shape) and primary transmission method (hue) are nominal / categorical.
Deadliness, Contagiousness, and the other X and Y axis options are quantitative


MicrobeScope

Nominal - User interested in categorizing




In ordered perception the viewer must determine the relative ordering of values along a perceptual dimension. Given any two visual elements, a natural ordering must be clearly apparent so the element representing 'more' of the corresponding quality is immediately obvious


In quantitative perception the viewer must determine the amount of difference between two ordered values. The user does not need to refer to an index or key - the relative magnitudes must be immediately apparent


Visual variables differ substantially in length:




here are some more examples from our textbook:




The Principles of Symbolization chapter from Thematic Cartography and Geovisualization, 3rd ed. by Slocum, McMaster, Kessler, and Howard gives a nice introduction on mapping data to symbols so we will use several examples from it below.



A lot of data today can be represented geographically, as the popularity of Google Maps / Earth can attest, so it's a nice place to start looking at details.


Nature of Geographic Phenomena:

Spatial Dimension


Discrete vs Continuous and Abrupt vs Smooth Phenomena
    discrete - occur at distinct locations (and have a space between them)
    continuous - occur throughout a region of interest

    abrupt - can change suddenly
    smooth - change gradually

Discrete-Continuous / Abrupt-Smooth Phenomena
Figure 5.1 from Thematic Cartography showing phenomena and appropriate ways of representing them


Distinction between data that has been collected to represent a phenomenon and the phenomenon being mapped
We are typically collecting data at discrete sites (weather stations, well sites) or aggregating over small regions (counties, states) where the actual phenomenon being modeled (e.g. the temperature outside) is continuous. Other times we are collecting discrete data on a discrete phenomenon (e.g. electricity usage at this address).

Type of visualization used depends both on the nature of the underlying phenomenon and the purpose of the map


We often deal with continuous data represented by discrete sampling

A familiar example is a weather map showing the current temperature across the state or country. The data is only sampled at certain scattered stations and is then interpolated between them. You can click on the map to gain access to the data files and to see how the data is interpolated across the state. Here are the FAA sites in Illinois https://www.faa.gov/air_traffic/weather/asos/?state=IL


and the Weather Underground https://www.wunderground.com/wundermap sites near campus

Interpolation is then used to predict the values in between, with a variety of possible methods.

Shepard's method (inverse distance weighting) is one way to perform that interpolation - https://en.wikipedia.org/wiki/Inverse_distance_weighting
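Here is a minimal sketch of inverse distance weighting in R. Each estimate is a weighted average of the station values, where the weight of a station is 1 / distance^power. The station positions and readings are made up, the power of 2 is just a common choice, and the coordinates are treated as flat x/y values; latitude / longitude data would need to be projected first or use a great-circle distance.

```r
# Inverse distance weighting (Shepard's method) at a single query point
idw <- function(x, y, stations, power = 2) {
  d <- sqrt((stations$x - x)^2 + (stations$y - y)^2)
  # A query exactly on a station returns that station's value
  if (any(d == 0)) return(stations$value[which(d == 0)[1]])
  w <- 1 / d^power
  sum(w * stations$value) / sum(w)
}

# Hypothetical stations on a flat x/y grid with temperature readings
stations <- data.frame(
  x = c(0, 10, 0, 10),
  y = c(0, 0, 10, 10),
  value = c(60, 70, 65, 75)
)

idw(5, 5, stations)   # equidistant from all four stations: the plain mean
idw(0, 0, stations)   # exactly at a station: that station's value

# Fill in a regular grid of estimates and draw contours from it
gx <- seq(0, 10, by = 1)
gy <- seq(0, 10, by = 1)
grid <- outer(gx, gy, Vectorize(function(x, y) idw(x, y, stations)))
contour(gx, gy, grid)
```

Since every estimate is a weighted average, the interpolated surface can never go above the highest station reading or below the lowest one, which is one of the known limitations of this method.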

Air quality data usually changes slowly but if you live in Hawaii near a volcano or live in an area prone to forest fires, the values may change much faster and may regularly impact your life, so dashboards that show the value now and predict it into the future, like we typically predict rain on maps, can be very useful.

This map is nice as you can see the values for the individual stations and the contours generated from that data, which also helps to see how the contours are generated from the individual data locations.

https://gispub.epa.gov/airnow/



Visual Variables:


visual variables for qualitative phenomena should reflect only a nominal level of measurement - i.e. there shouldn't be a sense that one value is 'more' than another, just that they are different.


Visual Variables for Qualitative Phenomena
Figure 5.4 from Thematic Cartography

so for example as I look at the areal visualization I don't (and I shouldn't) get a sense of which area is higher or lower, or has more cows, etc.

Why are 2.5D representations not recommended for qualitative phenomena?

note that only different hues are used for qualitative data - not saturation or lightness which have an obvious ordering
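One way to see the difference is to build both kinds of palette yourself. The sketch below uses R's built-in hcl() function, which lets you set hue, chroma, and luminance separately; the specific numbers are arbitrary choices for illustration.

```r
n <- 5

# Qualitative: vary hue only, keep chroma and luminance fixed
qualitative <- hcl(h = seq(0, 360, length.out = n + 1)[1:n], c = 60, l = 70)

# Sequential: keep hue fixed, step luminance from light to dark
sequential <- hcl(h = 30, c = 60, l = seq(90, 30, length.out = n))

# Draw both as strips of swatches to compare them
barplot(rep(1, n), col = qualitative, axes = FALSE, main = "hue only")
barplot(rep(1, n), col = sequential, axes = FALSE, main = "lightness only")
```

In the first strip none of the colors should read as 'more' than another, while the second strip has an obvious order from light to dark.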

much of this work was done at a time when a line plotter was the tool to make drawings like this, giving very high resolution vector black and white drawing capability. Today those maps still exist but more work is done on bitmapped displays with lower resolution but a greater use of colour.


Here is a more appropriate use of a wide variety of colors showing data from Facebook on the favorite American Football teams for various counties, though not without its problems.





On the other hand, visual variables for quantitative phenomena should reflect an ordinal, interval, or ratio level of measurement

Visual Variables for Quantitative Phenomena

so for example as I look at the areal visualization I do get a sense of which area is higher or lower.

here is an example of different ways of using colour to map life expectancy in the US which is quantitative. Which is more readable? the first from mapoftheunitedstates.org or the second from www.measureofamerica.org


So, let's try an in-class activity to make this more clear, and for that we will take a look at mapping the data for the winning countries of the Eurovision Song Contest from 1956 to the present - https://en.wikipedia.org/wiki/Eurovision_Song_Contest

Here are the totals:

  
Wins  Countries
7     Ireland
6     Sweden
5     France, Luxembourg, United Kingdom, Netherlands
4     Israel
3     Norway, Denmark, Italy
2     Spain, Switzerland, Germany, Austria, Ukraine
1     Monaco, Belgium, Yugoslavia, Estonia, Latvia, Turkey, Greece, Finland, Serbia, Russia, Azerbaijan, Portugal

and there is a current map of Europe available here:

You can use a computer/tablet to create a visualization with the map, or print out the map and use colored pencils, or some other visualization primitives and then take a photo of the map, convert it to pdf and add it to your gradescope submission for the week.

You will find that even with small datasets it can be hard
- some locations / countries / streets / buildings may no longer exist
- some locations may be very small and harder to shade, color, or add glyphs to

- some locations may be off the map
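If you go the computer route, one possible starting point in R is to type in the totals from the table above and bin them into a few ordered classes. This is only a sketch: the three class breaks and the two orange end colors are arbitrary choices made for illustration, and you may well decide on a different classing or a different visual variable altogether.

```r
# Totals copied from the table above
wins <- c(Ireland = 7, Sweden = 6,
          France = 5, Luxembourg = 5, `United Kingdom` = 5, Netherlands = 5,
          Israel = 4,
          Norway = 3, Denmark = 3, Italy = 3,
          Spain = 2, Switzerland = 2, Germany = 2, Austria = 2, Ukraine = 2,
          Monaco = 1, Belgium = 1, Yugoslavia = 1, Estonia = 1, Latvia = 1,
          Turkey = 1, Greece = 1, Finland = 1, Serbia = 1, Russia = 1,
          Azerbaijan = 1, Portugal = 1)

# Three ordered classes; the break points are an arbitrary choice
classes <- cut(wins, breaks = c(0, 1, 4, 7),
               labels = c("1 win", "2-4 wins", "5-7 wins"))

# Light-to-dark ramp so that darker reads as 'more'
ramp <- colorRampPalette(c("#FEE6CE", "#E6550D"))(nlevels(classes))

eurovision <- data.frame(country = names(wins), wins = unname(wins),
                         class = classes, color = ramp[as.integer(classes)])

# How many countries fall into each class
table(eurovision$class)
```

Note that Yugoslavia is in the data but will not be on a current map, which is exactly the first problem in the list above.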




Comparison of choropleth, proportional symbol, isopleth, and dot mapping:


choropleth
  • commonly used to portray data collected for units such as counties or states
  • regions are shaded / colored based on the phenomenon
  • good for when values change abruptly at unit boundaries, but it hides variation within units, and the boundaries may be artificial in relation to the phenomenon

isopleth (contour map)
  • good when data collected was from a smooth continuous phenomenon
  • regions are shaded / colored based on the phenomenon
  • interpolating set of isolines between sample points of known values

proportional symbol
  • scale symbols in proportion to the magnitude of the data
  • symbol might be a true point (located at a data collection point) or a conceptual point (at the center of a unit)

dot mapping
  • one dot is set equal to a certain amount of the phenomenon
  • dots should be placed where the phenomenon occurs (much higher level of accuracy than other maps)

Thematic Mapping Techniques
Figure 5.10 from Thematic Cartography
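For proportional symbols, one detail that is easy to get wrong is what 'in proportion' applies to. If the radius of a circle is scaled directly by the data value, the area grows with the square of the value and exaggerates the differences. A common approach, sketched below in R with made-up locations and counts, is to scale the radius by the square root of the value so that the area is proportional to the data.

```r
# Radius chosen so that symbol AREA is proportional to the data value
symbol_radius <- function(value, max_radius = 1) {
  max_radius * sqrt(value / max(value))
}

# Hypothetical counts at four locations
cities <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 1, 2),
                     count = c(100, 400, 900, 1600))
cities$radius <- symbol_radius(cities$count, max_radius = 0.4)

# symbols() draws circles with radii in x-axis units when inches = FALSE
symbols(cities$x, cities$y, circles = cities$radius, inches = FALSE,
        bg = "orange", xlim = c(0, 5), ylim = c(0, 3))
```

Here the largest count is 16 times the smallest, and its circle has 16 times the area but only 4 times the radius.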



R has a variety of nice libraries and available data to help with this kind of thing. The following code reads in data on counties in the US as well as a file of data to map onto those counties (in this case number of electric vehicles, population, and number of passenger cars registered in each Illinois county). Since the county names match in both files it's pretty simple to join them together and then display the results as a couple of different choropleth maps.

Create a new Jupyter notebook for this activity and when you are done print out that notebook and add it to your gradescope submission for the week.

you should start by installing some packages: ggmap, mapdata, ggthemes, sp

depending on your platform you may be able to do this from within the Jupyter notebook itself with the typical R install.packages("whatever") command or you may need to go back to anaconda navigator, click on Environments, (whatever you named your R environment back in week 2), then Open Terminal, and then issue the following commands at the terminal

(in general anaconda comes with a bunch of essential libraries pre-loaded but you can add others.  if you go to anaconda.org you can use SEARCH PACKAGES to search for packages (e.g. ggmap) and get a list of packages with that name. Clicking on the most popular one takes you to a page with a set of conda install commands - usually the first simple one is all you need)

conda install -c conda-forge r-ggmap
conda install -c conda-forge r-mapdata
conda install -c conda-forge r-ggthemes
conda install -c conda-forge r-sp

These commands have usually worked for me, but if I have loaded some other unusual packages into anaconda, conda can hit incompatibilities that it has a hard time resolving. In those cases I have found it simpler to start over with a new R environment (the notes from week 2) and then install the packages above.


#
# example of mapping data onto Illinois counties
# based on example from https://people.ohio.edu/ruhil/Rbook/maps-in-r.html

library(ggplot2)
library(ggmap)
library(maps)
library(mapdata)
library(ggthemes)
library(sp)
library(stringr)
library(plyr)

usa <- map_data("county")
il <- subset(usa, region == "illinois")

il$county = str_to_title(il$subregion)

#basic map with county boundaries and county names at the centroid of the county
# getLabelPoint returns the label point (computed by sp) for one county's polygon
getLabelPoint <- function(county) {Polygon(county[c('long', 'lat')])@labpt}
centroids = by(il, il$county, getLabelPoint)     # Returns list
centroids2 <- do.call("rbind.data.frame", centroids)  # Convert to Data Frame
centroids2$county = rownames(centroids)
names(centroids2) <- c('clong', 'clat', "county")

#simple map with county borders and county names
ggplot() + geom_polygon(data = il, aes(x = long, y = lat, group = group), fill = "white", color = "gray") + coord_fixed(1.2)  + geom_text(data = centroids2, aes(x = clong, y = clat, label = county), color = "darkblue", size = 2.25)  + theme_map()

# read in data on the number of electric vehicles registered in each county
evs <- read.table(file="http://www.evl.uic.edu/aej/424/EVs_in_IL_2021.csv", sep=",", header=TRUE)

# under windows the first column header gets corrupted - this fixes it
names(evs)[1] <- 'county'

# as usual here you should take a look at the data and see if the numbers make sense

#combine the county data and the EV data - they have the 'county' attribute in common
# join keeps the original ordering where merge does not
ilCountyPlusEV <- join(il, evs, by = "county")

#plot the population per county
ggplot() + geom_polygon(data = ilCountyPlusEV, aes(x = long, y = lat, group = group, fill = population), color = "black") + coord_fixed(1.2) +
    geom_text(data = centroids2, aes(x = clong, y = clat, label = county), color = "black", size = 2.25) + scale_fill_distiller(palette = "Oranges", direction=1) +
    labs(fill = "population") + theme_map()

#plot the percentage of cars to people in each county
ggplot() + geom_polygon(data = ilCountyPlusEV, aes(x = long, y = lat, group = group, fill = percent_cars), color = "black") + coord_fixed(1.2) +
    geom_text(data = centroids2, aes(x = clong, y = clat, label = county), color = "black", size = 2.25) + scale_fill_distiller(palette = "Oranges", direction=1) +
    labs(fill = "car %") + theme_map()

#plot the total number of EVs per county
ggplot() + geom_polygon(data = ilCountyPlusEV, aes(x = long, y = lat, group = group, fill = evs), color = "black") + coord_fixed(1.2) +
    geom_text(data = centroids2, aes(x = clong, y = clat, label = county), color = "black", size = 2.25) + scale_fill_distiller(palette = "Oranges", direction=1) +
    labs(fill = "# of EVs") + theme_map()

#plot the percentage of cars that are EVs in the county
# (direction=1 maps larger values to darker colors - use direction=-1 to reverse the scale)
ggplot() + geom_polygon(data = ilCountyPlusEV, aes(x = long, y = lat, group = group, fill = percent_cars_evs), color = "black") + coord_fixed(1.2) +
    geom_text(data = centroids2, aes(x = clong, y = clat, label = county), color = "black", size = 2.25) + scale_fill_distiller(palette = "Oranges", direction=1) +
    labs(fill = "EV %") + theme_map()

#There are better (more complicated) ways to use color here
#details at
# https://ggplot2.tidyverse.org/reference/scale_brewer.html
#but this should give you a starting point for what you can do


Choropleth of car ownership in Illinois counties

after this, try and show the counties in a different US State or show another set of countries as in this web page https://www.datanovia.com/en/blog/how-to-create-a-map-using-ggplot2/

this makes it really easy to map data onto filled county / state / country regions, though as we saw above, choropleth maps are not always the most intuitive representation.
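When you try this with another state or your own data, the most likely problem is that the names in the two files do not quite match (spacing, capitalization, abbreviations), which leaves holes in the map. Here is a small sketch of how you might check before joining. The two tiny data frames and the spelling mismatch in them are invented for illustration, and base R merge is used here instead of plyr's join just to keep the example self-contained.

```r
# Hypothetical map keys and data keys; note the spelling mismatch
map_counties <- data.frame(county = c("Cook", "De Kalb", "Lake"),
                           polygon_id = 1:3)
data_counties <- data.frame(county = c("Cook", "DeKalb", "Lake"),
                            evs = c(100, 5, 20))

# Names present in one table but missing from the other
setdiff(map_counties$county, data_counties$county)
setdiff(data_counties$county, map_counties$county)

# A left join keeps every map row; rows with no match get NA
joined <- merge(map_counties, data_counties, by = "county", all.x = TRUE)
joined[is.na(joined$evs), ]
```

Any row that comes back with NA will be drawn without a fill value, so it is worth fixing the names in one of the files before plotting.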




Here are some other ways of displaying data onto geographic entities from Information Graphics -  A Comprehensive Illustrated Reference. There are many ways to do it, but all require having enough space.


  


There are also pictographic symbols

Pictographic Representation

Here it's pretty easy to make rough comparisons, but it's tricky to make exact comparisons as it's hard to avoid the lie factor. Even avoiding that, we have to be careful about the conclusions drawn from an image like this. The states have very different populations (as we talked about last week comparing Wyoming to California) and population is not directly correlated with the size of the state. One issue directly related to the size of the states is the overlap of the icons in the northeast, which makes it hard to make any sense of the data there.


Here is an image showing crime statistics compared to the average over time for the US from CommonGIS via the Thematic Cartography and Geovisualization book showing why this can be tricky ...




Here are some variations of the typical red/blue election map for the 2008 presidential election
http://www-personal.umich.edu/~mejn/election/2008/

and a different way to view election data using dots by John Nelson
http://uxblog.idvsolutions.com/2012/11/election-2012.html

and a 2019 solution to the problem by making each US congressional district the same size and trying to keep the states in the correct shape and relative position to each other, though the locations of the districts themselves within each state are only correct relative to each other. In this case geography is distorted to make areas of roughly equal population more clear.




When looking at much larger regions of the planet the issues are a bit more complex. A 'flat' map can be a good way to see data from all over the planet simultaneously, but it does add in distortions as the earth is sphere-ish, and trying to represent a sphere, or even a portion of it, on a rectangular plane will generate errors. Other tools like Google Earth can be used to map data onto a spherical Earth model, but then there are issues of only being able to see part of the planet at one time.


XKCD has a nice overview of some of the most common map projection types

and Wikipedia has an extensive list - https://en.wikipedia.org/wiki/List_of_map_projections

As with many things in visualization, all of the options lie to some degree, and the 'best' solutions require some hand-crafted work to compromise between the various different 'mathematical' solutions.


First some definitions:

the Earth (which is close to being a sphere) rotates about its axis of rotation which passes through the North and South Poles (and note the North Pole is nowhere near the North Magnetic Pole)

We can place a plane halfway between the North Pole and the South Pole and perpendicular to that axis. Where that plane intersects the surface of the Earth we have the Equator allowing us to split the planet into the Northern Hemisphere and the Southern Hemisphere.

Any point on the Earth's surface can be given by its Latitude and Longitude. They are measured in degrees, minutes, and seconds. Each degree ° is divided into 60 minutes ' and each minute into 60 seconds ". Any position on the surface of the Earth can be given by these two angles.

Lines of latitude (parallels) are parallel to each other and the equator. The North Pole is 90 degrees North or +90. The equator is 0. The South Pole is 90 degrees South or -90 degrees.

Lines of longitude (meridians) run from pole to pole, so they are not parallel to each other. Whereas the equator makes a natural 0 point for latitude, there is no obvious 0 point for longitude, so the Prime Meridian is declared to run through the Royal Observatory in Greenwich, England. On the opposite side of the planet from the Prime Meridian the longitude is 180 degrees east, or +180 degrees, which is the same line as 180 degrees west, or -180 degrees, and this is mostly where the International Date Line is chosen to exist. The US is west of the Prime Meridian, so its longitudes are negative.

Here in Chicago we are at 41 degrees, 52 minutes, 13 seconds North and 87 degrees, 38 minutes, and 51 seconds West
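Most software wants these values as signed decimal degrees rather than degrees, minutes, and seconds. A small sketch of the conversion in R, assuming the common convention that north and east are positive and south and west are negative (the function name is just one I made up for this example):

```r
# Convert degrees, minutes, seconds to signed decimal degrees.
# North and East are positive; South and West are negative.
dms_to_decimal <- function(degrees, minutes, seconds, hemisphere) {
  sign <- ifelse(hemisphere %in% c("S", "W"), -1, 1)
  sign * (degrees + minutes / 60 + seconds / 3600)
}

lat <- dms_to_decimal(41, 52, 13, "N")
lon <- dms_to_decimal(87, 38, 51, "W")
round(c(lat, lon), 4)   # 41.8703 -87.6475
```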



another common system in use is UTM https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system
and a big map of them here: https://upload.wikimedia.org/wikipedia/commons/e/ed/Utm-zones.jpg

Data related to the planet also comes referenced in multiple ways. Some data will be in feet / meters / miles / kilometers from a known 0,0 point, some will be given as Latitude, Longitude, some in UTM coordinates. All of them may need to be combined to integrate the data.
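As a small example of moving between these systems, UTM divides the planet into 60 zones that are each 6 degrees of longitude wide, so you can work out which zone a point is in from its longitude. The sketch below assumes longitude in decimal degrees with east positive, and it ignores the special-case zones around Norway and Svalbard; converting to full UTM easting / northing values is a job for a projection library.

```r
# UTM zone number for a longitude in decimal degrees (east positive).
# Ignores the special-case zones around Norway and Svalbard.
utm_zone <- function(lon) {
  zone <- floor((lon + 180) / 6) + 1
  pmin(zone, 60)   # lon = 180 would otherwise give zone 61
}

# Central meridian of a zone, in degrees of longitude
utm_central_meridian <- function(zone) {
  zone * 6 - 183
}

utm_zone(-87.6475)         # the Chicago longitude from above: zone 16
utm_central_meridian(16)   # -87
```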

issues of  different map projections - https://en.wikipedia.org/wiki/Map_projection

Here is a nice example from FlowingData showing the true size of Africa - https://flowingdata.com/2010/10/18/true-size-of-africa/

and a nice interactive page letting you move countries around - https://thetruesize.com
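You can put numbers on this kind of distortion. For the Mercator projection, which is one of the projections on the Wikipedia list, areas are inflated by a factor of 1 / cos(latitude)^2, so the further a country is from the equator the bigger it looks relative to its true size. A short sketch in R, using a unit sphere for simplicity:

```r
# Mercator projection of latitude (degrees) to a y coordinate on a unit sphere
mercator_y <- function(lat_deg) {
  phi <- lat_deg * pi / 180
  log(tan(pi / 4 + phi / 2))
}

# Factor by which Mercator inflates areas at a given latitude
mercator_area_scale <- function(lat_deg) {
  1 / cos(lat_deg * pi / 180)^2
}

lats <- c(0, 30, 60, 75)
data.frame(latitude = lats,
           y = round(mercator_y(lats), 3),
           area_scale = round(mercator_area_scale(lats), 2))
```

At 60 degrees north a region is drawn at 4 times its true area compared to the same region at the equator, which is why Africa, sitting across the equator, looks much smaller than it should next to the northern countries.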





Today Google Maps and Google Earth are nice common platforms to distribute geospatial information about the earth.

Here is a map from the LA Times that was updated regularly during the Los Angeles 'Station Fire' in August 2009 to show where the fire was believed to be, where it seemed to be headed, and where important places in the news were located. It's not an overly professional job but it works really well to give current information about a fast changing news story.



During the LA 'Station Fire' this map was used to give hourly air quality reports showing how the effect of the fires reached far beyond their immediate area.




 Cities like Chicago are making a fair amount of data available to the public that can be overlaid onto maps of the city
https://data.cityofchicago.org/browse?limitTo=maps

including crime
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Map/c4ep-ee5m

abandoned vehicles
https://data.cityofchicago.org/Service-Requests/Abandoned-Vehicles-Map/hxh5-e8eh

grocery stores
https://data.cityofchicago.org/Health-Human-Services/Grocery-Store-Status-Map/rish-pa6g


Coming Next Time

Project 1 Presentations


last revision 2/25/2022