Week 4

Information Visualization

First, let's talk about the Project 1 presentations next week.

There are a lot of different ways to visualize different types of information, so we are going to spend some time looking at the variety.

Last week we saw a bunch of examples of how design can get in the way of understanding, but design is not the enemy; it can be very valuable when used well. The goal is to help people relate pieces of data to each other and to things the viewer already understands, which makes new data easier to put into context and makes it more usable and actionable.

Let's start with a nice static visualization of different espressos:

Contents of different types of espresso
previously available at http://www.lokeshdhakar.com/2007/08/20/an-illustrated-coffee-guide/

A related chart with more data, relating caffeine and calories, is The Buzz vs the Bulge.
The Buzz vs the Bulge - 2D chart of caffeine vs calories for different foods and drinks

And how about the growth in the number of Crayola crayon colors, where the number of colors doubles every 28 years? Here color, and the increasing variety of colors, is the focus of the chart.

Growth in the number of colors for Crayola crayons

Let's take a look at some examples of visualizing text:

tagCrowd - http://www.tagcrowd.com/

e.g. a tagCrowd comparison of the 50 most-used words from the first inaugural addresses of several presidents, where words that are said more often are drawn larger:
Kennedy, Nixon, Reagan, Clinton, Bush jr, Obama, Trump

In the case of Kennedy's speech at the upper left, 'power', the most prominent word, was spoken 9 times, while the smallest words in the tag cloud, such as 'earth', were spoken twice, and not all of the words used twice made it into the cloud.
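The counting and sizing behind a tag cloud like this can be sketched in a few lines of Python. This is only an illustration, not how tagCrowd itself is implemented: the stop-word list, the point sizes, the linear scaling, and the sample sentence are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; real tools use a much longer one.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "on", "that", "we", "our"}

def tag_cloud(text, top_n=50, min_pt=10, max_pt=40):
    """Return {word: (count, font_size)} for the top_n most frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(top_n)
    if not top:
        return {}
    hi = top[0][1]            # count of the most frequent word
    lo = top[-1][1]           # count of the least frequent word that made the cut
    span = max(hi - lo, 1)    # avoid dividing by zero when all counts are equal
    # Linear scaling: the least frequent word gets min_pt, the most frequent max_pt.
    return {w: (c, min_pt + (c - lo) * (max_pt - min_pt) / span) for w, c in top}

# Made-up sample text, not taken from any actual speech.
sample = "power to the people; power and peace on earth, peace and power"
cloud = tag_cloud(sample, top_n=3)
for word, (count, size) in cloud.items():
    print(f"{word}: used {count}x, drawn at {size:.0f}pt")
```

With top_n=3 the sample has four candidate words but only three slots, so one of the two words used once is dropped, which is the same cut-off effect described above for Kennedy's speech.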

full texts can be found at: http://www.presidency.ucsb.edu/inaugurals.php

and a site doing similar things with US presidential speeches over time - http://chir.ag/projects/preztags/
This allows a user to brush through these overviews of the text quickly to see what the important issues of the day were.

Ben Fry has a site looking at changes to Darwin's On the Origin of Species through its various editions -

and here was a very nice political one looking at words in the congressional record: http://www.capitolwords.org/congress/111/
here is an archive link: http://web.archive.org/web/20091125090034/http://capitolwords.org/congress/111/

In these cases the visualizations dealt with single words rather than common phrases. Phrases may be more useful, but they require a bit more intelligence to process.

This leads us into some more dynamic information visualization tools that allow the user to interact with the data and put it into context.

DiskInventoryX and WinDirStat are one example of treemaps: they show relative file sizes on disk, flattening out the hierarchies and colour-coding by file type. It would be better to avoid the 3D lighting effects, however. - http://www.cs.umd.edu/hcil/treemap-history/index.shtml

Once the map is drawn, the user can click on a large (or small) box and see it identified in the hierarchy, or click on part of the hierarchy and see its area. It's easy to explore the larger files, but much harder to explore the smallest ones, unless one restricts the map to only a subset of the hierarchy.

Here 2D squares are sized appropriately, as opposed to some of the designs we looked at last week, and so they do give the user a good sense of how much space various types of files take up, from many small emails or music files to large virtual machines or movies.

Treemap of files in a multi-level directory
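To make the "area proportional to size" idea concrete, here is a minimal sketch of the simplest treemap layout, slice-and-dice, which alternates between splitting the width and splitting the height at each level of the hierarchy. It is not the layout used by DiskInventoryX or WinDirStat (which this sketch does not attempt to reproduce); the directory contents and sizes are invented for illustration.

```python
def total(node):
    """Total size of a file (a number) or a directory (a dict of children)."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node

def slice_and_dice(node, x, y, w, h, vertical=True, name="", out=None):
    """Lay out a nested {name: size-or-dict} tree; return [(name, x, y, w, h)] for files."""
    if out is None:
        out = []
    if not isinstance(node, dict):
        out.append((name, x, y, w, h))   # a file: emit its rectangle
        return out
    size = total(node)
    offset = 0.0                          # fraction of the parent already used
    for child_name, child in node.items():
        frac = total(child) / size if size else 0.0
        if vertical:                      # split the width: children sit side by side
            slice_and_dice(child, x + offset * w, y, frac * w, h, False, child_name, out)
        else:                             # split the height: children are stacked
            slice_and_dice(child, x, y + offset * h, w, frac * h, True, child_name, out)
        offset += frac
    return out

# Made-up directory tree; sizes are in arbitrary units.
disk = {"movies": {"film.mp4": 600}, "mail": {"a.eml": 50, "b.eml": 50}, "vm.img": 300}
rects = slice_and_dice(disk, 0, 0, 100, 100)
for name, x, y, w, h in rects:
    print(f"{name:10s} area = {w * h:7.1f}")
```

Each file's rectangle takes the same share of the canvas as the file takes of the disk, so the areas can be compared directly. Slice-and-dice tends to produce long thin strips, which is one reason practical tools prefer squarer layouts.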

A similarly styled chart looking at relative amounts of dollars spent (or lost) on various things is The Billion Dollar Gram, which allows people to compare things that they may not normally think about comparing, depending on what a user is familiar with.

The Billion Dollar Gram

The BBC used to have a nice flash-based treemap of the top 100 sites on the internet - http://news.bbc.co.uk/2/hi/technology/8562801.stm
BBC treemap of top internet sites

Newsmap (flash-based) previously showed the news of the moment in a similar style, where more important news items are shown larger and all are color-coded by topic - http://newsmap.jp

Newsmap of currently covered news topics

theme river (flash-based) style:
Theme River of Hollywood films

What are people doing in Japan (java-based)?

and a similar one showing how people in the US spend their days (flash-based):
How Americans are spending their time by hour in the day

name voyager - http://www.babynamewizard.com/voyager
Popularity of different first names over time

xkcd recently did a nice combination of name data and chicken pox data -

job voyager (flash-based) used to exist at http://flare.prefuse.org/apps/index
but there is a not-quite-as-good version available at: https://vega.github.io/vega/examples/job-voyager/

Popularity of different jobs over time

another similar interactive visualization is Google's ngrams
Google ngrams screenshot

NY times (flash-based) Billboard Rankings http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html?hp
NY Times billboard ranking

and another nice one related to the media of music http://www.nytimes.com/imagepages/2009/08/01/opinion/01blow.ready.html

Popularity of different media for playing music

Traffic Fatality Visualization by John Nelson

Here is a visualization of 50 years of space exploration

Visualization of Space Probes to Other Places in the Solar System

and the new Google trends site is interesting http://www.google.com/trends

and some (now slightly less) modern approaches here - http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/

growth of Target (flash-based; earthquake data is often visualized the same way) - http://projects.flowingdata.com/target/
We will talk more about animation later in the course. This one is very good for getting a visceral feel for the rate of expansion and the locations, but for more numeric comparisons it would be good to augment it with a graph showing how many stores open per year across the country or in different regions.
Growth of Target stores around the US

here is a site with lots of nice examples http://flowingdata.com/

London Underground Map by Harry Beck

It's more of a diagram than a map, as geography is less important than visibility and consistency, so we are going to talk about it here rather than in the Geospatial notes.

        Underground Diagram

and a shorter 4 minute excerpt:

and you can see the history of the maps at:

Compare this map of the CTA

to this map of the CTA
CTA L train map

and to this map of the CTA

Line Map

        Underground Line Map

Here is an interesting way of visualizing the distance to nearby stars, from http://strangemaps.wordpress.com/, in terms of which Earth programs are just now reaching them (now a few years out of date):

What TV programs are other star systems receiving

and a variant with radio - http://lightyear.fm

and a little closer to home, the history of Earth reduced to 24 hours, from http://www.geology.wisc.edu (though clocks are usually 12 hours per cycle)
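The arithmetic behind this kind of clock is a simple proportion. The sketch below assumes the commonly cited age of the Earth of about 4.54 billion years; that figure, the function name, and the example event are my additions, not values taken from the linked page.

```python
EARTH_AGE_YEARS = 4.54e9  # assumed: commonly cited estimate of the Earth's age

def clock_time(years_ago, cycle_hours=24):
    """Map an event that happened years_ago onto a clock where the Earth forms at 00:00:00."""
    elapsed_fraction = 1 - years_ago / EARTH_AGE_YEARS
    seconds = round(elapsed_fraction * cycle_hours * 3600)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

# Example: the extinction of the non-avian dinosaurs, roughly 66 million years ago.
print(clock_time(66e6))
# The same event on a 12-hour dial, which matches how most clock faces are drawn.
print(clock_time(66e6, cycle_hours=12))
```

Almost everything people are familiar with lands in the last few minutes before midnight, which is the point of the visualization: a linear scale makes recent history nearly invisible.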

Here is a nicely varied set of visualizations of US immigration data over time from FlowingData.

and a nice visualization of current migration patterns (which would work better on a much larger screen)

MIT's eyebrowse (flash-based) had some interesting visualizations of browser history in 2010 - http://web.archive.org/web/20100107030938/http://eyebrowse.csail.mit.edu/

and there are a variety of things at chartporn.org

There are a very large number of different ways to visualize information, including free tools such as Vega - https://vega.github.io/vega-lite/examples/
and some good examples at https://bl.ocks.org/mbostock

and there are quite a few interactive info-viz tools, ranging from full commercial products to custom-coded solutions for particular problems. One nice tool is XmdvTool, though it is only distributed as source that requires Qt to compile. It has a homepage at http://davis.wpi.edu/xmdv/ and source at https://github.com/kaiyuzhao/XmdvTool

XmdvTool lets you interactively brush parallel coordinates and scatterplot matrices (SPLOMs).
Parallel coordinates in XmdvTool
Scatterplot matrix in XmdvTool
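Two ideas sit underneath brushing in parallel coordinates: every axis is rescaled to a common height, and a brush is just a range filter on one axis whose matching records are then highlighted in every view. Here is a minimal sketch of both in Python. It is not XmdvTool's code; the function names are hypothetical, and apart from the first row (whose values come from the cars data excerpt later in these notes) the data rows are made up.

```python
def normalize(rows, columns):
    """Rescale each column to [0, 1] so that every parallel axis has the same height."""
    ranges = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}
    scaled = []
    for r in rows:
        scaled.append({
            c: (r[c] - lo) / (hi - lo) if hi > lo else 0.5  # constant column: centre it
            for c, (lo, hi) in ranges.items()
        })
    return scaled

def brush(rows, column, low, high):
    """Return the indices of the rows whose value on one axis lies inside the brushed range."""
    return [i for i, r in enumerate(rows) if low <= r[column] <= high]

cars = [
    {"mpg": 18, "cyl": 8, "weight": 3504},
    {"mpg": 30, "cyl": 4, "weight": 2000},  # made-up row
    {"mpg": 24, "cyl": 6, "weight": 2752},  # made-up row
]

# Each record becomes a polyline through its normalized value on every axis.
for line in normalize(cars, ["mpg", "cyl", "weight"]):
    print(line)

# Brushing mpg between 20 and 35 selects the records to highlight on all axes.
print(brush(cars, "mpg", 20, 35))
```

The same list of selected indices can drive a scatterplot matrix, which is why brushing in one view can highlight the matching points in the others.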

nView in the early 90s had an interesting twist on this by embedding a 2D map into the parallel coordinates - http://www.youtube.com/watch?v=FI2Wm5CgHSE

R has similar capabilities that you can run in RStudio.

in R you can get a list of the pre-installed datasets with data()

some datasets come within other packages, e.g. loading the MASS package with library(MASS) allows access to the UScereal dataset


once you have some data ...

pairs gives you a scatterplot matrix, e.g. pairs(~calories+protein+sugars+potassium+carbo,data=UScereal)

parcoord (from the MASS package) gives you a parallel coordinates plot, e.g. parcoord(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])

more interactive versions can be found at:
http://mbostock.github.io/protovis/ex/cars.html parallel coordinates for the cars dataset

http://mbostock.github.io/protovis/ex/brush.html scatterplot matrix for the iris dataset

Here are some notes introducing Xmdv from a study done by one of our PhD students, Kyoung Park, a couple years ago.

we will take a look at the cars dataset to get a feel for the tool. The dataset is described at:

Overall, there are 406 observations on the following 8 variables:

Some of these are continuous variables (MPG, displacement, horsepower, weight, acceleration time)
Some are discrete variables (# cylinders, model year)
And one is categorical with no natural ordering (origin)

The XmdvTool site has a downloadable version of the car dataset from the examples. In order to run it with the lite version, however, you will need to edit the fcars.okc file so that the top of the file looks like this:

7 392
8. 50. 4
2.8 8.2 4
40. 250. 4
1500. 5500. 4
5. 30. 4
69.5 82.5 4
.8 3.2 3
18.000000 8.000000 130.000000 3504.000000 12.000000 70.000000 1.000000
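If you want to sanity-check a file edited this way before loading it, a small script can help. The sketch below is based only on the excerpt above and on my reading of it, which is an assumption: the first line is taken to be the number of dimensions and the number of records, each of the next lines gives the minimum and maximum for one dimension plus a third field that is left uninterpreted here, and every remaining line is one record. The function name is hypothetical.

```python
def parse_okc(text):
    """Parse the simplified .okc layout shown above (the layout itself is an assumption)."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    n_dims, n_rows = int(lines[0][0]), int(lines[0][1])
    # One "min max third-field" line per dimension; the third field is ignored.
    ranges = [(float(lo), float(hi)) for lo, hi, _ in lines[1:1 + n_dims]]
    rows = [[float(v) for v in ln] for ln in lines[1 + n_dims:]]
    return n_dims, n_rows, ranges, rows

# The header and first record exactly as they appear in the excerpt above.
excerpt = """
7 392
8. 50. 4
2.8 8.2 4
40. 250. 4
1500. 5500. 4
5. 30. 4
69.5 82.5 4
.8 3.2 3
18.000000 8.000000 130.000000 3504.000000 12.000000 70.000000 1.000000
"""

n_dims, n_rows, ranges, rows = parse_okc(excerpt)
for row in rows:
    assert len(row) == n_dims, "record has the wrong number of values"
    for value, (lo, hi) in zip(row, ranges):
        assert lo <= value <= hi, f"{value} is outside the declared range {lo}..{hi}"
print(f"{n_dims} dimensions, {n_rows} records declared, {len(rows)} record(s) checked")
```

On a full file you would also want to compare len(rows) with the declared record count; the excerpt contains only the first record, so that check is left out here.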

Coming Next Time

Project 1 Presentations

last revision 2/5/19