The purpose of
visualization is to make it easy for the user to see the patterns, the
similarities, the differences in the data.
This involves the variation in the data itself, the variation in the
representation of that data, and the ability of a human being to
In general you do not want to let the computer use its
default values. Unless you are using a specific program for a specific
field the default values will not be right for your work. This is
especially true of programs like Word and Excel (though both have
improved a lot in the last couple years in this regard).
Tables and simple graphs are going to come up fairly often
in visual analytics. Even with all the fancy new visualization options
we have there are still very good reasons to use simple tables and
charts when possible.
With tables, the format can greatly enhance or
reduce the readability.
- Similar numeric data should have the same
number of significant digits and be right justified if a whole number,
or aligned to the decimal point if a real number - this makes it easier
to see values getting bigger or smaller. Either use scientific notation,
or don't, but don't mix scientific notation and standard notation. For
big numbers in standard notation (e.g. 123482656) it can help to use
commas to separate out the thousands, millions, billions places (e.g.
- Row headings with multiple words in
multiple rows are easier to read if they are left justified
- Sans-serif fonts like Helvetica are
easier to read on a computer screen than serif fonts like Times,
though as screens gain resolution this is becoming less of an issue.
- If you have a table with just Yes / No
values for every cell then only show the most appropriate one (e.g. 'Y')
and leave the other one blank or use a '-'. It's hard to scan a table
with entries of Yes / No or Y / N quickly because the words/letters look
too similar. Colour can also be helpful in highlighting the values you
want the viewer to pay attention to.
- Have meaningful column headings and row
labels. If you need multiple rows for the heading then break the words
intelligently; don't let the program break up words where it feels like
- Standardize and use consistent
abbreviations that are familiar to your target audience
- Tables should not cross a page boundary.
If you have a really long table then replicate the column headings at
the top of the next page/slide. A reader should not have to look to a
previous page/slide to see what the column headings are
- If you have a big table it can be useful
to alternate the background colour on adjacent rows with slightly
different colours. This makes it easier to trace across a row.
- Leave enough space between the table
boundaries and the text
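The numeric formatting rules above (same number of decimals, right justification, commas for thousands) can be sketched in a few lines of Python. This is only an illustration: `format_rows` is a hypothetical helper, and the labels and values are made up, apart from 123482656 which is the example number used above.

```python
def format_rows(rows, decimals=0):
    """Return table lines with left-justified labels and right-justified numbers.

    rows: list of (label, number) pairs. The values used below are illustrative.
    """
    # Format every number the same way: thousands separators, fixed decimals.
    numbers = [f"{value:,.{decimals}f}" for _, value in rows]
    label_width = max(len(label) for label, _ in rows)
    number_width = max(len(text) for text in numbers)
    lines = []
    for (label, _), text in zip(rows, numbers):
        # Left-justify the text column, right-justify the numeric column.
        lines.append(f"{label.ljust(label_width)}  {text.rjust(number_width)}")
    return lines


rows = [("Source A", 123482656), ("Source B", 98200), ("Source C", 1530)]
table = format_rows(rows)
print("\n".join(table))
```

Because every number gets the same number of decimals and is right justified, the decimal points line up and bigger numbers look bigger, provided the table is shown in a monospace font.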
Here is a table from the US Environmental Protection Agency from 2002 -
the Total Emissions column of data is centered making it very hard to
compare the values within.
National Carbon_Monoxide Emissions in
|Fossil Fuel Combustion
|Non Road Equipment
|On Road Vehicles
|Residential Wood Combustion
A better version of the table would be the following where both the
sources and the amount of emissions are easier to see and quickly grasp:
National Carbon_Monoxide Emissions in 2002
| Electricity Generation
| Fossil Fuel Combustion
| Industrial Processes
| Non Road Equipment
| On Road Vehicles
| Residential Wood Combustion
| Road Dust
| Solvent Use
| Waste Disposal
Here is a made-up table - it's hard to see any pattern in the Yes/No
A better version (if all of the cells are filled with one of two values)
A different better version of the table using colour to help highlight the
pattern would be:
Here is a table from the Nielsen Games page:
The Usage Min % column is hard to read
because it's left justified.
This version below is easier to read because the right column of numbers
is right justified. The decimal points align and bigger numbers look
bigger. I also moved the text off the grid lines to make them more
for some more recent related data you can
Be careful of significant digits
Your table should not show more accuracy than the accuracy of
the data collection. The computer will happily compute an average out to
an alarming number of digits, but if you only took measurements to one
decimal point then that's as far as you should show any derived (average,
min, max, median, etc) values.
Programs may also reduce your significant digits by eliminating
trailing zeros (turning 4.20 into 4.2) so you will want to force all the
data of the same type collected in the same way to have the same number of
For presentations, your tables should only show as much accuracy
as needed to get your point across. If two values differ by 100 then you
don't need to show those values to the third decimal place. The additional
detail in the numbers gets in the way of seeing the bigger trend. You can
keep another slide hidden in the slide morgue after the end of your talk
that has all the explicit details in case someone is interested.
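As a small sketch of the point about derived values and trailing zeros, the Python below formats an average to the same number of decimals as the raw data. The measurements are hypothetical and `format_like_measurements` is an illustrative name, not a library function.

```python
from statistics import mean


def format_like_measurements(value, decimals):
    """Format a derived value with the same number of decimals as the raw data."""
    # Fixed-point formatting rounds to the given precision and keeps trailing zeros.
    return f"{value:.{decimals}f}"


# Hypothetical measurements, each taken to one decimal place.
measurements = [4.2, 3.9, 4.4]
average = mean(measurements)                 # the computer gives many more digits
shown = format_like_measurements(average, 1)
padded = format_like_measurements(4.2, 2)    # keeps the trailing zero
print(average, shown, padded)
```

The full-precision average stays available for further calculation; only the displayed string is limited to what the data collection supports.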
Here is another table from the same Nielsen page. Again left justifying
the numbers makes things harder to read, but there is also an issue of
significant digits. We can presume since they have been in the survey
business a long time that they do have faith in their data out to that
degree of significance, and very likely that number of digits is necessary
to disambiguate data further down the table, but since they are just
presenting the top 10, the extra digits get in the way.
The next version makes it easier to see the overall
relationships. Another possible change would be to convert the data on
minutes per week into data on hours per week. It's hard to have an
intuitive sense of '546 minutes'. If you are telling a friend how long a
movie you saw last night was do you say it was 140 minutes long, or do
you say 2 hours and 20 minutes long?
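The conversion suggested above is simple enough to sketch directly. The function names are illustrative; 140 and 546 are the two values mentioned in the text.

```python
def minutes_to_hours_minutes(total_minutes):
    """Split a whole number of minutes into (hours, minutes)."""
    return divmod(total_minutes, 60)


def describe(total_minutes):
    """Return a human-friendly description such as '2 hours and 20 minutes'."""
    hours, minutes = minutes_to_hours_minutes(total_minutes)
    return f"{hours} hours and {minutes} minutes"


print(describe(140))
print(describe(546))
```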
Keep your audience in mind when
creating a table. You will want to keep all of your data in its highest
resolution form, but when you present it, present just the right amount
of detail for the people you will be speaking to. More technical people
will want more detail; less technical people will want the information
at a higher level. Some people want to see detailed trends, others just
overall trends. Don't reuse your charts for different audiences, create
new ones targeted towards the specific audience.
I should point out that if I was creating these tables myself for
these notes then I would use white text on a black background, since these
web pages have a black background
A bit more on text. You have several
general choices of font styles to use
- sans-serif (e.g. Verdana, Tahoma,
Helvetica) good for on-screen text - e.g. 72 dpi
- serif (e.g. Georgia) - good for printed
text - e.g. 150-300 dpi. As screen resolutions increase serif fonts
become more appropriate.
- monospace (e.g. Courier) -
good for certain occasions when you need exact alignment of the text,
usually while coding
- fantasy / cute / brush strokes / cursive / dripping blood - just say
no, unless you are creating a party invitation
And one font, Comic
Sans, deserves some mention on its own. Here
is one good link (with profanity) about Comic Sans. In the summer of
2011 there were quite a few blog posts devoted to a 100 page US Army
PowerPoint presentation using Comic Sans e.g. this
Scientists do this kind of thing as well. How long does it take for you
to read the title screen here? - https://www.youtube.com/watch?v=nLacmrM5xQw
Since we are focusing on
interactive computer-based visualizations, you should start with a
sans-serif font like Helvetica and only change it if you have a very
good reason, or you work with a graphic designer who picks an
appropriate font since they are trained to know when to bend and break
Here is a nice infographic on type - http://www.buzzfeed.com/lenkendall/a-guide-to-typography-infographic-wh6
Familiar words are recognized
O lny srmat poelpe can raed tihs.
I cdnuolt blveiee taht I cluod
aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the
hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it
deosn't mttaer in waht oredr the ltteers in a wrod are, t he olny
iprmoatnt tihng is taht the frist and lsat ltteer be in the rgh it
pclae. The rset can be a taotl mses and you can sitll raed it wouthit a
porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by
istlef, but the wrod as a wlohe. Amzanig huh? yaeh and I awlyas tghuhot
slpeling was ipmorantt! if you can raed tihs psas it on !!"
- lower case words are read faster
than words in upper case
- individual letters and nonsense
words like UA1416 are read faster in upper case
Choosing the right font makes it easy for people to
recognize the words by shape and read efficiently.
- Avoid 3D - really. 3D may make a chart
'exciting' but it also makes it much harder to see relationships. Drop
shadows are OK if you really want a 3D look, but if in doubt you should
remove any bling
- Avoid fully saturated colours (e.g. 255
0 0 red.) Look around you, most of the world is not bright primary
colours. Pastels and colour mixtures are easier on the eyes (note the
difference in Microsoft Excel's default colour palettes at the bottom of
these notes). ggplot2 has a wide range of colors by name - http://sape.inf.usi.ch/quick-reference/ggplot2/colour
- Choose an appropriate value for the axis
to cross depending on what you want the user to see in the chart (e.g.
should the smallest value be 0, which may minimize differences in the
data, or closer to the low point in your dataset, which will emphasize
the differences in the data, but may hide the overall picture)
- The backdrop colour of the chart should
match the backdrop of the page/slide you will put it on. For a paper the
backdrop should be white. For my notes the backdrop should be black
- If it is a line chart then make sure all
the lines are visible against the backdrop - a yellow line is not
visible against a white page and dark blue is not visible against black.
Do not just accept the default colours. If certain lines are related to
other lines then their colours should be related to each other
- If it is a scatter plot of points then
make sure the points are visible against the backdrop. If there is more
than one point icon (e.g. circles and triangles, or red circles and blue
circles) then make sure the viewer can see the difference from where
they will be sitting in a room, or on the device they are most likely
using. It can be very hard to see colour differences on small icons.
- If you are labeling the data points,
columns etc. with their actual values (which is often a good idea) then
make sure the text is readable against any lines in the background and
follow the rules for text above.
- Make sure all axes are well labelled in
terms of what the axis is and the units / categories
- Make sure there is a meaningful title
- Make sure there is a legend
explaining what the various lines / points / colours mean. The legend
should be ordered in a logical way - e.g. if the lines / points /
colours are based on a value like temperature then the legend should
be ordered from low to high (or high to low depending on the field).
If the legend shows several independent things then the legend should
be ordered in the same order as the general trend in the graph (i.e.
things that are generally at the top of the graph should be at the top
of the legend, things that are generally at the bottom of the graph
should be at the bottom of the legend) to make it easier for the user
to map between them.
- If certain data values relate to
one another then they should be next to each other or have a similar
color, so it's easier for the viewer to see the relationship.
Conversely, things that are not related should have different colors.
- If you have a series of graphs showing
related results then try to keep the same axis extents, colours,
patterns etc on all the charts so it's easy to compare across them
- It can be very important in certain
fields to show error bars on your charts to show how accurate the
Charts are pretty ubiquitous in visualization and visual
analytics, usually used in combination with several other
representations of the same data (e.g. geographic) so it's important to
get the charts right. It's also important to think about the differences
between a static chart used in print or on the web, and a dynamic chart
that the user can manipulate by hovering over elements with a mouse or
clicking on an item in the chart, or having the chart dynamically update
based on interactions with other elements of the visualization.
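The earlier advice about making lines visible against the backdrop (yellow on white, dark blue on black) can be checked numerically. One possible approach, not taken from these notes, is the contrast ratio defined in the WCAG 2.x accessibility guidelines; the sketch below follows that published formula, and the specific RGB triple chosen for "dark blue" is an assumption.

```python
def _linear(channel):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) colour with 8-bit channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio, from 1 (same luminance) up to 21 (black on white)."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
YELLOW = (255, 255, 0)
DARK_BLUE = (0, 0, 139)   # assumed value for "dark blue"

print(round(contrast_ratio(YELLOW, WHITE), 2))
print(round(contrast_ratio(DARK_BLUE, BLACK), 2))
print(round(contrast_ratio(YELLOW, BLACK), 2))
```

Both problem combinations come out close to 1, while yellow on black scores much higher. WCAG suggests at least 3:1 for graphical elements, which is a reasonable starting threshold rather than a guarantee of readability.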
Here is an example charting the population
of the USA from 2000 to 2007. First up is an overly dynamic 3D chart with
a hard to read set of population numbers and a trend that is made even
more pronounced by the 3D viewpoint. Please do not create charts like this.
Here is a less exciting but much more useful version where the
data is shown in 2D and the population values have commas to make it
easier to see what the numbers actually are. Another good possibility
would be to make the vertical column "Population (in Millions)" and then
have 270, 275, 280 etc as the vertical values.
Here are a couple variants using lines with the actual data
points highlighted, so you know what data was collected and what was
interpolated. The big difference is in the Y-axis. One chart suggests
there is slow steady growth; the other suggests rapid steady growth. When
you choose the values for the Y-axis here you are making a statement about
what the user will see as their first impression - you can't escape that.
In all these cases a simple interaction is to allow the user to
hover or click on one of the boxes and see the actual data values, or
dynamically change the Y axis.
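To see how much the starting value of the Y-axis changes the first impression, here is a rough sketch that compares the drawn heights of two values for different axis minimums. The population figures are hypothetical round numbers in millions, not the values from the charts, and `drawn_height_ratio` is an illustrative name.

```python
def drawn_height_ratio(first, last, axis_min):
    """Ratio of the drawn heights of two values when the axis starts at axis_min."""
    return (last - axis_min) / (first - axis_min)


# Hypothetical population values in millions.
first, last = 280.0, 300.0
from_zero = drawn_height_ratio(first, last, axis_min=0.0)
from_275 = drawn_height_ratio(first, last, axis_min=275.0)
print(round(from_zero, 2), from_275)
```

With the axis starting at 0 the last value is drawn about 7% taller than the first; with the axis starting at 275 it is drawn five times taller, even though the data is identical.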
And now let's go back to the video game console data from above.
First let's see a couple charts from the older version of
Microsoft Excel which just wasn't very good at making charts. While
technically correct, the colours seem random and hurt your eyes, etc.
Please do not create charts like this.
The newer versions of Excel are much better
in dealing with colours and layout, but have also included lots of 3D bling
that should be avoided. 3D distorts the data and adds in unnecessary
details that make it harder to see what's really going on. Please do not
create charts like this.
Instead we can display the data without the 3D. By default Excel
will pick the colours for the various data values as seen above. If the
data values are unrelated then the colours should be unrelated, but here
we could also use the colour to relate consoles made by the same
manufacturer (blue for Sony, red for Nintendo, green for Microsoft, and
grey for Other, with the more saturated colours for their latest
The pie chart makes it easy to see how each console compares to
the whole, but the bar chart makes it easy to see how they compare to each
other. In an analysis tool you may need both views simultaneously, and
then additional visualizations to see the values over time.
Here are a couple other pie chart examples. A good one comes
a bad one comes from our local Fox News affiliate:
There are many different kinds of charts
A really good book to look at for an introduction to this sort of thing
is Edward Tufte's 'The Visual Display of Quantitative Information.'
Another good reference is Robert Harris' Information Graphics - A
Comprehensive Illustrated Reference. Here is a nice overview of
different kinds of charts:
We will talk about various kinds of charts throughout the
course, though not to this depth.
Naturalness is an important design principle: a representation is better when
the properties of the representation match the properties of the thing
being represented. Representations that make use of spatial and
perceptual relationships make more effective use of our brains. If these
representations use arbitrary symbols then we need to use mental
transformations, mental comparisons and other mental processes, forcing
us to think reflectively. In experiential cognition we perceive and
react efficiently. In reflective cognition we use our decision making
Before you create a chart you should know whether it will eventually
appear in colour or greyscale. Colour is more prevalent today than in the
past since more people are getting their information in digital form, but
some conferences and journals will still only print in greyscale, and some
people still get their information through photocopies.
It would be good if the
colours you choose also work for people who are colour blind
(about 8 percent of men and 1 percent of women).
Are you colour blind? You can check on Wikipedia - http://en.wikipedia.org/wiki/Ishihara_color_test
Here is an image of a color wheel seen with Protanope and
Deuteranope colour blindness.
And of course there are many apps for showing what color blind
people see using your smartphone's camera. One fairly nice one is
Chromatic Vision Simulator for iOS and Android.
Try it out on a weather map like this one: http://www.intellicast.com/National/Temperature/HighToday.aspx
You should at least make
sure that your data doesn't blend together or disappear for people who
are colour blind. A really good way is to avoid using green in your
charts since red/green is the most common form of colour blindness, but
that can be pretty limiting. Photoshop can be used to check images
(View menu, Proof Setup, Color Blindness), and a good web site to check your graphics is: http://colorfilter.wickline.org/
There is a nice diagram of the eye at:
Light is focused by the cornea and the lens onto the retina at
the back of the eye.
Aqueous humor - the liquid behind the cornea is close to water,
and has nearly the same index of refraction as water. If we are under water the
light is barely refracted at the cornea, but it is refracted if we are in air.
Light passing through the center of the cornea and lens hits
the fovea (or macula).
The human eye has 2 types of photosensitive receptors: cones and
rods.
Cones
- operate at higher illumination levels
- provide better spatial resolution and contrast sensitivity
- provide colour vision
Rods
- operate at lower illumination levels, most sensitive to green
The cones are highly concentrated at the fovea and quickly
taper off around the retina. For colour vision we have the greatest
acuity at the fovea, or approximately at the center of our field of
vision. Visual acuity drops off as we move away from the center of the
field of view. However, we are very sensitive to motion on the periphery
of our vision, so we can see movement even if we can't see what is
The rods are highly concentrated 10-20 degrees around the
fovea, but almost none are at the fovea itself - which is why if you are
stargazing and want to see something dim you can not look directly at
There is also the optic nerve which is 10-20 degrees away from
the fovea which connects your eye to your brain. This is the blind spot
where there are no cones and no rods. We can not see anything at this
point though we are so used to this that we do not notice it unless we
try to see the blind spot.
Bill Sherman's diagram
Try the following link if you want to see (or not see) your blind spot:
with this being the simplest diagram to use
You need to close your left eye, look at the plus sign with
your right eye and then move your head towards and back from the screen
until the black circle disappears. When you are at the correct distance
the size of your blind spot is about the size of that black circle.
Here is a nice short YouTube video that shows the same effect
with your other eye - http://www.youtube.com/watch?v=O7jpJ12lBjg&feature=relmfu
The brain is really good at filling in that empty space with colors and
What happens when we walk from a bright area into a dark area,
say into a movie theatre? When we are outside the rods are saturated
from the brightness. The cones which operate better at high illumination
levels provide all the stimulus. When we walk into the darkened theatre
the cones don't have enough illumination to do much good, and the rods
take time to de-saturate before they can be useful in the new lower
It takes about 20 minutes for the rods to become very
sensitive, so dark adjust for about 20 minutes before going stargazing.
Since the cones do not operate well at low light intensities
we can not see colour in dim light as only the rods are capable of
giving us information. The rods are also more sensitive to the blue end
of the spectrum so it is especially hard to see red in the dark (it
To human beings, brightness (perceived intensity) has a
logarithmic scale, not a linear scale, which gives us a contrast ratio
of 100:1 under normal conditions and 1,000,000:1 if we dark adapt.
Our field of view for each eye is 60 degrees inwards towards
the nose and 100 degrees outwards, 60 degrees up and 75 degrees down
The 'resolution' of the average human
eye has been measured by different people in different ways. In general
it seems to be 1 arc minute (where 60 arc minutes = 1 degree), but
that's only in the very center of our field of view at the fovea (within
a few degrees) and under bright lighting conditions, with high contrast
images, and we can only recognize shapes (e.g. the letter E on a Snellen
vision chart) that are twice that big.
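The 1 arc minute figure can be turned into a physical size for a given viewing distance with basic trigonometry. The sketch below is only a back-of-the-envelope calculation; the two viewing distances are assumptions chosen for illustration.

```python
import math


def smallest_feature_mm(distance_mm, arc_minutes=1.0):
    """Size (in mm) that subtends the given visual angle at the given distance."""
    angle_rad = math.radians(arc_minutes / 60.0)
    return 2.0 * distance_mm * math.tan(angle_rad / 2.0)


at_desk = smallest_feature_mm(600)     # assumed ~60 cm from a desktop monitor
at_wall = smallest_feature_mm(3000)    # assumed 3 m from a wall display
print(round(at_desk, 3), round(at_wall, 3))
```

At 60 cm this works out to roughly 0.17 mm, and it grows linearly with distance. Following the text above, a recognizable shape would need to be about twice that size, and only under ideal lighting and contrast at the center of the field of view.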
Hue - distinguishes between colours
Saturation - how far is the colour from a grey of equal intensity
vivid colours (bright red, royal blue) are highly saturated, further from grey
pastel colours (pink, sky blue) are lightly saturated, closer to grey
Brightness - perceived intensity of a luminous object
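To make the idea of saturation concrete, here is a short sketch using Python's standard `colorsys` module. It uses the HSV colour model, which is only one of several ways to quantify saturation and is an approximation of the "distance from grey" description above; the RGB triple used for pink is an assumption.

```python
import colorsys


def saturation(rgb):
    """HSV saturation (0 = grey, 1 = fully saturated) of an 8-bit RGB colour."""
    r, g, b = (v / 255 for v in rgb)
    _, s, _ = colorsys.rgb_to_hsv(r, g, b)
    return s


def desaturate(rgb, factor):
    """Scale HSV saturation by factor (0..1) while keeping hue and value."""
    r, g, b = (v / 255 for v in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s * factor, v)
    return tuple(round(c * 255) for c in (r2, g2, b2))


bright_red = (255, 0, 0)
pink = (255, 192, 203)      # assumed RGB value for a pastel pink
softer_red = desaturate(bright_red, 0.5)
print(saturation(bright_red), round(saturation(pink), 2), softer_red)
```

Fully saturated red scores 1.0 while the pastel pink scores about 0.25, which matches the vivid versus pastel distinction above. A helper like `desaturate` is one way to move chart colours away from the fully saturated primaries.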
It is currently believed that there are three
kinds of cones in the human eye, one attuned to red (more like yellow),
one to green, and one to blue (Young and Helmholtz)
Light is electromagnetic energy with wavelengths from 400nm -
peak red response at 580nm (reddish-yellow)
peak green response at 545nm (greenish-yellow)
peak blue response at 440nm
There is a nice graph at http://www.normankoren.com/Human_spectral_sensitivity_small.jpg
So the idea is to add an amount of red and an amount of green
and an amount of blue to produce a wide range of colours.
Unfortunately we can not generate all the colours that the eye
can see using an RGB CRT or LCD or LED or OLED at this point. We also
can not generate all the colours that the eye can see using photographic
film (though it can display a larger part of the visible spectrum than a
Some advice on the use of colour:
- Use colour conservatively
- Limit the number of colours
- Colour can speed recognition, or
hinder it depending on what is coloured and how it's coloured. Colour
must support the task(s)
- Colour can help in grouping
- Colour can help in dense
- Colour coding should appear with
minimal user effort and be under the user's control
- Keep colour blindness in
mind (see above)
- Be consistent
- Think about what certain colours
commonly mean / represent (and this varies from culture to culture)
- Be careful what colours are used
together (e.g. bright red on bright blue is really really annoying)
3 kinds of lies: lies, damn lies, and statistics (quote
attributed to several different people)
Here is a comparison of 3 graphics of the same data.
The first is from Time Magazine (4/9/79) via Tufte
The second is from the Sunday Times (12/16/79) via Tufte
The modern graphic below from inflationdata.com is a much more
truthful representation of the data. Both scales are linear and in easy
to understand units. The source of the data is cited. Contextual
information is given at interesting points in the graph.
Nice graphic, so of course we ask how
would you enhance this visualization if it was software-based?
Here is another way to view the price of oil -
geographically - as gasoline prices in the US as of January 2014 by
county from gasbuddy.com. In general prices are pretty similar
within each state showing some variety on a zip code basis.
Back to the
- the representation of numbers,
as physically measured on the surface of the graphic itself, should
be directly proportional to the numerical quantities represented
Lie factor = size of effect shown in graphic / size of effect in data
- clear, detailed, and thorough labeling should
be used to defeat graphical distortion and ambiguity. Write out
explanations of the data on the graphic itself. Label important
events in the data.
- show data variation not design variation
- the number of information carrying dimensions
depicted should not exceed the number of dimensions in the data
- graphics must not quote data out of context
Another one from the New York Times, 8/9/78 via Tufte:
The mileage standards rise from 18
to 27.5 which is a 53% increase, but the difference in the sizes of
the lines representing those values from the New York Times is 783%
which is almost 15 times larger ... dramatic, but not very truthful.
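The arithmetic behind that comparison can be sketched using Tufte's lie factor (size of effect shown in the graphic divided by size of effect in the data). The mileage values and the 783% figure come from the text above; the function names are illustrative.

```python
def relative_change(first, second):
    """Size of an effect as a fraction of the starting value."""
    return (second - first) / first


def lie_factor(effect_in_graphic, effect_in_data):
    """Tufte's lie factor: effect shown in the graphic / effect in the data."""
    return effect_in_graphic / effect_in_data


data_effect = relative_change(18.0, 27.5)   # mileage standards from the text
graphic_effect = 7.83                        # the 783% quoted in the text
factor = lie_factor(graphic_effect, data_effect)
print(round(data_effect * 100), round(factor, 1))
```

A lie factor close to 1 means the graphic represents the data faithfully; here it comes out just under 15.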
If we graph it without the extra perspective we see the following:
And another one from the Los Angeles Times (8/5/79) via Tufte:
Here a 1D value is represented by a
2D image. The widths of the images are proportional to the values
being represented, and the heights of the images are also proportional
to the values, which makes the visual differences much greater than
the differences in the actual data. If you need to use a series of 2D
images to represent a series of 1D values then the 2D areas of those
images should be proportional to the values.
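A minimal sketch of that rule: to make the area of an image proportional to a value, scale both its width and its height by the square root of the value ratio, not by the ratio itself. The function name and the example ratio are illustrative.

```python
import math


def linear_scale_for_area(value, reference_value):
    """Factor to apply to BOTH width and height so that area tracks the value."""
    return math.sqrt(value / reference_value)


scale = linear_scale_for_area(4.0, 1.0)   # a value 4x the reference
area_ratio = scale * scale                # area grows 4x, matching the data
naive_area_ratio = 4.0 * 4.0              # scaling width and height by 4 gives 16x
print(scale, area_ratio, naive_area_ratio)
```

Scaling both dimensions by the value itself squares the apparent difference, which is exactly the distortion described above.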
Here is a nice one from http://www.math.yorku.ca/SCS/Gallery/lie-factor.html
There are some graphical
embellishments but basically we have two bar charts showing two
roughly linear series of data ... so what's wrong?
Below is a more truthful version of
the data where the X-axis is spread out linearly:
And finally what Tufte considered
one of the worst
What is this chart telling us? It is
telling us the percentage of college students that were under 25 from
1972 through 1976. That's only 5 values.
Here is a line chart from
http://www.fao.org/worldfoodsituation/en/ showing food prices over
the last four years with the years overlapping which can help show
Here is another county-based map - Pop vs Soda vs Coke from http://www.popvssoda.com/
More data on this at http://en.wikipedia.org/wiki/Names_for_soft_drinks#United_States
There is also a version of this data on a state-by-state
basis. What trends would be hidden by a state-by-state view?
Here is an interesting (flash based)
map of US population from Time Magazine that uses elevation to try and
show the extreme differences in population density across the US.
Here are some more bad ones - http://www.math.yorku.ca/SCS/Gallery/say-something.html
And an interactive version for the whole planet: https://pudding.cool/2018/10/city_3d/
Another issue is how big to make
your visualization - here is a partial answer from Google (which has
unfortunately moved into their analytics suite and is no longer
available as a standalone) - http://browsersize.googlelabs.com/ but
there are still some general statistics available at
Given the movement to smartphones, the most common resolutions are
back to the resolution that desktop computers were at in the mid
2000s, and now with wearables becoming more popular, they are back to
the resolutions of desktop computers in the 1980s (though with much
better colours). It's very important to design for the specific
platform in terms of physical size, resolution, colour representation.
For this week's homework, as always due
at 9pm on Friday, take a look at this site, take the test,
and post a screen snapshot of your results to your homework
/ class work website, then send a link to the TA by Friday evening. The
goal here is 2-fold: 1 - to really see that there
are a wide variety of colors out there that
can be used for your work, not just the fully saturated ones, and 2
- that the distinctions between those colors can be so small
that even though they are different colors, the
users may have a very hard time telling them apart.
Coming Next Time
last revision 1/30/19