In general you do not want to rely on a program's default
values. Unless you are using a specialized program for
a specific field, the default values will not be right for your
work. This is especially true of programs like Word and Excel
(though both have improved a lot in the last couple of years in
this regard).
Let's start with tables - the format of a table can greatly
enhance or reduce its readability.
Here is a table from the US Environmental Protection Agency from
2002. The Total Emissions column is centered, making it
very hard to compare the values within it, and the Source Sector
column is centered, making it harder to read.
Source Sector | Total Emissions |
---|---|
Electricity Generation | 652,314 |
Fires | 14,520,530 |
Fossil Fuel Combustion | 1,499,367 |
Industrial Processes | 2,414,055 |
Miscellaneous | 33,786 |
Non Road Equipment | 22,414,896 |
On Road Vehicles | 62,957,908 |
Residential Wood Combustion | 2,704,197 |
Road Dust | 0 |
Solvent Use | 3,294 |
Waste Disposal | 2,018,496 |
A better version of the table would be the following where both
the sources and the amount of emissions are easier to see and
quickly grasp:
Source Sector | Total Emissions |
---|---|
Electricity Generation | 652,314 |
Fires | 14,520,530 |
Fossil Fuel Combustion | 1,499,367 |
Industrial Processes | 2,414,055 |
Miscellaneous | 33,786 |
Non Road Equipment | 22,414,896 |
On Road Vehicles | 62,957,908 |
Residential Wood Combustion | 2,704,197 |
Road Dust | 0 |
Solvent Use | 3,294 |
Waste Disposal | 2,018,496 |
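The alignment idea can also be sketched in code. The following is a minimal Python sketch, using the EPA figures from the table above, that left-justifies the text column and right-justifies the numbers so the digits line up. The function name `format_table` and the two-space column gap are assumptions made for this illustration, not part of any particular tool.

```python
# Emissions figures copied from the EPA table above.
EMISSIONS = [
    ("Electricity Generation", 652314),
    ("Fires", 14520530),
    ("Fossil Fuel Combustion", 1499367),
    ("Industrial Processes", 2414055),
    ("Miscellaneous", 33786),
    ("Non Road Equipment", 22414896),
    ("On Road Vehicles", 62957908),
    ("Residential Wood Combustion", 2704197),
    ("Road Dust", 0),
    ("Solvent Use", 3294),
    ("Waste Disposal", 2018496),
]


def format_table(rows, name_header="Source Sector", value_header="Total Emissions"):
    """Return the table as text: names left-justified, numbers right-justified."""
    # Add thousands separators first so the column width accounts for them.
    values = [f"{value:,}" for _, value in rows]
    name_width = max([len(name_header)] + [len(name) for name, _ in rows])
    value_width = max([len(value_header)] + [len(v) for v in values])

    lines = [f"{name_header:<{name_width}}  {value_header:>{value_width}}"]
    for (name, _), value in zip(rows, values):
        # '<' pads on the right (left-justify); '>' pads on the left (right-justify).
        lines.append(f"{name:<{name_width}}  {value:>{value_width}}")
    return "\n".join(lines)


print(format_table(EMISSIONS))
```

With the numbers right-justified, the longest numbers are visibly the biggest, which is the point made above.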
Here is a made-up table - it's hard to see any pattern in the
Yes/No values.
Yes |
No |
Yes |
Yes |
No |
No |
No |
Yes |
Yes |
No |
No |
No |
Yes |
Yes |
No |
A better version (if all of the cells are filled with one of two
values) would be:
Yes |
- |
Yes |
Yes |
- |
- |
- |
Yes |
Yes |
- |
- |
- |
Yes |
Yes |
- |
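The substitution used in this version is mechanical, so it can be sketched in a few lines of Python. This is only an illustration: the function name `mute_value` is hypothetical, and the cells are treated as a flat list because the row and column layout is not the point here.

```python
def mute_value(cells, muted="No", marker="-"):
    """Replace every cell equal to `muted` with a low-ink marker.

    Only sensible when every cell holds one of two values, as noted above.
    """
    if len(set(cells)) > 2:
        raise ValueError("cells must contain at most two distinct values")
    return [marker if cell == muted else cell for cell in cells]


# The made-up Yes/No values from the table above, read in order.
cells = ["Yes", "No", "Yes", "Yes", "No", "No", "No", "Yes",
         "Yes", "No", "No", "No", "Yes", "Yes", "No"]
print(mute_value(cells))
```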
A different better version of the table using colour to help
highlight the pattern would be:
Yes |
No |
Yes |
Yes |
No |
No |
No |
Yes |
Yes |
No |
No |
No |
Yes |
Yes |
No |
Here is a table from the Nielsen Games page:
https://www.nielsen.com/us/en/insights/reports/2018/us-games-360-report-2018.html
The Usage Min % column is hard to
read because it's left justified.
This version below is easier to read because the right column of
numbers is right justified. The decimal points align and bigger
numbers look bigger. I also moved the text off the grid lines to
make them more readable.
For some more recent related data you can check out:
https://www.nielsen.com/us/en/insights/news/2015/game-consoles-in-2015-one-stop-shop-for-games-and-entertainment.html
Be careful of significant digits
Your table should not show more accuracy than the accuracy of the data
collection. The computer will happily compute an average out to an
alarming number of digits, but if you only took measurements to
one decimal place then that's as far as you should show any
derived (average, min, max, median, etc.) values.
Programs may also reduce your significant digits by eliminating trailing zeros
(turning 4.20 into 4.2), so you will want to force all the data of
the same type, collected in the same way, to have the same number of
significant digits.
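As a small illustration of both points, here is a Python sketch that limits a derived value to the precision of the measurements and keeps trailing zeros. The readings are hypothetical, and `format_reading` is a name made up for this example; it fixes the number of decimal places, which is the simplest way to keep values of the same type consistent.

```python
from statistics import mean


def format_reading(value, decimals=1):
    """Format a number with a fixed number of decimal places, keeping trailing zeros."""
    return f"{value:.{decimals}f}"


# Hypothetical measurements, each taken to one decimal place.
readings = [4.2, 3.9, 4.0, 4.4, 3.8, 4.1]
average = mean(readings)

print(average)                           # far more digits than were measured
print(format_reading(average))           # limited to one decimal place
print(format_reading(4.20, decimals=2))  # trailing zero is kept
```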
For presentations, your tables should only show as much accuracy as
needed to get your point across. If two values differ by 100 then
you don't need to show those values to the third decimal place.
The additional detail in the numbers gets in the way of seeing the
bigger trend. You can keep another slide hidden in the slide
morgue after the end of your talk that has all the explicit
details in case someone is interested.
Here is another table from the same Nielsen page. Again, left justifying the numbers makes things harder to read, but there is also an issue of significant digits. We can presume, since they have been in the survey business a long time, that they do have faith in their data out to that degree of significance, and very likely that number of digits is necessary to disambiguate data further down the table, but since they are just presenting the top 10, the extra digits get in the way.
The next version makes it easier to see the overall
relationships. Another possible change would be to
convert the data on minutes per week into hours
per week. It's hard to have an intuitive sense of '546
minutes'. If you are telling a friend how long a movie
you saw last night was, do you say it was 140 minutes
long, or do you say 2 hours and 20 minutes long?
Keep your audience in mind when creating a table. You will want to keep all of your data in its highest resolution form, but when you present it, present just the right amount of detail for the people you will be speaking to. More technical people will want more detail; less technical people will want the information at a higher level. Some people want to see detailed trends, others just overall trends. Don't reuse your charts for different audiences; create new ones targeted towards the specific audience.
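The minutes-to-hours conversion mentioned above is easy to do at presentation time while keeping the stored data in minutes. A minimal Python sketch, with the helper name `minutes_to_hours_text` made up for this example:

```python
def minutes_to_hours_text(total_minutes):
    """Turn a count of minutes into text such as '2 hours and 20 minutes'."""
    if total_minutes < 0:
        raise ValueError("total_minutes must not be negative")
    hours, minutes = divmod(int(total_minutes), 60)
    parts = []
    if hours:
        parts.append(f"{hours} hour" + ("" if hours == 1 else "s"))
    # Show minutes when there are any, or when the whole value is under an hour.
    if minutes or not hours:
        parts.append(f"{minutes} minute" + ("" if minutes == 1 else "s"))
    return " and ".join(parts)


print(minutes_to_hours_text(140))
print(minutes_to_hours_text(546))
```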
Choosing the right font makes it easy for people to recognize the words by shape and read efficiently.
Here is an example charting the population of the USA from 2000 to 2007. First up is an overly dynamic 3D bar chart with a hard-to-read set of population numbers and a trend that is made even more pronounced by the 3D viewpoint. Please do not create charts like this.
Here are a couple variants using lines with the actual data points highlighted, so you know what data was collected and what was interpolated. The big difference is in the Y-axis. One chart suggests there is slow steady growth; the other suggests rapid steady growth. When you choose the values for the Y-axis here you are making a statement about what the user will see as their first impression - you can't escape that.
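To put a number on the Y-axis effect described above, the sketch below computes how much of the chart's height the same data spans under two different axis ranges. The population figures are rough, illustrative values in millions (not the exact values from the charts), and the function name and axis limits are assumptions for this example.

```python
def fraction_of_axis_used(values, y_min, y_max):
    """Return the share of the Y-axis height spanned by the data's range."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return (max(values) - min(values)) / (y_max - y_min)


# Rough, illustrative USA population figures for 2000-2007, in millions.
population = [282, 285, 288, 290, 293, 296, 298, 301]

zero_based = fraction_of_axis_used(population, 0, 350)    # axis starts at zero
zoomed_in = fraction_of_axis_used(population, 280, 305)   # axis hugs the data

print(f"axis from 0 to 350:   {zero_based:.0%} of the chart height")
print(f"axis from 280 to 305: {zoomed_in:.0%} of the chart height")
```

With these assumed limits the same growth fills roughly 5 percent of the height in one chart and 76 percent in the other, which is why the first impression differs so much.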
And now let's
go back to the video game console data from above.
First let's see a couple charts from the older version of Microsoft Excel which just wasn't very good at making charts. While technically correct, the colours seem random and hurt your eyes, etc. Please do not create charts like this.
The newer versions of Excel are
much better at dealing with colours and layout, but they have also
added lots of 3D bling that should be avoided. 3D distorts the
data and adds unnecessary details that make it harder to see
what's really going on. Please do not create charts like this.
Instead we can
display the data without the 3D. By default Excel will pick the
colours for the various data values as seen above. If the data
values are unrelated then the colours should be unrelated, but
here we could also use colour to relate consoles made by the
same manufacturer (blue for Sony, red for Nintendo, green
for Microsoft, and grey for Other, with the more saturated colours
for their latest releases).
The pie chart makes it easier to see how each console compares to the whole, but the bar chart makes it easy to see how they compare to each other. A bar chart makes it easier to estimate the actual amount compared to a pie chart if you don't have the actual values displayed. Bar charts are better the more categories you have as the slices of the pie get smaller and harder to discern with more and more categories. In an analysis tool you may need both views simultaneously, and then additional visualizations to see the values over time.
Another option for the same data is a stacked bar chart, which makes it easier to estimate numbers from the chart and can make better use of space than a pie chart.
A line chart would not be appropriate to show this data because the data is categorical (an Xbox 360 is not 'more' or 'less' than a Wii - it's just a different category), there is no natural ordering between the categories, and there is no continuous space between an Xbox 360 and a Wii for there to be a trend shown by a line.
Here are a
couple of other pie chart examples. A good one comes from:
https://flowingdata.com/2008/09/19/pie-i-have-eaten-and-pie-i-have-not-eaten/
A bad one
comes from our local Fox News affiliate:
https://flowingdata.com/2009/11/26/fox-news-makes-the-best-pie-chart-ever/
There are many different kinds of charts
A really good book to look at for an introduction to this sort
of thing is Edward Tufte's 'The Visual Display of Quantitative
Information.'
Another good reference is Robert Harris' Information Graphics -
A Comprehensive Illustrated Reference. Here is a nice overview
of different kinds of charts
Another nice interactive icon based list is https://datavizcatalogue.com/
There are a variety of diagrams people have come up with to try and help people choose what kind of chart to use in different situations - a good list is available at https://policyviz.com/2014/10/06/graphic_continuum_inspiration/ - and just as many critiques.
In general we
will try to stick to very common chart types in this class, ones
you are most likely to encounter, and ones that are most likely to
be easily understood by people you want to present to. As you move
deeper into general data visualization, and especially data
visualization in particular fields, you will find very particular
types of charts being used that are better for that particular
type of data and usage.
It would be good if the colours
you choose also work for people who are colour blind:
8 percent of men
1 percent of women
Are you colour
blind? You can find out at https://www.color-blindness.com/ishiharas-test-for-colour-deficiency-38-plates-edition/
Here is an
image of a colour wheel as seen with protanope colour blindness.
You should at least make sure that your data doesn't blend together or disappear for people who are colour blind. A really good way is to avoid using green in your charts since red/green is the most common form of colour blindness, but that can be pretty limiting. Photoshop can be used to check images (View menu, Proof Setup, Color Blindness), and a good web site to check your graphics is: https://www.toptal.com/designers/colorfilter
And of course there are many apps for showing what colour blind people see using your smartphone's camera. One fairly nice one is Chromatic Vision Simulator for iOS and Android.
Try it out on a weather map like this one: https://www.wunderground.com/maps/temperature/us-current
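One rough automated check for colours that might blend together is to compare their lightness, since colours that differ clearly in lightness tend to stay distinguishable even when their hues are confused. The Python sketch below is not a colour-blindness simulation; it swaps in the relative luminance and contrast ratio formulas from the WCAG accessibility guidelines as a proxy, and the example colours are assumptions chosen for illustration.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB colour given as 0-255 values (WCAG formula)."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(colour_a, colour_b):
    """WCAG contrast ratio: 1 means identical lightness, 21 is black against white."""
    lum_a = relative_luminance(colour_a)
    lum_b = relative_luminance(colour_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


RED = (255, 0, 0)
DARK_GREEN = (0, 140, 0)   # illustrative shade with lightness close to pure red
DARK_BLUE = (0, 0, 139)

print(f"red vs dark green: {contrast_ratio(RED, DARK_GREEN):.2f}")
print(f"red vs dark blue:  {contrast_ratio(RED, DARK_BLUE):.2f}")
```

A ratio close to 1, as for the red and dark green pair, is a warning sign that the two colours rely on hue alone to be told apart. The tools listed above remain the better check because they show what the graphic actually looks like.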
hue:
saturation: vivid colours (bright red, royal blue) are highly saturated, further from grey; pastel colours (pink, sky blue) are lightly saturated, closer to grey
brightness: perceived intensity of a luminous object
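The saturation difference between vivid and pastel colours can be checked with Python's standard `colorsys` module. This is a minimal sketch: the RGB values are those of the CSS named colours, which is an assumption about which shades are meant, and the HSV "value" it reports is only a simple stand-in for perceived brightness.

```python
import colorsys


def hsv_from_rgb255(rgb):
    """Convert 0-255 RGB to (hue in degrees, saturation 0-1, value 0-1)."""
    r, g, b = (channel / 255 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    return hue * 360, saturation, value


# RGB values of the CSS named colours (an assumption about the intended shades).
COLOURS = {
    "bright red": (255, 0, 0),
    "royal blue": (65, 105, 225),
    "pink": (255, 192, 203),
    "sky blue": (135, 206, 235),
}

for name, rgb in COLOURS.items():
    hue, saturation, value = hsv_from_rgb255(rgb)
    print(f"{name:<10}  hue {hue:5.1f}  saturation {saturation:.2f}  value {value:.2f}")
```

The two vivid colours come out with clearly higher saturation than the two pastels.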
Unfortunately we cannot generate all the colours that the eye can see using an RGB CRT or LCD or LED or OLED at this point. We also cannot generate all the colours that the eye can see using photographic film (though it can display a larger part of the visible spectrum than our current displays).
Some advice on the use of colour:
A really good place to get advice on what colors and sets of colors to use is https://colorbrewer2.org/
We will be talking about colour more during the class and how the choice of colours depends on the data you are representing. Are the colours showing different categories, like the videogame consoles, where the colours should explicitly not suggest that one colour (or console) is 'more' or 'less' than another? Then pick a qualitative colour scheme. Are the colours showing sequential or numerical data, like the amount of rain expected or the temperature on a weather map, where the colours should explicitly suggest that one colour is more or less than another? Then pick a sequential colour scheme. Are you trying to highlight the data that is more or less than a particular value, like which areas are getting more or less than their average amount of rainfall? Then pick a diverging colour scheme.
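The three questions above amount to a small decision rule, sketched here in Python. The function name and the `data_kind` labels are made up for this illustration; the returned names match the scheme types described in the paragraph.

```python
def choose_colour_scheme(data_kind, has_reference_value=False):
    """Suggest a colour scheme type for a kind of data.

    data_kind: "categorical" (e.g. game consoles) or "numerical" (e.g. rainfall).
    has_reference_value: True when the point is to show values above or below
    a particular value, such as the average rainfall.
    """
    if data_kind == "categorical":
        return "qualitative"
    if data_kind == "numerical":
        return "diverging" if has_reference_value else "sequential"
    raise ValueError(f"unknown data kind: {data_kind!r}")


print(choose_colour_scheme("categorical"))                          # consoles
print(choose_colour_scheme("numerical"))                            # temperature map
print(choose_colour_scheme("numerical", has_reference_value=True))  # rainfall vs average
```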
Here is a comparison of 3 graphics of the same data.
The first is from Time Magazine (4/9/79) via Tufte.
The second is from the Sunday Times (12/16/79) via Tufte.
The modern
graphic below from inflationdata.com is a much more truthful
representation of the data. Both scales are linear and in easy
to understand units. The source of the data is cited. Contextual
information is given at interesting points in the graph. The
chart on the left shows oil prices. The chart on the right shows
gasoline prices, which is something people can relate to more.
Nice graphic, so of course we ask: how would you enhance this visualization if it were
software-based?
Year | Percentage |
---|---|
1972 | 72.0 |
1973 | 70.8 |
1974 | 67.2 |
1975 | 66.4 |
1976 | 67.0 |
There is also a version of this data on a state-by-state basis. What
trends would be hidden by a state-by-state view?