Lecture 2 - Visualization Basics
Some basic
principles from Norman:
- Appropriateness Principle – The visual representation should provide neither more nor less information than that needed for the task at hand. Additional information may be distracting and can make the task more difficult.
- Naturalness Principle
– Experiential cognition is most effective when the properties of the
visual representation most closely match the information being
represented. New visual metaphors are only useful for representing
information when they match the user’s cognitive model of the
information; purely artificial visual metaphors can actually hinder
understanding.
Representations that make use of spatial and perceptual relationships
make more effective use of our brains. If these representations use
arbitrary symbols then we need to use mental transformations, mental
comparisons and other mental processes, forcing us to think
reflectively. In experiential cognition we perceive and react
efficiently. In reflective cognition we use our decision making skills.
- Matching Principle –
Representations of information are most effective when they match the
task to be performed by the user. Effective visual representations
should present affordances suggestive of the appropriate action.
Many analytical reasoning tasks follow this process:
- information gathering
- re-representation of the information in a form that aids analysis
- development of insight through the manipulation of this
representation
- creation of some knowledge product or direct action based on that insight.
People understand new information relative to what is already understood.
Shneiderman:
“Overview first, zoom and filter, details on demand”
When information
is first presented, the user should be able to quickly orient
themselves.
When a map program starts up, it should open with a view that makes it obvious what the map is showing. Maybe that is your current location with your position clearly labelled, or maybe it is a view of the country or city that you are accessing the map program from. The zoom factor should also be appropriate: if you are initially zoomed in too far you may not see enough landmarks to judge the scale of the map.
Naturalness is an important design principle: the representation works better when its properties match the properties of the thing being represented, as discussed above.
The purpose of
visualization is to make it easy for the user to see the
patterns, the similarities, the differences in the data. This involves
both the variation in the data itself and the ability of a human being
to perceive variation.
In general you
do not want to let the computer use its default values.
Unless you are using a specific program for a specific field the
default values will not be right for your work.
Principles of
graphical excellence from Tufte
- well-designed presentation of interesting data - a matter of
substance, statistics and design
- complex ideas communicated with clarity, precision, and efficiency
- gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space
- requires telling the truth about the data
- the representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented - the lie factor = (size of effect shown in graphic) / (size of effect in data); see the worked example after this list
- clear, detailed, and thorough labeling should be used to defeat
graphical distortion and ambiguity. Write out explanations of the data
on the graphic itself. Label important events in the data
- show data variation not design variation
- the number of information carrying dimensions depicted should not
exceed the number of dimensions in the data
- graphics must not quote data out of context
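A quick worked example of the lie factor (the numbers are invented for illustration): if a value in the data increases by 50 percent but the bar depicting it is drawn 150 percent taller, then the lie factor = 1.5 / 0.5 = 3, and the graphic exaggerates the effect threefold. A faithful graphic has a lie factor close to 1.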
Significant Digits
Your table should not show more accuracy than the accuracy of the data
collection. The computer will happily compute an average out to an
alarming number of digits, but if you only took measurements to one
decimal point then that's as far as you should show any derived
(average, min, max, median, etc) values. Programs may also reduce your
significant digits by eliminating trailing zeros (turning 4.20 into
4.2) so you will want to force all the data of the same type collected
in the same way to have the same number of significant digits.
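As a minimal sketch of forcing consistent significant digits, here in Python with invented measurement values (fixed-point format specifiers do the work):

    measurements = [4.2, 3.9, 4.1]   # hypothetical readings taken to one decimal place
    mean = sum(measurements) / len(measurements)
    print(mean)           # something like 4.066666666666667 - more digits than we measured
    print(f"{mean:.1f}")  # 4.1 - matches the precision of the collection
    print(f"{4.2:.2f}")   # 4.20 - fixed-point formatting keeps the trailing zero

Formatting every value of the same type with the same specifier keeps the whole table at a consistent number of digits.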
Keep your audience in mind when creating a table. You will want to keep
all of your data in its highest resolution form, but when you present
it, present just the right amount of detail for the people you will be
speaking to. More technical people will want more detail; less
technical people will want the information at a higher level. Some
people want to see detailed trends, others just overall trends. Don't
reuse your charts for different audiences; create new ones targeted at the specific audience.
Text
You have several
general choices of font styles to use
- sans-serif (eg Helvetica, Arial, Verdana, Tahoma) - good for on-screen text - eg 72 dpi
- serif (eg Times, Georgia) - good for printed text - eg 150-300 dpi
- monospace - good for certain occasions when you need exact alignment of the text
- fantasy / cute / brush strokes / cursive / dripping blood - just
say no, unless you are creating a party invitation
Since
we are focusing on interactive computer-based visualizations, you
should start with a sans-serif font like Helvetica and only change it
if you have a very good reason.
All visualizations should be well labelled, with a meaningful title and an explanatory legend.
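A minimal sketch of both points (a sans-serif face, a meaningful title, an explanatory legend), assuming matplotlib as the plotting library; the temperature values are invented:

    import matplotlib.pyplot as plt

    # Prefer a sans-serif face for on-screen text; matplotlib falls back
    # through the list if a font is not installed.
    plt.rcParams["font.family"] = "sans-serif"
    plt.rcParams["font.sans-serif"] = ["Helvetica", "Arial", "Verdana"]

    days = [1, 2, 3, 4, 5]
    surface = [18.2, 18.4, 18.1, 18.6, 18.9]    # hypothetical readings
    depth10m = [16.0, 16.1, 15.9, 16.2, 16.3]   # hypothetical readings

    plt.plot(days, surface, label="surface")
    plt.plot(days, depth10m, label="10 m depth")
    plt.title("Water temperature over one week")   # meaningful title
    plt.xlabel("day")
    plt.ylabel("temperature (C)")
    plt.legend(title="sensor")                     # explanatory legend
    plt.show()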
Colours
Avoid fully saturated colours (eg 255 0 0 red). Look around you: most of the world is not bright primary colours. Pastels and colour mixtures are easier on the eyes - ColorBrewer and similar palette tools can help, as in the sketch below.
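A minimal sketch of the difference, again assuming matplotlib (whose built-in "Set2" palette is derived from ColorBrewer); the bar values are invented:

    import matplotlib.pyplot as plt

    harsh = ["#FF0000", "#00FF00", "#0000FF"]   # fully saturated primaries
    soft = plt.get_cmap("Set2").colors[:3]      # ColorBrewer qualitative scheme

    values = [3, 7, 5]                          # invented data
    fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
    ax1.bar(range(3), values, color=harsh)
    ax1.set_title("saturated primaries")
    ax2.bar(range(3), values, color=soft)
    ax2.set_title("ColorBrewer Set2")
    plt.show()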
And some more
about colour:
- Use colour conservatively
- Limit the number of colours
- Colour can speed recognition, or hinder it, depending on what is coloured and how it is coloured. Colour must support the task(s)
- Colour can help in grouping related items
- Colour can help in dense information displays
- Colour coding should appear with minimal user effort and be under
the user's control
- Keep colour blindness in mind (see below)
- Be consistent
- Think about what certain colours commonly mean / represent
- Be careful what colours are used together (eg bright red on
bright blue is really annoying)
It would be good if the colours you choose also work for people who are colour blind: roughly 8 percent of men and 1 percent of women are colour blind.
Are you colour blind? You can check on Wikipedia - http://en.wikipedia.org/wiki/Ishihara_color_test
Here is an image
from my backyard run through vischeck to show how it would look for 3
of the more common types of colour blindness.
You should at least make sure that your data doesn't blend together or disappear for people who are colour blind. The colours I chose in the last couple of graphs are OK, but an even better way is to avoid using green in your charts, since red/green is the most common form of colour blindness. Photoshop can be used to check images (View menu, Proof Setup, Color Blindness), as can the tool at
http://colororacle.cartography.ch/
A couple of good web sites to check your graphics are:
http://www.vischeck.com/vischeck/
and
http://colorfilter.wickline.org/
A really good
book to look at for an introduction to this sort of thing is Edward
Tufte's 'The Visual Display of Quantitative Information.'
Another good
reference is Robert Harris' Information Graphics - A Comprehensive
Illustrated Reference.
What should I do when I get a new dataset?
- look at the meta-data (ideally the rules for this dataset should have been set up and written down before the data was collected, including the formats, bounds, and null values)
- is the data for each attribute within bounds?
- are there values that are missing?
- are attributes that are supposed to be unique really unique?
- look at the data distribution of the good data - are there any odd outliers? (a sketch of these checks follows after this list)
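A minimal sketch of these checks, assuming pandas; the file name, column names, and bounds are hypothetical and should really come from the dataset's meta-data:

    import pandas as pd

    df = pd.read_csv("readings.csv")              # hypothetical file

    print(df.isna().sum())                        # any missing values?

    # is each attribute within its documented bounds? (eg depth 0-100 m)
    print(df[(df["depth"] < 0) | (df["depth"] > 100)])

    print(df["sensor_id"].is_unique)              # really unique?

    print(df["temperature"].describe())           # distribution / odd outliers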
Much of the data preparation material that follows comes from Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber.
Data Cleaning
- missing values
- values out of range
- inconsistent formats
Data Integration
- How can I combine different data sets from different sources?
- different date formats (eg Jan-10-90 or 01.10.90 or 10.01.90 or 01/10/1990 or ...) - see the sketch after this list
- different units of measure (metric vs imperial, lat/lon vs UTM)
- different coverage areas (some data at neighbourhood level,
county level, state level, some collected per month and some per year)
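A minimal sketch of reconciling two of these issues (date formats and units of measure), assuming pandas and two invented sources; the explicit format strings and the miles-to-kilometres factor (1 mile = 1.609344 km) are the key points:

    import pandas as pd

    a = pd.DataFrame({"date": ["Jan-10-90"], "distance_mi": [12.0]})   # source A
    b = pd.DataFrame({"date": ["10.01.90"], "distance_km": [19.3]})    # source B

    # Parse each source with its own explicit format - never let the
    # computer guess which field is the month.
    a["date"] = pd.to_datetime(a["date"], format="%b-%d-%y")
    b["date"] = pd.to_datetime(b["date"], format="%d.%m.%y")

    # Convert everything to a single unit of measure.
    a["distance_km"] = a["distance_mi"] * 1.609344
    merged = pd.concat([a.drop(columns="distance_mi"), b], ignore_index=True)
    print(merged)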
Data Transformation
- smoothing
- emphasizes longer trends over shorter duration changes
- generalization
- replaces detailed concept with a general one (eg for each data value,
replace a zip code with a state name, or a specific age with an age
range 20-30)
- normalization
- depending how the data will be visualized you may need to transform it to a given range (eg 0.0 to 1.0)
- aggregation
- combines / summarizes data (eg add up all data for M, T, W, Th, F and store the weekly total, or the monthly total, or the yearly total, or average the data for all zip codes in a state and store the state average) - see the sketch after this list
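A minimal sketch of these transformations on an invented daily series, assuming pandas:

    import pandas as pd

    idx = pd.date_range("2011-01-03", periods=14, freq="D")
    s = pd.Series([5, 7, 6, 9, 8, 4, 6, 7, 9, 10, 8, 6, 7, 9],
                  index=idx, dtype=float)           # hypothetical daily values

    # smoothing: a rolling mean emphasizes longer trends
    smoothed = s.rolling(window=3, center=True).mean()

    # normalization: rescale into the range 0.0 to 1.0
    normalized = (s - s.min()) / (s.max() - s.min())

    # aggregation: sum the daily values into weekly totals
    weekly = s.resample("W").sum()

    # generalization: replace a detailed concept with a general one
    zips = pd.Series(["60601", "10001"])
    states = zips.map({"60601": "IL", "10001": "NY"})   # hypothetical lookup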
Aggregation leads us into the more general concept of data reduction.
Miles and
Huberman (1994):
Data
reduction is not something separate from analysis. It is part of
analysis. The researcher’s decisions—which data chunks to code and
which to pull out, which evolving story to tell—are all analytic
choices. Data reduction is a form of analysis that sharpens, sorts,
focuses, discards, and organizes data in such a way that “final”
conclusions can be drawn and verified.
Data Reduction - produces a reduced dataset that yields similar analytical results
- reduce the number of dimensions
- attribute
subset selection - one attribute (eg age) may be derived from
another or directly correlated to another, or might be irrelevant in
the work you are doing so those attributes can be removed
- data
cube aggregation - if you think of all the data you collected as
forming a multi-dimensional cube with each attribute being an edge of
the cube then you can collapse various dimensions down by aggregating
the values (eg the example above taking data for each day of the week
and storing only the weekly total)
- dimensionality reduction - encoding used to reduce dataset size - may be lossy or lossless - eg using principal component analysis or wavelets; the mathematics gets advanced quickly here.
- reduce the amount of collected/generated data
- replace data by a model that generates the data values,
- clustering (eg replace all of the attribute values collected between depths 10 and 15 with the average of those attribute values)
- sampling (keep every nth value, or one random value within each cluster) - see the sketch after this list
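A minimal sketch of two reduction techniques, assuming numpy and an invented dataset; the principal component analysis is done directly with a singular value decomposition here (scikit-learn's PCA wraps the same idea):

    import numpy as np

    rng = np.random.default_rng(42)
    data = rng.normal(size=(1000, 5))      # hypothetical: 1000 rows, 5 attributes

    # sampling: keep every nth (here 10th) row
    sample = data[::10]                    # shape (100, 5)

    # dimensionality reduction: project onto the first 2 principal components
    centered = data - data.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ Vt[:2].T          # shape (1000, 2)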
Provenance
Data moves through several forms and filters on its way to being visualized and analysed. It's important to keep track of who has done what to the data at each step so the validity of the final product can be ascertained, and so that if any issues arise with the original data collection or the intermediate steps it is easy to find which data products are affected. A minimal sketch of such a record follows below.
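This sketch uses only the Python standard library; the people, steps, and data are invented:

    import datetime, hashlib, json

    provenance = []

    def record_step(who, what, data_bytes):
        # Append who did what, when, and a hash of the resulting data.
        provenance.append({
            "who": who,
            "what": what,
            "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "sha256": hashlib.sha256(data_bytes).hexdigest(),
        })

    raw = b"depth,temp\n10,4.1\n"                        # hypothetical raw data
    record_step("alice", "collected from sensor 7", raw)
    cleaned = raw.replace(b"4.1", b"4.10")
    record_step("bob", "fixed significant digits", cleaned)
    print(json.dumps(provenance, indent=2))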
You wouldn't just grab data off the web and assume that it's correct, would you?
A nice overview
is given in http://www.cs.indiana.edu/pub/techreports/TR618.pdf
Visual Analytics
The
bible of the field is James J. Thomas and Kristin A. Cook. Illuminating
the Path: The Research and Development Agenda for Visual Analytics.
IEEE Computer Society, 2005. ISBN: 0-7695-2323-4. [Available online as
a free PDF.]
A much shorter overview whitepaper from the University of Konstanz:
http://infovis.uni-konstanz.de/papers/2009/edbs2008.pdf
"Visual Analytics is the science of analytical reasoning facilitated by
interactive visual interfaces. People use visual analytics tools and
techniques to synthesize information and derive insight from massive,
dynamic, ambiguous, and often conflicting data, provide timely,
defensible, and understandable assesments; and communicate assesment
effectively for action. The overall goal is to detect the expected and
discover the unexpected. "
The goal of visual analytics is to facilitate this analytical reasoning
process through the creation of software that maximizes human capacity
to perceive, understand, and reason about complex and dynamic data and
situations.
It must build upon an understanding of the reasoning process, as well
as an understanding of underlying cognitive and perceptual principles,
to provide mission-appropriate interactions that allow analysts to have
a true discourse with their information. The goal is to facilitate
high-quality human judgment with a limited investment of the analysts’
time.
Visual analytics
is a multidisciplinary field that includes the following focus areas:
- Data representations and transformations that convert all types
of conflicting and dynamic data in ways that support visualization and
analysis
- Visual representations and interaction techniques that take
advantage of the human eye’s broad bandwidth pathway into the mind to
allow users to see, explore, and understand large amounts of
information at once
- Analytical reasoning techniques that enable users to obtain deep
insights that directly support assessment, planning, and decision making
- Techniques to support production, presentation, and dissemination of the results of an analysis to communicate information in the appropriate context to a variety of audiences
- An analysis session is a dialogue between the analyst and the
data where the visual representation is the interface into the data.
The
use of visual representations and interactions to accelerate rapid
insight into complex data is what distinguishes visual analytics
software from other types of analytical tools. Visual representations
translate data into a visible form that highlights important features,
including commonalities and anomalies. These visual representations
make it easy for users to perceive salient aspects of their data
quickly. Augmenting the cognitive reasoning process with perceptual
reasoning through visual representations permits the analytical
reasoning process to become faster and more focused.
Visual representations invite the user to explore his or her data. This
exploration requires that the user be able to interact with the data to
understand trends and anomalies, isolate and reorganize information as
appropriate, and engage in the analytical reasoning process. It is
through these interactions that the analyst achieves insight.
Analysts may be
asked to perform several different types of tasks:
- Assess – Understand the current world around them and explain the
past.
- Forecast – Estimate future capabilities, threats,
vulnerabilities, and opportunities.
- Develop Options – Evaluate multiple reactions to potential events
and assess their effectiveness and implications.
Steps in the
Analytical Process
- Determine how to address the issue that has been posed, what
resources to use, and how to allocate time to various parts of the
process to meet deadlines.
- Gather information containing the relevant evidence and become
familiar with it, and incorporate it with the knowledge he or she
already has.
- Generate multiple candidate hypotheses.
- Evaluate these alternative explanations in light of evidence and
assumptions to reach a judgment about the most likely explanations or
outcomes.
- Consider alternative explanations that were not previously
considered.
- Create reports, presentations, or other products that summarize the analytical judgments, the supporting reasoning that was developed, and the uncertainties that remain. These products can then be shared with others.
Analysts must deal with data that is dynamic, incomplete, often
deceptive, and evolving and they often must come to conclusions within
a limited period of time.
Analysis products are expected to clearly communicate the assessment or
forecast, the evidence on which it is based, knowledge gaps or
unknowns, the analyst’s degree of certainty in the judgment, and any
significant alternatives and their indicators.
Visual analytics
systems must capture this information and facilitate its presentation
in ways that meet the needs of the recipient of the information.
Coming Next Time: VTK

last revision 1/2/11