Project 1 will be an individual project to give people practice with writing
an application in D3 and get everyone ready to contribute to the
group projects to come. In this project everyone will learn how to
import data, how to write an interactive application, and how to
create an effective user interface for viewing and analyzing this
data. This will give everyone a common basis for communication in
the later group projects where people will start to specialize in
different tasks.
The project will focus on using basic graphs to show and compare
demographic data for different areas of the city of Chicago.
Chicago is semi-officially divided into 9 districts (or sides),
further subdivided into 77 community areas, and further into
around 200 neighborhoods. You can see an overview here: http://en.wikipedia.org/wiki/Community_areas_in_Chicago
and here: http://en.wikipedia.org/wiki/Neighborhoods_in_Chicago
We are going to concentrate on the 77 community areas in this
project. When the application starts the user should see a map
of the city areas and graphs showing an overview of the
demographic data for the entire city. The user will then be able
to bring up data on individual community areas or districts to
compare them to the city as a whole and to other community areas
or districts.
Part of the display for your application will be a map of the
districts and community areas of the city (the figure on the
Wikipedia page should give you an idea what this should look
like) so the user can see where this data is coming from. The
user should be able to choose a community area or district to
display using the map or a textual list of districts and
community areas.
All of the graphs should be well labelled and have common axes
and colors to make comparison easier. The user should be able to
show the data in different ways (e.g. pie charts / column charts
/ raw data tables) depending on their needs.
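One possible way to keep axes comparable is sketched below: convert each area's raw counts to percentages, then give every chart the same value domain. The function names (toPercentages, sharedDomain, linearScale) and the sample numbers are invented for illustration, and D3's own scales could stand in for linearScale.

```javascript
// Convert raw counts to percentages so areas of different size are comparable.
function toPercentages(counts) {
  const total = counts.reduce((sum, c) => sum + c, 0);
  return counts.map(c => (total === 0 ? 0 : (100 * c) / total));
}

// One [0, max] domain covering every series, so all charts share an axis.
function sharedDomain(seriesList) {
  let max = 0;
  for (const series of seriesList) {
    for (const value of series) {
      if (value > max) max = value;
    }
  }
  return [0, max];
}

// Minimal linear scale from a value domain to a pixel range.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return value => (d1 === d0 ? r0 : r0 + ((value - d0) / (d1 - d0)) * (r1 - r0));
}

// Made-up counts for the city and one area, three age categories each.
const cityShares = toPercentages([1, 1, 2]);
const areaShares = [10, 60, 30];
const barHeight = linearScale(sharedDomain([cityShares, areaShares]), [0, 300]);
```

Because both charts use barHeight, a bar of a given height means the same percentage in each of them.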
The user should be able to bring up information on who wrote the
project, where the data came from, etc.
The data files you will start with are:
- demographic data in Excel format:
http://robparal.blogspot.com/2012/05/hard-to-find-census-data-on-chicago.html
- neighborhood borders as latitude/longitude data in a JSON file:
https://gist.githubusercontent.com/divideby0/4942176/raw/everyblock-neighborhoods-chicago.json
The JSON file will require some formatting and cleaning: adding
some headers, adding commas to separate the areas, and reordering
some of the points to make all of the areas behave correctly when
you draw them. Data on the web is usually not in exactly the format
you want it in and cleaning is a big part of the process.
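As a rough sketch of the point-reordering step, the helpers below close an open ring and reverse it when its winding order is not the one you want. It assumes each border is an array of [longitude, latitude] pairs and treats them as planar coordinates, which is reasonable at city scale. The function names are hypothetical, and which orientation your drawing code expects is something you need to check yourself, so it is a parameter here.

```javascript
// Return a copy of the ring whose last point repeats the first.
function closeRing(ring) {
  const first = ring[0];
  const last = ring[ring.length - 1];
  const isClosed = first[0] === last[0] && first[1] === last[1];
  return isClosed ? ring.slice() : ring.concat([first.slice()]);
}

// Shoelace formula on a closed ring. With y increasing upward (as latitude
// does), a positive result means the points run counterclockwise.
function signedArea(closedRing) {
  let sum = 0;
  for (let i = 0; i < closedRing.length - 1; i++) {
    const [x1, y1] = closedRing[i];
    const [x2, y2] = closedRing[i + 1];
    sum += x1 * y2 - x2 * y1;
  }
  return sum / 2;
}

// Return a closed copy of the ring in the requested orientation.
function orientRing(ring, clockwise) {
  const closed = closeRing(ring);
  const isClockwise = signedArea(closed) < 0;
  return isClockwise === clockwise ? closed : closed.reverse();
}
```

Running every border through orientRing once, as part of your cleaning pipeline, keeps the fix out of the drawing code.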
The Chicago Health Atlas makes use of a community area level
JSON file. This file could also give you a good starting point,
e.g. http://www.chicagohealthatlas.org/map/healthcare_providers_dentists.json
Here is a short bit of code to view the health atlas map:
<!DOCTYPE html>
<html>
<head>
<title>Chicago</title>
<script src="http://d3js.org/d3.v3.min.js"></script>
</head>
<body bgcolor="#000000">
<div id="viz" style="width:100%; height:100%; float:left; background:#888"></div>
<script>
// Create the SVG canvas inside the #viz div.
var svg = d3.select("#viz")
    .append("svg:svg")
    .attr("width", 1200)
    .attr("height", 800);

// Load the community area borders and draw one path per area.
d3.json("http://www.chicagohealthatlas.org/map/healthcare_providers_dentists.json",
    function(error, json) {
        // Stop here if the file could not be loaded or parsed.
        if (error) {
            console.error(error);
            return;
        }

        // Center a Mercator projection on the middle of the data.
        var center = d3.geo.centroid(json);
        var scale = 80000;
        var offset = [400, 400];
        var projection = d3.geo.mercator().scale(scale).center(center)
            .translate(offset);

        // The path generator turns each feature into an SVG path string.
        var path = d3.geo.path().projection(projection);
        svg.selectAll("path")
            .data(json.features)
            .enter()
            .append("path")
            .attr("d", path);
    });
</script>
</body>
</html>
Note that the data in the data files is rather detailed. You will
probably want to come up with an intelligent way to cluster the
data into fewer categories. Dealing with data files is a major
part of visualization, and most are not as well formatted as
these. If there are too many slices in a pie chart, too many bars
in a bar chart, or so much text that it is unreadable, then you
need to do something smart to make it usable, and document those
decisions in your writeup. Saying 'that is what the computer did'
is never an acceptable answer from a computer scientist.
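As one illustration of clustering, the sketch below collapses many fine age brackets into four broad groups. The group boundaries, the record layout ({ minAge, count }), and the names are assumptions made for this example, not the layout of the actual data files, and it assumes every fine bracket falls entirely inside one broad group.

```javascript
// Hypothetical broad groups, each starting at an inclusive minimum age.
const AGE_GROUPS = [
  { label: "Under 18", minAge: 0 },
  { label: "18 to 34", minAge: 18 },
  { label: "35 to 64", minAge: 35 },
  { label: "65 and over", minAge: 65 }
];

// Find the broad group that contains a given age.
function groupFor(age) {
  let match = AGE_GROUPS[0];
  for (const group of AGE_GROUPS) {
    if (age >= group.minAge) match = group;
  }
  return match.label;
}

// Sum the counts of the fine brackets into the broad groups.
function clusterAges(brackets) {
  const totals = AGE_GROUPS.map(group => ({ label: group.label, count: 0 }));
  for (const bracket of brackets) {
    const label = groupFor(bracket.minAge);
    totals.find(t => t.label === label).count += bracket.count;
  }
  return totals;
}
```

Whatever grouping you choose, the boundaries are a design decision that belongs in your writeup.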
You will be writing your code to run in a web browser and it
should run on all current browsers (Chrome, Safari, Firefox,
Internet Explorer, etc.) but the main evaluation and demonstration will be
done on our classroom wall which runs the latest stable version of
Chrome. This shows the advantage of scalable graphics as your code
should naturally scale up to the larger display. The screen size
will be 8196 x 2188 (which is roughly the same aspect ratio as two
HD monitors side by side).
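To get a feel for how a layout scales, the sketch below computes the uniform scale factor and centering offsets needed to fit a design of one size onto a screen of another. The 1920 x 1080 design size and the function name are assumptions for illustration; an SVG viewBox can do the same fitting declaratively.

```javascript
// Largest uniform scale at which the design fits entirely on the screen,
// plus the offsets that center it.
function fitToScreen(designWidth, designHeight, screenWidth, screenHeight) {
  const scale = Math.min(screenWidth / designWidth, screenHeight / designHeight);
  return {
    scale: scale,
    offsetX: (screenWidth - designWidth * scale) / 2,
    offsetY: (screenHeight - designHeight * scale) / 2
  };
}

// A single-HD-monitor layout placed on the classroom wall.
const wallFit = fitToScreen(1920, 1080, 8196, 2188);
```

Here the height is the limiting dimension, so a 16:9 layout fills less than half of the wall's width. That is a good reason to design for the wide aspect ratio from the start.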
The application should have obvious
and intuitive controls. We will use the touch overlay on the
wall so your features should be accessible assuming the user
only has a single button mouse.
While scalable graphics scale pretty well, user interaction is a
bit different on a large wall so you should plan on spending
some time testing on the actual wall during office hours to make
sure your application works as expected.
One of the major goals here is to experiment with different ways
to visualize the same data so no other libraries can be used
without prior permission (e.g. no xCharts, rickshaw, etc). You
are in control of the visualization and interaction and you
should not feel limited by what some other libraries provide.
You can use a database to store the data if you wish or flat
files or the cloud. You can use external tools to process the
data as long as you have a pipeline you can document in your
writeup.
For a C you need:
- static well labelled map of the Chicago districts and
community areas
- ability to show demographic data for Chicago as a whole
and compare it to any one of the 77 community areas (data
for all 77 areas should be available but the user will only
bring up data on one of those areas at a time, and the user
can move from one area to another)
- user can choose a community area from a textual list of
community areas
- ability for user to show data on Age and Gender as well
designed bar charts, pie charts, or raw data
For a B you need to add:
- ability for user to also show data on race and place of
origin as bar charts, pie charts, or raw data
- user can choose a district from a textual list and see
district level data (by aggregating data from the relevant
community areas)
- ability to show data on up to 3 of the community areas or
districts simultaneously to allow the user to compare them
to each other and to the city as a whole
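The district-level aggregation mentioned above could be sketched like this. The record layout and the sample names and numbers are invented for illustration. The point of the example is to sum the raw counts first and compute percentages afterwards, rather than averaging the community area percentages.

```javascript
// Hypothetical records: the names and numbers are invented.
const sampleAreas = [
  { area: "Area A", district: "District X", total: 100, under18: 10 },
  { area: "Area B", district: "District X", total: 300, under18: 90 },
  { area: "Area C", district: "District Y", total: 50, under18: 5 }
];

// Sum the listed count fields over all community areas in each district.
function aggregateByDistrict(areas, fields) {
  const byDistrict = new Map();
  for (const area of areas) {
    if (!byDistrict.has(area.district)) {
      const empty = { district: area.district };
      for (const field of fields) empty[field] = 0;
      byDistrict.set(area.district, empty);
    }
    const sums = byDistrict.get(area.district);
    for (const field of fields) sums[field] += area[field];
  }
  return Array.from(byDistrict.values());
}

// Percentage of a part within a whole, guarding against an empty whole.
function percent(part, whole) {
  return whole === 0 ? 0 : (100 * part) / whole;
}

const sampleDistricts = aggregateByDistrict(sampleAreas, ["total", "under18"]);
```

In this made-up data District X is 25% under 18 (100 of 400). Averaging the two area percentages (10% and 30%) would give 20%, which is wrong because the areas differ in size.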
For an A you need to add:
- interactive well labelled map of the Chicago districts and
community areas showing the area(s) currently being
investigated, as well as allowing the user to choose a
community area or district from the map (be sure users on
the classroom wall can reach the top of the map)
- ability to bring up a graph showing the difference between
two selected areas
- ability to color in the community areas and districts on
the map as a heatmap based on one of at least 5 particular
pieces of data (e.g. percentage of people under 18) to see
how that piece of data changes across the city
- document some interesting findings
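For the heatmap and difference graph items, a minimal sketch of the underlying arithmetic might look like the following. It linearly interpolates between two RGB colors and subtracts one area's percentages from another's. The function names, the color arguments, and the assumption that both areas use the same category keys are all choices made for this example.

```javascript
// Map a value in [min, max] to a color between lowRgb and highRgb.
// Values outside the range are clamped to the nearest end.
function heatColor(value, min, max, lowRgb, highRgb) {
  const t = max === min ? 0 : Math.min(1, Math.max(0, (value - min) / (max - min)));
  const channels = lowRgb.map((low, i) => Math.round(low + t * (highRgb[i] - low)));
  return "rgb(" + channels.join(",") + ")";
}

// Percentage-point difference per category between two areas.
// Assumes both objects have the same keys.
function percentagePointDifference(areaA, areaB) {
  return Object.keys(areaA).map(category => ({
    category: category,
    difference: areaA[category] - areaB[category]
  }));
}
```

The string returned by heatColor can be used directly as the fill of a map path, and the difference list can feed a bar chart whose bars extend above or below a zero line.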
Your application should help someone to look for patterns across
the city as a whole and then go into more detail to see the
differences that emerge at the district and community area level.
For example, as the community areas are now almost 100 years old,
the demographics in the populations have shifted, so can you find
some examples where certain community areas are a better match to
a different district?
You should start by getting D3 installed, running through the
demos, and doing some initial tests to load in the data and start
displaying it.
Once you have a basic shell working you should then start to draw
some sketches of what the interface might look like and how you
want to arrange and display the data. You can use other software
to generate statistics about the data if you find that useful but
be sure to document that process. Be careful of missing data when
you generate statistics.
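To illustrate the missing-data problem, here is a small sketch that computes a mean while skipping missing entries instead of silently counting them as zero. The markers treated as missing (null, undefined, blank strings, and non-numeric text such as "N/A") are assumptions; check what your own files actually contain.

```javascript
// Convert a raw cell to a number, or null if it should count as missing.
function toNumberOrNull(value) {
  if (value === null || value === undefined) return null;
  if (typeof value === "string" && value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

// Mean of the values that are present, plus how many were used and skipped.
function meanIgnoringMissing(values) {
  const present = values.map(toNumberOrNull).filter(n => n !== null);
  const missing = values.length - present.length;
  if (present.length === 0) {
    return { mean: null, used: 0, missing: missing };
  }
  const sum = present.reduce((total, n) => total + n, 0);
  return { mean: sum / present.length, used: present.length, missing: missing };
}
```

For the values 10, "20", "", null, "N/A", and 30 this gives a mean of 20 from three usable entries. Treating the three missing entries as zero would give 10 instead. Reporting how many entries were skipped lets the user judge how much to trust the number.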
Your application should start out showing some data - a blank
screen is not very inviting. In past terms the students have shown
a desire to show an overwhelming amount of the data to the user
right away. You should be careful not to overwhelm the user. As
Shneiderman said, "overview first, zoom and filter, then details on
demand." Appropriate levels of data aggregation will be very
important here.
It is also important to note that 'getting it to work' is just a
prerequisite to using the application to find answers to your
questions. It is that usage that will give you ideas on how to
improve your app to make it easier and more intuitive to find
those things. Writing the application at the last minute pretty
much guarantees that you will not come up with an intuitive
interface.
Many of the routines you write for this project will be used again
and expanded upon in the upcoming projects - e.g. all of the
projects will need graphs, so it is a good idea to write your code
in a way that it is reusable so you can modify it rather than
totally rewriting it later.
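One common way to make chart code reusable is a configurable function with getter/setter methods, sketched below. This version only computes bar rectangles and leaves the drawing to the caller, so it stays independent of any one page. The names (barLayout, padding) and the default sizes are assumptions for illustration.

```javascript
// Returns a configurable function that turns values into bar rectangles.
function barLayout() {
  let width = 400;
  let height = 200;
  let padding = 4;

  // maxValue is optional; pass a shared maximum to keep charts comparable.
  function layout(values, maxValue) {
    const band = width / values.length;
    const top = maxValue === undefined ? Math.max(...values) : maxValue;
    return values.map((value, i) => {
      const barHeight = top > 0 ? (value / top) * height : 0;
      return {
        x: i * band + padding / 2,
        y: height - barHeight,
        width: band - padding,
        height: barHeight
      };
    });
  }

  // Each method returns the current setting when called with no argument,
  // and sets it (returning layout for chaining) when called with one.
  layout.width = function (v) {
    if (!arguments.length) return width;
    width = v;
    return layout;
  };
  layout.height = function (v) {
    if (!arguments.length) return height;
    height = v;
    return layout;
  };
  layout.padding = function (v) {
    if (!arguments.length) return padding;
    padding = v;
    return layout;
  };

  return layout;
}
```

A later project can then call something like barLayout().width(800).height(300) and reuse the same routine with different data, instead of rewriting it.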