Project 2 will be the first group
project and will focus more on processing and visualizing textual
data.
You should also very quickly set up a web page for your group
project and send the URL to Andy. Each Friday of the project, each
team member should post on the project web site an overview of
what he/she did on the project that week.
In this project we are going to make
use of textual data. In particular, we are going to focus on the
transcripts of the six seasons of Futurama episodes, available at
http://theinfosphere.org/Episode_Transcript_Listing
In a larger sense, these scripts give a textual representation of
reality that is easier to parse and generate statistics from
than the video and the soundtrack, but that may not give a
completely accurate version of reality.
We will use the script data to create an interactive visualization
to explore how often various characters appear, which characters
appear most often together, and what their most common sayings
are. The script data are in the form of HTML files, which you can
parse to get the dialogue for each character, e.g.
Leela: Oh, come on, he's just a poor kid from the Stupid Ages.
(Note that the first few transcripts have time codes but the rest do
not.)
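As a rough starting point, the parsing step could be sketched as below. This is only a sketch under assumptions: the names `parse_dialogue`, `LINE_RE`, and `_TextCollector` are made up for illustration, and the time-code pattern (something like `[00:12]` or `(00:12)` at the start of a line) is a guess, so check it against the real transcript files before relying on it.

```python
import re
from html.parser import HTMLParser

# Assumed shape of a dialogue line: optional time code, speaker, colon, speech.
# The time-code format here is a guess for illustration, not taken from the site.
LINE_RE = re.compile(
    r"^\s*(?:[\[(]\d{1,2}:\d{2}(?::\d{2})?[\])]\s*)?"  # optional time code
    r"([A-Z][^:\[\]()]{0,40}?)\s*:\s*(\S.*)$"          # speaker, then speech
)


class _TextCollector(HTMLParser):
    """Collects visible text, starting a new line at block-level tags."""

    BLOCK_TAGS = {"p", "br", "div", "li", "tr", "dd", "dt"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def parse_dialogue(html_text):
    """Return a list of (speaker, speech) pairs found in one transcript page."""
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()
    lines = "".join(collector.parts).splitlines()
    pairs = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            pairs.append((match.group(1).strip(), match.group(2).strip()))
    return pairs
```

Lines that do not look like `Name: speech` (stage directions, headings) are simply skipped here, which is one place where formatting errors in the source can silently drop dialogue.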
For a C you need ...
- see an overview of the entire series showing the relative
importance of the characters (based on their amount of
dialogue)
- pick a character from a list of characters, where the
common characters are easier to pick, and see which episodes
that character appears in over the entire series and how
many episodes that character was in on a timeline of the
episodes
- pick an episode from a list of all episodes and see which
characters are in that episode
- pick a season from a list of all seasons and see how many
episodes each character was in (graphically and textually)
- be able to switch between textual names and icons for each
of the characters
- the application should update quickly when the user
interacts
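The appearance counts behind the C-level views could be computed along these lines. This is a minimal sketch that assumes the parsed dialogue has already been flattened into `(season, episode, speaker, text)` tuples with integer season and episode numbers; that record layout and the function names are hypothetical, not something prescribed by the assignment.

```python
from collections import defaultdict


def episodes_by_character(records):
    """Map each speaker to the sorted (season, episode) pairs in which they speak."""
    seen = defaultdict(set)
    for season, episode, speaker, _text in records:
        seen[speaker].add((season, episode))
    return {name: sorted(episodes) for name, episodes in seen.items()}


def episode_counts_for_season(records, season):
    """Count, for one season, how many episodes each speaker has dialogue in."""
    counts = {}
    for name, episodes in episodes_by_character(records).items():
        n = sum(1 for s, _e in episodes if s == season)
        if n > 0:
            counts[name] = n
    return counts
```

Precomputing tables like these once, rather than rescanning the transcripts on every click, is one way to keep the application responsive when the user interacts.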
For a B you need to add ...
- pick an episode from a list of all episodes and see how
the amount of dialogue compares for all the characters
(graphically and textually)
- pick a season from a list of all seasons and see how the
amount of dialogue compares for all the characters
(graphically and textually)
- pick a character and see the amount of dialogue for that
character over the entire series (graphically and textually)
- pick up to three characters from a list of characters and
see their relative importance per episode across all the
episodes and across each season
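One possible way to measure "amount of dialogue" is to count words per character per episode and then turn those counts into shares. The sketch below assumes the same hypothetical `(season, episode, speaker, text)` records, and the choice of word count (rather than line count or character count) is an assumption you are free to change.

```python
from collections import Counter, defaultdict


def words_per_episode(records):
    """Return {(season, episode): Counter({speaker: word_count})}."""
    totals = defaultdict(Counter)
    for season, episode, speaker, text in records:
        totals[(season, episode)][speaker] += len(text.split())
    return dict(totals)


def dialogue_share(totals, episode_key):
    """Return each speaker's fraction of the words spoken in one episode."""
    counts = totals.get(episode_key, {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}
```

Summing the per-episode counters over a season, or over the whole series, gives the season-level and series-level comparisons in the same way.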
For an A you need to add ...
- pick a character from a list of characters, and see the
common sentences for that character and the common words
(minus the general common English words) for that character
- be able to pick a common character phrase and see in which
episodes it appears
- show which characters commonly appear together
- use your interface to investigate important questions, such
as how often Prof. Farnsworth says "Good news everyone!", in
which episode he first said it, and whether he says it more
often than Bender says "Bite my shiny metal ass.", and
highlight the answers on your webpage through screen snapshots
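The phrase questions and the "appear together" question can be approached roughly as follows. This sketch makes several assumptions: records are hypothetical `(season, episode, speaker, text)` tuples with integer season and episode numbers in broadcast order, phrases are matched after lowercasing and dropping punctuation, and "appearing together" is taken to mean having dialogue in the same episode (a per-scene definition would need scene boundaries from the transcripts).

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase words only, so 'Good news, everyone!' matches 'good news everyone'."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_phrase(tokens, target):
    """Count occurrences of the token list `target` inside `tokens`."""
    n = len(target)
    if n == 0:
        return 0
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)


def phrase_stats(records, speaker, phrase):
    """Return (total occurrences, first (season, episode) or None) for one speaker."""
    target = tokenize(phrase)
    hits = Counter()
    for season, episode, who, text in records:
        if who == speaker:
            found = count_phrase(tokenize(text), target)
            if found:
                hits[(season, episode)] += found
    first = min(hits) if hits else None
    return sum(hits.values()), first


def co_appearances(records):
    """Count, for each pair of speakers, the episodes in which both have dialogue."""
    cast = defaultdict(set)
    for season, episode, who, _text in records:
        cast[(season, episode)].add(who)
    pairs = Counter()
    for members in cast.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs
```

Pair keys are alphabetically sorted tuples, so look up a pair with the names in sorted order.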
Wikipedia has a good Futurama page that will provide you with
some general information about the show with reference material
and a nice character page with images. Be sure to cite any data
(including the transcripts) or images that you use in your work.
Note that the seasons have different lengths so you should take
that into account when doing your statistics. A character should
appear in the database if he/she/it has dialogue in more than
one episode.
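Both rules above can be expressed in a few lines. This is a sketch under assumptions: records are hypothetical `(season, episode, speaker, text)` tuples, and a season's length is taken to be the number of distinct episodes that have at least one record, which only holds if every transcript was parsed successfully.

```python
from collections import defaultdict


def recurring_characters(records):
    """Return the set of speakers who have dialogue in more than one episode."""
    episodes = defaultdict(set)
    for season, episode, speaker, _text in records:
        episodes[speaker].add((season, episode))
    return {name for name, eps in episodes.items() if len(eps) > 1}


def appearance_rate(records, season):
    """Fraction of a season's episodes in which each speaker has dialogue."""
    season_episodes = set()
    spoken_in = defaultdict(set)
    for s, episode, speaker, _text in records:
        if s == season:
            season_episodes.add(episode)
            spoken_in[speaker].add(episode)
    if not season_episodes:
        return {}
    length = len(season_episodes)
    return {name: len(eps) / length for name, eps in spoken_in.items()}
```

Comparing rates instead of raw counts keeps a character in a short season from looking less important than the same character in a long one.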
There may be some formatting errors in the text files (a common
real world data issue) so be sure to check your statistics to
make sure they make sense.
You will very likely want to divide up the work in your group
between parsing the text, generating appropriate statistics, and
then visualizing and interacting with those results. It will
help to agree early on what the intermediate data formats will
be, so that work on the different parts can proceed in parallel.
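For example, an intermediate format could be as simple as one JSON document listing every line of dialogue. The sketch below is only one option: the field names (`season`, `episode`, `speaker`, `text`) and the `version` key are hypothetical, and your group may prefer CSV or a database instead.

```python
import json


def records_to_json(records):
    """Serialize (season, episode, speaker, text) tuples to a JSON string."""
    rows = [
        {"season": season, "episode": episode, "speaker": speaker, "text": text}
        for season, episode, speaker, text in records
    ]
    # A version number makes it easier to change the format later
    # without silently breaking the other team members' code.
    return json.dumps({"version": 1, "lines": rows}, indent=2)


def records_from_json(payload):
    """Parse the JSON string back into a list of tuples."""
    document = json.loads(payload)
    return [
        (row["season"], row["episode"], row["speaker"], row["text"])
        for row in document["lines"]
    ]
```

With a format like this agreed on, the person writing the visualization can start from a small hand-written file while the parser is still being debugged.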
As with Project 1, you should
create a set of web pages describing your work on the project and
containing a working applet that has a maximum size of 1024 x 768. The
web pages should also describe the contribution of each team
member (i.e., who worked on which interface elements, who worked on
converting the data into a more usable form, etc.). Your team
should create a 2-3 minute YouTube video showing the use of your
application, including narration with decent audio quality. That
video should be in a very obvious place on your main project web
page.
Please send me a 1024 x 768 jpg image of your visualization for
the web. This should be named
p2.<someone_in_your_groups_last_name>.jpg.
When the project is done, each person in the group should also
send me a private email ranking your coworkers on the project on a
scale from 1 (low) to 5 (high) in terms of how good a coworker
they were on the project. If you never want to work with them
again, give them a 1. If this person would be a first choice for a
partner on a future project then give them a 5. If they did what
was expected but nothing particularly good or bad then give them a
3. By default your score should be 3 unless you have a particular
reason to increase or decrease the number. Please confine your
responses to the whole numbers 1, 2, 3, 4, or 5 (no thirds or
halves). I will average all these scores for projects 2 through 4
and keep them in mind when assigning final grades for those projects.
Each group will present its work and describe its features to the
rest of the class. All team members are expected
to participate equally in that presentation.
Since there are 9 groups, each talk will be 4
minutes long. During each talk, each group in the audience should
write 1 question for the speaking group and hand it to them at
the end of the presentation. The speaking group should add a
page to their website by Thursday 10/13 giving each question
(and the group that asked it) and an answer to it.