Project 2

In the Year 2525

Project due at 9 pm on 10/10/11 Chicago time

Project 2 will be the first group project and will focus more on processing and visualizing textual data.

You should also very quickly set up a web page for your group project and send the URL to Andy. Each Friday during the project, each team member should post on the project web site an overview of what he/she did on the project that week.

In this project we are going to make use of textual data - in particular we are going to focus on transcripts of the six seasons of Futurama episodes available at

In a larger sense, these scripts give a textual representation of reality that is easier to parse and generate statistics from than the video and soundtrack, but they may not give a completely accurate version of reality.

We will use the script data to create an interactive visualization to explore how often various characters appear, which characters appear most often together, and what their most common sayings are. The script data are in the form of HTML files which you can parse to get the dialogue for each character, e.g.:
Leela: Oh, come on, he's just a poor kid from the Stupid Ages.
(Note that the first few transcripts have time codes but the rest do not.)
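
As a starting point, here is a minimal Python sketch of pulling a speaker and a line of dialogue out of text like the example above, and of counting which characters appear together. It is only an illustration under several assumptions: the HTML tags have already been stripped, dialogue lines look like "Name: text", time codes (where present) look like "(01:23)" or "[1:23:45]" at the start of a line, and "together" means "in the same episode". The actual transcripts may differ on every one of these points, and the function names parse_line and count_pairs are made up for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed time-code shape, e.g. "(01:23)" or "[1:23:45]" at the start of a line.
TIME_CODE = re.compile(r"^\s*[\(\[]\d{1,2}:\d{2}(?::\d{2})?[\)\]]\s*")
# Assumed dialogue shape: a short name starting with a letter, a colon, then text.
DIALOGUE = re.compile(r"^([A-Za-z][A-Za-z0-9 .'\-]{0,39}):\s+(.+)$")


def parse_line(line):
    """Return (speaker, text) for a dialogue line, or None for anything else."""
    line = TIME_CODE.sub("", line.strip())
    match = DIALOGUE.match(line)
    if match is None:
        return None
    speaker, text = match.groups()
    return speaker.strip(), text.strip()


def count_pairs(speakers_by_episode):
    """Count how many episodes each pair of speakers shares.

    speakers_by_episode maps an episode id to the set of speakers in it.
    Pairs are stored in alphabetical order so (a, b) and (b, a) are one key.
    """
    pairs = Counter()
    for speakers in speakers_by_episode.values():
        for a, b in combinations(sorted(speakers), 2):
            pairs[(a, b)] += 1
    return pairs
```

The pattern is deliberately loose, so any stray line of the form "word: text" will be treated as dialogue. That is one reason to check the resulting speaker list by hand.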

For a C you need ...

For a B you need to add ...

For an A you need to add ...

Wikipedia has a good Futurama page that will provide you with some general information about the show with reference material and a nice character page with images. Be sure to cite any data (including the transcripts) or images that you use in your work.

Note that the seasons have different lengths so you should take that into account when doing your statistics. A character should appear in the database if he/she/it has dialogue in more than one episode.
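
One possible way to handle both points above is sketched below in Python. It assumes the dialogue has already been reduced to (season, episode, speaker) tuples, one per line of dialogue; that layout and the function name summarize are assumptions made for this example. It keeps only characters who speak in more than one episode, and it reports lines per episode rather than raw counts so that long and short seasons can be compared. Lines per episode is just one reasonable normalization, not the required one.

```python
from collections import defaultdict


def summarize(lines):
    """lines: iterable of (season, episode, speaker), one per dialogue line.

    Returns (kept, rates) where kept is the set of speakers with dialogue in
    more than one episode and rates maps (speaker, season) to that speaker's
    average number of lines per episode in that season.
    """
    episodes_in_season = defaultdict(set)   # season -> {episode ids}
    episodes_of_speaker = defaultdict(set)  # speaker -> {(season, episode)}
    line_counts = defaultdict(int)          # (speaker, season) -> line count

    for season, episode, speaker in lines:
        episodes_in_season[season].add(episode)
        episodes_of_speaker[speaker].add((season, episode))
        line_counts[(speaker, season)] += 1

    # A character belongs in the database only with dialogue in 2+ episodes.
    kept = {s for s, eps in episodes_of_speaker.items() if len(eps) > 1}

    # Divide by the season's episode count so season length does not skew results.
    rates = {}
    for (speaker, season), count in line_counts.items():
        if speaker in kept:
            rates[(speaker, season)] = count / len(episodes_in_season[season])
    return kept, rates
```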

There may be some formatting errors in the text files (a common real-world data issue), so be sure to check your statistics to make sure they make sense.
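
One simple sanity check, sketched here in Python, is to look for speaker names that differ only by capitalization, spacing, or trailing punctuation, since those are likely the same character split across several entries. The function name find_name_variants and the input layout (a mapping from speaker name to line count) are assumptions for this example, and this catches only one kind of error.

```python
from collections import defaultdict


def find_name_variants(speaker_counts):
    """Group speaker names that differ only by case, spacing, or a trailing '.' or ':'.

    speaker_counts maps each raw speaker name to its number of lines.
    Returns {normalized name: sorted list of raw variants} for every
    normalized name that has more than one raw spelling.
    """
    groups = defaultdict(list)
    for name in speaker_counts:
        # Lowercase, collapse runs of whitespace, drop trailing '.' or ':'.
        key = " ".join(name.lower().split()).rstrip(".:")
        groups[key].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

For example, calling it on counts for "Fry", "FRY", and "fry " reports all three under the key "fry", which is a prompt to merge them or to look at the source lines.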

You will very likely want to divide up the work in your group between parsing the text, generating appropriate statistics, and then visualizing and interacting with those results. It will help to agree early on what the intermediate data formats will be so work on the different parts can proceed in parallel.
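
To make the idea of an agreed intermediate format concrete, here is a small Python sketch that writes one JSON object per line of dialogue and reads it back. The field names (season, episode, speaker, text) and the one-object-per-line layout are purely illustrative assumptions. Your group may prefer tab-separated text or something else that your visualization code can read easily.

```python
import json


def to_record(season, episode, speaker, text):
    """Build one dialogue record using the (hypothetical) agreed field names."""
    return {"season": season, "episode": episode, "speaker": speaker, "text": text}


def dump_records(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def load_records(blob):
    """Parse the text produced by dump_records back into a list of dicts."""
    return [json.loads(line) for line in blob.splitlines() if line.strip()]
```

With a format like this written down early, the person parsing the transcripts and the person computing statistics can each work from a small hand-made sample file until the real data is ready.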

As with Project 1, you should create a set of web pages describing your work on the project and containing a working applet that has a max size of 1024 x 768. The web pages should also describe the contribution of each team member (i.e., who worked on which interface elements, who worked on converting the data into a more usable form, etc.). Your team should create a 2-3 minute YouTube video showing the use of your application, including narration with decent audio quality. That video should be in a very obvious place on your main project web page.

Please send me a 1024 x 768 jpg image of your visualization for the web. This should be named p2.<someone_in_your_groups_last_name>.jpg.

When the project is done, each person in the group should also send me a private email rating each of your coworkers on the project on a scale from 1 (low) to 5 (high) in terms of how good a coworker they were on the project. If you never want to work with them again, give them a 1. If this person would be a first choice for a partner on a future project, then give them a 5. If they did what was expected but nothing particularly good or bad, then give them a 3. By default your score should be 3 unless you have a particular reason to increase or decrease the number. Please confine your responses to 1, 2, 3, 4, or 5, with no 1/3s or .5s. I will average all these scores for projects 2 through 4 and keep them in mind when assigning final grades for those projects.

Each group will present their work to the class and describe its features. All team members are expected to participate equally in that presentation. Since there are 9 groups, each talk will be 4 minutes long. During each talk, each group in the audience should write 1 question for the speaking group and hand it to them at the end of their presentation. The speaking group should add a page to their website by Thursday 10/13 giving each question (and the group who asked it) and an answer to that question.

last revision 10/5/11 - added text (in yellow) about the presentations