CS 424 Project 3

In this project you will be looking over the time period from 1910 to (almost) 2010 tracking the careers of a number of actors and directors. Your main visualization will be somewhat similar to the NY Times Billboard rankings from week 5 of the course with time (in years) on the x-axis and quality (as measured by the imdb) on the y-axis. By picking a given actor or director, the user should be able to see the quality of his/her work over the years as a year-by-year bar chart. By choosing another actor or director the user should be able to compare the careers of different people.

Most of the work on this project will be in converting the data into an appropriate shape to visualize. The first part of that is cutting down the data size. To that end we will limit the dataset by avoiding: tv shows, tv movies, video games, adult films, direct-to-video releases, documentaries, and short subjects. That should knock the initial 300,000 film entries down a fair amount.
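As a rough illustration of this filtering step, here is a minimal Python sketch. It assumes each entry has already been parsed into a title string and a list of genre names, that tv series titles are wrapped in double quotes, and that tv movies, direct-to-video releases, and video games carry the suffix markers (TV), (V), and (VG). Those conventions, the genre names, and the sample titles are assumptions for illustration, so check them against the data files you actually download.

```python
# Assumed title markers for tv movies, direct-to-video releases, and video games.
EXCLUDED_MARKERS = ("(TV)", "(V)", "(VG)")
# Assumed genre names for the categories the project excludes.
EXCLUDED_GENRES = {"Adult", "Documentary", "Short"}


def keep_film(title, genres):
    """Return True if the entry looks like a theatrical feature film."""
    if title.startswith('"'):  # assumed convention: tv series titles are quoted
        return False
    if any(marker in title for marker in EXCLUDED_MARKERS):
        return False
    # Reject the entry if it carries any excluded genre.
    return not (set(genres) & EXCLUDED_GENRES)


# Hypothetical sample entries, not taken from the real dataset.
films = [
    ("Feature A (1942)", ["Drama", "Romance"]),
    ('"Series B" (1995)', ["Comedy"]),
    ("Game C (2001) (VG)", ["Action"]),
    ("Short D (1931)", ["Short", "Comedy"]),
    ("Video E (1999) (V)", ["Horror"]),
]
kept = [title for title, genres in films if keep_film(title, genres)]
print(kept)
```

Keeping the exclusion rules in two small constants makes it easier to document and adjust your choices on your web page.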

You should then find the top 5-10 male actors, female actors, and directors for each decade. You must do this in your code, though you can certainly use online resources to help you figure out who those people should be to make sure your code is working correctly. The top people should have done at least m films with more than n votes (where m and n may depend on the decade - document your decisions). For the purpose of this project a 'top' person is defined by working on a highly rated film. One way to start is to pick the top rated films from each decade and then look for commonalities in the cast and directors, count them up, and eventually pick the top 5-10 actors, actresses, and directors for each decade. There may be some overlap from decade to decade, which is fine. You should document the decisions that you make to create the algorithm that determines a top person.
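One possible shape for the "pick the top films, then count people" idea is sketched below in Python. It is only one of many reasonable algorithms. The record layout (dictionaries with year, rating, votes, and people keys), the function name, and the top_films cutoff are all assumptions made for illustration, and a single m and n are used for every decade to keep the sketch short.

```python
from collections import Counter, defaultdict


def top_people(films, min_votes, min_films, top_films=100, top_n=5):
    """Return {decade: [names]} ranked by appearances in top rated films.

    films: list of dicts with 'year', 'rating', 'votes', 'people' (hypothetical layout).
    min_votes: n, a film must have more than this many votes to count.
    min_films: m, a person needs at least this many qualifying films.
    """
    by_decade = defaultdict(list)
    for film in films:
        if film["votes"] > min_votes:
            by_decade[film["year"] // 10 * 10].append(film)

    result = {}
    for decade, decade_films in by_decade.items():
        # Keep only the highest rated films of the decade.
        best = sorted(decade_films, key=lambda f: f["rating"], reverse=True)[:top_films]
        # Count how often each person appears in those films.
        counts = Counter(person for film in best for person in film["people"])
        ranked = [name for name, count in counts.most_common() if count >= min_films]
        result[decade] = ranked[:top_n]
    return result


# Hypothetical sample data.
sample = [
    {"year": 1941, "rating": 8.5, "votes": 500, "people": ["A", "B"]},
    {"year": 1946, "rating": 8.0, "votes": 400, "people": ["A", "C"]},
    {"year": 1948, "rating": 9.0, "votes": 5, "people": ["D"]},
    {"year": 1953, "rating": 7.0, "votes": 300, "people": ["B"]},
]
tops = top_people(sample, min_votes=100, min_films=2)
print(tops)
```

Run it once per list (male actors, female actors, directors) with the appropriate people in each record. To let m and n vary by decade, the two thresholds could be looked up in a per-decade table instead of being passed as single numbers.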

Your code could have one phase where it works out who the best are and then another phase that does the interactive visualization - the code does not need to automatically compute the top people every time it runs - it could do the computation under user control, or only if that data hasn't already been computed. You could have two separate applications - one to parse through the current data and generate new data files and another to take those new data files and run the visualization. The imdb data files are updated regularly so your code should continue to run on new datasets unless there is a significant change to their format.

For a C you need ...

choose the name of an actor or director from a list on the screen and see their career on the graph. If the person did more than one film per year then the ratings of those films should be averaged.
if a person was both an actor and a director then it should be obvious which films were which
ability to see the career of a person with the chart ranging from 0-10 showing the average of all films, which you should compute from the films in the limited dataset (no tv shows, etc), i.e. the x axis sits at 0
ability to see the chart as ratings above / below the average of all films, i.e. the x axis sits at the average
you should check your visualization with a colour blindness checking web page to see that it's ok
the interface should update quickly when the user interacts
ability to mouse over a given column in the chart and see the name of that specific film (or films) for that column
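The per-year averaging and the two baseline modes in the list above could be computed along these lines. This is a minimal Python sketch, assuming a career is available as (year, title, rating) tuples; the function names and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def yearly_averages(career):
    """career: iterable of (year, title, rating).

    Returns {year: (average_rating, [titles])} so a column can show its
    height and also list its film or films on mouse-over.
    """
    by_year = defaultdict(list)
    for year, title, rating in career:
        by_year[year].append((title, rating))
    return {
        year: (mean(r for _, r in entries), [t for t, _ in entries])
        for year, entries in sorted(by_year.items())
    }


def bar_heights(averages, baseline=0.0):
    """Column heights relative to a baseline.

    baseline=0.0 puts the x axis at 0; passing the average rating of all
    films puts the x axis at that average, so columns go above or below it.
    """
    return {year: avg - baseline for year, (avg, _) in averages.items()}


# Hypothetical career and overall average.
career = [(1950, "X", 6.0), (1950, "Y", 8.0), (1952, "Z", 5.0)]
averages = yearly_averages(career)
relative = bar_heights(averages, baseline=6.0)
print(averages)
print(relative)
```

Precomputing these values once per person, rather than on every redraw, is one way to keep the interface responsive.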

For a B you need to add ...

ability to compare n actors and / or directors simultaneously, each on their own graph which are aligned in time
in addition to showing the average of multiple films per year, it should also show the minimum and maximum
ability to code the columns by genre using a subset of the imdb genre codes. If a film has multiple genres then it should have multiple codes. Very likely you will need to reduce the number of genres either by dropping or combining. Give your rationale for your choices on your web page.
ability to code each decade for a person by the genres they worked in during that decade
ability to show the min / max / average ratings for each decade
ability to see a chart showing the overall career in terms of distribution of ratings and genres
if there are no people selected then clicking on a decade should show which people were active in that decade, clicking on a genre should show which people were active in that genre, and clicking on both a decade and a genre should show the people in the intersection.
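The per-decade statistics and the decade / genre lookup from the list above might be sketched as follows in Python. The data layout (a dictionary mapping each name to (year, rating, genres) tuples) and the function names are assumptions. The sketch also treats "intersection" as meaning a person has at least one film that matches both the decade and the genre, which is one interpretation you would want to document.

```python
from statistics import mean


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1947 -> 1940."""
    return year // 10 * 10


def decade_stats(career):
    """career: iterable of (year, rating, genres). Returns {decade: (min, max, avg)}."""
    ratings = {}
    for year, rating, _genres in career:
        ratings.setdefault(decade_of(year), []).append(rating)
    return {d: (min(r), max(r), mean(r)) for d, r in sorted(ratings.items())}


def active_people(careers, decade=None, genre=None):
    """Names with at least one film matching every criterion that is given."""
    names = []
    for name, career in careers.items():
        for year, _rating, genres in career:
            if decade is not None and decade_of(year) != decade:
                continue
            if genre is not None and genre not in genres:
                continue
            names.append(name)
            break  # one matching film is enough for this person
    return sorted(names)


# Hypothetical sample data.
careers = {
    "A": [(1941, 7.0, ["Drama"]), (1955, 8.0, ["Comedy"])],
    "B": [(1948, 6.0, ["Comedy"])],
}
print(decade_stats(careers["A"]))
print(active_people(careers, decade=1940, genre="Comedy"))
```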

For an A you need to add ...

ability to pan and compress/expand the years
add the bottom 20 male actors, female actors, and directors over the last 100 years who have done at least 10 films with at least m votes. This is harder than finding the top people since you shouldn't pick on complete unknowns, so you may end up with just 'mediocre' actors or actors who are desperate or who make bad choices on a regular basis.
you should pick several people that you would like to add that show interesting behavior (i.e. good actors who become good directors, good actors that go on to work in bad movies and then are rediscovered and go on to do good movies again, people with very long careers, etc) and document them on your webpage. Since the imdb has inherent biases towards US films, films in English, and modern films, feel free to make choices that show a broader range.
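For the bottom 20 lists, one simple starting point is to rank everyone who passes the film-count threshold by their mean rating. The Python sketch below does that. Ranking by mean rating is an assumption rather than a requirement of the project, and the data layout (a dictionary mapping each name to (rating, votes) pairs) and function name are hypothetical.

```python
from statistics import mean


def bottom_people(filmographies, min_votes, min_films=10, bottom_n=20):
    """Return the lowest rated people who are not complete unknowns.

    filmographies: {name: [(rating, votes), ...]} (hypothetical layout).
    A film counts only if it has at least min_votes votes, and a person
    qualifies only with at least min_films counted films.
    """
    candidates = []
    for name, films in filmographies.items():
        ratings = [rating for rating, votes in films if votes >= min_votes]
        if len(ratings) >= min_films:
            candidates.append((mean(ratings), name))
    # Lowest mean rating first; ties fall back to alphabetical order.
    return [name for _, name in sorted(candidates)[:bottom_n]]


# Hypothetical sample data.
filmographies = {
    "A": [(3.0, 200)] * 10,
    "B": [(2.0, 5)] * 10,    # films have too few votes
    "C": [(4.0, 300)] * 12,
    "D": [(1.0, 500)] * 3,   # too few films
}
worst = bottom_people(filmographies, min_votes=100, bottom_n=2)
print(worst)
```

The vote threshold is what keeps complete unknowns out of the result, so it is worth experimenting with several values of m and documenting the one you settle on.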

You should create a web page that describes your work on the project. I will be linking this web page to the course notes so please send me a 1024 x 768 jpg image of your visualization for the web. This should be named p3.<someone_in_your_groups_last_name>.jpg. Again, please make sure that your application is Mac / Windows / Linux compatible. Again, the web page should describe the contribution of each team member.

Each group will bring their visualization to class to present it and describe its features to the rest of the class. This allows everyone to see a variety of solutions to the problem, and a variety of implementations.

NEW: I would also like each group to email me a list of your top 5 male actors, female actors, and directors for each decade so I can also show a comparison of the names that the different algorithms generated. I would like each of the three lists in this form so it's easy to put them together:

1910	#1	last name
	#2	last name
	#3	last name
	#4	last name
	#5	last name
1920	#1	last name
	#2	last name
	#3	last name
	#4	last name
	#5	last name
etc
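If your top people are already held in a per-decade structure, the list can be generated rather than typed by hand. A minimal Python sketch, assuming a dictionary that maps each decade to an ordered list of last names (the function name and sample names are hypothetical):

```python
def format_top_list(top_by_decade):
    """Format {decade: [last names in rank order]} as tab-separated lines.

    The decade appears only on the #1 line; later ranks start with a tab.
    """
    lines = []
    for decade in sorted(top_by_decade):
        for rank, last_name in enumerate(top_by_decade[decade], start=1):
            prefix = str(decade) if rank == 1 else ""
            lines.append(f"{prefix}\t#{rank}\t{last_name}")
    return "\n".join(lines)


# Hypothetical names.
text = format_top_list({1920: ["C"], 1910: ["A", "B"]})
print(text)
```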

I would also like to have these lists the night before the project is due so I have time to compile the lists before the presentations in class.

2009 Project 3

Saturday Night at the Movies