Project 3 - Big Yellow Taxi

Group membership due 3/19 at 8:59 pm Chicago time
Application due 4/23 at 8:59 pm Chicago time
Documentation due 4/25 at 8:59 pm Chicago time



Project 3 can be either an individual project or a 2-person group project. It gives people more practice with wrangling data and then visualizing it in multiple ways in a web-based application built with R, Shiny, leaflet, and Shiny Dashboard. Please let Andy know your group members, even if it is the same group as in Project 2, by the group membership due date above.

This project will focus on using R to visualize data on taxi rides in Chicago, and on using Shiny to give people an interactive interface for creating those visualizations.

The original data is available from the Chicago Data Portal. The 2019 data is available at:
https://data.cityofchicago.org/Transportation/Taxi-Trips-2019/h4cq-z3dy
It is about 7 GB and contains 16.5 million rows. I will make a USB stick version of the data available in class, and there will be a tsv and csv version on the evl shiny server.

Since the 2019 data is pre-COVID it is more representative of a 'typical' year.

As always, I would start by taking a look at the raw data file in a text editor to get an idea of what it contains, quickly try to reduce the file size, and then try out different types of manipulations and visualizations in either RStudio or Jupyter to work out what you are going to need to do. Then think about how you are going to lay out the various visualizations and how you are going to create the user interface in Shiny.

Again we will be running your dashboard full screen on the touch screen classroom wall with the same resolution as in Project 2.

For this project we are going to switch over to running the code on evl's shiny server at shiny.evl.uic.edu to avoid the hard limits on file sizes, but you are still going to need to dramatically reduce the data file size to under 500 MB in order to have your application start up quickly. Note that we are using the free version of the shiny server so there are limits to the parameters that we can tweak, but this gives you some idea of how you could set up your own server to serve these kinds of dashboards.

You will only need a subset of the 23 columns in the data file:
3. Trip Start Timestamp (string -> date and time)
5. Trip Seconds (int)
6. Trip Miles (float)
9. Pickup Community Area (int)
10. Drop-off Community Area (int)
17. Company (string)

You should also remove all trips of less than 0.5 miles or more than 100 miles, all trips of less than 60 seconds or more than 5 hours, and all trips that either start or end outside of a Chicago community area. We will also only be looking at trips down to a resolution of the starting hour rather than the 15-minute intervals in the data. The command line (sed, grep, etc.) can be your friend in doing these manipulations, or you can write a program to do it, or use R itself if you have enough memory, but you must document these manipulations so they are reproducible. That should get you down to about 12 million rides and around 300 MB.
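If you go the "write a program" route, the filtering rules above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the required approach: it assumes the csv export keeps the column order listed above, that timestamps look like "01/01/2019 12:15:00 AM", and that a trip outside Chicago shows up as a blank community area. The function names and output column names are made up for illustration, so check all of this against your own copy of the file.

```python
import csv
from datetime import datetime

# Zero-based positions of the columns listed above (3, 5, 6, 9, 10, 17).
START, SECONDS, MILES, PICKUP, DROPOFF, COMPANY = 2, 4, 5, 8, 9, 16

# Assumed timestamp layout, e.g. "01/01/2019 12:15:00 AM"; verify in your file.
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def clean_row(row):
    """Return the reduced row, or None if the trip should be dropped."""
    try:
        # Strip thousands separators defensively in case the export has them.
        seconds = int(float(row[SECONDS].replace(",", "")))
        miles = float(row[MILES].replace(",", ""))
        pickup = int(float(row[PICKUP]))
        dropoff = int(float(row[DROPOFF]))
        start = datetime.strptime(row[START], TIMESTAMP_FORMAT)
    except (ValueError, IndexError):
        # Blank or malformed field, e.g. a missing community area.
        return None
    if not (0.5 <= miles <= 100):
        return None
    if not (60 <= seconds <= 5 * 3600):
        return None
    # Keep only the starting hour, dropping the 15-minute detail.
    hour = start.strftime("%Y-%m-%d %H")
    return [hour, seconds, miles, pickup, dropoff, row[COMPANY]]


def filter_trips(src, dst):
    """Stream rows from src to dst so the full file never sits in memory."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    next(reader)  # skip the original header row
    writer.writerow(["start_hour", "seconds", "miles", "pickup", "dropoff", "company"])
    kept = 0
    for row in reader:
        cleaned = clean_row(row)
        if cleaned is not None:
            writer.writerow(cleaned)
            kept += 1
    return kept


if __name__ == "__main__":
    import sys
    # Usage: python filter_trips.py input.csv output.csv (file names are up to you)
    if len(sys.argv) == 3:
        with open(sys.argv[1], newline="") as src, \
                open(sys.argv[2], "w", newline="") as dst:
            print(filter_trips(src, dst), "trips kept")
```

Reading and writing one row at a time keeps memory use flat, which matters more than speed for a file of this size. Whatever tool you use, keep the script or commands with your documentation so the reduction can be reproduced.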


Some potential gotchas:


This link should show my evl weather app running on the evl shiny server as an example - http://shiny.evl.uic.edu:3838/aej/evlweather

Each individual or group will get an account on shiny.evl.uic.edu named g0, g1, g2, ...

You should be able to ssh into shiny.evl.uic.edu as user gX, where X is your group number. We will talk about the passwords in class. Please change the password ASAP. We will also talk about the port number to use to ssh in.

You will also find that you have the directory /srv/shiny-server/gX, where you can create a subdirectory named after your app (app_name) and then place your files in it. Be sure your R code is named app.R.

Your /srv/shiny-server/gX directory will be RWX only to you and X to everyone else. Your /srv/shiny-server/gX/app_name directory will be readable and executable by all so that shiny can read data files from it while it's running, and all of your files in the /srv/shiny-server/gX/app_name directory should be readable by all.
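In octal terms, the permissions described above work out to 711 for the gX directory, 755 for the app_name directory, and 644 for the files inside it. If files you upload end up with tighter permissions, something like the following Python sketch could reset them. The function name is hypothetical, and it assumes every subdirectory of your app should also be readable and executable by all. It leaves the gX directory itself alone.

```python
import os
from pathlib import Path

DIR_MODE = 0o755   # rwxr-xr-x: readable and executable by all
FILE_MODE = 0o644  # rw-r--r--: readable by all, writable only by you


def make_app_readable(app_dir):
    """Apply the modes above to app_dir and everything inside it."""
    app_dir = Path(app_dir)
    os.chmod(app_dir, DIR_MODE)
    for path in app_dir.rglob("*"):
        # Directories need the execute bit so they can be entered.
        os.chmod(path, DIR_MODE if path.is_dir() else FILE_MODE)
```

For example, make_app_readable("/srv/shiny-server/gX/app_name") with your own group number and app name substituted in.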

You should then be able to see the app running at http://shiny.evl.uic.edu:3838/gX/app_name

Error logs for everyone can be found in /var/log/shiny-server/. As all the error logs are in one place, it can be really helpful to name your project something more unique than Project3 to make your log files easier to find.




For 40% you need to:
  • bar chart showing the distribution of the number of rides by day of year (Jan 1 through Dec 31)
  • bar chart showing the distribution of the number of rides by hour of day based on start time (midnight through 11pm)
  • bar chart showing the distribution of the number of rides by day of week (Monday through Sunday)
  • bar chart showing the distribution of the number of rides by month of year (Jan through Dec)
  • bar chart showing the distribution of the number of rides by binned mileage (with an appropriate number of bins)
  • bar chart showing the distribution of the number of rides by binned trip time (with an appropriate number of bins)
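Each of these charts comes down to counting rides per category. Your app will do this in R, but it can help to work out the counts independently so you have numbers to check your charts against. The Python sketch below is one way to do that. It assumes each ride's start time is a string like "2019-01-01 13", and the bin edges and function names are hypothetical; choose edges that suit what you see in the data.

```python
import bisect
from collections import Counter
from datetime import datetime

# Hypothetical bin edges in miles; pick edges that suit your data.
MILE_EDGES = [0.5, 1, 2, 5, 10, 20, 50, 100]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mile_bin(miles):
    """Label the bin [low, high) containing miles; 100 joins the last bin."""
    i = bisect.bisect_right(MILE_EDGES, miles) - 1
    i = max(0, min(i, len(MILE_EDGES) - 2))
    return f"{MILE_EDGES[i]}-{MILE_EDGES[i + 1]}"


def summarize(rides):
    """Count rides per hour, weekday, month, and mileage bin.

    rides is an iterable of (start, miles) pairs, where start is a
    string such as "2019-01-01 13" (an assumed format).
    """
    by_hour, by_weekday = Counter(), Counter()
    by_month, by_miles = Counter(), Counter()
    for start, miles in rides:
        when = datetime.strptime(start, "%Y-%m-%d %H")
        by_hour[when.hour] += 1
        by_weekday[WEEKDAYS[when.weekday()]] += 1
        by_month[when.month] += 1
        by_miles[mile_bin(miles)] += 1
    return by_hour, by_weekday, by_month, by_miles
```

Uneven bin edges like these are a judgment call: most rides are short, so equal-width bins across 0.5 to 100 miles would put nearly everything in the first bar. The same idea applies to binning trip time.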


For an additional 30% you need to


For an additional 30% you need to

Graduate Students need:


In all of these cases you need to make sure that your visualizations are well constructed with good color and font choices, proper labeling, fast updates, and that they effectively reveal the truth about the data to the user. The application should load within 30 seconds, and definitely not time out.

Note that as part of the web page part of the grade you will need to use your interface to show your findings, so make sure that the way your interface displays information is clear.



Turning in the Project

There are two due dates for the project.

The source code and application are due first.

For this project you should host your solution on the evl shiny site.

Your code should be turned in and made available on GitHub in a public repository for the project. You can keep the repository private while doing your development. I would suggest setting up the GitHub project early and regularly pushing code to it as a backup. Also keep in mind the limits on file sizes in GitHub.

It is important to note that 'getting it to work' is just a prerequisite to using the application to find answers to your questions. It is that usage that will give you ideas on how to improve your app to make it easier and more intuitive to find those things. Writing the application at the last minute pretty much guarantees that you will not come up with an intuitive interface.

Chrome's Developer Tools allow you to emulate screens of different sizes (view / developer / developer tools / settings / devices).

Before the application due date and time, please send an email to Andy with the URL of your GitHub site and your Shinyapps or evl shiny server site.


The second deadline is for the documentation.

You should create a public web page with multiple sections (visible to anyone for at least the duration of the course) that describes your work on the project. You can host your web page at UIC (http://people.uic.edu), GitHub, or the provider of your choice, as long as it remains publicly available to all. You can use any publicly available templates as long as you cite them, or create your own.

This page should have several sections including:

all of which should have plenty of screenshots with meaningful captions. Web pages like this can be very helpful later on in helping you build up a portfolio of your work when you start looking for a job so please put some effort into it.


You should also create a 5-minute YouTube video showing the use of your application, including narration with decent audio quality. That video should be in a very obvious place on your web page. The easiest way to create the video is to use a screen-capture tool while interacting with your application, though you will most likely find it useful to do some editing afterwards to tighten the video up. If you do decide to use your phone or tablet to make the video, then please shoot the video in landscape rather than portrait orientation. Your video should show the capabilities of your tool through a set of specific examples of interesting things you found in the data.


I will be linking your web page to the course notes, so please send Andy a nice jpg image of your visualization for the web along with the link to your website before the deadline. The image should be named p3.<your_last_name>.<your_first_name>.jpg and be roughly 1920 pixels wide.

Once you have your web page done, send the URL to Andy before the deadline. We will respond to this email as your 'receipt'.



An important part of creating these kinds of applications is getting feedback and using it to improve your design, and learning to give quality feedback to others.

See the course notes for week 15 for more details on the presentations.


# | Student Name(s) | GitHub Link | Shiny Link | Web Page Link | Video Link | Representative Image
1 | Kodithyala & Mullenkuzhiyil Sunny | link | link | link | link |
2 | Martin & Jakvani | link | link | link | link |
3 | Genova & Kao | link | link | link | link |
4 | Saxena & Awan | link | link | link | link |
5 | Omar & Magnadia | link | link | link | link |
6 | Elliot & Arica | link | link | link | link |
7 | Zeng & Qi | link | link | link | link |
8 | Herrera & Fernandez Lezama | link | link | link | link |
9 | Chintakunta & Jogi | link | link | link | link |
10 | Ranganathan | link | link | link | link |
11 | Parovyi & Yelyubayeva | link | link | link | |
13 | Handowo | link | link | link | link |
14 | Hussain | - | - | - | - | -
15 | Kmita | link | link1 link2 | link | link |
16 | Sivaraman & Kushwah | link | link | link | link |
17 | Lau | link | link | link | link |
18 | Mehta | link | link | link | link |
19 | Nath | link | link | link | link |
20 | Nunez | - | - | - | - | -
22 | Patel | link | link | link | link |




last revision 4/25/2022