Project X

Down by the Seaside

Project due Friday 5/1/20 at 4:59 pm Chicago time (note: 4:59, not 8:59).

This optional single-person project is worth up to an extra 5%.

The project will use R in Jupyter notebooks, as we have been doing for the in-class work.

The project is to look at beach water quality data in Chicago from 5/2015 to 9/2019 during the summer seasons at roughly 20 beach locations. There is detail on the data here:

and a local copy available at

The most useful columns are:

Your Jupyter notebook report should allow a person interested in going to the beach, or someone responsible for beach safety, to better understand what is going on during the summer at Chicago's beaches. Your report should show:

- introduction to the problem and the dataset being used

- the locations of the monitored beaches

- overall data for all the beaches

- data about each beach (90th percentile, average, and 10th percentile for each beach, plus the overall percentage of good and bad days at each beach) to help answer whether there are particular beaches that are safer than others

- data from a weekly point of view as the summers go on to help answer whether there are particular times of the year that are better or worse

- data from year to year to see if things are getting better or worse over the last several years, and at which beaches

- you should also relate the data to temperature and rainfall data in the Chicago area to see if there is a correlation. The DarkSky R library is one good general source for this data (at least until the end of 2021), and more locally the Chicago Data Portal has data from their Beach Weather Stations that cover this time period.

- final set of conclusions about which beaches you recommend people visit and which you recommend people avoid, and at which times of the year.
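As one possible starting point for the per-beach item in the list above, here is a minimal sketch in base R. It runs on synthetic data, not the real dataset: the column names `beach` and `dna_mean` and the beach labels are hypothetical, and treating a reading above 1000 CCE as a "bad" day is an assumption borrowed from the Beach Action Value quoted at the end of this page.

```r
# Synthetic stand-in for the beach data; column names and beach labels
# are hypothetical and the readings are randomly generated.
set.seed(1)
beaches <- data.frame(
  beach    = rep(c("Beach A", "Beach B", "Beach C"), each = 50),
  dna_mean = c(rlnorm(50, meanlog = 5, sdlog = 1),
               rlnorm(50, meanlog = 6, sdlog = 1),
               rlnorm(50, meanlog = 7, sdlog = 1))
)

# Assumed cutoff for a "bad" day: the 1000 CCE Beach Action Value
bav <- 1000

# Summary for one beach: 10th percentile, mean, 90th percentile,
# and the percentage of readings above the cutoff
summarise_beach <- function(x) {
  c(p10     = unname(quantile(x, 0.10, na.rm = TRUE)),
    mean    = mean(x, na.rm = TRUE),
    p90     = unname(quantile(x, 0.90, na.rm = TRUE)),
    pct_bad = 100 * mean(x > bav, na.rm = TRUE))
}

# One row per beach, sorted from fewest to most bad days
per_beach <- t(sapply(split(beaches$dna_mean, beaches$beach), summarise_beach))
per_beach <- per_beach[order(per_beach[, "pct_bad"]), ]
print(round(per_beach, 1))
```

With the real data you would replace the synthetic data frame with the downloaded file and substitute the actual column names.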

Your report should contain a mixture of explanatory text, R code, and visualizations, and be written to help the reader get a better understanding of the situation by explaining what they can see in the various visualizations.
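To illustrate the kind of code-plus-visualization cell this describes, the following sketch builds a weekly view and a rainfall comparison. Everything in it is made up for illustration: the data are randomly generated, the column names `date`, `rain`, and `dna_mean` are hypothetical, and the choice of the weekly median and a Spearman rank correlation is just one reasonable option, not a required method.

```r
# Synthetic daily readings for one summer; all values are randomly
# generated and the column names are hypothetical.
set.seed(2)
dates <- seq(as.Date("2019-05-27"), as.Date("2019-09-02"), by = "day")
rain  <- rexp(length(dates), rate = 2)  # made-up rainfall, arbitrary units
daily <- data.frame(
  date     = dates,
  rain     = rain,
  dna_mean = exp(5 + 0.8 * rain + rnorm(length(dates), sd = 0.5))
)

# Week of the year as a number (weeks starting on Sunday)
daily$week <- as.integer(format(daily$date, "%U"))

# Median reading per week; the median is less sensitive to a few huge values
weekly <- aggregate(dna_mean ~ week, data = daily, FUN = median)

# Weekly trend on a log scale, with the 1000 CCE level as a dashed line
plot(weekly$week, weekly$dna_mean, type = "b", log = "y",
     xlab = "Week of year", ylab = "Median DNA reading (CCE)",
     main = "Synthetic weekly readings")
abline(h = 1000, lty = 2)

# Rank correlation between rainfall and readings on the same day
rho <- cor(daily$rain, daily$dna_mean, method = "spearman")
print(rho)
```

In the report, a cell like this would be followed by a sentence or two telling the reader what the plot shows, for example whether readings rise late in the summer or after rainy days.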

To turn in your solution, download it as a notebook and as an HTML page, and email both files to Andy.

Some additional details from the Chicago Data Portal:

The rapid testing method (qPCR analysis) is a new method that measures levels of pathogenic DNA in beach water. Unlike the culture based test that requires up to 24 hours of processing, the new rapid testing method requires a few hours for results. The Chicago Park District can use results of the rapid test to notify the public when levels exceed USEPA recommended levels. The US Environmental Protection Agency (USEPA) recommends notifying the public when DNA bacteria levels are above the federal water quality Beach Action Value (BAV), which is 1000 CCE. When DNA bacteria levels exceed 1000 CCE, a yellow or red flag will be implemented. For more information please refer to the USEPA Recreational Water Quality Criteria.
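The threshold rule described above can be written as a small helper. This is only a sketch: the function name `needs_advisory` and the sample readings are hypothetical, and it reads "exceed" as strictly greater than 1000 CCE. It does not try to distinguish yellow from red flags, since the text above does not say how that choice is made.

```r
# Beach Action Value in CCE, as quoted from the Chicago Data Portal
bav_cce <- 1000

# TRUE when a reading is strictly above the BAV; missing readings
# are treated as "no advisory" (an assumption for this sketch)
needs_advisory <- function(cce) !is.na(cce) & cce > bav_cce

# Made-up readings, including the boundary value and a missing value
readings <- c(120, 999, 1000, 1001, 4500, NA)
flags <- data.frame(reading_cce = readings,
                    advisory    = needs_advisory(readings))
print(flags)
```

How to treat missing readings is a judgment call; you may prefer to drop them or report them separately rather than count them as good days.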

last revision 4/5/2020