Week 2

Introduction to R and Shiny



For the projects in this class we are going to use R to do the heavy lifting on the data side and Shiny to handle the front end so we can create interactive data visualization applications for the web.




Here are some web pages created with R and Shiny that show various ways they can be used to present data:
http://shiny.rstudio.com/gallery/


You should start by downloading R 3.5.2 ("Eggshell Igloo", released 2018-12-20) from a CRAN mirror such as https://cran.cnr.berkeley.edu/ and installing it,

and then downloading and installing the free version of RStudio (1.1.463) from https://www.rstudio.com/products/rstudio/download/

There is a very good set of video tutorials with source code at https://shiny.rstudio.com/tutorial/ that takes about 2-3 hours to complete and will give you a solid grounding in the basics.

There is also Swirl for learning R - http://swirlstats.com/

Some useful links

Start RStudio and you should see a layout similar to a modern IDE, with an area for viewing and editing code, a console, an area showing the current state of the R environment, and a multi-purpose area (files, plots, packages, help).

A lot of the power of R comes from the variety of packages that can be installed into it.

To see what packages are currently installed, type installed.packages()[,1:2] in the console at the > prompt

To see which packages are out of date type old.packages()

To update all old packages type update.packages(ask = FALSE)

Some useful libraries to install with install.packages() include:

e.g. install.packages("shiny")
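As a sketch, the following installs only the packages that are not already present. It assumes the packages used later on this page (shiny, ggplot2, lubridate); missing_packages() is a hypothetical helper, not a built-in R function.

```r
# Hypothetical helper: which of the requested packages are not installed yet?
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  pkgs[!pkgs %in% installed]
}

# Packages used later on this page (assumed list; add your own)
wanted <- c("shiny", "ggplot2", "lubridate")
to_install <- missing_packages(wanted)

# Only contact CRAN when something is missing and you are working in RStudio
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Checking first avoids re-downloading packages you already have every time you rerun your setup script.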

I will be going through a demonstration of using these tools to visualize some local temperature data in class. For over 12 years we have been collecting temperature data in the various rooms of the lab, trying to understand why the temperature changes so dramatically at times.

The relevant files are located here

The interactive application this code produces is hosted on the Shiny site here

and a snapshot below:



Let's play a bit with the EVL temperature data using R and RStudio.

One nice way to do this is to load the following text into RStudio as a new text file (in the upper left of the standard layout) and then copy and paste the relevant lines into the console (lower left) one by one to see their effect on the current environment. You can also put the cursor on a line in the file and press Ctrl+Enter (Cmd+Return on a Mac) to run it. I have made a copy of this file available here




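download a copy of the evlWeatherForR.zip file from the 'relevant files' link above and unzip it.
set the working directory to the evlWeatherForR directory, replacing the path below with wherever you unzipped it (or use Session > Set Working Directory in RStudio)
setwd("path/to/evlWeatherForR")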

check that it worked with
getwd()

to get a listing of the directory use
dir()

read in one file (be careful to avoid smart quotes when copying this line)
evl2006 <- read.table(file = "history_2006.tsv", sep = "\t", header = TRUE)
you should now see evl2006 in the Global Environment at the upper right

take a look at it
evl2006

and then some commands to get an overall picture of the data:
str(evl2006)
summary(evl2006)
head(evl2006)
tail(evl2006)
dim(evl2006)
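If you want to try these commands before downloading the zip file, read.table() can also parse text directly through its text= argument. The three rows below are made up, and the column names (Date, Hour, S1, S2) are assumed from the code on this page; the real file has columns for all 7 rooms.

```r
# Hypothetical three-row excerpt in the assumed layout of history_2006.tsv
tsv_text <- paste(
  "Date\tHour\tS1\tS2",
  "1/1/2006\t0\t70\t72",
  "1/1/2006\t12\t71\t74",
  "1/2/2006\t12\t69\t73",
  sep = "\n"
)

# text= parses a string instead of a file on disk;
# stringsAsFactors = TRUE reproduces the older (pre-4.0) default
evl_sample <- read.table(text = tsv_text, sep = "\t", header = TRUE,
                         stringsAsFactors = TRUE)

str(evl_sample)   # Date is a factor, the other columns are integers
dim(evl_sample)   # 3 rows, 4 columns
```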

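hmmm, almost all the fields are integers, including the hour and the temperatures for the 7 different rooms, but Date is a factor. What is a factor? Use the Help viewer in RStudio (or type ?factor). R assumes the column holds categorical data and assigns an integer code to each unique string. (In R 4.0.0 and later, read.table() no longer converts strings to factors by default, so Date will show up as chr instead; the conversion below works either way.)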
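A quick way to see what a factor is: it stores each distinct string once as a level and keeps an integer code per element. Because the levels are sorted as text, dates stored this way do not sort chronologically (for example "10/1/2006" sorts before "2/1/2006"), which is one reason to convert them. The values here are made up.

```r
# A factor stores each distinct string once (the "levels") plus an integer
# code per element that points into those levels
dates <- factor(c("1/2/2006", "1/1/2006", "1/2/2006"))

levels(dates)      # "1/1/2006" "1/2/2006"  (sorted as text)
as.integer(dates)  # 2 1 2  (codes, not anything date-like)
nlevels(dates)     # 2
```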

convert the dates to R's Date format and remove the original Date column
newDates <- as.Date(evl2006$Date, "%m/%d/%Y")
evl2006$newDate<-newDates
evl2006$Date <- NULL

if we use str(evl2006) again, we now see newDate stored as a Date
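A small sketch of the same conversion on made-up values, showing that as.Date() accepts a factor directly and that the result supports date arithmetic.

```r
# Format codes: %m = month, %d = day, %Y = four-digit year (%y would be two-digit)
raw <- factor(c("1/15/2006", "12/31/2006", "7/4/2006"))

# as.Date() accepts a factor or character vector plus a format string
parsed <- as.Date(raw, "%m/%d/%Y")

format(parsed)        # "2006-01-15" "2006-12-31" "2006-07-04"
sort(parsed)          # real dates sort chronologically
diff(range(parsed))   # Time difference of 350 days
```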

try doing some simple graphs and stats

plot all temperatures in room 4 from 2006 using R's built-in plotting
plot(evl2006$newDate, evl2006$S4, xlab = "Month", ylab = "Temperature")

we can set the y axis to a fixed range for all temps in a given room
plot(evl2006$newDate, evl2006$S4, xlab = "Month", ylab = "Temperature", ylim=c(65, 90))
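Instead of picking 65 to 90 by eye, one option is to compute a single y range that covers every room, so that plots of different rooms share the same axis. The data frame here is synthetic; with the real data you would use evl2006 and its S1 to S7 columns.

```r
# Synthetic stand-in for evl2006: 24 hourly readings for 7 rooms (made-up values)
set.seed(42)
rooms <- paste0("S", 1:7)
fake <- as.data.frame(matrix(sample(66:86, 24 * 7, replace = TRUE), ncol = 7,
                             dimnames = list(NULL, rooms)))

# One range spanning all rooms; na.rm guards against missing readings
shared_ylim <- range(unlist(fake[rooms]), na.rm = TRUE)

# Every room plotted with the same limits is directly comparable
plot(seq_len(nrow(fake)), fake$S4, xlab = "Hour", ylab = "Temperature",
     ylim = shared_ylim)
```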

the built-in plotting is nice for quick views, but it isn't very powerful or very nice looking, so now let's get the noon temperatures and plot them for one or all of the rooms using ggplot2

if ggplot2 is not already installed, then let's install it
install.packages("ggplot2")

and then let's load it
library(ggplot2)


we want the rows from evl2006 where Hour is 12
noons <- subset(evl2006, Hour == 12)
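subset() and bracket indexing give the same rows here, but they treat missing values differently. A small made-up example:

```r
# Tiny synthetic table with one missing Hour value
df <- data.frame(Hour = c(0L, 12L, NA, 12L), S2 = c(70L, 74L, 71L, 73L))

# subset() keeps rows where the condition is TRUE and silently drops NA rows
noons_a <- subset(df, Hour == 12)

# Bracket indexing returns an all-NA row for the NA condition, so add !is.na()
noons_b <- df[df$Hour == 12 & !is.na(df$Hour), ]

nrow(noons_a)   # 2
nrow(noons_b)   # 2
```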

we can list all the noon temperatures for a particular room
noons$S2

note that you can see similar information in the environment panel at the upper right

or plot them
ggplot(noons, aes(x=newDate, y=S2)) + geom_point(color="blue") +  labs(title="Room Temperature in room ???", x="Day", y = "Degrees F") + geom_line()
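Since the ggplot call stays the same apart from the room column, here is a sketch of a hypothetical helper, plot_room(), that takes the column name as a string. It uses the .data pronoun (available in ggplot2 3.0.0 and later) and made-up data in place of noons.

```r
library(ggplot2)

# Synthetic stand-in for `noons` (made-up values, columns named as in the notes)
noons <- data.frame(
  newDate = seq(as.Date("2006-01-01"), by = "day", length.out = 30),
  S2 = round(runif(30, 68, 82)),
  S4 = round(runif(30, 68, 82))
)

# Hypothetical helper: noon plot for whichever room column is named in `room`
plot_room <- function(data, room) {
  ggplot(data, aes(x = newDate, y = .data[[room]])) +
    geom_point(color = "blue") +
    geom_line() +
    coord_cartesian(ylim = c(65, 90)) +
    labs(title = paste("Noon temperature in room", room),
         x = "Day", y = "Degrees F")
}

p <- plot_room(noons, "S4")
print(p)   # the same call works for "S2", "S4", ...
```

Passing the room as a string is also what a Shiny drop-down hands you, so this shape carries over directly to an app.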

we can set the min and max and add some aesthetics

ggplot(noons, aes(x=newDate, y=S2)) + geom_point(color="blue") +  labs(title="Room Temperature in room ???", x="Day", y = "Degrees F") + geom_line() + coord_cartesian(ylim = c(65,90))

add a smooth line through the data

ggplot(noons, aes(x=newDate, y=S2)) + geom_point(color="blue") +  labs(title="Room Temperature in room ???", x="Day", y = "Degrees F") + geom_line() + coord_cartesian(ylim = c(65,90)) + geom_smooth()

no points, just lines and the smooth curve

ggplot(noons, aes(x=newDate, y=S2)) +  labs(title="Room Temperature in room ???", x="Day", y = "Degrees F") + geom_line() + coord_cartesian(ylim = c(65,90)) + geom_smooth()

just the smooth curve

ggplot(noons, aes(x=newDate, y=S2)) +  labs(title="Room Temperature in room ???", x="Day", y = "Degrees F") + coord_cartesian(ylim = c(65,90)) + geom_smooth()

we can show smooth curves for all of the rooms at noon at the same time

ggplot(noons, aes(x=newDate)) +  labs(title="Room Temperature in room ???", x="Day", y = "Degrees F") + coord_cartesian(ylim = c(65,90)) + geom_smooth(aes(y=S2)) + geom_smooth(aes(y=S1)) + geom_smooth(aes(y=S3)) + geom_smooth(aes(y=S4)) + geom_smooth(aes(y=S5)) + geom_smooth(aes(y=S6))+ geom_smooth(aes(y=S7))
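Note that the call above draws seven curves in the same colour, so they are hard to tell apart. An alternative, sketched below with made-up data, is to stack the room columns into long format with base R's stack() (tidyr::pivot_longer() would also work) and map the room to colour, which also gives a legend.

```r
library(ggplot2)

# Synthetic stand-in for `noons`: 60 days, 7 room columns (made-up values)
set.seed(7)
rooms <- paste0("S", 1:7)
noons <- data.frame(newDate = seq(as.Date("2006-01-01"), by = "day", length.out = 60))
for (r in rooms) noons[[r]] <- round(runif(60, 68, 84))

# Stack the 7 room columns into one "values" column plus a room label ("ind");
# stack() goes column by column, so repeating newDate 7 times lines up
long <- data.frame(newDate = rep(noons$newDate, times = length(rooms)),
                   stack(noons[rooms]))

# One geom_smooth(); colour = ind gives each room its own curve and legend entry
p <- ggplot(long, aes(x = newDate, y = values, colour = ind)) +
  geom_smooth() +
  coord_cartesian(ylim = c(65, 90)) +
  labs(title = "Noon temperature by room", x = "Day", y = "Degrees F",
       colour = "Room")
```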

show smooth curves for all of the rooms at all hours at the same time

ggplot(evl2006, aes(x=newDate)) +  labs(title="Room Temperature in room ???", x="Day", y = "Degrees F") + coord_cartesian(ylim = c(65,85)) + geom_smooth(aes(y=S2)) + geom_smooth(aes(y=S1)) + geom_smooth(aes(y=S3)) + geom_smooth(aes(y=S4)) + geom_smooth(aes(y=S5)) + geom_smooth(aes(y=S6))+ geom_smooth(aes(y=S7))


we can play with the style of the points and make them blue
ggplot(evl2006, aes(x=newDate, y=S4)) + geom_point(color="blue") +  labs(title="Room Temperature in room ???", x="Day", y = "Degrees F")

we can create a bar chart of all the temperatures for a given room
ggplot(evl2006, aes(x=factor(S4)))  + geom_bar(stat="count", width=0.7, fill="steelblue")

or just the noon temperatures (note that this only lists temperatures that actually occur, so some temperatures may appear to be 'missing' on the x axis)
ggplot(noons, aes(x=factor(S6)))  + geom_bar(stat="count", fill="steelblue")

we can do a better bar chart that treats the temperatures as numbers so none are skipped, and we get control over the range of the x axis.
temperatures <- as.data.frame(table(noons[,5]))
temperatures$Var1 <- as.numeric(as.character(temperatures$Var1))

ggplot(temperatures, aes(x=Var1, y=Freq)) + geom_bar(stat="identity", fill="steelblue") +
 labs(x="Temperature (F)", y = "Count") + xlim(60,90)
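Another way to get the zero counts, sketched on made-up readings, is to give factor() an explicit set of levels before calling table(); temperatures that never occur then show up with a count of 0. Note that as.numeric() applied straight to the factor would return the level codes 1, 2, 3, ... rather than the temperatures, which is why as.character() comes first.

```r
# Hypothetical noon readings for one room (made-up values)
noon_temps <- c(70L, 72L, 72L, 75L, 72L)

# table() on plain numbers counts only values that occur...
table(noon_temps)

# ...but giving factor() an explicit set of levels adds zero counts for the rest
counts <- as.data.frame(table(factor(noon_temps, levels = 68:76)))

# Level labels back to numbers (as.numeric(counts$Var1) alone would give 1..9)
counts$Var1 <- as.numeric(as.character(counts$Var1))

counts
```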

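we can get a summary of the noon temperatures themselves (use the raw values rather than the temperatures table, whose Var1 column lists each temperature only once no matter how often it occurred)
summary(noons[,5])

and then create a box-and-whisker plot of those values
ggplot(data.frame(temp = noons[,5]), aes(x = "", y = temp)) + geom_boxplot() + labs(y="Temperature (F)", x="") + ylim(55,90)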


so we have a lot of options here.

Shiny allows us to give a user access to do these things interactively on the web.
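As a rough sketch of what that looks like, here is a minimal single-file Shiny app that lets the user pick a room from a drop-down and redraws the noon plot. It uses random stand-in data rather than the real noons data frame, and the layout is an assumption, not the structure of the class's evl example. Note the commas between the UI elements.

```r
library(shiny)
library(ggplot2)

# Synthetic stand-in for `noons` (random values, NOT the real lab data)
set.seed(2006)
days <- seq(as.Date("2006-01-01"), as.Date("2006-12-31"), by = "day")
noons <- data.frame(newDate = days)
for (room in paste0("S", 1:7)) {
  noons[[room]] <- round(runif(length(days), 68, 84))
}

# UI: a drop-down to pick a room and a slot for the plot.
# A missing comma between these elements is a common error.
ui <- fluidPage(
  titlePanel("Noon temperatures (sketch)"),
  selectInput("room", "Room", choices = paste0("S", 1:7)),
  plotOutput("tempPlot")
)

# Server: re-draws the plot whenever input$room changes
server <- function(input, output) {
  output$tempPlot <- renderPlot({
    ggplot(noons, aes(x = newDate, y = .data[[input$room]])) +
      geom_line() +
      coord_cartesian(ylim = c(65, 90)) +
      labs(title = paste("Noon temperature in room", input$room),
           x = "Day", y = "Degrees F")
  })
}

# Launch only in an interactive session (e.g. from RStudio)
if (interactive()) shinyApp(ui = ui, server = server)
```

Saved as app.R in its own folder, this is the single-file layout that RStudio's Run App button and shinyapps.io both recognise.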


Some things to be careful of:

- be careful of smart quotes
- be careful of commas, especially in the Shiny code
- remember to set your working directory in RStudio
- regularly clear out your RStudio session with rm(list=ls()) and rerun your code to make sure it is self-contained
- be careful of groupings (the group aesthetic in ggplot2) to get your lines to connect the right way in charts
- be careful what format your data is in; certain operations can only be performed on certain data types
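For the last point, sapply(df, class) is a quick way to check every column's type at once. A sketch on a made-up data frame:

```r
# Made-up data frame mixing the types seen on this page
df <- data.frame(
  Date = factor(c("1/1/2006", "1/2/2006")),
  Hour = c(0L, 12L),
  S1   = c(70.5, 71)
)

# One class per column: check this before doing arithmetic or date math
sapply(df, class)

# Convert the factor to a real Date, then drop the original column
df$newDate <- as.Date(as.character(df$Date), "%m/%d/%Y")
df$Date <- NULL
sapply(df, class)
```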




Here is another data set to play with on Thursday in class as part of a data scavenger hunt

You should form a group of 3 or 4 people and try to find interesting trends and changes in those trends. One of the main ideas here is to get a feel for how people use visualization interactively to look for patterns and events and outliers in the data. In this case we will start with some familiar concepts of utility usage - electricity, water, and natural gas.

Create a web page linked off of one of the student pages with the names of all the members of your group, and document your findings with screen snapshots from the application and text describing what you think you found. By the end of class, email the location of your group's page to Andy and Sai.

Here is some background information:

Here is how to start playing with the data:

utility <- read.table(file = "utilitydata2018.tsv", sep = "\t", header = TRUE)

sometimes there is missing data, so let's check
complete.cases(utility)
all of the rows should be TRUE (all(complete.cases(utility)) checks that in a single step), so we are good. If not, preview and then keep only the complete rows:
utility[complete.cases(utility), ]
utility <- utility[complete.cases(utility), ]
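On a small made-up table with one missing value, these checks behave like this:

```r
# Made-up monthly rows with one missing reading
utility_demo <- data.frame(
  Year   = c(2018L, 2018L, 2018L),
  Month  = c(1L, 2L, 3L),
  Temp_F = c(25, NA, 40)
)

complete.cases(utility_demo)        # TRUE FALSE TRUE
all(complete.cases(utility_demo))   # FALSE: at least one row has an NA
sum(!complete.cases(utility_demo))  # 1 incomplete row
colSums(is.na(utility_demo))        # shows which column the NA is in

utility_demo <- utility_demo[complete.cases(utility_demo), ]
nrow(utility_demo)                  # 2
```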

let's combine the Year and Month columns into a date using the lubridate package
library(lubridate)

first, what does it look like if we concatenate the year and month columns?
paste(utility$Year, utility$Month, "01", sep="-")

let's create a new field that turns that concatenation into a date:
utility$newDate <- ymd(paste(utility$Year, utility$Month, "01", sep="-"))
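If you would rather not load lubridate, base R's as.Date() can parse the same concatenated strings, since its default format is year-month-day (lubridate::make_date(Year, Month, 1) is another option). A sketch with made-up columns:

```r
# Made-up Year/Month columns (single-digit months are fine on input)
Year  <- c(2017L, 2017L, 2018L)
Month <- c(11L, 12L, 6L)

# Same concatenation as above, parsed with base R's default "%Y-%m-%d" format
newDate <- as.Date(paste(Year, Month, "01", sep = "-"))
format(newDate)   # "2017-11-01" "2017-12-01" "2018-06-01"
```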

library(ggplot2)
ggplot(utility, aes(x=newDate, y=Temp_F)) + geom_point(color="blue") + geom_line()
or
ggplot(utility, aes(x=newDate, y=Gas_Th_per_Day)) + geom_point(color="blue") + geom_line()

we can also plot electricity use, fixing the y axis to run from 0 to 80
ggplot(utility, aes(x=newDate, y=E_kWh_per_Day)) + geom_point(color="blue") + geom_line() + coord_cartesian(ylim = c(0,80))

we could look at just June
junes <- subset(utility, Month == 6)
ggplot(junes, aes(x=newDate, y=E_kWh_per_Day)) + geom_point(color="blue") + geom_line() + coord_cartesian(ylim = c(0,80))

maybe I want to compare June electricity usage to the temperature in June to see if there is a direct correlation (temperature drawn in red)
ggplot(junes, aes(x=newDate, y=E_kWh_per_Day)) + geom_point(color="blue") + geom_line() + coord_cartesian(ylim = c(0,80)) + geom_line(aes(y=Temp_F), color="red")
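Two lines on the same chart suggest a relationship but do not measure it. A sketch using cor() on made-up June values; with only one June per year the sample is small, so treat the number with caution, and a high correlation alone does not show that temperature causes the usage.

```r
# Made-up June values: average temperature and electricity use per day
junes_demo <- data.frame(
  Temp_F        = c(68, 72, 75, 79, 83),
  E_kWh_per_Day = c(20, 24, 30, 33, 40)
)

# Pearson correlation: +1 = perfect positive linear relationship, 0 = none
r <- cor(junes_demo$Temp_F, junes_demo$E_kWh_per_Day)
round(r, 2)

# A scatter plot shows the relationship directly, without the time axis
plot(junes_demo$Temp_F, junes_demo$E_kWh_per_Day,
     xlab = "Temperature (F)", ylab = "kWh per day")
```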




All of this comes from data on a 1-month timescale. As this kind of data becomes available at finer timescales, it becomes easier and easier to track people's behaviour, even down to knowing which room a person is likely in based on real-time utility usage. We will talk more about privacy issues later in the course.



For the rest of this week I'd like everyone to get R and RStudio installed, and if you have not used R before, start with Swirl http://swirlstats.com/

If you are familiar with R then take a look at the shiny video tutorials at http://shiny.rstudio.com/tutorial/

Feel free to work with partners and in groups in class to get familiar with the tools before moving on to work individually on project 1.



By the end of this week you should get the evl weather example above running in your local copy of RStudio, then create a shinyapps.io account and deploy the app there. Add an obvious, descriptive link to this visualization to the public webpage you created last week (hint: breaking up your webpage into tabs or sections based on the week in class might be a good idea).
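RStudio's Publish button is one way to deploy. A scripted alternative is sketched below, assuming the rsconnect package (the one RStudio uses for shinyapps.io); is_shiny_app_dir() is a hypothetical helper, and the path and account values are placeholders to replace with your own.

```r
# Hypothetical helper: does `app_dir` look like a Shiny app that can be deployed?
# (a single app.R file, or the older ui.R + server.R pair)
is_shiny_app_dir <- function(app_dir) {
  file.exists(file.path(app_dir, "app.R")) ||
    all(file.exists(file.path(app_dir, c("ui.R", "server.R"))))
}

app_dir <- "path/to/your/app"   # placeholder: the folder holding app.R

if (interactive() && is_shiny_app_dir(app_dir)) {
  # One-time setup: copy name, token and secret from your shinyapps.io dashboard
  # rsconnect::setAccountInfo(name = "...", token = "...", secret = "...")
  rsconnect::deployApp(app_dir)
}
```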


Coming Next Time

The Basics

last revision 11/23/19