install.packages("CVXR")
STA/OPR 9750 Week 7 In-Class Activity: More Thoughts on Plots
Update Slides: Slides 07
This week, we’re going to break into project groups and do three ggplot2 exercises of increasing difficulty. As you work through these with your teammates, be sure to reflect on what plots and what tools you will need to best present your mini-project and course project findings.
Exercise 1: Basic ggplot2 (15 minutes)
In this exercise, you will create ggplot2 graphics to analyze the diamonds data from the ggplot2 package. This data contains pricing and measurements for roughly 54,000 diamonds sold in the US. (Note that these prices are rather out of date.) Before beginning this exercise, you might want to read about the “4 C’s of Diamonds” commonly used to measure quality.
- Make a scatter plot of price vs carat and facet it by cut.
- Use geom_smooth to see how the price-carat relationship changes by color.
- Create a frequency polygon plot of price, broken out by different diamond cuts.
- Create a scatter plot of color by clarity. Why is this plot not useful?
  - Stretch Goal: Make a better plot to visualize this relationship using the ggmosaic package.
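As a starting point for the first three tasks above, here is a minimal sketch. The object names (p_scatter, p_smooth, p_freq), the alpha level, and the binwidth are illustrative choices, not part of the exercise; adjust them as your group sees fit.

```r
library(ggplot2)

# Price vs. carat, one panel per cut; alpha (an arbitrary choice) reduces overplotting
p_scatter <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~cut)

# One smoothed price-carat curve per color grade
p_smooth <- ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_smooth(se = FALSE)

# Frequency polygons of price, one line per cut; binwidth = 500 is an assumption
p_freq <- ggplot(diamonds, aes(x = price, color = cut)) +
  geom_freqpoly(binwidth = 500)
```

Print any of these objects (for example, p_scatter) to draw the plot.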
Exercise 2: Trend Analysis with ggplot2 (30 minutes)
The Carbon Dioxide Information Analysis Center (CDIAC) studies the effect of carbon dioxide on global and local temperature trends. A key tool in their analysis is the temperature “anomaly”. An anomaly is the difference between observed temperature (in a world with anthropogenic atmospheric CO2) and ‘natural’ temperature (from a world without anthropogenic gases). Note that these anomalies require significant analysis to compute and are not “simple observational” data.
Politicians have adopted the tools of temperature anomaly to set national and international emissions targets, e.g., the 2 Degree Target. Note that 2 degrees is calculated as a global average: in practice, some regions will experience a much larger change in temperature, while others may experience a smaller change or even a negative change.
The CVXR package includes the cdiac dataset, capturing CDIAC’s estimated global temperature anomalies from 1850 to 2015. In this question, you will explore these estimated anomalies. Note that you may need to install the CVXR package before beginning this question.[1]
library(CVXR)
library(tidyverse)
data(cdiac)
glimpse(cdiac)
Rows: 166
Columns: 14
$ year <int> 1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 186…
$ jan <dbl> -0.702, -0.303, -0.308, -0.177, -0.360, -0.176, -0.119, -0.512,…
$ feb <dbl> -0.284, -0.362, -0.477, -0.330, -0.280, -0.400, -0.373, -0.344,…
$ mar <dbl> -0.732, -0.485, -0.505, -0.318, -0.284, -0.303, -0.513, -0.434,…
$ apr <dbl> -0.570, -0.445, -0.559, -0.352, -0.349, -0.217, -0.371, -0.646,…
$ may <dbl> -0.325, -0.302, -0.209, -0.268, -0.230, -0.336, -0.119, -0.567,…
$ jun <dbl> -0.213, -0.189, -0.038, -0.179, -0.215, -0.160, -0.288, -0.310,…
$ jul <dbl> -0.128, -0.215, -0.016, -0.059, -0.228, -0.268, -0.297, -0.544,…
$ aug <dbl> -0.233, -0.153, -0.195, -0.148, -0.163, -0.159, -0.305, -0.327,…
$ sep <dbl> -0.444, -0.108, -0.125, -0.409, -0.115, -0.339, -0.459, -0.393,…
$ oct <dbl> -0.452, -0.063, -0.216, -0.359, -0.188, -0.211, -0.384, -0.467,…
$ nov <dbl> -0.190, -0.030, -0.187, -0.256, -0.369, -0.212, -0.608, -0.665,…
$ dec <dbl> -0.268, -0.067, 0.083, -0.444, -0.232, -0.510, -0.440, -0.356, …
$ annual <dbl> -0.375, -0.223, -0.224, -0.271, -0.246, -0.271, -0.352, -0.460,…
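The output above shows that year is stored as an integer. Date-based axis scales such as scale_x_date expect a Date column, so one option is to convert it first. The sketch below uses a small stand-in data frame (cdiac_head, a hypothetical name) built from the first eight year and annual values shown in the output. Treating each year as 1 January, and the two-year break spacing, are assumptions made for illustration.

```r
library(ggplot2)

# Stand-in for the first eight rows of cdiac, copied from the glimpse() output
cdiac_head <- data.frame(
  year   = 1850:1857,
  annual = c(-0.375, -0.223, -0.224, -0.271, -0.246, -0.271, -0.352, -0.460)
)

# scale_x_date() needs a Date column; dating each year to 1 January is an assumption
cdiac_head$date <- as.Date(paste0(cdiac_head$year, "-01-01"))

p_annual <- ggplot(cdiac_head, aes(x = date, y = annual)) +
  geom_line() +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  labs(x = "Year", y = "Annual GMT anomaly")
```

The same steps should carry over to the full cdiac data, with wider breaks (say, every 20 or 25 years) to keep the axis readable.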
- Plot the estimated annual global mean temperature (GMT) anomaly from 1850 to 2015.
  - Use scale_x_date to improve the \(x\)-axis.
- Plot the GMT anomaly for each month on the same plot (as different lines).
  - Before starting this, you may need to use the pivot_ functionality to get this data in the right shape. Recall that ggplot2 expects one “data point” per row.
- Plot the monthly GMT anomaly series as one long line (with a point for each month).
- Now focus only on July: plot the July GMT anomaly series. Use the runmed() function to add a second series to the plot giving the median July GMT anomaly of the previous 5 years. Is there evidence of an increasing warming trend?
- For each year, identify the warmest month (as measured by GMT anomaly); create a histogram showing the probability a given month was the hottest (largest anomaly) in its year.
  - Make sure your \(x\)-axis is in reasonable (chronological) order, not alphabetical.
  - You will need to use dplyr tools to find the warmest month in a given year.
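The data-wrangling steps behind the tasks above can be sketched as follows. To stay self-contained, the code rebuilds a stand-in (cdiac_head, a hypothetical name) from the first eight values of each column in the glimpse() output rather than loading cdiac itself. All other object names are illustrative. Note that runmed() computes a centered running median, so it only approximates a "previous 5 years" window; a strictly trailing window would need an extra shift.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stand-in for the first eight rows of cdiac, copied from the glimpse() output
cdiac_head <- data.frame(
  year = 1850:1857,
  jan = c(-0.702, -0.303, -0.308, -0.177, -0.360, -0.176, -0.119, -0.512),
  feb = c(-0.284, -0.362, -0.477, -0.330, -0.280, -0.400, -0.373, -0.344),
  mar = c(-0.732, -0.485, -0.505, -0.318, -0.284, -0.303, -0.513, -0.434),
  apr = c(-0.570, -0.445, -0.559, -0.352, -0.349, -0.217, -0.371, -0.646),
  may = c(-0.325, -0.302, -0.209, -0.268, -0.230, -0.336, -0.119, -0.567),
  jun = c(-0.213, -0.189, -0.038, -0.179, -0.215, -0.160, -0.288, -0.310),
  jul = c(-0.128, -0.215, -0.016, -0.059, -0.228, -0.268, -0.297, -0.544),
  aug = c(-0.233, -0.153, -0.195, -0.148, -0.163, -0.159, -0.305, -0.327),
  sep = c(-0.444, -0.108, -0.125, -0.409, -0.115, -0.339, -0.459, -0.393),
  oct = c(-0.452, -0.063, -0.216, -0.359, -0.188, -0.211, -0.384, -0.467),
  nov = c(-0.190, -0.030, -0.187, -0.256, -0.369, -0.212, -0.608, -0.665),
  dec = c(-0.268, -0.067,  0.083, -0.444, -0.232, -0.510, -0.440, -0.356)
)

# Long format: one row per (year, month). The factor levels keep months
# in calendar order instead of alphabetical order.
monthly <- cdiac_head |>
  pivot_longer(jan:dec, names_to = "month", values_to = "anomaly") |>
  mutate(
    month = factor(month, levels = tolower(month.abb)),
    date  = as.Date(paste(year, as.integer(month), 1, sep = "-"))
  )

# July series plus a centered 5-year running median (k must be odd)
july <- cdiac_head |>
  transmute(year, jul, jul_runmed = as.numeric(runmed(jul, k = 5)))

# Warmest month in each year (ties, if any, broken by row order)
warmest <- monthly |>
  group_by(year) |>
  slice_max(anomaly, n = 1, with_ties = FALSE) |>
  ungroup()

# Share of years in which each month was warmest; .drop = FALSE keeps
# months that were never the warmest
warmest_share <- warmest |>
  count(month, .drop = FALSE) |>
  mutate(share = n / sum(n))

p_warmest <- ggplot(warmest_share, aes(x = month, y = share)) +
  geom_col() +
  labs(x = "Month", y = "Share of years in which month was warmest")
```

With monthly in hand, mapping color = month gives the "one line per month" plot, while plotting anomaly against date gives the single long series.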
Exercise 3: Animated Graphics (1 hour)
In this question, you will use the gganimate extension to ggplot2 to create animated graphics. We will use the famous gapminder data set from the gapminder package. Install the gganimate, gapminder, gifski, and av packages before attempting this problem.
- For background, watch Hans Rosling’s talk on human prosperity.
- Create a scatter plot of the relationship between GDP and Life Expectancy in the year 1952.
  - Color points by continent and use the size aesthetic to represent population.
  - You might want to put quantities on a log-scale.
- There is an outlier country in this data with very high GDP.
  - What is it?
  - Identify and remove it.
- Using the transition_time function, make this an animated plot showing how this data changes over time.
- Using the theme machinery, labels, etc. make this a “publication ready” plot.
  - Note that you can use {frame_time} in the title to get a dynamically changing year.
- Use the country_colors data from the gapminder package to color the points using Dr. Rosling’s preferred color scheme.
  - This is a different color scale than ggplot2 uses by default, so you will need to override the scale_color_* functionality.
  - The help page for ?country_colors will be helpful here.
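One way the pieces above might fit together is sketched below. It assumes that "GDP" refers to the gdpPercap (GDP per capita) column of gapminder, and it finds the outlier programmatically rather than naming it. The object names, the theme, and the axis labels are illustrative choices only.

```r
library(dplyr)
library(ggplot2)
library(gganimate)
library(gapminder)

# Country with the largest GDP per capita in 1952 (assumed to be the outlier)
outlier <- gapminder |>
  filter(year == 1952) |>
  slice_max(gdpPercap, n = 1, with_ties = FALSE)

# Drop every row for that country
gap_trimmed <- gapminder |>
  filter(country != as.character(outlier$country))

# country_colors is a named vector of colors, one per country, so it can be
# passed directly to a manual color scale; the legend is hidden because
# there is one entry per country
anim <- ggplot(gap_trimmed,
               aes(x = gdpPercap, y = lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}",
       x = "GDP per capita (log scale)",
       y = "Life expectancy") +
  theme_minimal() +
  transition_time(year)

# Rendering can take a while; uncomment to produce a GIF
# animate(anim, renderer = gifski_renderer())
```

Inspect outlier to answer the "What is it?" question. For the static 1952 plot, filter gap_trimmed to that year and omit transition_time().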
Footnotes
[1] CVXR is actually an incredible piece of software and super-useful for developing and implementing statistical and machine learning techniques. We, sadly, will not explore it in this course.