STA/OPR 9750 Week 7 In-Class Activity: More Thoughts on Plots

Update Slides: Slides 07

This week, we’re going to break into project groups and do three ggplot2 exercises of increasing difficulty. As you work through these with your teammates, be sure to reflect on what plots and what tools you will need to best present your mini-project and course project findings.

Exercise 1: Basic ggplot2 (15 minutes)

In this exercise, you will create ggplot2 graphics to analyze the diamonds data from the ggplot2 package. This data set contains prices and measurements for roughly 54,000 diamonds sold in the US. (Note that these prices are rather out of date.) Before beginning this exercise, you might want to read about the “4 C’s of Diamonds” commonly used to measure quality.

  1. Make a scatter plot of price vs carat and facet it by cut.

  2. Use geom_smooth to see how the price-carat relationship changes by color.

  3. Create a frequency polygon plot of price, broken out by different diamond cuts.

  4. Create a scatter plot of color by clarity. Why is this plot not useful?

    • Stretch Goal: Make a better plot to visualize this relationship using the ggmosaic package.
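
As a starting point, the first three items can be sketched as follows. This is one possible approach rather than a model answer: the transparency level, the bin width, and the object names (p_facet, p_smooth, p_freq, color_by_clarity) are illustrative choices, not part of the exercise. The final table is a hint for item 4: both color and clarity are categorical, so it is worth checking how many distinct positions a scatter plot could show.

```r
library(ggplot2)

# Item 1: price against carat, one panel per cut.
# alpha is an illustrative choice to reduce overplotting.
p_facet <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  facet_wrap(~cut)

# Item 2: one smoothed price-carat curve per color grade.
p_smooth <- ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_smooth(se = FALSE)

# Item 3: frequency polygon of price, one line per cut.
# binwidth = 500 is an assumption; try a few values.
p_freq <- ggplot(diamonds, aes(x = price, color = cut)) +
  geom_freqpoly(binwidth = 500)

# Hint for item 4: cross-tabulate the two categorical variables.
color_by_clarity <- table(diamonds$color, diamonds$clarity)
dim(color_by_clarity)
```

Printing any of the plot objects (for example, typing p_facet at the console) draws it.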

Exercise 2: Trend Analysis with ggplot2 (30 minutes)

The Carbon Dioxide Information and Analysis Center studies the effect of carbon dioxide on global and local temperature trends. A key tool in their analysis is the temperature “anomaly”. An anomaly is the difference between observed temperature (in a world with anthropogenic atmospheric CO2) and ‘natural’ temperature (from a world without anthropogenic gases). Note that these anomalies require significant analysis to compute and are not “simple observational” data.

Politicians have adopted the tools of temperature anomaly to set national and international emissions targets, e.g., the 2 Degree Target. Note that 2 degrees is calculated as a global average: in practice, some regions will experience a much larger change in temperature, while others may experience a smaller change or even a negative change.

The CVXR package includes the cdiac dataset, capturing CDIAC’s estimated global temperature anomalies from 1850 to 2015. In this question, you will explore these estimated anomalies. Note that you may need to install the CVXR package before beginning this question.[1]

install.packages("CVXR")
library(CVXR)
library(tidyverse)
data(cdiac)
glimpse(cdiac)
Rows: 166
Columns: 14
$ year   <int> 1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 186…
$ jan    <dbl> -0.702, -0.303, -0.308, -0.177, -0.360, -0.176, -0.119, -0.512,…
$ feb    <dbl> -0.284, -0.362, -0.477, -0.330, -0.280, -0.400, -0.373, -0.344,…
$ mar    <dbl> -0.732, -0.485, -0.505, -0.318, -0.284, -0.303, -0.513, -0.434,…
$ apr    <dbl> -0.570, -0.445, -0.559, -0.352, -0.349, -0.217, -0.371, -0.646,…
$ may    <dbl> -0.325, -0.302, -0.209, -0.268, -0.230, -0.336, -0.119, -0.567,…
$ jun    <dbl> -0.213, -0.189, -0.038, -0.179, -0.215, -0.160, -0.288, -0.310,…
$ jul    <dbl> -0.128, -0.215, -0.016, -0.059, -0.228, -0.268, -0.297, -0.544,…
$ aug    <dbl> -0.233, -0.153, -0.195, -0.148, -0.163, -0.159, -0.305, -0.327,…
$ sep    <dbl> -0.444, -0.108, -0.125, -0.409, -0.115, -0.339, -0.459, -0.393,…
$ oct    <dbl> -0.452, -0.063, -0.216, -0.359, -0.188, -0.211, -0.384, -0.467,…
$ nov    <dbl> -0.190, -0.030, -0.187, -0.256, -0.369, -0.212, -0.608, -0.665,…
$ dec    <dbl> -0.268, -0.067, 0.083, -0.444, -0.232, -0.510, -0.440, -0.356, …
$ annual <dbl> -0.375, -0.223, -0.224, -0.271, -0.246, -0.271, -0.352, -0.460,…
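
The output above shows a “wide” layout with one column per month. A minimal sketch of reshaping it into one row per (year, month) pair is given below. To keep the example self-contained, it uses a small stand-in (cdiac_small, a hypothetical name) built from the first three years and three month columns of the glimpse() output; with the full cdiac data, the column selection would be jan:dec, which also leaves out the annual column.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for cdiac: values copied from the glimpse() output above.
cdiac_small <- tibble(
  year = c(1850L, 1851L, 1852L),
  jan  = c(-0.702, -0.303, -0.308),
  feb  = c(-0.284, -0.362, -0.477),
  mar  = c(-0.732, -0.485, -0.505)
)

cdiac_long <- cdiac_small |>
  # One row per (year, month); use jan:dec on the full data.
  pivot_longer(cols = jan:mar, names_to = "month", values_to = "anomaly") |>
  mutate(
    # Factor levels in calendar order, so plots do not sort alphabetically.
    month = factor(month, levels = tolower(month.abb)),
    # A real Date (first day of each month), usable with scale_x_date().
    date = as.Date(sprintf("%d-%02d-01", year, as.integer(month)))
  )

cdiac_long
```

The date column is one way to satisfy scale_x_date, which requires a Date variable rather than the integer year.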
  1. Plot the estimated annual global mean temperature (GMT) anomaly from 1850 to 2015.
    • Use scale_x_date to improve the \(x\)-axis.
  2. Plot the GMT anomaly for each month on the same plot (as different lines).
    • Before starting this, you may need to use the pivot_ functionality to get this data in the right shape. Recall that ggplot2 expects one “data point” per row.
  3. Plot the monthly GMT anomaly series as one long line (with a point for each month).
  4. Now focus only on July: plot the July GMT anomaly series. Use the runmed() function to add a second series to the plot giving the median July GMT anomaly of the previous 5 years. Is there evidence of an increasing warming trend?
  5. For each year, identify the warmest month (as measured by GMT anomaly); create a histogram showing the probability a given month was the hottest (largest anomaly) in its year.
    • Make sure your \(x\)-axis is in reasonable (chronological) order, not alphabetical.
    • You will need to use dplyr tools to find the warmest month in a given year.
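
The running-median and warmest-month steps can be sketched on a few rows copied from the glimpse() output above. The object names (july, first_years, warmest) are hypothetical, and the toy data covers only 1850 to 1857, so it says nothing about the long-run trend. Note that runmed() requires an odd window and centers it on each observation, so it is a smoother rather than a strictly backward-looking median; treat it as an approximation of “the previous 5 years”.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# July anomalies for 1850-1857, copied from the glimpse() output.
july <- tibble(
  year = 1850:1857,
  jul  = c(-0.128, -0.215, -0.016, -0.059, -0.228, -0.268, -0.297, -0.544)
) |>
  # as.numeric() drops the "k" attribute that runmed() attaches.
  mutate(jul_med5 = as.numeric(runmed(jul, k = 5)))

p_july <- ggplot(july, aes(x = year)) +
  geom_line(aes(y = jul, color = "July anomaly")) +
  geom_line(aes(y = jul_med5, color = "5-year running median")) +
  labs(x = "Year", y = "Anomaly", color = NULL)

# First three years of all twelve month columns, from the same output.
first_years <- tibble(
  year = 1850:1852,
  jan = c(-0.702, -0.303, -0.308), feb = c(-0.284, -0.362, -0.477),
  mar = c(-0.732, -0.485, -0.505), apr = c(-0.570, -0.445, -0.559),
  may = c(-0.325, -0.302, -0.209), jun = c(-0.213, -0.189, -0.038),
  jul = c(-0.128, -0.215, -0.016), aug = c(-0.233, -0.153, -0.195),
  sep = c(-0.444, -0.108, -0.125), oct = c(-0.452, -0.063, -0.216),
  nov = c(-0.190, -0.030, -0.187), dec = c(-0.268, -0.067, 0.083)
)

# Keep the single warmest month of each year.
warmest <- first_years |>
  pivot_longer(jan:dec, names_to = "month", values_to = "anomaly") |>
  mutate(month = factor(month, levels = tolower(month.abb))) |>
  group_by(year) |>
  slice_max(anomaly, n = 1, with_ties = FALSE) |>
  ungroup()

# Share of years in which each month was the warmest;
# drop = FALSE keeps all twelve months on the axis, in calendar order.
p_warmest <- ggplot(warmest, aes(x = month)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_x_discrete(drop = FALSE) +
  labs(x = "Month", y = "Share of years")
```

The same pipeline should carry over to the full cdiac data, since selecting jan:dec excludes the annual column.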

Exercise 3: Animated Graphics (1 hour)

In this question, you will use the gganimate extension to ggplot2 to create animated graphics. We will use the famous gapminder data set from the gapminder package. Install the gganimate, gapminder, gifski, and av packages before attempting this problem.

  1. For background, watch Hans Rosling’s talk on human prosperity.
  2. Create a scatter plot of the relationship between GDP and Life Expectancy in the year 1952.
    • Color points by continent and use the size aesthetic to represent population.
    • You might want to put quantities on a log-scale.
  3. There is an outlier country in this data with very high GDP.
    • What is it?
    • Identify and remove it.
  4. Using the transition_time function, make this an animated plot showing how this data changes over time.
  5. Using the theme machinery, labels, etc. make this a “publication ready” plot.
    • Note that you can use {frame_time} in the title to get a dynamically changing year.
  6. Use the country_colors data from the gapminder package to color the points using Dr. Rosling’s preferred color scheme.
    • This is a different color scale than ggplot2 uses by default, so you will need to override the scale_color_* functionality.
    • The help page for ?country_colors will be helpful here.
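
One way the pieces above might fit together is sketched below. It is a rough outline, not a finished “publication ready” plot: the theme, axis labels, transparency, and object names (outlier, gap_trimmed, anim) are illustrative assumptions. The outlier is located programmatically as the country with the highest GDP per capita in 1952, so its name is not hard-coded here.

```r
library(dplyr)
library(ggplot2)
library(gganimate)
library(gapminder)

# Find the country with the highest GDP per capita in 1952.
outlier <- gapminder |>
  filter(year == 1952) |>
  slice_max(gdpPercap, n = 1) |>
  pull(country) |>
  as.character()

# Remove that country from every year of the data.
gap_trimmed <- gapminder |>
  filter(country != outlier)

anim <- ggplot(
  gap_trimmed,
  aes(x = gdpPercap, y = lifeExp, size = pop, color = country)
) +
  # Hide the legend: there is one color per country.
  geom_point(alpha = 0.7, show.legend = FALSE) +
  # country_colors is a named vector of colors shipped with gapminder.
  scale_color_manual(values = country_colors) +
  scale_x_log10() +
  labs(
    title = "Year: {frame_time}",  # filled in by gganimate for each frame
    x = "GDP per capita",
    y = "Life expectancy"
  ) +
  theme_minimal() +
  transition_time(year)

# Rendering is left to the reader, since it needs gifski or av:
# animate(anim)
```

Coloring by country replaces the continent coloring from item 2; the two cannot share the color aesthetic in the same layer.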

Footnotes

  1. CVXR is actually an incredible piece of software and super-useful for developing and implementing statistical and machine learning techniques. We, sadly, will not explore it in this course.