Today: Lecture #08: Advanced ggplot2 – maps, interactivity, animation
purrr
MP#03 - TBD
Due 2026-04-24 at 11:59pm ET
Topics covered:
MP#04 - TBD
Due 2026-05-15 at 11:59pm ET
TBD
Topics covered:
We will owe you:
Per course policy:
If you want to discuss your MPs or project in more detail, come to office hours!
Course Project should be your main focus for rest of course
Final submissions:
Nothing new per se - just more details about grading (4 elements => 10)
Additional notes added on integration of SQs and strategies for estimating causal effects. Ask questions! I’m happy to fill these out further.
Today, I’m looking for:
Mainly, I want to see that you will be able to succeed
Optional discussion of functional programming in R
Upcoming work from course calendar
purrr
Functional programming - purity
Can go deep into FP world - we’re just dipping a toe in
map and friends, reduce, list_*
safely, partial, insistently, in_parallel
pluck
Compare:
and
No indexing ([]) or explicit loop management
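A minimal sketch of the contrast, assuming a hypothetical input vector x and sqrt as the function being applied (neither comes from the slides):

```r
library(purrr)

x <- c(1, 4, 9)  # hypothetical input, for illustration only

# Loop version: pre-allocate, index with [[ ]], manage the counter by hand
out_loop <- vector("list", length(x))
for (i in seq_along(x)) {
  out_loop[[i]] <- sqrt(x[[i]])
}

# Functional version: no indexing, no loop bookkeeping
out_map <- map(x, sqrt)

identical(out_loop, out_map)
```

Both versions build the same list; the map version only states what to apply and to what.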
In R, FP is principally associated with lists
Recall: a list is a generic container in R (can hold anything, even other lists)
Many objects in R are lists under the hood (including data.frames)
Often, we want to do the same thing to several different items:
map and friends let us avoid loops
map(INPUT, FUNCTION) applies FUNCTION to each element of INPUT and collects the output in a new list
[[1]]
[1] "JANUARY"
[[2]]
[1] "FEBRUARY"
[[3]]
[1] "MARCH"
[[4]]
[1] "APRIL"
[[5]]
[1] "MAY"
[[6]]
[1] "JUNE"
[[7]]
[1] "JULY"
[[8]]
[1] "AUGUST"
[[9]]
[1] "SEPTEMBER"
[[10]]
[1] "OCTOBER"
[[11]]
[1] "NOVEMBER"
[[12]]
[1] "DECEMBER"
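Output like the list above would come from a call along these lines (a sketch; upper_list is just an illustrative name):

```r
library(purrr)

# Apply toupper to each month name; the result is a list of length 12
upper_list <- map(month.name, toupper)
upper_list[[1]]
```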
Sometimes, we know the type of values to be returned:
map_* lets us put those into a vector:
[1] "JANUARY" "FEBRUARY" "MARCH" "APRIL" "MAY" "JUNE"
[7] "JULY" "AUGUST" "SEPTEMBER" "OCTOBER" "NOVEMBER" "DECEMBER"
[1] 7 8 5 5 3 4 4 6 9 7 8 8
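A sketch of the typed variants, assuming toupper and nchar as the functions being mapped (variable names are illustrative):

```r
library(purrr)

# map_chr: every result must be a single string, so we get a character vector
upper_chr <- map_chr(month.name, toupper)

# map_int: every result must be a single integer, so we get an integer vector
name_len <- map_int(month.name, nchar)
name_len
```

If a result has the wrong type or length, map_* throws an error instead of silently returning something unexpected.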
Functions in R can be defined as:
But this is clunky
Anonymous functions (“lambdas”) let us define ‘little functions’ more compactly
function (x)
x + 1
Supposedly \( looks like the Greek \(\lambda\)
Anonymous functions play well with map:
[1] 3 3 1 1 1 2 1 2 3 2 3 3
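A sketch of both function styles, plus an anonymous function inside map_int. The vowel counter below is an assumption about what such a call might look like; it counts lowercase vowels only and uses base R string tools:

```r
library(purrr)

# Named function: fine, but verbose for a one-off
add_one <- function(x) {
  x + 1
}

# Equivalent anonymous function using the \(x) shorthand (R >= 4.1)
add_one_short <- \(x) x + 1

# Anonymous function passed straight to map_int:
# count the lowercase vowels in each month name
n_vowels <- map_int(month.name, \(m) {
  chars <- strsplit(m, "")[[1]]
  sum(chars %in% c("a", "e", "i", "o", "u"))
})
n_vowels
```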
A common idiom is to return a data frame inside map
[[1]]
month upper n_vowels
1 January JANUARY 3
[[2]]
month upper n_vowels
1 February FEBRUARY 3
[[3]]
month upper n_vowels
1 March MARCH 1
[[4]]
month upper n_vowels
1 April APRIL 1
[[5]]
month upper n_vowels
1 May MAY 1
[[6]]
month upper n_vowels
1 June JUNE 2
[[7]]
month upper n_vowels
1 July JULY 1
[[8]]
month upper n_vowels
1 August AUGUST 2
[[9]]
month upper n_vowels
1 September SEPTEMBER 3
[[10]]
month upper n_vowels
1 October OCTOBER 2
[[11]]
month upper n_vowels
1 November NOVEMBER 3
[[12]]
month upper n_vowels
1 December DECEMBER 3
Combine this list of little DFs rowwise with list_rbind()
month upper n_vowels
1 January JANUARY 3
2 February FEBRUARY 3
3 March MARCH 1
4 April APRIL 1
5 May MAY 1
6 June JUNE 2
7 July JULY 1
8 August AUGUST 2
9 September SEPTEMBER 3
10 October OCTOBER 2
11 November NOVEMBER 3
12 December DECEMBER 3
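A sketch of the data-frame-per-element idiom followed by list_rbind(). The helper count_vowels is hypothetical and counts lowercase vowels only:

```r
library(purrr)

# Hypothetical helper: number of lowercase vowels in one string
count_vowels <- function(m) {
  sum(strsplit(m, "")[[1]] %in% c("a", "e", "i", "o", "u"))
}

month_df <- month.name |>
  # one single-row data frame per month
  map(\(m) data.frame(month = m, upper = toupper(m), n_vowels = count_vowels(m))) |>
  # stack the 12 little data frames into one
  list_rbind()

month_df
```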
Often, we will want to map multiple things together:
Use map2 to map over two inputs together, here month.abb and month.name:
[1] "Jan is short for January" "Feb is short for February"
[3] "Mar is short for March" "Apr is short for April"
[5] "May is short for May" "Jun is short for June"
[7] "Jul is short for July" "Aug is short for August"
[9] "Sep is short for September" "Oct is short for October"
[11] "Nov is short for November" "Dec is short for December"
Use pmap to go to three or more
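A sketch of mapping over two and then three inputs. The third input (the month number) and the output wording in the pmap call are made up for illustration:

```r
library(purrr)

# Two inputs: map2 walks over both vectors in step
short_for <- map2_chr(
  month.abb, month.name,
  \(abb, full) paste(abb, "is short for", full)
)

# Three or more inputs: pmap takes a list; names match the function's arguments
described <- pmap_chr(
  list(abb = month.abb, full = month.name, num = 1:12),
  \(abb, full, num) paste0(abb, " (", full, ") is month ", num)
)

short_for[1]
described[1]
```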
Use imap to get the index of the element as well:
[1] "January is month number 1" "February is month number 2"
[3] "March is month number 3" "April is month number 4"
[5] "May is month number 5" "June is month number 6"
[7] "July is month number 7" "August is month number 8"
[9] "September is month number 9" "October is month number 10"
[11] "November is month number 11" "December is month number 12"
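A sketch of imap: the function receives the element first and its index (or name, if the input is named) second:

```r
library(purrr)

# month.name is unnamed, so the second argument is the position 1, 2, ..., 12
numbered <- imap_chr(month.name, \(name, i) paste(name, "is month number", i))
numbered[10]
```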
map is most useful when the underlying function can’t be vectorized: e.g., file processing or downloading
# A tibble: 5 × 4
type setup punchline id
<chr> <chr> <chr> <int>
1 general What did the fish say when it hit the wall? Dam. 1
2 general How do you make a tissue dance? You put a little b… 2
3 general What's Forrest Gump's password? 1Forrest1 3
4 general What do you call a belt made out of watches? A waist of time. 4
5 general Why can't bicycles stand on their own? They are two tired 5
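The table above came from a web request. The same pattern applies to local files, so this sketch uses temporary CSV files (made-up stand-in data) so that it runs offline:

```r
library(purrr)

# Write three small CSV files to a temporary directory (stand-in data)
tmp_dir <- tempfile("purrr_demo_")
dir.create(tmp_dir)
paths <- file.path(tmp_dir, paste0("part_", 1:3, ".csv"))
walk2(paths, 1:3, \(path, i) {
  write.csv(data.frame(id = i, value = i^2), path, row.names = FALSE)
})

# read.csv handles one file at a time, so map over the paths and stack the results
combined <- paths |>
  map(read.csv) |>
  list_rbind()

combined
```

walk2 is the side-effect counterpart of map2: it calls the function for each pair and discards the results.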
Often we will want to map several times as we perform steps of an analysis.
Cleaner than one big function:
# A tibble: 5 × 4
type setup punchline id
<chr> <chr> <chr> <int>
1 general What did the fish say when it hit the wall? Dam. 1
2 general How do you make a tissue dance? You put a little b… 2
3 general What's Forrest Gump's password? 1Forrest1 3
4 general What do you call a belt made out of watches? A waist of time. 4
5 general Why can't bicycles stand on their own? They are two tired 5
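A sketch of splitting an analysis into several small map steps, using made-up "key=value" records in place of real downloaded data:

```r
library(purrr)

# Made-up raw records, for illustration only
raw <- c("id=1;type=general", "id=2;type=general", "id=3;type=programming")

records <- raw |>
  # step 1: split each record into "key=value" fields
  map(\(s) strsplit(s, ";")[[1]]) |>
  # step 2: split each field into its key and its value
  map(\(fields) strsplit(fields, "=")) |>
  # step 3: turn the pairs into a named character vector
  map(\(kv) setNames(map_chr(kv, 2), map_chr(kv, 1))) |>
  # step 4: one single-row data frame per record
  map(\(v) data.frame(id = as.integer(v[["id"]]), type = v[["type"]])) |>
  list_rbind()

records
```

Each step can be run and inspected on its own, which is harder to do with one large function.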
Sometimes, when we have a complex list, we want to pull out certain elements:
Pass a name to map to extract that element from each item:
[[1]]
[1] "ggplot2"
[[2]]
[1] "lubridate"
[[3]]
[1] "stringr"
[[4]]
[1] "dplyr"
[[5]]
[1] "readr"
[[6]]
[1] "magrittr"
[[7]]
[1] "tidyr"
[[8]]
[1] "nycflights13"
[[9]]
[1] "rvest"
[[10]]
[1] "purrr"
[[11]]
[1] "haven"
[[12]]
[1] "readxl"
[[13]]
[1] "reprex"
[[14]]
[1] "tibble"
[[15]]
[1] "multidplyr"
[[16]]
[1] "dtplyr"
[[17]]
[1] "hms"
[[18]]
[1] "modelr"
[[19]]
[1] "forcats"
[[20]]
[1] "tidyverse"
[[21]]
[1] "tidytemplate"
[[22]]
[1] "blob"
[[23]]
[1] "ggplot2-docs"
[[24]]
[1] "glue"
[[25]]
[1] "style"
[[26]]
[1] "dbplyr"
[[27]]
[1] "googledrive"
[[28]]
[1] "googlesheets4"
[[29]]
[1] "tidyverse.org"
[[30]]
[1] "datascience-box"
Use pluck to access elements of a list in the same way:
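A sketch of both extraction styles, using a small hypothetical nested list in place of a real API response:

```r
library(purrr)

# Hypothetical stand-in for a parsed API response (a list of lists)
repos <- list(
  list(name = "ggplot2", owner = list(login = "tidyverse")),
  list(name = "dplyr",   owner = list(login = "tidyverse"))
)

# String shorthand: extract "name" from every element
repo_names <- map_chr(repos, "name")

# pluck: the same kind of access for a single element, by position then name
first_name  <- pluck(repos, 1, "name")
first_owner <- pluck(repos, 1, "owner", "login")
```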
From last week,
Anything can be a column of a data.frame, even another data.frame
# A tibble: 3 × 2
# Groups: species [3]
species data
<fct> <list>
1 Adelie <tibble [152 × 7]>
2 Gentoo <tibble [124 × 7]>
3 Chinstrap <tibble [68 × 7]>
data is a set of 3 different data frames (one per species)
Use map to fit the same model to each data frame separately:
# A tibble: 3 × 3
# Groups: species [3]
species data model
<fct> <list> <list>
1 Adelie <tibble [152 × 7]> <lm>
2 Gentoo <tibble [124 × 7]> <lm>
3 Chinstrap <tibble [68 × 7]> <lm>
Continue using map to analyze each species-model separately
# A tibble: 3 × 4
# Groups: species [3]
species coefficients slope r_sq
<fct> <list> <dbl> <dbl>
1 Adelie <dbl [2]> 32.8 0.219
2 Gentoo <dbl [2]> 54.6 0.494
3 Chinstrap <dbl [2]> 34.6 0.412
So flipper_len explains the most body_mass variation in Gentoo penguins.
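A dependency-light sketch of the same fit-then-summarize pattern. It assumes the built-in iris data as a stand-in for the penguins, and uses split() plus map instead of a list-column, so the numbers will not match the tables above:

```r
library(purrr)

# iris ships with R; one data frame per species
by_species <- split(iris, iris$Species)

# Fit the same model to each piece
models <- map(by_species, \(d) lm(Petal.Width ~ Petal.Length, data = d))

# Keep mapping to pull a summary out of each model
slopes <- map_dbl(models, \(m) coef(m)[["Petal.Length"]])
r_sq   <- map_dbl(models, \(m) summary(m)$r.squared)

r_sq
```

The list-column version does the same thing, with each map call living inside a mutate on the nested data frame.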
When passing functions to map, we might want to handle errors
Use an adverb to modify a function
If you have a function that sometimes throws errors, wrap it in safely
[1] 7 8 5 5 3 4 4 6 9 7 8 8
[1] 7 8 5 5 3 4 NA 6 9 NA NA 8
possibly
The safely |> map("result") combo is common, so purrr provides the helper possibly:
[1] 7 8 5 5 NA 4 4 6 9 7 8 8
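A sketch of both adverbs, assuming a hypothetical function strict_nchar that deliberately fails on one input:

```r
library(purrr)

# Hypothetical function that fails on some inputs
strict_nchar <- function(x) {
  if (x == "May") stop("too short!")
  nchar(x)
}

# safely(): every call returns list(result = ..., error = ...) and never throws
safe_out <- map(month.name, safely(strict_nchar))
safe_out[[5]]$error   # the captured error for "May"

# possibly(): return a default value instead of failing
lens <- map_int(month.name, possibly(strict_nchar, otherwise = NA_integer_))
lens
```

With safely you keep the error objects for later inspection; with possibly you only keep a placeholder.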
For functions that fail sporadically (e.g., web access), try insistently:
Will try 3 times by default
(cf. sites that don’t work reliably, like in MP#02)
Some websites will get mad if you query too often: slowly will make sure a function isn’t called too often
Default is once per second.
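A sketch of both rate-related adverbs. The function flaky is hypothetical (it fails twice, then succeeds), and the short fixed pauses are chosen only so the demo runs quickly; they are not the defaults:

```r
library(purrr)

attempts <- 0
# Hypothetical flaky function: fails on the first two calls, then succeeds
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("temporary failure")
  "success"
}

# insistently(): retry on error, here with a fixed 0.1s pause and up to 5 attempts
persistent <- insistently(flaky, rate = rate_delay(pause = 0.1, max_times = 5))
result <- persistent()

# slowly(): wait between successive calls, here 0.1s instead of the default
polite_sqrt <- slowly(sqrt, rate = rate_delay(pause = 0.1))
roots <- map_dbl(c(1, 4, 9), polite_sqrt)
```

If every attempt fails, the insistently-wrapped function gives up with an error, so it can still be combined with safely or possibly.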
For parallel processing, use the in_parallel adverb:
[1] 2 3 4 5
[1] 2 3 4 5
Argument to in_parallel needs to be an anonymous function
Compare:
user system elapsed
0.001 0.000 4.022
user system elapsed
0.001 0.001 1.006
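A hedged sketch of a parallel map. It assumes a recent purrr release (1.1.0 or later) with the mirai and carrier packages installed, and falls back to a sequential map otherwise. The worker count and sleep time are arbitrary:

```r
library(purrr)

# Sequential version of the task, used as a fallback
slow_add_one <- \(x) { Sys.sleep(0.2); x + 1 }

# in_parallel() only exists in recent purrr and needs mirai + carrier, so guard it
can_parallel <- packageVersion("purrr") >= "1.1.0" &&
  requireNamespace("mirai", quietly = TRUE) &&
  requireNamespace("carrier", quietly = TRUE)

if (can_parallel) {
  mirai::daemons(2)   # start two background R processes
  # the argument to in_parallel() is written as an anonymous function
  res <- map_dbl(1:4, in_parallel(\(x) { Sys.sleep(0.2); x + 1 }))
  mirai::daemons(0)   # shut the workers down again
} else {
  res <- map_dbl(1:4, slow_add_one)
}

res
```

The anonymous function runs in a separate R process, so it should be self-contained and not rely on objects from the calling session.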
Parallelization is not magic:
Use safely so that if one step errors, you don’t lose everything
Given a list, the pluck function will pull out elements:
list_obj |> pluck(n) will pull out the \(n^{\text{th}}\) element
list_obj |> pluck("name") will pull out the element named "name"
list_obj |> pluck(func) will apply the “accessor” func
lm: Linear regression (and ANOVA)
Call:
lm(formula = body_mass ~ flipper_len, data = penguins)
Residuals:
Min 1Q Median 3Q Max
-1058.80 -259.27 -26.88 247.33 1288.69
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5780.831 305.815 -18.90 <2e-16 ***
flipper_len 49.686 1.518 32.72 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 394.3 on 340 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.759, Adjusted R-squared: 0.7583
F-statistic: 1071 on 1 and 340 DF, p-value: < 2.2e-16
Can use pluck + accessors to get the coefficients
(Intercept) flipper_len
-5780.83136 49.68557
Final form is most robust
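A sketch of three ways to get the coefficients, assuming a stand-in model fit on the built-in mtcars data rather than the penguins model above:

```r
library(purrr)

fit <- lm(mpg ~ wt, data = mtcars)   # stand-in model on a built-in data set

by_dollar   <- fit$coefficients            # relies on the internal field name
by_name     <- pluck(fit, "coefficients")  # same field, via pluck
by_accessor <- pluck(fit, coef)            # goes through the accessor function coef()

by_accessor
```

The accessor form keeps working even if the internal layout of the model object differs, as long as the model class provides a coef method.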
pluck has some nice usability features:
pluck(1) |> pluck("a") is the same as pluck(1, "a")
Set a default to return instead of NULL: pluck("a", .default=NA)
Use chuck if you want an error instead of a default
Given a list, we can ‘combine’ elements with the reduce function:
Useful for combining many data sets in a ‘mega-join’
Use accumulate to keep intermediate results (a la cumsum)
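A sketch of reduce and accumulate, using made-up tables that share an id column; base merge() stands in for whichever join function you prefer:

```r
library(purrr)

total   <- reduce(1:5, `+`)       # ((((1 + 2) + 3) + 4) + 5)
running <- accumulate(1:5, `+`)   # keeps each intermediate sum, like cumsum(1:5)

# Hypothetical small tables sharing an "id" key
tables <- list(
  data.frame(id = 1:3, a = c(10, 20, 30)),
  data.frame(id = 1:3, b = c("x", "y", "z")),
  data.frame(id = 1:3, c = c(TRUE, FALSE, TRUE))
)

# 'Mega-join': merge the first two tables, then merge the result with the third
joined <- reduce(tables, \(left, right) merge(left, right, by = "id"))
joined
```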
Not everything fits within purrr tooling
But a lot does!
Use it when helpful:
map