Today we welcome Prof. Ann Brandwein to our course.
Advisor for MS Stat and MS QMM. If you don’t already know Prof. B, you should!
Thank you for your hard work on MP#01 and MP#02!
These are the ‘bigger’ projects
I’m glad to see y’all having fun with these
“People may be getting burned alive on our subways, but at least we’re stopping our riders from burning all that carbon,” said Don Vitiatus, CEO of the MTA, following Jay-Z’s heartfelt rendition of Empire State of Mind.
Peer feedback meta-grades from MP#01 released. Currently reviewing MP#02
Some general comments:
Now Online
Due 2025-04-23 at 11:45pm ET
GitHub post (used for peer feedback) AND Brightspace
Three Weeks: don’t wait until the very end
Should be much less demanding than MP#01 and MP#02
Pay attention to the rubric
Proposal feedback a few weeks back - good offline follow up - come to OH to discuss
Next Week: Mid-Term Check-In Presentations
Sharing private comments to one group:
[On spatial subdivisions] It’s hard to say what level you should work at, but the general rule is as small as possible. Students often think high resolution (lots of small regions) is harder, but it’s actually much easier. You get more data (there are more ZCTAs than boroughs) and there is more homogeneity within each unit, so it’s easier to identify effects.
Big data is hard for computers but easy for analysis. Small data is what makes doing statistics hard.
Sharing private comments to one group:
Data Quality: It is useful to distinguish two things here:
- Is the data representative and useful? Is the survey designed to actually answer the question you want based on the relevant population? Is the sampling actually scientific and representative, or will it have its own biases? Meta question: Does this data actually do what I need it to do?
- Is the data recorded well? Are there tons of missing data? Are there outliers you need to handle? Etc. Meta question: Does this data actually do what it claims to do?
Sharing private comments to one group:
As you read prior literature, you should be asking yourself “what are we adding?” If you find someone who has done exactly what you have done, why are you wasting your time? The novelty of your work can be temporal (redoing an old analysis on new post-Covid data), spatial (recreating a Chicago study in NYC), data-source (using new data to confirm a prior finding) or methodological (using new statistical and visualization techniques to study an old problem), but fundamentally you need to be able to answer “Why would someone hire me to do this? Why is this worth my time to do it and my audience’s time to hear about it?” (These are not the only options for novelty, just some axes students have used in the past.)
Sharing private comments to one group:
The activities of this class are programming related - but the point of the class is to give you the analytical tools to achieve your goals. These are mainly code things, but analytical tools also encompass modes of thought and critical thinking. (That’s why I try so hard to ‘model’ good analysis in the mini-projects.) You aren’t required to make the step of moving beyond pure descriptive (correlation) analysis to causal claims, but if you go for it, I want you to do it in the very best way possible.
Brightspace - Wednesdays at 11:45
No Pre-Assignment for Next Week (Presentations)
Thank you for FAQs and (honest) team feedback. Keep it coming!
‘Plain text’ files:
Read into `R` with `readr` functions (e.g., `read_csv`)
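As a minimal sketch of what a `readr` call looks like, the block below parses a small inline CSV. The data and the `csv_text` / `people` names are made up for illustration; `I()` marks the string as literal data rather than a file path (this assumes readr 2.0 or later).

```r
library(readr)

# Hypothetical inline data standing in for a plain-text file
csv_text <- "name,age,occupation
Mario,32,Plumber
Peach,21,Princess
Bowser,,Koopa"

# I() tells readr the string is the data itself, not a path or URL
people <- read_csv(I(csv_text), show_col_types = FALSE)

# Empty fields are read as NA; column types are guessed from the data
people
```

The same `read_csv()` call also accepts a file path or a URL in place of `I(csv_text)`.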
From FiveThirtyEight
Data can be found at https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/candy-power-ranking/candy-data.csv
Read into R (`readr::read_csv`) and make 3 plots:
Room | Team | Room | Team
---|---|---|---
1 | Team Mystic + B | 5 | Money Team + CWo.
2 | Subway Metrics | 6 | Lit Group
3 | Noise Busters | 7 | Cinephiles + VG
4 | AI Imp. Coll. | |
Two topics:
From abstrax.io
`JSON`: JavaScript Object Notation
Structured more like nested dictionaries (`dict`s of `dict`s of `dict`s) than R `data.frame`s
Example:
{
  "data": {
    "id": 27992,
    "title": "A Sunday on La Grande Jatte — 1884",
    "image_id": "1adf2696-8489-499b-cad2-821d7fde4b33"
  },
  "config": {
    "iiif_url": "https://www.artic.edu/iiif/2"
  }
}
Read JSON in `R` with the `jsonlite` package (alternatives exist)
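The code behind the printed results that follow is not preserved here, so this is only a sketch of `jsonlite::fromJSON()` calls that give output of the same shape. The JSON strings and the `people` / `art` names are assumptions for illustration.

```r
library(jsonlite)

# A JSON array becomes an R vector; JSON null becomes NA
fromJSON('["Mario", "Peach", null, "Bowser"]')

# An array of objects becomes a data.frame; missing fields become NA
json <- '[
  {"Name": "Mario", "Age": 32, "Occupation": "Plumber"},
  {"Name": "Peach", "Age": 21, "Occupation": "Princess"},
  {},
  {"Name": "Bowser", "Occupation": "Koopa"}
]'
people <- fromJSON(json)
people

# Nested objects become nested lists, navigated with $
art <- fromJSON('{"data": {"id": 27992},
                  "config": {"iiif_url": "https://www.artic.edu/iiif/2"}}')
art$config$iiif_url
```

`fromJSON()` also accepts a URL, in which case it downloads and parses the response in one step.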
[1] "Mario" "Peach" NA "Bowser"
Name Age Occupation
1 Mario 32 Plumber
2 Peach 21 Princess
3 <NA> NA <NA>
4 Bowser NA Koopa
$type
[1] "programming"
$setup
[1] "Why did the programmer always mix up Halloween and Christmas?"
$punchline
[1] "Because Oct 31 equals Dec 25."
$id
[1] 418
Compare to browser access
download.file
function (url, destfile, method, quiet = FALSE, mode = "w", cacheOK = TRUE,
extra = getOption("download.file.extra"), headers = NULL,
...)
NULL
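A minimal sketch of a `download.file()` call using the arguments in the signature above. So that it runs offline, the source is a temporary local file exposed through a `file://` URL (this assumes a Unix-style absolute path); in practice `url` would be an `https://` address. The names `src`, `src_url`, and `dest` are illustrative.

```r
# Stand-in source: a small local file reached through a file:// URL
src <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Mario,32", "Peach,21"), src)
src_url <- paste0("file://", src)

# Where on this computer to store the downloaded copy
dest <- tempfile(fileext = ".csv")

# download.file() returns 0 (invisibly) on success
status <- download.file(url = src_url, destfile = dest, quiet = TRUE)

# The downloaded copy has the same contents as the source
readLines(dest)
```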
Basic file download capabilities:
- `url`: source
- `destfile`: where on your computer to store it
- `method`: what software to use in the background to download
HTTP
- `ftp`, `smtp`, `ssh`, …
- “Low-level” mechanism of internet transfer
- `R` packages add a friendly UX
- `httr2` for low-level work (today)
HTTP has two stages:
Modern (easy) APIs put most of the behavior in the URL
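To illustrate behavior living in the URL, here is a sketch that assembles a query string with `httr2`. The endpoint `https://api.example.com/search` and the `q` / `limit` parameters are hypothetical; nothing is sent over the network.

```r
library(httr2)

# Hypothetical endpoint, used only to show how the URL is assembled
req <- request("https://api.example.com/search") |>
  req_url_query(q = "candy", limit = 5)

# The parameters are appended to the URL as ?key=value pairs
req$url
```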
In Firefox: Right-Click + Inspect
In Chrome: Right-Click + Developer Tools
httr2
`httr2` (pronounced “hitter-2”) provides low-level manipulation of HTTP.
Pretty simple so far:
- `example_url()` starts a tiny local web host
- `127.0.0.1` is localhost
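The code for this step is not preserved here; a minimal sketch of creating a request object against the local test host might look like the following. It assumes the `webfakes` package is installed, which `example_url()` relies on; `base_url` and `req` are illustrative names.

```r
library(httr2)

# example_url() starts httr2's local test server and returns its address
base_url <- example_url()
base_url

# request() only describes the request; nothing is sent yet
req <- request(base_url)
req
```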
`httr2` Requests
Build a request:
- `request`
- `req_method`
- `req_body_*`
- `req_cookies_set`
- `req_auth_basic` / `req_oauth`
`httr2` Requests
Behaviors:
- `req_cache`
- `req_timeout`
Execution:
- `req_perform`
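The build, behavior, and execution steps can be chained as in this sketch. It targets the local test server from `example_url()` (which needs the `webfakes` package) and its `/json` endpoint; the 10-second timeout is an arbitrary choice.

```r
library(httr2)

# Build: start from a base URL, choose a path and an HTTP method
req <- request(example_url()) |>
  req_url_path("/json") |>
  req_method("GET") |>
  # Behavior: give up if the server takes longer than 10 seconds
  req_timeout(10)

# Execution: nothing is sent until req_perform() is called
resp <- req_perform(req)
resp
```

Keeping construction separate from execution makes it possible to inspect or modify `req` before any traffic is sent.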
`httr2` Responses
Request status:
- `resp_status` / `resp_status_desc`
Content:
- `resp_header*`
- `resp_body_*`
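A short sketch of inspecting a response with these accessors, again against the local test server from `example_url()` (assumes `webfakes` is installed). The `/json` endpoint and the `body` name are illustrative choices.

```r
library(httr2)

resp <- request(example_url()) |>
  req_url_path("/json") |>
  req_perform()

# Request status: numeric code and its text description
resp_status(resp)
resp_status_desc(resp)

# Content: a single header, then the body parsed from JSON into an R list
resp_header(resp, "Content-Type")
body <- resp_body_json(resp)
str(body, max.level = 1)
```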
Demo: Using `httr2` to get a random joke from
See Lab #09