Software Tools for Data Analysis
STA 9750
Michael Weylandt
Week 9 – Thursday 2026-04-16
Last Updated: 2026-04-15

STA 9750 Week 9

Today: Lecture #07: Data Import and File System Usage; APIs

These slides can be found online at:

https://michael-weylandt.com/STA9750/slides/slides09.html

In-class activities can be found at:

https://michael-weylandt.com/STA9750/labs/lab07.html

Upcoming TODO

Upcoming student responsibilities:

Date        Time        Details
2026-04-21  6:00pm ET   Pre-Assignment #10 Due
2026-04-23  6:00pm ET   Pre-Assignment #11 Due
2026-04-24  11:59pm ET  Mini-Project #03 Due
2026-04-30  6:00pm ET   Pre-Assignment #12 Due
2026-05-03  11:59pm ET  Mini-Project Peer Feedback #03 Due
2026-05-07  6:00pm ET   Final Project Presentation Slides Due
2026-05-14  6:00pm ET   Pre-Assignment #14 Due

STA 9750 Week 9

Today: Lecture #07: Data Import and File System Usage; APIs

  • Communicating Results (quarto) ✅
  • R Basics ✅
  • Data Manipulation in R
  • Data Visualization in R ⬅️
  • Getting Data into R ⬅️
    • Files and APIs ⬅️
    • Web Scraping
    • Cleaning and Processing Text
  • Statistical Modeling in R

STA 9750 Week 9

Today: Lecture #07: Data Import and File System Usage; APIs

  • Course Administration
  • Warm-Up Exercises
  • Using the File System
  • How The Web Actually Works
  • JSON and APIs
  • Wrap-Up

Today

  • Course Administration
  • Warm-Up Exercise
  • New Material
    • Files and the File System
    • HTTP and Web Access
    • API Usage
  • Wrap-Up
    • Life Tip of the Day

Course Administration

Mini-Project #03

MP#03 - Who Goes There? US Internal Migration and Implications for Congressional Reapportionment

Due 2026-04-24 at 11:59pm ET

Topics covered:

  • Data Import
    • Using Published Packages (tidycensus)
    • Downloading and Parsing Static Files
    • Using APIs
  • Data Manipulation

Course Support

  • Synchronous: MW Office Hours 2x / week:
    • Wednesdays 5pm in-person
    • Thursdays 5pm on Zoom
  • Asynchronous: Piazza

Future Mini-Projects

  • MP#04: Going for the Gold
    • Deadline: 2026-05-15 at 11:59pm ET
    • Web Parsing (Olympedia) to predict the 2028 Olympics

Already posted - subject to minor revisions before 2026-04-23

Course Project

The Course Project should be your main focus for the rest of the course

  • You have received a second round of feedback from me
  • Starting with the tools of today, you can import data into R for EDA
    • Next two weeks will dive deeper into data cleaning and preparation

But you still need to do other mini-projects and pre-assignments(!)

Review Exercise

Flat / Plain Text Files

‘Plain text’ files:

  • Simple human readable and human writeable file formats
  • Not specific to one piece of software
  • Examples: .csv, .txt, .tsv
  • Anti-Examples: .docx, .pdf, .jpg
  • The line between the two isn’t always 100% clear

Import into R with read* functions (e.g., readr::read_csv )
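
As a minimal sketch of what a read* call does, the example below parses a tiny CSV supplied as inline text. The column names and values are made up for illustration; in practice you would pass a file path or URL instead.

```r
library(readr)

# Hypothetical CSV content, supplied inline so no file on disk is needed
csv_text <- "state,name,n\nNY,Michael,100\nNY,Mary,80\n"

# I() tells readr to treat the string as literal data rather than a file path
babies <- read_csv(I(csv_text), show_col_types = FALSE)
babies
```

The result is a tibble with one row per line of the file and one column per comma-separated field.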

SSA Baby Names

From the US Social Security Administration Baby Names records:

  • State, Sex, Name, Number Born

Read into R (readr::read_csv) and make plots to answer various questions:

  1. How popular is the name “Michael” in NY?
  2. Have Marys become less common over the past decade?
  3. Did Juan and Jose become popular names at the same time?

Breakout Rooms

Breakout Room  Team
1              3-1-Fun! (XC+ML+ER+RJSN)
2              Maniac Braniacs (HHS+KK+FC+DN)
3              Emissions Impossible (LR+MOG+APTL)
4              Water Benders (JE+JABB+MTP+JA+AS)
5              Inspector Gadget (MUO+KN+CM+ID+KM)

Review activities from today’s lab

Files and the File System

The file system is the way your computer organizes and provides access to files:

  • Tree-like structure:
    • Files in folders in folders in folders …
      • Separated by /
      • e.g. STA9750-2026-SPRING/data/mp01/data_file.csv
    • End point (or starting point) is the root:
      • Called / on Mac/Linux
      • Drive name on Windows (C:/)

Home Directory

Typically, all user files are stored in a “home directory”:

  • /Users/YOURNAME on Mac (/home/YOURNAME on Linux)
  • C:/Users/YOURNAME on Windows
  • Subfolders include Downloads, Desktop, Documents, etc
  • Commonly abbreviated as ~
    • My desktop is ~/Desktop
    • My general course material is in ~/STA9750
    • Semester specific material is in ~/STA9750-2026-SPRING
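
To see where ~ points on your own machine, a short sketch using the fs package (assuming it is installed) is shown here. The output differs by operating system and user name, so none is shown.

```r
library(fs)

# Location of the home directory on this machine
path_home()

# path_expand() replaces a leading ~ with the home directory
path_expand("~/Desktop")

# path_home() can also build a path below the home directory directly
path_home("Desktop")
```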

Paths

Two ways to specify a file:

  • Absolute path:
    • Starts from root and gives “full name”
    • /Users/michaelweylandt/STA9750/docs/index.html
    • Analogy: GPS coordinates
  • Relative path:
    • Starts from working directory (getwd()) and gives directions
    • If I am in STA9750, path is just docs/index.html
    • ./ means “this directory”: could also write ./docs/index.html
    • ../ means “up one level”
      • If I am in STA9750/docs, the source file is at ../index.qmd
    • Analogy: driving directions
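
The two kinds of path can be converted into one another. The sketch below uses fs with a hypothetical project root (/Users/YOURNAME/STA9750); the folder does not need to exist, since these functions only manipulate the path text.

```r
library(fs)

# Hypothetical project root, used only for path arithmetic
root <- path("/Users", "YOURNAME", "STA9750")

# Relative -> absolute: resolve docs/index.html against the project root
abs_path <- path_abs("docs/index.html", start = root)
abs_path

# Absolute -> relative: directions from the project root back to the file
path_rel(abs_path, start = root)

# ../ climbs one level: starting in STA9750/docs, look for ../index.qmd
path_abs("../index.qmd", start = path(root, "docs"))
```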

Using the File System in R

Use the fs package to interact with the file system:

  • dir_ls(), dir_create(), dir_exists(), dir_delete()
  • path(), path_home(), path_abs(), path_rel()
  • file_create(), file_exists(), file_delete(), file_info()
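
A small sketch of these functions working together is given here. It runs in a throwaway temporary directory, and the file names are made up for illustration, so nothing in your real folders is touched.

```r
library(fs)

# Work inside a temporary directory
demo_dir <- dir_create(file_temp("fs-demo-"))

# Create two files of different sizes (hypothetical names)
writeLines("x", path(demo_dir, "small.txt"))
writeLines(strrep("x", 1000), path(demo_dir, "large.txt"))

# List the directory, pull metadata, and sort so the largest file comes first
files <- dir_ls(demo_dir)
info <- file_info(files)
info <- info[order(info$size, decreasing = TRUE),
             c("path", "size", "modification_time")]
info

# Clean up the temporary directory
dir_delete(demo_dir)
```

The same pattern (dir_ls, then file_info, then sort by size) applies to any real directory once you swap in its path.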

Activity #01

Return to breakout rooms to practice file system usage:

  • Convert relative paths to absolute paths
  • List files in a directory
  • Examine metadata of the largest file(s) in your STA9750-2026-SPRING directory

Accessing Data from the Web: URLs, HTTP, JSON

URLs

URLs are an extension of file paths for the internet:

  • Protocol / Scheme: How data should be transferred
    • HTTP(S), SMS (Texting), POP3/IMAP (Email)
  • Domain: Name of the other computer
    • In practice, often a ‘placeholder’ for something more complex
  • Path: Files on the other computer
    • In practice, hides functionality behind file-like paths
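
One way to see these pieces is to split a URL programmatically. This sketch uses url_parse() from httr2 (assuming the package is installed) on the address of these slides.

```r
library(httr2)

# Split the URL of these slides into its components
parts <- url_parse("https://michael-weylandt.com/STA9750/slides/slides09.html")

parts$scheme    # protocol / scheme
parts$hostname  # domain
parts$path      # path on the remote machine
```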

URLs

[Figure: URL diagram, from geeksforgeeks.org]

Static Data Transfer

R’s basic download.file can be used for downloading simple files:

args(download.file)
function (url, destfile, method, quiet = FALSE, mode = "w", cacheOK = TRUE, 
    extra = getOption("download.file.extra"), headers = NULL, 
    ...) 
NULL

Basic file download capabilities:

  • url: source
  • destfile: where on your computer to store it

Customizable behavior, but defaults often work well:

  • method: what software to use in the background to download
  • mode: is this a text or binary file
  • cacheOK: are you ok with a cached version of the file
  • headers: do you need to send any additional info in your request

download.file()

download.file("https://raw.githubusercontent.com/michaelweylandt/STA9750/refs/heads/main/births.csv", 
              destfile="births.csv")

Note the relative path here: the file is saved in the current working directory

Be polite:

  • Try to avoid unnecessarily downloading files
  • Save file and only download if !file_exists(destfile)

Data Transfer - download.file()

library(fs)     # file_exists()
library(readr)  # read_csv()

if(!file_exists("births.csv")){
    download.file("https://raw.githubusercontent.com/michaelweylandt/STA9750/refs/heads/main/births.csv", 
                  destfile="births.csv")
}

read_csv("births.csv")

JSON

JSON:

  • Short for JavaScript Object Notation
  • Popular plain-text representation for hierarchical data.
  • Closer to Python objects (dicts of dicts of dicts) than R data.frames
  • Widely used for web-based data transfer

JSON

Example:

{
    "data": {
        "id": 27992,
        "title": "A Sunday on La Grande Jatte — 1884",
        "image_id": "1adf2696-8489-499b-cad2-821d7fde4b33"
    },
    "config": {
        "iiif_url": "https://www.artic.edu/iiif/2"
    }
}
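
To illustrate how this nesting looks from R, the sketch below parses a trimmed copy of the example (title omitted) with jsonlite. Strict JSON does not allow a comma after the last item in an object, so the copy is written without one.

```r
library(jsonlite)

# Trimmed copy of the example, written as strict JSON
json <- '{
  "data": {
    "id": 27992,
    "image_id": "1adf2696-8489-499b-cad2-821d7fde4b33"
  },
  "config": {
    "iiif_url": "https://www.artic.edu/iiif/2"
  }
}'

# A JSON object becomes a named list; nested objects become nested lists
art <- fromJSON(json)

art$data$id
art$config$iiif_url
names(art$data)
```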

JSON

Read JSON in R with jsonlite package (alternatives exist)

library(jsonlite)
# A JSON array of primitives
json <- '["Mario", "Peach", null, "Bowser"]'

# Simplifies into an atomic vector
fromJSON(json)
[1] "Mario"  "Peach"  NA       "Bowser"

JSON

json <-
'[
  {"Name" : "Mario", "Age" : 32, "Occupation" : "Plumber"}, 
  {"Name" : "Peach", "Age" : 21, "Occupation" : "Princess"},
  {},
  {"Name" : "Bowser", "Occupation" : "Koopa"}
]'
mydf <- fromJSON(json)
mydf
    Name Age Occupation
1  Mario  32    Plumber
2  Peach  21   Princess
3   <NA>  NA       <NA>
4 Bowser  NA      Koopa
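
The data.frame above is a convenience: jsonlite simplifies the records for you. As a sketch of what the unsimplified structure looks like, the same JSON can be read with simplifyVector = FALSE, which keeps it as a list of named lists.

```r
library(jsonlite)

json <-
'[
  {"Name" : "Mario", "Age" : 32, "Occupation" : "Plumber"},
  {"Name" : "Peach", "Age" : 21, "Occupation" : "Princess"},
  {},
  {"Name" : "Bowser", "Occupation" : "Koopa"}
]'

# Default: records are simplified into a data.frame, one row per object
as_df <- fromJSON(json)

# simplifyVector = FALSE keeps the raw nesting: a list of named lists
as_list <- fromJSON(json, simplifyVector = FALSE)

as_list[[1]]$Name     # a field of the first record
names(as_list[[4]])   # the fourth record has no "Age" field at all
length(as_list[[3]])  # the empty object has no fields

# toJSON() goes the other way, from an R object back to JSON text
toJSON(as_df, pretty = TRUE)
```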

JSON - A Web Standard

fromJSON("https://official-joke-api.appspot.com/random_joke")
$type
[1] "general"

$setup
[1] "Did you hear about the crime in the parking garage?"

$punchline
[1] "It was wrong on so many levels."

$id
[1] 394

Compare to browser access

Data Transfer: HTTP

HTTP

  • HyperText Transfer Protocol
  • Most common (but not only) internet protocol
  • Also ftp, smtp, ssh, …

“Low-level” mechanism of internet transfer

  • Many R packages add a friendly UX
  • httr2 for low-level work (today)

HTTP

HTTP has two stages:

  • Request
    • URL (Host + Path)
    • Method (VERB)
    • Headers
    • Content
    • Cookies
  • Response
    • Status Code
    • Headers
    • Content

Modern (easy) APIs put most of the behavior in the URL

HTTP in the Browser

In Firefox: Right-Click + Inspect

In Chrome: Right-Click + Inspect (opens Developer Tools)

HTTP with httr2

httr2 (pronounced “hitter-2”) provides low-level tools for building and sending HTTP requests.

library(httr2)
request(example_url())
<httr2_request>
GET http://127.0.0.1:61966/
Body: empty

Pretty simple so far:

  • example_url() starts a tiny local web server
  • 127.0.0.1 is localhost

httr2 Requests

Build a request:

  • request
  • req_method
  • req_body_*
  • req_cookies_set
  • req_auth_basic / req_oauth

httr2 Requests

Behaviors:

  • req_cache
  • req_timeout

Execution:

  • req_perform

httr2 Responses

Response status

  • resp_status / resp_status_desc

Content:

  • resp_header*
  • resp_body_*
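
To try the resp_* functions without a network connection, the sketch below builds a stand-in response by hand with httr2's response() helper, which is intended for testing. The body reuses the joke shown earlier in these slides; a real response would come from req_perform().

```r
library(httr2)

# Hand-built stand-in for a server reply, so this runs offline
resp <- response(
  status_code = 200,
  headers = list("Content-Type" = "application/json"),
  body = charToRaw(
    '{"setup": "Did you hear about the crime in the parking garage?",
      "punchline": "It was wrong on so many levels."}'
  )
)

resp_status(resp)                  # numeric status code
resp_status_desc(resp)             # short text description of the status
resp_header(resp, "Content-Type")  # a single header

# Parse the JSON body into a named list
joke <- resp_body_json(resp)
joke$punchline
```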

Live Demo

Demo: Using httr2 to get a random joke from

https://official-joke-api.appspot.com/
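
A rough sketch of what such a demo might look like is given below. The helper name get_joke() is hypothetical, the /random_joke path is the one used with fromJSON() earlier, and the 10-second timeout is an arbitrary choice. Defining the function does not contact the server; calling it does.

```r
library(httr2)

# Hypothetical helper: fetch one random joke and return it as a named list
get_joke <- function() {
  request("https://official-joke-api.appspot.com/random_joke") |>
    req_timeout(10) |>     # give up if the server is slow
    req_perform() |>       # send the request
    resp_body_json()       # parse the JSON reply
}
```

With a network connection, get_joke() should return a list with fields such as setup and punchline, matching the output shown earlier.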

API Usage

Exercise - CRAN Logs API

See Part #03 from Today’s Lab

Wrap-Up

Review

Web Data Access

  • Local Files
  • Static Files & download.file
  • HTTP and JSON
  • API Usage

Orientation

  • Communicating Results (quarto) ✅
  • R Basics ✅
  • Data Manipulation in R
  • Data Visualization in R ⬅️ [NEXT TUESDAY]
  • Getting Data into R ⬅️ [Continued Thursday]
  • Statistical Modeling in R

Reminder: we’re finishing plotting on Tuesday. Video recording to be posted for those who can’t make it.

Life Tip of the Week

Date Formatting

Write your dates as:

YYYY-MM-DD

e.g., Sys.Date()

YYYY-MM-DDTHH:MM:SS for date + time

Default in analytics-world:

  • Unambiguous (DD-MM vs MM-DD)
  • Alphabetical: sort by name => sort by date!
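
The sorting claim is easy to check in base R. In this sketch the three date strings are made up for illustration.

```r
# Sys.Date() prints in YYYY-MM-DD form by default
format(Sys.Date())

# Hypothetical dates, stored as plain text (e.g., taken from file names)
stamps <- c("2026-04-15", "2025-12-31", "2026-01-02")

# Alphabetical order of the strings ...
sort(stamps)

# ... matches chronological order of the dates they represent
format(sort(as.Date(stamps)))

# Date + time in the same style
format(as.POSIXct("2026-04-15 03:30:15", tz = "UTC"), "%Y-%m-%dT%H:%M:%S")
```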

Date Formatting

Additional standards for other units of time:

  • 2026Q1 or 2025H2 for quarters and halves of year
  • 2026W03 for the third week of the year
  • 2026-04-15T03:30:15 for date + time
  • 2026-04-15T03:30:15-05:00 for time zones
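
Labels like these can be built from a Date in base R. The sketch below is one possible approach; the half-year rule (months 1 to 6 are H1) is an assumption about how halves are defined.

```r
d <- as.Date("2026-04-15")

# Quarter: quarters() returns "Q1" through "Q4"
quarter_label <- paste0(format(d, "%Y"), quarters(d))

# Half of year: derived from the month number
half_label <- paste0(format(d, "%Y"), "H",
                     ifelse(as.integer(format(d, "%m")) <= 6, 1, 2))

# ISO 8601 week: %G is the week-based year, %V the week number
week_label <- format(d, "%GW%V")

c(quarter_label, half_label, week_label)
```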

Additional specifications for durations

Musical Treat