STA 9750
Week 11 Update
Tue 2025-11-18
Thu 2025-11-13

Michael Weylandt

Agenda

Today

Administrative Business
Review: Identifying HTML Elements
New Material: Extracting HTML with rvest
Looking Ahead

Administrative Business

STA 9750 Mini-Project #03

First few submissions look great!

For the rest of you, extension until tomorrow midnight

Will assign peer feedback (PF) on Saturday

Extended until next Sunday (May 4th) at 11:45pm

Pay attention to the rubric

STA 9750 Mini-Project #04

MP#04 released today

Due 2025-11-21 at 11:59pm ET (\(\approx\) 3 weeks)
Topic: Political Mapping
- Recreation of NYT “county shift” map
Format:
- Political Hack - Play the role of a TV “Talking Head”
- GitHub post AND Brightspace submission

STA 9750 Mini-Project #04

Assignment Modification

I’m going to make the ‘writing’ parts of this assignment extra credit
Task 6 is not required (Task 4 still is)

If you don’t want to write propaganda, just write Task 4 as straight answers

Going Forward

Pre-Assignments

Brightspace - Wednesdays at 11:45

Reading, typically on course website
Brightspace auto-grades
- I have to manually change to completion grading

Next pre-assignment is 2025-11-24 at 11:59pm ET

I am behind on reading PA comments:

For anything urgent, please contact me directly🙏

Grading

I owe you:

MP#02 peer meta-review fixes
Mid-Term Check-In Feedback

Course Support

Synchronous
- Office Hours 2x / week
  - MW Office Hours on Tuesdays + Thursdays
Asynchronous
- Piazza (\(\approx 20\) minute average response time)

Upcoming

Semester end is coming quickly!

MP#04
Final presentations
Final reports

That’s it!

Observation Comments

Thank you!

Large Files

Several of you have reported git complaining about large files. To list the 10 largest files in your current commit:

git ls-tree -r -t -l --full-name HEAD | sort -n -k 4 | tail -n 10

Stack Overflow on removing large files from history:

git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch data/**' HEAD

⚠️This is dangerous! I can help with it after class.⚠️

Review: Identifying HTML Elements

Pre-Assignment #11 FAQs

FAQ: HTML vs CSS

What is the difference between HTML and CSS?

HTML is substance; CSS is style

The distinction can be a bit blurry, and CSS can live “inside” HTML (in a `<style>` block or a `style` attribute)

Example
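A minimal sketch of that split, assuming a made-up snippet (the `warning` class and the text are hypothetical): CSS decides how the paragraph looks, while rvest's text extraction returns only the substance.

```r
library(rvest)

# Made-up snippet: the <style> block and the style attribute are CSS (style);
# the paragraph text is HTML content (substance).
page <- minimal_html('
  <style>.warning { color: red; }</style>
  <p class="warning" style="font-weight: bold;">Due Friday</p>
')

para <- html_element(page, "p")
html_text2(para)          # the substance: "Due Friday"
html_attr(para, "style")  # inline CSS stored on the element
html_attr(para, "class")  # the hook that the .warning CSS rule targets
```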

FAQ: `a` in SelectorGadget

Why does SelectorGadget display “a” in the selector when selecting a web link?

`a` is for anchor.

Confusingly, anchors are both links and destinations.

Anchors can reference:

Another page (http://URL)
A particular part of another page (http://URL#place)
A particular part of the same page (#place)

Quarto supports cross-linking with anchors
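Here is how the three kinds of anchors look from rvest's side, sketched on a made-up page (the example.com URLs and the `schedule` id are hypothetical). The link target lives in the `href` attribute; the destination of `#schedule` is whichever element carries `id="schedule"`.

```r
library(rvest)

# Made-up page with one anchor of each kind from the slide.
page <- minimal_html('
  <a href="https://example.com/">Another page</a>
  <a href="https://example.com/notes#rvest">Part of another page</a>
  <a href="#schedule">Part of this page</a>
  <h2 id="schedule">Schedule</h2>
')

links     <- html_elements(page, "a")
link_text <- html_text2(links)         # what the reader sees
link_href <- html_attr(links, "href")  # where each link points

# Same-page links start with "#"; the others leave the page.
same_page <- startsWith(link_href, "#")
```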

FAQ: SelectorGadget - Multiple Clicks

Why does SelectorGadget go “unique” when I click multiple elements of interest?

It can’t find a structure common to all the elements you clicked:

Typically a problem within lists or common element types

FAQ: Selectors - Nesting

How do I avoid unwanted elements, such as headers or sidebars, and focus only on the main content I need?

Nest your selectors!

`thing1 thing2` selects only `thing2` elements nested inside a `thing1`

Star Wars page

Try `main h2`
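The effect of nesting is easy to check in rvest. A sketch on a made-up layout (not the actual Star Wars page): a sidebar and the main content both use `<h2>`, and only the nested selector skips the sidebar.

```r
library(rvest)

# Made-up layout: navigation and main content both contain <h2> headings.
page <- minimal_html('
  <nav><h2>Site menu</h2></nav>
  <main>
    <h2>Episode IV</h2>
    <h2>Episode V</h2>
  </main>
')

all_h2  <- html_elements(page, "h2") |> html_text2()       # all three headings
main_h2 <- html_elements(page, "main h2") |> html_text2()  # only those inside <main>
```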

FAQ: Relationship to Markdown

Is HTML similar to Markdown?

Markdown is an easier way to write (a subset of) HTML

The name is a bad joke: Markup (the M in HTML) vs. Markdown

HTML can (theoretically) do more, but it is painful to write by hand

FAQ: Messy HTML

How can we target data with CSS Selectors in messy HTML?

Pain and suffering; it depends on how messy the HTML is.

Worst case: a little bit of HTML selection + text processing (next week)
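A sketch of that worst case, on a made-up snippet where the number we want has no tag of its own: select the closest element with CSS, then fall back to text processing (base-R regular expressions here; the `info` class and the text are hypothetical).

```r
library(rvest)

# Made-up messy markup: the capacity figure is buried in free text.
page <- minimal_html('
  <div class="info">Opened 1961 | Capacity: 1,250 seats</div>
')

# Step 1: CSS selection gets as close as the markup allows.
raw <- html_element(page, ".info") |> html_text2()

# Step 2: text processing pulls out the piece we need.
capacity_txt <- sub(".*Capacity: ([0-9,]+).*", "\\1", raw)
capacity     <- as.numeric(gsub(",", "", capacity_txt, fixed = TRUE))
```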

HTML Review

HTML Structure
CSS Selectors (SelectorGadget)
Introduction to rvest

Pre-Assignment Exercises

Star Wars

main h2

CUNY Table

table or tbody

Baruch GPS

.geo
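For the table exercise, `html_table()` converts a selected `table` straight into a data frame. A sketch on a made-up two-row stand-in for the CUNY table (the real page has more rows and columns). Browsers insert `<tbody>` into the DOM that SelectorGadget inspects even when the page source omits it, so `tbody` may not exist in the raw HTML that rvest reads; that makes `table` the safer of the two answers.

```r
library(rvest)

# Made-up stand-in for the CUNY campus table.
page <- minimal_html('
  <table>
    <tr><th>College</th><th>Borough</th></tr>
    <tr><td><a href="/wiki/Baruch_College">Baruch College</a></td><td>Manhattan</td></tr>
    <tr><td><a href="/wiki/Brooklyn_College">Brooklyn College</a></td><td>Brooklyn</td></tr>
  </table>
')

# html_table() uses the <th> row as column names and returns a tibble.
campuses <- html_element(page, "table") |> html_table()
```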

New Material: `rvest`

Live Demo: `rvest`

Exercise 1: CUNY Map

Recall Lab 1. Goal: extend the map to all CUNY campuses

Steps:

Read the CUNY table and extract links
Follow links and pull coordinates
- To read geo class, use this:

Adapt the Lab 1 leaflet map to show all locations
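The first two steps can be sketched as below, run on made-up snippets so nothing is downloaded. Assumptions: the table links are relative paths resolved against `https://en.wikipedia.org/`, and each college page has a `.geo` element whose text looks like `40.75; -73.98` (latitude; longitude). `get_coords()` is a hypothetical helper, not part of rvest, and the coordinates are made up.

```r
library(rvest)

# Step 1: pull each row's link out of the table and make it absolute.
# Assumption: links are relative Wikipedia paths.
table_page <- minimal_html('
  <table>
    <tr><th>College</th></tr>
    <tr><td><a href="/wiki/Baruch_College">Baruch College</a></td></tr>
    <tr><td><a href="/wiki/Brooklyn_College">Brooklyn College</a></td></tr>
  </table>
')
hrefs <- html_elements(table_page, "td a") |> html_attr("href")
urls  <- xml2::url_absolute(hrefs, "https://en.wikipedia.org/")

# Step 2: hypothetical helper that parses one page's `.geo` element.
# Assumption: the text has the form "lat; lng".
get_coords <- function(page) {
  geo   <- html_element(page, ".geo") |> html_text2()
  parts <- as.numeric(trimws(strsplit(geo, ";", fixed = TRUE)[[1]]))
  data.frame(lat = parts[1], lng = parts[2])
}

# In the exercise each page would come from read_html(url);
# a made-up snippet stands in here.
college_page <- minimal_html('<span class="geo">40.75; -73.98</span>')
coords <- get_coords(college_page)
```

For all campuses, something like `do.call(rbind, lapply(urls, function(u) get_coords(read_html(u))))` collects one row per college (a short `Sys.sleep()` between requests is polite), and the `lat`/`lng` columns can then be passed to `leaflet::addMarkers(lng = ..., lat = ...)` in your Lab 1 map.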

Breakout Rooms

Room	Team	Room	Team
1	Team Mystic	5	Money Team + CWo.
2	Subway Metrics	6	Lit Group
3	Noise Busters	7	Cinephiles + VG
4	AI Impact Col	8

Exercise 2: Cocktails

Goal: create a cocktail data frame from Hadley’s Recipes

Today:

How to find them all?
How to extract individual recipes?
How to pull items from each recipe?
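A sketch of those three questions on made-up markup. The real pages on Hadley's site are structured differently, so the selectors (`li a`, `h1`, `.ingredients li`) and file names below are hypothetical placeholders to adapt after inspecting the site with SelectorGadget.

```r
library(rvest)

# 1. Find them all: collect recipe links from a (made-up) index page.
index <- minimal_html('
  <ul>
    <li><a href="recipes/negroni.html">Negroni</a></li>
    <li><a href="recipes/daiquiri.html">Daiquiri</a></li>
  </ul>
')
recipe_links <- html_elements(index, "li a") |> html_attr("href")

# 2. Extract one recipe: in practice, read_html() on each link.
recipe <- minimal_html('
  <h1>Negroni</h1>
  <ul class="ingredients">
    <li>1 oz gin</li>
    <li>1 oz Campari</li>
    <li>1 oz sweet vermouth</li>
  </ul>
')

# 3. Pull items: one row per ingredient, with the name recycled.
# Turning "1 oz" into numbers is left for next time.
ingredients <- data.frame(
  name       = html_element(recipe, "h1") |> html_text2(),
  ingredient = html_elements(recipe, ".ingredients li") |> html_text2()
)
```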

Next time:

How to convert text to numeric values + column info
Data wrangling

Looking Ahead

Upcoming Mini-Projects

MP#04: TBD

Seeking suggestions for next semester

Course Feedback Survey

Upcoming

Next Week (After Spring Break):

MP#03 Peer Feedback
Pre Assignment

Longer Term:

MP#04
Final Presentations

Life Tip of the Week

End of the Semester is Upcoming

End of the semester is rough
- More important than ever to plan ahead
- Ask for extensions / accommodations early
- Help us help you
- Faculty are slammed as well
Don’t ‘grade grub’
- Makes it harder for professors to curve in your favor
- Look for extra-credit opportunities in the syllabus
- Don’t ask for special treatment - ask how to take advantage of existing opportunities
Take care of yourselves
- Nasty bugs going around …
