STA 9750
Week 8 Update
Tue 2025-10-28
Thu 2025-10-23

Michael Weylandt

STA 9750 Mini-Project #02

Submission due yesterday at 11:45pm

Very creative!

STA 9750 MP#02 - Peer Feedback

Peer feedback assigned on GitHub + email this morning

\(\approx 4\) reviews each
Take this seriously: around 20% of this assignment is “meta-review”
Goal: rigorous constructive critique

Submissions may not map perfectly to rubric - use your best judgement

Learn from this! What can you adapt for MP#03?

STA 9750 Mini-Project #03

Now online - Due 2025-11-07 at 11:59pm ET

Topic: TBD

GitHub post (used for peer feedback) AND Brightspace
Three Weeks: don’t wait until the very end
Should be less demanding than MP #01 and MP#02
- Lots of little files: practice data management
- Some data limitations, particularly after inner_join

Pay attention to the rubric

Thank You!

Thank you!

A personal note, if you allow me:

I’m really enjoying this class - thank you all!

Your effort has not gone unnoticed - I know this class starts “pedal-to-the-metal” but hopefully you’ve seen just how powerful these tools are.

More than that - I appreciate your good attitude and willingness to share your frustrations and triumphs. Reading the comments on this week’s pre-assignment quiz was uplifting.

Continual Improvement

I’ve set up a TODO file with everything I want to improve for next cohort.

Suggestions welcome.

Every semester, I create new mini-projects. Ideas and suggestions very welcome

Suggestions for topics and for data sets are both great

Going Forward

Upcoming Mini-Projects

MP#04:
- Deadline: 2025-11-21 at 11:59pm ET
- Topic: TBD

Course Project

Project should be your main focus for rest of course

But you still need to do mini-projects and pre-assignments

Pre-Assignments

Brightspace - Wednesdays at 11:45pm

Reading, typically on course website
This week: getting data into R

Next pre-assignment is 2025-10-29 at 11:59pm ET

Thank you for FAQs and (honest) team feedback. Keep it coming!

Course Support

Synchronous
- Office Hours 2x / week
  - MW Office Hours on Monday + Thursday for rest of semester
  - No OH during Spring Break
Asynchronous
- Piazza (\(<25\) minute average response time)

Today

Warm-Up

“Datasaurus Dozen”:

install.packages("datasauRus") (Note capital R)
library(datasauRus); data(datasaurus_dozen)

Create an animated (gganimate) plot:

\(x, y\) scatterplot
Animate different values of dataset

If you are having trouble with gganimate, facet instead.

Warm-Up
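
One possible approach to the warm-up, as a minimal sketch rather than the official solution. It assumes the gganimate package is installed alongside datasauRus; the object names `p_anim` and `p_facet` are just illustrative.

```r
library(ggplot2)
library(gganimate)
library(datasauRus)

# datasaurus_dozen has three columns: dataset, x, y
p_anim <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  # move between the values of `dataset`, pausing briefly on each one
  transition_states(dataset, transition_length = 2, state_length = 1) +
  labs(title = "Dataset: {closest_state}")

# Printing p_anim (or calling animate(p_anim)) renders the animation;
# rendering needs a renderer backend such as the gifski package.

# Fallback if gganimate gives you trouble: one panel per dataset
p_facet <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point(size = 0.5) +
  facet_wrap(~dataset)
```

Either version makes the same point: the data sets look completely different even though their summary statistics are nearly identical.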

Diving Deeper into `ggplot2`

First topic: maps!

Install the sf package: Simple Features for Spatial Data
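
A first map, as a minimal sketch: it assumes sf is already installed and uses the North Carolina counties shapefile that ships with the package, so no download is needed. The choice of the `BIR74` column for the fill is only for illustration.

```r
library(ggplot2)
library(sf)

# Read the example shapefile bundled with sf (100 NC counties)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# geom_sf() finds the geometry column on its own; we only map the fill
p_map <- ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +
  scale_fill_viridis_c(name = "BIR74") +
  theme_minimal()

p_map
```

Note that an sf object is still a data frame, so the usual dplyr verbs keep working before you plot.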

Exercise: Lab #08

Breakout Rooms

Room	Team	Room	Team
1	Team Mystic	5	Money Team + CWo.
2	Subway Metrics	6	Lit Group
3	Noise Busters	7	Cinephiles + VG
4	AI Imp. Coll.

Topic Interactive Tools for Data Analysis

Looking Ahead

Due Wednesday at 11:45pm:

Pre-Assignment #09 (Brightspace)
- Data Import
MP #02 Peer Feedback on GitHub AND Brightspace

Next three weeks:

Reading ‘clean data’ into R
Reading and parsing HTML
Parsing messy (text) data

Teaching Observation by Prof. Brandwein - Next Week

Life Tip of the Week

Get Inspired!

The tools of this course are powerful and flexible

To learn more ways to apply them, check out ‘Galleries’:

STA 9750 Hall of Fame

My current side-project: STA 9750 Hall of Fame

Gallery of excellent STA 9750 submissions

‘Signal-boost’ individual portfolios
Demonstrate the high quality of Baruch students
Inspiration for future semesters
2-3 per mini-project

Trying to launch next week - will share a ‘sign up’ if you’re interested

Just FYI:
Old Pre-Assignment #08 FAQs

FAQ: `ggplot2` - `aes()`

What is the aes function? It stands between the data and the geom_

Each geom_ takes a fixed set of “coordinates”
Each data set has its own column names
aes ties these together
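
A small sketch of the idea, using the built-in mtcars data (the column choices are only for illustration): the left-hand side of each `aes()` argument is what the geom expects, the right-hand side is a column in your data.

```r
library(ggplot2)

# geom_point() understands x, y, colour, ...; mtcars has wt, mpg, cyl, ...
# aes() says which column plays which role
p_aes <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Outside aes() a value is a fixed setting, not a mapping to a column:
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue")
```

In `p_aes` the colour varies with the data and gets a legend; in `p_fixed` every point is simply blue.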

FAQ: `ggplot2` - Why do Pie Charts have a bad reputation?

Use of area and angle over length: less accurate perception
Depends on fill to convey category - limited categories

But honestly - “insider smugness” and hate of Excel

FAQ: `ggplot2` - Plot Type Choice

For me:

Exploratory mode:
- Simple: line, scatter, bar, frequency
Publication mode:
- Very context specific

FAQ: `ggplot2` - Font Sizing

Theme machinery!
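
A minimal sketch of what the theme machinery looks like for font sizes; the specific sizes and the plot itself are arbitrary choices for illustration.

```r
library(ggplot2)

p_font <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy vs. weight") +
  # base_size rescales all text in the plot at once
  theme_bw(base_size = 16) +
  # theme() then overrides individual text elements
  theme(
    axis.title = element_text(size = 18),
    plot.title = element_text(size = 22, face = "bold")
  )
```

Starting with `base_size` and only then adjusting individual elements usually keeps the text sizes in proportion.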

FAQ: `ggplot2` - Overplotting / ScatterBlobs

Student asked about “scatterblobs” - typo(?) but I love it!

Density based plotting: hexbins, histograms, rugplots
Data reduction: summarization or sub-sampling
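
Both strategies can be sketched on the diamonds data that ships with ggplot2 (about 54,000 rows). This uses `geom_bin2d()` because it needs no extra package; `geom_hex()` works the same way but requires the hexbin package. The bin count, sample size, and seed are arbitrary.

```r
library(ggplot2)
set.seed(9750)

# Density-based: count points per rectangular bin instead of drawing each one
p_bins <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d(bins = 50)

# Data reduction: plot a random sub-sample, with transparency to show overlap
small <- diamonds[sample(nrow(diamonds), 2000), ]
p_sample <- ggplot(small, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3)
```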

FAQ: `ggplot2` - Optimizing Performance

Active project of ggplot2 team - not much you can do

Practical advice: plot less (see previous slide)

FAQ: `ggplot2` - Beyond Scatter and Line

Some favorite semi-advanced plot types:

Violin plots: combination of boxplot and density plot (a smoothed histogram)
Ridgelines
Beeswarms
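
Violin plots are built into ggplot2, so here is a minimal sketch using the iris data (the styling choices are arbitrary). Ridgelines and beeswarms live in extension packages such as ggridges and ggbeeswarm, which are not shown here.

```r
library(ggplot2)

# The violin shows the shape of the distribution;
# the narrow boxplot inside adds the median and quartiles
p_violin <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin(fill = "grey90") +
  geom_boxplot(width = 0.1)
```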

Deep rabbit hole

FAQ: `ggplot2` - Geospatial Visualizations

That’s our goal for today!

FAQ: `ggplot2` - High-Dimensional Data

High-dimensional data: measure many variables per observation (“wide”)

High-dimensional data is hard to visualize

Approaches:

Pair plots for “moderate” HDD
PCA or similar dimension reduction (take STA 9890!)
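
Both approaches can be sketched on the four numeric columns of iris. This is an illustration, not a recipe: it uses base R's `pairs()` for the pair plot (the GGally package offers a ggplot2-flavoured `ggpairs()`), and `prcomp()` for PCA.

```r
library(ggplot2)

num <- iris[, 1:4]

# Pair plot: every variable against every other, fine for a handful of columns
pairs(num, col = iris$Species)

# PCA: project the (scaled) data down to its first two principal components
pca <- prcomp(num, center = TRUE, scale. = TRUE)
scores <- data.frame(pca$x[, 1:2], Species = iris$Species)

p_pca <- ggplot(scores, aes(x = PC1, y = PC2, color = Species)) +
  geom_point()
```

The pair plot grows quadratically with the number of variables, which is why it only suits “moderate” dimensions.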

FAQ: `ggplot2` - Creating a Custom Theme

Advanced:

theme_set() - change ggplot2 defaults
.Rprofile - set code to run every time you start R
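
A minimal sketch of how the two pieces fit together. The theme and size are arbitrary, and the `.Rprofile` lines are shown as comments because they belong in that file, not in your script.

```r
library(ggplot2)

# theme_set() changes the default theme and (invisibly) returns the old one
old_theme <- theme_set(theme_minimal(base_size = 14))

# Every plot created from here on uses theme_minimal() by default
p_default <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Put things back the way they were
theme_set(old_theme)

# To apply a default in every session, lines like these could go in ~/.Rprofile:
# if (interactive()) {
#   ggplot2::theme_set(ggplot2::theme_minimal(base_size = 14))
# }
```

One caveat: anything set in `.Rprofile` does not travel with your code, so for a submitted project it is safer to call `theme_set()` at the top of the document itself.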

FAQ: `ggplot2` - When Not to Use

ggplot2 is designed to make good statistical graphics. Sub-par for:

Advanced interactivity
Really big data
Hardcore customization / “infographics”

FAQ: `git` WTF

Reference: Happy Git with R
