STA 9750
Week 8 Update
2025-03-27

Michael Weylandt

STA 9750 Mini-Project #02

Submission due yesterday at 11:45pm

Very creative!

STA 9750 MP#02 - Peer Feedback

Peer feedback assigned on GitHub + email this morning

  • \(\approx 4\) reviews each
  • Take this seriously: around 20% of this assignment is “meta-review”
  • Goal: rigorous constructive critique

Submissions may not map perfectly to rubric - use your best judgement

Learn from this! What can you adapt for MP#03?

STA 9750 Mini-Project #03

Now online - Due 2025-04-23 at 11:45pm ET

Topic: Creating the Ultimate Playlist

  • GitHub post (used for peer feedback) AND Brightspace

  • Three Weeks: don’t wait until the very end

  • Should be less demanding than MP #01 and MP#02

    • Lots of little files: practice data management
    • Some data limitations, particularly after inner_join
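
The inner_join caveat can be sketched on toy data. The two tables and their column names below are hypothetical stand-ins, not the actual MP#03 files: inner_join silently drops rows without a match, and anti_join shows which ones were lost.

```r
library(dplyr)

# Hypothetical toy tables; the real MP#03 files will look different
songs <- tibble(
  track_id = c("a", "b", "c"),
  title    = c("Song A", "Song B", "Song C")
)
features <- tibble(
  track_id     = c("a", "c", "d"),
  danceability = c(0.8, 0.4, 0.6)
)

# inner_join keeps only rows that have a match in BOTH tables
joined <- inner_join(songs, features, by = "track_id")

# anti_join lists the rows of songs that the join dropped
dropped <- anti_join(songs, features, by = "track_id")

nrow(songs)   # 3
nrow(joined)  # 2
dropped$title # "Song B"
```

Comparing row counts before and after each join is a cheap way to notice this kind of data loss early.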

Pay attention to the rubric

Thank You!


A personal note, if you allow me:

I’m really enjoying this class - thank you all!

Your effort has not gone unnoticed - I know this class starts “pedal-to-the-metal” but hopefully you’ve seen just how powerful these tools are.

More than that - I appreciate your good attitude and willingness to share your frustrations and triumphs. Reading the comments on this week’s pre-assignment quiz was uplifting.

Continual Improvement

I’ve set up a TODO file with everything I want to improve for next cohort.

Suggestions welcome.

Every semester, I create new mini-projects. Ideas and suggestions are very welcome.

  • Topics and data sets are both great

Going Forward

Upcoming Mini-Projects

  • MP#04:
    • Deadline: 2025-05-07 at 11:45pm ET
    • Topic: Exploring Recent US Political Shifts

Course Project

Project should be your main focus for rest of course

  • But you still need to do mini-projects and pre-assignments

Pre-Assignments

Brightspace - Wednesdays at 11:45pm ET

  • Reading, typically on course website
  • This week: getting data into R

Next pre-assignment is 2025-04-02 at 11:45pm ET

Thank you for FAQs and (honest) team feedback. Keep it coming!

Course Support

  • Synchronous
    • Office Hours 2x / week
      • MW Office Hours on Monday + Thursday for rest of semester
      • No OH during Spring Break
  • Asynchronous
    • Piazza (\(<25\) minute average response time)

Today

Warm-Up

“Datasaurus Dozen”:

  • install.packages("datasauRus") (Note capital R)
  • library(datasauRus); data(datasaurus_dozen)

Create an animated (gganimate) plot:

  • \(x, y\) scatterplot
  • Animate different values of dataset

If you are having trouble with gganimate, facet instead.
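
One possible approach to the warm-up, as a minimal sketch rather than the only solution. It assumes gganimate and datasauRus are installed; the transition timings are arbitrary choices.

```r
library(ggplot2)
library(gganimate)
library(datasauRus)
data(datasaurus_dozen)

# Base scatterplot shared by both versions
p <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal()

# Animated version: one state per value of the dataset column
anim <- p +
  transition_states(dataset, transition_length = 1, state_length = 2) +
  labs(title = "Dataset: {closest_state}")

# Fallback if gganimate gives trouble: one panel per dataset
faceted <- p + facet_wrap(~dataset)
```

Printing anim renders the animation (a renderer such as the gifski package may be needed); printing faceted draws the static version.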


Diving Deeper into ggplot2

First topic: maps!

Install the sf package: Simple Features for Spatial Data
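
A first map, as a minimal sketch: it uses the North Carolina shapefile bundled with sf, so no extra download is assumed. The fill column BIR74 is simply one of the columns in that example file.

```r
library(sf)
library(ggplot2)

# Example shapefile shipped with the sf package (NC counties)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# geom_sf draws the geometry column; fill maps a data column to color
nc_map <- ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +
  labs(fill = "BIR74") +
  theme_void()
```

An sf object behaves like a data frame with an extra geometry column, so the usual dplyr verbs still apply before plotting.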

Exercise: Lab #08

Breakout Rooms

Room  Team             Room  Team
1     Team Mystic      5     Money Team + CWo.
2     Subway Metrics   6     Lit Group
3     Noise Busters    7     Cinephiles + VG
4     AI Imp. Coll.

Topic: Interactive Tools for Data Analysis

Looking Ahead


Due Wednesday at 11:45pm:

  • Pre-Assignment #09 (Brightspace)
    • Data Import
  • MP #02 Peer Feedback on GitHub AND Brightspace

Next three weeks:

  • Reading ‘clean data’ into R
  • Reading and parsing HTML
  • Parsing messy (text) data

Teaching Observation by Prof. Brandwein - Next Week

Life Tip of the Week

Get Inspired!

The tools of this course are powerful and flexible

To learn more ways to apply them, check out ‘Galleries’:

STA 9750 Hall of Fame

My current side-project: STA 9750 Hall of Fame

Gallery of excellent STA 9750 submissions

  • ‘Signal-boost’ individual portfolios
  • Demonstrate the high quality of Baruch students
  • Inspiration for future semesters
  • 2-3 per mini-project

Trying to launch next week - will share a ‘sign up’ if you’re interested

Just FYI:
Old Pre-Assignment #08 FAQs

FAQ: ggplot2 - aes()

What is the aes function? It stands between the data and the geom_ layers

  • Each geom_ takes a fixed set of “coordinates”
  • Each data set has its own column names
  • aes ties these together
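
A small illustration of the points above, using the mpg data set that ships with ggplot2. Inside aes(), column names are mapped to the aesthetics a geom understands; outside aes(), a value is a fixed setting.

```r
library(ggplot2)

# Mapped: displ, hwy and class are columns of mpg
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Fixed: color is a constant, so it goes outside aes()
p_fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "steelblue")
```

Only p_mapped gets a legend, because only there does color carry information from the data.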

FAQ: ggplot2 - Why do Pie Charts have a bad reputation?

  • Use of area and angle over length: less accurate perception
  • Depends on fill to convey category - limited categories

But honestly - “insider smugness” and hate of Excel
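
To judge for yourself, here is a sketch that draws the same counts both ways, again assuming the built-in mpg data. In ggplot2 a pie chart is just a stacked bar in polar coordinates.

```r
library(ggplot2)

# Length encoding: bar heights are compared directly
bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Angle/area encoding: one stacked bar wrapped into a circle
pie <- ggplot(mpg, aes(x = "", fill = class)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```

With seven categories, the pie already depends heavily on the fill legend, while the bar chart labels each category on the axis.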

FAQ: ggplot2 - Plot Type Choice

For me:

  • Exploratory mode:
    • Simple: line, scatter, bar, frequency
  • Publication mode:
    • Very context specific

FAQ: ggplot2 - Font Sizing

Theme machinery!
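
A minimal sketch of two common ways to control font size through the theme system. The specific sizes are arbitrary examples.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Engine size vs. highway mileage")

# Option 1: scale all text at once via base_size
p_big <- base + theme_bw(base_size = 16)

# Option 2: adjust individual elements with element_text()
p_custom <- base +
  theme(
    plot.title = element_text(size = 18, face = "bold"),
    axis.title = element_text(size = 14),
    axis.text  = element_text(size = 11)
  )
```

Option 1 is usually enough for slides; option 2 is for fine-tuning a single figure.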

FAQ: ggplot2 - Overplotting / ScatterBlobs

Student asked about “scatterblobs” - typo(?) but I love it!

  • Density based plotting: hexbins, histograms, rugplots
  • Data reduction: summarization or sub-sampling
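
Both strategies can be sketched on simulated data (the data below are made up for illustration). geom_bin_2d is used here because it needs no extra package; geom_hex is the hexagonal equivalent but requires the hexbin package.

```r
library(ggplot2)
library(dplyr)

set.seed(9750)
big <- tibble(x = rnorm(50000), y = x + rnorm(50000))

# Density-based plotting: count points per 2D bin instead of drawing each
p_bins <- ggplot(big, aes(x, y)) +
  geom_bin_2d(bins = 60)

# Data reduction: plot a random sub-sample with transparency
small <- slice_sample(big, n = 2000)
p_sample <- ggplot(small, aes(x, y)) +
  geom_point(alpha = 0.3)
```

Binning keeps every observation in the picture; sub-sampling is faster but can hide rare points.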

FAQ: ggplot2 - Optimizing Performance

Active project of ggplot2 team - not much you can do

Practical advice: plot less (see previous slide)

FAQ: ggplot2 - Beyond Scatter and Line

Some favorite semi-advanced plot types:

  • Violin plots: combination of boxplot and histogram
  • Ridgelines
  • Beeswarms

Deep rabbit hole
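
A violin plot needs nothing beyond ggplot2, so here is a minimal sketch on the built-in mpg data. Ridgelines and beeswarms come from add-on packages (for example ggridges and ggbeeswarm) and are not shown.

```r
library(ggplot2)

# Violin shows the full distribution shape; the narrow boxplot
# overlaid on top adds the median and quartiles
p_violin <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(fill = "grey85") +
  geom_boxplot(width = 0.1, outlier.shape = NA)
```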

FAQ: ggplot2 - Geospatial Visualizations

That’s our goal for today!

FAQ: ggplot2 - High-Dimensional Data

High-dimensional data: measure many variables per observation (“wide”)

High-dimensional data is hard to visualize

Approaches:

  • Pair plots for “moderate” HDD
  • PCA or similar dimension reduction (take 9890!)
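
The PCA approach, as a minimal sketch on the built-in iris data (four measurements per flower, chosen only for illustration). For the pair-plot approach, base R's pairs() or GGally::ggpairs() are common starting points.

```r
library(ggplot2)

# Four numeric columns -> principal components
num <- iris[, 1:4]
pca <- prcomp(num, center = TRUE, scale. = TRUE)

# Keep the first two components and re-attach the species label
scores <- data.frame(pca$x[, 1:2], Species = iris$Species)

p_pca <- ggplot(scores, aes(x = PC1, y = PC2, color = Species)) +
  geom_point()
```

The plot shows two derived axes instead of four raw ones; summary(pca) reports how much variance those two axes retain.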

FAQ: ggplot2 - Creating a Custom Theme

Advanced:

  • theme_set() - change ggplot2 defaults
  • .Rprofile - set code to run every time you start R
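
A minimal sketch of a custom theme. The name theme_course and its styling choices are hypothetical, not part of ggplot2 or the course materials.

```r
library(ggplot2)

# Hypothetical custom theme: start from a complete theme, then tweak
theme_course <- function(base_size = 14) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title      = element_text(face = "bold"),
      legend.position = "bottom"
    )
}

# Make it the default for every later plot; theme_set returns the old theme
old_theme <- theme_set(theme_course())

# To undo: theme_set(old_theme)
```

Placing the theme_set() call in .Rprofile would apply it in every session, at the cost of making plots depend on a file that collaborators do not have.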

FAQ: ggplot2 - When Not to Use

ggplot2 is designed to make good statistical graphics. Sub-par for:

  • Advanced interactivity
  • Really big data
  • Hardcore customization / “infographics”

FAQ: git WTF

Reference: Happy Git with R