Software Tools for Data Analysis
STA 9750
Michael Weylandt
Week 4 – Thursday 2026-02-26
Last Updated: 2026-02-26

STA 9750 Week 4

Today: Project Proposal Presentation + Enrichment: Additional Review of R

These slides can be found online at:

https://michael-weylandt.com/STA9750/slides/slides04.html

In-class activities (if any) can be found at:

https://michael-weylandt.com/STA9750/labs/lab04.html

Upcoming TODO

Upcoming student responsibilities:

Date        Time        Details
2026-03-01  11:59pm ET  Mini-Project Peer Feedback #00 Due
2026-03-05  6:00pm ET   Pre-Assignment #05 Due
2026-03-12  6:00pm ET   Pre-Assignment #06 Due
2026-03-13  11:59pm ET  Mini-Project #01 Due
2026-03-19  6:00pm ET   Pre-Assignment #07 Due
2026-03-22  11:59pm ET  Mini-Project Peer Feedback #01 Due
2026-03-26  6:00pm ET   Mid-Semester Check-In Slides Due

Course Project Proposals

Today: Course project proposal presentations (Official Description)

  • 6 minute presentation
  • Key topics:
    • Animating Question
    • Team Roster
  • Also discuss: Possible specific questions, data sources, analytical plan, anticipated challenges

Most important: team names!

Previous: Rat Pack, Subway Surfers, Going for Gold, etc.

After Proposals

Extra Review of R Basics - 100% Optional

  • Variables, Vectors, Types, Control Flow

Mini-Project #00

MP#00 due last Friday - Course Infrastructure Set-Up

  • Setup RStudio Project
  • Create GitHub Account and Repo
  • Connect Local Computer to GitHub
  • Deploy GitHub Pages

I enjoyed getting to know (a bit more about) you!

Mini-Project #00

Mini-Project #00 peer feedback assigned

  • Due 2026-03-01
  • 3 peer feedback assignments per student
    • Give 3 comments, get 3 comments

Peer Feedback Instructions can be found online

Mini-Project #00

Interactive script automates this process:

source("https://michael-weylandt.com/STA9750/load_helpers.R")
mp_pf_perform(0, github="YOUR_GITHUB_ID")

The script will request the secret code I gave you via Piazza when you completed MP#00

It will then ask you a series of questions and save your responses in a specially formatted file

MP#00 Peer Feedback

Ungraded assignment, so simple feedback:

  • One strength, one weakness, one suggestion for improvement (3x)

After completion, upload bspf file to Brightspace (for privacy / confidentiality)

MP PF Cycle

Aims of Mini-Project Peer Feedback:

  • Learn to read and evaluate code
  • In analysis, rarely right and wrong; definitely better and worse
  • Learn tricks to improve your own site

“Good artists copy; great artists steal.” – Steve Jobs

Most coding is reading - most reading is reading your own old code

Course Support

Asynchronous Support: Piazza

  • All registered now in Piazza!
  • Sub-1 Hour average time to response

Synchronous Support: Office Hours

  • Wednesdays (in person) and Thursdays (Zoom) at 5pm

Pre-Assignments

Pre-Assignment #05 - Due 2026-03-05 at 6:00pm ET

  • Day before class at 6:00pm
  • Available on course website + Brightspace after 9pm
  • Unlimited re-tries so make sure you get 30/30!

Quick Review

Values

Basic “things” in R (“scalars”):

  • Numeric values: 10, 3.14, 0.0002, 1.234e5 (complex values such as 3 + 4i have their own class, complex)
    • R distinguishes between integer and numeric / double but you don’t need to
  • Character values: "Baruch", "Michael Weylandt"
    • Arbitrary length - matched quotes (double or single)
    • Mix quotes to put quotes inside a string: "He said to me: 'Code is great!'"
  • Logical values: TRUE, FALSE (no quotes)

Values

Use the class() function to see types:

class(3)
[1] "numeric"
class(3.14159)
[1] "numeric"
class("I love R!")
[1] "character"
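The same check works for the other kinds of values listed earlier. A small sketch (the `L` suffix for writing an integer literal is standard R but is not used elsewhere in these slides):

```r
# class() on the remaining scalar types
int_class  <- class(3L)      # the L suffix asks for an integer
lgl_class  <- class(TRUE)    # logical values have no quotes
cplx_class <- class(3 + 4i)  # complex numbers get their own class

int_class
# [1] "integer"
lgl_class
# [1] "logical"
cplx_class
# [1] "complex"
```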

Variables and Assignment

We can assign a name to a value:

x <- 3

Now anywhere we use x, the value 3 automatically is introduced:

x^2
[1] 9

Variables and Assignment

Variable names must be:

  • One word (no spaces)
  • All alpha, numeric, or underscores
  • Start with a letter

Avoid special “reserved words”: if, else, for, etc.

Variables and Assignment

Assignment (<-) happens last, after the right-hand side is computed, so we can use it to save results for later use

five_factorial <- 5 * 4 * 3 * 2 * 1
five_factorial
[1] 120

Then

six_factorial <- 6 * five_factorial
six_factorial
[1] 720

Vectors

An ordered collection of the same type is called a vector:

  • Create with c (“concatenate”):
x <- c(1, 2, 3)
class(x)
[1] "numeric"
length(x)
[1] 3
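Because every element of a vector must have the same type, R quietly converts mixed inputs to a common type. A short sketch of what that looks like (the variable name `mixed` is just for illustration):

```r
# Mixing numbers, strings, and logicals: everything becomes character
mixed <- c(1, "a", TRUE)
class(mixed)
# [1] "character"
mixed
# [1] "1"    "a"    "TRUE"
```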

Vectors

Vectors are everywhere in R:

  • The “single” values we saw earlier are just vectors of length 1
length(3)
[1] 1

Vectors

Access specific elements of a vector with []:

x <- c("a", "b", "c")
x[2]
[1] "b"

Or give a vector of indices:

x[c(2, 1, 3)]
[1] "b" "a" "c"
x[c(2, 1, 2, 1, 3, 3)]
[1] "b" "a" "b" "a" "c" "c"

Vectors

Negative indices drop:

x[-2]
[1] "a" "c"

This is different from Python: negative indices drop elements; they do not count backwards from the end!
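A brief sketch of dropping several elements at once, plus two common ways to get the last element instead (`tail` is a base R function not covered above):

```r
x <- c("a", "b", "c")

dropped <- x[-c(1, 2)]   # drop the first two elements
last_a  <- x[length(x)]  # index with the length to get the last element
last_b  <- tail(x, 1)    # tail() takes elements from the end

dropped
# [1] "c"
last_a
# [1] "c"
last_b
# [1] "c"
```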

Functions

Functions take input (“arguments”) and produce results and side-effects:

x <- c(1, 4, 9, 16)
sqrt(x)
[1] 1 2 3 4

x is an input to the function sqrt

sqrt doesn’t “see” the name x; it sees the vector 1, 4, 9, 16

Not 100% true, but close enough!

Functions

Some examples:

  • Print to screen with some formatting: print
  • “Pure” print to screen (no formatting): cat
  • Combine strings: paste
  • Math: sin, sqrt
  • Load a package: library
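A quick sketch of how the first three of these behave together (the string contents are made up for illustration):

```r
# paste joins its arguments with a space by default
greeting <- paste("Hello", "Baruch")

print(greeting)      # shows the [1] index and the quotes
# [1] "Hello Baruch"
cat(greeting, "\n")  # shows only the raw text; add "\n" to end the line
# Hello Baruch
```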

Vectorization

Where possible, functions are vectorized:

x <- c(1, 2, 3)
y <- c(4, 5, 6)

x * y
[1]  4 10 18

Operations occur “in parallel” on matched elements
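When the two vectors have different lengths, R repeats ("recycles") the shorter one to match. A small sketch, using lengths that divide evenly so no warning is raised:

```r
x <- c(1, 2, 3, 4)

doubled <- x * 2          # the single value 2 is reused for every element
shifted <- x + c(10, 20)  # c(10, 20) is reused as 10, 20, 10, 20

doubled
# [1] 2 4 6 8
shifted
# [1] 11 22 13 24
```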

Control Flow

Two useful operations for small code snippets:

if(condition){
  do_if_true
} else { # Optional - can omit this 'side'
  do_if_false
}

This is a conditional operator:

  • Used to run code sometimes
    • Download a missing file, but not a file already present
    • Throw an error if something bad happens
    • Turn off parts of code with (if(FALSE){})
  • Everything between { and } is run (or skipped) together
  • else branch is optional
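The three uses listed above might look like the following sketch. The file name is hypothetical, and writing a tiny file stands in for a real download so that the example runs without a network connection:

```r
# Hypothetical path, placed in R's temporary folder for this session
path <- file.path(tempdir(), "example_data.csv")

# Only create the file if it is not already present
if(!file.exists(path)){
  writeLines(c("a,b", "1,2"), path)  # stand-in for a download step
  cat("Created", path, "\n")
} else {
  cat("Already have", path, "\n")
}

# Throw an error if something bad happened
if(!file.exists(path)){
  stop("Could not create ", path)
}

# Turn off a block of code without deleting it
if(FALSE){
  cat("This line never runs\n")
}
```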

Compound Conditions

Can do ‘compound’ or nested if/else:

if(x > 10){
  cat("x is very positive.\n")
} else if(x > 0) {
  cat("x is a little positive.\n")
} else {
  cat("x is zero or negative.\n")
}

Control Flow

Two useful operations for small code snippets:

for(element in vector){
  process_one_at_a_time(element)
}

Goes through vector taking out one element at a time:

nums <- c(1, 2, 3, 4, 5)
for(n in nums){
  cat(n, "squared is", n^2, "\n")
}
1 squared is 1 
2 squared is 4 
3 squared is 9 
4 squared is 16 
5 squared is 25 

Don’t focus on these too much - better alternatives coming soon!
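One possible alternative, sketched here using only vectorization and `paste` from earlier in these slides, produces the same lines without a loop:

```r
nums <- c(1, 2, 3, 4, 5)
squares <- nums^2  # one vectorized operation instead of five loop steps

# paste builds all five messages at once; sep = "\n" puts each on its own line
cat(paste(nums, "squared is", squares), sep = "\n")
```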

Control Flow

By default, the value of the last expression evaluated in a function is the returned value:

my_absolute_value <- function(x){
  if(x > 0){
    x
  } else {
    -x
  }
}

my_absolute_value(-3)
[1] 3
my_absolute_value(3)
[1] 3

Control Flow

Can override with return statement - instantly returns and ‘exits’ function:

say_hello <- function(name, scream=FALSE, quiet=FALSE){
  text <- paste("Hello", name)
  
  if(scream){
    text <- paste(toupper(text), "!!!")
  }
  
  if(quiet){return(text)} # Stop here if quiet and skip print
  
  print(text)
}
say_hello("Michael")
[1] "Hello Michael"
say_hello("Michael", scream=TRUE)
[1] "HELLO MICHAEL !!!"
say_hello("Michael", quiet=TRUE)
[1] "Hello Michael"

Control Flow

Overly complicated \(\text{sign}(x)\) function:

sign <- function(x){
  if(x > 0){
    1
  } else {
    if(x < 0){
      -1
    } else {
      0
    }
  }
}

Control Flow

Somewhat better \(\text{sign}(x)\) function:

sign <- function(x){
  if(x > 0){
    1
  } else if(x < 0){
    -1
  } else {
    0
  }
}

Control Flow

Decent \(\text{sign}(x)\) function:

sign <- function(x){
  if(x > 0) return(1) # Use 'return' to stop function here
  if(x < 0) return(-1)
  return(0)
}

Would be even better to vectorize
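One way to vectorize, sketched under the name `my_sign` (a hypothetical name chosen so that R's built-in `sign`, which is already vectorized, is not overwritten). It relies on TRUE and FALSE counting as 1 and 0 in arithmetic:

```r
my_sign <- function(x){
  # (x > 0) is 1 for positive values; (x < 0) is 1 for negative values
  (x > 0) - (x < 0)
}

signs <- my_sign(c(-3, 0, 2.5))
signs
# [1] -1  0  1
```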

Optional Review

Programming exercises to practice these concepts

Proposal Presentations

Presentation Order

Presentation Number  Team
1                    MUO+KN+CM+ID+KM
2                    JE+JABB+MTP+JA+AS
3                    HHS+KK+FC+DN
4                    XC+ML+ER+RJSN
5                    LR+MOG+APTL+TN

Wrap-Up

Orientation

  • Communicating Results (quarto) ✅
  • R Basics ✅
  • Data Manipulation in R
  • Data Visualization in R
  • Getting Data into R
  • Statistical Modeling in R

Next Time

Data Frames:

  • Organizing several ‘connected’ vectors into a table
  • Table operations with dplyr

Upcoming Work

Upcoming work from course calendar

Life Tip of the Week

Making the most of Amazon

  1. Free Trial and Discounted Rate Amazon Prime for Students
  2. Amazon Prime Visa by Chase: No annual fee + 5% cash back (or more) on all Amazon/Whole Foods/Chase Travel + 2% on gas, restaurants, and transit
  3. Camel Camel Camel
    • Price history for all Amazon items (see if you’re getting a good deal)
    • Price drop alert emails (get custom messages when an item goes on sale)

Musical Treat

Live Video

Optional Review

Exercise #01

Q: Write a function f that does the following:

f(c(1, 2, 3))
The vector has 3 elements and is of type numeric
f(c("a", "b", "c"))
The vector has 3 elements and is of type character
f(1:5)
The vector has 5 elements and is of type integer

Hint: Use the cat function to print to screen.

f <- function(x){
  cat("The vector has", length(x), "elements and is of type", class(x), "\n")
}

Exercise #02

Q: Write a vectorized function to tell if numbers are even.

is_even(3)
[1] FALSE
is_even(c(3, 4, 5))
[1] FALSE  TRUE FALSE
is_even(1:10)
 [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE

Hint: Use the %% operator to get remainders from division

is_even <- function(x)  (x %% 2) == 0

Exercise #03

Q: Write a function to count the even elements of a vector:

count_even(3)
[1] 0
count_even(c(3, 4, 5))
[1] 1
count_even(1:10)
[1] 5

Hint: Combine Q2 with the “sum of logical = count” trick.

count_even <- function(x)  sum(is_even(x))
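The "sum of logical = count" trick works because TRUE counts as 1 and FALSE as 0 in arithmetic. A tiny sketch with a made-up logical vector:

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)

n_true    <- sum(flags)   # how many are TRUE
frac_true <- mean(flags)  # what fraction are TRUE

n_true
# [1] 3
frac_true
# [1] 0.75
```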

Exercise #04

Q: The seq() function lets us construct sequences. What is the average (mean) of the first 23 odd numbers?

Hint: Read the documentation for ?seq before trying this question

mean(seq(from=1, by=2, length.out=23))
[1] 23
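The answer equals the number of terms, which is not a coincidence: the first n odd numbers sum to n^2, so their mean is n. A quick numerical check of that fact for n = 23:

```r
n <- 23
odds <- seq(from = 1, by = 2, length.out = n)

sum(odds) == n^2
# [1] TRUE
mean(odds) == n
# [1] TRUE
```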

Exercise #05

Q: The alternating harmonic series is defined as:

\[1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \dots\]

Show that this series converges to \(\ln(2)\) by taking a partial sum of the first one million elements.

Hint: Use recycling to get the signs right:

seq(1, 5) * c(1, -1)
[1]  1 -2  3 -4  5
sum(1/seq(1, 1e6) * c(1, -1))
[1] 0.6931467

Compare to:

log(2)
[1] 0.6931472
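The two numbers differ in the seventh decimal place, as expected: for an alternating series with shrinking terms, the error of a partial sum is at most the size of the first omitted term. A sketch checking that bound (the names `N` and `partial` are just for illustration):

```r
N <- 1e6
partial <- sum(1/seq(1, N) * c(1, -1))

# The first omitted term is 1/(N + 1), so the gap should be smaller than that
gap <- abs(partial - log(2))
gap < 1/(N + 1)
# [1] TRUE
```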

Exercise #06

Q: Write a function that computes the mean of a vector (Don’t use the built-in mean function)

Hint: Use the sum and length functions

my_mean <- function(x){
  sum(x) / length(x)
}

my_mean(8)
[1] 8
my_mean(1:10)
[1] 5.5

Exercise #07

Q: Write a vectorized function that returns \(\sqrt{x}\) if \(x\) is positive and 0 otherwise.

pos_sqrt(c(-4, -1, 1, 4, 9, -9))
[1] 0 0 1 2 3 0

Hint: The pmax (“parallel max”) function may be useful here:

x <- c(1, 3, 5,  7)
y <- c(5, 3, 10, 1)
pmax(x, y)
[1]  5  3 10  7
pos_sqrt <- function(x) sqrt(pmax(0, x))

Alternative - use the ifelse function for vectorized conditionals (but this gives a warning because it computes sqrt(x) for every element, including the negative ones, before choosing which results to keep)
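A sketch of the ifelse approach, under hypothetical function names. The first version gives the right answer but warns about NaNs; the second avoids the warning by replacing the negative values before taking the square root:

```r
# Version 1: sqrt(x) is computed for all of x, so negative inputs trigger a warning
pos_sqrt_ifelse <- function(x) ifelse(x > 0, sqrt(x), 0)

# Version 2: swap negatives for 0 first, then take the square root
pos_sqrt_quiet <- function(x) sqrt(ifelse(x > 0, x, 0))

x <- c(-4, -1, 1, 4, 9, -9)
res1 <- suppressWarnings(pos_sqrt_ifelse(x))  # hide the NaN warning for this demo
res2 <- pos_sqrt_quiet(x)

res1
# [1] 0 0 1 2 3 0
res2
# [1] 0 0 1 2 3 0
```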

Exercise #08

Q: Write a function that takes in a vector of characters and returns the longest.

long_string(c("a", "bc", "def"))
[1] "def"
long_string(c("My", "name", "is", "Michael"))
[1] "Michael"

Hint: The nchar and which.max functions may be helpful here.

long_string <- function(x)  x[which.max(nchar(x))]
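One detail worth knowing: which.max returns the position of the first maximum, so when several strings tie for longest, this solution returns the earliest one. A short sketch with made-up inputs:

```r
long_string <- function(x)  x[which.max(nchar(x))]

# "cat" and "dog" tie at 3 characters; the first one wins
tie_result <- long_string(c("cat", "dog", "a"))
tie_result
# [1] "cat"
```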

Exercise #09

Q: Write a function that computes the factorial of a value. (Don’t use the built-in factorial function)

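Hint: Use the seq (or seq_len) and prod functions

my_factorial <- function(n){
  # seq_len(0) is empty and prod of nothing is 1, so my_factorial(0) is 1;
  # seq(1, 0) would count down (1, 0) and wrongly give 0
  return(prod(seq_len(n)))
}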

my_factorial(8)
[1] 40320

Exercise #10

Q: Write a function that computes the factorial of a value using a loop (instead of the prod function).

my_factorial <- function(n){
  fact <- 1
  for(x in seq_len(n)){ # seq_len(0) is empty, so the loop is skipped and 0! = 1
    fact <- fact * x
  }
  return(fact)
}

my_factorial(8)
[1] 40320

Exercise #11

Q: Show that: \[\sum_{k=0}^{\infty} \frac{1}{k!} = e\]

Use the built-in factorial function since it is vectorized.

sum(1/factorial(seq(0, 10000)))
[1] 2.718282
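The terms shrink so quickly that far fewer of them are needed, and very large factorials overflow double precision anyway, so those terms add nothing to the sum. A sketch using only the first 21 terms (the name `approx_e` is just for illustration):

```r
approx_e <- sum(1/factorial(0:20))

# all.equal compares numbers up to a small tolerance
all.equal(approx_e, exp(1))
# [1] TRUE
```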

Exercise #12

Q: Write a function that takes a vector and returns the maximum element. (Don’t use the built-in max function.)

Hint: You will need to use a loop and a conditional here.

Hint: We started factorial at 1; start max at -Inf (why?)

my_max <- function(x){
  max_val <- -Inf
  for(v in x){
    if(v > max_val) max_val <- v
  }
  return(max_val)
}

my_max(c(1, -1, 3, -4, 10, -3))
[1] 10