STA 9750 - Week 2

Michael Weylandt

STA 9750 Week 2 Update

Today: 2025-02-06

  • Weekly feature
  • Brief updates and reminders about course logistics
  • Syllabus and Brightspace are binding
    • If something is left out of here, it still happens!

Course Enrollment

Final enrollment: 28

  • Approx. 8 final project teams (3-4 students each)
  • Approx. 4 MPs to review per peer-feedback cycle

Course Administration: Course Project Draft Released

Course project draft description is now online

Detailed discussion of:

  • Project structure
  • Key deadlines
  • Grading rubrics

Will be finalized next week - 2025-02-13.

Please send me questions in advance!

First step: by 2025-03-05, email me your group members.

Graduate Teaching Assistant (GTA)

No GTA this semester.

Piazza

  • 11 sign-ups; 17 still need to sign up
  • Thank you to those of you who already posted questions!
  • Post #05 - Search for Teammates

Instructor Tip: Before committing to a team with someone, you can look up their GitHub and see how they did on MP#00 and MP#01. This might help you find teammates whose standards are calibrated to your own.

Special thanks to JLL for finding and reporting issues with MP#00 instructions

Pre-Assignments

Pre-Assignment 02:

  • 20 / 28 submitted
  • Ignore Brightspace’s Grading
    • Brightspace automatically marks all short answers as wrong
    • Look at “Grades” tab to see actual sub-totals (12.5 / 100 = 1/8 = full credit on 1 of 8 graded PAs)
  • I will often give feedback through Brightspace, so make sure to go back and see if I’ve left you any comments. (I might not give comments on every question though.)

Pre-Assignment 03:

  • Before midnight next week
  • Available on course website + Brightspace after 9pm

Mini-Project #00

  • Due in (slightly less than) a week
  • Possible tech issues, so start early
  • 1 GitHub tag + 0 Piazza Messages so far

Verification of Enrollment - Required to stay enrolled in class

Today

  • Review of Questions from Pre-Assign #02
  • Introduction to Markdown and Quarto
  • Introduction to GitHub pages
  • How to ask for help
  • Interactive Use of R and RStudio

FAQs from PA#02

Q1

What is Markdown?

Per Wikipedia: “Markdown is a light-weight, plain-text, markup language specification”

  • Light-weight: relatively simple, focus on content rather than formatting
  • Plain-text: accessible using almost any text editor (RStudio, GitHub, VS Code, etc.)
    • Not locked into specific software (e.g., MS Word)
    • Easily incorporated into a variety of technologies

  • Markup language: a ‘mini-coding language’ for text documents
    • Other famous examples: HTML, XML
  • Specification:
    • CommonMark defines ‘standard’ Markdown
    • Some software allows extensions
    • Pandoc is often the engine under the hood

Q2

Other than text formatting, does Markdown ha[ve] any other us[]es?

On its own, Markdown is just text formatting (but that’s a lot!)

We will use Quarto, which augments Markdown for reproducible research. We can embed code, and its output, inside Markdown documents.

Q3

[W]hat documents use[] Markdown?

So much! Markdown is used by Bitbucket, GitHub, OpenStreetMap, Reddit, Stack Exchange, Drupal, ChatGPT, Discord, MS Teams and many more!

With tools like Pandoc/Quarto, Markdown can be rendered to:

  • HTML
  • PDF
  • Web Slides
  • EBooks
  • Research Papers
  • Word Documents
  • PowerPoint slides
  • and so much more!

Q4

[What is] the difference between [a] Code section and [a] Nested List[? A]re they just different ways of indenting?

No. Nested lists are ‘just’ text

Code formatting enables much more if the rendering engine supports it:

import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()

Q5

[H]ow are we going to use Markdown?

All written work (mini-projects and final project) in this course will be submitted using Markdown (by way of Quarto).

Specifically:

  • Submission pages for 5 mini-projects
  • Individual reports for course project
  • Summary (team) report for final project

You are also encouraged (but not required) to use Markdown for presentation slides (like these!)

Q6

How can I create Tables in Markdown?

Markdown has two table syntaxes:

  • an easy one with minimal control
  • a hard one which allows fine-grained control (alignment, column widths, etc.) - “pipe tables”

If you are making complex tables, I recommend using the list-table extension.

(See syllabus.qmd in course repo for examples.)
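
As a rough illustration of the pipe-table syntax, the sketch below builds one from plain Python lists. The helper name `pipe_table` and the sample data are hypothetical. The only syntax rules it relies on are the standard pipe-table ones: a header row, then a separator row in which colons mark each column's alignment.

```python
def pipe_table(headers, rows, align):
    """Return a Markdown pipe table as a string.

    `align` has one letter per column: "l" (left), "r" (right) or "c" (center).
    """
    # The separator row under the header uses colons to mark alignment.
    markers = {"l": ":---", "r": "---:", "c": ":---:"}
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join(markers[a] for a in align) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Hypothetical sample data
table = pipe_table(
    headers=["City", "Sales"],
    rows=[["Houston", 100], ["Austin", 50]],
    align="lr",
)
print(table)
# | City | Sales |
# | :--- | ---: |
# | Houston | 100 |
# | Austin | 50 |
```

The pipes do not need to line up vertically for the table to render; aligning them only makes the source easier to read.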

Q7

How to create images and links?

Basic hyperlinks look like this:

[link text](https://the.url/goes/here)

If you want to embed the contents of a link, prepend it with an exclamation point. This is most useful for images:

![Image Caption](https://the.url/goes/here.png)

You can even put a link inside an image to be fancy:

[![Elephant](elephant.png)](https://en.wikipedia.org/wiki/Elephant)

Quarto automatically embeds the results of plotting code:

plot(1:5, main="Behold, a Plot!", col=2:6, cex=5, 
     pch=16, xlab="", cex.main=5)

Here, Quarto handles all the file creation and link targeting for us. If I change the code, the figure will change automatically.

Introduction to R REPL and RStudio

Life on the Command Line

R REPL

Terminal

RStudio - A Useful IDE

Official Cheat Sheets:

Data Camp RStudio Tutorial (Free)

  • For today, first ~half

Markdown and Quarto

  • Quarto implements Markdown with data-analytic extensions
  • Seamless (ideally!) integration of code and text
  • No more copy and paste

Quarto user guide is fantastic!

See also source for course materials.

Lab Activity: Part 0

If you haven’t already, install Quarto.

Lab Activity: Part 1

Create a simple PDF Quarto document using the RStudio wizard.

(Note that you may need to install tinytex for this to work properly, but Quarto should install it for you automatically.)

Lab Activity: Part 2

Create a 5-slide presentation showing the Houston housing market. This should include:

  • A title slide
  • Three body slides with a figure and some text
  • A conclusion slide

You may use the following code snippets:

if(!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
txhousing |> filter(city=="Houston") |> 
             group_by(year) |> 
             summarize(sales=sum(sales)) |> 
             ggplot(aes(x=year, y=sales)) + 
                geom_line() + 
                ggtitle("Annual Houses Sold in Houston, TX")

Recall that this code needs to be between three backticks on each end (and start with r in curly braces as well.)

if(!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
txhousing |> filter(city=="Houston") |> 
    group_by(month) |> 
    summarize(avg_price=sum(volume) / sum(sales)) |> 
    mutate(month=factor(month.abb[month], 
                 levels=month.abb, ordered=TRUE)) |>
    ggplot(aes(x=month, y=avg_price)) + 
    geom_bar(stat="identity") + 
    ggtitle("Average Price of Houses Sold in Houston, TX by Month") + 
    xlab("Month") + 
    ylab("Average Sale Price") + 
    scale_y_continuous(labels = scales::dollar)

if(!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
txhousing |> filter(year==2015) |> 
    group_by(city) |> 
    summarize(avg_price=sum(volume) / sum(sales),
              num_sales=sum(sales)) |> 
    slice_max(num_sales, n=10) |>
    ggplot(aes(x=city, y=avg_price)) + 
    geom_bar(stat="identity") + 
    ggtitle("Average Price of Houses Sold in 2015 by City in Texas") + 
    xlab("City") + 
    ylab("Average Sale Price") + 
    scale_y_continuous(labels = scales::dollar)

Lab Activity: Part 3

View the Quarto Demo Slides and add one new element to your slides from the previous section.

GitHub Pages

In-class discussion of what a static web page is and the role of GitHub Pages as a static web server.
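
As a rough illustration of what "static" means, the sketch below uses Python's built-in `http.server` module (not GitHub Pages itself) to serve a folder of files exactly as they sit on disk. The one-page site and the `QuietHandler` name are hypothetical. GitHub Pages fills the static-server role for the files in a repository, but its internals are not shown here.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request

# A static "site" is just a folder of finished files (hypothetical one-page site).
site_dir = pathlib.Path(tempfile.mkdtemp())
page = "<h1>Hello from a static page</h1>"
(site_dir / "index.html").write_text(page, encoding="utf-8")


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from disk unchanged; silence the per-request log line."""

    def log_message(self, format, *args):
        pass


# Port 0 asks the operating system for any free port.
handler = functools.partial(QuietHandler, directory=str(site_dir))
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the page the way a browser would (bypassing any configured proxy).
port = server.server_address[1]
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://127.0.0.1:{port}/index.html") as response:
    body = response.read().decode("utf-8")

server.shutdown()
server.server_close()

# The server ran no code on the page; it simply returned the file's contents.
print(body == page)
```

The server never computes anything per request: whatever was rendered ahead of time is what every visitor receives.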

How to Ask for Help

Professional programming is at least half looking things up; in the beginning stages, the fraction is even higher.

So it’s important to know how to seek help the smart way:

  1. Official documentation. Free software almost never becomes famous without great documentation: R and its packages are no exception. Everything we will use in this class has solid documentation.
  2. Search Engine.

Most programming challenges have been faced by somebody before, so Google it!

Tips:

  • Include R or rstats in your search query
  • It’s better to search for what you want to do rather than how you think you should do it.
  • Search programming Q&A sites like StackOverflow for specific code questions; blogs and course materials are better for “big picture” questions
  3. Ask on a Forum with a Reproducible Example

Programming fora, like StackOverflow, are full of great resources. Most of what you need is already there. But if you need to ask a new question, make sure to create a minimal reproducible example (MRE).

Make it easy for your helper to help you.

  • Minimal: narrow down to as few lines of code as possible
  • Reproducible: self-contained, without dependencies on libraries (if they can be avoided); load all packages needed; use standard data

Pro-Tip: You’ll solve over 50% of your problems in trying to create an MRE.

Tips:

  • Show the code, even if it doesn’t work
  • Send code as text, not a screenshot (so your helper can run it)
  • Smaller examples help narrow down problems
  • Avoid IO (file input and output) unless specifically relevant to the problem
  • Remove everything you can
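
To make "minimal" and "reproducible" concrete, here is a hedged sketch of what an MRE might look like. It is written in Python (like the plotting example under Q4) purely for illustration, and the toy numbers and variable names are hypothetical. The same shape applies to R code: inline data, no file IO, and a clear statement of what was tried versus what was expected.

```python
# Question: "Why doesn't my average price match total volume / total sales?"
# Minimal: only the lines needed to show the problem.
# Reproducible: data defined inline (no file IO), standard library only.
from statistics import mean

# Hypothetical toy data standing in for a much larger data set
volume = [200_000, 900_000]  # total dollar volume in each month
sales = [1, 3]               # number of sales in each month

# What I tried: average the per-month average prices
naive = mean(v / s for v, s in zip(volume, sales))

# What I expected: total volume divided by total sales
expected = sum(volume) / sum(sales)

print(naive, expected)  # the two numbers differ, which is the question
```

Everything a helper needs is in those few lines: the data, the attempt, and the expected result. Writing it out this way often reveals the answer (here, months with more sales should carry more weight).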

The reprex R package helps with this: see this talk.

For this class, rely on Piazza!

After Class

Next Week

Looking Ahead

Course Project:

  • Start looking for teammates and topics
  • No in-person office hours on 2025-02-18 (MW traveling)

Life Tip of the Week

It’s time to start preparing your taxes. (I know, I know …)

  • Preparing is not the same as filing
    • Preparing is doing the calculations
    • Filing is submitting to the IRS
  • Employers and financial institutions should be sending you documents (W2, 1099, etc.)
    • Easier to use them now so you don’t lose them
  • Benefits of starting early:
    • If you get a refund, great!
    • If you owe money, avoid a nasty surprise.
  • You can still make certain 2024 tax moves and get the tax benefit (IRA, HSA, etc.)

If your income is less than ~$98K single or ~$113K married, the IRS FreeFile program means you can use TaxAct, etc. for free.