STA/OPR 9750 - Week 2 Update

Michael Weylandt

STA/OPR 9750 Week 2 Update

  • Weekly feature
  • Brief updates and reminders about course logistics
  • Syllabus and Brightspace are binding
    • If something is left out of here, it still happens!

Course Administration: Google Calendar Help

Thanks to WP (Piazza #15) for delving into Google Calendar formatting

I’ve updated the course homepage to provide a CSV file with all course deadlines suitable for import to Google Calendar.

Course Administration: Course Project Description Released

Course project description is now online

Detailed discussion of:

  • Project structure
  • Key deadlines
  • Grading rubrics

First step: by October 2nd, email me your group members.

Course Enrollment

Total enrollment: 46 = 33 (STA) + 13 (OPR)

  • Courses \(\geq\) 46 are ‘doubles’ in Zicklin
    • Allows additional instructional resources
    • Thanks to (unknown) #46
  • I expect \(\approx 12\) final project teams

Graduate Teaching Assistant (GTA)

  • GTA hiring in process - name TBA
  • GTA to hold weekly virtual office hours (likely 2x)

Piazza

  • 28 sign-ups: 18 still need to sign up
  • Good discussion of GitHub security problems
  • 4 final project teammate searches underway
    • Great tool!
    • Say what you’re interested in as a project topic to help coordination

Pre-Assignments

Pre-Assignment 02: - 40 / 46 submitted - Ignore “5/10 grading” - limitation of Brightspace - I manually adjust

Pre-Assignment 03: - Before midnight: Available on website + Brightspace

Mini-Project 00

  • 7 submissions via Piazza (thank you - I will respond soon)
  • Due in (slightly less than) a week
  • Possible tech issues, so start early

Course Bot

Because this course is a double, I have resources to create a “bot” to help with course organization on GitHub.

Bot will begin to acknowledge completed MP#00 over the weekend.

Current name: CISSOID: CIS and Statistics bot for Organizing Instructional Delivery

  • Particular mathematical shape
  • Name means “Ivy-like”:
    • Use technology to overcome resource limitations of CUNY

Better name suggestions very welcome!

Today

  • Review of Questions from Pre-Assign #02
  • Course Project Overview
  • Introduction to Markdown and Quarto
  • Introduction to GitHub pages
  • How to ask for help

FAQs from PA#02

Q1

What is Markdown?

Per Wikipedia: “Markdown is a light-weight, plain-text, markup language specification”

  • Light-weight: relatively simple, focus on content than formatting
  • Plain-text: accessible using almost any text editor (RStudio, GitHub, VS Code, etc)
    • Not locked into specific software (e.g., MS Word)
    • Easily incorporated into a variety of technologies

Q1

What is Markdown?

  • Markup language: a ‘mini-coding language’ for text documents
    • Other famous examples: HTML, XML
  • Specification:
    • CommonMark defines ‘standard’ Markdown
    • Some software allows extensions
    • Pandoc often powers under the hood

Q2

Other than text formatting, does Markdown ha[ve] any other us[]es?

On its own, Markdown is just text formatting (but that’s a lot!)

We will use Quarto which augments markdown for reproducible research. We can embed code-and its output-inside Markdown documents.

Q3

[W]hat documents use[] Markdown?

So much! Markdown is used by Bitbucket, GitHub, OpenStreetMap, Reddit, Stack Exchange, Drupal, ChatGPT, Discord, MS Teams and many more!

With tools like Pandoc/Quarto, Markdown can be rendered to:

  • HTML
  • PDF
  • Web Slides
  • EBooks
  • Research Papers
  • Word Documents
  • PowerPoint slides
  • and so much more!

Q4

[What is] the difference between [a] Code section and [a] Nested List[? A]re they just different ways of indenting?

No. Nested lists are ‘just’ text

Code formatting enables much more if rendering engine supports it:

import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()

Q5

[H]ow are we going to use Markdown?

All written work (mini-projects and final project) in this course will be submitted using Markdown (by way of Quarto).

Specifically: - Submission pages for 5 mini-projects - Individual reports for course project - Summary (team) report for final project

You are also encouraged (but not required) to use Markdown for presentation slides (like these!)

Q6

How can I create Tables in Markdown?

Markdown has two table syntaxes:

  • an easy one with minimal control
  • a hard one which allows fine grained control (alignment, column widths, etc.)

If you are using the advanced (“pipe table”) synatx, I suggest you use RStudio’s Visual editor mode. (DEMO!)

Q7

How to create images and links?

Basic hyperlinks look like this:

[link text](https://the.url/goes/here)

If you want to embed the contents of a link, prepend it with an exclamation point. This is most useful for images:

![Image Caption](https://the.url/goes/here.png)

You can even put a link inside an image to be fancy:

[![Elephant](elephant.png)](https://en.wikipedia.org/wiki/Elephant)

Q7

How to create images and links?

Quarto automatically embeds the results of plotting code:

plot(1:5, main="Behold, a Plot!", col=2:6, cex=5, 
     pch=16, xlab="", cex.main=5)

Here, Quarto handles all the file creation and link targeting for us. If I change the code, the figure will change automatically.

Course Project Overview

Course Project

  • Teams: 3-5 classmates (either section)
  • Stages:
    • Proposal (in class presentation)
    • Mid-semester check-in (in class presentation)
    • Final: in class presentation, individual report, summary report
  • Structure:
    • Shared “Overarching Question”
    • Individual “Specific Question”

Full description online

Finding Data

Presentation Hints

  • Longest time \(\neq\) most important
  • Story, story, story! Why are you making these choices?
  • Hourglass Structure
    • Start big
    • Motivate your overarching question
    • Specific questions
    • Tie specific to overarching
    • From overarching back to big motivation
  • No less than one figure every other slide

Markdown and Quarto

Markdown and Quarto

  • Quarto implements Markdown with data-analytic extensions
  • Seamless (ideally!) integration of code and text
  • No more copy and paste

Quarto user guide is fantastic!

See also source for course materials.

Lab Activity: Part 0

If you haven’t already, install Quarto.

Lab Activity: Part 1

Create a simple PDF quarto document using the RStudio wizard.

(Note that you may need to install tinytex for this to work properly, but Quarto should install it for you automatically.)

Lab Activity: Part 2

Create a 5 slide presentation showing the Houston housing market. This should include:

  • A title slide
  • Three body slides with a figure and some text
  • A conclusion slide

You may use the following code snippets:

if(!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
txhousing |> filter(city=="Houston") |> 
             group_by(year) |> 
             summarize(sales=sum(sales)) |> 
             ggplot(aes(x=year, y=sales)) + 
                geom_line() + 
                ggtitle("Annual Houses Sold in Houston, TX")

Recall that this code needs to be between three backticks on each end (and start with r in curly braces as well.)

if(!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
txhousing |> filter(city=="Houston") |> 
    group_by(month) |> 
    summarize(avg_price=sum(volume) / sum(sales)) |> 
    mutate(month=factor(month.abb[month], levels=month.abb, ordered=TRUE)) |>
    ggplot(aes(x=month, y=avg_price)) + 
    geom_bar(stat="identity") + 
    ggtitle("Average Price of Houses Sold in Texas by Month") + 
    xlab("Month") + 
    ylab("Average Sale Price") + 
    scale_y_continuous(labels = scales::dollar)

Recall that this code needs to be between three backticks on each end (and start with r in curly braces as well.)

if(!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
txhousing |> filter(year==2015) |> 
    group_by(city) |> 
    summarize(avg_price=sum(volume) / sum(sales),
              num_sales=sum(sales)) |> 
    slice_max(num_sales, n=10) |>
    ggplot(aes(x=city, y=avg_price)) + 
    geom_bar(stat="identity") + 
    ggtitle("Average Price of Houses Sold in 2015 by City in Texas") + 
    xlab("City") + 
    ylab("Average Sale Price") + 
    scale_y_continuous(labels = scales::dollar)

Recall that this code needs to be between three backticks on each end (and start with r in curly braces as well.)

Lab Activity: Part 3

View the Quarto Demo Slides and add one new element to your slides from the previous section.

GitHub Pages

GitHub Pages

In-class discussion of what a static web page is and the role of GitHub Pages as a static web server.

How to Ask for Help

How to Ask for Help

Professional programming is at least half looking things up; at beginning stages, the fraction is even higher.

So it’s important to know how to see help the smart way:

  1. Official documentation. Free software almost never becomes famous without great documentation: R and its packages are no exception. Everything we will use in this class has solid documentation.
  1. Search Engine.

Most programming challenges have been faced by somebody before, so Google it!

Tips:

  • Include R or rstats in your search query
  • It’s better to search what you want to do rather than how you think you should do it.
  • Search programming Q&A sites like StackOverflow for specific code questions; blogs and course materials are better for “big picture” questions
  1. Ask on a Forum with a Reproducible Example

Programming fora, like StackOverflow, are full of great resources. Most of what you need is already there. But if you need to ask a new question, make sure to create a minimal reproducible example

Make it easy for your helper to help you.

  • Minimal: narrow down to as few lines of code as possible
  • Reproducible: self-contained without dependencies on libraries (if can be avoided); load all packages needed; use standard data

Pro-Tip: You’ll solve over 50% of your problems in trying to create an MRE.

Tips:

  • Show the code, even if it doesn’t work
  • Send code as text, not screenshot (so your helper can run it)
  • Smaller examples help narrow down problems
  • Avoid IO (file input and output) unless specifically relevant to problem
  • Remove everything you can

The reprex R package helps with this: see this talk.

For this class, rely on Piazza!