STA/OPR 9750 - Mini Projects

In lieu of traditional homework, STA/OPR 9750 has a series of mini-projects designed to achieve several interlocking goals:

  1. Improve your skills at data analysis
  2. Improve your improve your ability to give feedback on data analysis work
  3. Seed a ‘portfolio’ of data science work you can demonstrate to potential employers

Each Mini-Project will be submitted via GitHub, an industry-standard code management platform, as both raw analysis code and as a HTML document hosted on GitHub pages.

After each Mini-Project is submitted, 2-3 peer reviewers will be assigned to give feedback and to assign an initial grade following an instructor provided rubric. This feedback will be given via GitHub Issues.

In order to ensure good peer feedback, the peer feedback will be evaluated by the instructor in a “meta-review” worth a small fraction of the overall grade.

If you believe your mini-project has received inaccurate peer feedback, please contact the instructor directly within 48 hours of the peer feedback deadline. No student-initiated requests for re-grading will be accepted after that time, though the instructor may re-grade the work during the meta-review stage.

Mini-Projects

Mini-Project #00: Course Set-Up

Due Dates:

  • Released to Students: 2024-08-29
  • Initial Submission: 2024-09-11 11:45pm ET
  • Peer Feedback:
    • Peer Feedback Assigned: 2024-09-12
    • Peer Feedback Due: 2024-09-18 11:45pm ET

In the ungraded Mini-Project #00, there is no data analysis required, but you will set up the basic web tooling used to submit projects #01 to #04.

Note that, even though ungraded, Mini-Project #00 must be completed to remain enrolled in this course and before any other Mini-Projects can be submitted.

Mini-Project #01: Fiscal Characteristics of Major US Public Transit Systems

Due Dates:

  • Released to Students: 2024-09-12
  • Initial Submission: 2024-09-25 11:45pm ET
  • Peer Feedback:
    • Peer Feedback Assigned: 2024-09-26
    • Peer Feedback Due: 2024-10-02 11:45pm ET

In Mini-Project #01, students will investigate the fiscal characteristics of US public transit authorities. In this project, I handle the data import and tidying; students are mainly responsible for “single table” dplyr operations (mutate, group_by, summarize, select, arrange, rename) to produce summary statistics.

Mini-Project #02: Business of Show Business

Due Dates:

  • Released to Students: 2024-09-26
  • Initial Submission: 2024-10-23 11:45pm ET
  • Peer Feedback:
    • Peer Feedback Assigned: 2024-10-24
    • Peer Feedback Due: 2024-10-30 11:45pm ET

In Mini-Project #02, students will act as Hollywood development executives, diving deep into Hollywood history to develop a pitch for a new movie. Students develop their skills in working with large (\(\approx 20\) GB) data and in working with relational data structures (joins and their kin). This project uses the IMDb Non-Commerical Data Release.

Mini-Project #03: Exploring the Effect of State-Level Electoral College Vote Allocation on Presidential Elections

Due Dates:

  • Released to Students: 2024-10-24
  • Initial Submission: 2024-11-13 11:45pm ET
  • Peer Feedback:
    • Peer Feedback Assigned: 2024-11-14
    • Peer Feedback Due: 2024-11-20 11:45pm ET

In Mini-Project #03, students will analyze the effect of the US Electoral College on US Presidential elections, with an eye towards “fairness” and proportionality. They will compare different ways that states allocate their Electoral College Votes to determine whether this has any systematic effect on who is elected the President of these United States.

Students will gain experience visualizing spatial data through the use of “red/blue electoral maps”, working with official electoral records, and systematically accessing public data repositories, such as those of the US Census Bureau. This project uses data from the MIT Election Lab and from the Census Bureau’s TIGER repository.

Mini-Project #04: Monte Carlo-Informed Selection of CUNY Retirement Plans

Due Dates:

  • Released to Students: 2024-11-14
  • Initial Submission: 2024-12-04 11:45pm ET
  • Peer Feedback:
    • Peer Feedback Assigned: 2024-12-05
    • Peer Feedback Due: 2024-12-11 11:45pm ET

In Mini-Project #04, students will compare two different retirement plans, using simulation to assess the probabilities of various good and bad retirement outcomes. They will use historical financial and macro-economic data to perform a Monte Carlo analysis of two complex financial instruments. Based on the outputs of the Monte Carlo simulations, students will compare these two retirement plans on a variety of important metrics.

Students will gain experience accessing data from real commercial and government APIs, implementing complex financial simulations, and applying Monte Carlo / bootstrap inference techniques to quantify uncertainty in their predictions. As an additional challenge, students may explore technologies used to create interactive dashboards, allowing for more dynamic and nuanced analysis of simulation results. While the focus of this project is a comparison of two specific CUNY retirement plans, the skills developed here can be applied to a wide variety of personal and commerical financial decisions.