STA 9750 Mini-Project #04: Just the Fact(-Check)s, Ma’am!

\[\newcommand{\P}{\mathbb{P}} \newcommand{\E}{\mathbb{E}}\]

Due Dates

  • Released to Students: 2025-11-01
  • Initial Submission: 2025-12-05 11:59pm ET on GitHub and Brightspace
  • Peer Feedback:
    • Peer Feedback Assigned: 2025-12-08 on GitHub
    • Peer Feedback Due: 2025-12-19 11:59pm ET on GitHub

Estimated Time to Complete: 5 Hours

Estimated Time for Peer Feedback: 1 Hour


Introduction

Welcome to Mini-Project #04!

On August 1st, 2025, President Donald Trump announced in a post on Truth Social that he was instructing his staff to fire Dr. Erika McEntarfer, the Commissioner of Labor Statistics. As the head of the Bureau of Labor Statistics, Dr. McEntarfer had responsibility for all of the BLS’s statistical programs and reporting. None of these reports is more closely watched than the BLS’s Monthly Current Employment Statistics (CES), colloquially known as the “jobs number”. The monthly jobs report regularly moves financial markets, is scrutinized and analyzed by economists, investors, and business leaders of every stripe, and is used by politicians and political analysts as the primary measure of “the economy,” alongside major stock indices.1

The sudden dismissal of Dr. McEntarfer was quickly and emphatically criticized by economists from both left- and right-of-center, who worried that even the appearance of politicization would permanently cast doubt on the quality of BLS’s work. Defenders of the President’s actions pointed to the size and direction of recent revisions to CES numbers as a cause for concern. Supporters of BLS and Dr. McEntarfer concede that BLS and CES face non-trivial challenges but deny any political valence to these challenges. Indeed, as one former BLS Commissioner has noted, the Commissioner has minimal-to-no ability to view or edit CES numbers before publication, and revisions are an expected part of the process.

This report, jointly published by Bloomberg News and the Peterson Institute for International Economics (a left-of-center think-tank), goes into great detail about the source, necessity, and procedures for month-over-month revisions.

In this final Mini-Project, you will investigate recent patterns in BLS CES estimates and revisions, with an eye to evaluating the truth of the President’s claimed rationale for firing Dr. McEntarfer. While briefer than the previous Mini-Projects, this project will bring together all of the skills you have developed in this course:

  • Accessing data from the web using HTTP requests (httr2)
  • Extracting data from HTML and cleaning it (rvest, stringr)
  • Joining together data from different tables (multi-table dplyr)
  • “Pivoting” and preparing data for analysis (single-table dplyr)
  • Exploring and visualizing data (ggplot2)
  • Statistical analysis and inference (infer or stats)
  • Communicating your findings using web-based reproducible research tools (Quarto)

For this mini-project, you will evaluate the veracity of these competing claims about the accuracy of BLS reports in the form of a political “fact check.” That is, you will find a claim made by a public figure and see whether the data supports, refutes, or - most likely - is only partially consistent with that claim.

There is no right answer to this analysis: as in so many things, the specific way you interpret the question will dictate the answer you find. It is true that recent BLS revisions have been among the largest in history, but the working population is also among the largest in history, so proportionally these revisions may not be quite so large. As you complete this assignment, I strongly encourage you to work apolitically, letting the data and your analysis lead wherever they may: this is a complex question about statistical methodology and evaluation, and people of good will may and do disagree on it.

Any sign of politics leaching into the peer evaluation process will be rapidly and forcefully addressed by the instructor.

Note that, compared to previous mini-projects, the scope of this project is smaller: in light of this, and the more advanced skills you have spent the past 3 months developing, this mini-project should be the least time-demanding of the course. (It may still feel like the most challenging, simply because you are responsible for more steps of the data analysis than in any prior project.) At this point in the course, you should be spending the majority of your out-of-class hours on your Course Project.

The Final Mini-Project

This mini-project completes our whirlwind tour of several different forms of data-driven writing:

  • Marketing and Press Releases
  • Policy Briefs and Political Lobbying
  • Program Design and Non-Profit Grant Writing
  • Fact Checking Controversial Claims

You have applied your skills to a wide range of topics, including trending content on Netflix, Forestry records of New York City, housing development permitting, and employment reports.

There are, of course, many other ways that data can be used to generate and communicate insights and many other topics where your skills can be applied, but hopefully this “hit parade” has given you a sense of just how widely your new skills can take you. In each of these domains, you have seen how sophisticated and thoughtful analysis - not simply computing a single mean or regression coefficient - has allowed deeper understanding of the complexities of real world problems. You have learned to think beyond the simple binary “correct/incorrect” of an analysis and to see that even superficially simple questions - “is this true?”, “what is the average?”, “is there a correlation?” - can lead down deep and rewarding rabbit holes of complexity. But those rabbit holes are where your real power as a data analyst can be found.

The tools of quantitative analysis and communication you have developed in this course can be used in essentially infinite contexts - we have only scratched the surface - and I’m excited to see what you do in the remainder of this course, in your remaining time at Baruch, and in your future careers.

Student Responsibilities

Recall our basic analytic workflow and table of student responsibilities:

  • Data Ingest and Cleaning: Given a single data source, read it into R and transform it to a reasonably useful standardized format.
  • Data Combination and Alignment: Combine multiple data sources to enable insights not possible from a single source.
  • Descriptive Statistical Analysis: Take a data table and compute informative summary statistics from both the entire population and relevant subgroups.
  • Data Visualization: Generate insightful data visualizations to spur insights not attainable from point statistics.
  • Inferential Statistical Analysis and Modeling: Develop relevant predictive models and statistical analyses to generate insights about the underlying population and not simply the data at hand.
Students’ Responsibilities in Mini-Project Analyses

[Table: which of the stages above (Ingest and Cleaning; Combination and Alignment; Descriptive Statistical Analysis; Visualization) students are responsible for in each of Mini-Projects #01 through #04. Responsibility grows over the sequence, with Mini-Projects #02 and #03 carrying partial (½) responsibility for some stages, and Mini-Project #04 placing every stage in students’ hands.]

In this mini-project, you are in charge of the whole pipeline, from data acquisition to statistical inference. The rubric below evaluates your work on all aspects of this project.

Rubric

STA 9750 Mini-Projects are evaluated using peer grading with meta-review by the course GTAs. The following rubric will be used for this mini-project:

Each course element is scored as Excellent (9-10), Great (7-8), Good (5-6), Adequate (3-4), or Needs Improvement (1-2):

Written Communication

  • Excellent (9-10): Report is well-written and flows naturally. Motivation for key steps is clearly explained to the reader without excessive detail. Key findings are highlighted and appropriately given context.
  • Great (7-8): Report has no grammatical or writing issues. Writing is accessible and flows naturally. Key findings are highlighted, but lack suitable motivation and context.
  • Good (5-6): Report has no grammatical or writing issues. Key findings are present but insufficiently highlighted.
  • Adequate (3-4): Writing is intelligible, but has some grammatical errors. Key findings are obscured.
  • Needs Improvement (1-2): Report exhibits significant weakness in written communication. Key points are difficult to discern.

Project Skeleton

  • Excellent (9-10): Code completes all instructor-provided tasks correctly. Responses to open-ended tasks are particularly insightful and creative.
  • Great (7-8): Code completes all instructor-provided tasks satisfactorily.
  • Good (5-6): Response to one instructor-provided task is skipped, incorrect, or otherwise incomplete.
  • Adequate (3-4): Responses to two instructor-provided tasks are skipped, incorrect, or otherwise incomplete.
  • Needs Improvement (1-2): Responses to three or more instructor-provided tasks are skipped, incorrect, or otherwise incomplete.

Formatting & Display

  • Excellent (9-10): Tables and figures are full ‘publication-quality.’ Report includes at least one animated visualization designed to effectively communicate findings.
  • Great (7-8): Tables have well-formatted column names, suitable numbers of digits, and attractive presentation. Figures are ‘publication-quality,’ with suitable axis labels, well-chosen structure, attractive color schemes, titles, subtitles, and captions, etc.
  • Good (5-6): Tables are well-formatted, but still have room for improvement. Figures are above ‘exploratory-quality,’ but do not reach full ‘publication-quality.’
  • Adequate (3-4): Tables lack significant ‘polish’ and need improvement in substance (filtering and down-selecting of presented data) or style. Figures are suitable to support claims made, but are ‘exploratory-quality,’ reflecting minimal effort to customize and ‘polish’ beyond ggplot2 defaults.
  • Needs Improvement (1-2): Unfiltered ‘data dump’ instead of curated table. Baseline figures that do not fully support claims made.

Code Quality

  • Excellent (9-10): Code is (near) flawless. Code passes all styler and lintr type analyses without issue.
  • Great (7-8): Comments give context of the analysis, not simply defining functions used in a particular line.
  • Good (5-6): Code has well-chosen variable names and basic comments.
  • Adequate (3-4): Code executes properly, but is difficult to read.
  • Needs Improvement (1-2): Code fails to execute properly.

Data Preparation

  • Excellent (9-10): Data import is fully-automated and efficient, taking care to only download from web-sources if not available locally.
  • Great (7-8): Data is imported and prepared effectively, in an automated fashion with minimal hard-coding of URLs and file paths.
  • Good (5-6): Data is imported and prepared effectively, though source and destination file names are hard-coded.
  • Adequate (3-4): Data is imported in a manner likely to have errors.
  • Needs Improvement (1-2): Data is hard-coded and not imported from an external source.

For this mini-project, no more than 4 total points of extra credit can be awarded. Opportunities for extra credit exist for students who go above and beyond the instructor-provided scaffolding. Specific opportunities for extra credit can be found below.

Students pursuing careers in data analytics are strongly encouraged to go beyond the strict ambit of the mini-projects to

  1. further refine their skills;
  2. learn additional techniques that can be used in the final course project; and
  3. develop a more impressive professional portfolio.

Because students are encouraged to use STA 9750 mini-projects as the basis for a professional portfolio, the basic skeleton of each project will be released under a fairly permissive usage license. Take advantage of it!

Submission Instructions

After completing the analysis, write up your findings, showing all of your code, using a dynamic Quarto document and post it to your course repository. The qmd file should be named mp04.qmd (lower case!) so the rendered document can be found at docs/mp04.html in your repository and will be served at the URL:2

https://<GITHUB_ID>.github.io/STA9750-2025-FALL/mp04.html

You can use the helper function mp_start available in the Course Helper Functions to create a file with the appropriate name and some meta-data already included. Do so by running the following command at the R Console:

source("https://michael-weylandt.com/STA9750/load_helpers.R"); mp_start(N=04)

After completing this mini-project, upload your rendered output and necessary ancillary files to GitHub to make sure your site works. The mp_submission_ready function in the Course Helper Functions can perform some of these checks automatically. You can run this function by running the following commands at the R Console:

source("https://michael-weylandt.com/STA9750/load_helpers.R"); mp_submission_ready(N=04)

Once you confirm this website works (substituting <GITHUB_ID> for the actual GitHub username provided to the professor in MP#00, of course), open a GitHub issue on the instructor’s repository to submit your completed work.

The easiest way to do so is by use of the mp_submission_create function in the Course Helper Functions, which can be used by running the following command at the R Console:

source("https://michael-weylandt.com/STA9750/load_helpers.R"); mp_submission_create(N=04)

Alternatively, if you wish to submit manually, open a new issue at

https://github.com/michaelweylandt/STA9750-2025-FALL/issues/new .

Title the issue STA 9750 <GITHUB_ID> MiniProject #04 and fill in the following text for the issue:

Hi @michaelweylandt!

I've uploaded my work for MiniProject #**04** - check it out!

<https://<GITHUB_ID>.github.io/STA9750-2025-FALL/mp04.html>

Once the submission deadline passes, the instructor will tag classmates for peer feedback in this issue thread.

Additionally, a PDF export of this report should be submitted on Brightspace. To create a PDF from the uploaded report, simply use your browser’s ‘Print to PDF’ functionality.

NB: The analysis outline below specifies key tasks you need to perform within your write up. Your peer evaluators will check that you complete these. You are encouraged to do extra analysis, but the bolded Tasks are mandatory.

NB: Your final submission should look like a report, not simply a list of facts answering questions. Add introductions, conclusions, and your own commentary. You should be practicing both raw coding skills and written communication in all mini-projects. There is little value in data points stated without context or motivation.

Mini-Project #04: Just the Fact(-Check)s, Ma’am!

Data Acquisition

Your analysis will depend on two separate data sources:

  • The (final) CES estimates of the total employment level of the United States; and
  • The cycle-to-cycle revisions of the CES estimate.

For this mini-project, we will focus exclusively on seasonally-adjusted total non-farm payroll, as this is the most politically salient number that CES reports. Note also that CES principally computes the total payroll number, but the quantity of primary interest in political discussion is typically the change in this number.

Final CES Estimates

The Final CES Estimates are published by BLS in several places. We are going to access them through BLS Data Finder 1.1. Before attempting to access this data in R, it is useful to practice accessing it through your web browser.

Navigate to https://data.bls.gov/toppicks?survey=ce. Select the first row “Total Nonfarm Employment, Seasonally Adjusted” and hit “Retrieve Data” at the bottom of the page. You will then be taken to a page with the URL https://data.bls.gov/pdq/SurveyOutputServlet and approximately 10 years of data below. Adjust the years at the top of the page to start at 1979 and hit “Go” to view the full table.

Note that you cannot go directly to https://data.bls.gov/pdq/SurveyOutputServlet and get useful results. This page requires an HTTP request more sophisticated than a simple GET in order to work properly.

In order to complete this mini-project, you will need to make an equivalent HTTP request using httr2 to get the entire table.

Task 1: Download CES Total Nonfarm Payroll

Using httr2 and rvest, access and download the CES Total Nonfarm Payroll Seasonally Adjusted for each month from January 1979 to June of 2025.3 Return your results in a data frame of the following form:

date         level
1979-01-01   88808
1979-02-01   89055

continued through 2025-06-01.

Replicating a suitable HTTP request in R using httr2 can be subtle, so I recommend following these steps:

  1. Use your browser’s introspection tools (Inspect, Developer Tools, or similar) to inspect the final HTTP request that shows the table starting in 1979.

  2. Replicate all elements of this HTTP request in httr2, including both

    • the correct HTTP verb and
    • the request body (using form-encoding via req_body_form())

    and confirm that your request is successful.

  3. View the HTML source of the page and identify the table element you want to extract.

  4. Using the tools of rvest, select the correct HTML element and turn it into an R data frame.

  5. Pivot that table into an appropriate format.

  6. Modify the column types as needed using the as.*() family of functions for basic conversions or the functions from the lubridate package to parse dates and times. Finally, use drop_na() to remove any values that do not convert directly (including the footnote row about preliminary values).

    • The ym function from the lubridate package will be useful in parsing dates. It takes a string like 1979 Jan and returns a Date object of 1979-01-01. You may need to use string functions to combine multiple columns into a single string vector that can be passed to ym.
API Usage Required

You must use httr2 and rvest (not httr) to access and process the page directly to download this data.

A primary learning goal of this Mini-Project is to practice use of web scraping for data acquisition. If you download a data file manually, from a different source, use a “wrapper” package, or otherwise avoid scraping the page directly, you will receive a 0 for the Project Skeleton, Code Quality, and Data Preparation portions of your grade.

This does not need to be a long bit of code when it is all done. My code to do all of these steps and to fully clean and parse this data totals only 17 lines, generously formatted, after a bit of trial-and-error to see which elements of the HTTP request are not actually essential.
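
To make this concrete, here is a minimal sketch of the request-and-parse pattern described above. The form fields, the CSS selector, and the column names used below are placeholders, not the real values: substitute the field names and values you observed in your browser’s developer tools and the table structure you find in the HTML source.

library(httr2)
library(rvest)
library(dplyr)
library(tidyr)
library(readr)
library(lubridate)

resp <- request("https://data.bls.gov/pdq/SurveyOutputServlet") |>
    req_method("POST") |>                  # the page expects more than a simple GET
    req_body_form(
        # Placeholder form fields -- replace with those from your inspected request
        startYear = "1979",
        endYear   = "2025"
    ) |>
    req_perform()

ces <- resp |>
    resp_body_html() |>
    html_element("#your-table-id-here") |> # placeholder selector -- check the HTML source
    html_table() |>
    # Assumes a 'Year' column plus one column per month; adjust to the real layout
    pivot_longer(-Year, names_to = "month", values_to = "level") |>
    mutate(date  = ym(paste(Year, month)),
           level = parse_number(as.character(level))) |>
    select(date, level) |>
    drop_na() |>
    arrange(date)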

CES Revisions

Revisions to the CES Non-Farm Payroll Employment number are reported on a static page:

https://www.bls.gov/web/empsit/cesnaicsrev.htm

Task 2: Download CES Revisions Tables

Using httr2 and rvest, access and download the CES Revisions for each month from January 1979 to June of 2025.4 Return your results in a data frame of the following form:

date         original   final   revision
1979-01-01   325        184     -82
1979-02-01   301        294     -7

continued through June 2025. Here the revision column is obtained by taking the difference between final (the third estimate) and original (the first estimate).

The structure of this page is somewhat complex, so I recommend the following steps:

  1. Review the source of the page linked above and identify the tables of interest. There is a pattern to the HTML identifiers used for these tables: note it for later use.

  2. Construct a suitable httr2 request and verify that it can indeed access the BLS page linked above.

    • A ‘default’ request will receive a 403 Forbidden error when trying to access the BLS page. Refer back to previous Mini-Projects for examples of how you can access BLS resources using httr2.
  3. Write a function to extract the desired data from the 2024 table. (It will be easier to develop your function on the 2024 table rather than the incomplete 2025 table.) Your function should take a single year as input and return a twelve-row data frame with the columns shown above.

    Some hints:

    • The column names of these tables are difficult to access since they are spread across three rows. It will likely be easier to select only the relevant table body, pass it to rvest::html_table() with header = FALSE, and manually set the relevant column names.

    • You will need to use the slice function (or some other slice_*-type function) to select only the first 12 rows from each table.

    • The select function can be used to select columns by position and to rename them at the same time: e.g.,

      library(dplyr)
      penguins |> select(
         bill_length = 3, 
         bill_depth  = 4, 
         body_mass   = 6
      ) |> head(n=5)
        bill_length bill_depth body_mass
      1        39.1       18.7      3750
      2        39.5       17.4      3800
      3        40.3       18.0      3250
      4          NA         NA        NA
      5        36.7       19.3      3450
    • The ym function from the lubridate package will be useful in parsing dates. It takes a string like 1979 Jan and returns a Date object of 1979-01-01. You may need to use string functions to combine multiple columns into a single string vector that can be passed to ym.

  4. Use either the map function from the purrr package or a for loop to apply your function to all years from 1979 to 2025 and to combine the results into a data frame.

    The purrr::list_rbind or dplyr::bind_rows functions will be useful in combining the different years into a large single table.

This should not be a long bit of code when it is all done. My code to do all of these steps totals only 18 lines, generously formatted.
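
As a rough guide, here is a sketch of how the steps above fit together; the User-Agent string, the id pattern passed to html_element(), and the column positions are all placeholders or assumptions that you will need to replace with the values you identified in steps 1-3.

library(httr2)
library(rvest)
library(dplyr)
library(tidyr)
library(purrr)
library(readr)
library(lubridate)

# One request for the whole page; a custom User-Agent is one way to avoid the
# 403 Forbidden error mentioned above (see your earlier Mini-Projects)
revisions_page <- request("https://www.bls.gov/web/empsit/cesnaicsrev.htm") |>
    req_user_agent("STA 9750 student (your_email@example.com)") |>
    req_perform() |>
    resp_body_html()

get_revisions <- function(year) {
    revisions_page |>
        # Placeholder id pattern -- use the pattern you noted in step 1
        html_element(paste0("#placeholder-table-id-", year)) |>
        html_table(header = FALSE) |>
        slice(1:12) |>
        # Column positions are assumptions -- check them against the page
        select(month = 1, original = 2, final = 4) |>
        mutate(date     = ym(paste(year, month)),
               across(c(original, final), \(x) parse_number(as.character(x))),
               revision = final - original) |>
        select(date, original, final, revision)
}

ces_revisions <- map(1979:2025, get_revisions) |>
    list_rbind() |>
    drop_na()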

API Usage Required

You must use httr2 and rvest (not httr) to access and process the page directly to download this data.

A primary learning goal of this Mini-Project is to practice use of web scraping for data acquisition. If you download a data file manually, from a different source, use a “wrapper” package, or otherwise avoid scraping the page directly, you will receive a 0 for the Project Skeleton, Code Quality, and Data Preparation portions of your grade.

Data Integration and Exploration

Join together these two tables and perform exploratory data analysis to investigate trends in CES levels and revisions.

Task 3: Data Exploration and Visualization

After joining your two tables, perform both visual (ggplot2-based) and quantitative (dplyr-based) exploratory analysis.

You will need to:

  • Compute at least 6 statistics about CES over the past 45 years
  • Construct at least 4 visualizations of CES estimates and accuracy over the past 45 years.

You can perform any analysis that supports your final write-up (below), but here are some ideas that might be useful:

  1. What and when were the largest revisions (positive and negative) in CES history?
  2. What fraction of CES revisions are positive in each year? In each decade?
  3. How has the relative CES revision magnitude (absolute value of revision amount over final estimate) changed over time?
  4. How has the absolute CES revision as a percentage of overall employment level changed over time?
  5. Are there any months that systematically have larger or smaller CES revisions?
  6. How large is the average CES revision in absolute terms? In terms of percent of that month’s CES level?
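
To show how the pieces connect, here is a minimal sketch of the join plus one example statistic and one example visualization. It assumes data frames named ces (date, level) and ces_revisions (date, original, final, revision) as sketched in Tasks 1 and 2; adjust the names to match your own objects.

library(dplyr)
library(ggplot2)
library(lubridate)

ces_all <- ces |>
    left_join(ces_revisions, by = "date")

# Example statistic: fraction of positive revisions by decade
ces_all |>
    mutate(decade = 10 * (year(date) %/% 10)) |>
    group_by(decade) |>
    summarize(frac_positive = mean(revision > 0, na.rm = TRUE))

# Example visualization: revision size relative to that month's employment level
ces_all |>
    ggplot(aes(x = date, y = revision / level)) +
    geom_line() +
    labs(x = NULL,
         y = "Revision as a fraction of CES level",
         title = "Relative size of CES revisions over time")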

Statistical Analysis

While Exploratory Data Analysis (EDA) can inspire questions, it remains useful to perform formal statistical inference to assess whether observed differences are larger than can be explained by pure randomness. The infer package can be used to neatly integrate \(t\)-tests and binomial proportion tests into a tidyverse workflow.

We will use the t_test and prop_test functions from this package to complete our analyses. These can be used as demonstrated below.

  • Example 01: Suppose we want to test whether Gentoo penguins are heavier than Adelie penguins on average. Since we have a numerical response and two different groups, this is a two-sample \(t\) test:

    library(infer)
    library(tidyverse)
    penguins_ok <- penguins |>
        drop_na() |>
        # Things will work better if we ensure species is a character vector 
        # and not a factor vector
        # 
        # You probably don't need to copy this step since your import is unlikely
        # to 'accidentally' make a factor
        mutate(species = as.character(species)) 
    
    penguins_ok |>
        filter(species != "Chinstrap") |>
        t_test(body_mass ~ species, 
               order = c("Adelie", "Gentoo"))
    # A tibble: 1 × 7
      statistic  t_df  p_value alternative estimate lower_ci upper_ci
          <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>    <dbl>
    1     -23.3  242. 1.22e-63 two.sided     -1386.   -1504.   -1269.

    From this, we see that Adelie penguins are approximately 1,386 grams lighter on average than Gentoos and the \(p\)-value is far less than 0.01, so the difference is unlikely to be solely due to sampling variability.

    Here, we perform the test by specifying the response (quantity of interest) on the left-hand side of the ~ and the explanatory variable (that which might meaningfully predict a difference in groups) on the right-hand side of the ~. If we swap the order (or don’t provide it), the sign of the estimated difference may differ, but the actual \(p\)-value won’t change for a two-sided test (as done here).

    We can extend this analysis to perform it separately for each of three years using a bit of nested data trickery. The mechanics of this are more advanced, but you should be able to extend this to your own analysis by just changing variable names:

    penguins_ok |>
        filter(species != "Chinstrap") |>
        group_by(year) |>
        nest() |>
        mutate(p_value_adelie_gentoo = map(data, \(d) t_test(d, 
                                                             body_mass ~ species,
                                                             order = c("Adelie", "Gentoo")))) |>
        unnest(everything())
    # A tibble: 265 × 15
    # Groups:   year [3]
        year species island  bill_len bill_dep flipper_len body_mass sex   statistic
       <int> <chr>   <fct>      <dbl>    <dbl>       <int>     <int> <fct>     <dbl>
     1  2007 Adelie  Torger…     39.1     18.7         181      3750 male      -11.6
     2  2007 Adelie  Torger…     39.5     17.4         186      3800 fema…     -11.6
     3  2007 Adelie  Torger…     40.3     18           195      3250 fema…     -11.6
     4  2007 Adelie  Torger…     36.7     19.3         193      3450 fema…     -11.6
     5  2007 Adelie  Torger…     39.3     20.6         190      3650 male      -11.6
     6  2007 Adelie  Torger…     38.9     17.8         181      3625 fema…     -11.6
     7  2007 Adelie  Torger…     39.2     19.6         195      4675 male      -11.6
     8  2007 Adelie  Torger…     41.1     17.6         182      3200 fema…     -11.6
     9  2007 Adelie  Torger…     38.6     21.2         191      3800 male      -11.6
    10  2007 Adelie  Torger…     34.6     21.1         198      4400 male      -11.6
    # ℹ 255 more rows
    # ℹ 6 more variables: t_df <dbl>, p_value <dbl>, alternative <chr>,
    #   estimate <dbl>, lower_ci <dbl>, upper_ci <dbl>

    Here, we see that our result has columns for the \(t\) statistic, the \(p\)-value, the estimated difference, and many other useful inferential quantities.

  • Example 02: In other contexts, we may have a categorical response. In this case, we should not perform a \(t\)-test on means; instead, we want to use a binomial proportion test via the prop_test function. For instance, if we want to see whether Adelie penguins are more likely to be female than Gentoos:

    library(infer)
    library(tidyverse)
    penguins_ok |>
       filter(species != "Chinstrap") |>
       prop_test(sex ~ species, 
                 order = c("Adelie", "Gentoo"))
    # A tibble: 1 × 6
      statistic chisq_df p_value alternative lower_ci upper_ci
          <dbl>    <dbl>   <dbl> <chr>          <dbl>    <dbl>
    1   0.00650        1   0.936 two.sided     -0.116    0.141

    As before, we see that there is basically no evidence that the fraction of female penguins differs by species. Note the key difference between the t_test and the prop_test: in t_test the response is a numeric variable and we are testing the mean of that distribution; in a prop_test the response is a Boolean (TRUE/FALSE) value and we are testing the probability of observing a TRUE.5

    We can also use this with a derived quantity. For instance, if we want to see if Gentoos are more likely to be over 4000 grams than non-Gentoos:

    penguins |>
        mutate(is_gentoo = (species == "Gentoo"),
               over_4k = body_mass > 4000) |>
        prop_test(over_4k ~ is_gentoo, 
                  order = c("TRUE", "FALSE"))
    # A tibble: 1 × 6
      statistic chisq_df  p_value alternative lower_ci upper_ci
          <dbl>    <dbl>    <dbl> <chr>          <dbl>    <dbl>
    1      181.        1 3.50e-41 two.sided      0.699    0.828

    Here, we see that there is strong statistical evidence (in the form of a tiny \(p\)-value) that being a Gentoo penguin increases the probability of weighing over 4000 grams.

    We can also modify this to be a one-sided test:

    penguins |>
        mutate(is_gentoo = (species == "Gentoo"),
               over_4k = body_mass > 4000) |>
        prop_test(over_4k ~ is_gentoo, 
                  alternative = "greater",
                  order = c("TRUE", "FALSE"))
    # A tibble: 1 × 6
      statistic chisq_df  p_value alternative lower_ci upper_ci
          <dbl>    <dbl>    <dbl> <chr>          <dbl>    <dbl>
    1      181.        1 1.75e-41 greater        0.709        1

    Finally, recall that these values can be pulled out of the test result and used in the body of your text using the pull function from the dplyr package.
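
    For example (a minimal sketch, reusing penguins_ok from the examples above):

    test_result <- penguins_ok |>
        filter(species != "Chinstrap") |>
        t_test(body_mass ~ species, order = c("Adelie", "Gentoo"))

    # Extract a single number for use in inline text in your Quarto document
    p_val <- test_result |> pull(p_value)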

Task 4: Statistical Inference

Use the t_test and prop_test functions to perform at least 2 statistical tests about CES revisions.

You can perform any tests that support your final write-up (below), but here are some ideas that might be useful:

  1. Has the fraction of negative revisions increased post-2000?

  2. Has the fraction of revisions of more than 1% increased post-2020?

  3. Is the average revision significantly different from zero?

  4. Has the average revision increased post-2020?

  5. Are revisions larger when the underlying change in CES level is larger?

    You can binarize the CES change as “large” or “not” and use the two-sample tests described above, or use the cor.test function included in base R (not infer).
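
As an illustration of the first idea above, here is a minimal sketch, assuming the joined ces_all data frame sketched in the previous section (the column names are assumptions; adapt them to your own data):

library(infer)
library(dplyr)

ces_all |>
    mutate(is_negative = revision < 0,
           post_2000   = date >= as.Date("2000-01-01")) |>
    prop_test(is_negative ~ post_2000,
              order = c("TRUE", "FALSE"))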

Fact Check BLS Revisions

Task 5: Fact Checks of Claims about BLS

Go online and find two politicians or commentators talking about CES revisions and the firing of Dr. McEntarfer. Use their claims as the basis of a ‘fact check’ and evaluate them using the PolitiFact Truth-O-Meter on a scale from “True” to “Pants-on-Fire”.

For each claim, your fact check should include:

  • A hypothesis test of the underlying claim
  • At least 3 numbers (statistics) that you computed in Task 3
  • At least 2 visualizations (plots) from Task 3

At least one of your analyses must use both the absolute levels (Task 1) and the revisions (Task 2). If you cannot find a claim to ‘fact check’ that uses the absolute levels, you can create a fake claim (from a fake politician, clearly marked as such) as the basis of one of your fact checks.

Note that this final write-up can be the place where your results from Tasks 3 and 4 are presented.

If the claim you choose to fact check has an explicit partisan basis, the following may be useful:

library(tidyverse)
presidents_party <- tidyr::expand_grid(year=1979:2025, 
                                       month = month.name, 
                                       president = NA, 
                                       party = NA) |> 
    mutate(president = case_when(
        (month == "January")  & (year == 1979) ~ "Carter",
        # BLS jobs reports come out on the first Friday, so February
        # is the first time a new president 'owns' the jobs number 
        (month == "February") & (year == 1981) ~ "Reagan",
        (month == "February") & (year == 1989) ~ "Bush 41",
        (month == "February") & (year == 1993) ~ "Clinton",
        (month == "February") & (year == 2001) ~ "Bush 43",
        (month == "February") & (year == 2009) ~ "Obama",
        (month == "February") & (year == 2017) ~ "Trump I",
        (month == "February") & (year == 2021) ~ "Biden",
        (month == "February") & (year == 2025) ~ "Trump II",
    )) |>
    tidyr::fill(president) |>
    mutate(party = if_else(president %in% c("Carter", "Clinton", "Obama", "Biden"), 
                           "D", 
                           "R")) 

Extra Credit Opportunities

For this mini-project, no more than 4 total points of extra credit may be awarded.

Above, you were required to use the infer package, which wraps the traditional (theory-based) Student’s \(t\)-test and binomial proportion test, but the primary aim of the infer package is actually to provide an easy-to-use interface for computationally-intensive statistical inference. For extra credit, you will explore this additional functionality.

In particular, one point of extra credit may be awarded for each of the following:

  • Providing a non-technical explanation of computationally-intensive statistical inference. By “non-technical”, I mean at a level suitable for inclusion on a general-audience site like PolitiFact, not in a statistics or economics textbook or research paper.
  • Including a schematic visualization of how these computationally-intensive methods work. (This is not a ggplot of a sampling distribution.) This should be something in a “flowchart” or “how-to guide” style, not the results of a specific analysis.
  • Applying computationally-intensive techniques in your fact check. (You still need to perform a theory-based parametric test, so the infer usage here would be a second statistical test of the same underlying claim.)

To earn any of these extra credit points, you must use computationally-intensive inference, i.e., techniques like the bootstrap or a permutation test. It is not enough to simply sample from a parametric null distribution using infer’s generate(type = "draw") mode.
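
For reference, a permutation test in infer follows the specify() / hypothesize() / generate() / calculate() pipeline. The sketch below assumes the ces_all data frame from earlier and tests whether the mean revision differs before and after 2020; it is one possible approach, not a required recipe.

library(infer)
library(dplyr)

ces_perm <- ces_all |>
    mutate(era = if_else(date >= as.Date("2020-01-01"), "post-2020", "pre-2020"))

obs_diff <- ces_perm |>
    specify(revision ~ era) |>
    calculate(stat = "diff in means", order = c("post-2020", "pre-2020"))

null_dist <- ces_perm |>
    specify(revision ~ era) |>
    hypothesize(null = "independence") |>
    generate(reps = 1000, type = "permute") |>
    calculate(stat = "diff in means", order = c("post-2020", "pre-2020"))

null_dist |>
    get_p_value(obs_stat = obs_diff, direction = "two-sided")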

Part III of the book Statistical Inference via Data Science will provide useful background on these topics.


This work ©2025 by Michael Weylandt is licensed under a Creative Commons BY-NC-SA 4.0 license.

Footnotes

  1. We will not dwell on the absurdity of reducing the total economic activity of over 300 million people down to a single number here.↩︎

  2. Throughout this section, replace <GITHUB_ID> with your GitHub ID from Mini-Project #00, making sure to remove the angle brackets. Note that the automated course infrastructure will be looking for precise formatting, so follow these instructions closely.↩︎

  3. At the time of writing this assignment, June 2025 is the final month for which all revisions have been published. Given the ongoing lapse in federal government appropriations, it is unclear when revisions to later tables will be released.↩︎

  4. At the time of writing this assignment, June 2025 is the final month for which all revisions have been published. Given the ongoing lapse in federal government appropriations, it is unclear when revisions to later tables will be released.↩︎

  5. Of course, if you have taken STA 9715, you know that there is only one univariate Boolean distribution (the Bernoulli) and that the probability of observing a success is equal to the mean of the distribution. (\(X \sim \text{Bernoulli} \implies \E[X] = \P(X = 1)\) and all that). This might lead you to ask whether there really is a meaningful difference between a \(t\)-test and a proportion test. These are good questions to ask your STA 9719 professor.↩︎