STA 9750 Mini-Project #03: TBD

Due Dates

Released to Students: 2026-04-02
Initial Submission: 2026-04-24 11:59pm ET on GitHub and Brightspace
Peer Feedback:
- Peer Feedback Assigned: 2026-04-27 on GitHub
- Peer Feedback Due: 2026-05-03 11:59pm ET on GitHub

Estimated Time to Complete: 13-15 Hours

Estimated Time for Peer Feedback: 1 Hour

Introduction

Welcome to Mini-Project #03! TBD

Student Responsibilities

Recall our basic analytic workflow and table of student responsibilities:

Data Ingest and Cleaning: Given a data source, read it into R and transform it to a reasonably useful and standardized (‘tidy’) format.
Data Combination and Alignment: Combine multiple data sources to enable insights not possible from a single source.
Descriptive Statistical Analysis: Take a data table and compute informative summary statistics from both the entire population and relevant subgroups
Data Visualization: Generate insightful data visualizations to spur insights not attainable from point statistics
Inferential Statistical Analysis and Modeling: Develop relevant predictive models and statistical analyses to generate insights about the underlying population and not simply the data at hand.

In this course, our primary focus is on the first four stages: you will take other courses that develop analytical and modeling techniques for a variety of data types. As we progress through the course, you will eventually be responsible for the first four steps. Specifically, you are responsible for the following stages of each mini-project:

Students’ Responsibilities in Mini-Project Analyses
	Ingest and Cleaning	Combination and Alignment	Descriptive Statistical Analysis	Visualization
Mini-Project #01			✓
Mini-Project #02		✓	✓	½
Mini-Project #03	½	✓	✓	✓
Mini-Project #04	✓	✓	✓	✓

In this project, I am no longer providing code to download and read the necessary data files. The data files I have selected for this mini-project are relatively easy to work with and should not provide a significant challenge, particularly after our in-class discussion of Data Import.

Rubric

STA 9750 Mini-Projects are evaluated using peer grading with meta-review by the course GTAs. The following basic rubric will be used for all mini-projects:

Course Element	Excellent (9-10)	Great (7-8)	Good (5-6)	Adequate (3-4)	Needs Improvement (1-2)
Written Communication	Report is very well-written and flows naturally. Motivation for key steps is clearly explained to reader without excessive detail. Key findings are highlighted and appropriately given sufficient context, including reference to related work where appropriate.	Report has no grammatical or writing issues.¹ Writing is accessible and flows naturally. Key findings are highlighted and clearly explained, but lack suitable motivation and context.	Report has no grammatical or writing issues. Key findings are present but insufficiently highlighted or unclearly explained.	Writing is intelligible, but has some grammatical errors. Key findings are difficult to discern.	Report exhibits significant weakness in written communication. Key points are nearly impossible to identify.
Project Skeleton	Code completes all instructor-provided tasks correctly. Responses to open-ended tasks are especially insightful and creative.	Code completes all instructor-provided tasks satisfactorily. Responses to open-ended tasks are insightful, creative, and do not have any minor flaws.	Response to one instructor provided task is skipped, incorrect, or otherwise incomplete. Responses to open-ended tasks are solid and without serious flaws.	Responses to two instructor provided tasks are skipped, incorrect, or otherwise incomplete. Responses to open-ended tasks are acceptable, but have at least one serious flaw.	Response to three or more instructor provided tasks are skipped, incorrect, or otherwise incomplete. Responses to open-ended tasks are seriously lacking.
Tables & Document Presentation	Tables go beyond standard publication-quality formatting, using advanced features like color formatting, interactivity, or embedded visualization.	Tables are well-formatted, with publication-quality selection of data to present, formatting of table contents (e.g., significant figures) and column names.	Tables are well-formatted, but still have room for improvement in one of these categories: subsetting and selection of data to present, formatting of table contents (e.g., significant figures), column names.	Tables lack significant ‘polish’ and need improvement in substance (filtering and down-selecting of presented data) or style. Document is difficult to read due to distracting formatting choices.	Unfiltered ‘data dump’ instead of curated table. Document is illegible at points.
Data Visualization	Figures go beyond standard publication-quality formatting, using advanced features like animation, interactivity, or advanced plot types implemented in `ggplot2` extension packages.	Figures are ‘publication-quality,’ with suitable axis labels, well-chosen structure, attractive color schemes, titles, subtitles, and captions, etc.	Figures are above ‘exploratory-quality’ and reflect a moderate degree of polish, but do not reach full ‘publication-quality’ in one-to-two ways.	Figures are above ‘exploratory-quality’ and reflect a moderate degree of polish, but do not reach full ‘publication-quality’ in three or more distinct ways.	Figures are suitable to support claims made, but are ‘exploratory-quality,’ reflecting zero-to-minimal effort to customize and ‘polish’ beyond `ggplot2` defaults.
Exploratory Data Analysis	Deep and ‘story-telling’ EDA identifying non-obvious patterns that are then used to drive further analysis in support of the project. All patterns and irregularities are noted and well characterized, demonstrating mastery and deep understanding of all data sets used.	Meaningful ‘story-telling’ EDA identifying non-obvious patterns in the data. Major and pinor patterns and irregularities are noted and well characterized at a level sufficient to achieve the goals of the analysis. EDA demonstrates clear understanding of all data sets used.	Extensive EDA that thoroughly explores the data, but lacks narrative and does not deliver a meaningful ‘story’ to the reader. Obvious patterns or irregularities noted and well characterized, but more subtle structure may be overlooked or not fully discussed. EDA demonstrates competence and basic understanding of the data sets used.	Solid EDA that identifies major structure to the data, but does not fully explore all relevant structure. Obvious patterns or irregularities ignored or missed. EDA demonstrates familiarity with high-level structure of the data sets used.	Minimal EDA, covering only standard summary statistics, and providing limited insight into data patterns or irregularities. EDA fails to demonstrate familiarity with even the most basic properties of the data sets being analyzed.
Code Quality	Code is (near) flawless. Intent is clear throughout and all code is efficient, clear, and fully idiomatic. Code passes all `styler` and `lintr` type analyses without issue.	Comments give context and structure of the analysis, not simply defining functions used in a particular line. Intent is clear throughout, but code can be minorly improved in certain sections.	Code has well-chosen variable names and basic comments. Intent is generally clear, though some sections may be messy and code may have serious clarity or efficiency issues.	Code executes properly, but is difficult to read. Intent is generally clear and code is messy or inefficient.	Code fails to execute properly.
Data Preparation	Data import is fully-automated and efficient, taking care to only download from web-sources if not available locally. All data cleaning steps are fully-automated and robustly implemented, yielding a clean data set that can be widely used.	Data is imported and prepared effectively, in an automated fashion with minimal hard-coding of URLs and file paths. Data cleaning is fully-automated and sufficient to address all issues relevant to the analysis at plan.	Data is imported and prepared effectively, though source and destination file names are hard-coded. Data cleaning is rather manual and hard-codes most transformations.	Data is imported in a manner likely to have errors. Data cleaning is insufficient and fails to address clear problems.	Data is hard-coded and not imported from an external source.
Analysis and Findings	Analysis demonstrates uncommon insight and quality, providing unexpected and subtle insights.	Analysis is clear and convincing, leaving essentially no doubts about correctness.	Analysis clearly appears to be correct and passes the “sniff test” for all findings, but a detailed review notes some questions remain unanswered.	Analysis is not clearly flawed at any point and is likely to be within the right order of magnitude for all findings.	Analysis is clearly incorrect in at least one major finding, reporting clearly implausible results that are likely off by an order of magnitude or more.

Note that the “Excellent” category for most elements applies only to truly exceptional “above-and-beyond” work. Most student submissions will likely fall in the “Good” to “Great” range.

At this point, you are responsible for the ‘Data Preparation’ portion of the project, but I am still providing a set of basic EDA activities. Accordingly, reports completing all tasks described under Data Integration and Exploration below should receive an automatic 10/10 for the ‘Exploratory Data Analysis’ rubric element.

Taken together, you are only really responsible for these portions of the rubric in this assignment:

Written Communication
Project Skeleton
Tables & Document Presentation
Data Visualization
Code Quality
Data Preparation
Analysis and Findings

Reports completing all key steps outlined below essentially start with 10 free points.

For this mini-project, no more than 4 total points of extra credit can be be awarded. Opportunities for extra credit exist for students who go above and beyond the instructor-provided scaffolding. Specific opportunities for extra credit can be found below.

Students pursuing careers in data analytics are strongly encouraged to go beyond the strict ambit of the mini-projects to

further refine their skills;
learn additional techniques that can be used in the final course project; and
develop a more impressive professional portfolio.

Because students are encouraged to use STA 9750 mini-projects as the basis for a professional portfolio, the basic skeleton of each project will be released under a fairly permissive usage license. Take advantage of it!

Submission Instructions

After completing the analysis, write up your findings, showing all of your code, using a dynamic quarto document and post it to your course repository. The qmd file should be named mp03.qmd (lower case!) so the rendered document can be found at docs/mp03.html in the student’s repository and will be served at the URL:²

https://YOUR_GITHUB_ID.github.io/STA9750-2026-SPRING/mp03.html

You can use the helper function mp_start available at in the Course Helper Functions to create a file with the appropriate name and some meta-data already included. Do so by running the following command at the R Console:

source("https://michael-weylandt.com/STA9750/load_helpers.R"); mp_start(N=03)

After completing this mini-project, upload your rendered output and necessary ancillary files to GitHub to make sure your site works. The mp_submission_ready function in the Course Helper Functions can perform some of these checks automatically. You can run this function by running the following commands at the R Console:

source("https://michael-weylandt.com/STA9750/load_helpers.R"); mp_submission_ready(N=03)

Once you confirm this website works (substituting YOUR_GITHUB_ID for the actual GitHub username provided to the professor in MP#00 of course), open a GitHub issue on the instructor’s repository to submit your completed work.

The easiest way to do so is by use of the mp_submission_create function in the Course Helper Functions, which can be used by running the following command at the R Console:

source("https://michael-weylandt.com/STA9750/load_helpers.R"); mp_submission_create(N=03)

Alternatively, if you wish to submit manually, open a new issue at

https://github.com/michaelweylandt/STA9750-2026-SPRING/issues/new .

Title the issue STA 9750 YOUR_GITHUB_ID MiniProject #03 and fill in the following text for the issue:

Hi @michaelweylandt!

I've uploaded my work for MiniProject #**03** - check it out!

<https://<GITHUB_ID>.github.io/STA9750-2026-SPRING/mp03.html>

At various points before and after the submission deadline, the instructor will run some automated checks to ensure your submission has all necessary components. Please respond to any issues raised in a timely fashion as failing to address them may lead to a lower set of scores when graded.

Additionally, a PDF export of this report should be submitted on Brightspace. To create a PDF from the uploaded report, simply use your browser’s ‘Print to PDF’ functionality.

NB: The analysis outline below specifies key tasks you need to perform within your write up. Your peer evaluators will check that you complete these. You are encouraged to do extra analysis, but the bolded Tasks are mandatory.

NB: Your final submission should look like a report, not simply a list of facts answering questions. Add introductions, conclusions, and your own commentary. You should be practicing both raw coding skills and written communication in all mini-projects. There is little value in data points stated without context or motivation.

Students pursuing careers in data analytics are strongly encouraged to go beyond the strict ambit of the mini-projects to

further refine their skills;
learn additional techniques that can be used in the final course project; and
develop a more impressive professional portfolio.

Submission Instructions

https://YOUR_GITHUB_ID.github.io/STA9750-2026-SPRING/mp03.html

source("https://michael-weylandt.com/STA9750/load_helpers.R"); mp_start(N=03)

source("https://michael-weylandt.com/STA9750/load_helpers.R"); mp_submission_ready(N=03)

The easiest way to do so is by use of the mp_submission_create function in the Course Helper Functions, which can be used by running the following command at the R Console:

source("https://michael-weylandt.com/STA9750/load_helpers.R"); mp_submission_create(N=03)

Alternatively, if you wish to submit manually, open a new issue at

https://github.com/michaelweylandt/STA9750-2026-SPRING/issues/new .

Title the issue STA 9750 YOUR_GITHUB_ID MiniProject #03 and fill in the following text for the issue:

Hi @michaelweylandt!

I've uploaded my work for MiniProject #**03** - check it out!

<https://<GITHUB_ID>.github.io/STA9750-2026-SPRING/mp03.html>

Additionally, a PDF export of this report should be submitted on Brightspace. To create a PDF from the uploaded report, simply use your browser’s ‘Print to PDF’ functionality.

Mini-Project #03: TBD

Data Acquisition

TBD

Data Cleaning and Preparation

TBD

Data Integration and Initial Exploration

TBD

Final Deliverable: TBD

TBD

AI Usage Statement

At the end of your report, you must include a description of the extent to which you used Generative AI tools to complete the mini-project. This should be a one paragraph section clearly deliniated using a collapsable Quarto “Callout Note”.

E.g.,

AI Usage Statement

No Generative AI tools were used to complete this mini-project.

AI Usage Statement

GitHub Co-Pilot Pro was used via RStudio integration while completing this project. No other generative AI tools were used.

AI Usage Statement

ChatGPT was used to help write the code in this project, but all non-code text was generated without the use of any Generative AI tools. Additionally, ChatGPT was used to provide additional background information on the topic and to brainstorm ideas for the final open-ended prompt.

Recall that Generative AI may not be used to write or edit any non-code text in this course.

These blocks can be created using the following syntax:


::: {.callout-note title="AI Usage Statement" collapse="true"}

Your text goes here. 

:::

Please contact the instructor if you have any questions about appropriate AI usage in this course.

Extra Credit Opportunities

There are optional Extra Credit Opportunities where extra points can be awarded for specific additional tasks in this mini-project. The amount of the extra credit is typically not proportional to the work required to complete these tasks, but I provide these for students who want to dive deeper into this project and develop additional data analysis skills not covered in the main part of this mini-project.

For this mini-project, no more than 4 total points of extra credit may be awarded. Even with extra credit, your grade on this mini-project cannot exceed 80 points total.

TBD

Footnotes

This the level of “ChatGPT-level” prose, without obvious flaws but lacking the style and elegance associated with true quality writing.↩︎
Throughout this section, replace YOUR_GITHUB_ID with your GitHub ID from Mini-Project #00. Note that the automated course infrastructure will be looking for precise formatting, so follow these instructions closely.↩︎
Throughout this section, replace YOUR_GITHUB_ID with your GitHub ID from Mini-Project #00. Note that the automated course infrastructure will be looking for precise formatting, so follow these instructions closely.↩︎