Congratulations! Done with Mini-Projects!
Peer reviews to be assigned by EoW
Due at midnight tonight - take a moment to do it now if you haven’t already!
Done with pre-assignments!
This is in addition to the Baruch central course assessments.
Used to improve future course offerings. This semester:
Returned:
We owe you:
December 12 - Last Day of Class!
Review the Rubric!
Non-Technical Presentation - Think of yourself as a “consultant” asked by a client to investigate a topic.
Example Presentation Structure:
Example Presentation Structure (continued):
Group and Individual Reports
Deadline extended to the day of the ‘final’
Registrar’s office has not released Final Exam schedule … grumble, grumble
Tentatively: December 19th
No late work accepted (I have to submit grades!)
On Brightspace, I have opened an additional quiz for peer evaluation of your teammates.
Please submit a copy for each of your teammates.
If you don’t submit these, you will receive a 0 for your peer evaluations
No late work accepted (I have to submit grades!)
Rubric is set high to give me flexibility to reward teams that take on big challenges
Hard rubric => Grades are curved generously
Multiple paths to success
If your problem is “easy” on an element (data import in particular), that’s great! Don’t over-complicate things; your effort is better spent elsewhere.
tidymodels
Adapted from [Case Study](https://www.tidymodels.org/start/case-study/)
| Order | Team | Order | Team |
|---|---|---|---|
| 1 | Rat Pack | 6 | Ca$h VZ |
| 2 | Subway Surfers | 7 | Listing Legends |
| 3 | Chart Toppers | 8 | TDSSG |
| 4 | Metro Mindset | 9 | Broker T’s |
| 5 | Apple Watch | 10 | EVengers |
tidymodels
Strength of R
: A huge ecosystem of contributed modeling packages.

Weakness of R
: No two modeling packages have exactly the same API, which makes switching between interfaces cumbersome.
tidymodels
tidymodels attempts to provide a uniform interface to a wide variety of predictive machine learning tools.
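For instance, the same model specification can be fit with different computational engines without touching the rest of your code. A minimal sketch (the specification and engine choices below are illustrative, not from the slides):

```r
library(tidymodels)

# One parsnip specification, multiple backends: only the engine changes
rf_spec <- rand_forest(trees = 500, mode = "classification")

rf_spec |> set_engine("ranger")        # fit via the ranger package
rf_spec |> set_engine("randomForest")  # same interface, different backend
```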
Advantages:
Disadvantages:
I have dedicated my academic life to the differences in these methods, but 99% of the time, “black-box” prediction is good enough. In STA 9890, we get into the weeds - not here.
Statistics / Data Science:
Machine Learning:
Validation based on out-of-sample or test predictions
How to check whether a model predicts well?
Need more data! But where to get more data?
Today, we’ll primarily use a combination: Test/Train split & Cross-Validation!
Cross-validation is done on the estimator (the fitting procedure), not on a single fitted model
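A minimal sketch of the combination with rsample (the data frame `dat`, the 80/20 split, and the fold count are illustrative assumptions):

```r
library(tidymodels)

set.seed(9750)
dat_split <- initial_split(dat, prop = 0.8)  # hold out 20% as a test set
dat_train <- training(dat_split)
dat_test  <- testing(dat_split)

# Cross-validate within the training set only;
# the test set is touched exactly once, at the very end.
folds <- vfold_cv(dat_train, v = 10)
```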
tidymodels
tidymodels workflow:
tidymodels is very punny, so it’s a bit hard to tell which step is which…
library(tidymodels)  # loads recipes, parsnip, workflows, rsample, tune, ...

holidays <- c("AllSouls", "AshWednesday", "ChristmasEve", "Easter",
              "ChristmasDay", "GoodFriday", "NewYearsDay", "PalmSunday")

hotel_recipe <-
  recipe(children ~ ., data = hotel_other) |>        # predict `children` from everything else
  step_date(arrival_date) |>                         # year / month / day-of-week features
  step_holiday(arrival_date, holidays = holidays) |> # indicator columns for the holidays above
  step_rm(arrival_date) |>                           # drop the raw date column
  step_dummy(all_nominal_predictors()) |>            # one-hot encode categorical predictors
  step_zv(all_predictors()) |>                       # drop zero-variance columns
  step_normalize(all_predictors())                   # center and scale numeric predictors
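To sanity-check what the recipe produces, you can estimate and apply it by hand (a sketch; `bake(new_data = NULL)` returns the processed training data):

```r
hotel_recipe |>
  prep() |>                 # estimate the preprocessing steps from the data
  bake(new_data = NULL) |>  # apply them to the training data
  glimpse()
```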
Find a grid of parameters:
Perform CV splits:
Define a workflow:
Fit workflow to a grid of parameters:
Visual examination (all five steps are sketched below)
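Putting the five steps together, a sketch adapted from the case study’s penalized logistic regression branch (the penalty grid, fold count, and stratification choice are assumptions; `hotel_recipe` is defined above):

```r
library(tidymodels)

# 1. A grid of candidate penalty values
lr_grid <- tibble(penalty = 10^seq(-4, -1, length.out = 30))

# 2. Cross-validation splits, stratified on the outcome
folds <- vfold_cv(hotel_other, v = 10, strata = children)

# 3. A workflow = model specification + preprocessing recipe
lr_spec <- logistic_reg(penalty = tune(), mixture = 1) |>
  set_engine("glmnet")

lr_workflow <- workflow() |>
  add_model(lr_spec) |>
  add_recipe(hotel_recipe)

# 4. Fit the workflow once per (fold, penalty) combination
lr_results <- lr_workflow |>
  tune_grid(resamples = folds,
            grid = lr_grid,
            metrics = metric_set(roc_auc))

# 5. Visual examination of the tuning results
autoplot(lr_results)
```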
Work through the random forest components of https://www.tidymodels.org/start/case-study
You’ll need to work through the data import elements as well
tidymodels tools: https://www.tidymodels.org/start/