STA 9890 - Course Prediction Competition

In lieu of a final exam, STA 9890 has a prediction competition worth half of your final grade.

Use Only Provided Information

You may not use any external data beyond what the instructor provides. You will be required to attest to this in your final submission. Students found to use data outside instructor-provided files will receive a zero on all grades related to the course competition and will likely fail the course as a result.

You may read published research about predictors of student success to gain qualitative background information, but you may not access their data. You will be required to cite all background reading in your final report.

The 200 total points associated with this competition will be apportioned as:

  • Prediction Accuracy: 100 points
  • Final Presentation: 50 points
  • Final Report: 50 points

Overview

Background

Pursuant to the Every Student Succeeds Act (ESSA) of 2015 and its predecessor, the No Child Left Behind Act (NCLB) of 2001, all public schools in New York State are required to administer standardized examinations annually to their students. Students’ results on these tests are used for a variety of purposes, including assessing teacher and school performance, providing additional recognition to high-performing schools, and identifying struggling students or subgroups. The New York State Education Department (NYSED) makes aggregate test results available through its annual “School Report Card” data releases. For this project, you will predict the percentage of students at each school who score in the “Proficient” range (i.e., at or above grade level) on a wide battery of standardized tests during the 2024 testing period.

New York administers two primary sets of standardized exams:1

  • Grade 3-8 Exams:
    • English Language Arts (“ELA”) given every year
    • Mathematics given every year
    • Science, typically given only in grade 5 and grade 8
  • Regents Exams for high school students

The Regents exams cover a broader variety of topics and, historically, have been a graduation requirement in NYS. For each exam, a “proficiency” level is determined: e.g., if an exam gives scores from “1” to “4”, levels “3” and “4” may be considered “proficient”. Your task is to predict the fraction of students who achieve proficiency.

The organization of public schools in New York State is, like everything in New York State, rather confusing. As of the time of writing, NYSED reports:

  • 730 Districts
  • 4,394 Public Schools
  • 373 Charter Schools
  • 215,701 Public School Teachers
  • 2,421,491 Public School Students

in the state. These districts range from very small (e.g., Newcomb CSD, with 47 students) to very large (e.g., NYC Public Schools, with just under a million students).

Outside of NYC, public schools are organized into local school districts, which typically have (at least) one elementary school, one middle school, and one high school.2 These districts are further organized into Boards of Cooperative Educational Services (“BOCES”), which provide shared resources to groups of smaller districts. Local school districts are principally funded by property taxes, with more affluent districts having a larger tax base and providing higher per-student funding levels. Local school districts are run by an elected school board, following statewide standards. To address differences in funding, NYSED provides supplemental funding to districts based on their assessed “Needs/Resource Capacity”, though work is ongoing to update and refine the current funding formula.

Within NYC, public schools are run by the NYC Department of Education (NYC DOE), under a commissioner and school board appointed by the mayor and borough presidents. The NYC DOE is subdivided into 32 geographic school districts and two citywide districts providing specialized services for students with disabilities or students in alternative education programs.

Data

You have access to three data files for model building:

  • School-Level Data
  • District-Level Data
  • a Training Set of assessment results

and a Test Set on which your model will be evaluated.

School-Level Data

The School-Level Data contains data on 4,754 schools throughout New York State. The 52 columns of this data set can be broken into the following groups:

  • Anonymized Identifiers (SCHOOL, DISTRICT, COUNTY): These eight-character IDs encode the identity of the school, the relevant school district, and the county in which the school is located. These identifiers are consistent across all files provided to you, but you will not otherwise be able to interpret them.

  • District Need Type and Location (DISTRICT_TYPE, REGION): These provide basic information about the type of school district (High-Need Urban, Low-Need, etc.) based on the N/RC categories and rough location within ten regions of NYS.

  • Grade-Level Enrollment (N_PUPILS, PRE_K, K, GRADE_NN). These give the reported enrollments at each grade level from pre-kindergarten to twelfth grade (GRADE_12). For reasons that are not entirely clear, these do not always add up to the reported total enrollment (N_PUPILS).

  • Student Body Demographics (PERCENT_MALE, PERCENT_FEMALE, PERCENT_ENGLISH_LANGUAGE_LEANERS, PERCENT_AMERICAN_INDIAN, PERCENT_BLACK, PERCENT_ASIAN, PERCENT_HISPANIC, PERCENT_WHITE, PERCENT_MULTIRACIAL, PERCENT_WITH_DISABILITIES, PERCENT_ECONOMICALLY_DISADVANTAGED, PERCENT_MIGRANT, PERCENT_HOMELESS, PERCENT_IN_FOSTER_CARE, PERCENT_PARENT_ARMED_FORCES)

    These demographic indicators should be more or less self-explanatory. Note that the education department uses a slightly different set of racial and ethnic categories than the more commonly reported census categories.

  • Student Attendance and Discipline Rates (ATTENDANCE_RATE, PERCENT_OF_STUDENTS_SUSPENDED)

  • Teaching Staff (NUMBER_OF_TEACHERS, NUMBER_OF_COUNSELORS, NUMBER_OF_SOCIAL_WORKERS, TEACHER_TURNOVER_RATE)

    Professional and teaching staff are the single most important ‘in-building’ contributor to student learning outcomes. While I am not providing you teacher-level data, I am providing you with raw counts of the number of teachers, school counselors, and social workers assigned to each school. You can combine this with enrollment data to get a rough estimate of teacher-pupil ratios.

    Additionally, some research suggests that long-tenured teachers have additional positive impacts on student learning, so I am providing a measure of teacher turnover. (NYSED does not give a detailed description of how this measure is calculated, but higher scores indicate more turnover and shorter tenures among the teaching staff.)

  • Class Size by Subject (LANGUAGE_ARTS_AVERAGE_CLASS_SIZE, MATHEMATICS_AVERAGE_CLASS_SIZE, SCIENCE_AVERAGE_CLASS_SIZE, HISTORY_GOVERNMENT_AND_GEOGRAPHY_AVERAGE_CLASS_SIZE,
    GRADE_1_AVERAGE_CLASS_SIZE, GRADE_2_AVERAGE_CLASS_SIZE, KINDERGARTEN_AVERAGE_CLASS_SIZE)

    Class size is often considered one of the strongest predictors of student learning outcomes. Here, average class size is reported for different subjects. Note that these columns have many missing (NA) values as not all schools offer all subjects, e.g., both elementary schools and high schools have math classes, but only elementary schools have first grade.

  • Student Family Economics (PERCENT_FREE_LUNCH, PERCENT_REDUCED_LUNCH)

    These percentages are a measure of the financial status of students’ families. Eligibility for free or reduced-price school lunches is determined annually by the USDA; see the USDA income eligibility guidelines for the relevant thresholds for the 2024 school year.

  • School Funding Levels (FEDERAL_FUNDING_PER_PUPIL, LOCAL_FUNDING_PER_PUPIL)

    These amounts (in dollars per student) capture the amount of funding flowing to the school. Note that LOCAL_FUNDING includes both funding through local property taxes (the primary means of funding for most suburban districts) and funding distributed by New York State to individual districts.
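As suggested in the Teaching Staff description above, the raw staffing counts can be combined with enrollment to give a rough pupil-teacher ratio. A minimal sketch in Python, using a toy table with the real column names (the ID and count values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the school-level file: real column names, invented values
schools = pd.DataFrame({
    "SCHOOL": ["SCH00001", "SCH00002", "SCH00003"],
    "N_PUPILS": [432, 620, np.nan],
    "NUMBER_OF_TEACHERS": [41, 57, 30],
})

# Rough pupil-teacher ratio; treat a zero teacher count as missing rather
# than dividing by zero
schools["PUPIL_TEACHER_RATIO"] = (
    schools["N_PUPILS"] / schools["NUMBER_OF_TEACHERS"].replace(0, np.nan)
)
```

Rows with missing enrollment (recall that N_PUPILS has some NAs) simply propagate NA into the ratio.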

A brief overview of this data set follows:

Data summary
Name school_covariates
Number of rows 4754
Number of columns 52
_______________________
Column type frequency:
character 5
numeric 47
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
SCHOOL 0 1 8 8 0 4754 0
DISTRICT 0 1 8 8 0 721 0
COUNTY 0 1 8 8 0 63 0
DISTRICT_TYPE 0 1 3 24 0 7 0
REGION 0 1 11 26 0 10 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
ATTENDANCE_RATE 195 0.96 90.56 6.44 0 89.00 92.00 94.00 100.00 ▁▁▁▁▇
LANGUAGE_ARTS_AVERAGE_CLASS_SIZE 480 0.90 16.59 6.19 1 13.00 16.50 19.79 108.83 ▇▁▁▁▁
MATHEMATICS_AVERAGE_CLASS_SIZE 350 0.93 17.30 5.60 1 14.00 17.33 20.33 104.50 ▇▂▁▁▁
SCIENCE_AVERAGE_CLASS_SIZE 668 0.86 18.87 5.94 1 15.67 19.00 22.00 78.50 ▃▇▁▁▁
HISTORY_GOVERNMENT_AND_GEOGRAPHY_AVERAGE_CLASS_SIZE 3370 0.29 18.65 6.23 1 14.50 18.50 22.50 82.50 ▆▇▁▁▁
GRADE_1_AVERAGE_CLASS_SIZE 3122 0.34 17.94 6.07 1 15.00 18.00 21.00 110.00 ▇▁▁▁▁
GRADE_2_AVERAGE_CLASS_SIZE 3132 0.34 18.31 5.43 1 15.00 18.00 21.00 84.00 ▆▇▁▁▁
KINDERGARTEN_AVERAGE_CLASS_SIZE 3139 0.34 17.17 5.56 1 14.00 17.00 20.00 106.00 ▇▁▁▁▁
PERCENT_FREE_LUNCH 195 0.96 56.55 27.80 0 34.00 54.00 84.00 100.00 ▃▆▆▅▇
PERCENT_REDUCED_LUNCH 195 0.96 2.32 2.17 0 1.00 2.00 3.00 22.00 ▇▁▁▁▁
NUMBER_OF_TEACHERS 195 0.96 48.91 30.75 3 30.00 41.00 57.00 344.00 ▇▁▁▁▁
NUMBER_OF_COUNSELORS 195 0.96 1.79 2.08 0 1.00 1.00 2.00 22.00 ▇▁▁▁▁
NUMBER_OF_SOCIAL_WORKERS 195 0.96 1.04 1.04 0 0.00 1.00 1.00 15.00 ▇▁▁▁▁
TEACHER_TURNOVER_RATE 4519 0.05 38.53 16.42 0 27.00 36.00 48.00 100.00 ▂▇▅▂▁
PERCENT_OF_STUDENTS_SUSPENDED 195 0.96 3.10 5.19 0 0.00 1.00 4.00 100.00 ▇▁▁▁▁
N_PUPILS 220 0.95 532.13 396.64 22 313.00 432.00 620.75 5810.00 ▇▁▁▁▁
FEDERAL_FUNDING_PER_PUPIL 220 0.95 3455.91 5200.18 0 1301.25 2895.00 4692.75 293867.00 ▇▁▁▁▁
LOCAL_FUNDING_PER_PUPIL 220 0.95 28917.07 12315.81 0 23275.50 26792.50 31529.75 200843.00 ▇▁▁▁▁
PRE_K 195 0.96 14.81 26.29 0 0.00 0.00 26.00 247.00 ▇▁▁▁▁
K 195 0.96 33.78 41.30 0 0.00 18.00 60.00 375.00 ▇▂▁▁▁
GRADE_01 195 0.96 36.22 42.98 0 0.00 23.00 64.00 322.00 ▇▂▁▁▁
GRADE_02 195 0.96 37.35 44.34 0 0.00 24.00 66.00 356.00 ▇▂▁▁▁
GRADE_03 195 0.96 37.71 44.95 0 0.00 25.00 65.00 372.00 ▇▂▁▁▁
GRADE_04 195 0.96 38.53 47.17 0 0.00 23.00 68.00 411.00 ▇▂▁▁▁
GRADE_05 195 0.96 38.73 50.71 0 0.00 16.00 67.00 622.00 ▇▁▁▁▁
GRADE_06 195 0.96 39.33 78.39 0 0.00 0.00 54.00 663.00 ▇▁▁▁▁
GRADE_07 195 0.96 39.81 83.86 0 0.00 0.00 50.00 749.00 ▇▁▁▁▁
GRADE_08 195 0.96 40.59 85.72 0 0.00 0.00 52.00 770.00 ▇▁▁▁▁
GRADE_09 195 0.96 44.18 112.27 0 0.00 0.00 36.00 1416.00 ▇▁▁▁▁
GRADE_10 195 0.96 43.36 110.10 0 0.00 0.00 36.50 1516.00 ▇▁▁▁▁
GRADE_11 195 0.96 41.10 104.82 0 0.00 0.00 34.00 1550.00 ▇▁▁▁▁
GRADE_12 195 0.96 40.08 103.14 0 0.00 0.00 31.00 1469.00 ▇▁▁▁▁
PERCENT_MALE 195 0.96 51.36 6.61 0 49.00 51.00 53.00 100.00 ▁▁▇▁▁
PERCENT_FEMALE 195 0.96 48.60 6.61 0 47.00 49.00 51.00 100.00 ▁▁▇▁▁
PERCENT_ENGLISH_LANGUAGE_LEANERS 195 0.96 10.42 13.20 0 1.00 6.00 15.00 100.00 ▇▁▁▁▁
PERCENT_AMERICAN_INDIAN 195 0.96 0.79 3.35 0 0.00 0.00 1.00 99.00 ▇▁▁▁▁
PERCENT_BLACK 195 0.96 16.72 21.26 0 2.00 7.00 24.00 93.00 ▇▂▁▁▁
PERCENT_ASIAN 195 0.96 7.90 13.07 0 1.00 3.00 8.00 93.00 ▇▁▁▁▁
PERCENT_HISPANIC 195 0.96 29.20 25.57 0 8.00 21.00 47.00 100.00 ▇▃▂▂▁
PERCENT_WHITE 195 0.96 41.80 35.36 0 4.50 38.00 77.00 100.00 ▇▂▂▃▅
PERCENT_MULTIRACIAL 195 0.96 3.53 3.30 0 1.00 3.00 5.00 25.00 ▇▂▁▁▁
PERCENT_WITH_DISABILITIES 195 0.96 20.64 12.29 0 15.00 18.00 23.00 100.00 ▇▅▁▁▁
PERCENT_ECONOMICALLY_DISADVANTAGED 195 0.96 60.79 26.96 0 40.00 61.00 87.00 100.00 ▂▃▆▃▇
PERCENT_MIGRANT 195 0.96 0.09 0.55 0 0.00 0.00 0.00 13.00 ▇▁▁▁▁
PERCENT_HOMELESS 195 0.96 6.68 8.97 0 1.00 3.00 10.00 84.00 ▇▁▁▁▁
PERCENT_IN_FOSTER_CARE 195 0.96 0.40 0.77 0 0.00 0.00 1.00 8.00 ▇▁▁▁▁
PERCENT_PARENT_ARMED_FORCES 195 0.96 0.35 2.76 0 0.00 0.00 0.00 80.00 ▇▁▁▁▁

District-Level Data

In addition to the school-specific data, I am also providing you with district-level graduation data. This data is reported over a four-year lookback, so, since we are looking at 2024, this is the cohort that entered high school in Fall 2020 (God bless them!).

This data covers 674 different districts and reports the percentage of students achieving 5 different outcomes:

  • PERCENT_DIPLOMA
  • PERCENT_NON_DIPLOMA
  • PERCENT_STILL_ENROLLED
  • PERCENT_GED
  • PERCENT_DROPOUT

Note that this data is provided at the district level, not the school level, as elementary and middle schools do not collect this data.
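Because graduation outcomes are keyed by district, they must be joined onto school-level rows through the shared DISTRICT identifier. A minimal pandas sketch with toy tables (column names follow the data dictionary; the values and the district file's exact layout are assumptions for illustration):

```python
import pandas as pd

# Toy school and district tables with invented values
schools = pd.DataFrame({
    "SCHOOL": ["SCH00001", "SCH00002"],
    "DISTRICT": ["DST00001", "DST00001"],
})
districts = pd.DataFrame({
    "DISTRICT": ["DST00001"],
    "PERCENT_DIPLOMA": [88.0],
    "PERCENT_DROPOUT": [4.0],
})

# A left join keeps every school row, even if its district lacks
# graduation data (those rows get NaNs)
merged = schools.merge(districts, on="DISTRICT", how="left")
```

Note that only 674 districts appear in the graduation file, so a left join (rather than an inner join) avoids silently dropping schools in uncovered districts.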

Training Set

A training set of 144,921 assessment results is provided for you to build your models. The target variable for this data set is PERCENT_PROFICIENT, which ranges from 0 to 100. Scores are reported for 4,469 different schools (SCHOOL), 5 different subpopulations (SUBGROUP_NAME), and 32 different standardized assessments (ASSESSMENT_NAME). The SCHOOL column can be used to connect this data with the school-level covariates described above.

Even on this relatively simple subset of the data, several interesting patterns emerge:

We see here a rough bell curve in the overall population (with several artifacts resulting from the ‘discreteness’ of test scores), systematic underperformance of students from economically disadvantaged households, and a slight, but statistically significant, overperformance by female students.

The ASSESSMENT_ID column is a unique identifier for each combination of (SCHOOL, SUBGROUP_NAME, ASSESSMENT_NAME). It will be used to organize your submissions on Kaggle, but is not of interest otherwise.

Finally, I also provide you with N_STUDENTS, the number of students who took each exam.

Test Set

Your task is to make predictions on a test set of 48,307 assessments. This data is organized like the training set, except that the target PERCENT_PROFICIENT variable is not included. See below for an example of how to format your predictions on this data set for input to Kaggle.

Example Predictions

To submit your predictions, you need to upload them to Kaggle through the web interface. A link to register for the course competition will be distributed through Brightspace. Note that you must use your @cuny.edu email so that I can match your Kaggle ID with your student records.

You will submit your data as a two-column CSV with columns:

  • ASSESSMENT_ID
  • PERCENT_PROFICIENT

Kaggle will use the ASSESSMENT_ID to align your submissions with the test set, so your rows do not need to be in the same order as the test set file.

A (comically simple) model would simply use the state average proficiency rates for all predictions. In R,

library(tidyverse)
MEAN_PROFICIENCY <- read_csv(
   "https://michael-weylandt.com/STA9890/competition_data/scores_training.csv", 
   show_col_types = FALSE) |> 
   pull(PERCENT_PROFICIENT) |> 
   mean()

KAGGLE_SUBMISSION <-  read_csv(
   "https://michael-weylandt.com/STA9890/competition_data/scores_test.csv", 
   show_col_types = FALSE) |> 
   select(ASSESSMENT_ID) |> 
   mutate(PERCENT_PROFICIENT = MEAN_PROFICIENCY)

write_csv(KAGGLE_SUBMISSION, "intercept_model_for_kaggle_r.csv")

Or, in python,

import pandas as pd

TRAINING_SET = pd.read_csv(
   "https://michael-weylandt.com/STA9890/competition_data/scores_training.csv"
)

TEST_SET = pd.read_csv(
   "https://michael-weylandt.com/STA9890/competition_data/scores_test.csv"
)

MEAN_PROFICIENCY =  TRAINING_SET["PERCENT_PROFICIENT"].mean()

TEST_SET[["ASSESSMENT_ID"]].\
    assign(PERCENT_PROFICIENT=MEAN_PROFICIENCY).\
    to_csv("intercept_model_for_kaggle_py.csv", 
           index=False)
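Before uploading, it can be worth sanity-checking the file you are about to submit. A small, unofficial helper (these checks are my suggestions, not a Kaggle requirement):

```python
import pandas as pd

def check_submission(df: pd.DataFrame) -> None:
    """Lightweight pre-upload checks on a submission frame (a sketch)."""
    assert list(df.columns) == ["ASSESSMENT_ID", "PERCENT_PROFICIENT"], "wrong columns"
    assert df["ASSESSMENT_ID"].is_unique, "duplicate assessment IDs"
    assert not df["PERCENT_PROFICIENT"].isna().any(), "missing predictions"
    assert df["PERCENT_PROFICIENT"].between(0, 100).all(), "predictions off the 0-100 scale"

# Toy submission in the required two-column format
toy = pd.DataFrame({
    "ASSESSMENT_ID": ["A0001", "A0002"],
    "PERCENT_PROFICIENT": [54.0, 61.2],
})
check_submission(toy)
```

Since you only get two Kaggle submissions per day, catching a malformed file locally is much cheaper than burning one on an upload error.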

A slightly more sophisticated model would use county-level average scores on that same standardized test as the predictor.

library(tidyverse)
TRAINING_SET <- read_csv(
   "https://michael-weylandt.com/STA9890/competition_data/scores_training.csv", 
   show_col_types = FALSE)
SCHOOLS_INFO <- read_csv(
   "https://michael-weylandt.com/STA9890/competition_data/school_covariates.csv", 
   show_col_types = FALSE)

## Statewide mean, used below as a fallback for county/assessment
## combinations not observed in the training data
MEAN_PROFICIENCY <- mean(TRAINING_SET$PERCENT_PROFICIENT)

COUNTY_AVERAGES <- inner_join(TRAINING_SET, SCHOOLS_INFO, join_by(SCHOOL)) |> 
   group_by(COUNTY, SUBGROUP_NAME, ASSESSMENT_NAME) |> 
   summarize(PERCENT_PROFICIENT = mean(PERCENT_PROFICIENT), 
             .groups = "drop")

TEST_SET <- read_csv(
   "https://michael-weylandt.com/STA9890/competition_data/scores_test.csv", 
   show_col_types = FALSE)

TEST_PREDICTIONS <- inner_join(TEST_SET, SCHOOLS_INFO, join_by(SCHOOL)) |> 
   select(ASSESSMENT_ID, ASSESSMENT_NAME, COUNTY, SUBGROUP_NAME) |> 
   left_join(COUNTY_AVERAGES, join_by(COUNTY, SUBGROUP_NAME, ASSESSMENT_NAME)) |>
   ## Fill in NAs with the statewide average as before
   mutate(PERCENT_PROFICIENT = coalesce(PERCENT_PROFICIENT, MEAN_PROFICIENCY)) |>
   select(ASSESSMENT_ID, PERCENT_PROFICIENT)

write_csv(TEST_PREDICTIONS, "county_averages_model_for_kaggle_r.csv")

Again, in python:

import pandas as pd

TRAINING_SET = pd.read_csv(
    "https://michael-weylandt.com/STA9890/competition_data/scores_training.csv"
)

# Statewide mean, used below as a fallback for county/assessment
# combinations not observed in the training data
MEAN_PROFICIENCY = TRAINING_SET["PERCENT_PROFICIENT"].mean()

SCHOOLS_INFO = pd.read_csv(
    "https://michael-weylandt.com/STA9890/competition_data/school_covariates.csv"
)

COUNTY_AVERAGES = TRAINING_SET.\
    merge(SCHOOLS_INFO, on="SCHOOL", how="inner").\
    groupby(["COUNTY", "SUBGROUP_NAME", "ASSESSMENT_NAME"], 
            as_index=False).\
    agg(PERCENT_PROFICIENT=("PERCENT_PROFICIENT", "mean"))

TEST_SET = pd.read_csv(
    "https://michael-weylandt.com/STA9890/competition_data/scores_test.csv"
)

TEST_PREDICTIONS = TEST_SET.\
    merge(SCHOOLS_INFO, on="SCHOOL", how="inner").\
    loc[:, ["ASSESSMENT_ID", "ASSESSMENT_NAME", "COUNTY", "SUBGROUP_NAME"]].\
    merge(COUNTY_AVERAGES, on=["COUNTY", "SUBGROUP_NAME", "ASSESSMENT_NAME"], how="left")

TEST_PREDICTIONS["PERCENT_PROFICIENT"] = TEST_PREDICTIONS["PERCENT_PROFICIENT"].fillna(MEAN_PROFICIENCY)

TEST_PREDICTIONS[["ASSESSMENT_ID", "PERCENT_PROFICIENT"]].to_csv(
    "county_averages_model_for_kaggle_py.csv",
    index=False
)

On this data set, the “intercept” model achieves an MSE of 700.703 on the public leaderboard while the “county average” model achieves an MSE of 453.102.
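With only two Kaggle submissions per day, it helps to estimate MSE locally before uploading. A minimal sketch of a holdout evaluation of the "intercept" model, using synthetic scores in place of the real training file (the real target column is PERCENT_PROFICIENT; the values here are random stand-ins):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9890)  # fixed seed for a reproducible split

# Synthetic stand-in for the training scores
train = pd.DataFrame({"PERCENT_PROFICIENT": rng.uniform(0, 100, size=1000)})

# Roughly 80/20 split: fit the intercept model on one part, score on the rest
in_fit = rng.random(len(train)) < 0.8
fit, holdout = train[in_fit], train[~in_fit]

pred = fit["PERCENT_PROFICIENT"].mean()
mse = ((holdout["PERCENT_PROFICIENT"] - pred) ** 2).mean()
```

The same pattern (fit on one split, compute squared-error loss on the other) applies to any of the more serious models you build.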

Grading

Prediction Accuracy (100 Points)

Prediction accuracy will be assessed using Kaggle Competitions. I have provided Kaggle with a private test set on which your predictions will be evaluated. Kaggle will further divide the test set into a public leaderboard and a private leaderboard. You will be able to see your accuracy on the public leaderboard, but final scores will be based on private leaderboard accuracy, so don’t overfit to the public leaderboard.

You will be allowed to submit only two sets of predictions to Kaggle per day, so use them wisely. At the end of the competition, you will be able to select two sets of predictions to be used for your final evaluation (taking the better of the two), so track your submissions carefully.

Kaggle will close for submissions at 11:55pm on the evening before the registrar-determined final exam period for this class.

Score Calculation

Grades for this portion will be assigned based on your performance relative to both the instructor-provided baselines (above) and the best score obtained by your fellow students.

In particular, the top score in the class will automatically receive a score of 100. The second example model given above (the county-average model) will receive a score of 50. Scores for other students will be obtained by linearly interpolating between those two points.

Formally, scores will be given as:

\[\text{Your Grade} = 50 * \left(1 + \left(\frac{\text{Baseline MSE} - \text{Your MSE}}{\text{Baseline MSE} - \text{Class Best MSE}}\right)^{\alpha}\right)\] where \(\alpha \approx 1\) may be tweaked to achieve a desired score distribution.

So, for example, if the baseline MSE is 25, the class best is 5, and your MSE is 10, you will get a score of:

\[50 * \left(1 + \frac{25-10}{25-5}\right) = 87.5\]
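The formula above can be written as a small function (with \(\alpha = 1\) by default; recall that it only applies when your MSE beats the baseline, since a separate interpolation is used otherwise). The worked example reproduces the 87.5 computed in the text:

```python
def competition_grade(your_mse: float, baseline_mse: float, best_mse: float,
                      alpha: float = 1.0) -> float:
    """Score formula from the text; alpha ~ 1 may be tweaked by the instructor."""
    ratio = (baseline_mse - your_mse) / (baseline_mse - best_mse)
    return 50 * (1 + ratio ** alpha)

# Worked example from the text: baseline MSE 25, class best 5, your MSE 10
competition_grade(10, 25, 5)  # 87.5
```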

If your best MSE underperforms the baseline MSE, a separate interpolation will be used.

Final Presentation (50 Points)

During the registrar-scheduled final exam period (tentatively Wednesday May 20, 2026), each competition unit (either an individual or a team) will give a 6-minute presentation, primarily focused on the models and methods they used to build their predictions. This presentation should include discussion of:

  • What ‘off-the-shelf’ models were found to be the most useful?
  • What (if any) extensions to standard models were developed for this project? (These need not be truly novel - if you take an idea from a pre-existing source and adapt it for use in this problem, e.g. because it does not appear in sklearn, this is a contribution worth discussing.)
  • What (if any) ensembling techniques did you apply?
  • What features or feature engineering techniques were important for the success of your model?
  • What techniques for data splitting / model validation / test set hygiene did you use in your model development process?

Presentations must be submitted as PDF slides by noon on the day of final presentations. The instructor will aggregate slides into a single ‘deck’ to be used by all students. (Teams of two should both submit a copy of their slides. The instructor will de-duplicate submissions.)

Students will be graded according to the following rubric:

Grade bands: Excellent = “A-” to “A” (90% to 100%); Great = “B-” to “B+” (80% to 89%); Average = “C” to “C+” (73% to 79%); Poor = “D” to “C-” (60% to 72%); Failure = “F” (0% to 59%).

Quality of Presentation (20 points)
  • Excellent: Excellent and engaging presentation. Visualizations and script clearly convey content in detail without obscuring the bigger picture.
  • Great: Great presentation. Visualizations and script convey content well with only minor flaws. Balance of detailed and big-picture exposition is lost.
  • Average: Solid presentation. Visualizations or script have one to two notable flaws. Insufficient discussion of details OR big picture.
  • Poor: Poor presentation. Visualizations or script have three or more notable flaws. Underwhelming discussion of both details AND big picture.
  • Failure: Unacceptable presentation. Significant weaknesses in visualization and script. Significant omissions in details or big-picture analysis.

Pipeline Design (10 points)
  • Excellent: Excellent pipeline design. Allows for effective re-use of training data without risk of overfitting. Allows for more detailed queries than overall RMSE.
  • Great: Great pipeline design. Allows for effective re-use of training data without risk of overfitting, but only allows queries of RMSE.
  • Average: Solid pipeline design. Takes active steps to minimize the chance of ‘leakage’, but may allow issues.
  • Poor: Poor pipeline design. Attempts made at avoiding leakage and overfitting, but the approach is fundamentally flawed.
  • Failure: Unacceptable pipeline design. Little or no attention paid to data hygiene.

ML Methodology (10 points)
  • Excellent: Excellent methodology. Uses advanced methodologies not covered in class in a way that is well-suited for the prediction task. Methodology uses features and data structure in interesting and creative ways.
  • Great: Great methodology. Uses ‘black box’ methodologies not covered in class, but with little specialization for the prediction task; OR applies and combines methods covered in class with particularly insightful approaches to tuning and specialization for the prediction task.
  • Average: Solid methodology. Applies and combines methods covered in class with moderate attempts to tune and specialize for the task at hand.
  • Poor: Poor methodology. Applies methods covered in class without any attempt to improve or specialize for the task at hand.
  • Failure: Unacceptable methodology. Fails to apply any advanced methodology (e.g., only uses linear regression).

Feature Engineering and Analysis3 (5 points)
  • Excellent: Excellent FEA. Impressive feature engineering creating significant improvements in predictive performance. Careful analysis of feature importance, comparing and contrasting ‘model-specific’ and ‘model-agnostic’ importance.
  • Great: Great FEA. Meaningful feature engineering leading to non-trivial improvements in predictive performance. Analysis of feature importance compared across multiple models.
  • Average: Solid FEA. Features are treated appropriately, with elementary analysis of feature importance for the model(s) used.
  • Poor: Poor FEA. Features are treated appropriately for their modality, but little to no feature analysis or engineering.
  • Failure: Unacceptable FEA. No attempt to analyze features.

Timing (5 points)
  • Excellent: Presentation lasts between 5:45 and 6:15.
  • Great: Presentation lasts between 5:25 and 5:45, or between 6:15 and 6:35.
  • Average: Presentation lasts between 5:00 and 5:25, or between 6:35 and 7:00.
  • Poor: Presentation lasts between 4:30 and 5:00, or between 7:00 and 7:30.
  • Failure: Presentation runs shorter than 4:30 or longer than 7:30.

Students will also vote on an “Audience Choice” award; the winning presentation will automatically receive a score of 50.

Final Report (50 Points)

By Tuesday May 26, 2026 at 11:59 PM, each competition unit (either an individual or a team) will submit a final report of no more than 10 pages (10-12 point, single- or double-spaced) providing an After-Action Report of their competition performance. This report should focus on three topics:

    1. Are there any systematic errors in the model’s predictions that would need to be addressed before it could be applied broadly? (E.g., is it systematically low or high in a particular subgroup; does it under-predict for especially high-outcome samples; etc.?)
    2. What insights into the underlying data can be gleaned from the model? E.g., are certain features especially important for making predictions? Or are certain features that you would expect to be important not actually particularly important?
    3. What steps did your team take that were particularly helpful for maximizing predictive performance? Or, what parts of your model development cycle were weak and could be most improved?4

Note that this report is not solely focused on predictive performance. Analysis that dives deep into the underlying structure of the data and generates novel insights will score as well as (or perhaps even better than) a highly performant but uninterpretable model.

To assist in developing this After-Action Report, the instructor will provide non-anonymized versions of the data (as well as a mapping to the anonymized data) after the Kaggle competition ends.

The report should include, in an Appendix, all code used to prepare the data and to train and predict from the best-performing models. Significant penalties may be applied if the instructor is unable to reliably reproduce your predictions. (You may choose to submit this Appendix in the form of a Jupyter/IPython notebook, Quarto document, Docker container, etc. to maximize reproducibility.) Note that this Appendix does not count against your 10-page limit.

You may, but are not required to, share your code with the instructor via an emailed Zip file or link to a public code hosting platform such as GitHub.

Reproducibility Info

To maximize the reproducibility of your code, make sure to:

  1. Avoid hard-coding any file paths. It is better to download and/or read directly from hosted copies whenever possible.
  2. Save random seeds used to create data splits, initialize training, etc.
  3. List all software and packages used, including version information.
  4. Have a clear set of ‘reproduction steps’ and accompanying documentation.
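Items 2 and 3 can be handled in a few lines. A sketch (the file name run_info.json and its layout are my choice, not a course requirement) that fixes a seed, uses it for a data split, and records it alongside version information for the Appendix:

```python
import json
import numpy as np

SEED = 9890  # recorded alongside the code so the split can be recreated exactly

rng = np.random.default_rng(SEED)
split_assignments = rng.random(100) < 0.8  # e.g., an 80/20 train/validation split

# Write the seed and version info next to the outputs for the report appendix
with open("run_info.json", "w") as f:
    json.dump({"seed": SEED, "numpy_version": np.__version__}, f)
```

Re-running the script with the recorded seed regenerates the identical split, which is exactly what the instructor needs to reproduce your predictions.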

Teams of two should both submit a copy of their final report. The instructor will de-duplicate submissions.

The report will roughly be assessed according to the following rubric, though the instructor may deviate as necessary.

Grade bands: Excellent = “A-” to “A” (90% to 100%); Great = “B-” to “B+” (80% to 89%); Average = “C” to “C+” (73% to 79%); Poor = “D” to “C-” (60% to 72%); Failure = “F” (0% to 59%).

Quality of Report (15 points)
  • Excellent: Excellent report. Excellent writing and formatting, with particularly effective tables and figures. Tables and figures are “publication-quality” and clearly and succinctly support claims made in the body text. Text is clear and compelling throughout.
  • Great: Great report. Strong writing and formatting. Text is generally clear, but has multiple minor weaknesses or one major weakness; tables and figures make their intended points, but do not do so optimally.
  • Average: Solid report. Solid written communication: key points are made understandably and any grammatical errors do not impair understanding. Code, results, and text could be better integrated, but it is clear which elements relate. Formatting is average; tables and figures do not clearly support arguments made in the text and/or are not “publication quality”.
  • Poor: Poor report. Written communication is below standard: points are not always understandable and/or grammatical errors actively distract from content. Code, results, and text are not actively integrated, but are generally located ‘near’ each other in a semi-systematic fashion. Poor formatting distracts from the substance of the report. Tables and figures exhibit significant deficiencies in formatting.
  • Failure: Unacceptable report. Written communication is far below standard, possibly bordering on unintelligible. Formatting prohibits or significantly impairs reader understanding.

Analysis of Predictive Accuracy (10 points)
  • Excellent: Excellent analysis. Team clearly identifies strengths and weaknesses of their model and proposes extensions / next steps that could use the de-anonymized structure to further improve model performance.
  • Great: Great analysis. Team identifies strengths and weaknesses of their model, but without clear ‘next steps’ for model improvement.
  • Average: Solid analysis. Accuracy analysis successfully identifies patterns of error, but does not connect these to modeling.
  • Poor: Poor analysis. Accuracy analysis attempts to identify patterns of error, but fails to distinguish systematic error from randomness.
  • Failure: Unacceptable analysis. Accuracy analysis is superficial and does not take advantage of data structure in a meaningful way.

Model-Driven Insights into Data Generating Process (10 points)
  • Excellent: Excellent insights. Modeling process creates significant new insights into the drivers of student achievement. Insights are then used to further improve predictive modeling in a virtuous cycle.
  • Great: Great insights. Modeling process creates significant new insights into the drivers of student achievement, but insights do not improve predictive modeling.
  • Average: Solid insights. Modeling process creates new non-trivial insights, but not ones that have a major impact on predictive performance.
  • Poor: Poor insights. Modeling process only reproduces known / trivial insights about the data generating process.
  • Failure: Unacceptable insights. No attempt is made at generating meaningful insights from models.

Reflection on Competition Workflow (10 points)
  • Excellent: Excellent reflection. Clear identification of all important good and bad decisions made over the course of the competition, with insightful ‘take-aways’ that can be used by self and other teams to significantly improve performance on future prediction tasks. Importance of key decisions is clearly demonstrated.
  • Great: Great reflection. Impressive reflection on key decisions (good and bad) made over the course of the competition. ‘Take-away’ messages would be useful if this competition were re-run as is (or with minor changes), but do not necessarily generalize to other similar tasks. Importance of key decisions is partially demonstrated.
  • Average: Solid reflection. Reflection identifies major choices made throughout the competition, but fails to fully analyze their impact. ‘Take-away’ messages are useful, but generic and not particularly relevant to this course or this competition (e.g., advice on the best way to tune the lasso). Minimal effort to demonstrate importance of key decisions.
  • Poor: Poor reflection. Reflection misses one or more major choices made over the course of the semester, OR attributes too much importance to an unimportant decision. Fails to demonstrate importance of key decisions. ‘Take-away’ messages are of limited general applicability.
  • Failure: Unacceptable reflection. Minimal or shallow reflection. ‘Take-away’ messages are trivial or misleading.

Reproduction Code (5 points)
  • Excellent: Excellent reproduction code. Code is easy to read and execute, with excellent commenting, formatting, etc., and clearly reproduces submitted predictions.
  • Great: Great reproduction code. Code is easy to read, but requires some effort to execute and reproduce submitted predictions.
  • Average: Solid reproduction code. Code lacks clarity, but still appears to reproduce submitted predictions with reasonable effort.
  • Poor: Poor reproduction code. Instructor cannot reproduce submitted predictions without significant effort.
  • Failure: Unacceptable reproduction code. Code cannot reproduce submitted predictions.

At the end of your report, you must include a description of the extent to which you used Generative AI tools in this course competition or in preparation of your final report. Failure to include an AI disclosure will result in an automatic 25% penalty.

E.g.,

No Generative AI tools were used to complete this mini-project.

or

GitHub Co-Pilot Pro was used via RStudio integration while completing this project. No other generative AI tools were used.

Recall that Generative AI may not be used for idea generation or for writing your final report; it may only be used for coding assistance.

Teams

Students may, but are not required to, work in teams of 2 for this competition. Teams will submit one (shared) final presentation and final report and will receive the same grade for all course elements. Students will create teams via Kaggle’s team functionality.

Teams may be formed at any point in the competition. Teams will not be allowed to dissolve except under truly exceptional circumstances. (Unequal participation is sadly not an exceptional circumstance.)

Footnotes

  1. We are ignoring alternative assessments like the NYSAA for this competition.↩︎

  2. Very small districts may consolidate into two (or even one) school serving a wider variety of grades or arrange for their students to be educated within a neighboring district.↩︎

  3. Feature Engineering and Analysis includes ‘classical’ feature engineering and analysis to identify key features (e.g. feature importance rankings or variable selection).↩︎

  4. In this ‘self-evaluation’ section, you are encouraged to be truthful and honest in your reflections. I already know how well you did and you won’t be able to convince me otherwise, so if you made fundamental errors that hindered your performance, I would rather see them discussed honestly (indicating understanding of how you could improve) rather than minimized.↩︎