Live by the Variance, Die by the Variance (and why I hate Duke [and Mercer] for that matter)

The first weekend of the NCAA Tournament was a wild one. In our competition, we chose models with high variance, knowing full well we could be in a world of hurt if a game or two did not go our way. Being scored on a log loss scale was new to us, and we knew of the risks, but did not really think things could get too bad.

We knew Wichita State was rated too high, but we thought, “hey, maybe they are just that good,” and rolled with it. Anyhow, we had figured they would eventually lose, but one game would not kill us since we could make it up somewhere else.

That was, of course, until Mercer took it to the Dukies. Most models had Duke as a very strong 3-seed. I believe the FiveThirtyEight blog even had Duke as a 93-94% favorite. Those kinds of numbers are for wimps. We had them as a prohibitive 98.4% favorite in one model and 98.9% in another. When you do something like this in a competition where the scoring is based on log loss, you had better hope that they win, or else you are dead in the water.

The average score under log loss (or at least the one you are aiming for) should be in the 0.4 to 0.5 range. The Duke loss was a 4.1 to 4.5 point penalty on a single game, and it just destroyed us. Unfortunately, no one else was dumb enough to match our probability.

By comparison, last year’s Cinderella, Florida Gulf Coast, had only a 4% chance to beat Georgetown. Using the same formula, your penalty is in the 3’s.
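For reference, the per-game penalty under log loss is the negative natural log of the probability you gave to what actually happened. Here is a minimal sketch of that arithmetic in Python. The function name `game_log_loss` is made up for this post and is not part of the contest code, and the 93.5% figure is just the midpoint of the 93-94% range mentioned above.

```python
import math

def game_log_loss(p_favorite: float, favorite_won: bool) -> float:
    """Log loss for one game, given the probability assigned to the favorite."""
    p_outcome = p_favorite if favorite_won else 1.0 - p_favorite
    return -math.log(p_outcome)

# Duke over Mercer in our two models, plus a 93.5% pick for comparison
for p in (0.984, 0.989, 0.935):
    print(f"{p:.3f} favorite loses: {game_log_loss(p, False):.2f}")

# Georgetown over Florida Gulf Coast at 96%
print(f"0.960 favorite loses: {game_log_loss(0.96, False):.2f}")

# What the 98.4% pick would have cost if Duke had simply won
print(f"0.984 favorite wins:  {game_log_loss(0.984, True):.3f}")
```

The upset penalties come out to roughly 4.14, 4.51, 2.73 and 3.22, which is where the numbers quoted above come from.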

We could have easily avoided all of this by increasing the variance we used when calculating our probabilities, which would have pulled the most extreme picks back toward 50%. But we were in win-or-go-home mode… oh well.
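To make the variance point concrete: one common way to turn a predicted point margin into a win probability is a normal CDF, and widening its standard deviation makes the picks less extreme. The sketch below assumes that setup. It is not our actual model, and the margin and `sigma` values are invented purely for illustration.

```python
import math

def win_prob(margin: float, sigma: float) -> float:
    """P(favorite wins) if the final margin ~ Normal(margin, sigma)."""
    return 0.5 * (1.0 + math.erf(margin / (sigma * math.sqrt(2.0))))

def upset_penalty(p_favorite: float) -> float:
    """Log loss charged when the favorite loses."""
    return -math.log(1.0 - p_favorite)

margin = 20.0  # hypothetical predicted margin, in points
for sigma in (9.0, 11.0, 13.0):  # hypothetical standard deviations
    p = win_prob(margin, sigma)
    print(f"sigma={sigma:4.1f}  p={p:.3f}  penalty if upset={upset_penalty(p):.2f}")
```

A larger `sigma` gives up a little on every game the favorite wins, but it caps how badly a single upset can hurt.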

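Here is a quick updated simulation of the remaining teams. I ran a somewhat newer model 50,000 times. Each team’s row sums to 50,000, so each column counts the runs in which that team’s tournament ended in that round, with the last column counting titles.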

UPDATED MONTE CARLO SIMULATION WITH NEW PROBABILITIES

Team                      Sweet 16   Elite 8   Final 4     Final  Champion
Louisville                    9137      8891     12333      7167     12472
Arizona                       7871      9126     13607      7425     11971
Florida                      12205      4016     11482     10622     11675
Virginia                     16372      7740     11536      7621      6731
Wisconsin                    16889     22709      6769      2276      1357
Michigan St.                 33628      6133      6169      2755      1315
Michigan                     23402     18883      5046      1715       954
UCLA                         37795      3240      5447      2581       937
Tennessee                    26598     17198      4236      1266       702
Iowa St.                     21098     20109      5932      2172       689
Kentucky                     40863      5028      2921       810       378
Connecticut                  28902     16018      3673      1139       268
San Diego St.                42129      4587      2510       562       212
Baylor                       33111     13578      2578       570       163
Stanford                     22166     23422      3430       860       122
Dayton                       27834     19322      2331       459        54

My 50,000 Monte Carlo Simulation Results for the NCAA Basketball Tournament

With March Madness upon us, I have been in a solid state of sleep deprivation. It all started with a class project assigned in late February that suggested we enter the Kaggle competition of our choice or create a similar type of project.

I was immediately drawn to the March Machine Learning Mania being hosted by Kaggle and Intel. For the past three weeks, in any spare time, I have been trying to find and clean data to run models. I thought things were slowing down last week until I decided to try some new data I had found.

That being said, I was running and testing models and way overthinking this whole competition down to the very last few hours, right up to the deadline yesterday afternoon. The competition requires you to submit a probability for all 2278 potential matchups among the 68 teams. Only the 63 games that are actually played (the play-in games do not count) are scored.
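The 2278 figure is just the number of unordered pairs among 68 teams. A quick check in Python, using placeholder team IDs rather than the real ones from the contest data:

```python
from itertools import combinations
import math

team_ids = range(1, 69)  # 68 placeholder IDs, not the real contest IDs
matchups = list(combinations(team_ids, 2))

print(len(matchups))     # number of probabilities to submit
print(math.comb(68, 2))  # same count from the formula 68 * 67 / 2
```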

Since I had all of these probabilities, I decided I should write my own Monte Carlo simulation to see what would happen. I meant to post the results yesterday, but as always seems to happen when you write code without much sleep, debugging became a painful mess. I had a midterm yesterday as well and may have been a bit exhausted.

But here it is – 50,000 simulated runs of the tournament, based on the last data I generated for the contest. (You were allowed to submit two sets of scores, with your best score becoming your entry.)
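For anyone curious what such a simulation looks like, here is a stripped-down sketch. It is not the code I actually ran. It assumes a bracket given as a list of teams in bracket order (adjacent entries play each other, and the list length is a power of two) and a `prob(a, b)` function returning the chance that `a` beats `b`. The four-team bracket and the ratings behind `prob` are invented for the example.

```python
import random
from collections import Counter

def simulate_once(bracket, prob, rng):
    """Play one single-elimination tournament; return {team: games won}."""
    wins = {team: 0 for team in bracket}
    alive = list(bracket)
    while len(alive) > 1:
        next_round = []
        # Adjacent survivors meet: (0 vs 1), (2 vs 3), ...
        for a, b in zip(alive[0::2], alive[1::2]):
            winner = a if rng.random() < prob(a, b) else b
            wins[winner] += 1
            next_round.append(winner)
        alive = next_round
    return wins

def run_simulations(bracket, prob, n_runs, seed=0):
    """Tally, per team, how many runs ended after each number of wins."""
    rng = random.Random(seed)
    tally = {team: Counter() for team in bracket}
    for _ in range(n_runs):
        for team, w in simulate_once(bracket, prob, rng).items():
            tally[team][w] += 1
    return tally

# Hypothetical ratings; a higher rating means a stronger team.
ratings = {"A": 4.0, "B": 1.0, "C": 3.0, "D": 2.0}

def prob(a, b):
    return ratings[a] / (ratings[a] + ratings[b])

tally = run_simulations(["A", "B", "C", "D"], prob, n_runs=50_000)
for team, counts in tally.items():
    # Columns: lost first game, lost the final, champion
    print(team, [counts[w] for w in range(3)])
```

Each team's counts add up to the number of runs, since every run ends for that team in exactly one round.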

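I am still a little too tired to give much insight, but will post more as the tournament goes on. For now, a note on reading the table: every row adds up to 50,000, so each column counts the runs in which that team went out in that round, and Champion counts the runs in which it won it all.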

Team                         First    Second  Sweet 16   Elite 8   Final 4     Final  Champion
Arizona                         76      3716      2635      8198     15541      6957     12877
Louisville                     461       810     16698      4892     10298      5408     11433
Florida                         82      4645      5746      6711     10486     11375     10955
Virginia                        57      1858      9416      9911     13620      9194      5944
Wichita St.                     86      5425     27100      4049      6536      2974      3830
Villanova                      307      6967      9237     20123      8362      3727      1277
Creighton                      682      6139     12844     20991      6396      1793      1155
Kansas                         789      8190     10874     20648      5694      2934       871
Duke                           831     12439     10829     19701      4117      1280       803
Michigan St.                  1645     10072     28384      5140      3438      1089       232
Wisconsin                     1174     11253     22337     11972      2635       467       162
Michigan                       363      6456     26221     14597      1873       358       132
UCLA                          3733     19990     21612      2534      1550       505        76
Syracuse                      1839     16022     20145      9813      1648       468        65
VCU                           5821     21730     18738      2082      1194       384        51
Iowa St.                      4069     16082     20478      7517      1497       314        43
Pittsburgh                    4715     40614      2617      1263       617       150        24
Tennessee                     6498     30742      6326      5693       599       118        24
Ohio St.                     11895     22636     11142      3751       475        89        12
Oklahoma                     10482     17021     20399      1596       451        44         7
North Carolina               15357     19287     11678      3107       485        81         5
Kentucky                      8253     36494      4727       338       157        26         5
Oklahoma St.                 24608     23467       758       846       286        31         4
Gonzaga                      25392     22744       815       766       250        30         3
Oregon                       14064     25546      8095      2018       250        24         3
Connecticut                  13243     30497      3583      2340       290        45         2
San Diego St.                 7763     19228     21008      1557       411        31         2
New Mexico                   17082     26126      4091      2373       285        42         1
Cincinnati                   21385     21488      6385       557       162        22         1
Dayton                       38105      9663      1985       234        11         1         1
Baylor                       13150     30926      4120      1634       151        19         0
Harvard                      28615     16988      4026       289        72        10         0
Stanford                     32918     14955      1552       541        30         4         0
Providence                   34643     11130      3680       505        39         3         0
Memphis                      24284     24724       818       155        17         2         0
George Washington            25716     23364       778       124        17         1         0
Texas                        22213     23781      3589       404        13         0         0
Saint Joseph’s               36757     12239       784       211         9         0         0
BYU                          35936     12131      1734       190         9         0         0
Saint Louis                  17891     31293       778        32         6         0         0
Arizona St.                  27787     19413      2573       222         5         0         0
North Dakota St.             39518      7502      2897        78         5         0         0
Nebraska                     36850     12319       702       126         3         0         0
New Mexico St.               42237      6249      1485        26         3         0         0
Massachusetts                43502      6034       405        57         2         0         0
Stephen F. Austin            44179      5000       796        23         2         0         0
Kansas St.                   41747      7998       248         5         2         0         0
Manhattan                    49539       313       145         2         1         0         0
North Carolina Central       45931      3501       550        18         0         0         0
Tulsa                        46267      3280       440        13         0         0         0
Western Michigan             48161      1679       152         8         0         0         0
North Carolina St.           32109     17584       301         6         0         0         0
Colorado                     45285      4659        51         5         0         0         0
Delaware                     48355      1452       190         3         0         0         0
Louisiana Lafayette          49318       616        64         2         0         0         0
Mercer                       49169       785        44         2         0         0         0
Eastern Kentucky             49211       729        59         1         0         0         0
American                     48826      1070       104         0         0         0         0
Wofford                      49637       350        13         0         0         0         0
Milwaukee                    49693       297        10         0         0         0         0
Cal Poly                     49914        83         3         0         0         0         0
Weber St.                    49924        73         3         0         0         0         0
Coastal Carolina             49943        54         3         0         0         0         0
Albany                       49918        82         0         0         0         0         0