Saturday marks the running of the 2021 NCAA D1 Cross Country National Championships in Tallahassee with Northern Arizona on the brink of their fifth team title in six years.
It has been a while since I have publicly published any new models in any sport, but yesterday on Twitter, Citius Magazine posted a video they had done with Isaac Wood of The Wood Report about his prediction.
Being relatively new to the world of collegiate track and cross country, I had no idea who Isaac was, so I immediately subscribed to his website to see what he had built. I also like seeing how others model sport, and Isaac has an interesting website.
Between their tweet and a look at his website and simulator, I began to wonder what I could produce before Saturday's meet. My initial thought was to create my own individual runner ratings and simulate from there, but to be honest, that is something I have been thinking about for a few months now and is just too big a project. What I could do is take Isaac Wood's individual runner rankings and try to expand them into a team result.
I decided I would build a quick Monte Carlo simulator using the top-7 runners for each team racing Saturday, based on The Wood Report, and calculate probabilities for how each team will do. I also figured I might as well look at individual runners and how the top-10 may look. (HUGE CAVEAT – I am not including any of the individual runners who are not competing within the team standings. I just didn’t have enough time to build everything from scratch.)
EDITOR'S NOTE – While writing this, I realized that most of you couldn't care less about the methodology section, so please feel free to skip all of that and see what I came up with. I understand. I do these types of projects mostly to share my thinking so I can improve my methods at a later date as questions/data improvements come.
The first step is to take the 31 teams and find the top-7 runner ratings. With such a short time horizon and no real chance to build my own ratings, I have to blend a little art with the science. The main weakness of doing something like this is that I have no idea how Isaac Wood (and his PhD student) created these ratings, and really no idea about the variability of each individual runner.
Cross country is such a great event because every course is different every day. Hills, terrain, altitude, temperature and humidity vary, even day to day on the same course. I am not even going to get into teams racing a variety of distances leading up to this weekend, or where individual runners were within their training when they raced various events. None of that is being captured here.
We have what we have. Each runner has been given a rating that from my point of view appears to be really solid.
What I can do is simulate variability. This is where art blends with science.
Let's take BYU's Connor Mantz with a rating of 9.97. Sure, he's a favorite, but by how much? Ideally, we would have all of the variables I mentioned above already baked into his rating, plug Saturday's expected conditions into a formula, and see what his expected time would be. We would then do that for every runner and build the projected results.
But I don’t have that. I have a bunch of individual ratings. Wood simply uses those to build a final result. Instead of doing that, I prefer to throw some variability into the ratings and run the race thousands of times.
How much variability and where do you model this?
Back to Mantz. Sure he’s a favorite, but there are several guys who can win this race. Some people will have the race of their lives while others will struggle for one reason or another (Think of the three H’s: hills, heat and humidity).
I took the top runner for each school participating and found the standard deviation. I did this as well for the rest of the runners. Not surprisingly, as you go from the first runner for each team to the seventh, the variability explodes. This makes sense. Depth is where this thing is won.
I finally settled on a number closer to the standard deviation of the top runners for each team and used it for every runner in the field. This isn’t the best method, as each runner should have their own variance, but it’ll do.
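As a sketch of that step, here is how one might measure the spread at each roster position across teams. The team names and ratings below are made up for illustration; they are not the actual Wood Report numbers.

```python
import statistics

# Hypothetical ratings: each team's seven runners, best first.
# (Illustrative numbers only -- not the actual Wood Report ratings.)
team_ratings = {
    "Team A": [9.97, 9.40, 9.10, 8.80, 8.50, 8.10, 7.60],
    "Team B": [9.60, 9.35, 9.05, 8.60, 8.20, 7.90, 7.20],
    "Team C": [9.80, 9.20, 8.70, 8.30, 7.80, 7.40, 6.90],
}

# Spread of the No. 1 runners, then the No. 2 runners, and so on,
# across all teams in the field.
for position in range(7):
    ratings = [r[position] for r in team_ratings.values()]
    print(f"Runner #{position + 1}: sd = {statistics.stdev(ratings):.3f}")
```

Even with these toy numbers, the standard deviation grows as you move from each team's first runner toward their seventh, which mirrors what I saw in the real ratings.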
FINALLY, I use a function to generate a random number within +/- one 'standard deviation' for each runner to set their 'speed' for the race, then rank the runners and score the race. To understand this better: some runners will run better and some worse, but they won't always run right at their rating. Odds are they will land somewhere close. Of course there will be outliers, but let's assume performances stay within about one standard deviation of the rating (roughly 34% of the distribution on either side of the mean) in a 'standard' way. I also don't want to talk about outliers too much, as this may send shivers down Chris Chavez's spine. #TeamGladwell. Just kidding.
I do this 10,000 times.
That is, I simulate the race 10,000 times and see how it all plays out. This should give us a pretty good indication of the probability each team has to finish this weekend in a particular place.
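For anyone curious, the loop described above can be sketched in a few lines of Python. Everything here is assumed for illustration: the team names, the ratings, the single shared standard deviation, and a simplified scoring rule (every runner earns a place, a team's score is the sum of its best five places, and the lowest total wins, as in cross country). It is a sketch of the approach, not my exact code.

```python
import random
from collections import Counter

# Hypothetical ratings: each team's seven runners, best first.
team_ratings = {
    "Team A": [9.97, 9.40, 9.10, 8.80, 8.50, 8.10, 7.60],
    "Team B": [9.60, 9.35, 9.05, 8.60, 8.20, 7.90, 7.20],
    "Team C": [9.80, 9.20, 8.70, 8.30, 7.80, 7.40, 6.90],
}
SD = 0.30        # one 'standard deviation' of race-day variability (assumed)
N_SIMS = 10_000  # number of simulated races

titles = Counter()
for _ in range(N_SIMS):
    # Jitter every runner's rating within +/- one SD, then rank the
    # whole field (higher jittered rating = faster on the day).
    field = [
        (rating + random.uniform(-SD, SD), team)
        for team, runners in team_ratings.items()
        for rating in runners
    ]
    field.sort(reverse=True)

    # Score the race: sum of each team's best five finishing places.
    scores = Counter()
    counts = Counter()
    for place, (_, team) in enumerate(field, start=1):
        if counts[team] < 5:
            scores[team] += place
            counts[team] += 1

    # Lowest score takes the team title in this simulated race.
    titles[min(scores, key=scores.get)] += 1

for team, wins in titles.most_common():
    print(f"{team}: wins the title {wins / N_SIMS:.1%} of the time")
```

Swap in the real 31-team field and the Wood Report ratings and the tallies become the probabilities reported below.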
After running the simulation 10,000 times, the overwhelming favorite is Northern Arizona, who wins the title 48.78% of the time. Oklahoma State captures the title 22.64% of the time, with Iowa State winning 12% of the time.
The table below shows how many times each team landed in each spot in the team standings.
How good is Northern Arizona? They have over an 87% chance to finish in the top-3.
Possibly the more interesting part of all of this is that after the top two, the teams are very bunched together. The probabilities for Iowa State, Colorado, Notre Dame, BYU and Stanford are very close. Fighting through to the end will be key, and I wonder if something mentioned on the podcast could be a factor: will the course favor track runners over 'mudders' like Colorado?
This is even clearer when we look at the average team finish below.
And don’t ignore Tulsa! They actually won the whole thing 8 times out of 10,000. Sure, that’s only 0.08% of the time, but there’s a chance.
Here’s a table of each team’s AVERAGE FINISH within the simulation.
| SCHOOL | AVERAGE TEAM FINISH |
| ------ | ------------------- |
I find it interesting to see the Big 10 Conference anchoring the bottom. If this pans out, then maybe the committee overvalued their quality. If they perform much better than the model expects, I would suspect the Wood model held them down too much.
Now, remember, I did not include the true individual qualifiers, those running the event without their teams.
Here are the top-10 runners based on the simulation, with how many times (out of 10,000) each finished in each of the first ten places.
| RUNNER | SCHOOL | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th |
| ------ | ------ | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---- |
| Adriaan Wildschutt | Florida State | 2272 | 1839 | 1096 | 834 | 646 | 551 | 460 | 442 | 431 | 339 |
| Wesley Kiptoo | Iowa State | 1840 | 1727 | 1193 | 826 | 688 | 593 | 501 | 480 | 435 | 401 |
| Abdihamid Nur | Northern Arizona | 415 | 932 | 988 | 805 | 715 | 599 | 603 | 542 | 491 | 450 |
| Nico Young | Northern Arizona | 397 | 853 | 896 | 782 | 685 | 570 | 567 | 531 | 527 | 490 |
| Ahmed Muhumed | Florida State | 47 | 210 | 397 | 532 | 568 | 505 | 505 | 512 | 503 | 511 |
The race is expected to be very close. Mantz (BYU), Wildschutt (FSU) and Kiptoo (ISU) are the clear favorites, but all of the big names are there. Once you venture past those first three, it appears to be wide open.
My model actually had 18 different runners who won the title at least once (way to go, Ky Robinson!) and 21 total who grabbed second at least one time.
I hope you enjoyed this and I understand this was a long and winding road, but I enjoyed diving into this for a day or so and seeing what the numbers showed. Best of luck to all of the runners, especially the ones I know… (go BTR!)