Predicting Federer-Tursunov and Other Friday French Open Matches Using Markov Chains

Today I was enamored with the article Inside the Shadowy World of High-Speed Tennis Betting. It describes the courtsiders, people who sit courtside at a tennis match and try to relay information to betting partners faster than the tournament computers can. Great read. I’m not sure these courtsiders were really doing anything illegal.

Buried deep in the article was a mention of the system one organization created to predict the outcome of tennis matches for betting purposes. It links to a website, Summer of Jeff, and a post, Python Code for Tennis Markov. If you follow the links to the GitHub site, there is some pretty elaborate Python code for generating probabilities based on Markov chain theory. The code is pretty easy to use if you understand Python and statistics, although it needs some cleaning up if you plan on using it for entire-match prediction (hint: the matchProbs function needs some fixes to run).

The biggest issue is determining the initial probabilities: you need to estimate each server’s probability of winning a point on serve.
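To see why the per-point probability is all the model needs, here is a minimal sketch of the Markov chain for a single service game. It assumes, as this kind of model does, that the server wins every point with the same probability p, independent of the score. The function name hold_probability is my own, not something from the Summer of Jeff code.

```python
from functools import lru_cache


def hold_probability(p: float) -> float:
    """Probability that the server wins a standard (ad-scoring) game.

    p is the server's chance of winning any single point, assumed constant
    and independent from point to point (the Markov assumption).
    """
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def win_from(server_pts: int, returner_pts: int) -> float:
        # Absorbing states: someone has at least 4 points and leads by 2.
        if server_pts >= 4 and server_pts - returner_pts >= 2:
            return 1.0
        if returner_pts >= 4 and returner_pts - server_pts >= 2:
            return 0.0
        # Deuce (3-3 or any later tie): win two straight points before losing two.
        if server_pts == returner_pts and server_pts >= 3:
            return p * p / (p * p + q * q)
        # Otherwise step to the next state: server wins or loses this point.
        return p * win_from(server_pts + 1, returner_pts) + q * win_from(
            server_pts, returner_pts + 1
        )

    return win_from(0, 0)


print(hold_probability(0.65))
```

For a 65% server, hold_probability(0.65) comes out to about 0.83, so a modest edge per point turns into holding roughly five service games in six.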

To do this, I hit the trusty website and pulled that information up.

For the year, Roger Federer has won 90% of his service games but only 70% of his service points; on clay this season, those numbers are 89% and 67%. Dmitry Tursunov, on the other hand, has won 22% of return games and 36% of return points, and 24% and 37% on clay. Assuming the majority of these results came against ‘inferior’ players, we might expect these numbers to regress toward each other: Federer’s 70% on serve versus the 64% implied by Tursunov’s 36% on return. I am going to say that Federer is likely to win 65% of his service points. One down.

Now when Tursunov serves, he has won 75% of service games and 61% of service points (70% and 60% on clay). Federer has won 29% of return games and 41% of return points (27% and 40% on clay). That works out quite nicely to 60-40, so Tursunov wins 60% of his service points and Federer’s return probability will be 40%.
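The "regress toward each other" step above is a judgment call. One mechanical alternative is a weighted average of the server’s serve-point rate and the complement of the returner’s return-point rate. The helper below is hypothetical, and its default equal weighting is an arbitrary assumption, not how the numbers above were chosen.

```python
def blend_serve_estimate(serve_pts_won: float, opp_return_pts_won: float,
                         weight: float = 0.5) -> float:
    """Hypothetical helper: estimate a server's point-win probability
    against one specific opponent.

    serve_pts_won: share of service points the server has won this season.
    opp_return_pts_won: share of return points the opponent has won.
    weight: how much to trust the server's own number (0 to 1); the 0.5
    default is an arbitrary equal-weight assumption.
    """
    implied_by_returner = 1.0 - opp_return_pts_won
    return weight * serve_pts_won + (1.0 - weight) * implied_by_returner


# Season figures quoted above.
federer_serving = blend_serve_estimate(0.70, 0.36)   # Federer serve vs Tursunov return
tursunov_serving = blend_serve_estimate(0.61, 0.41)  # Tursunov serve vs Federer return
print(round(federer_serving, 3), round(tursunov_serving, 3))
```

Equal weights give 67% for Federer, a bit above the 65% chosen above (which leans further toward Tursunov’s return numbers), and 60% for Tursunov, matching the estimate above.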

Plugging these into the handy code mentioned above, we get that Federer is a 78.5% favorite to win tomorrow.
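The same idea chains from points to games, tiebreaks, sets, and the match. This is not the Summer of Jeff code (nor a fix of its matchProbs function); it is an independent, minimal sketch under stated assumptions: constant point-win probabilities on serve, 7-point tiebreaks at 6-6, best of five, and by default no tiebreak in the fifth set (final_set_tiebreak=False), since Roland Garros played fifth sets out at the time. Because of those choices, its output need not match the 78.5% above exactly.

```python
from functools import lru_cache


def game_win_prob(p: float) -> float:
    """Server holds, given point-win probability p on serve (closed form)."""
    q = 1.0 - p
    deuce = p * p / (p * p + q * q)  # win two straight points before losing two
    # Wins at 4-0, 4-1, 4-2, plus reaching 3-3 (20 ways) and winning from deuce.
    return p**4 * (1 + 4 * q + 10 * q * q) + 20 * p**3 * q**3 * deuce


def tiebreak_win_prob(pa: float, pb: float) -> float:
    """A wins a 7-point tiebreak in which A serves the first point.
    pa: A's point-win probability on A's serve; pb: B's on B's serve."""
    @lru_cache(maxsize=None)
    def win(a: int, b: int) -> float:
        if a >= 7 and a - b >= 2:
            return 1.0
        if b >= 7 and b - a >= 2:
            return 0.0
        if a == b and a >= 6:
            # From 6-6 on, each pair of points has one serve apiece.
            both = pa * (1 - pb)
            neither = (1 - pa) * pb
            return both / (both + neither)
        k = a + b  # points played; serve order is A, B, B, A, A, B, B, ...
        a_serves = ((k + 1) // 2) % 2 == 0
        p = pa if a_serves else 1 - pb
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)

    return win(0, 0)


def set_win_prob(pa: float, pb: float, tiebreak: bool = True) -> float:
    """A wins a set in which A serves first: tiebreak at 6-6, or play on by two."""
    ha, hb = game_win_prob(pa), game_win_prob(pb)
    tb = tiebreak_win_prob(pa, pb)

    @lru_cache(maxsize=None)
    def win(a: int, b: int) -> float:
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        if tiebreak and a == b == 6:
            return tb
        if not tiebreak and a == b and a >= 5:
            # Advantage set: A must hold and break within the same pair of games.
            both = ha * (1 - hb)
            neither = (1 - ha) * hb
            return both / (both + neither)
        # A serves whenever an even number of games has been played.
        p = ha if (a + b) % 2 == 0 else 1 - hb
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)

    return win(0, 0)


def match_win_prob(pa: float, pb: float, best_of: int = 5,
                   final_set_tiebreak: bool = False) -> float:
    """A wins the match. With alternating serve, the set probability does not
    depend on who serves first, so every set reuses set_win_prob(pa, pb)."""
    s = set_win_prob(pa, pb, tiebreak=True)
    s_final = set_win_prob(pa, pb, tiebreak=final_set_tiebreak)
    need = best_of // 2 + 1

    @lru_cache(maxsize=None)
    def win(a: int, b: int) -> float:
        if a == need:
            return 1.0
        if b == need:
            return 0.0
        p = s_final if a == b == need - 1 else s
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)

    return win(0, 0)


print(match_win_prob(0.65, 0.60))  # Federer serving at 65%, Tursunov at 60%
```

Reusing one set probability is safe because each player serves exactly five of a set’s first ten games and one game of every later pair (the tiebreak is balanced the same way), so who starts a set does not change its odds. For the Tsonga match, match_win_prob(0.66, 0.65) gives Janowicz’s side.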

Jo-Wilfried Tsonga has won 68% of his service points (65% on clay), while Jerzy Janowicz has won 34% of return points all season and an improved 36% on clay. What is crazy about this is that you might conclude Janowicz is a better clay-court than hard-court player. Well, amazingly, he had not won a single clay-court match this spring before winning his first two rounds at Roland Garros. Oh well. I am still going to give him the benefit of the doubt and put Tsonga at 65% to win a point on serve.

Returning, Tsonga has won 34% of points for the year and 35% on clay, while Janowicz has won 62% of his service points overall and 68% on clay. Again, Janowicz’s stats are much better on the terre battue. I am going to just split this straight and leave Tsonga’s return percentage at 34%, which puts Janowicz at 66% on serve.

We all know the French crowd will be pulling for their man, so that may be the edge; however, the stats say that Janowicz is a slight favorite at 56.1%. Moving Tsonga’s serve percentage up just one point, to 66%, makes this a dead heat.

Looking at the odds at, Federer is -2500, so that’s a ridiculous bet, but Janowicz is actually +325 v. Tsonga, so that may be worth a play. I hope to look into this more as the tournament progresses.
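To compare the model with the betting line, American odds can be turned into break-even probabilities: a negative number is what you stake to win 100, and a positive number is what a 100 stake wins. This is a minimal sketch with helper names of my own; the implied probabilities also include the bookmaker’s margin, and the expected value is only as good as the model’s inputs.

```python
def implied_probability(american_odds: int) -> float:
    """Break-even win probability implied by American (moneyline) odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def expected_profit(model_prob: float, american_odds: int, stake: float = 100.0) -> float:
    """Expected profit of a bet at these odds if model_prob is the true chance."""
    if american_odds < 0:
        win_amount = stake * 100 / -american_odds
    else:
        win_amount = stake * american_odds / 100
    return model_prob * win_amount - (1 - model_prob) * stake


print(round(implied_probability(-2500), 4))   # Federer: 0.9615
print(round(implied_probability(325), 4))     # Janowicz: 0.2353
print(expected_profit(0.785, -2500))          # Federer at the model's 78.5%
print(expected_profit(0.561, 325))            # Janowicz at the model's 56.1%
```

At -2500 the market prices Federer at about 96%, well above the model’s 78.5%, so that bet loses money on average under the model; at +325 Janowicz only needs to win about 23.5% of the time to break even, against the model’s 56.1%.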

NCAA Men’s DI Tennis Regionals Simulated 50,000 Times

This is posted on my aptly named college tennis website, and I decided to post it here as well. Why not, right?

I’m sitting in the middle of exams and term projects, looking for ways to relax. What better way than to run a Monte Carlo simulation of each of the men’s regionals, based on my year-end ratings?

So I ran each regional 50,000 times; the results are below. One of the more intriguing, of course, is the Nashville Regional, where I could not really account for the home-court (and probably outdoor) advantage that Vanderbilt will have over Columbia. That probably sways things a little; I am guessing somewhere in the range of 5-15%.

The first number is how many times each team won the entire regional. The second is their probability of coming out of the regional, i.e., that count divided by 50,000.
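The post doesn’t say how the year-end ratings become match-win probabilities, so this is only a minimal sketch of the simulation loop. It assumes a hypothetical Elo-style logistic curve (win_prob, with a made-up scale of 400) and made-up ratings, and the bracket the regional listings suggest: the first team listed plays the second, the third plays the fourth, and the winners meet.

```python
import random


def win_prob(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Hypothetical Elo-style logistic: chance that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def simulate_regional(teams, ratings, n_sims=50_000, seed=1):
    """Play a 4-team bracket (teams[0] v teams[1], teams[2] v teams[3], then
    a final) n_sims times and count how often each team wins the regional."""
    rng = random.Random(seed)
    wins = {team: 0 for team in teams}

    def play(a, b):
        return a if rng.random() < win_prob(ratings[a], ratings[b]) else b

    for _ in range(n_sims):
        finalist_1 = play(teams[0], teams[1])
        finalist_2 = play(teams[2], teams[3])
        wins[play(finalist_1, finalist_2)] += 1
    return wins


# Made-up ratings, for illustration only.
teams = ["Team A", "Team B", "Team C", "Team D"]
ratings = {"Team A": 1800, "Team B": 1400, "Team C": 1600, "Team D": 1600}
n = 50_000
results = simulate_regional(teams, ratings, n_sims=n)
for team in teams:
    # Same layout as the tables: title count, then percentage.
    print(f"{team} {results[team]} {100 * results[team] / n:.3f}%")
```

For a four-team bracket the exact answer is easy to compute directly, so simulation mainly pays off for larger brackets or extra randomness. A home-court edge like Vanderbilt’s could be approximated by adding a guessed bonus to the host’s rating.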

BEST FIRST-ROUND MATCH-UPS: The Vanderbilt-Virginia Tech match-up looks good, as do Oklahoma State-Michigan, Memphis-Drake, South Florida-Florida State, Northwestern-Mississippi, Wake Forest-Louisville, Boise State-USD and Auburn-Harvard. There are a few more, but that’s my quick take.

USC Regional

University of Southern California 45307 90.614%
University of Idaho 122 0.244%
Oklahoma State University 2283 4.566%
University of Michigan 2288 4.576%


Nashville Regional

Vanderbilt University 9815 19.630%
Virginia Tech 7818 15.636%
East Tennessee State University 1917 3.834%
Columbia University 30450 60.900%


Austin Regional

University of Texas 39580 79.160%
Marist College 224 0.448%
University of Louisiana at Lafayette 1140 2.280%
Mississippi State University 9056 18.112%


College Station Regional

California 11649 23.298%
Texas Tech University 5420 10.840%
Alcorn State University 42 0.084%
Texas A&M University 32889 65.778%


Waco Regional

Baylor University 43520 87.040%
Texas A&M-Corpus Christi 497 0.994%
Stanford University 3601 7.202%
University of Tulsa 2382 4.764%


Champaign Regional

University of Memphis 6515 13.030%
Drake University 5669 11.338%
Ball State University 374 0.748%
University of Illinois 37442 74.884%


South Bend Regional

University of Notre Dame 35693 71.386%
Univ. of Wisconsin-Green Bay 381 0.762%
Northwestern University 6925 13.850%
University of Mississippi 7001 14.002%


Charlottesville Regional

Penn State University 3489 6.978%
UNC Wilmington 636 1.272%
U.S. Military Academy 104 0.208%
University of Virginia 45771 91.542%


Columbus Regional

Ohio State University 45064 90.128%
Bryant University 72 0.144%
Wake Forest University 2384 4.768%
University of Louisville 2480 4.960%


Gainesville Regional

University of South Florida 7877 15.754%
Florida State University 7821 15.642%
St. John’s University 1541 3.082%
University of Florida 32761 65.522%


Durham Regional

Duke University 35589 71.178%
Winthrop University 606 1.212%
University of Tennessee 12034 24.068%
Elon University 1771 3.542%


UCLA Regional

University of San Diego 2321 4.642%
Boise State University 3591 7.182%
Cal Poly 375 0.750%
UCLA 43713 87.426%


Chapel Hill Regional

North Carolina 40147 80.294%
South Carolina State 233 0.466%
University of South Carolina 8986 17.972%
George Washington University 634 1.268%


Athens Regional

North Carolina State 7642 15.284%
University of Oregon 5472 10.944%
Jacksonville State University 161 0.322%
University of Georgia 36725 73.450%


Lexington Regional

University of Kentucky 32487 64.974%
University of Denver 948 1.896%
Clemson University 9801 19.602%
Purdue University 6764 13.528%


Norman Regional

Auburn University 2313 4.626%
Harvard University 2361 4.722%
Montana 88 0.176%
University of Oklahoma 45238 90.476%
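Since the percentages come from simulation, each carries a little sampling noise. A short check on counts copied from the USC and Nashville tables: each regional sums to 50,000, each percentage is count / 50,000, and the binomial standard error sqrt(p(1 - p)/n) puts USC’s 90.614% within about ±0.26 percentage points at 95% confidence. That is far smaller than the 5-15% home-court guess for Vanderbilt, so the modeling assumptions, not the number of runs, dominate the uncertainty.

```python
import math

N_SIMS = 50_000

# Counts copied from two of the regionals above.
regionals = {
    "USC Regional": {
        "University of Southern California": 45307,
        "University of Idaho": 122,
        "Oklahoma State University": 2283,
        "University of Michigan": 2288,
    },
    "Nashville Regional": {
        "Vanderbilt University": 9815,
        "Virginia Tech": 7818,
        "East Tennessee State University": 1917,
        "Columbia University": 30450,
    },
}

for name, counts in regionals.items():
    # Every simulated regional crowns exactly one winner.
    assert sum(counts.values()) == N_SIMS
    print(name)
    for team, wins in counts.items():
        p = wins / N_SIMS
        # Binomial standard error of a simulated proportion: sqrt(p(1-p)/n).
        se = math.sqrt(p * (1 - p) / N_SIMS)
        print(f"  {team}: {100 * p:.3f}% +/- {100 * 1.96 * se:.2f} points (95%)")
```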