Oakland, Pittsburgh slight favorites in Wild Card probabilities

With the MLB Playoffs beginning this evening, I figured it was time to test my rankings and pull out the old probability calculator. I created the MLB Ratings based on a simple least squares NLP Optimization that I have discussed before.

Oakland at Kansas City

The Royals are in the playoffs for the first time in ages and they get to host a game. Unfortunately, they didn’t seem to have a home field advantage during the regular season, so I am not sure how much this helps (although in reality we can assume it does, at least a little). The numbers say the A’s are the better team by almost 0.7 of a run (per game, for the season). I show them as a 63.5% favorite.

San Francisco at Pittsburgh

These teams appear to be very evenly matched. On a neutral field, the Giants look to be a 0.15 run favorite. However, this game is not on a neutral field and Pittsburgh has one of the few home field advantages in the playoffs (if we assume the regular season is any indication). This swing makes the Pirates about a 0.215 run favorite tomorrow night, giving them about a 54.3% chance of winning.

Detroit v. Baltimore

Neither team appears to have a home field advantage, so looking at it straight-up, we find that Baltimore looks to be about a 0.4 run favorite (or 57.9%) per game. In a five-game series, the results look like this:

([0.0747, 0.1297, 0.1501], 0.3545, [0.194, 0.2451, 0.2064], 0.6455)

Overall, Baltimore is 64.6% to win the series. The most likely outcome is a Baltimore 3-1 win (24.5%).

Los Angeles v. St. Louis

With neither team holding a home field advantage, the Dodgers look to be about 0.445 runs (or 58.8%) better than the Cards. The five-game series probabilities are:

([0.2033, 0.2512, 0.207], 0.6615, [0.07, 0.1234, 0.1451], 0.3385)

Los Angeles looks about 66.2% to win the series overall. Again, the highest likelihood for an outcome is a 3-1 Dodger win (25.1%).

I will update the probabilities and try to run a Monte Carlo simulation with the data later in the week after we see who wins the Wild Card games.
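As a rough idea of what such a simulation might look like, here is a minimal sketch. It assumes a fixed single-game probability for every game and no home field advantage; the function name `simulate_series` and its parameters are hypothetical and not taken from my actual code.

```python
import random


def simulate_series(p, needed, trials=100_000, seed=0):
    """Estimate P(Team A wins a first-to-`needed` series) by simulation.

    p      : assumed probability Team A wins any single game
    needed : wins required to take the series (3 for a five-game series)
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    series_wins = 0
    for _ in range(trials):
        a = b = 0
        # Play games until one side reaches the required number of wins.
        while a < needed and b < needed:
            if rng.random() < p:
                a += 1
            else:
                b += 1
        if a == needed:
            series_wins += 1
    return series_wins / trials


# Baltimore's 57.9% single-game figure in a five-game series.
print(round(simulate_series(0.579, 3), 3))
```

With enough trials, the estimate should land close to the 64.6% series figure quoted above for Baltimore, which is a handy sanity check before adding anything more elaborate.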

Generic Sports Series Probability Calculator

With the baseball playoffs upon us, I have decided to start building a simulator to determine series outcomes once they start. I decided to make this as generic as possible. This simulator is not specific to baseball or even to a particular series length.

Obviously, the first pieces to think about are the ones I addressed in my previous post: home field advantage, ratings and the probability a team would win a single game versus a specific opponent.

I will come back to this later in the month, as we get closer to the playoffs and I tie this all together.

Let’s assume for today that we know the probability that Team A will defeat Team B in a single game. Let’s also assume, for simplicity, that this single-game probability remains the same throughout the series, regardless of any possible home field advantage.

Since we are dealing with a single probability and no perceived home field advantage, all we need for inputs are: p(Team A wins a single game), the current series record of the two teams and the number of games needed to win the series (e.g., 1 for a one-game series, 3 for a five-game series and 4 for a seven-game series).

All of my code is listed here on github, https://gist.github.com/sixmanguru

INPUTS
Like I said, let’s keep this simple. Probabilities, current series record, length of series.

seriesProb(.54,0,0,4)

The function call asks for the series probabilities, given that Team A holds a 54% chance to win a single game, the series is just beginning (0-0) and it takes four games to win the series (seven-game series).

That’s all.

OUTPUT
Here’s the abbreviated output (rounded to four digits).

([0.085, 0.1565, 0.1799, 0.1655], 0.5869, [0.0448, 0.0967, 0.1306, 0.141], 0.4131)

The first list contains the probabilities that Team A wins the series EXACTLY 4-0, 4-1, 4-2 or 4-3. The number trailing is the total probability Team A wins the series.

The second list contains the probabilities Team A loses the series EXACTLY 0-4, 1-4, 2-4, 3-4, with the total probability they lose the series following.
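For anyone who wants to check these numbers by hand, here is a minimal closed-form sketch. It is not the code from the gist; the names `series_prob` and `exact_scores` are hypothetical, and it assumes a fixed single-game probability and a series that has not already been decided. The idea is that the series winner must win the final game, so the remaining wins and losses before that game can fall in any order.

```python
from math import comb


def exact_scores(p_win, have, opp_have, needed):
    """P(team reaches `needed` wins while the opponent finishes on k wins),
    for k = 0 .. needed - 1, given the current record have-opp_have."""
    remaining = needed - have  # wins the team still requires
    probs = []
    for k in range(needed):
        losses = k - opp_have  # further losses the team absorbs along the way
        if losses < 0:
            probs.append(0.0)  # opponent already has more than k wins
        else:
            # Last game is a win; the other remaining-1 wins and `losses`
            # losses can be arranged in any order before it.
            ways = comb(remaining - 1 + losses, losses)
            probs.append(ways * p_win**remaining * (1 - p_win)**losses)
    return probs


def series_prob(p, wins_a, wins_b, needed):
    """Return (A's exact-score probs, P(A wins), B's exact-score probs, P(B wins))."""
    if max(wins_a, wins_b) >= needed:
        raise ValueError("this sketch only handles undecided series")
    a = exact_scores(p, wins_a, wins_b, needed)
    b = exact_scores(1 - p, wins_b, wins_a, needed)
    return a, sum(a), b, sum(b)


a, total_a, b, total_b = series_prob(0.54, 0, 0, 4)
print([round(x, 4) for x in a], round(total_a, 4))
print([round(x, 4) for x in b], round(total_b, 4))
```

Rounded to four digits, the 0-0 call should match the tuple shown above. The totals here are summed before rounding, so in other cases they can differ in the last digit from totals built out of already-rounded pieces.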

ALTERNATE EXAMPLES
Let’s assume the only thing you change is the fact that Team A now leads the series 3-0.

seriesProb(.54,3,0,4)

([0.54, 0.2484, 0.1143, 0.0526], 0.9553, [0, 0, 0, 0.0448], 0.0448)

As you can see above, Team B no longer has any chance to win the series 4-0, 4-1 or 4-2, and has only about a 4.5% chance to win the series at all. This can be verified by 0.46^4, which is approximately 0.0448. (The two totals add up to 1.0001 rather than 1 only because of rounding.)

Now let’s assume that it is a one game series.

seriesProb(.54,0,0,1)

([0.54], 0.54, [0.46], 0.46)

As you can see, it is one game, so the original probabilities are returned.

Finally, as a test, say the series is already over, with Team A having lost a seven-game series 3-4.

seriesProb(.54,3,4,4)

It quickly returns (0,1). It is impossible for Team A to win and certain that Team B will win.

LIMITATIONS
The two biggest limitations left to resolve (assuming you accept the premise that you can assign a single-game probability to feed the function at all) are the possibility of a home field advantage and how it would play out based on the series’ format (i.e., 2-3-2 vs. 2-2-1-1-1 and such).
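One possible way to relax both limitations is to let the single-game probability depend on where each game is played and walk through the schedule recursively. The sketch below is only an illustration under that assumption; the name `series_prob_hfa`, the schedule string and the two probability parameters are hypothetical, and the sample probabilities are made up.

```python
from functools import lru_cache


def series_prob_hfa(p_home_a, p_away_a, schedule, needed):
    """P(Team A wins the series) when its single-game probability depends on venue.

    p_home_a : assumed P(A wins) when A is at home
    p_away_a : assumed P(A wins) when A is on the road
    schedule : one character per game, "H" if A hosts and "A" if A travels,
               e.g. "HHAAAHH" for 2-3-2 or "HHAAHAH" for 2-2-1-1-1
    needed   : wins required to take the series
    """
    @lru_cache(maxsize=None)
    def win(a, b):
        if a == needed:
            return 1.0
        if b == needed:
            return 0.0
        # Game number a + b is played next; look up its venue.
        p = p_home_a if schedule[a + b] == "H" else p_away_a
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)

    return win(0, 0)


# Made-up example: A wins 58% of home games and 50% of road games.
print(series_prob_hfa(0.58, 0.50, "HHAAAHH", 4))
print(series_prob_hfa(0.58, 0.50, "HHAAHAH", 4))
```

Setting both probabilities to the same value collapses this back to the fixed-probability case, so it can be checked against the 0.5869 figure from the 54% example.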

Lastly, I would like to thank Jeff Sackmann, the author of Tennis Abstract and several other endeavors. His original python code for simulating a tennis match was the foundation for this project. His Python code for tennis Markov Chains can be found here, http://summerofjeff.wordpress.com/2011/01/13/python-code-for-tennis-markov/

MLB Home Field Advantage this season

Honestly, it is hard to get fired up about the MLB Playoffs these days as a Houston Astros fan. But I figure it may be a way to test a few models and work on my programming.

After scouring the internet for scores, I decided to build a simple non-linear programming model to create some rankings. If you want to read more about NLP Optimization, please read the earlier posts I ran during last year’s NFL season.

I tried to apply home field advantage as a single term, but found there wasn’t a generic home field advantage as in football. I then decided to try to determine whether each team’s individual HFA would have any effect on the ratings. With so many more games, this number had a better likelihood of showing some importance.
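To make the idea of a per-team HFA term concrete, here is a toy sketch of one possible formulation. It is not the model behind the numbers in this post: it assumes the home margin is predicted as rating[home] + hfa[home] - rating[away], minimizes squared error with plain gradient descent in place of a proper solver, and clips each HFA at zero (an assumption). The function name `fit_ratings` and the three-team data set are made up for illustration.

```python
def fit_ratings(games, teams, steps=5000, lr=0.01):
    """Least-squares fit of margin ~ rating[home] + hfa[home] - rating[away].

    games : list of (home, away, home_margin) tuples
    Returns (rating, hfa) dicts. Ratings only matter relative to each other.
    """
    rating = {t: 0.0 for t in teams}
    hfa = {t: 0.0 for t in teams}
    for _ in range(steps):
        grad_r = {t: 0.0 for t in teams}
        grad_h = {t: 0.0 for t in teams}
        for home, away, margin in games:
            err = rating[home] + hfa[home] - rating[away] - margin
            grad_r[home] += 2 * err
            grad_r[away] -= 2 * err
            grad_h[home] += 2 * err
        for t in teams:
            rating[t] -= lr * grad_r[t]
            # Assumption: HFA is not allowed to go negative, so clip at zero.
            hfa[t] = max(0.0, hfa[t] - lr * grad_h[t])
    return rating, hfa


# Made-up data: every home/away pairing of three hypothetical teams, with
# margins generated from ratings A=5, B=4, C=3 and HFA A=1, B=0, C=0.5.
toy_games = [
    ("A", "B", 2.0), ("A", "C", 3.0),
    ("B", "A", -1.0), ("B", "C", 1.0),
    ("C", "A", -1.5), ("C", "B", -0.5),
]
rating, hfa = fit_ratings(toy_games, ["A", "B", "C"])
print({t: round(v, 2) for t, v in hfa.items()})
```

Because only rating differences enter the prediction, the fitted ratings come back centered on zero; adding a constant to all of them changes nothing, while the individual HFA terms are pinned down by each team's home and road results.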

In general, the average score of an MLB game this year has been 4.11-4.09 in favor of the home team.

When you look at individual HFA, results are pretty amazing. As expected, the Colorado Rockies get almost a run and a half (1.47) bump at home. The Rockies are a solid 19 games better at home.

Next on the list are the Florida Marlins. First off, does anyone really call them the Miami Marlins? The Marlins have a little over a run per game advantage at home (1.14). Like the Rockies, they appear to be out of the hunt for the playoffs.

The team most likely to benefit from home field advantage in the playoffs appears to be the Oakland A’s, who are more than 3/4 of a run (0.76) better at home. The A’s have nine more games at home in the regular season. They also get to finish the season at Texas, who rate only slightly ahead of Colorado, Arizona and Miami as the worst teams in baseball. The Rangers have no effective HFA, either.

Washington (0.429), Pittsburgh (0.333) and Atlanta (0.154) are the only other teams in the playoff picture with significant home field advantages.

Here is a list of the current home field advantages. Those not listed have no significant HFA (0).

Team HFA
COL 1.473564
MIA 1.14011
OAK 0.760433
SDP 0.704609
WSN 0.429156
PIT 0.33299
CHC 0.25553
CIN 0.239817
TBR 0.210181
ATL 0.153853
PHI 0.052209
TOR 0.016576

Here are the current team ratings, as we head into the final few games of the season.

Team Rating
LAA 5.137348
SEA 4.943415
OAK 4.925794
BAL 4.771614
WSN 4.513368
DET 4.462223
LAD 4.363542
SFG 4.326927
KCR 4.305852
TOR 4.229679
CLE 4.184073
TBR 4.164056
NYM 4.038186
STL 4.031671
NYY 3.997041
ATL 3.988839
PIT 3.944171
MIL 3.930484
MIN 3.825875
CIN 3.783494
HOU 3.759729
BOS 3.720102
PHI 3.676738
CHW 3.599442
CHC 3.442758
SDP 3.341322
TEX 3.337711
MIA 3.30692
ARI 3.278469
COL 2.669157

The next step in the coming weeks will be to use the rating and home field advantage numbers to create a simulation of the playoffs.