Evaluating The 2014 Projection Systems

Nate Silver developed the PECOTA projection system. (via Randy Stewart)

The 2014 season is in the books. The San Francisco Giants once again reign as the World Series Champions. Most baseball people are looking toward the offseason and 2015. Projections are key to our understanding of both the offseason and the upcoming 2015 season. There are a lot of systems to choose from, and if you’re like me you have used all of them at one point or another, often interchangeably. We should, however, be sure we have a good understanding of how each system actually works and performs. I will look back at 2014 and evaluate which projection system can be crowned the 2014 champion. First, a little background on each of the projection systems I examined.

Background

Most projection systems work largely the same way, with only minor variations. For instance, most use three to four seasons of data to calculate their forecast. However, there are nuances to each that make them unique. FanGraphs does an excellent job explaining the differences among the projection systems; I will briefly summarize.

Marcel

Created by Tom Tango, Marcel is the simplest among the projection systems. Marcel uses only major league data, giving heavier weights to more recent seasons. It takes into account age and regression towards the mean. Marcel does not project players with no major league experience. Marcel gives explicit instructions to assign league average projections to unprojected players. Because of this, Marcel projects far fewer players than the other systems.
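To make the idea of weighting recent seasons and regressing toward the mean concrete, here is a minimal Python sketch of a Marcel-style rate projection. The 5/4/3 season weights and the 1,200 plate appearances of league-average regression are the values commonly cited for Marcel's hitter projections; they are not stated in this article, so treat them (and the function name) as assumptions. The age adjustment is left out entirely.

```python
def marcel_style_rate(seasons, league_rate, weights=(5, 4, 3), regression_pa=1200):
    """Project a rate stat from up to three past seasons.

    seasons: list of (rate, plate_appearances) tuples, most recent first.
    league_rate: league-average rate to regress toward.
    weights / regression_pa: assumed values, not taken from the article.
    """
    weighted_total = 0.0
    weighted_pa = 0.0
    for (rate, pa), weight in zip(seasons, weights):
        weighted_total += weight * rate * pa
        weighted_pa += weight * pa
    # Regression toward the mean: blend in a fixed amount of league-average play.
    return (weighted_total + regression_pa * league_rate) / (weighted_pa + regression_pa)


# A hypothetical hitter with three strong seasons lands between his own
# weighted average and the league mean.
example = marcel_style_rate([(0.400, 600), (0.380, 600), (0.360, 600)], 0.315)

# A player with no major league seasons simply gets the league average,
# which matches Marcel's instruction for unprojected players.
no_history = marcel_style_rate([], 0.315)
```

Note how the less a player has played, the more the fixed block of league-average plate appearances dominates the result.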

ZiPS

Created by Dan Szymborski, ZiPS uses weighted averages of the previous seasons. It takes into account batting average on balls in play when regressing player performance. It adjusts for age by finding historical player comparisons.

Oliver

Created by Brian Cartwright at The Hardball Times, Oliver also uses weighted averages to project players. Oliver differs in that it calculates major league equivalencies by taking in the raw numbers and adjusting based on park and league.

PECOTA

PECOTA is the property of Baseball Prospectus and was developed by Nate Silver. Rather than using weighted averages, PECOTA uses a system of historical player comparisons to calculate its projections.

Steamer

Created by Jared Cross, Dash Davidson and Peter Rosenbloom, Steamer also uses a system of weighted averages. Steamer differs in that it weights different components differently and regresses some more heavily than others. Steamer does not explicitly take aging into account.

Methodology

Data Sources

It is worth mentioning where these data came from. I downloaded the 2014 actual data via FanGraphs and removed all players who pitched that season, apologies to Adam Dunn. ZiPS and Oliver both came to me via Will Larson and the Baseball Projection Project. Due to rounding issues with third party data sources, Jared Cross himself provided me with the Steamer projections. Marcel forecasts, no longer produced by Tom Tango, are unofficially maintained by David Pinto who makes them publicly available. PECOTA was simply downloaded from Baseball Prospectus.

What Metric?

The first step was to find a common metric to look at. Given the common statistics projected by all systems considered, wOBA seemed like the obvious choice. Some projections were kind enough to include sac flies, but most did not, leaving us with just walks, hit by pitch, singles, doubles, triples, home runs, and plate appearances. Using the 2014 wOBA coefficients, I arrived at this simplified formula:

(BB(.69) + HBP(.72) + S(.89) + D(1.28) + T(1.64) + HR(2.14))/(PA)
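The simplified formula above translates directly into code. This is a small Python sketch using the rounded coefficients shown in the formula; the function name and the sample stat line are hypothetical.

```python
# Rounded 2014 coefficients as they appear in the simplified formula.
WOBA_WEIGHTS = {"BB": 0.69, "HBP": 0.72, "1B": 0.89, "2B": 1.28, "3B": 1.64, "HR": 2.14}


def simplified_woba(bb, hbp, singles, doubles, triples, hr, pa):
    """Simplified wOBA: weighted events divided by plate appearances."""
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    numerator = (
        WOBA_WEIGHTS["BB"] * bb
        + WOBA_WEIGHTS["HBP"] * hbp
        + WOBA_WEIGHTS["1B"] * singles
        + WOBA_WEIGHTS["2B"] * doubles
        + WOBA_WEIGHTS["3B"] * triples
        + WOBA_WEIGHTS["HR"] * hr
    )
    return numerator / pa


# Hypothetical stat line: 600 PA, 60 BB, 5 HBP, 100 1B, 30 2B, 3 3B, 20 HR.
sample = simplified_woba(bb=60, hbp=5, singles=100, doubles=30, triples=3, hr=20, pa=600)
```

Because the denominator is plate appearances rather than the full wOBA denominator, the same function can be applied to both the projections and the actual 2014 lines.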

Merges and Missing Players

Next I had to assign unique identification to all players. This is always an arduous task (ahem, Chris Young), but I was able to match most players. There was a distinction to be made between players who simply weren’t projected and players I failed to correctly match to the actual 2014 data. The percentage of total plate appearances that were not matched was pretty small. Players who were not projected or not matched were given a wOBA projection 20 points below league average. This is close to the actual mean for that subgroup of players. The exception is Marcel, whose unprojected/unmatched players were given a projection of league average performance. This table summarizes the results of the merges.

Projection Systems, Merges and Misses, 2014

System	Players Unprojected	PA per Player	PA Unprojected	PA Total	Share	Projection Given
Actual	0	0	0	0	0.00%	N/A
ZiPS	34	78	2,645	177,967	1.49%	Mean-.020
Steamer	27	76	2,042	177,967	1.15%	Mean-.020
Oliver	20	78	1,562	177,967	0.88%	Mean-.020
Marcel	152	137	20,830	177,967	11.70%	Mean
PECOTA	62	74	4,602	177,967	2.59%	Mean-.020

The fact that more players were missing from PECOTA could be my fault for not matching well enough; it could also be that these players just weren’t projected by PECOTA. Take it for what you will, but this is a potential source of bias. The overall portion of plate appearances not matched was so small that whatever we projected these players at hardly affected the results.
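For readers who want to reproduce the bookkeeping in the merge table, here is a minimal Python sketch of how the unprojected counts and plate appearance share can be tallied. The function name and dictionary layout are illustrative assumptions, not the scripts actually used for this article.

```python
def unmatched_summary(actual_pa, projected_ids):
    """Summarize players in the actual data with no matching projection.

    actual_pa: dict of player_id -> 2014 plate appearances.
    projected_ids: set of player_ids that a system projected and that matched.
    """
    missing = {pid: pa for pid, pa in actual_pa.items() if pid not in projected_ids}
    pa_total = sum(actual_pa.values())
    pa_unprojected = sum(missing.values())
    return {
        "players_unprojected": len(missing),
        "pa_unprojected": pa_unprojected,
        "pa_total": pa_total,
        "share": pa_unprojected / pa_total if pa_total else 0.0,
    }


# Toy data: three hitters, one of whom has no projection.
summary = unmatched_summary({"a": 600, "b": 300, "c": 100}, {"a", "b"})
```

The same division gives the Share column in the table; for example, 2,645 unprojected plate appearances out of 177,967 is about 1.49 percent.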

Common Baseline

Correctly merging my data sets was the bulk of the work, yet there was still more to be done before we could have fun with it. To compare systems we need to adjust them to a common mean. We only care how a player performed in the context of the projection’s league average. If Mike Trout was projected to hit .425 in a league with a projected .340 mean, we want to count this as the same as if he were projected to hit .400 in a league with a projected .315 mean. In 2014, disregarding pitchers, the properly weighted league wOBA was .315. To put every system on that baseline, I first calculated the population mean of players who actually played in 2014, weighting by their 2014 plate appearances. Then I filled in the missing players with a projection 20 points below the projected weighted mean. Finally, the mean was recalculated and the projections were adjusted so that it equaled .315.

Results

And now the part you’ve been waiting for, the results. This part was simple enough; I calculated the mean absolute error, weighted by plate appearances, for each of the five systems. This is how they fared projecting the league population.
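For clarity, the plate-appearance-weighted mean absolute error can be sketched as follows. This is a minimal Python illustration with made-up numbers; the function name and data layout are assumptions.

```python
def weighted_mae(projected, actual, pa):
    """Mean absolute error of projected vs. actual wOBA, weighted by PA.

    projected, actual: dicts of player_id -> wOBA.
    pa: dict of player_id -> actual plate appearances.
    """
    total_pa = sum(pa[pid] for pid in actual)
    total_error = sum(abs(projected[pid] - actual[pid]) * pa[pid] for pid in actual)
    return total_error / total_pa


# Toy example: a 20-point miss on a 600 PA hitter counts three times as much
# as a 10-point miss on a 200 PA hitter.
mae = weighted_mae(
    projected={"a": 0.330, "b": 0.300},
    actual={"a": 0.350, "b": 0.290},
    pa={"a": 600, "b": 200},
)
```

Weighting by plate appearances keeps a handful of players with tiny samples from dominating the comparison.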

Overall

System	Mean Absolute Error
Actual	0.0000
ZiPS	0.0274
Steamer	0.0277
PECOTA	0.0279
Oliver	0.0280
Marcel	0.0289

The projection systems all did pretty well and, as usual, are relatively close together. ZiPS takes home the crown with the lowest mean absolute error. While Marcel comes in last, the result demonstrates Marcel’s original intended purpose; it shows us that a simple projection system will get us most of the way there in the aggregate.

What is more interesting, however, is how the models performed on different subsets of players. I have split the players into groups based on experience and age, as well as by two binary identifiers: breakout and breakdown.

Experience

At the heart of it, what to do with past performance is the question all projection systems are trying to solve. Thus it is natural to group based on career playing time. I bunched players into three categories: rookies (0-300 plate appearances), middlers (300-1,800 PA), and veterans (1,800+ PA).
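The grouping above is easy to express in code. In this Python sketch the handling of the boundary values is an assumption, since the stated ranges overlap at exactly 300 and 1,800 plate appearances; the function name is hypothetical.

```python
def experience_group(career_pa):
    """Bucket a player by career plate appearances.

    Assumption: exactly 300 PA counts as a middler and exactly 1,800 PA
    counts as a veteran.
    """
    if career_pa < 0:
        raise ValueError("career plate appearances cannot be negative")
    if career_pa < 300:
        return "rookie"
    if career_pa < 1800:
        return "middler"
    return "veteran"


groups = [experience_group(pa) for pa in (0, 299, 300, 1799, 1800, 6000)]
```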

Rookies (n=126)

System	Mean Absolute Error	Mean wOBA
Actual	0.0000	0.2909
Steamer	0.0282	0.2977
ZiPS	0.0290	0.2973
Oliver	0.0293	0.2973
PECOTA	0.0303	0.2973
Marcel	0.0346	0.3069

The rookies are interesting because for the most part these projections were going off minor league data, so we want to see who best translated minor league performance to major league performance. Steamer was able to break away from the pack here and projected these unknown quantities quite impressively. On the whole, Steamer was not far off its overall performance. Meanwhile, due to the fact that it does not take into account minor league data at all, Marcel unsurprisingly over-projected this group of players and performed the worst.

Middlers (n=217)

System	Mean Absolute Error	Mean wOBA
Actual	0.0000	0.3127
PECOTA	0.0273	0.3092
ZiPS	0.0278	0.3080
Oliver	0.0282	0.3107
Steamer	0.0285	0.3094
Marcel	0.0296	0.3102

The middling players are those with some major league experience, but not the full range that most of the projections like to use. PECOTA did well here, perhaps because history is a better indicator than the other systems’ algorithms on such a small sample of major league data. Again, without the full three seasons of data available, Marcel lagged behind the pack.

Veterans (n=296)

System	Mean Absolute Error	Mean wOBA
Actual	0.0000	0.3221
ZiPS	0.0269	0.3239
Steamer	0.0271	0.3230
Marcel	0.0271	0.3203
Oliver	0.0276	0.3221
PECOTA	0.0278	0.3231

The veterans have over 1,800 career plate appearances and for the most part have played the full three seasons used by the projections. This is where Marcel got to shine. Marcel slips right into the middle of the pack here, tied with Steamer. Notice now the best, ZiPS, and the worst, PECOTA, are only .0009 away from each other, compared to .0064 with the rookies and .0023 with the middlers. When players rack up larger sample sizes, we can project them much more accurately.

No system stands well above ZiPS, our overall winner, in any bracket. When it comes to experienced players you can’t go wrong; use whichever system is available to you. When it comes to the rookies, steer clear of Marcel and opt for ZiPS if you can.

Age

Age is another interesting subgroup to look at because different projections handle player aging differently. PECOTA and ZiPS rely on historical comparisons while Marcel uses an age factor. Steamer does not explicitly take aging into account at all. I examined how the projections did for really young players, really old players, and each age in between.

Mean Absolute Error

Age	Actual	PECOTA	Marcel	Oliver	Steamer	ZiPS
24 and Below	0	0.0297	0.0311	0.0315	0.0288	0.0294
25	0	0.0292	0.0317	0.0267	0.0311	0.0288
26	0	0.0319	0.0343	0.0323	0.0318	0.0329
27	0	0.0292	0.0325	0.0290	0.0285	0.0283
28	0	0.0283	0.0261	0.0238	0.0281	0.0271
29	0	0.0309	0.0343	0.0331	0.0308	0.0324
30 and Above	0	0.0252	0.0252	0.0260	0.0249	0.0243

Mean wOBA

Age	Actual	PECOTA	Marcel	Oliver	Steamer	ZiPS
24 and Below	0.3108	0.3066	0.3135	0.3133	0.3100	0.3135
25	0.3016	0.3040	0.2995	0.3030	0.3062	0.3010
26	0.3230	0.3090	0.3085	0.3096	0.3094	0.3074
27	0.3143	0.3148	0.3148	0.3164	0.3166	0.3154
28	0.3202	0.3179	0.3197	0.3187	0.3152	0.3173
29	0.3159	0.3157	0.3133	0.3092	0.3115	0.3123
30 and Above	0.3168	0.3215	0.3194	0.3191	0.3205	0.3201

There are a few things to take away from examining the breakdown by age. The first is that projections do better on older players. Given our results above, this should come as no surprise. It appears that for all systems the challenges of dealing with player aging are more than outweighed by the advantage gained by the added data to draw on.

Among the oldest players ZiPS takes the cake, with Steamer coming in second and PECOTA and Marcel tied for third. There seems to be no direct connection between performance among these players and the specific method used to account for aging. ZiPS did the best by using historic equivalences, but Steamer came in second without explicitly looking at age at all. PECOTA, also using historic equivalences, tied exactly with Marcel, which uses a simple age factor.

Interestingly Oliver, which does better than Marcel overall, struggled with the extremes. Oliver came in dead last among the youngest and among the oldest players. This indicates that perhaps Oliver needs to revise the way it takes into account age.

Overall ZiPS does the best on older players and Steamer does the best on youngsters. However, the difference isn’t large enough for me to want to go out of my way to use two different projection systems for young and old players. I’ll stick to our overall winner and use ZiPS.

Breakout and Breakdown Players

Another thing we might be interested in is how each system did at predicting the extremes. I examined how each system fared in projecting breakout players. I defined a breakout player as a player whose wOBA increased by 30 points or more from 2013 to 2014. These could be young guys coming into their own or veterans coming off of an injury-plagued season.

Breakout

System	Mean Absolute Error	Mean wOBA
Actual	0.0000	0.3411
PECOTA	0.0374	0.3080
Marcel	0.0409	0.3054
Oliver	0.0413	0.3031
Steamer	0.0373	0.3081
ZiPS	0.0381	0.3078

All projections will be cautious to project (or won’t project at all) something drastically different from what a player has done in the recent past. Thus, none did a very good job predicting a breakout. This is one area where it may be better to use subjective measures.

We do, however, see the two systems that use historical equivalences, ZiPS and PECOTA, do well. Steamer did the best without using historical equivalences, though only by a hair over PECOTA. With both of the systems that incorporate historical equivalences doing well, we might start to think there is something to equivalency systems doing better at predicting large swings in performance.

Similarly, I looked at how each system did on the opposite type of player: those who experienced breakdown seasons. A player was flagged as having a breakdown season if his wOBA decreased by 30 points or more from 2013 to 2014.

Breakdown

System	Mean Absolute Error	Mean wOBA
Actual	0.0000	0.2885
PECOTA	0.0383	0.3120
Marcel	0.0408	0.3128
Oliver	0.0397	0.3158
Steamer	0.0384	0.3122
ZiPS	0.0378	0.3136

Again we see the same three systems as the top performers, this time with ZiPS taking the number one spot, and PECOTA as the runner up. This is more evidence that when looking for big advances or declines in players, using equivalences might be the way to go.

As a whole, the systems do about as well at predicting breakdowns as breakouts: Marcel, Oliver and ZiPS had slightly lower errors on breakdowns, while PECOTA and Steamer had slightly higher ones. I would have expected a bigger gap in favor of breakdowns. Intuitively it makes sense that a breakdown is easier to predict than a breakout, but in reality both are challenges for algorithms that tend towards the mean. It appears that systems like ZiPS and PECOTA will do slightly better for this type of player. If you are looking at players you think are about to do something unusual, I would tend toward ZiPS or PECOTA.
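The breakout and breakdown flags used in this section can be sketched as a single Python function. The name and the rounding step are my own illustrative choices: rounding the year-over-year change to four decimals keeps floating-point noise from misclassifying a player who moved by exactly 30 points.

```python
def swing_label(woba_2013, woba_2014, threshold=0.030):
    """Classify a season as 'breakout', 'breakdown', or 'neither'.

    A breakout is a wOBA gain of 30 points or more from 2013 to 2014;
    a breakdown is a loss of 30 points or more.
    """
    change = round(woba_2014 - woba_2013, 4)
    if change >= threshold:
        return "breakout"
    if change <= -threshold:
        return "breakdown"
    return "neither"


labels = [
    swing_label(0.310, 0.340),  # up exactly 30 points
    swing_label(0.340, 0.310),  # down exactly 30 points
    swing_label(0.320, 0.330),  # up 10 points
]
```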

Variation

One note in defense of Oliver is that Oliver’s projections have the largest variation, which may be preferable to some when choosing a projection system. Suppose you have one system that projects all players to be league average and another whose projections vary. If they both had the same absolute error, you would prefer the one that includes more variation; the one with no variation essentially tells you nothing (and if you adjust to league average like I did, it does tell you nothing). This final table summarizes the variation of the adjusted projection systems. The mean for all the systems was .3152 after being adjusted to league average; what is important here is the spread in the projections.

Adjusted Projections Summary

System	St. Dev.
Actual	0.0444
ZiPS	0.0285
Steamer	0.0287
Oliver	0.0317
Marcel	0.0266
PECOTA	0.0280
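The thought experiment about a flat system versus a varied one can be illustrated with a toy Python example. The numbers are invented, and the sketch uses an unweighted population standard deviation from the standard library; whether the table above is weighted by plate appearances is not stated, so that choice is an assumption.

```python
from statistics import pstdev


def mean_abs_error(projected, actual):
    """Unweighted mean absolute error between two equal-length lists."""
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(actual)


actual = [0.300, 0.330]
flat_system = [0.315, 0.315]    # projects everyone at league average
varied_system = [0.315, 0.345]  # misses by the same amount but separates players

flat_error = mean_abs_error(flat_system, actual)
varied_error = mean_abs_error(varied_system, actual)
flat_spread = pstdev(flat_system)
varied_spread = pstdev(varied_system)
```

Both systems miss by 15 points on average, but only the varied one says anything about which player is better, which is the sense in which Oliver's wider spread can be a point in its favor.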

Conclusions

Overall, projection systems may be improving as their creators revise their algorithms. In Tango’s study of 2007-2010 projections Marcel was right in the mix with the others, but four years later we see some more separation. No matter how you slice it, all these projections do a fine job and the differences between them are subtle, but not negligible. We saw that among inexperienced players Marcel struggles and should be avoided. For players who have racked up lots of at-bats and years, it’s hard to go wrong, but ZiPS performed the best. ZiPS was also among the best when looking at breakouts and breakdowns, with PECOTA also doing well. This gives credence to the idea that historical equivalences are especially useful for predicting players who are about to break from a normal trajectory.

Given all of the above we can decisively say that in 2014 ZiPS did the best job. It performed the best overall and was at or near the top in most of our subsets, slipping into the bottom half only among 26-year-olds. You can’t go wrong with any of these systems, but as we look ahead to 2015, I’ll be using ZiPS.

References and Resources

ZiPS and Oliver provided by the Baseball Projection Project.
PECOTA provided by Baseball Prospectus.
Marcel provided by David Pinto.
Steamer provided by Jared Cross.
The general structure of my evaluation was based on a 2010 study on projections by Tom Tango.
