Team WAR/pos and WAR/pitch, 1995-2013

Continuing my series on WAR in the wild card era, this post looks at team WAR, broken out among position players and pitchers. There’s a long preface here, dealing with terms and such; those in a hurry can scroll down to the first major heading, “WAR/pos and WAR/pitch.” (Just look for the charts.)


Terms and Conditions

WAR/pos = WAR from position players

WAR/pitch = WAR from pitchers

wWAR = Team Wins predicted by WAR = Team WAR + (Games x .294)

  • .294 is the winning-percentage definition of replacement level, equivalent to 47.6 wins in 162 games.
  • So, for a 162-game schedule, wWAR = Team WAR + 47.6.

wPyth = Pythagorean wins

wAct = Actual wins

All figures herein are based on a 162-game season.
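The wWAR definition is simple enough to state as code. Here is a minimal Python sketch, assuming the flat .294 replacement level described above. The names `wwar` and `REPLACEMENT_WPCT` are illustrative, not from any library.

```python
REPLACEMENT_WPCT = 0.294  # replacement-level winning percentage used in this post

def wwar(team_war: float, games: int = 162) -> float:
    """Team wins predicted by WAR: team WAR plus a replacement-level team's wins."""
    return team_war + games * REPLACEMENT_WPCT

print(round(wwar(0.0), 1))   # 47.6, a replacement-level team over 162 games
print(round(wwar(33.6), 1))  # 81.2, the pool's average team (33.6 WAR)
```

Plugging in the pool's average team WAR lands close to a .500 season, which is a reasonable sanity check on the .294 baseline.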

wWAR Correlates Best with wPyth

wWAR should correlate better with Pythagorean wins than with actual wins, because WAR is meant to account for the building blocks of runs scored and runs allowed, which are the components of wPyth. Actual wins vary from wPyth due to different efficiencies in (a) turning those building blocks into actual runs (like the Cardinals’ .330 average with RISP this year) and (b) turning runs into actual wins (like the 2012 Orioles’ incredible one-run record). And grit, of course. WAR is mostly blind to those things.

Anyway, wWAR does correlate better with Pythagorean wins: for 1995-2013, the average difference between wWAR and wPyth was plus or minus 3.3 wins, while the average difference between wWAR and wAct was plus or minus 4.3. Therefore, when I relate team WAR to team performance, I measure performance by wPyth.

The correlation of wWAR and wPyth holds pretty steady across the spectrum of team quality. I found the average gap between wWAR and wPyth for five tiers of teams, based on their wPyth. Tier 1 is the top 10% of all teams, by wPyth; Tier 2 is the next 20%; Tier 3 is the middle 40%; Tier 4 is the next 20%; and Tier 5 is the bottom 10%. The average gap between wWAR and wPyth (plus or minus) ranged from 3.6 for Tier 1, down to 3.2 for Tier 3, and back up to 3.5 for Tier 5.
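The tiering and the "average gap" can be sketched as follows. This assumes each team is a dict with hypothetical keys `wpyth` and `wwar`. Since the post doesn't say how fractional cut points were handled, the sketch rounds tier boundaries from cumulative shares so that tier sizes always sum to the pool size.

```python
from statistics import mean

# Tier shares used in this post: top 10%, next 20%, middle 40%, next 20%, bottom 10%
TIER_SHARES = (0.10, 0.20, 0.40, 0.20, 0.10)

def split_into_tiers(teams, key, shares=TIER_SHARES):
    """Sort teams best-to-worst by `key`, then cut them into tiers by share.

    Cut points are rounded from cumulative shares, so tier sizes always
    add up to len(teams) even when the shares don't divide it evenly.
    """
    ordered = sorted(teams, key=key, reverse=True)
    tiers, start, cumulative = [], 0, 0.0
    for share in shares:
        cumulative += share
        end = round(cumulative * len(ordered))
        tiers.append(ordered[start:end])
        start = end
    return tiers

def mean_abs_gap(teams):
    """Average size of the gap between wWAR and wPyth, ignoring its sign."""
    return mean(abs(t["wwar"] - t["wpyth"]) for t in teams)

# For a 564-team pool, the tiers hold 56, 113, 226, 113 and 56 teams.
sizes = [len(t) for t in split_into_tiers(list(range(564)), key=lambda x: x)]
print(sizes)  # [56, 113, 226, 113, 56]
```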

The straight averages of wPyth and wWAR of those tiers:

  • Tier 1: 98.4 wPyth … 99.0 wWAR … +0.6 wWAR
  • Tier 2: 90.6 wPyth … 90.5 wWAR … -0.1 wWAR
  • Tier 3: 81.5 wPyth … 81.5 wWAR … No difference
  • Tier 4: 71.0 wPyth … 71.2 wWAR … +0.2 wWAR
  • Tier 5: 63.2 wPyth … 63.7 wWAR … +0.5 wWAR

I see no obvious tendency for wWAR to favor or disfavor any particular caliber of team, at least in this era.

Leagues Apart

The average American League team has 6.6 WAR more than the average NL team, 37.1 to 30.5. The WAR formula for this era explicitly favors AL teams, due to their .524 W% in interleague play, equivalent to 85-77 in a team season. (See B-R’s “WAR Explained,” 2nd paragraph below the graph; the numbers there are out of date after this year’s lowering of replacement-level W% to .294.)

AL superiority creates just a slight difference in wAct or wPyth, because interleague play is a small part of the schedule. But the edge is applied to their whole season WAR: they’re still better teams, even while playing each other to a net .500 record. Consequently, AL teams average 3.2 more wWAR than wPyth, while NL teams average 2.6 less.

This bias creates some of the gap between wWAR and wPyth. Without it, the average gap between wWAR and wPyth falls to about +/- 2.1, strengthening the correlation.

Is the size of the AL’s WAR edge appropriate? Applying their interleague W% to 162 games yields an AL edge of 7.8 wins, while the WAR edge is 6.6. It would be hard to argue that the WAR formula is too generous to the AL.
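The 7.8-win figure comes from assuming the interleague W% held over a full schedule: the AL at .524 and, since interleague games are zero-sum, the NL at .476. A minimal sketch of that arithmetic (`league_edge` is an illustrative name):

```python
GAMES = 162

def league_edge(al_interleague_wpct: float, games: int = GAMES) -> float:
    """AL-minus-NL win gap if the interleague W% held over a full season.

    Interleague play is zero-sum, so the NL's W% in those games is
    1 - al_interleague_wpct.
    """
    al_wins = al_interleague_wpct * games
    nl_wins = (1 - al_interleague_wpct) * games
    return al_wins - nl_wins

print(round(0.524 * GAMES))          # 85, the "85-77" season
print(round(league_edge(0.524), 1))  # 7.8
```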

I have chosen to disregard the AL/NL gap for most parts of this study.

What WAR Misses

Clutch hitting. Offensive WAR is based on offensive components (singles, doubles, etc.), not actual runs, so teams that score many more or many fewer runs than those components suggest should show the biggest gaps between wPyth and wWAR. These teams had the biggest gaps (wPyth minus wWAR):

  • +11.7, 2013 Cardinals — Did I mention their historic .330 BA with men in scoring position?
  • -10.9, 2001 Angels — AL’s worst in all RISP slashes; RISP BA was .022 less than with bases empty.
  • -10.6, 2010 Mariners — AL’s worst in all RISP slashes, though not much worse than they were with bags empty. This was the worst offensive team in the period, by most measures; maybe the formula is less accurate with such extremes?
  • -9.8, 1998 Rays — Worst in this era for RISP BA relative to overall BA. And even though that had already depressed their wPyth, they fell another 4 wins short of it, giving them the worst (wAct-wWAR) for the period. But I’ve digressed long enough; on with the post!


WAR/pos and WAR/pitch

There are 564 teams in this pool. Per 162 games, the average team had:

  • 33.6 total WAR
  • 19.9 WAR/pos (59.3% of total)
  • 13.7 WAR/pitch (40.7%)

The 3:2 ratio of WAR/pos to WAR/pitch reflects that part of run prevention is credited to defensive WAR, which goes to position players. But in another sense, there’s almost as much WAR “in play” on both sides, since the spread from the highest to lowest figures is similar: 52.0 in WAR/pos (from 50.9 for the sublime 2001 Mariners to -1.1 by the ridiculous 2013 Marlins), 49.0 in WAR/pitch (from 37.2 by the 2011 Phillies to -11.8 by the 1995 Giants). The ratio and the spreads can differ this much because negative values are far more common in WAR/pitch than in WAR/pos (16 teams to 3 in this pool).

The average WAR figures for the five tiers based on total WAR:
(Tier 1 = top 10%, Tier 2 = next 20%, Tier 3 = middle 40%, Tier 4 = next 20%, Tier 5 = bottom 10%)

[Chart: Average team WAR/pos and WAR/pitch by tier, 1995-2013]

(Note that numbers do not always add up, due to rounding.)

And the percentages of WAR/pos and WAR/pitch:

[Chart: WAR/pos and WAR/pitch as percentages of team WAR, by tier, 1995-2013]

For Tiers 1-3, the proportions of WAR/pos and WAR/pitch are nearly identical, and quite close to the overall average. Tier 4 tilts a tad more toward WAR/pos, and Tier 5 even more so; most really bad teams have disastrous pitching. But even with that dip at Tier 5, these data do not bear out the old saw that pitching is the shortest path to success. The increments between tiers mostly tilt toward WAR/pos:

[Chart: Team WAR increments between tiers, 1995-2013]

The gap from the middle of the pack to the elite is 11.2 WAR/pos, 7.7 WAR/pitch; from bottom to top, it’s 20.8/17.3. Every team has different needs, so there’s no “right” way to improve. But if recent history is a guide, then a team wanting to reach the next level is likely to need more gains on the positional side than in pitching.
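One way to compare those increments with the overall 59.3% / 40.7% split is to ask what share of each gap comes from WAR/pos. A minimal sketch, with `pos_fraction_of_gain` as a hypothetical helper name:

```python
def pos_fraction_of_gain(pos_gain: float, pitch_gain: float) -> float:
    """Fraction of a combined WAR gain that comes from position players."""
    return pos_gain / (pos_gain + pitch_gain)

# Tier gaps quoted above, as (WAR/pos, WAR/pitch)
print(f"{pos_fraction_of_gain(11.2, 7.7):.1%}")   # 59.3%, middle of the pack to elite
print(f"{pos_fraction_of_gain(20.8, 17.3):.1%}")  # 54.6%, bottom to top
```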


Among Tier 1 teams, these had the highest concentrations of WAR in pos or pitch:

Heavy on WAR/pos:

  • 2001 Mariners, 75% (68.0 WAR, 50.9 pos/17.1 pitch), record 116-46 — The team that won the most games in this era (tying the 1906 Cubs for the most ever) also posted the most WAR and WAR/pos (by far), but ranked 6th in AL WAR/pitch.
  • 1999 Indians, 74% (47.8, 35.2/12.6), record 97-65 — The only team since 1950 to score 1,000 runs, but ranked 10th in AL WAR/pitch.
  • 2011 Rays, 74% (50.2, 36.9/13.3), record 91-71, and (next on the list) 2010 Rays, 73% (49.2, 35.8/13.4), record 96-66 — These Tampa teams were above-average in run prevention but subpar in WAR/pitch, as much of the credit went to their innovative defense (#1 in dWAR for the two years combined). These teams ranked 2nd and 3rd in AL offensive WAR, but dWAR made them 1st in WAR/pos.

Heavy on WAR/pitch:

  • 2011 Phillies, 70% (53.2 WAR, 16.0 pos/37.2 pitch), record 102-60 — By far the highest concentration in WAR/pitch for any very good team in this era, maybe in any era. Roy Halladay, Cliff Lee and Cole Hamels ranked #1, 2 and 6 in MLB WAR/pitch, averaging 8.0 WAR and a 2.51 ERA, and the team’s WAR/pitch and 3.02 ERA were this era’s best by far. Philly led the majors in wins and run differential, but they were NL-average in OPS+ and 11th in WAR/pos.
  • 1997 Yankees, 58% (56.5, 23.7/32.8), record 96-66 — 2nd-best WAR/pitch in this era, but 6th in AL WAR/pos that year. Defensive WAR hated them, especially their outfield.
  • 2008 Cubs, 58% (50.3, 21.1/29.2), record 97-64 — Clear NL leader in WAR/pitch, but 6th in WAR/pos. The Cubs led the NL in ERA+, and their WAR/pitch was further boosted by a subpar defensive WAR.

It’s probably a coincidence, but despite excellent records, none of these seven teams reached the World Series, and all but Seattle lost in the first round. We’ll look at more World Series stuff in another post.
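Each percentage in the two lists is the heavier side's share of total WAR. A short sketch that recomputes several of them from the listed pos/pitch splits (`heavier_side_share` is a hypothetical name):

```python
# (team, WAR/pos, WAR/pitch) as listed above
TEAMS = [
    ("2001 Mariners", 50.9, 17.1),
    ("1999 Indians", 35.2, 12.6),
    ("2011 Phillies", 16.0, 37.2),
    ("1997 Yankees", 23.7, 32.8),
    ("2008 Cubs", 21.1, 29.2),
]

def heavier_side_share(pos: float, pitch: float) -> tuple[str, int]:
    """Return the heavier side and its share of total WAR, in whole percent."""
    total = pos + pitch
    side, value = ("pos", pos) if pos >= pitch else ("pitch", pitch)
    return side, round(100 * value / total)

for name, pos, pitch in TEAMS:
    side, pct = heavier_side_share(pos, pitch)
    print(f"{name}: {pct}% {side} ({pos + pitch:.1f} WAR)")
```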


One-year Turnarounds

Intuitively, pitching seems more volatile — a few pitchers often control the bulk of a team’s WAR/pitch, and pitchers show more year-to-year variance in performance, in part due to a higher injury rate than position players. So one might think pitching plays the lead role in sudden team improvement. But actually, teams with sharp one-year gains in total WAR got a little more of that edge from WAR/pos:

  • +10 or more WAR from a year ago: +7.8 WAR/pos, +7.2 WAR/pitch
    (96 teams, avg. +15.0 WAR)

Breaking that into two classes:

  • +15 or more WAR from a year ago: +10.2 WAR/pos, +9.4 WAR/pitch
    (37 teams, avg. +19.6 WAR)
  • +10.0 to 14.9 WAR from a year ago: +6.2 WAR/pos, +5.8 WAR/pitch
    (59 teams, avg. +12.0 WAR)

In sharp one-year declines, the role of WAR/pos is even bigger:

  • -15 or more WAR from a year ago: -11.0 WAR/pos, -9.1 WAR/pitch
    (33 teams, avg. -20.1 WAR)
  • -10.0 to -14.9 WAR from a year ago: -7.8 WAR/pos, -4.2 WAR/pitch
    (51 teams, avg. -11.9 WAR)
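The overall "+10 or more" row is a team-weighted blend of the two classes beneath it. A quick sketch of that pooling (the helper name `pooled_mean` is hypothetical). It lands at 14.9, within rounding of the quoted +15.0, since the class averages are themselves rounded. The same pooling puts the 84 sharp decliners at about -15.1 WAR.

```python
def pooled_mean(groups):
    """Team-weighted mean over (team_count, group_average) pairs."""
    total_teams = sum(n for n, _ in groups)
    return sum(n * avg for n, avg in groups) / total_teams

gainers = [(37, 19.6), (59, 12.0)]      # +15 or more, +10.0 to +14.9
decliners = [(33, -20.1), (51, -11.9)]  # -15 or more, -10.0 to -14.9
print(round(pooled_mean(gainers), 1))    # 14.9
print(round(pooled_mean(decliners), 1))  # -15.1
```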

And again, while it might seem easier to achieve sharp gains in pitching, it just isn’t so:

  • +10 or more in: WAR/pos, 61 teams; WAR/pitch, 46 teams
  • +15 or more in: WAR/pos, 22 teams; WAR/pitch, 18 teams

Incidentally, teams that made sharp gains in either category tended to lose a little ground in the other one:

  • +10 or more WAR/pos: average +14.0 WAR/pos, but -2.8 WAR/pitch
  • +10 or more WAR/pitch: average +14.2 WAR/pitch, but -1.9 WAR/pos

This isn’t surprising, since teams often address a weakness by dealing from their strength.


An Optimal Ratio?

Could there be hidden value in having a certain ratio of WAR/pos to WAR/pitch, creating more wins (Pythagorean or actual) relative to total WAR?

I sorted the teams by their Pos/Pitch ratio (highest to lowest), then split them into three groups of 188 teams: Pos-Heavy, Normal, and Pitch-Heavy. For each group, I found the average net wins above WAR wins, in terms of both wPyth and wAct, and expressed that relative to the average of the other two groups.
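The grouping procedure can be sketched in Python. Two assumptions, since the post doesn't spell them out: teams are ordered by WAR/pos as a share of total WAR (which, unlike a raw pos/pitch quotient, keeps the 16 negative-WAR/pitch teams at the Pos-heavy end, provided total WAR is positive), and "compared to the average of the other groups" is read as the group's net minus the plain mean of the other groups' nets. All function names and the toy data are hypothetical.

```python
from statistics import mean

def pos_share(team):
    """WAR/pos as a share of total WAR (assumes total WAR is positive)."""
    return team["pos"] / (team["pos"] + team["pitch"])

def ratio_groups(teams, n_groups):
    """Sort from most Pos-heavy to most Pitch-heavy; split into n near-equal groups."""
    ordered = sorted(teams, key=pos_share, reverse=True)
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)  # first `extra` groups get one more team
        groups.append(ordered[start:end])
        start = end
    return groups

def net_vs_others(groups, wins_key):
    """Each group's average (wins - wWAR) minus the mean of the other groups' averages."""
    nets = [mean(t[wins_key] - t["wwar"] for t in g) for g in groups]
    return [
        net - mean(other for j, other in enumerate(nets) if j != i)
        for i, net in enumerate(nets)
    ]

# Hypothetical six-team pool, for illustration only (not real team data)
toy = [
    {"pos": 30, "pitch": 10, "wwar": 88, "wpyth": 90},
    {"pos": 27, "pitch": 13, "wwar": 86, "wpyth": 87},
    {"pos": 24, "pitch": 16, "wwar": 85, "wpyth": 85},
    {"pos": 20, "pitch": 14, "wwar": 82, "wpyth": 81},
    {"pos": 15, "pitch": 20, "wwar": 80, "wpyth": 80},
    {"pos": 10, "pitch": 25, "wwar": 79, "wpyth": 78},
]
print(net_vs_others(ratio_groups(toy, 3), "wpyth"))  # [2.0, -1.0, -1.0]
```

The same `ratio_groups` call handles a five-group split; since 564 doesn't divide evenly by five, the first four groups get 113 teams and the last gets 112.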

In this first chart, the main thing is the small scale of the differences; none approaches one full win:

[Chart: Net wins over wWAR for three Pos/Pitch ratio groups]

What to make of this? While wPyth seems to favor Pos-Heavy and disfavor Pitch-Heavy, the Pitch-Heavy teams fare best in wAct.

I won’t look for much meaning in the actual wins data. I think gaps between actual wins and wWAR mainly come from clutch hitting and one-run efficiency, neither of which has been shown to be a repeatable skill, in the main.

The wPyth data seem a bit counterintuitive because of the Pythagorean formula itself: a given improvement in runs has more wPyth impact as run prevention than as run production. Take an average 2013 NL team, with 650 runs scored and allowed; their wPyth is 81. Add 80 runs scored, and their wPyth rises to 89.6. But if you instead subtract 80 runs allowed, it rises to 90.7.
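A minimal sketch of that comparison. The post doesn't say which Pythagorean exponent it used; 1.83 is an assumption here, chosen because it reproduces the 89.6 and 90.7 figures (a plain exponent of 2 gives about 90.4 and 91.6 instead).

```python
def pyth_wins(runs_scored: float, runs_allowed: float,
              games: int = 162, exponent: float = 1.83) -> float:
    """Pythagorean wins: games * RS^x / (RS^x + RA^x)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return games * rs / (rs + ra)

print(round(pyth_wins(650, 650), 1))  # 81.0
print(round(pyth_wins(730, 650), 1))  # 89.6 (+80 runs scored)
print(round(pyth_wins(650, 570), 1))  # 90.7 (-80 runs allowed)
```

The asymmetry comes from the run ratio: adding 80 runs scored lifts RS/RA to 730/650, about 1.123, while cutting 80 runs allowed lifts it to 650/570, about 1.140.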

That’s not a perfect model, because runs scored and allowed are not perfect stand-ins for WAR/pos and WAR/pitch. Some of WAR/pos is defensive WAR, which is run prevention. Still, offensive WAR is the lion’s share of WAR/pos. So it’s surprising to see Pos-Heavy teams fare better in wPyth than Pitch-Heavy teams.

But again, the actual differences are small.

To refine this approach, I split the teams into five groups of about 113 teams each, again by their Pos/Pitch ratio: Hi-Pos, Pos, Normal, Pitch, and Hi-Pitch. Again, we’re comparing each group’s net wins above wWAR against the average of the other groups:

[Chart: Net wins over wWAR for five Pos/Pitch ratio groups]

Now, the scale is bigger, and the picture’s a bit more consistent between wPyth and wAct. Teams skewing somewhat towards WAR/pos fare best in both wPyth and wAct. But this one seems odd: Those with the most normal proportions fare worst in both wPyth and wAct.

Lastly, the reverse approach: Now the five tiers are based on net wins above wWAR, for wPyth and wAct. The graph shows each tier’s average WAR/pos as a percentage of total WAR. The first set is based on net wPyth, the second on net wAct:

[Chart: Pos/Pitch ratio for five tiers of net wins over wWAR]

As in the first Net Wins chart, very different results for Pythag wins and actual wins. Those faring best in wPyth are skewed towards WAR/pos — but so are those faring worst in wAct.

But again, the scale of the differences is pretty small. The entire range in this chart is just 57.0% to 62.8%. Six out of seven teams in this pool are outside that range.

I’d love to hear your interpretations. For now, I think it’s mostly noise. Consider that second Net Wins chart, especially the Pythag side. Can you build a narrative for those first three bars? Suppose you take a team in the Hi-Pos group, and then start steadily trading small bits of WAR/pos for WAR/pitch. If these data mean something, then somewhere along that progression, your Pythagorean expectation goes up measurably; but watch out! — if you trade yourself into a normal ratio, your wPyth plunges. That just doesn’t smell right.

Well, we’ve come to the end of another long post. I hope you enjoyed it, and please share your thoughts and suggestions below!



Team WAR/pos and WAR/pitch, 1995-2013 — 4 Comments

  1. I’m focusing for the moment on the interesting result that shows up in your second-to-last multi-color graph and which you point out: that is, the slightly disappointing win rate (both pythag and actual) for teams that are well-balanced between pitching and hitting, as opposed to those that emphasize one or the other. I wonder if this odd-seeming result might be a bit of evidence for something you were looking for in your first post in this series: a disproportionate team benefit from the presence of star-level players.

    I would hypothesize that teams with more star-level players are less likely to be well-balanced. It’s presumably difficult to produce multiple star seasons, on one team in one year, in enough quantity to get a nice balance between pitching and hitting talent. So it may be that those well-balanced (and, on average, slightly underperforming) teams are relatively light, on average, on big star seasons. If stars tend to equal imbalance, and imbalance tends to equal a bit better team performance, maybe that suggests the stars are causing a (slightly disproportionate) improvement in team performance.

    Of course as you also point out, the differences are small, and I may be looking for signal where there’s just noise.

  2. John, this is valuable stuff. Something keeps happening to my browser as I attempt to post, so I’ll try one more time, with desperate brevity:
    – Pitch-heavy teams outperforming their WAR projection may not just be noise, but a bias in the WAR calculations, since something outside of WAR needs to set the proportion of WAR attributed to offense and defense.
    – What is the role of below-replacement performance in the mix? Contrast the 2013 Braves with the 2013 White Sox. Both teams had 2 stars producing approximately 6.8 and 5.4 WAR. They were approximately 10 total WAR apart, but 33 wins apart in the standings (both clubs underperformed their pythag wins, by the way).
    But the Sox gave 2,000+ PA to position players who performed below replacement.

    One example doesn’t mean anything, of course, but if we are going to tease a signal out of all this noise, I think we have to look at extremes, not averages. I remember Bill James back in the 1980s toying with something he called signature significance, or some such: don’t look at no-hitters, which are relatively common, but look for things way out on the tail of the distribution. His example was a rookie pitcher named Clemens who had struck out 15 while walking none in a CG. This almost never happens, and is much more reliable evidence that this particular rookie was good than if he had pitched a no-hitter.

    The Braves had a GREAT shortstop. Maybe Simmons’s fielding runs are an underestimate, since we have little data from players that good to go on?

    The White Sox had 2 regulars with an RS% under 20. Does WAR seriously overestimate the value of their OBP, given they almost never score from the basepaths?

    Sorry to clutter your great data with more questions, but my guess is the signal is hiding in the extremes, not the averages. I just don’t have the chops to tease it out.

  3. Sorry for typos in @2, but I’m racing against browser failure. Fun stat: scoring average (SA), defined as
    (RS-HR)/(H+BB+HBP-HR) and expressed as an average like batting average, say .368 (Tim Raines, career) or .163
    (Adam Dunn, 2013). B-Ref has it as RS%, a baserunning stat, expressed as a percentage (16% for Dunn in 2013), which is less fun…

  4. Tour de force, John. A very welcome New Year’s treat.

    Valuable result to show that win expectancy derived from team WAR does match up well with actual and Pythag wins. Lends more support to WAR as a primary analysis tool.

    Would WAA correlate even better than WAR to actual wins? If so, I suspect this is where you might see an optimal pitch/hit balance for championship teams. To birtelcom’s point @1, having team WAR mostly concentrated in a few star players is probably better than having it evenly spread over the whole team.

    You mentioned in passing that saving a given number of runs while scoring the same shows more Pythag win improvement than scoring the same amount more runs while giving up the same. But, the data didn’t reflect that as high offense teams with average pitching seemed to fare better than the reverse. I think the Pythag observation is correct and that more defense/same offense works better than more offense/same defense. But, it may be difficult to see this effect in the wild card era of high run scoring.
