In 1996, Roger Clemens had an off year by his standards, off enough that it may have been the spur to get him on steroids. It certainly earned him a ticket out of Boston, off to a new team and a career rebirth in Toronto, and at the time, his departure might not have seemed unwarranted. By traditional metrics, 1996 was but a mediocre prelude to Clemens winning back-to-back Cy Young awards and going 41-13 with a 2.33 ERA over 1997 and 1998. Clemens went 10-13 with a 3.63 ERA for the Red Sox in 1996, walking a career-high 106 batters. Pushing 35, he looked to be on the decline, a shell of his once-dominant self.

Clemens did lead the American League in strikeouts in 1996 with 257. And in hindsight, we also know that he led the AL in strikeouts per nine innings with 9.5 and finished second in WAR with 7.7. In fact, it’s one of the best losing seasons for a starting pitcher in baseball history.

One of my colleagues here, Doug, did a post a few days ago on whether Matt Cain was the unluckiest pitcher ever. The post got me thinking. Doug looked at Cain’s career numbers compared to other unlucky hurlers, so I decided to take another look and compile some of the unluckiest seasons for pitchers in baseball history.

The first list shows the ten highest WARs posted by starting pitchers with sub-.500 winning percentages:

Rk Player SO WAR W-L% Year Age Tm G GS CG SHO W L SV IP H R ER BB ERA ERA+
1 Ed Walsh 258 8.7 .474 1910 29 CHW 45 36 33 7 18 20 5 369.2 242 87 52 61 1.27 189
2 Jon Matlack 195 8.6 .464 1974 24 NYM 34 34 14 7 13 15 0 265.1 221 82 71 76 2.41 149
3 Phil Niekro 262 8.5 .444 1977 38 ATL 44 43 20 2 16 20 0 330.1 315 166 148 164 4.03 111
4 Dave Roberts 135 8.5 .452 1971 26 SDP 37 34 14 2 14 17 0 269.2 238 79 63 61 2.10 157
5 Roger Clemens 257 7.7 .435 1996 33 BOS 34 34 6 2 10 13 0 242.2 216 106 98 106 3.63 139
6 Turk Farrell 203 7.4 .333 1962 28 HOU 43 29 11 2 10 20 4 241.2 210 91 81 55 3.02 124
7 Nap Rucker 151 7.4 .462 1912 27 BRO 45 34 23 6 18 21 4 297.2 272 101 73 72 2.21 151
8 Ned Garver 85 7.1 .419 1950 24 SLB 37 31 22 2 13 18 0 260.0 264 120 98 108 3.39 146
9 Irv Young 156 7.0 .488 1905 27 BSN 43 42 41 7 20 21 0 378.0 337 146 122 71 2.90 106
10 Bert Blyleven 219 6.7 .448 1976 25 TOT 36 36 18 6 13 16 0 297.2 283 106 95 81 2.87 125


And here’s a list that looks at the ten best ERA+ scores for starting pitchers with losing records and at least 162 innings pitched:

Rk Player SO ERA+ W-L% IP Year Age Tm G GS CG SHO GF W L SV H R ER BB ERA
1 Ed Siever 36 197 .421 188.1 1902 27 DET 25 23 17 4 2 8 11 1 166 73 40 32 1.91
2 Ed Walsh 258 189 .474 369.2 1910 29 CHW 45 36 33 7 7 18 20 5 242 87 52 61 1.27
3 Ben Sheets 264 162 .462 237.0 2004 25 MIL 34 34 5 0 0 12 14 0 201 85 71 32 2.70
4 Hal Newhouser 103 162 .364 183.2 1942 21 DET 38 23 11 1 14 8 14 5 137 73 50 114 2.45
5 Joe Magrane 100 161 .357 165.1 1988 23 STL 24 24 4 3 0 5 9 0 133 57 40 51 2.18
6 Dave Koslo 64 160 .440 212.0 1949 29 NYG 38 23 15 0 12 11 14 4 193 72 59 43 2.50
7 Curt Schilling 194 159 .471 168.0 2003 36 ARI 24 24 3 2 0 8 9 0 144 58 55 32 2.95
8 Ned Garvin 94 159 .238 193.2 1904 30 TOT 25 24 16 2 1 5 16 0 155 85 37 80 1.72
9 Dave Roberts 135 157 .452 269.2 1971 26 SDP 37 34 14 2 3 14 17 0 238 79 63 61 2.10
10 [tie] Dolf Luque 140 156 .471 291.0 1925 34 CIN 36 36 22 4 0 16 18 0 263 109 85 78 2.63
10 [tie] Dutch Leonard 92 156 .414 225.2 1948 39 PHI 34 30 16 1 2 12 17 0 226 85 63 54 2.51


And here are the ten best strikeouts-per-nine-innings rates for starting pitchers with losing records and at least 162 innings pitched:

Rk Player SO SO/9 W-L% IP Year Age Tm G GS CG SHO GF W L SV H R ER BB ERA ERA+
1 Nolan Ryan 270 11.48 .333 211.2 1987 40 HOU 34 34 0 0 0 8 16 0 154 75 65 87 2.76 142
2 Curt Schilling 194 10.39 .471 168.0 2003 36 ARI 24 24 3 2 0 8 9 0 144 58 55 32 2.95 159
3 Nolan Ryan 327 10.35 .486 284.1 1976 29 CAL 39 39 21 7 0 17 18 0 193 117 106 183 3.36 99
4 Randy Johnson 241 10.31 .462 210.1 1992 28 SEA 31 31 6 2 0 12 14 0 154 104 88 144 3.77 105
5 Sandy Koufax 197 10.13 .381 175.0 1960 24 LAD 37 26 7 2 7 8 13 1 133 83 76 100 3.91 101
6 Ben Sheets 264 10.03 .462 237.0 2004 25 MIL 34 34 5 0 0 12 14 0 201 85 71 32 2.70 162
7 Nolan Ryan 260 9.97 .435 234.2 1978 31 CAL 31 31 14 3 0 10 13 0 183 106 97 148 3.72 98
8 Andy Benes 189 9.87 .300 172.1 1994 26 SDP 25 25 2 2 0 6 14 0 155 82 74 51 3.86 107
9 Jonathan Sanchez 177 9.75 .400 163.1 2009 26 SFG 32 29 1 1 2 8 12 0 135 82 77 88 4.24 101
10 Jake Peavy 215 9.56 .440 202.1 2006 25 SDP 32 32 2 0 0 11 14 0 187 93 92 62 4.09 99


Is this to suggest every man on these lists got screwed by his respective team? Maybe not. A number of factors can influence a pitcher’s win-loss record, and WAR, ERA+ and K/9 are all relative metrics that have varied between different eras in baseball history. Still, they offer a glimpse at pitchers who might have thrived in better environs.

97 Comments

  1. 1
    AlbaNate says:

    Maybe better than K/9 for that last list would be K/BB. Ryan usually had a lot of both.
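A quick way to try that suggestion, as a minimal sketch: the strikeout and walk totals below are copied from the K/9 table above, and the helper name `k_per_bb` is just an illustrative choice. Note that this only re-sorts the ten seasons already on the list; it is not a fresh search for the best K/BB marks among all losing seasons.

```python
# Strikeout and walk totals copied from the K/9 table above.
# Tuple layout: (pitcher, year, strikeouts, walks)
seasons = [
    ("Nolan Ryan", 1987, 270, 87),
    ("Curt Schilling", 2003, 194, 32),
    ("Nolan Ryan", 1976, 327, 183),
    ("Randy Johnson", 1992, 241, 144),
    ("Sandy Koufax", 1960, 197, 100),
    ("Ben Sheets", 2004, 264, 32),
    ("Nolan Ryan", 1978, 260, 148),
    ("Andy Benes", 1994, 189, 51),
    ("Jonathan Sanchez", 2009, 177, 88),
    ("Jake Peavy", 2006, 215, 62),
]


def k_per_bb(strikeouts, walks):
    """Strikeout-to-walk ratio (hypothetical helper for this sketch)."""
    return strikeouts / walks


# Re-rank the same ten seasons by K/BB instead of K/9, best first.
ranked = sorted(seasons, key=lambda s: k_per_bb(s[2], s[3]), reverse=True)

for name, year, so, bb in ranked:
    print(f"{name} {year}: {k_per_bb(so, bb):.2f} K/BB")
```

Sorted this way, Ben Sheets’s 2004 (264 strikeouts against 32 walks) moves to the top, while the three Ryan seasons come in at about 3.10, 1.79 and 1.76.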

  2. 2
    Doug says:

    Nobody is on all three lists, but Walsh, Roberts, Schilling and Sheets are on two of them.

    What a difference a decade made for Ryan. Comparing his 1987 and 1976 seasons, his K/9, a league-leading 10.4 in 1976, was still 10% higher as a 40-year-old in 1987. And his BB/9, even at an elevated 3.7, was down by more than a third. Net result: ERA+ up 43%.
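Those comparisons can be checked from the table rows with a few lines of Python. This is only a sketch: the `innings` and `per_nine` helpers are hypothetical names, and the code assumes the usual box-score convention that 211.2 means 211 and two-thirds innings.

```python
def innings(ip: str) -> float:
    """Convert box-score innings ("211.2" = 211 and 2/3) to a decimal."""
    whole, _, thirds = ip.partition(".")
    return int(whole) + int(thirds or 0) / 3


def per_nine(count: int, ip: str) -> float:
    """Rate per nine innings for any counting stat."""
    return 9 * count / innings(ip)


# Figures copied from Ryan's rows in the K/9 table.
ryan_1976 = {"so": 327, "bb": 183, "ip": "284.1", "era_plus": 99}
ryan_1987 = {"so": 270, "bb": 87, "ip": "211.2", "era_plus": 142}

k9_1976 = per_nine(ryan_1976["so"], ryan_1976["ip"])
k9_1987 = per_nine(ryan_1987["so"], ryan_1987["ip"])
bb9_1976 = per_nine(ryan_1976["bb"], ryan_1976["ip"])
bb9_1987 = per_nine(ryan_1987["bb"], ryan_1987["ip"])

# Relative changes from 1976 to 1987 (positive = higher in 1987).
k9_change = k9_1987 / k9_1976 - 1
bb9_change = bb9_1987 / bb9_1976 - 1
era_plus_change = ryan_1987["era_plus"] / ryan_1976["era_plus"] - 1

print(f"K/9:  {k9_1976:.2f} -> {k9_1987:.2f} ({k9_change:+.0%})")
print(f"BB/9: {bb9_1976:.2f} -> {bb9_1987:.2f} ({bb9_change:+.0%})")
print(f"ERA+: {ryan_1976['era_plus']} -> {ryan_1987['era_plus']} ({era_plus_change:+.0%})")
```

With the table’s figures this gives a K/9 of about 10.35 in 1976 and 11.48 in 1987, a BB/9 drop from roughly 5.8 to 3.7, and an ERA+ gain of about 43 percent.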

    • 3
      Doug says:

      Other big difference – Ryan had 21 CG in 1976, none in 1987.

      • 4
        Graham says:

        Strange, since Ryan averaged five complete games a season the next three years.

        I took a look at Houston’s bullpen in 1987, and it wasn’t anything particularly special. I may need to dive into Retrosheet a bit. I’m curious how many times Ryan left with a lead and lost a win because of that bullpen.

        • 8
          Evan says:

          Ryan was 8-16 with a league-leading ERA in 1987. Clearly his team wasn’t scoring runs for him. It being the NL and his team being behind, he was probably pinch-hit for in many games he might have completed had his team scored a few more runs. It doesn’t explain 0, but lack of run support should definitely suppress CG in the NL.

  3. 5
    no statistician but says:

    My problem with accepting WAR (Wins Above Replacement) as a very accurate assessment of pitching, if you accept it for what it purports to be, is exemplified by Clemens’s 7.7 in this season. I simply can’t accept the notion that any credible replacement for Clemens would have a record of 2 and 20 on a team that won more games than it lost. WAR seems to favor pitchers of a particular range of skills and/or circumstance, and I don’t argue that it tells us something about those things. I don’t think it says much about who is the most valuable pitcher in a given season, however, and that goes against the consensus view of posters here.

    I’ll remain a heretic.

    • 7
      Graham says:

      WAR certainly has its limitations, such as Aloysius Travers’ -2.1 lifetime mark for his one-game career.

      I think there are dangers in shooting for absolutes in baseball research. I think stats like WAR are best used to promote better understanding of the game and its history rather than a be-all, end-all.

    • 15
      Neil L. says:

      No stat, you are not a heretic at all. You must hold our feet to the fire and give the metrics a reality check.

      When you put Clemens’s WAR in such stark replacement terms, it does expose the limitations of pitcher’s WAR.

      I’m still a believer, though, not a heretic.

    • 62
      BryanM says:

      No S but: Scepticism is always an appropriate reaction when one is confronted with a complex chain of math reasoning, so keep on with it. That said, accepting that WAR for pitchers has limitations, and it does, it is not trying to say that a replacement pitcher would have gone 2-20 in Clemens’s particular starts; it says nothing about what would have happened with another pitcher in those individual games. There are several gaps to close to get to what it is saying. First, it is talking about team wins, not wins attributed to the starting pitcher. Clemens started 34 games in 1996 and pitched 242 2/3 innings, an excellent seven-plus innings per start. His WAR of 7.7 is an estimate that a replacement pitcher, pitching as often and as many innings as RC for a hypothetical average team (not the Sawks) playing in Fenway in American League 1996 scoring conditions, would on average, if you ran a large number of simulations, lose about 7.7 more games per year than that same hypothetical team would have with Roger on the mound.
      Every statistic is an estimate; my main beef with WAR is that there is no attempt to include an estimate of a confidence interval, as in: we are 90% confident that Roger’s 1996 WAR is 7.7 plus or minus 3.1. I made the 3.1 up, but accepting it for the sake of argument, it would say we are very confident that Roger was better in 1996 than the guy they didn’t call up from Pawtucket, but have no clue whether he was better than somebody with a WAR of 6.1.
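To make that concrete, here is a minimal sketch of the comparison described above. The 3.1 margin is the made-up figure from the comment, not a published number, and treating the other pitcher’s WAR as an exact value is a simplifying assumption (a fuller treatment would put an interval around both numbers).

```python
war_estimate = 7.7
margin = 3.1  # invented in the comment purely for the sake of argument

# Bounds of the hypothetical 90% interval: roughly 4.6 to 10.8.
low, high = war_estimate - margin, war_estimate + margin


def distinguishable(other_war: float) -> bool:
    """True if other_war falls outside the interval around the estimate."""
    return not (low <= other_war <= high)


print(distinguishable(0.0))  # replacement level: outside the interval
print(distinguishable(6.1))  # a 6.1 WAR season: inside the interval
```

Under those assumptions, replacement level sits well outside the interval, while a 6.1 falls comfortably inside it, which is the point about not being able to separate the two seasons.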

      • 63
        no statistician but says:

        Let’s see what minds greater than ours have to say, not on precisely this topic, but on the nature of what lies beneath the strategies and assumptions involved in something like “a complex chain of math reasoning.”

        John Stuart Mill: “Strange it is that men should admit the validity of the arguments for free discussion, but object to their being ‘pushed to an extreme’; not seeing that unless the reasons are good for an extreme case, they are not good for any case.”

        It is the last half of Mill’s argument that I believe applies here.

        Alfred Korzybski: “The map is not the territory.”

        WAR is a map of sorts, and it attempts to depict the territory of baseball performance, but it isn’t baseball performance itself. Measurements of reality aren’t reality, but at least a home run is a home run. A complex chain of math reasoning created with (forgive me, but it is true) suppositions about what is or is not important and what values to assign, no matter how carefully considered, is never going to depict with accuracy real events. Biases are built into the endeavor, and the endeavor, no matter how useful, can’t be relied upon for accuracy, or at least not the sort of accuracy the proponents of WAR, especially for pitchers, maintain. All the recent discussion of “Luck” on this site is really an offshoot of the defects of statistical measures that for some damn reason don’t come up with the right results.

        • 64
          Neil L. says:

          No stat, I haven’t had a dose of philosophy and logic like your post since my university days. :-)

          So which side do you come down on with respect to WAR? Are we relying on it blindly to provide a degree of certainty that it is not capable of providing or is it still a step forward from traditional counting stats?

          • 65
            no statistician but says:

            My opinion, since you ask, is that the problem lies with uncritical acceptance of it as a perfect measure for all things. It works far better for batting stats, I feel, than pitching and fielding, although, even with batting, where there is far more that is genuinely quantifiable, someone still is deciding what weight to give, what to include or ignore. Used alongside traditional stats but not as a replacement, WAR gives a view that can help clarify what went on in general terms, just as OPS+ and the other mathematical formulas (if formulas is the right term) do.

            Several contributors to this site and its predecessor have suggested that the MVP and Cy Young awards should simply be handed to the player and pitcher with the highest WAR for the year. Let me point out a parallel: Andre Dawson was virtually handed the NL MVP in 1987 because of his home run and RBI production by a set of voters who saw his preeminence in those areas (HR by a margin of 5 over 2nd place and 10 over 3rd; RBI by 14 over 2nd place) as clear-cut excellence. WAR would give the award to Tony Gwynn, whose 8.1 far exceeded Dawson’s. Gwynn’s supposed excellence, though, came with a team that finished far worse than Dawson’s Cubs, who were contenders for the first 2/3 of the season, Dawson’s clutch hitting being a prime factor. I don’t think either was the most valuable, but I’d take Dawson as more valuable than Gwynn that year.

            Bill James talks about something called “park illusion.” I don’t remember what it means. But I do think that there is something that could be called WAR illusion.

            Next time you’ll know better than to ask my opinion, but thanks.

        • 66
          BryanM says:

          No Stat: the map is not the territory, but anyone trying to negotiate an unknown territory would prefer a map to no map, and of course a good map to a bad map. I agree with what I think are your main points, and have a minor quibble with what I think is a side issue.
          1) WAR for pitchers is a not-very-reliable estimate of the value of the player’s contribution over time, born of the strong desire by some people to sum up complex issues in a single number. We have little or no idea whether a player who had a 7 WAR season had a better one than a player who had a 6 WAR season.
          2) It is true, as you say, that many posters apparently are unaware of the fact that a statistic is by definition an estimate, or more crudely, a guess, and tend to treat it in discourse as a fact.
          3) It is also true, in spite of the limitations of WAR, that a list of the pitchers with the highest career WAR would coincide moderately well with a consensus of informed expert opinion as to the greatest pitchers in history.
          And now the quibble: Mill was talking of logic when he referred to extreme cases, and if logic is not perfect, it is nothing; over in the messier world of guesses, a good guess remains better than a bad guess. IMHO, WAR for pitchers is a decent guess over a period as short as a season; certainly the “proponents” of WAR, as you call them, would say it is an excellent measure, not a decent guess. Tools are not accountable for the behavior of those who misuse them.

          • 68
            John Autin says:

            Having only skimmed this thread, I’ll have to assume that the statement “every statistic is an estimate” makes sense in the context of the exchange. Taken by itself, it’s a head-scratcher.

          • 69
            Ed says:

            Interesting discussion guys! Bryan, that’s a great point re: the lack of confidence intervals. As someone who’s taken multiple classes in advanced stats, that thought never crossed my mind. I’m now going to hang my head in shame.

          • 71
            no statistician but says:

            Mill wasn’t talking about logic but about truth. Otherwise, I don’t disagree with most of what you’re saying, but I have one proviso:
            lots of players have great half-seasons or great months interspersed with mediocrity, so even as a measure of “a period as short as a season” WAR as a map can fall short of depicting the territory with accuracy.

  4. 6
    Doug says:

    I wonder if the D-Backs after the 2003 season were thinking the same way as the Red Sox after 1996.

    Schilling was soon to be 37, had lost time to injury in the year, and did have that 8-9 W-L. But, he also just had the best ERA+ season of his career, his 3rd best K/9 and 4th best WHIP. Schilling also pitched into the 6th inning of ALL 24 of his starts, and into the 8th inning in 14 of them.

    Nevertheless, Arizona just couldn’t resist Boston’s package offer of Michael Goss, Casey Fossum, Brandon Lyon and Jorge De La Rosa, presumably making up in quantity what it lacked in quality.

  5. 10

    When I think of Clemens and unlucky seasons, I always immediately think of his start to the 2005 season.

    11 starts, 76 innings pitched, 1.30 ERA, and a 3-3 record to show for it. The Astros scored a whopping 18 runs for him in those 11 starts, and he got three straight no decisions in 1-0 games in which he went 7 scoreless innings.

    Clemens would go on to lead the NL with a 1.87 ERA and a 226 ERA+ (both personal bests), but his mediocre 13-8 record thanks to that unlucky start probably cost him the Cy Young that year…

    Then again, in retrospect, maybe that was karma….

    • 12
      bstar says:

      It’s only been seven years, but I think if a pitcher puts up a 223 ERA+ and only wins 13 games in 2012, he’ll win the Cy hands down. Just ask Greinke and King Felix.

    • 14

      What version of karma, Evil?
      Hindu, Buddhist, Falun Gong, Sikh, or Western Casual?
      ________________________

      I’m a big Niekro fan, and it certainly was bad luck for him to be on a team with a 78 ops+, (and he didn’t help that cause (-6 ops+)), but this seems like a case where WAR is more of a Counting Stat than an indicator of quality.

      The Braves had 13 starters that year.
      Knucksie had twice the IP of the #2 guy, and more than thrice the #3.

      He led the league in almost every counting stat, both good and bad, except for wins.

      • 16
        Neil L. says:

        Voomo, a great comment, if I fully catch your meaning.

        Under what conditions do you think WAR for pitchers exhibits all the weaknesses of a counting stat like wins or losses?

      • 21
        bstar says:

        Of course, WAR is a counting stat for hitters, too.

        • 25

          @21
          Exactly. So what would the equivalent be for a hitter?
          Niekro pitched 330 innings.

          Here’s the NL starters for 1977:
          http://www.baseball-reference.com/leagues/NL/1977-starter-pitching.shtml

          41 guys started at least 25 games.
          The low inning count was 150.

          8 guys topped 240
          240 Sutton
          252 Reuschel
          258 Ed Halicki
          261 Tom Seaver
          267 J.R. Richard
          283 Steve Carlton
          302 Steve Rogers
          330 Philip Henry Niekro

          Just eyeballing it I’m saying that the Average Full Time Starter pitched 200 innings.
          There were a handful of elite workhorses and Niekro was 10% beyond the one other anomaly.

          Niekro pitched 1.6x the average.

          12 team league
          41 full time pitchers (I’m just picking 25 starts as our “fulltime” number)
          There are roughly double the full-time position players as there are full-time starters (8 positions, some platooned).

          So, look at the top 80 batters in terms of Plate Appearances.

          #1 – 732 Pete Rose (outlying anomaly)

          #80 – 453 John Milner

          So, the average PA for a full time starter is between 450 and 700.

          That’s 575.

          Multiply 575 x 1.6

          If Niekro was a batter he would have had 920 PA.

          His era+ was 111
          Has a batter with a 111 ops+ ever gotten to 8.5 WAR?
          No.

          Could a batter with a 111 ops+ get to 8.5 with 920 PA?
          I’d say very likely.
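The back-of-the-envelope arithmetic above can be written out as follows. It is a sketch that takes the comment’s eyeballed inputs (a 200-inning average starter, 450 to 700 PA for regulars) as given; the variable names are illustrative.

```python
# Niekro's 1977 workload: 330.1 in box-score notation is 330 and 1/3 innings.
niekro_ip = 330 + 1 / 3
avg_starter_ip = 200  # eyeballed in the comment, not computed from the data

# Workload relative to the "average" full-time starter (about 1.65).
workload_ratio = niekro_ip / avg_starter_ip

# Midpoint of the 450-700 PA range used in the comment.
avg_regular_pa = (450 + 700) / 2

# The comment rounds the ratio down to 1.6 before scaling.
equivalent_pa = avg_regular_pa * 1.6
equivalent_pa_unrounded = avg_regular_pa * workload_ratio

print(f"Workload ratio: {workload_ratio:.2f}")
print(f"Average regular PA: {avg_regular_pa:.0f}")
print(f"Batter-equivalent PA at 1.6x: {equivalent_pa:.0f}")
print(f"Batter-equivalent PA at the unrounded ratio: {equivalent_pa_unrounded:.0f}")
```

Using the rounded 1.6 ratio reproduces the 920 PA figure; the unrounded ratio of about 1.65 would push it closer to 950.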

          ______

          As to the question “was Niekro unlucky?”
          Well, he tied with Koosman for the lead in Tough Losses.
          He had 9.

          So, yes.

          He also pitched 12 more games and over 100 more innings than Koos.

          Mark Lemongello had 8
          (9-14, 103 era+)

          • 27
            Neil L. says:

            ~Bows in awe to Voomo.~

            Wow, how do you pack so much provocative knowledge into one comment?

          • 29

            I am procrastinating in a major way right now, that’s how.

            Here’s another delightful waste of time, if you’re into it. Just set up a free seven player challenge at fanduel for tomorrow’s afternoon games:

            http://www.fanduel.com/entry/PVHGXX

          • 38
            Ed says:

            I hear where you’re coming from, Voomo. But doesn’t Niekro deserve some credit for being able to throw more innings than other pitchers? Also, as others on here have taught me, there’s not a direct relationship between ERA+ and WAR. The latter takes into account defense and supposedly the Braves had a particularly bad defense back then.

          • 39
            Richard Chester says:

            Just for comparison purposes since 1920 there have been 27 seasons of pitchers with > 330 IP. They all have WAR of 5.1 or higher. Niekro ranks 12th in WAR (8.5), 26th in ERA+ (111), 27th in wins (16) and tied for 1st in losses (20).

          • 43
            Doug says:

            I’m still trying to understand how Niekro could have a 111 ERA+ with a 4.03 ERA when the NL average ERA that year was 3.91.

            Can anybody explain that?

          • 46
            Graham says:

            Ballpark effects, Doug?

          • 50
            Doug says:

            Thanks, Graham.

            That would make sense in homer-friendly Fulton County Stadium. The pitching park factor that year was a stratospheric 117.
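As a rough check, a common simplified form of ERA+ scales the league ERA by the park factor and divides by the pitcher’s ERA. The sketch below assumes that simplified formula, which is not necessarily the exact calculation Baseball-Reference uses, so it lands near the listed 111 rather than on it.

```python
def era_plus(era: float, league_era: float, park_factor: float) -> float:
    """Simplified ERA+: 100 * (league ERA scaled by park factor) / pitcher ERA.

    park_factor is on the usual 100-is-neutral scale. This is an
    assumption-level approximation, not Baseball-Reference's exact method.
    """
    return 100 * league_era * (park_factor / 100) / era


# Niekro, 1977: 4.03 ERA, NL average 3.91, pitching park factor 117.
with_park = era_plus(4.03, 3.91, 117)
neutral_park = era_plus(4.03, 3.91, 100)

print(f"Park-adjusted estimate: {with_park:.1f}")
print(f"Neutral-park estimate:  {neutral_park:.1f}")
```

With a neutral park (factor 100) the same 4.03 ERA would come out a little below 100, which is why the park adjustment matters so much here.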

          • 51
            Graham says:

            Jeez, that’s grotesque.

            I put Niekro’s ’79 season through the BB-ref stat converter. On the 1968 Dodgers, whose pitching park factor was 92, I think, Niekro’s ’79 numbers convert to 24-15 with a 2.21 ERA and 219 strikeouts.

          • 53
            Ed says:

            Highest ERA with an ERA+ above 110…Pedro Astacio with the 1999 Rockies. (5.04 ERA, 115 ERA+).

            At the 100 ERA+ threshold, the highest ERA was 5.26 by Tommy Thomas with the 1926 Browns. (5.26 ERA, 102 ERA+).

      • 40

        What version of karma, Evil?
        Hindu, Buddhist, Falun Gong, Sikh, or Western Casual?

        I don’t know. I’m lazy, so I only use Instant Karma…

        • 57
          kds says:

          LoL. Of course, Joe Morgan always insisted that sabermetric karma could not run over his “I was a player so I know everything about baseball analysis” dogma.

  6. 11
    vivaeljason says:

    Man…Ned Garvin had an especially rough 1904. A 1.72 ERA was really good even in the deadball era…but to go 5-16?!? AND ON MULTIPLE TEAMS?!? Jeez.

  7. 13
    Mike L says:

    The Niekro 1977 season jumps out at me. 8.5 WAR with an ERA of 4.03 and an ERA+ of 111?

  8. 17
    Neil L. says:

    Graham, a very technical post. I am trying to wrap my head around what Roger Clemens’ position on the first list means.

    First off, he didn’t appear on either of the other lists. Second, his ERA for the 1996 season was the second highest on your list and was the highest for a power pitcher.

    My point, Graham, is this: was he not the author of his own unluckiness? His walk total and ERA, at least partially, contributed to his poor record.

    Granted that he could not control his team’s offense in his starts, but your WHIP will catch up with you every time.

    This is not a knock on your blog, it is a good discussion starter and gets us to think about the performance metrics that we use for pitchers.

    • 18
      Graham says:

      I appreciate your comment, Neil. I wasn’t aware of the correlation between WHIP and W-L, though I did a Play Index search on WHIP while I was preparing this post. Surprisingly, there weren’t very many losing pitchers with WHIPs under 1.2. I omitted the list in part because the numbers weren’t that impressive. Your comment’s making me think I maybe should’ve included it.

      As for Clemens, I led with him because of his WAR total and name recognition. I maybe could’ve written more on Nolan Ryan’s 1987 season but his WAR total wasn’t as high.

  9. 19
    Neil L. says:

    Not at all, Graham. Don’t backpedal in the slightest. We all refine our understanding of the game by having something to reflect on and react to.

    Your blog, with its lists, is a nice brain-builder. It requires a careful look at the various columns in the tables to try and discern cause and effect.

    I personally think that the WARs of pitchers in the 1900-1920 era are artificially boosted by the number of innings they pitched. By artificially, I mean the quality of batters they faced. Even though they come up in P-I searches, I don’t think it is a level playing field with “modern” players in terms of racially integrated leagues, international drafting of players and physical fitness of batters in big-contract times.

    Keep on authoring in HHS!

    • 20
      Graham says:

      Will do. For now, I’m contributing one post a week. I may look to do more as I can. This is a great site to write for. I like how engaged, knowledgeable, and positive readers are here.

      I tend to place less stock in early pitchers. From what I understand, Lefty Grove was the first pitcher to throw with his full body. I assume Deadball Era greats were more or less tossing the ball in, perhaps 10-15 MPH slower than today’s pitchers. It helps me take a dimmer view of pitchers like Christy Mathewson or Grover Cleveland Alexander.

  10. 22
    Shping says:

    Ahhh, yes, my first time back here in many months and it’s great to see that the well-rounded baseball discussions continue! And to see that the WAR debate continues as well. Thanks NoStat for the heretical/helpful example that helps validate some of my misgivings too. And yet the WAR proponents continue to respond in polite, reasonable fashion. Curse them and their open-mindedness :)

    If the WAR-ists really want input, tho, my incredibly mundane suggestion/first step would be to multiply it by 10. It just sounds more impressive to compare an 87 score to a 77 and is easier to get one’s head around. Plus 100 works out to a nice round number to shoot for as “perfect” or at least incredibly impressive. And everybody would know that an 87 meant 8.7, just like we know that a .332 average (or “three-thirty-two”) actually means 33 percent, or about 1 out of 3.

    Step two would be to make the actual WAR formula more available, publishable, readable, understandable, digestible, etc., whatever you want to call it. Can anyone possibly recite or describe the formula in simple, statistical or categorical terms? (that’s a rhetorical question that doesnt need a reply, but if you actually can, please go for it!)

    I guess i’m just one of those people who likes to look at my paycheck details now and then and actually know what the math/reasons are for the exact dollars and cents i’m receiving (“oh, dang, i got 115 less dollars this time because…”) and not just the result, or some vague explanation as to why. And WAR doesn’t do that for me. But things like ERA or a WHIP of, say, 0.95 do, because i can easily grasp the notion and say “wow”: that means less than a baserunner per inning!

    Maybe i would appreciate WAR more if my stubborn brain somehow couldve been taught the exact formula at age three…

    Anyway…. Sorry for ramblin on in my first re-post guys, but it’s fun to be back and reading these and other topics again. I’ll be talking less and reading more in future!

    • 23
      Neil L. says:

      Shping, (is that code for shipping? :-) ), does memory fail me or were you part of the B-Ref cutting-edge community?

    • 31

      @22
      No, I have no idea what the recipe is for WAR.
      Looked it up once, and concluded that even with my pretty good head for math that I couldn’t calculate it in any kind of real-time meaningful way. And what is the fun in that?

      …Suddenly I am nostalgic for the GWRBI.
      Remember that one? The guy who hit that run-scoring forceout in the 2nd inning would sometimes get the GWRBI?
      I was sitting with the Bleacher Creatures on June 11th, 1988 when Billy Martin started Rick Rhoden at DH.

      Rhoden had a sac fly that was the GWRBI.
      That was a first-place team, folks, starting a pitcher at DH.
      And pulling it off.

      Poor Billy did not survive losing 7 out of 8 the following week, however, and 4.5 years of sucking immediately followed.

      How’s that for a tangent?

      http://www.fanduel.com/entry/PVHGXX

      • 45
        Shping says:

        Hee hee — gotta love the GWRBI. I might have to go look up an ’80s boxscore for old times sake, just to see, underneath the line score, “GWRBI: Kittle(7)”

        Will we ever see Saves the same way?

    • 32
      Ed says:

      Shping – I agree 100% about the need for WAR publishers to be more transparent with how the numbers are calculated. The fact that they haven’t is a big black mark in my opinion.

      • 37
        Shping says:

        For some reason, i’m suddenly thinking about Tom Hanks in “Castaway”, trying to figure out the sounds of the coconuts falling from trees, saying, “What is that?!?”

    • 35
      bstar says:

      Well, shping, WAR does stand for Wins Above Replacement, so an 8.7 WAR player contributed ~9 more wins to his team than a replacement player would have. That’s pretty easy to digest and understand. It helps us to better quantify, “How much is player X actually helping his team?” If you multiply it by 10, you would have to change the name of the stat and “87” would just become an arbitrary score.

      I, like you, was slow to warm to the WAR concept. But once you spend time with it, you learn to think in WAR terms. For example, I now know what someone means when they say “He’s a 4 to 5 win player.” The same progression for me happened with OPS+. At first I thought it overvalued walks too much (it does), but after years of looking at this stat I know a 120 OPS+ is good, a 130-140 is All-Star worthy, and anything beyond 160 or so is MVP quality. It just takes time, I think.

      • 42
        Shping says:

        I know, i know, thanks. I can see how it becomes a matter of habit. I’m trying to embrace it. Still trying. Getting better with OPS. But i still need to be able to grasp exactly what the stat is saying in a way that WAR doesn’t do for me.

        And i still say moving the decimal would help WAR. Just like we know that a “500” team is actually a 50 percent team that wins half its games, i think it would work the same way.

        • 44
          Graham says:

          I believe the idea with WAR is to provide the number of wins a player provides over a theoretical replacement. Moving the decimal would confuse that. If it were made out of 100, it might make sense to do it as a percentage, such as “Player X provided 23 percent more wins than a replacement.”

          • 48
            Shping says:

            Ok, the percent thing is another way of looking at it.

            But i think i’ll still invoke the name of the almighty Bill James here. Didn’t he once do some kind of rating system for new stats, based on categories like Accuracy, Reliability, Trendability, Digestibility, something like that? Ring a bell with anyone? (WAR, for instance, would probably score high in specificity or comparability, but low in transparency.)

            Perhaps i’ll do some research on that and report back at a later date. Meanwhile, we got another weekend of baseball to enjoy!

          • 52
            Ed says:

            Along the same line, here’s an interesting article by Bill Simmons on some of the new baseball stats. I think he does a good job of capturing how the “typical” fan reacts to these sorts of things:

            http://sports.espn.go.com/espn/page2/story?page=simmons/100402

          • 55
            bstar says:

            Ed, that was a fun ride of an article. I have never read much Simmons for whatever reason; his humor really stood out a lot. Is he always this good of a writer?

          • 56
            Ed says:

            Glad you enjoyed it. Yeah, he’s a very good writer and quite humorous. You have to put up with his Boston homerism (which he fully admits to) and he makes a lot of TV/pop culture references that I don’t get since I don’t follow pop culture that much. Still, I think he represents an interesting viewpoint: that of the “common man” who’s slowly starting to get the more advanced metrics, not just for baseball but for other sports as well. Sadly, he doesn’t write as many articles as he used to. In the past, he would write 2-3 articles a week. Now it’s generally just one, typically on Fridays. He now does podcasts in place of writing articles.

          • 75
            kds says:

            We have general habits of using 1/100, i.e. %, but there is nothing necessary about that. Same for BA, OBP, SLG where we use hundredths. Since these are all rate numbers it doesn’t matter what we call it. 25%, .250 or .01 (base2) are all the same thing. But WAR is a counting stat, not a rate stat. It would make no sense to multiply HR or RBI or pitchers’ Wins by 10. We want to use WAR for things like analysing trades, free agent signings. Multiplying by 10 never helps, and makes it more complicated to use WAR with other things denominated in wins, which we very much want to do.

          • 76
            kds says:

            I meant thousandths for BA, OBP, SLG. (A BA of 3 hundred is 300/1000.)

          • 77
            kds says:

            The Bill James article on rating stats by Importance, Reliability, and Intelligibility is in the 1987 Abstract. He does not look at any “higher order” stats like WAR, because few had been developed by that time. Of course, he basically invented WAR in the article “MVP,” pp. 164-167 of that volume.

        • 54
          Hartvig says:

          My issue with OPS/OPS+ is that it tends to undervalue players like Rickey Henderson and Tim Raines and even though I know when I look at their numbers I still get this visceral “Oh… well…” feeling in my gut.

          My issue with WAR is a) its incredible complexity and b) the fact that there is more than one formula and, on occasion at least, the outcomes vary wildly.

          • 92
            Shping says:

            Thanks for the input guys — enjoyed all of the info and references.

            Like i said, i’m trying to embrace WAR. Lots of objections still, as are being discussed here, along with simple issues of lifelong habits, stubbornness and the fact that we all need to remind ourselves from time to time that no stat of any type is perfect, the ultimate 3.14159 or e=mc2 that ultimately explains everything! :)

            I don’t think or hope that anyone is actually saying that about WAR, right? But i think we’re guilty on both sides of the WAR arguments at times of either trying to claim that or reject it for lack of it. I like to see people asking, “Is it the best we’ve got?”

            Some of us would apparently like to see it listed first among the daily newspaper stats, or webpage stats, or on the scoreboards in ballparks, and some of us think it’s overrated.

            Let the debate continue!

    • 58
      kds says:

      rWAR for a starting pitcher: take his RA, not earned runs, and compare it to his adjusted replacement level. Gross replacement level is about 1.25 * league average RA. (Maybe a little less, say 1.2; I’m reverse engineering this from various sources.) You then adjust for park, and for his team’s defense. So a bad defense, in a hitter’s park, in a high scoring league, will give you a higher replacement level than if you reversed all those factors. 1996 was a high scoring year in the AL. Fenway had a one year park factor of 106. The Sox’s defense suxed that year according to Total Zone. So this year, in some parks, with a good defense, a pitcher with Clemens’s IP might have replacement level runs of 135; he actually had 189. His RA was 106, so his RAR was 83. Park adjusted league scoring tells us what number to divide RAR by to get WAR. The higher the scoring environment, the higher the divisor. Someone with 50 RAR in Coors would have lower WAR than someone with 50 RAR in Petco.

      The only part of the above that I am less than positive about is the gross replacement level. It may not be linear. It may use multi-year park factors (101 for 1996 Fenway) instead of one year.
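
      The arithmetic in this walkthrough can be sketched in a few lines of Python. This is only a rough sketch: the function names are made up for illustration, the 189 replacement-level runs and 106 runs allowed are the figures quoted above, and the runs-per-win divisor is not given in the thread, so it is backed out from the 7.7 WAR listed in the post's table rather than derived from the scoring environment. The two divisors in the Coors/Petco-style comparison are hypothetical.

```python
def runs_above_replacement(replacement_runs, runs_allowed):
    """RAR: runs a replacement-level pitcher would allow minus actual runs allowed."""
    return replacement_runs - runs_allowed


def wins_above_replacement(rar, runs_per_win):
    """Convert runs to wins; the divisor rises with the scoring environment."""
    return rar / runs_per_win


# Figures quoted in the comment above for Clemens in 1996.
rar = runs_above_replacement(replacement_runs=189, runs_allowed=106)

# Assumption: the divisor is inferred from the listed 7.7 WAR, not derived.
implied_runs_per_win = rar / 7.7

# The same 50 RAR in a low- vs. high-scoring park (both divisors are hypothetical).
war_low_scoring = wins_above_replacement(50, runs_per_win=9.0)
war_high_scoring = wins_above_replacement(50, runs_per_win=11.0)
```

      With these inputs the same 50 RAR is worth fewer wins in the higher-scoring park, which is the Coors versus Petco point.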

      • 60
        Neil L. says:

        I don’t think I’ve read as succinct a description of RAR as yours, kds.

        One newbie question, though. Why is the gross replacement level for pitchers about 1.2-1.25 times the league average runs allowed? Why is it not the league average itself?

        Is the bump up in RA the difference between a replacement pitcher from the minors and an average major-leaguer?

        May be a dumb question but thanks in advance for your patience.

        • 72
          kds says:

          Your second paragraph basically has the correct answer to the first. Replacement level is what you can get, from your minors or free agent signings, waiver deals, etc., for the league minimum. You should expect these to be worse than league average. A big question, not totally settled, is where replacement level is. Tom Tango (insidethebook.com) thinks that replacement level for a starting pitcher is W/L = .380. I think this means that if you took the theoretical replacement level pitcher, had him pitch for a gazillion innings (to avoid sample size issues), looked at his runs allowed, gave his “team” league average offensive scoring, and used a pythag formula to convert RS and RA to W%, you would get 38%. A team that scores 4/gm and gives up 5/gm would be expected to win just under 40% of its games, so replacement level as 1.25 * average RA cannot be too far off if our understanding of replacement level is correct.
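
          The pythag check above can be worked through directly. This is a minimal sketch assuming the classic exponent of 2 (the comment does not say which exponent is used); the function name is made up for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# A team scoring 4 per game and allowing 5 per game: 16 / (16 + 25).
four_vs_five = pythagorean_win_pct(4.0, 5.0)

# League-average offense (1.0) behind a pitcher allowing 1.25 * average RA.
replacement_guess = pythagorean_win_pct(1.0, 1.25)

# RA multiplier implied by a .380 winning percentage, under the same exponent.
implied_multiplier = ((1 - 0.380) / 0.380) ** (1 / 2.0)
```

          Under that assumption the 4-versus-5 team and the 1.25 multiplier both come out just under 40%, while a .380 target would imply a multiplier a little above 1.25.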

  11. 24
    Shping says:

    …But i also had to ask Voomo @ 14 and the EvilSquirrel @10:

    Which kind of karma would bring Clemens back as, say, Steve Bartman, or Mario Mendoza?, or young Moonlight Graham? I’ll pick that one. (Samsara?)

  12. 26
    Shping says:

    Hey Neil — Ha ha, thanks i guess. I dont think i could ever describe myself as cutting edge, but i was definitely a BB-Ref devotee/participator for awhile in 2010-11. And i does like to ramble on at times and think about this amazing game of baseball and all the glorious history and stats and comparisons and random possibilities, and hear others do same — so be forewarned, i’ll be visiting again! — defending Braun’s honor as often as necessary too!

  13. 30
    Shping says:

    Wow, an invitation even. Ok, no problem.

    Teach me something about triples this year. Possibilities, trends, ballpark tendencies, something. Kind of a pet curiosity of mine.

    • 59
      kds says:

      Triples have the highest Home Field Advantage of any batting event, probably because home outfielders are much better at judging the wall and how the ball will bounce. They have a high variability in park factor also, partly because of park size and partly because of the way the shape and texture of the outfield walls affect the number of odd bounces.

      • 61
        Neil L. says:

        Why are triples on a slow but steady decline as a percentage of all plate appearances? Shouldn’t new “retro” parks have the dimensional quirkiness to create more triples?

        I can vaguely remember having a few discussions about triples and their frequency back in B-Ref, but don’t recall any firm conclusions being reached.

  14. 33
    Ed says:

    Braun has honor???? :)

  15. 41
    Shping says:

    Hmmmm, not sure if or where the sarcasm is with Braun — i’ll accept your offer though bstar! — and we can save that topic for another time.

    How bout that Ben Sheets as one of the unluckiest pitchers of all time, on two lists? He was dominating in Milw. Wonder if he’ll ever pitch again.

    Could we possibly see a list of the luckiest pitchers of all time? I’ll bet Pete Vuckovich is on that one, bless his heart!

    • 47
      Graham says:

      It’s a little esoteric, and this stat may not be your cup of tea, but Ben Sheets has the fourth-most WAR at 24.1 of any pitcher with at least a 113 ERA+ and a losing lifetime record. The three men in front of him are Jon Matlack (38.7), Thornton Lee (31.5), and Matt Cain (25.0).

  16. 49
    Shping says:

    I won’t deny that WAR can be useful, especially when it suits my purposes. :) Sheets was unlucky.

    Poor Matt Cain too. Easy guy to root for.

  17. 74
    John Williams says:

    With WAR, ERA+ and K/9 as newer metrics, could we see a starting pitcher with a .500 or even a losing record win the Cy Young, or finish in the top 5 or so? Félix Hernández won in 2010 at 13-12 with a 2.27 ERA.

    Below is a list of the .500 and under starting pitchers with votes for the Cy Young Award. 12 are on the list, with 9 in the NL and half since 2000. Also, 9 finished in 6th place or lower in the voting and none in the top three. 2 are Hall of Famers (Ryan and Perry) and 2 are borderline cases for the Hall (Hershiser and Mussina). Perry (twice), Lincecum, and Hershiser won the Cy Young in other seasons, while Nolan Ryan was a runner-up in 1973 and had 6 top-5 finishes.

    Dave Roberts 14-17 2.10 in 1971 finished 6th for the Padres.
    Fred Norman 13-13 3.60 in 1973 finished 6th between the Padres and Reds.
    Gaylord Perry 19-19 3.38 in 1973 finished in 7th with the Indians.
    Orel Hershiser 16-16 3.06 in 1987 finished 4th with the Dodgers.
    Nolan Ryan 8-16 2.76 in 1987 finished 5th for the Astros.
    Orel Hershiser 15-15 2.31 in 1989 finished 4th with the Dodgers.
    Mike Mussina 11-15 3.79 in 2000 finished 6th with the Orioles.
    Ben Sheets 12-14 2.70 in 2004 finished 8th with the Brewers.
    Kevin Millwood 9-11 2.86 in 2005 finished 6th with the Indians.
    Roy Oswalt 13-13 2.76 in 2010 finished 6th between the Astros and Phillies.
    Tim Lincecum 13-14 2.74 in 2011 finished 6th with the Giants.
    Madison Bumgarner 13-13 3.21 in 2011 finished 11th with the Giants.

  18. 78
    BryanM says:

    John Autin @68: A definition of statistical inference is the drawing of conclusions about a population by studying a sample (usually a random sample). A true statistic (say the mean, or standard deviation, of the sample) is always an estimate or guess about the population (the mean of the sample is not the mean of the population). WAR is a detailed calculation, which is an estimate, or guess, about the impact of a player on a team’s wins. It doesn’t measure wins; it measures other things, and uses those things to guess about wins. The thread has been about how good a guess WAR is for pitchers. I think it’s OK, not perfect but... No Stat thinks it is not accurate, which, I hope without offense to him, I can say is about the same thing.

    • 79
      bstar says:

      Bryan, I think WAR for starting pitchers is pretty solid. If you look at the all-time leaders in b-ref WAR, this list really, really passes the sniff test. There aren’t a whole lot of surprises; whoever, by general opinion, you think should be at the top is at the top and it just flows down from there. It’s actually pretty hard to look at one pitcher and say, “Wow, his WAR total looks way too high/low.” It’s a pretty solid list overall.

      What about relievers? Despite a lot of people making noise that perhaps leverage index should be weighted more than it is, to me this list also passes the test. Not many relievers have passed the imaginary ~65-70 WAR Hall of Fame threshold, but that goes right in line with few relievers actually inducted into the Hall. Should more relievers be in the Hall of Fame? I think, oddly, that the BBWAA has done a pretty decent job of choosing HOF members, with maybe the exception being Bruce Sutter.

      I don’t think that WAR is really that far off as a measure of what production a pitcher is bringing to his ballclub.

      • 81
        BryanM says:

        bstar, so do I. I made the same point about the career list that you do, back in @67 (thread starts @5). When the sample size gets smaller, like a season, the estimate gets less accurate, but in my opinion, still pretty good. The debate here is about how good. In 1996 Clemens went 10-13 for the Sawks with a 7.7 WAR and got no Cy votes; a young Andy Pettitte went 21-8 with a 5.7 WAR and finished 2nd in the Cy voting. Now was Clemens better than Pettitte in 1996, or not?
        Without confidence intervals, we can’t know; my hunch is probably yes, but WAR is clearly not accurate enough that a 2 WAR difference is conclusive. Pat Hentgen was likely fully deserving of the CY that year; Roger was unlucky but good; Andy was lucky and good.

    • 80
      Neil L. says:

      Trying to distil your thoughts down about WAR as a reliable statistic for pitcher quality, Bryan, while not being perfect, do you think it is the best we’ve got?

      • 82
        BryanM says:

        Neil, I think it’s close for starters; perhaps RA+ would be a little better (add back the unearned runs and recalculate ERA+). ERA is biased in favor of weaker pitchers. If the Phillies give the other team 4 outs with Doc on the mound, it may not matter; if a bad pitcher allows 3 unearned after an error, his ERA looks fine, but he’s still pitched badly and the team still loses.
        For relievers, I think WHIP (include HBP?) is better because of the whole inherited runner issue. What are your thoughts?
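
        To make the "add back the unearned runs" step concrete, here is a small sketch using Clemens's 1996 line from the table in the post (242.2 IP, 106 R, 98 ER). It only computes the raw per-nine rates; a true RA+ would also need league and park adjustments, which are not attempted here. The helper names are made up for illustration.

```python
def innings_to_float(ip_text):
    """Convert box-score innings like "242.2" (242 and 2/3) to a float."""
    whole, _, thirds = ip_text.partition(".")
    return int(whole) + int(thirds or 0) / 3


def per_nine(runs, innings):
    """Runs allowed per nine innings."""
    return 9 * runs / innings


ip = innings_to_float("242.2")

era = per_nine(98, ip)    # earned runs only
ra9 = per_nine(106, ip)   # all runs, unearned included

# How much the unearned runs add per nine innings.
unearned_gap = ra9 - era
```

        For this line the unearned runs add roughly three tenths of a run per nine innings, which is the part of a pitcher's record that ERA leaves out.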

        • 83
          Neil L. says:

          At risk of sounding like an HHS “recruiter”, will you post more frequently here or are you too involved at other baseball sites?

        • 84
          Neil L. says:

          Don’t want to give you a superficial reply, Bryan, so I won’t respond in detail until tomorrow. I’m finding my own thinking is being clarified by your posts.

          I’m working on a little mini-study involving teams blowing four saves in their first eight games, trying to find how rare/common it is.

          I hope to put the results up in a few minutes, perhaps on Doug’s blog. (I realize it will be a hijack, but there is nowhere else to send it.)

          Thanks.

          • 86
            John Autin says:

            Neil, remember that not all blown saves happen in the late innings. Every reliever who enters with a lead* is technically working on a save opportunity, even if he’s a LOOGY working in the 6th inning.
            __________
            *And when a prior pitcher is eligible for the win.

          • 87
            John Autin says:

            To clarify … I sense that you are seeking a cohort for Toronto’s early-season experience of blowing 4 late leads. My point is that searching for “blown save” is not necessarily going to pick out the games you’re after. If you want to catch late-inning blown saves, you might consider adding a WPA requirement — say, minus-0.25 or less.

          • 88
            Neil L. says:

            I’m just using the official version of the stat, JA. Of course, what you say is true.

    • 85
      John Autin says:

      Bryan, thanks for the capsule summary.

      I can tell you know your way around the subject and the language (probably better than I do), so you don’t need me to tell you that using the word “statistic” with no modifier in a sentence like “every statistic is an estimate” is liable to create confusion among an audience to whom “statistics” include simple counts of events like home runs and wins.

      So, um, I won’t say that. :)

      • 89
        BryanM says:

        John, thanks for that. What our exchange has shown me is that I need to be more careful in my terminology. When we say that Albert Pujols drove in 99 runs last year, calling that a “statistic” is just fine, and it’s clearly not an estimate (but what about Hack Wilson’s 190 or 191?). Emphasizing “guess” in my posts is a way of protesting against the sort of argument that goes “A had more blue jellybeans than B last year, had almost twice as many red marbles, and was only picked off once, so clearly A had a better year than B”. When we use a “stat” to talk about itself, it’s a measure and no estimate, but when we use a “stat” to infer something about something else, then it becomes a true statistic, and estimation, uncertainty, and, dare I say it, ignorance begin to creep in. To return to the thread: we have RC 1996 W/L 10-13 (a “stat”) and RC 1996 WAR (another “stat”). The first stat supports the conclusion that RC was mediocre in 1996 (an estimate, using W/L as a statistic), the second that he was very, very good that year (another estimate). I would love to find a term for what we do when we use a “stat” to draw conclusions about something else. The dictionary term would be “statistical inference”, which is ponderous and confusing; I’ve been using “guessing” as a sort of mild protest against the folks who think they are “proving” something when they quote stats to buttress an argument. Post is getting too long, so I’ll break it here.

      • 90
        BryanM says:

        Part deux... a long digression. I remember precisely when the light went on for me about counting one thing when you are really interested in something related, but different. Fall of 1970, the Orioles were on a tour of Japan after the season, and with the no-split-screen TV technology of the day, they were interviewing Earl Weaver in the dugout with the game in progress. Earl had his back to the field when Frank Robinson’s bat made an unmistakable sound. Weaver, who was in mid sentence, interrupted himself to say “there’s two” in a calm voice without looking at the field, and I realized that
        1) I was thinking “home run”, an “event”. Weaver’s first thought was “2”, an “outcome”
        2) I knew a lot less about baseball than I thought I did

  19. 93
    Shping says:

    Great allegory BryanM.

    And i thought you were simply going to tell the story about Weaver, sitting in his underwear in his office and smoking a stogie during the national anthem, telling a reporter, “Don’t worry kid. We do this every day”!

    That aside, it seems like Weaver was definitely ahead of his time in a lot of what we call sabermetric thinking. (“The heck with wasting an out on a sacrifice; i’ll take my chances with the lefty hitting a 3-run homer” is a deceptively simple yet advanced point of view.)

  20. 94
    Andy says:

    Graham has come out on HHS on fire, generating tons of comments on his weekly posts…wow!

    • 95
      Neil L. says:

      I’ve learned a lot about WAR, Andy, from Graham’s post and the intelligent debate in the comments.

      A lot of things I felt hesitant to ask about WAR for pitchers.

      Kudos, Graham!! ~thumbs up, high five~

    • 96
      BryanM says:

      Graham has hit a rich vein of ore with a simple proposition: find one stat that “says” not very good, correlate it with one that says “real good”, add an emotion-soaked adjective like “unlucky” and stand back. This thread is/was great and I have learned a lot from Graham and the other posters.

      Much thanks, Graham
