Ross Ohlendorf and another example of why W & L are bad statistics

Ross Ohlendorf / Icon SMI

The Red Sox have signed Ross Ohlendorf, who pitched for the Pirates over the last few years.

Numerous reports about the signing have cited the fact that Ohlendorf has a 2-14 record over the last 2 seasons. That’s true, but what does it really mean?

First, take a look at 2009. Ohlendorf started 29 games, had a 106 ERA+ in 172.2 innings, and went 11-10. That seems fine: he was a little better than average and had a slightly above-average winning percentage. The only thing that sticks out as unusual is that for a guy who averaged just under 6 innings per start, he got a decision fairly often: in 21 of his 29 starts.
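Innings pitched are written in baseball's thirds notation, so "172.2" means 172 and two-thirds innings, not 172.2. Below is a minimal sketch of the arithmetic behind "just under 6 innings per start" and the decision rate. It assumes the standard .1/.2 notation, and the helper name `ip_to_innings` is hypothetical.

```python
from fractions import Fraction


def ip_to_innings(ip: str) -> Fraction:
    """Convert innings-pitched notation (e.g. "172.2") to exact innings.

    The digit after the point counts outs (0, 1 or 2), not tenths of an inning.
    """
    whole, _, outs = ip.partition(".")
    outs_int = int(outs) if outs else 0
    if outs_int not in (0, 1, 2):
        raise ValueError(f"invalid outs digit in {ip!r}")
    return int(whole) + Fraction(outs_int, 3)


# Ohlendorf's 2009 line from the post: 172.2 IP over 29 starts, 11-10 record.
innings = ip_to_innings("172.2")
starts = 29
decisions = 11 + 10

ip_per_start = float(innings / starts)
decision_rate = decisions / starts

print(f"{ip_per_start:.2f} innings per start")               # 5.95
print(f"{decision_rate:.0%} of starts ended in a decision")  # 72%
```

Reading "172.2" as a decimal would slightly overstate the workload. Using `Fraction` keeps the thirds exact until the final division.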

In 2010, his numbers were similar. His WHIP went up a little and his ERA+ went down to 99. Still, he was about league average in 21 starts and 108.1 innings. Somehow he earned a 1-11 record. That’s not right. His neutralized pitching stats (see here, at the bottom of the page) say he deserved a 5-6 record that year.

In fact, looking at his 2010 game log, he had only 4 games where he allowed more than 4 runs. He had 4 losses in games where he allowed 2 runs or fewer. He had 6 other starts where he allowed 2 runs or fewer and got a no-decision (not counting a 7/28 start where he recorded only 2 outs).

Granted, in 2011, he was awful, but that was in just 38.2 innings. Over those innings he allowed a whopping 60 hits plus 15 walks and 6 hit batters. That’s a problem. But if he was just injured in 2011 and can return to 2010 form, he would make a fine 4th or 5th starter for any team.


Comments


  1. Right, 2-14 over ’10-’11 shouldn’t be over-emphasized. His 1.53 WHIP and 78 ERA+ over the same period do smell a bit funny, however.

  2. I like seeing how you guys think up posts.
    (Mostly) figured out Autin’s quiz because the recent topic had been about complete game losses.
    Now I’m betting you’re on Ohlendorf because his #1 similarity is Bob Sebra.

  3. Won-loss record is certainly not a great way to judge a pitcher, but neither is ERA, or sometimes even ERA+. His FIP was 4.72 in 2009 and 4.44 in 2010, both substantially higher than his ERAs those years. For his career, the numbers normalize out to an ERA of 4.77 vs. a FIP of 4.85. That kind of says who he is as a pitcher. Now add in the move to the AL, specifically the AL East, and the fact that he’ll be taking a low 38% groundball rate to Fenway Park, and O-M-G…

    They say he’s very smart. Maybe he just won’t show up!

  4. Glad that you posted this, Andy, ’cause it’s something I’ve been thinking about recently. We take for granted that pitchers get assigned wins and losses. I have no idea what the history is behind doing so, but it’s really a strange idea. Teams win or lose games, not individual players. Heck, why not assign wins and losses to the shortstop? Or to the catcher? As far as I know, this really isn’t done in other team sports, certainly not football or basketball. And of course, the definition of a win or loss is arbitrary as well. Why does a starter have to go 5 innings to get a win? Why not 4? Or 6? Or 7?

    Okay…getting off my soapbox….

    • Ed,

      Good points; I believe that in the distant past, wins and losses were often awarded to pitchers differently than they are now, so there are always going to be disparities between old W-L listings and current W-L listings.

      As for assigning Wins and Losses to individuals in other team sports, this _is_ in fact done, and frequently cited, for quarterbacks in football. You didn’t mention hockey, but goalies are also judged to a degree by their W-L record.

    • Ed, if I had to guess (which, of course, means I’m now going to do just that), it had to do with how starting pitchers were used at the game’s beginning.

      So Tommy Bond, one of the top pitchers of the 1870s, started 59 of his team’s games in 1878, completing fifty-seven of them. Old Hoss Radbourn started and completed seventy-three games, or approximately 75% of the Grays’ games, on his way to a 59-12 season in 1884. So the W-L records of starters in the early days were much more closely aligned with their teams’ records. A bad day by your team’s starter, and your team was going to lose. A bad day by your right fielder, and your team still had a good chance to win if your pitcher was doing his job. The pitcher was the most important man on the field when it came to your team’s success or failure, so awarding wins and losses to pitchers made more sense in those days than it does today.

      As for other sports, there are others who do assign W-L records to individuals. In hockey, as one example, Roberto Luongo of Vancouver had a 38-15 W-L record, with a 2.11 GAA (think ERA) last year. The goalie does serve a similar role in hockey as the pitcher does in baseball. He’s part of the defensive unit. In fact, goalies today share a similarity with old-time MLB pitchers, starting a very high percentage of their teams’ games, and pretty much completing the games they start.

      This is not a defense of pitchers’ wins, but just a guess as to why it came about in MLB.

      • A few responses to Lawrence and MikeD:

        1) I’m not a hockey fan so I really have no insight into wins/losses for hockey goalies.

        2) MikeD – Your theory about the origin of wins and losses for pitchers makes sense.

        3) Lawrence – I have to strongly disagree that wins/losses are used for quarterbacks. I’ve been following football since the late 70s and I never see them in box scores, on TV, in the newspaper, in HOF discussions, etc. Sure you can find them on PF reference but as far as I know it’s not an official stat. They just report wins and losses for games started so if a guy leaves with a 20 or 30 point deficit and the backup quarterback rallies the team, the starter still gets a “win”. Anyway, quarterback wins/losses just aren’t used anywhere remotely the way they are for baseball pitchers.

        • I remember it being a BIG deal when John Elway became the all-time leader in QB wins, and a similarly big deal (in Wisconsin, at least) when Brett Favre took the top spot from him. I think they’re actually used pretty similarly. You hear about certain QBs being “winners.” For example, Philip Rivers, who puts up great numbers every year, gets labeled a “system” guy and a “choker” and all sorts of other things because the Chargers have failed in the playoffs so many times in the last 5 years, even though they’ve been consistently among the best regular season teams. I think it’s used pretty similarly to the way the Jack Morris crowd uses pitcher wins. Unless, of course, you’re talking about how it’s used in HOF arguments, in which case you’re right: it’s definitely not as prevalent in those discussions.

          • Again I’ll have to disagree. I was just thinking to myself this morning: “Who’s the all-time leader among quarterbacks in wins?” Answer: “I have no idea.” I remember zero publicity about Elway or Favre breaking that record. I’ll bet most baseball fans know who has the most career wins, and very few football fans know the equivalent for quarterbacks. Likewise, there’s always a lot of publicity when a pitcher reaches a milestone (e.g., 300 wins) and zero for quarterbacks who do the same. There are so many other factors used to evaluate quarterbacks (completion percentage, yards, TDs, interceptions, QB rating, etc.) that wins just don’t get discussed. Sure, it might get discussed in a very general way (e.g., your Philip Rivers example), but not in the specific way it’s used for pitchers. The one place I do see it discussed is how a quarterback does in the playoffs, particularly the Super Bowl.

        • I am a big hockey fan – and a goalie, in the bargain – and I can tell you that wins are not really a good barometer for keepers, even though they routinely play the entirety of the game and often start two-thirds of the schedule. A superior goalie on a poor team can often have a weak W/L record, and even a relatively-poor GAA. Just to take one hypothetical:

          Goalie A – .920 save% (think WHIP)
          Goalie B – .905 save%

          The first goalie is comfortably above average (which is about .911 nowadays, give or take), the second mediocre. And yet the difference between their performances is maybe 25 goals over a whole season, given a typical starter’s workload (figuring 60 starts and equal shots faced, in this case 27.75 per game). Which brings us to a large wrinkle: team defense quality varies a lot, which often affects how many shots a goalie faces. If goalie A faces 32 shots per game, and goalie B only 22, then A’s GAA will be roughly 2.56, and B’s only 2.09. Such swings are not that atypical: in 2002, Roberto Luongo (then with Florida) faced 1653 shots, while Martin Brodeur faced 1655 in New Jersey… in 15 more games. Their GAA reflects this: 2.77 for Luongo and 2.15 for Brodeur.

          On top of that, since there’s a finite clock, any time spent facing shots is time not spent scoring goals, unlike in baseball, where everybody gets the same 3 outs per inning. In effect, Luongo’s teams were making tons of errors and running into tons of outs. End result: Luongo went 16-33-4, and Brodeur went 38-26-9.
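The goalie arithmetic in this comment is easy to check directly. Goals allowed equal shots faced times (1 − save%), and for a goalie who plays full games, GAA is simply that figure per game. The sketch below is minimal and uses the comment's hypothetical goalies. The helper `goals_allowed` is an illustrative name, and it assumes each goalie plays every minute of his starts.

```python
def goals_allowed(save_pct: float, shots: float) -> float:
    """Goals that get past a goalie: every shot that isn't saved."""
    return shots * (1.0 - save_pct)


STARTS = 60
SHOTS_PER_GAME = 27.75

# Equal workload: 60 starts x 27.75 shots = 1665 shots for each goalie.
season_shots = STARTS * SHOTS_PER_GAME
gap = goals_allowed(0.905, season_shots) - goals_allowed(0.920, season_shots)
print(f"Gap over a season with equal shots: {gap:.1f} goals")  # 25.0

# Unequal workload: goals allowed per full game is the GAA.
gaa_a = goals_allowed(0.920, 32)  # better goalie behind a leakier defense
gaa_b = goals_allowed(0.905, 22)  # weaker goalie behind a tighter defense
print(f"Goalie A GAA: {gaa_a:.2f}, Goalie B GAA: {gaa_b:.2f}")  # 2.56, 2.09
```

The better goalie ends up with the worse GAA purely because of shot volume. This is the same trap the post describes for ERA-blind W-L records.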

  5. Continuing to use a Win-Loss record for pitchers makes about as much sense as judging position players solely on how many outs they make.

    • I feel like this argument is an overreaction. W-L record is like any other stat for pitchers: on its own it’s close to worthless, and it needs to be taken in the context of at least a half dozen other stats. The main problem with W-L record is that the media (or at least some of the media) and unknowledgeable fans treat it as the end-all, be-all. It’s a stat that’s easily influenced by other people and situations. But let’s not throw it out. Nobody would consider throwing out RBIs, even though that’s a stat that’s overemphasized by the media and unknowledgeable fans. We (as people who follow an in-depth stats blog) know it’s not the end-all, be-all, but it does mean something. How much or how little is up for debate, but saying that it literally means nothing at all seems like an overreaction to the media types who say it means everything.

      • Certainly, many stats need to be viewed in the context of others. For instance, if you tell me a pitcher had a 115 ERA+, I want to know if he was a starter or reliever, and how many innings.

        But at least those bits of information complement each other — knowing both gives you a much better understanding of the pitcher than if you had just one.

        However, if I have the rest of the pitching stats, adding the W-L record really doesn’t tell me anything new.

        W-L record does have some value as a stand-alone stat — not nearly as much as WAR, but enough that I wouldn’t campaign to eliminate it. But I could easily get along without it.

        • I personally don’t think W-L is horrible. It’s not my favorite way to judge a pitcher, and I’d be careful using it in an argument, but I don’t feel it’s complete garbage. There are always going to be outliers, just like there are always going to be guys who hit better at Colorado and post a higher BA, more HR, etc., and guys in San Diego who don’t hit a ton of homers but actually are power guys.

          I see it as a decent way to see how well a pitcher’s teams do in games he pitches, not necessarily how well the pitcher does. Ohlendorf may have pitched well enough to keep a normal team in the game most of the time, but frankly he didn’t play for a normal team, so it’s the best we’ve got for him that year. We aren’t debating every time a player loses a home run to the Green Monster, or gains one by hitting it down the line at Yankee Stadium. You feel sorry for a guy like Ohlendorf, who should not have gotten a 1-11 record that year, but that’s the outlier. I feel like the vast majority of the time, W-L records aren’t that crazy far off. That’s just me though…

          Is it a perfect way? No, far from it. Can it give you a vague insight into a pitcher and his team that season? I think so. But that’s why I love baseball and baseball stats: somebody else might think other stats are more important (like K’s or whatever), and nobody’s actually right.

      • Thomas,

        On a seasonal level, W-L records can be quite deceiving at times (witness Nolan Ryan going 8-16 in 1987 while winning the ERA title), but on a career level they usually reflect a pitcher’s performance reasonably well. Of course, the particular team any pitcher performs for is going to influence his W-L record, but over an entire career that will even out somewhat. YMMV.

        • I couldn’t agree more, but I think for every 8-16 season like Nolan Ryan’s, there are a dozen or so 8-16 seasons that were completely deserved.

          I just think the Nolan Ryan-type seasons are outliers, the random weirdness that’s bound to happen over 100+ seasons. More often than not, the W-L record is completely (or very closely) deserved…

          But don’t get this part of the argument wrong… I’m not saying it’s a great stat… just saying it’s not worthless…

        • “on a career level they usually reflect a pitcher’s performance reasonably well.”

          Lawrence, I agree that they usually do. But is “usually” and “on a career level” good enough for something that is still the most frequently cited pitching stat in mainstream baseball discussions?

          I looked at starting pitchers with at least 100 decisions since 1893:

          — ERA+ of at least 110: 21 of 235 had losing records. That’s 9%.

          — ERA+ of 90 or less: 9 of 66 had records of .500 or better. That’s 14%.

          — ERA+ between 97 and 103: 19% had W% of at least .550, while 11% were at .450 or lower. The range of W% ran from .631 (Lefty Williams, 82-48, 99 ERA+) down to .336 (Rollie Naylor, 42-83, 102 ERA+).

          In a long career of 400 decisions, a .550 W% is a record of 220-180, while a .450 W% yields 180-220. That’s a pretty big difference. And about 30% of the pitchers in this little study fell into those gaps.

          Now let’s look at the medians for the lucky and unlucky subsets of that 97-103 ERA+ group:

          The 39 lucky ones (W% .550+) had a median W% of .567, and a median ERA+ of 100.

          The 22 unlucky ones (W% .450 or less) had a median W% of .428, and a median ERA+ of (you guessed it) 100.

          In 400 decisions, a .567 W% yields a record of 227-173. A .428 W% yields a record of 171-229.

          That’s a very big difference in W-L record, for groups that were essentially the same in effectiveness.
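The percentages and 400-decision records in this comment come from straightforward arithmetic. The minimal sketch below reproduces them from the counts and W% values quoted above. The helper name `record_over` is hypothetical, and it rounds wins to the nearest whole game.

```python
def record_over(decisions: int, win_pct: float) -> tuple[int, int]:
    """Translate a winning percentage into a W-L record over a fixed number of decisions."""
    wins = round(decisions * win_pct)
    return wins, decisions - wins


# Share of each ERA+ bucket with a "contrary" record, from the counts in the comment.
print(f"ERA+ 110+ with losing records: {21 / 235:.0%}")  # 9%
print(f"ERA+ 90 or less at .500+:      {9 / 66:.0%}")    # 14%

# The W% thresholds and the lucky/unlucky medians, spread over 400 decisions.
for label, pct in [(".550 threshold", 0.550), (".450 threshold", 0.450),
                   ("lucky median", 0.567), ("unlucky median", 0.428)]:
    wins, losses = record_over(400, pct)
    print(f"{label}: {pct:.3f} -> {wins}-{losses}")
# .550 -> 220-180, .450 -> 180-220, .567 -> 227-173, .428 -> 171-229
```

The two median records are 56 wins apart, even though both groups had a median ERA+ of 100.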

          • John, your data points to the unreliability of W-L as a reflection of isolated pitcher performance quality, and your analysis is fun to read (as always). But reading these exchanges makes me wonder how many people would find the conversation surprising. Long before SABR, it was widely recognized that ERA was a far better measure of isolated performance and that W-L measured something else: that there were lucky and hard-luck pitchers. But W-L is embedded in the game’s history and has an immediate appeal that attracts new fans and remains interesting to everyone, even if the main pleasure it provides for some becomes finding the ironies of W-L/ERA(+) disparities. I remember poring over the long lists of ERA-ordered pitching leaders in Sunday papers many years ago, picking out the outliers with high/low ERAs and contrary W-L records, and rooting for the anomalies to get deeper because they made good stories, like Ohlendorf: you’d root for justice to be served next year. I remember feeling that Dick Donovan was proof that the world had moral meaning; you can’t get that kind of statistical drama without W-L. It’s a rare stat that can operate on two levels like that. Of course, its interest depends on people continuing to point out its deficiencies . . .

  6. The last four guys before Ohlendorf to go 2-14 over a two-season period:

    Jason Jennings (2007-2008) -1.9 WAR
    Heathcliff Slocumb (1997-1998) -0.2 WAR
    Rod Nichols (1990-1991) 2.0 WAR
    Juan Berenguer (1980-1981) -1.5 WAR

  7. Four pitchers have gone 2-14 in a single season:
    Darold Knowles (1970) 2.9 WAR
    Anthony Young (1992) -0.4 WAR (part of Young’s astounding 27 decisions in a row taking the loss over 1992-1993)
    Anthony Reyes (2007) -1.0 WAR
    Jim Brown (1884) -4.5 WAR

  8. I do think it’s fair to call Ohlendorf’s 2010 record an extreme outlier (1-11, 100 ERA+, 1.9 WAR).

    I looked at all SP seasons since 1893 with no more than 3 wins and at least 10 losses. Ohlendorf’s 1.9 WAR was tied for 3rd-best out of 143 seasons. Only 6% had as much as 1.0 WAR, and just 17% had positive WAR.

    But setting the W% threshold at a still-awful .333 and under (Nolan Ryan ’87), with a minimum of 12 decisions and an ERA+ of at least 100, we get 130 such seasons, about 1 per season. It’s not so rare to pitch as well as or better than average and still get hung with an awful record.
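The second filter in comment #8 can be written as a simple predicate. One detail matters here: an 8-16 record is exactly 1/3 and is displayed as .333, so a naive float test like `w_pct <= 0.333` would wrongly exclude Ryan's 1987 season. The minimal sketch below compares integers instead. The `Season` type and `hung_with_bad_record` are hypothetical names. The sample rows are 2011 lines copied from comment #9 and serve only to exercise the filter. Counting the 130 seasons would need full season data, which isn't reproduced here.

```python
from typing import NamedTuple


class Season(NamedTuple):
    pitcher: str
    wins: int
    losses: int
    era_plus: int


def hung_with_bad_record(s: Season) -> bool:
    """W% of .333 or lower, at least 12 decisions, ERA+ of at least 100."""
    decisions = s.wins + s.losses
    return (
        decisions >= 12
        and 3 * s.wins <= decisions  # exact W% <= 1/3, so 8-16 (.333) qualifies
        and s.era_plus >= 100
    )


# Sample rows from comment #9's 2011 list, used only to exercise the filter.
seasons = [
    Season("Paul Maholm", 6, 14, 105),
    Season("Brett Cecil", 4, 11, 90),
    Season("Jason Hammel", 7, 13, 94),
    Season("Mat Latos", 9, 14, 102),
]

matches = [s.pitcher for s in seasons if hung_with_bad_record(s)]
print(matches)  # ['Paul Maholm']
```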

  9. I looked at the 120 SPs in 2011 with at least 15 decisions. On first glance, here are the biggest discrepancies between W-L record and ERA+ (in no particular order):

    Lucky Stiffs:
    John Lackey, 12-12, 66 (6.41 ERA)
    Brad Penny, 11-11, 77
    Jake Westbrook, 12-9, 78
    Kevin Correia, 12-11, 80
    Carlos Zambrano, 9-7, 81
    Jake Arrieta, 10-8, 82
    Dillon Gee, 13-6, 84
    Rick Porcello, 14-9, 86
    Max Scherzer, 15-9, 92
    Josh Tomlin, 12-7, 93
    Aaron Harang, 14-7, 98
    Zack Greinke, 16-6, 102
    Jaime Garcia, 13-7, 102
    Kyle Lohse, 14-8, 107
    Yovani Gallardo, 17-10, 111
    Derek Holland, 16-5, 113
    Ivan Nova, 16-4, 119
    Ian Kennedy, 21-4, 137

    Black Clouds:
    Doug Fister, 11-13, 139
    Tim Lincecum, 13-14, 130
    Jhoulys Chacin, 11-14, 124
    Brandon McCarthy, 9-9, 122
    Hiroki Kuroda, 13-16, 121
    Guillermo Moscoso, 8-10, 120
    Matt Garza, 10-10, 118
    R.A. Dickey, 8-13, 113
    Felix Hernandez, 14-14, 111
    Madison Bumgarner, 13-13, 111
    Bartolo Colon, 8-10, 111
    Paul Maholm, 6-14, 105
    John Lannan, 10-13, 104
    Mat Latos, 9-14, 102
    John Danks, 8-12, 97
    Jeremy Guthrie, 9-17, 95
    Jason Hammel, 7-13, 94
    Brett Cecil, 4-11, 90

    That’s 36 of 120 pitchers whose W-L record is pretty misaligned with their ERA+. Even if you think a couple of guys don’t belong on this list, it’s still at least 1/4 of regular starters. For all these guys, their W-L record is nothing but misinformation.

    • Two things:

      One, what was your methodology here? I can imagine one, but I’m not sure if I’m on the right track or not.

      Second, it’s really funny to see Zack Greinke on that first list. Talk about a guy with a funny season. W-L actually reflects pretty accurately how well he pitched last season: his xFIP was the best in MLB (1.65, I believe). It’s just that his HR/FB rate was way out of whack. But his team somehow “corrected” for it by scoring “too many” runs when he pitched, so his record actually shows pretty well how good he was, with the insane K rate (10.5). Weird, weird season.

      • Doc — I used no mathematical model; I just went by feel.

        About Greinke and xFIP, I wanted to stick with actual runs allowed, rather than what they “should” have allowed. I have no particular kick against xFIP, but for these purposes I didn’t want to carry the idea of “luck” to what some may consider an abstract level.

        BTW, Greinke’s ERA in his wins was 2.55. That’s very high for a guy with 15+ wins. Out of 20 pitchers with 15+ wins last year, only 2 had a higher ERA in their wins, and all the rest were at least half a run lower than Greinke:

        ERA in Wins
        Clayton Kershaw, 0.79
        James Shields, 0.80
        Cliff Lee, 1.06
        Gio Gonzalez, 1.37
        Yovani Gallardo, 1.45
        Jered Weaver, 1.51
        Ricky Romero, 1.59
        Roy Halladay, 1.60
        Justin Verlander, 1.67
        Tim Hudson, 1.73
        C.J. Wilson, 1.81
        Dan Haren, 1.82
        Derek Holland, 1.95
        Ian Kennedy, 1.94
        Jon Lester, 1.98
        CC Sabathia, 2.00
        Daniel Hudson, 2.03
        Zack Greinke, 2.55
        Ivan Nova, 2.73
        Max Scherzer, 2.81
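The claim that only 2 of these 20 pitchers had a higher ERA in their wins than Greinke, and that the rest were at least half a run lower, can be checked mechanically against the list above. Below is a minimal sketch. The dictionary simply copies the list, and the variable names are illustrative.

```python
# ERA in wins for the 20 pitchers with 15+ wins, copied from the list above.
era_in_wins = {
    "Clayton Kershaw": 0.79, "James Shields": 0.80, "Cliff Lee": 1.06,
    "Gio Gonzalez": 1.37, "Yovani Gallardo": 1.45, "Jered Weaver": 1.51,
    "Ricky Romero": 1.59, "Roy Halladay": 1.60, "Justin Verlander": 1.67,
    "Tim Hudson": 1.73, "C.J. Wilson": 1.81, "Dan Haren": 1.82,
    "Derek Holland": 1.95, "Ian Kennedy": 1.94, "Jon Lester": 1.98,
    "CC Sabathia": 2.00, "Daniel Hudson": 2.03, "Zack Greinke": 2.55,
    "Ivan Nova": 2.73, "Max Scherzer": 2.81,
}

greinke = era_in_wins["Zack Greinke"]
higher = sorted(name for name, era in era_in_wins.items() if era > greinke)
lower = {name: era for name, era in era_in_wins.items() if era < greinke}
min_gap = min(greinke - era for era in lower.values())

print(higher)              # ['Ivan Nova', 'Max Scherzer']
print(f"{min_gap:.2f}")    # 0.52 (Daniel Hudson is the closest)
```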

        • But it does tend to wash out over a career.

          Since 1901, how many pitchers (min. 200 decisions) have an ERA+ of 110 or more and a W-L% of .450 or less? Answer: one.

          Ken Raffensberger, 1939-1954, 110 ERA+, 119-154 W-L

          Okay, so we’ll lower it to an ERA+ of 105 and a W-L% of .450 or less. That will pick up a bunch more, right? Nope. It picked up one more guy.

          Johnny Schmitz, 1941-1956, 108 ERA+, 93-114 W-L

          Over a career, it doesn’t seem very likely that incongruities between ERA and W-L will persist.

          • See my comment #19. Incongruities between ERA and W-L do tend to wash out over a career, but there are still a significant number that do not.

          • There is a similar low number of outliers in the other direction.

            For pitchers with at least 200 decisions and a W-L% of at least .550, there are only 3 with an ERA+ under 95.

            Ross Grimsley, 92 ERA+, 124-99
            Russ Ortiz, 94 ERA+, 113-89
            Jack Billingham, 94 ERA+, 145-113

            I don’t see a significant number of pitchers whose careers show incongruent ERA and W-L%.
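The two career searches in this exchange are mirror images: a good ERA+ paired with a bad record, and a weak ERA+ paired with a good record. The minimal sketch below expresses both as predicates over the career lines quoted in the thread. It uses the 105 ERA+ cutoff from the second, looser search. `Career`, `hard_luck`, and `good_luck` are hypothetical names. The rows are only the five pitchers named above, not a full database, so the point here is the filter definitions. The "only one" and "only 3" counts come from the commenters' own searches.

```python
from typing import NamedTuple


class Career(NamedTuple):
    pitcher: str
    wins: int
    losses: int
    era_plus: int

    @property
    def decisions(self) -> int:
        return self.wins + self.losses

    @property
    def win_pct(self) -> float:
        return self.wins / self.decisions


def hard_luck(c: Career) -> bool:
    """Good ERA+, bad record: 200+ decisions, ERA+ of 105+, W% of .450 or less."""
    return c.decisions >= 200 and c.era_plus >= 105 and c.win_pct <= 0.450


def good_luck(c: Career) -> bool:
    """Weak ERA+, good record: 200+ decisions, ERA+ under 95, W% of .550 or more."""
    return c.decisions >= 200 and c.era_plus < 95 and c.win_pct >= 0.550


# Career lines quoted in this thread.
careers = [
    Career("Ken Raffensberger", 119, 154, 110),
    Career("Johnny Schmitz", 93, 114, 108),
    Career("Ross Grimsley", 124, 99, 92),
    Career("Russ Ortiz", 113, 89, 94),
    Career("Jack Billingham", 145, 113, 94),
]

print([c.pitcher for c in careers if hard_luck(c)])
print([c.pitcher for c in careers if good_luck(c)])
```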

  10. As expected, it is clear that over a career, W-L is a pretty terrible predictor of overall pitching quality. And it is likely a bit worse when you consider that teams with poor run support may also have defenses that make ERA look worse than it would under neutral conditions. But any stat that lands somewhere between very close and merely in the ballpark at most about 3/4 of the time is absurdly sloppy and inaccurate for rating individuals.

    ERA+ is exponentially closer, though even that can be tweaked for a more granular look, and at least occasionally it is not so close to “right” in reflecting pitcher performance.
