Building a Better Pitcher Wins Metric

Dr. Doom provides us with another new metric to measure wins contributed by pitchers. Not wins above replacement, just wins, plain and simple. More after the jump.

Greetings, everyone!

I’m glad you all enjoyed my last series on what I called a new version of Pitcher WAR, but was really just a way of re-framing individualized W-L records. Well, here I am today with yet another new approach.

You probably know by now from my many posts and comments that I adore messing around with baseball numbers and learning more from them. The individualized W-L records I mentioned before are something I’ve been doing for years. At least half a decade, I would guess. But this post is about something I’ve only been horsing around with for a month or so, so enjoy!

Yet again, we’re going to use a pitcher’s ERA+ to figure out a lot about him. This time, though, we’re going to use his actual ERA, as well. When you see a pitcher’s basic stats on Baseball-Reference, you see his ERA and his ERA+. Those are two separate pieces of information. However, they also tell you a third piece of information: a pitcher’s expected ERA. This is obvious. If Bret Saberhagen had a 180 ERA+ in 1989, that tells us that the expected ERA for a league-average pitcher given Bret Saberhagen’s parks would be 80% higher than Saberhagen’s actual ERA of 2.16. In other words, an average pitcher would’ve had an ERA of 1.8 times 2.16, which is 3.89.

Now, we get to even easier math territory. The gap between that expected ERA and his actual ERA is 3.89 − 2.16 = 1.73, so every nine innings, Saberhagen saved his team 1.73 runs, or .192/inning.

We also know how many innings Saberhagen pitched in 1989: 262⅓. If we multiply the rate of run-saving (.192/inning) by the number of innings (262⅓), we can say that Saberhagen saved a total of 50.368 runs in the course of the year.

OK, that’s all well and good. But we can go another step. And this is where things get interesting, in my opinion. We know that Runs are a bad currency. They’re a bad currency because a run in the 1930 Baker Bowl and a run in 1966 Dodger Stadium are not worth the same thing. That’s why SO many stats use “Wins” instead of “Runs.” Of course, one could convert to wins, then convert back to a run number that would please people more. I’ve often thought that would be a good idea. Alas, no one’s really doing that, and it’s easy enough to do if you really care that much. But for my purposes today, we’ll just use Wins as our currency.

We will recall that we expected an ERA of 3.89. That means that we can take the total number of Runs Saberhagen saved (50.368) and divide by the Expected ERA (3.89) to get a number of Wins: 12.95. Let’s do a little bit of rounding and call it 13 wins. Bret Saberhagen was worth 13 wins in 1989.
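The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the steps as described in the post; the function names (`expected_era`, `runs_saved`, `wins`) are made up for the example, and the inputs are the Saberhagen figures quoted above.

```python
def expected_era(era: float, era_plus: float) -> float:
    """ERA a league-average pitcher would be expected to post in the same parks."""
    return era * era_plus / 100.0


def runs_saved(era: float, era_plus: float, innings: float) -> float:
    """Earned runs saved relative to that average pitcher over the given innings."""
    per_inning = (expected_era(era, era_plus) - era) / 9.0
    return per_inning * innings


def wins(era: float, era_plus: float, innings: float) -> float:
    """Runs saved divided by the expected ERA: the 'wins' currency of this post."""
    return runs_saved(era, era_plus, innings) / expected_era(era, era_plus)


# Bret Saberhagen, 1989: 2.16 ERA, 180 ERA+, 262 1/3 innings (figures from the post).
saberhagen_ip = 262 + 1 / 3
saberhagen_runs = runs_saved(2.16, 180, saberhagen_ip)
saberhagen_wins = wins(2.16, 180, saberhagen_ip)
print(round(saberhagen_runs, 3), round(saberhagen_wins, 2))  # 50.368 12.95
```

One thing the sketch makes visible: when the steps are chained, the actual ERA cancels out, so the result is algebraically the same as (IP / 9) × (1 − 100 / ERA+). The intermediate run totals still depend on ERA, which is why the step-by-step version is kept here.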

Right now (8/23), Chris Sale is cruising to the AL lead in ERA+. His is 220, with Trevor Bauer at 199 and Blake Snell at 196. How do we value these? By this method, Snell remains in third place, with 7.6 wins. Sale and Bauer, though, are closer, with Bauer actually sneaking into the lead, due to his having pitched 20 more innings than Sale.

This kind of counting stat may be a little more satisfying to some of you all on here. I’m not trying to just inundate you with random thoughts, but I thought you all might be interested. I’ll leave you with some great all-time seasons, and you can see how they stack up to one another (listed with actual W-L, ERA+, and then the assigned W by this formula):

Bob Gibson, 1968 – 22-9, 258; 20.7
Steve Carlton, 1972 – 27-10, 182; 17.3
Dwight Gooden, 1985 – 24-4, 229; 17.3
Sandy Koufax, 1966 – 27-9, 190; 17.0
Roger Clemens, 1997 – 21-7, 222; 16.1
Pedro Martinez, 2000 – 18-6, 291; 15.8
Tom Seaver, 1971 – 20-10, 194; 15.4
Greg Maddux, 1994 – 16-6, 271; 14.2
Randy Johnson, 2002 – 24-5, 195; 14.1
Jake Arrieta, 2015 – 22-6, 215; 13.6
Justin Verlander, 2011 – 24-5, 172; 11.7

Seeing Gibson all those lightyears ahead really gives you a renewed appreciation for that ’68 season, doesn’t it? Gibson had the 3rd-highest ERA+ in the group, as well as the 3rd-highest innings pitched. Yet, the two in combination make his 1968 perhaps the best season in the Liveball Era (and not as far away from many great Deadball seasons as you might think, actually).

Thanks again for bearing with me through another arithmetic-heavy post. Hope you enjoy when I write these once in a while. Anyway, friends, what do you think? I look forward to your comments/criticisms below!

For those interested, a spreadsheet can be downloaded here containing Dr. Doom’s WAR metric (nWAR) from his previous post and his new Wins metric (nWAA) introduced in this post (note that calculations for both are based on FanGraphs ERA+ metric). The spreadsheet contains all 50+ IP seasons since 1961 and includes a pivot table for displaying season-by-season results for any pitcher from that period.

107 thoughts on “Building a Better Pitcher Wins Metric”

  1. Voomo Zanzibar

    Most Wins, with WAR a higher number than Win Total:

    6 / 6.2 … Ted Abernathy
    5 / 5.5 … Dan Quiz

    And a bunch of guys with 4. Almost all relievers.
    Here’s the list of pitchers with at least 10 Starts:

    4 / 4.1 … Eddie Smith (23 of 38 appearances were Starts)
    3 / 3.9 … Hoyt Wilhelm (10 of 39)
    2 / 2.4 … Rod Nichols (16 of 31)
    2 / 2.3 … Tom Hausman (10 of 19)
    2 / 2.3 … Ross Baumgarten (23 of 24)
    2 / 2.3 … Pascual Perez (14 of 14)

    Right now, Jacob deGrom is at 8 Wins, and 7.7 WAR
    ____

    Pascual ended his career with that season, for the terrible 1991 Yankees.
    Ross Baumgarten labored to a 2-12 record for the 1980 White Sox.
    He got 2.05 runs of support per game.

  2. Bob Eno (epm)

    More fun stuff, Doom — I like the way these new stats shuffle the basic numbers in ways that offer new vantage points for us to think about famous and non-famous pitching seasons. In a moment I’m going to point to an issue with the concept of “wins” in this formulation that I think is fun/puzzling to think about. But I think I’d better say first that I’m actually “e pluribus munu” (epm), writing under a pseudonym — I’ve gotten really, really tired of using my HHS screen name, which is sort of silly . . . well, very silly . . . ok: it’s just plain stupid, and I’m swapping it for a version of an only slightly less odd name that I found on my birth certificate.

    The thing about “wins” in your new formulation — I’ll call them DWins (DoomWins) for now — that has me a little puzzled is how we express what the concept signifies in relation to real-world wins (RWins). An example that illustrates the problem is Jacob deGrom 2018 (so far), who looks like this:

    Jacob deGrom, 2018 – 8-8, 216; 13.2

    Obviously, deGrom’s value in DWins far exceeds his actual RWins; I think I can deal with that. But if “DWins” really means “wins above a league-average pitcher,” what this seems to me to suggest is that the average pitcher, with an expected ERA of 3.69 (deGrom’s ERA x 2.16), with whom deGrom is being compared, would have -5 RWins (deGrom’s RWins minus 13).

    So from this I conclude that either:
    (a) you don’t intend DWins to be related to RWins, in which case, how are we to understand the “Wins” in DWins?;
    (b) my thinking is screwed up;
    (c) we should consider DWins in the same universe as nWAR, which, for deGrom, would, I think, be about 16-3 — perhaps meaning that an average pitcher would be 3-16 under deGrom conditions . . . which also suggests something’s a little screwy — the Mets’ run support for deGrom has been awful, but, I think, not that awful (3.62 vs. LgAve 4.39) — and, again, the screwiness may be in my thinking.

    1. Bob Eno (epm)

      Oh! Wait a minute: DWins is surely net on W-L, that is, wins over .500. So an average pitcher in deGrom’s shoes would have about 1.5 DWins, not -5, and, in nWAR, would be 9.5-9.5. Problem solved? It would be just the sort of basic conceptual error I make most frequently and embarrassingly. Or have I just misconstrued it all in a way that just has the appearance of plausibility?

      1. Dr. Doom

        Bob,

        (It feels SUPER weird to address you as such) The relationship here isn’t directly “to” anything. It’s closest to the “above .500” thought, as a pitcher who is worse than average will come out with a negative number. What you’re really counting is runs; it’s ALWAYS runs. The only reason I translate it to “games” is to make the units work out, so that we don’t conclude that Pedro’s 2000 was better than Gibson’s ’68. Pedro saved 80.1 R by this measure in a 5.06 R/G environment*; Gibson saved 59.9 R in a 2.89 R/G environment. If we DON’T go back to “W,” and just stick with the raw R total, we end up with a skewed idea of who’s better.

        *For the record, in all of these places in this post in which I talk about the “R/G environment,” I’m not talking about Runs, I’m talking about EARNED runs. Below, I’ll talk about a “4.00 R/G environment;” that’s actually assuming about a 700-run environment for the season, in which a little under 10% of runs were unearned (about 1/3 of a R/G). I think that’s pretty accurate; as close as one can get, anyway, while keeping the math super simple.

        Of course, the whole idea of calling these W is misleading, yes. I don’t deny that, but I don’t really have a better word for it. Since we know that both Gibson AND Pedro pitched in immensely unusual historical environments, we could take those “W” totals and convert them back to runs by pretending that both pitched in a 4.00 (for example) R/G environment (see above). That would tell us that Pedro was ~63 R above average, while Gibson was about 83 R above average.
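The "convert the currency back" step described above is a single multiplication. A minimal sketch, assuming the 4.00 earned-runs-per-game environment named in the comment and the win totals from the post's list; the function name `runs_in_environment` is made up for illustration.

```python
def runs_in_environment(wins: float, earned_runs_per_game: float = 4.00) -> float:
    """Re-express a 'wins' total as runs above average in a chosen run environment."""
    return wins * earned_runs_per_game


# Win totals taken from the list in the post.
pedro_2000 = runs_in_environment(15.8)
gibson_1968 = runs_in_environment(20.7)
print(round(pedro_2000), round(gibson_1968))  # 63 83
```

Because both seasons are priced in the same environment, the ordering of the run totals now matches the ordering of the win totals.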

        This is VERY different than the last series of posts that I made, because I’m specifically NOT trying to match a “real world” number. In the last post, the numbers corresponded (nearly) perfectly with ACTUAL W-L record (the only reason it wasn’t a perfect correlation was that I value elegance over work; so sue me). In this particular instance, we’re instead counting R and just “converting the currency” to something that will make up for the run environment. The “W,” then, are not “real” W, but they’re closest to “above .500.” The better way to think of it is as R.

        1. Bob Eno (epm)

          Doom, I wonder whether your concept isn’t better expressed in terms of “Games,” as in “games behind” (or ahead), rather than “Wins.” For example, the Mets with deGrom are 16 games behind. With an average pitcher in his place, we’d expect them to be 29 games behind — his performance has probably raised them 13 games in the standings. Trevor Bauer (AL leader in DWins) is a key to why Cleveland is 12 games ahead. With a league-average guy they’re projected as only 3 games ahead, but Bauer has boosted them 9 more games ahead.

          It’s still not real world, because GB are division numbers; you’re measuring against league average, so “Wild-Card Games Behind” would be more appropriate . . . except that no one thinks that way in the real world, other than about a few teams for a few weeks at the end of the season.

  3. no statistician but

    I suspect this stat isn’t supposed to measure career results, but I started working out career results for some prominent pitchers anyway, and here they are, within 1 or so, since I averaged a little.

    Cy Young 817
    Roger Clemens 731
    Walter Johnson 670
    Kid Nichols 663.5
    Lefty Grove 643.5
    Greg Maddux 562
    Randy Johnson 528
    Pete Alexander 519
    Christy Mathewson 410
    Tom Seaver 409
    Kevin Brown 365
    Carl Hubbell 355
    Warren Spahn 342
    Bob Gibson 341
    Mike Mussina 340
    Steve Carlton 336
    Curt Schilling 336
    Bert Blyleven 331
    John Smoltz 321
    Whitey Ford 320.5
    Jim Palmer 316
    Gaylord Perry 315
    Tom Glavine 314
    Bob Feller 302
    Phil Niekro 300.5
    Hoyt Wilhelm 295.5
    Mordecai Brown 282
    Eddie Plank 260
    Juan Marichal 257
    Andy Pettitte 239
    Don Drysdale 236
    Billy Pierce 231.5
    Nolan Ryan 227
    Sandy Koufax 222
    Tommy John 194
    Bob Lemon 193
    Rick Reuschel 185
    Early Wynn 126
    Herb Pennock 87
    Jack Morris 82
    Jamie Moyer 59

    I didn’t bother with a handful of prominent pitchers like Robin Roberts, but what surprises me about this list is the placement of various pitchers. It’s a cumulative stat, insofar as the number of innings pitched jacks the total, and yet Nolan Ryan is about where I think he ought to be, close to the bottom, barely above Tommy John. Clemens and K. Brown deserve asterisks, in my opinion. Alexander coming in a hundred above Mathewson is a shock. Lefty Grove, despite his truncated career—truncated in the beginning, not the end—ranks amazingly high. Schilling, Ford, and Wilhelm appear to great advantage, considering their Innings pitched. Koufax is hurt by his early retirement. Reuschel, a favorite of many, doesn’t fare that well.

    1. no statistician but

      On reflection, I think I must have done the calculation wrong, since my numbers appear far too high, compared to the ones in Doom’s post. So just ignore the above, since I can’t delete it.

  4. Bob Eno (epm)

    A tangent, if Doom will forgive it. There’s another interesting article on FiveThirtyEight.com that relates to the changes we’re seeing in MLB play. This one focuses on how the A’s have managed to put together a surprise winner. The analysis engages a pretty wide variety of relevant stats, ranging from Statcast data (in a general way) to advanced stats like pitcher WPA. (It also includes a graph more unfriendly to the human eye than any I’ve seen.) Without specifically addressing the issue, the article gives a good overview of the way that hitters and pitchers are adjusting their games in ways that feed into a TTO-heavy outcome.

    When we all started following Bill James and Co., it was frustrating to see baseball people largely ignoring the potential that advanced stats offered for leveraging new types of on-field value out of basic skills. The dinosaurs in charge still seemed to be singing, “All ya really need is heart!” Now everything has changed, but much as I admire the thinking nature of the contemporary game, I feel a little threatened by the degree to which that leveraging has come to dominate the way players are trained and games are executed. It’s hard to express, but it’s a bit like the way our lives have altered because of the transition to computer-dominated work and play. There are tons of almost addictive skills to master and stuff to do (like commenting on this blog), but it all seems to push getting out in the woods, a Sunday drive, or a day at the beach to the margins (the implied analogy is to things like free-swinging batters, pitchers pacing themselves to bear down in the clutch, etc.).

    When I think this way, though, I remind myself that I probably sound like a fogey from the 1890s, deploring the rise of “scientific baseball.” Reading about the Cleveland infielders and the A’s players on the hopelessly wonky 538 site, what strikes me most is how well educated in baseball MLB players seem to be — as though they’d done graduate coursework in it . . . and I hadn’t.

    1. Dr. Doom

      I understand and agree with some of your points. I would like to propose two arguments, though:

      1. MLB could do a lot to fix this problem. The biggest one I know is one a friend from college always talked about: handle thickness. Maple is a very supple wood, and basically all bats are made of it. To get the “whip” effect they have, the handles are SUPER thin. Gradually expanding the thickness of those handles would really go a long way toward driving away some of the power hitting we see. They could also (very slightly) deaden the balls. Either of these changes would take very little effort, but could have tremendous impact – not necessarily on offense overall, but on power hitting, which would change offensive strategy.

      2. People – the A’s included – still don’t always understand Bill James and Moneyball. The trick is not that there’s a one-size-fits-all solution; it’s finding inefficiencies. This is (at some point) going to include finding undervalued players who succeed based on doing things different from the prevailing wisdom. Likewise, Bill James has talked about evaluating players, not by what they can’t do, but by what they can. This is SO important in this (and every, I guess) era. Someday, someone’s going to find players who do things differently, and they will succeed by the strategy, and people will copy it, and it will be the new “hot” thing. This has happened like 1000 times in baseball, and I suspect it will again. But we’ll see… personally, I’d rather have MLB do something about it, but we’ll see. Waiting for the game to correct itself might take long enough that it could break the game before that happens.

  5. no statistician but

    Here’s a second attempt at applying this formula to career stats. If I’ve got it wrong this time, I think I’m at least closer. I’ve added in a couple of prominent relievers and a few more starters. If I’ve missed any significant starting pitcher in the live ball era—one with a rating of over 60, let’s say—I’d be surprised, but not absolutely amazed.

    Cy Young 224.94
    Walter Johnson 209.50
    Roger Clemens 164.24
    Kid Nichols 151.76
    Pete Alexander 150.00
    Lefty Grove 141.78
    Christy Mathewson 141.33
    Greg Maddux 134.51
    Randy Johnson 119.21
    Tom Seaver 112.79
    Mordecai Brown 98.71
    Pedro Martinez 99.04
    Warren Spahn 94.03
    Carl Hubbell 92.76
    Bob Gibson 92.38
    Eddie Plank 90.84
    Jim Palmer 88.22
    Whitey Ford 87.48
    Gaylord Perry 86.77
    Bert Blyleven 84.66
    Hoyt Wilhelm 79.20
    Phil Niekro 78.60
    Kevin Brown 78.08
    Bob Feller 77.12
    Curt Schilling 76.51
    Steve Carlton 75.01
    Tom Glavine 74.96
    Mariano Rivera 73.08
    Mike Mussina 71.64
    John Smoltz 67.62
    Don Drysdale 67.29
    Dazzy Vance 65.92
    Nolan Ryan 63.36
    Sandy Koufax 61.63
    Robin Roberts 60.89
    Billy Pierce 58.65
    Andy Pettitte 53.06
    Tommy John 52.05
    Rick Reuschel 48.05
    Rich Gossage 41.52
    Early Wynn 33.25
    Herb Pennock 21.50
    Jack Morris 20.52
    Jamie Moyer 13.95

    1. Dr. Doom

      This is what I have (well, I didn’t round until the end, so we’re within rounding errors of one another… doesn’t matter). Except, you have Pedro (correctly) with more wins than Three-Finger Brown, but listed below him for some reason. Typo, I guess. 🙂

      A couple of players you left off:

      1. In COG discussions, Tiant v Reuschel has been deliberated ad nauseam. Unsurprisingly, they come out nearly the same here: Tiant at 47.6, Reuschel at 48.4 (again, I’m using my numbers, not yours, so we’re a little off from one another, but not enough to substantively change things; it’s also possible you used their actual ER/IP, rather than the “listed” ERA, which is what I used).

      2. I notice you did decide to include Robin Roberts this time. I find it fascinating how close he is to Koufax, and how low he ranks overall.

      3. Mariano > John Smoltz… not as a reliever, but INCLUDING all those starter innings. This is without a leverage adjustment. This is an absolutely incredible find, and really starts to put some things in perspective, I think. I actually have Mussina and Rivera switched on my spreadsheet… but again, we’re probably just within a rounding decision somewhere along the line, and it makes that kind of a difference. Anyway, I think it’s crazy how well he comes out here.

      4. I know you mentioned in an earlier post how you believe Clemens deserves an asterisk. And I don’t want to litigate that here. But from a strict “numbers” standpoint, this ranking (#3) is probably where he “fits” best, in my opinion.

      5. Pedro is like two thousand innings behind EVERYONE in front of him, except Randy Johnson and Lefty Grove, and even the difference between him and Grove is over a thousand innings. Yet, Pedro was so ludicrously effective that he still ranks #11 by this method. This is basically because they’re being compared to average, rather than replacement or zero, but it’s still cool to see.

      1. no statistician but

        Doom:

        I left out active pitchers for the simple reason that pitching stats tend to decline toward the end of a career, and depending on how far the player prolongs it, the more those negative years will drag down the final reckoning.

        Also, you are right about the third innings computation (below). I just took the .1 or .2 as read.

        Mariano’s high figure to me simply indicates that this isn’t the best stat with which to compare modern closers and starters of previous eras.

        Robin Roberts—he had a 6-year mid career slump—or five with one outlier—which sinks his total.

        Finally—the guy I definitely missed who tallied above the 60 mark (thought it might be Urban Shocker, but he fell a little short) was Hal Newhouser at 76.87. His career arc is a little like Koufax’s, but he hung on longer.

        1. Dr. Doom

          The Mariano thing is primarily because this is comparing to average. If you compared to replacement, the other guys would move WAY ahead of him. Most of a player’s value, as Bill James is fond of saying, is in BEING average. That means that we’re losing a lot of value for thousands of innings pitched, which would push, say, Nolan Ryan, light years ahead of Mariano.

    2. Bob Eno (epm)

      Thanks for doing this, nsb. I felt bad when you had to withdraw your first tabulation, because so much work went down the tubes. And here you’ve done it again. It bears out the fun in Doom’s new number.

      Among active players, I’d guess that Kershaw was leading, and his total would be 86.38, good for 20th place on your list. (Although Ed Walsh, at least, could be added higher up, at 103.77, according to my figures.)

    3. Dr. Doom

      nsb,

      I did some deeper digging this morning. Wow, did you pick a good threshold for “significant.” I checked 8 prominent pitchers this morning, and five of them came in between 55.9-59.6 without going over 60. Those pitchers were David Cone, Max Scherzer, Zack Greinke, CC Sabathia, and Johan Santana. Santana was the one at 59.6, which is awful close. Some of them will likely go over that threshold in the next couple of years, barring falling off a cliff.

      I did find three players over 60. Bob already mentioned Kershaw. The other two over 60 that I found were Justin Verlander (60.3) and Roy Halladay (72.3).

      As you can see, you did name basically every “prominent” starting pitcher of the liveball era, excepting the ones who debuted after 2000. But those are the hardest to think of, since their legacies aren’t quite firmed up yet.

      Unrelated, I thought of one other reason we might be getting different numbers on our calculations – and this goes for everyone else, too. Assuming you’re using a spreadsheet for your calculations, it’s important that you adjust one number as you’re copying data from Baseball-Reference. Innings Pitched is always listed with a .1 or a .2 on the back… but obviously, those should be written in thirds. So for Roy Halladay, for example, if you copy out “2749.1” as it says, you’ll get a (very slightly) different answer than if you correct it to “=2749+1/3” which is what I always put in my spreadsheets. It’s not enough to make a giant difference, but it’s something that could explain some small amount of variation between two findings.
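For anyone doing this outside a spreadsheet, the same correction can be sketched as a small helper. The function name `innings_from_notation` is invented for this example, and it assumes the innings figure is kept as text so that the trailing ".1" or ".2" can be read as a count of outs rather than as a decimal.

```python
def innings_from_notation(ip_text: str) -> float:
    """Convert box-score innings such as '2749.1' into true innings (2749 + 1/3)."""
    whole, _, outs_text = ip_text.strip().partition(".")
    outs = int(outs_text) if outs_text else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"unexpected partial-inning digit in {ip_text!r}")
    return int(whole) + outs / 3


# Roy Halladay's career innings as listed in the comment above.
halladay_ip = innings_from_notation("2749.1")
print(round(halladay_ip, 3))  # 2749.333
```

The gap between 2749.1 and 2749.333 is tiny for a career total, which matches the point above that this only explains small differences between two people's results.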

  6. Bob Eno (epm)

    Since Doom and nsb have created career lists for Doom’s new stats, I thought I’d work up a career list for a very different kind of aggregate: pitcher WPA. My goal in doing this is pretty simple. Doom’s (ERA+*IP)-based stat is terrific for its simplicity and the way it provides several valid bases of comparing pitchers by season or career. But there are a number of different perspectives on value, pWAR being an obvious one that lacks nWAR-based transparency and simplicity, but adds critical components nWAR leaves out (UER and DefEff being two important ones). pWAR, like nWAR, leaves out the specific contexts in which pitching acts occur, and since that’s what WPA is designed to do (and designed to do solely), it can provide a third angle alongside nWAR and pWAR for triangulating pitcher value. WPA is, more or less, a reduction of a game-log based assessment of the degree to which pWAR or nWAR reflect what actually happened in the context of the basic unit of value creation: the game.

    Since WPA measures pitcher value in terms of contributions to win-likelihood for each PA, and top relievers, who pitch few innings, conversely pitch in much higher-leverage average contexts, one of WPA’s virtues is that, empirically, it winds up locating starters and relievers quite reasonably on a single scale. That is, Mariano Rivera appears as a Top-5 career guy, on a par with Warren Spahn (#4) and Tom Seaver (#6), which intuitively seems like a reasonable assessment, so long as we understand that other aggregate scales will reflect the alternative view that high-quality seven-to-nine-inning starters are of greater value than a terrific one-inning closer (a point made by Doom in a comment he posted while I was writing this: half a pitcher’s value is just showing up, inning after inning).

    An important disadvantage of WPA is that the stat can’t be calculated prior to 1925, and there are gaps in the data until 1974. That means the great early pitchers like Young, Mathewson, Johnson, Alex, etc., are all missing. Lefty Grove, whose career began in 1925, is the earliest to make the Top-20 list, which looks like this (I eliminated active players; Kershaw would have been #10 at 40.66):

    1. Roger Clemens: 77.75
    2. Lefty Grove: 75.13
    3. Greg Maddux: 59.46
    4. Warren Spahn: 56.93
    5. Mariano Rivera: 56.59
    6. Tom Seaver: 56.43
    7. Pedro Martinez: 53.75
    8. Randy Johnson: 53.2
    9. Jim Palmer: 45.45
    10. Mike Mussina: 40.61
    11. John Smoltz: 40.51
    12. Bob Feller: 39.08
    13. Bob Gibson: 39.06
    14. Roy Halladay: 38.03
    15. Whitey Ford: 37.04
    16. Carl Hubbell: 36.05
    17. Gaylord Perry: 35.94
    18. Tom Glavine: 35.52
    19. Curt Schilling: 35.27
    20. Robin Roberts: 34.27

    (Smoltz does so well because he piled up 16.0 WPA in his four years as a closer.)

    Looking at this list (and bearing in mind that pre-1925 seasons are missing), the biggest surprise to me is Roy Halladay, who doesn’t even make nsb’s list, based on ERA+*IP.

    But I also believe that any time we do a list of career leaders in a counting stat (like nWAR or DWins), we should complement it with a rate stat that translates the same data into a per-inning figure. In this case, I did that with WPA (I used WPA/1000 IP) to generate a new Top-20 list. The way I did this was to calculate IP and sort on B-Ref’s Top-50 list for WPA. (I may have missed some, since Brecheen, #14 here, is the last name of that Top-50.)

    Mariano Rivera: 44.10
    Jonathan Papelbon: 39.01
    Joe Nathan: 33.14
    Billy Wagner: 32.19
    Trevor Hoffman: 31.35
    Lefty Grove: 19.07
    Pedro Martinez: 19.01
    Rich Gossage: 17.97
    Roger Clemens: 15.81
    Sandy Koufax: 14.64
    Johan Santana: 14.08
    Roy Halladay: 13.83
    Hoyt Wilhelm: 13.68
    Harry Brecheen: 13.64
    Randy Johnson: 12.86
    Greg Maddux: 11.87
    Tom Seaver: 11.80
    Whitey Ford: 11.68
    John Smoltz: 11.66
    Jim Palmer: 11.51

    Whoops! When you do that with a rate-based view, the closers’ context-bound pitching eliminates the appropriateness of placing starters and closers on a single scale when it comes to WPA. So this list is probably not worth considering (and I haven’t bothered to insert rank numbers).
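The rate stat behind the list above is just career WPA scaled to 1000 innings. A minimal sketch follows; the function name `wpa_per_1000_ip` and both sample pitchers are hypothetical, chosen only to show why a low-inning closer can outrank a starter with the same career WPA.

```python
def wpa_per_1000_ip(wpa: float, innings: float) -> float:
    """Career WPA expressed as a rate per 1000 innings pitched."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 1000 * wpa / innings


# Hypothetical careers, not real players: identical WPA, very different workloads.
starter_rate = wpa_per_1000_ip(50.0, 2500.0)
closer_rate = wpa_per_1000_ip(50.0, 1250.0)
print(starter_rate, closer_rate)  # 20.0 40.0
```

Halving the innings doubles the rate, which is the effect that pushes the relievers to the top of the unfiltered list.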

    If we remove the pitchers who were primarily relievers for this exercise (e.g., eliminating Wilhelm, but retaining Smoltz), here’s a starters-only WPA/1000 IP Top-20 list (still eliminating active players — and there’s a tie for #20):

    1. Lefty Grove: 19.07
    2. Pedro Martinez: 19.01
    3. Roger Clemens: 15.81
    4. Sandy Koufax: 14.64
    5. Johan Santana: 14.08
    6. Roy Halladay: 13.83
    7. Harry Brecheen: 13.64
    8. Randy Johnson: 12.86
    9. Greg Maddux: 11.87
    10. Tom Seaver: 11.80
    11. Whitey Ford: 11.68
    12. John Smoltz: 11.66
    13. Jim Palmer: 11.51
    14. Mike Mussina: 11.40
    15. Bret Saberhagen: 11.17
    16. Warren Spahn: 10.86
    17. Curt Schilling: 10.82
    18. Kevin Brown: 10.30
    19. Kevin Appier: 10.25
    20t. Billy Pierce: 10.21
    20t. Bob Feller: 10.21

    Note that Grove’s lead over Pedro is so slim, that the data missing from his record (probably a scattered 15% of his games or so) might make a difference to their rank order. (Grove is certainly a positive outlier for his era!)

    This list puts the short career guys on a scale with long-career players, and we see some obvious new names (e.g., Koufax, Santana . . .), along with, for me, some other real surprises (Brecheen; Appier). I think that to optimize the purpose of this type of rate-based list, it might be best to consider 5, 7, or 10-year peaks. That’s probably really what we’re looking for when we ask questions about rates: players who were most dominant over some historical stretch of time, not necessarily including their career tails. Unfortunately, no ready-made tables for this exist, and I’m not sure how to tailor such a search using the P-I.

    There’s both overlap and some significant differences between a list of pitcher value based on ERA+*IP and WPA (not to mention pWAR). I think it would be enlightening to compare in detail the way that these various measures of pitcher value rank pitchers on a season and career basis, realizing that the final assessment of how value is determined has to rest with our judgment, made in light of these multiple perspectives.

    1. no statistician but

      Bob:

      Some not-so-random comments:

      Although there are a few surprises, what strikes me about these various lists is their relative sameness—the same guys over and over. What we’re doing is looking for different ways to evaluate pitching effectiveness, and every version, within reason, gives the same names—in slightly different order, true, but for a majority of them the variation in place is relatively small. Omitting the pre-1925 crowd and relievers, you see Clemens, Grove, Martinez, R. Johnson, Maddux, Spahn, Seaver, Ford, Palmer, Hubbell, Gibson, and a gaggle of others who alternate in the remaining top twenty spots.

      And when you compare pWAR to these lists you get a similar reading except for the fact that a large number of accumulators blur (contaminate? pollute? infiltrate?) the ranks. Except—back to one of my long-standing sore points with pWAR—that Ford, who appears in the top 15 in all these reckonings, usually around 10th or 12th, ranks 85th in WAR, just .3 pWAR above Billy Pierce, who falls far beneath Ford in every significant measure, and below Drysdale, Bunning, Marichal—contemporaries who also fall far below Ford in terms of pitching effectiveness, and who had careers of the same approximate length. It’s easy to understand that Koufax ranks relatively low—tied with Pierce in pWAR—since his career was cut short. Less and less do I have tolerance for the view that Yankee Stadium and Gil McDougald playing behind him make up the sum of Ford’s talents. In his time he got the opposition out far better than anyone else in the AL, with only Spahn and Koufax in the NL as rivals in this regard.

      1. Bob Eno (epm)

        Responding to your second point first, I agree: Ford’s pWAR is indeed a problem (and, if I recall, it interfered with his easy election to the CoG). Your comment on Ford (which I think is completely justified — notwithstanding that, as a Brooklyn fan, he represented the Dark Side to my eyes) leads me to note that although we have no WPA data for him, Three-Finger Brown is similarly pummeled by pWAR, principally, I think, based on DefEff figures for the Cubbies behind him in the field, Gil McDougalds all. You’ll recall how skeptical I was of that calculation, despite bungling my own. On nsb’s incomplete list, Brown comes in 12th, six notches up from Ford — 13th if you add Walsh, and I’d bet there are a handful of other deadball era hurlers who might be above him, but almost surely a Top-20 guy. pWAR puts him 69th, if you eliminate active players, 13 notches above Ford. The disparity is very large, and seems to apply about equally to those two pitchers.

        I think WAR attempts to do the thing that should be done: comprehensively survey all relevant dimensions that can be reduced to numeric form, and smoosh them together. I don’t have much faith in the judiciousness of the formula used (in part because I can never remember how the whole thing goes), and I think it’s anomalies like this — parallel Ford/Brown issues — that are the keys that should be used to either explain why the formula should override the problems our intuition and simpler stats flag, or to improve the formula by analyzing why some cases are skewed. Still, WAR does do a good job in general — that is, it doesn’t stray far from our intuitions in most cases, and where it does we can figure out that the reasons pertain to features that we don’t include in other stats and that may be hard to measure (e.g., strength of schedule, fielding). ERA+*IP has its own problems — it doesn’t include those features — as does WPA: e.g., the values concern PA outcomes, rather than individual player contributions, and the ever-varying challenges represented by specific opposition players (the equivalent of strength of schedule) are ignored.

        As to your general comment, I agree that the top of the leaderboard changes relatively little, and that’s what we’d hope for. But in the second tier, I think there are interesting variations. Steve Carlton is an example: 19th in pWAR, 26th on nsb’s incomplete list of “DWins,” and 41st among retired pitchers in WPA (and, of course, he’d be far lower if that list were complete). Robin Roberts is another: lower than 35th in DWins, but 22nd in pWAR and 19th in WPA (which would be more like DWins). If any list fails to have pitchers like Grove and Clemens near the top, it’s almost certainly based on a flawed set of criteria. But in this second tier I think there’s a lot to be interested in, seeing how different guys shine brighter or dimmer under different light.

        I’d really like to get the energy to compare career rate stats, and, better, peak stats under varying criteria. Aggregate and peak are really the initial two dimensions that shape a player’s profile. One thing I enjoy a lot about our CoG debates are how the two factors often bounce around our discussions as we try to settle on a mediating judgment.

        Final comment (an ongoing one): Your new stats have generated some really good strings — thanks!

        1. Mike L

          NSB and Bob, nothing would make me happier than a re-argument of Ford’s worth. Ford’s WAR is either a product of some weird glitch in calculation, or a function of some deeper insight into his skills. The problem, if you take the latter position, is that it seems to boil down to “don’t look at anything else he did as reflected by any other stats, traditional or otherwise, our secret sauce says, eh.”
          Thank you for letting me rant.

          1. Dr. Doom

            So, the question with Ford (when trying to assess his value) is, “How do you separate him from Mickey Mantle and Gil McDougald?” If we want an accurate way of assessing his effectiveness, that’s the question to answer.

            One way of addressing this question, then, would be, “How does Ford do at the things his fielders have NO control over?” Those things are in the FIP calculation. If we use Ford’s FIP+ in the calculation I introduced in the previous three-part series, we see him with a record of 199-153 (actual record: 236-106; the formula using ERA+ record: 225-127). His “DWins” using ERA+ is 176.1; using FIP+, it’s 84.5 (my numbers may vary a little from Baseball-Reference, as I used Fangraphs data in my table; Ford’s numbers are, for all intents and purposes, identical across the two sites).

            When Baseball-Reference WAR is calculated, it tries to imagine an average defense and average hitters for a pitcher. FIP does the same, by simply REMOVING all the results tied to other players. It’s telling, I think, that Ford ends up with very similar WAR totals at both sites: 53.5 rWAR, 54.9 fWAR. They essentially see him as the same player, once they try to account for the players behind him. Now, obviously, you don’t have to buy that conclusion. But it is interesting, I think, that these two calculations come up with such similar results for Ford. In his case, the argument would go, it’s not that Ford’s WAR is some secret: it’s that he played in very extreme conditions unlike almost any other player in history, which combine to form a drastically different impression of him, depending on whether you look at traditional or sabermetric stats.

          2. Mike L

            Doom, I understand the argument. An interesting sidebar to this is Eddie Lopat’s Yankee numbers, which spanned his age 30-37 seasons: 113-59, ERA 3.19, ERA+ 121, FIP 3.60, total WAR 17.5, maximum WAR in any Yankee season 3.7. Someone on HHS, several years ago, tried to reverse-engineer what Ford (or any pitcher) would have to do to excel in WAR with the Mantle/McDougald axis, and it was ridiculous. I’d make an argument that the dampening and smoothing effect may have a disproportionate impact on a handful of pitchers, and would like to see a spreadsheet that tries to identify and compare them.

          3. Dr. Doom

            Replying to Mike L, regarding “what Ford would have to do to excel.”

            Ford did excel. But I’m not sure exactly what “excel” means here, so I’ll try a few different thresholds.

            We could get him to 60 WAR (or darn close) just by giving him credit for his service time in Korea. That gets him at least 5 WAR, I would think. But assuming we’re not doing that, and assuming that Ford

            Ford had a career 2.75 ERA.
            To get to 60, he would’ve needed a 2.59 ERA.
            To get to 70, he would’ve needed a 2.36 ERA.
            To get to 80, he would’ve needed a 2.14 ERA.

            I don’t know if that seems ridiculous to get to those marks or not. That’s just what the math is telling me. Hope that helps.

            (If anyone cares how I arrived at those figures, the R:W conversion for Ford is 9.22:1; to get him, for example, 6.5 extra wins, he would need to save 9.22*6.5 R, which is 60, meaning his total RA would have to be 60 fewer than it was. Since 7/8 of the runs in his career were earned, and assuming we can hold that ratio, that means he needed to allow 53 fewer earned runs. 967-53=914 ER; 914/3170⅓=.288 ER/IP; .288*9=2.59 ERA. The process is repeatable for the other benchmarks.) Baseball-Reference credits Ford’s defense with helping him to the tune of .25 R/9, and his ballpark helping him to the tune of .19 R/9. His personal career Park Factor is listed as 94.5.
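            For anyone who wants to rerun that parenthetical, here is a minimal Python sketch of the same steps for the 60-WAR case. The function names are made up for illustration, the inputs are the figures quoted in the comment, and the half-up rounding is an assumption chosen to match the whole-number runs used there; other benchmarks may land a hundredth off depending on where you round.

```python
import math


def round_half_up(x):
    """Round to the nearest whole number, with .5 going up (assumed to match the comment)."""
    return math.floor(x + 0.5)


def era_for_extra_wins(extra_wins, runs_per_win, earned_share, earned_runs, innings):
    """ERA a pitcher would have needed in order to be worth `extra_wins` more wins."""
    # Total runs that must be saved to gain the extra wins.
    runs_saved = round_half_up(extra_wins * runs_per_win)
    # Only the earned share of those runs shows up in ERA.
    earned_saved = round_half_up(runs_saved * earned_share)
    # Recompute ERA from the reduced earned-run total.
    return 9 * (earned_runs - earned_saved) / innings


# Figures quoted in the comment: 9.22 R/W, 7/8 of runs earned, 967 ER, 3170 1/3 IP.
ford_era = era_for_extra_wins(6.5, 9.22, 7 / 8, 967, 3170 + 1 / 3)
print(f"{ford_era:.2f}")  # 2.59
```

            Swapping in a different number of extra wins repeats the process for the other thresholds.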

            Eddie Lopat is interesting. (All data I’m using is from his seven full seasons in NY; I’m ignoring the half-season at the end.) He faced offenses that averaged 4.60 R/9, and he held opponents to 3.66 R/9. That’s good. If all of that were a credit to him, that would be 25.3 WAR (that’s 147 RAA, plus 122 between average and replacement, for 267 RAA, in a context of 10.6 R/W). The thing is, the Yankees played in a good pitcher’s park, which takes off .28 R/9, and his defense helped him (according to bb-ref) to the tune of .21 R/9. That’s most of his advantage over an average pitcher coming from defense and ballpark. Instead of 147 RAA, he gets knocked down to 67 (that’s what baseball-reference says, anyway; I get 70… but then, I can’t see all the numbers beyond the decimal points, so presumably we’re pretty close). The ballpark is really what hits him hard. Lopat is also hit a little bit by the RA vs. ER approach. 86% of Lopat’s R were earned; the AL average at the time was nearly 89%. That doesn’t sound like much, except that works out to his ERA appearing about .09 lower than it would’ve, had his ER rate been more in keeping with the rate of other pitchers, so there’s that, too.
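            One way to see where a figure near 70 comes from is to scale the raw RAA by the share of Lopat’s per-nine edge that is left after park and defense are removed. This is only a sketch: the proportional scaling is an assumption about the arithmetic, not Baseball-Reference’s actual method, and the function name is hypothetical.

```python
def adjusted_raa(raw_raa, league_r9, pitcher_r9, park_r9, defense_r9):
    """Scale raw runs above average by the share of the R/9 edge left to the pitcher."""
    # Raw edge over the average offense faced, in runs per nine innings.
    raw_edge = league_r9 - pitcher_r9
    # Edge remaining once park and defense help are subtracted.
    own_edge = raw_edge - park_r9 - defense_r9
    return raw_raa * own_edge / raw_edge


# Figures quoted in the comment: 147 RAA, 4.60 vs. 3.66 R/9, .28 park, .21 defense.
lopat_raa = adjusted_raa(147, 4.60, 3.66, 0.28, 0.21)
print(round(lopat_raa))  # 70
```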

            I’m neutral on this; I’m not sure what the right answer is. I just like looking at the data. From my perspective, I don’t see anything outrageously unjust in these scenarios, but maybe that’s just me.

          4. Mike L

            So, let’s get back to Ford.

            To get to 60, he would’ve needed a 2.59 ERA.
            To get to 70, he would’ve needed a 2.36 ERA.
            To get to 80, he would’ve needed a 2.14 ERA.

            Among pitchers whose careers didn’t fall mostly in the Dead Ball Era, there are zero starting pitchers with an ERA of 2.14 or better, one (Clayton Kershaw, still active) with an ERA of 2.36 or better, and that’s it….

          5. no statistician but

            Doom:

            1) Maybe you realize this, but all your response does is confirm Mike L’s observation.

            2) What are sabermetric stats, exactly, if they don’t include RE24, ERA+, WPA, adjusted pitching runs, adjusted pitching wins, WPA/LI, REW, and even the reckonings you yourself have been working out? As opposed, it seems, to FIP, which apparently is an advanced stat and takes precedence over any others except WAR?

            3) Further, when you say that it’s telling that Fangraphs and B-R agree about WAR, all you’re saying is that they go by the same rules to reach their conclusions, meaning that those conclusions are generally going to agree. Well, golly . . .

            4) In looking up sabermetrics I came across a stat called true ERA, but I can’t find any applications of it. Do you know how it works and how to apply it? Would it help here?

            Sorry to be snarky, Doom, but when you take on that pityingly superior tone of the true believer lecturing to the misguided and recalcitrant propounders of heresy I have to object. When you say WAR tells us this is so and there’s an end to the discussion, I have to ask “What discussion?”

          6. Bob Eno (epm)

            On this issue, I tend to get simpleminded and resort to a “divide all people into two types” approach. In this case, the people are good pitchers, and the two types are good power pitchers and good finesse pitchers. Of course, it’s a spectrum, but I’m being simpleminded.

            Lopat, whom Mike points to as someone pWAR treats poorly, like Ford, was clearly near the far-end of finesse pitchers. Ford could strike people out, but his K-rate was modest, and he was closer to a Lopat than to a Koufax. (Mordecai Brown was on the finesse end of the spectrum as well, relying on the odd motion of his three-fingered pitch to keep batters off-balance, not necessarily to strike them out.)

            FIP is not going to show much when it comes to finesse pitchers. Their goal is to keep batters off balance with pitches that are not overpowering, and get them to hit easily fieldable balls. If a good finesse pitcher’s fielders are skillful, the team’s defensive efficiency numbers will be exceptionally high; if they are clumsy, the fact that balls in play have a greater than normal tendency to be weak or routine grounders or pop-ups will still let those fielders do their jobs well. Since fielding efficiency is attributed entirely to fielders in pWAR, good finesse pitchers are simply not credited with a major aspect of their primary skill. And fWAR is going to ignore entirely the strength of their game, which is to generate fieldable BiP and truly keep down all three TTO outcomes. The perfect finesse pitcher game would be 27 pitches and 27 ground-ball outs, with no foothold for FIP whatever.

            Power pitchers are more appropriately measured by fWAR, since they have high K numbers, and often high BBs as well, so much of their game is in TTOs (fastballs adding to this by boosting HRs as well). The perfect power pitcher game is 81 pitches and 81 strikes: an all FIP game. Their goal is not to generate easily fieldable BiPs (or is it BsIP?), and when hitters make contact and the ball stays in the park, launch velocity is much more likely to be high than is the case with finesse pitchers, creating added challenges for fielders and a tendency towards lower DefEff numbers, which raise pWAR figures.

            In the case of Lopat and Ford, in particular, those guys played on Stengel-era Yankee teams whose starting pitching staffs were overwhelmingly finesse pitchers, usually of good quality. I think we should understand that while talented Yankee fielders added value to the staff’s pitching performances, the talented staff of finesse pitchers very likely added value to fielding performance that is unrecognized by any advanced stat, including, of course, pWAR. But non-WAR aggregate stats, like nWAR (or ERA+*IP) or WPA will stay neutral on this ground because fielding is not independently factored in.

          7. Mike L

            Bob, you are right about Ford’s K rate, but that rate was also reflective of the era he pitched in. He was in the top ten seven times (with rates that modern eyes would say were almost disqualifying).

          8. Bob Eno (epm)

            That’s a good point, Mike. nsb’s post below reminds us that the gaudy K numbers we’ve become used to were rarities in most of baseball history.

          9. Mike L

            Bob, I’d never argue that the power pitcher doesn’t have natural advantages over the “feel” guy, but I guess what troubles me about the broader argument are two underlying biases. The first is that we seem to be discounting the net result (outs and runs) based on how that result was achieved. The second is I’m not sure we are adequately adjusting for managerial approach. Up until relatively recently, we asked starters to go deep, and the phrase “six-inning pitcher” was a pejorative. To do that, most (except the freaks of nature) had to pace themselves. And batters were coached differently as well. The true sluggers could take full cuts, but most others hit situationally: they cut down their swings with two strikes, choked up, hit the ball to the opposite side, etc. This has less to do with Ford than maybe with an era… are we unfairly judging actual on-the-field results because of the way they were accomplished?

          10. no statistician but

            Bob from nsb:

            I just accidentally deleted a long response to your comment, which I’ll now try to reproduce.

            What I said, basically, was that I thought your remarks were insightful, especially when you talked about the power pitcher’s desire to strike out the side on nine pitches for nine innings vs. the finesse pitcher’s desire to force 27 groundouts.

            My distinct impression is that deep within the perspective that drives pitching WAR is the overtly expressed or unarticulated assumption that there exists an ideal pitching performance against which all real pitching performances are to be measured. Further, my impression is that this ideal is in fact your concept of the power pitcher who strikes out the side on nine pitches for nine innings running. Why? Because even letting the batters make contact has the potential to unleash all kinds of unpredictable results, not just hits, but errors and risk taking.

            I have no right to fault this assumption, except to say that it is just that, an assumption, a bias about how baseball is to be interpreted. It’s important to remember that for most of baseball history pitchers were not governed by it, thinking in their ignorance that just getting the opposition out using whatever means available was the challenge.

          11. Bob Eno (epm)

            nsb, I think you’re definitely right, that the unstated ideal of pitching is the power pitcher. But I’m not sure the reason is because strikeouts are the safest outs. I think it’s simply an expression of the way we value athletes. We cheer for the little guy who surprises us by turning limited natural advantages into a winning performance, but we’re in awe of the athlete who possesses super-normal physical gifts and puts them on display to good effect.

            I recall once reading a comment about Nolan Ryan, whom I saw as a player who wasted extraordinary talent by refusing until his last years to incorporate into his repertoire any finesse at all. The comment was that although Ryan’s games could be boring for teammates and wound up adding few wins to the team total beyond what an ordinary pitcher might add, teammates and opponents all respected Ryan because they understood how difficult it was to do what he was doing — I took that to mean throwing relentlessly in the high 90s for 120-140 pitches, game after game, without letting batters get base hits. Walks be damned! With a physical talent that rare, the W-L record took second place to the sheer uniqueness of his abilities.

            I don’t know that pWAR actually reflects that subjective bias: it may simply be that the statistical bias, which seems to emerge from rational and carefully thought formulas, reflects a lack of focus on the issues we’re discussing.

          12. Dr. Doom

            Sorry for sounding like I was trying to end discussion; it’s not my intention. I don’t even think I was defending the WAR ranking of Ford at all. On the contrary, I just think it’s interesting how things shake out. I don’t think there’s some mystifying secret formula that says “If player name = Ford, Whitey, subtract 25% of WAR.” I think it’s more that he pitched in unusual circumstances, was a very good pitcher, and got better results than other pitchers of similar ability due to the team on which he played. That’s basically what WAR says. It says he’s a Hall of Famer (in all likelihood, anyway), just not an inner-circle one. I’m honestly not sure why the Ford defenders here get so self-righteous about him. He was a great pitcher; literally no one is arguing that he wasn’t. And as for the COG, we’ve got… what… 40 pitchers in there? Isn’t right around the top-40 where you’d put Ford? Like, you know… on the borderline? I mean, I don’t see him as a gold-standard-top-20 guy that sails in, no problem. So the discussion is merited, I think. (I looked; it’s 38 pitchers. That was a good guess.)

            As for WPA, RE24, ERA+, and all the other things you’ve mentioned, the thing is that they’re all dependent, to some extent or other, on the defense playing behind you. McDougald turns a double-play that no one else could’ve? Helps your WPA, as well as RE24. All these things are measuring different stuff; WAR is the only one trying to account for defense. If RE24 or WPA were somehow redesigned to give credit to defensive players, as well as pitchers, each pitcher’s WPA or RE24 would be reduced by about 25%, and that would be more in-line with actual run scoring. In Ford’s case, might it not be a little more than that, given the team he played for? On a similar note, Ford’s career run adjustment due to defense is .25 R/9. That’s less than Jim Palmer’s, whose number is .33 R/9. Palmer has about a 25% edge in IP, which actually more than compensates for the difference in WAR. Ford was a workhorse in some of his seasons, but his career IP isn’t actually that high; if he had pitched 4000 innings instead of 3000, he would undoubtedly rank much, much higher in those career things. Catfish Hunter has the same career defensive reduction as Ford. There are other guys getting hit similarly (or worse), as it turns out.

            I kinda want to diverge into a discussion of why Fangraphs WAR and Baseball-Reference WAR are different, but I’m not in the mood. The short of it is: FIP v RA, sequencing, park factors. They’re not always close to the same; Jim Palmer, for example, differs by 12 WAR between the two systems. So I think Ford’s similarity is interesting, though I don’t necessarily know what it means.

            Last one, on true ERA. Yes, that applies here. It attempts to do what FIP does, without ignoring batted ball results (assuming you’re talking about the same one with which I’m familiar; it’s entirely probable that you’re not. Could be someone else out there with a similarly “pityingly superior tone” to my own who has the hubris to name something true ERA). It’s an interesting concept, though I think it still struggles with what the defense behind the pitcher is doing. Batted balls are handled by the defenders in question, after all. I think the main point of true ERA, if I’m remembering correctly, was to take sequencing out of results, which FIP does and standard ERA doesn’t. It also looks at unearned runs, which is, frankly, a better idea. If I wanted to do anything with the pitcher stuff that I’ve done in these last four posts, I would do the same; but the point of all of these things I’ve been doing is elegance and simplicity over accuracy.

          13. Mike L

            Doctor Doom, let me clarify what I was saying, without any intended snark. The purpose of pitching is contextualized run prevention which then results in a higher probability of winning games. “Science” (sorry, can’t think of a better phrase) tells us that almost nothing bad can happen with a strikeout, but ultimately, most outs are just outs. FIP emphasizes Ks over other outs. No one is questioning that pitchers who have better-fielding teams behind them will have greater success than those who have a team of older Jeters and Dick Stuarts. But quality pitchers take advantage of what’s offered them, so it seems the penalty for not being a power pitcher is disproportionate to actual run prevention.

            And, on a collateral issue I’d like to be discussed here, how are these advanced stats going to take into account things like fielding positioning and shifts?

      2. Bob Eno (epm)

        You know, nsb, I replied to your comment somehow thinking you were Doom; that’s why you’re a third-person in my comment and I ended by attributing Doom’s stats to you. I guess it’s because you started by calling me by my name, which only Doom has done here, although there are actually other people in life who refer to me by my name, and I don’t think they’re all Doom. Doom said using my name was weird — looks like I’m going to have to get used to it here too.

  7. no statistician but

    Since Base-Out Runs Saved has a similar aim, here is a somewhat revised career listing of pitchers who are ranked in the RE24 top 20 (to the right) versus their ranking using Doom’s Ws (from top to bottom):

    1. Roger Clemens 164.24 (RE24 rank: 1)
    2. Lefty Grove 141.78 (RE24 rank: 2)
    3. Greg Maddux 134.51 (RE24 rank: 3)
    4. Randy Johnson 119.21 (RE24 rank: 5)
    5. Tom Seaver 112.79 (RE24 rank: 6)
    6. Pedro Martinez 99.04 (RE24 rank: 4)
    7. Warren Spahn 94.03 (RE24 rank: 7)
    8. Carl Hubbell 92.76 (RE24 rank: 19)
    9. Bob Gibson 92.38 (RE24 rank: 14)
    10. Jim Palmer 88.22 (RE24 rank: 8)
    11. Whitey Ford 87.48 (RE24 rank: 10)
    (Clayton Kershaw 86.38; RE24 rank: 12)
    13. Bert Blyleven 84.66 (RE24 rank: 17)
    17. Bob Feller 77.12 (RE24 rank: 13)
    18. Curt Schilling 76.51 (RE24 rank: 11)
    20. Tom Glavine 74.96 (RE24 rank: 16)
    21. Mariano Rivera 73.08 (RE24 rank: 18)
    (Roy Halladay 72.3; RE24 rank: 20)
    22. Mike Mussina 71.64 (RE24 rank: 9)
    John Smoltz 67.62 (RE24 rank: 15)

    And here are those bumped down out of the top 20 from my original list.

    Gaylord Perry 86.77 (RE24 rank: 23)
    Hoyt Wilhelm 79.20 (RE24 rank: 38)
    Phil Niekro 78.60 (RE24 rank: 71)
    Kevin Brown 78.08 (RE24 rank: 21)
    Steve Carlton 75.01 (RE24 rank: 34)

    Cy Young, Walter Johnson, Kid Nichols, Pete Alexander, Christy Mathewson, Mordecai Brown, and Eddie Plank from my original list date from the dead ball era and so are pre-RE24, as are several others, such as Ed Walsh, whom I did not list anyway, since I wasn’t attempting to be inclusive of pre-1920 pitchers.

  8. no statistician but

    To change the subject to the present season, the BoSox have lost six out of eight, four of the losses coming at the hands of the Rays, and I’d say the Mariners’ 116-win season is now safely out of reach. To break that record Boston would have to finish with a 27-3 flourish. In the NL the Cards have put up a 19-5 record in August which is getting them a lot of coverage, but the Cubs haven’t been too shabby either, 15-8, and hold a 4 game lead in the NL Central, which also happens to be the biggest lead of any NL Division. The AL Central is somewhat the opposite. The Tribe holds a 13 game lead even though they have the worst record of the three division leaders.

    Just one note on individual players: Has anyone ever considered J. D. Martinez to be a strong candidate for a triple crown?

    1. Doug

      I know I wouldn’t have been thinking about Martinez as a Triple Crown threat. But, as I write this, he’s one HR shy of leading in all of the TC categories, so, yeah, he’s right there. But, he’s coming off a red hot August (except in the HR department), so chances are he won’t be able to carry that to the house (he had a very similar August in 2016, but followed that with only a .736 OPS and 3 HR in Sept).

      1. Dr. Doom

        The biggest impediments to a Martinez Triple Crown would not be a Martinez slump, I don’t think. I think the bigger thing facing him is actually the competition.

        In Batting Average, he’s only a hair ahead of his teammate Mookie (.337-.336), with the guy who won three of the last four AL batting crowns close behind (Altuve, .332). (I don’t think the previous batting crowns make Altuve more likely to win, for the record; I’m just pointing out that it’s not like it’s some fluke player due for a major regression in the final month; in fact, his current average, good as it is, is actually lower than it was over the last four years coming into 2018 (.334).)

        In HR… wow can Khris Davis (39) hit ’em. Moonshots every day, it seems. It’s a good thing Chris-with-a-C Davis stopped hitting altogether, or the leaderboard would’ve gotten very confusing. Considering that Jose Ramirez (37) is right there, too, one behind Martinez, it’ll be tight.

        In RBI, well… that shouldn’t be a problem. Not that someone can’t come out of nowhere with a 10-RBI day that messes with the leaderboard. But for now, it seems pretty safe.

        This Triple Crown quest reminds me of the great forgotten one of the last decade. In 2012, the year AFTER Ryan Braun’s MVP season, he nearly won the Triple Crown, too. (Had he done so, I don’t know what the voters would’ve done, but I’m not sure Buster Posey would’ve won that MVP.) Chase Headley netted 6 RBI in the final five days, while Braun earned none, and Headley led with 115, Braun three back at 112. Andrew McCutchen had the batting title LOCKED UP in the first half (as late as August 3rd, he was still hitting .373), but McCutchen hit a dramatic slump, batting just .247 the rest of the way. From August 3rd on, Buster Posey, on the other hand, batted .368 and took the batting crown. But entering that final week, I remember that MLB.com was actually running TWO Triple Crown watches, because Miguel Cabrera was leading in all three in the AL, and Braun was leading in two with about two weeks to go. The thought was, “If Braun gets red hot in these last two weeks, he can top Posey and pass McCutchen.” Alas, while Braun actually managed hits in 12 of the final 13 and batted .408 over that span, Posey batted .359 over the same span and Braun had to settle for third in the batting race, finishing at .319 (Cutch at .327, Posey at .336). I doubt anyone is upset about that now given the then-still-to-come revelations about Ryan Braun, but as a Brewers fan in the waning days of 2012 watching our playoff chances slip away, the hope of a Triple Crown and a possible/probable back-to-back MVP was very exciting, though ultimately not to be.

  9. Dr. Doom

    I want to jump into the conversation above about strikeouts’ value relative to other outs. Part of this comes down to thinking. Strikeouts are not more valuable than other outs. In fact, as a group, they’re probably LESS valuable, since other outs include double plays. On the other hand, they also include sac flies and sac bunts, which are surely better than strikeouts; maybe it’s a wash, but I doubt it, since the double plays are so much more common than the other two combined (nearly 2:1 in 2018; as of this morning, 2842 GDP, 1672 sac bunts + sac flies).

    The thing is, that’s (I would argue) a warped perspective; you shouldn’t be comparing strikeouts to other outs. The alternative to a strikeout is not an out – it’s a ball in play. Balls in play, yes, include double plays. They also include hits and errors. I don’t disagree that most pitchers would prefer a 27-pitch perfecto to an 81-pitch, 27-K game. The issue is, that relies on a pitcher knowing for certain that a certain pitch will result in an out. If they know that for sure, why aren’t they just doing that all the time? I’ve never understood that logic.

    The reason “soft tossers” are effective is that they keep good hitters off-balance and induce bad contact… but it’s still contact. Take a guy like Jim Kaat (first “crafty lefty” that came to mind). Below average strikeout rates for his career, but very effective for a long time. Compare to Nolan Ryan. Both had long careers. Why did Kaat, the very effective soft-tosser, allow a higher BABIP than Nolan Ryan (.286-.269)? Perhaps you would argue that, although Ryan allowed fewer hits, he was allowing harder hits. OK… but the career SLG against Ryan was .298, to .387 against Kaat. If finesse pitching is this big advantage, why don’t we see it in the data?

    Ryan was an effective pitcher because he struck a lot of people out. You have to get 27 outs. For Ryan, 9.5 of those were taken out of his fielders’ hands every game because they were already gone. That means they only had to get 17.5 outs. Given an average BABIP of .300 (not Ryan’s average, just a hypothetical pitcher), that means it took about 25 balls in play to get those outs. Jim Kaat only got (we’ll round up to) 5 outs via the K. That means 22 BIP outs per game. Again, given the average of .300 BABIP (higher than Kaat’s, but not by as much as Ryan’s), that means about 31.5 balls in play per game. An extra 6.5 times (as compared to Nolan Ryan), the ball would go in play. Those are real, actual extra hits or errors coming as a result of Kaat’s style. This is why Kaat allowed more than 2 H/9 more than Ryan – it’s all those extra balls in play.
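    The balls-in-play arithmetic above can be sketched in a few lines of Python. This is an illustration under the comment’s own simplifications (every non-strikeout out comes on a ball in play, and a flat .300 BABIP for both pitchers); the function name is hypothetical.

```python
def balls_in_play_needed(strikeouts_per_9, babip=0.300, outs_per_game=27):
    """Balls in play required to record the outs that strikeouts don't cover."""
    # Outs the fielders still have to make.
    fielded_outs = outs_per_game - strikeouts_per_9
    # Each ball in play becomes an out with probability (1 - BABIP).
    return fielded_outs / (1 - babip)


ryan_bip = balls_in_play_needed(9.5)  # about 25
kaat_bip = balls_in_play_needed(5.0)  # about 31.4
extra_bip = kaat_bip - ryan_bip       # about 6.4 more balls in play per nine innings
```

    The comment rounds these to 31.5 and 6.5; the gap is the point either way.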

    So why was Kaat so effective that he stuck around and that his career managed to last so long? He controlled the basepaths by reducing walks. Ryan walked more than twice as many people per 9 as Kaat. In fact, he almost gave up ALL of his H advantage back in BB (although those BB are slightly less detrimental than the H would be; still, it’s close from a baserunner perspective). PS, Kaat also had a higher HR rate, which I don’t think fits the “soft-tossing” stereotype you usually get. Kaat controlled the basepaths by keeping his walk rate low; that is how pitchers who don’t strike people out succeed. It’s the ONLY way, other than being extremely fortunate with the defense behind you (or being a knuckleballer, which demonstrably has an impact on BIP; perhaps, though, that’s a conversation best left for another time).

    These issues have been studied many, many times. Strikeouts are only good for pitchers. There’s no advantage to striking out people less. The only advantage is if you have to sacrifice some power for control; then it might be a good tradeoff. Heaven knows that might’ve been worthwhile for Nolan Ryan. But I think people are overstating the ability for pitchers to “know” that the contact they induce will result in an out, and people are underestimating the importance of keeping the ball out of play.

    The heart of it, I think, is that every hitting coach ever wants his batters to make contact. Just make contact, and good things will happen. I was told that many times when I played growing up. It’s essential to a hitter’s success to put wood on the ball. The best thing that can happen for a pitcher is that the hitter doesn’t make contact. It’s one of those cases where, I think, the simple reality is not reductio ad absurdum; rather, it illustrates the common-sense recognition of a mathematical and strategic truth at the heart of the game: strikeouts are the best kinds of outs, because they mean nothing happens for the offense (dropped third strikes notwithstanding, in case there are some smartalecks out there). That’s my thought on the matter, anyway, and I would say that there’s an enormous body of mathematical evidence out there to back it up.

    1. Bob Eno (epm)

      Doom, I think you’re agreeing with nsb, no? He wrote about why strikeouts are valuable: “Because even letting the batters make contact has the potential to unleash all kinds of unpredictable results, not just hits, but errors and risk taking.” That’s your point too, so perhaps you’re disagreeing with me. But I don’t disagree at all. I think it’s true, but I believe that’s not why people admire top power pitchers.

      I think there are some points in your generally good analysis that are open to question. For example, you say that “people are overstating the ability for pitchers to ‘know’ that the contact they induce will result in an out, and people are underestimating the importance of keeping the ball out of play.” I assume that nsb and I are these people, and I don’t actually know why you think that. Perhaps it’s because you’ve taken the argument to be about whether power pitchers are more likely to have careers superior to those of finesse pitchers, when it’s actually about whether successful power pitchers should get higher WAR ratings and esteem than equivalently successful finesse pitchers. Regardless, I think you’re attending only to half the equation. The skilled finesse pitcher’s strength is control: among pitchers, he will be the one most certain of where his pitch is headed as he tries to induce weak contact, the results of which will be beyond his control once the contact is made. The gifted power pitcher’s weakness is control. He saves himself the uncertainty of balls in play to a much higher degree, but among pitchers, he will be the one least certain of whether the pitch will go where he wants it to. So the calculus is not a one-sided advantage for the power pitcher; it’s a trade-off. I think you cover this ground in your discussion of power pitcher walks, but don’t allow it to recede when addressing the uncertainty of BiP. (Of course, when you get the rare power pitcher who develops pinpoint control, you have a sort of power/finesse cross-over, and it’s hard to argue there’s anything likely to produce better results.)

      I found your comparison between Kaat and Ryan intriguing. Your analysis seems on target to me, but what I see as the most interesting aspect is something you don’t note. Although Kaat was a fine pitcher, he was basically a No. 2 starter (e.g., after Pascual, Grant, Perry in Minnesota; Wood in Chicago). He may have been the first crafty lefty to come to mind, but he’s not actually comparable to Ryan, who may be the most talented power pitcher ever. They both pitched forever, and in the end, despite the disparity in their relative statures within the finesse/power divide, their ERA+ levels were very close: Ryan has the edge by only 112 to 108. It’s a good example of how the advantage of the strikeout should never be decoupled from the disadvantage of the walk, since Ryan was not only the model of a power strikeout pitcher, he also modeled the walky weakness of the power pitcher (just as the BiP has both out and hit/error sides). Controlled power maximizes the advantage of the K by minimizing walks; skilled finesse maximizes the likelihood of BiP outs by minimizing the fielding challenges presented by batted balls.

      One other quibble. You write, “Strikeouts are only good for pitchers. There’s no advantage to striking out people less.” I think that’s probably not true. I don’t know the stats, but I think that strikeouts require, on average, a significantly greater number of pitches over the course of a game, season, and career than BiPs. They are the downfall of the Kerry Woods of the world just as much as they are their bread and butter. Of course, some such players can have it both ways, like Lefty Grove or Ted Lyons: burn your arm out at both ends for half a career, and then successfully transition to outstanding finesse for the second half. More often, though, the best result you can hope for is a Frank Tanana, who made the transition, but was never as skillful in the second role as he had been in the first. (I think people tend to forget how good Tanana was when he was a power pitcher: from 1975 through 1977, when Ryan was a celebrity in his prime, his power-pitcher teammate Tanana outperformed him each year by keeping walks low).

      PS: If you had played youth baseball with my stature and talents, your coaches would not have said, “Just make contact and good things will happen.” They would have said, “Wait this kid out and good things will happen.”

      1. Dr. Doom

        I’ve tried like 10 times to write this post.
        Here are 7 pitchers who weren’t really strikeout guys:
        Jim Kaat
        Greg Maddux
        Jim Palmer
        Tom Glavine
        Tim Hudson
        Mike Fiers
        Matt Cain
        You can quibble with some of these, if you’d like, saying they have too many strikeouts, but I think it’s overall a decent group.
        This group pitched 23,981.1 innings.
        They had a BABIP of .282 (I calculated this myself and left out sac flies)
        They had a (modified) SLG of .446 (I modified SLG; instead of doing it over all AB, I took away AB that ended in a K)

        Here are 7 power pitchers:
        Nolan Ryan
        JR Richard
        Kerry Wood
        Steve Carlton
        Randy Johnson
        Stephen Strasburg
        Sam McDowell
        I tried not to just go to the top of the strikeout list.
        This group pitched 22,296 innings.
        They had a BABIP of .280 (I calculated this myself and left out sac flies)
        They had a (modified) SLG of .458 (I modified SLG; instead of doing it over all AB, I took away AB that ended in a K)
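
        For anyone who wants to repeat this with other pitchers, here is a minimal Python sketch of the two rates described above. The exact formulas are an assumption based on that description: BABIP as (H - HR) / (AB - K - HR) with sac flies left out, and the modified SLG as total bases over at-bats that did not end in a strikeout. The function names and the sample batting line are hypothetical, not any real pitcher's numbers.

```python
def babip_no_sf(hits, home_runs, at_bats, strikeouts):
    """BABIP with sacrifice flies left out: (H - HR) / (AB - K - HR)."""
    balls_in_play = at_bats - strikeouts - home_runs
    return (hits - home_runs) / balls_in_play


def modified_slg(total_bases, at_bats, strikeouts):
    """Total bases per at-bat, ignoring at-bats that ended in a strikeout."""
    return total_bases / (at_bats - strikeouts)


# Hypothetical opponent batting line against one pitcher (made-up numbers).
line = {"AB": 800, "H": 200, "HR": 20, "K": 150, "TB": 310}

babip = babip_no_sf(line["H"], line["HR"], line["AB"], line["K"])
mod_slg = modified_slg(line["TB"], line["AB"], line["K"])
print(f"BABIP {babip:.3f}, modified SLG {mod_slg:.3f}")
```

        For a group figure, one reasonable approach is to add up the counting stats across all pitchers first and then apply the functions once, rather than averaging each pitcher's individual rate.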

        I tried to pick two groups of players from a long span of time but all since integration, of varying career lengths and with (slightly) different levels of success, though they’ve all been successful pitchers. Each group has three Hall of Fame pitchers. I’ve also included one active pitcher in each group, so if you try this yourself with these exact guys next week, things might be different depending on how Fiers and Strasburg pitch, though those data will be so small as to be unlikely to have too much effect.

        Anywho, this is SOME evidence (bad evidence, admittedly, but I’m the one doing the research, and I don’t have access to a database with a ton of info and pitchers sorted into groups) that soft throwers might be MILDLY more effective on batted balls. I would point out, though, that the “finesse” pitchers get the bulk of their innings from pitchers on OUTSTANDING defensive teams (Maddux, Glavine, and Palmer make up the majority of their innings), while none of the power pitchers (as far as I’m aware) come from teams with particularly notable defense one way or the other. I’m willing to throw out the difference in BABIP… with this small of a sample, a difference of .002 is non-existent. The .012 difference in SLG is of more interest to me, but it would be interesting to see if, in a wider sample of data, the power pitchers really are out-performed by the soft tossers.

        I did not know how this would come out; I didn’t have to “rig the numbers.” This was done in genuine curiosity, so I hope it’s taken as such. I did not know what the conclusion would be; I mostly just hoped the innings pitched would come out fairly even, and was genuinely pleased when they were less than 2000 innings apart.

        Anyway, what this says to me is that, if the soft throwers are more effective on extra base hits, the effect is totally nullified by the number of strikeouts they’re missing out on, and therefore the number of extra balls that are put in play as a result. This group of power pitchers is, as a group, more effective than the finesse pitchers. As I demonstrated in my post above, striking out fewer means facing more; that means, while the RATES on balls in play will be the same or better for the crafty pitchers, the NUMBER of balls in play will be so much greater that they will experience less success overall, and thus will have lower WAR numbers.

        I don’t think WAR is prejudiced toward strikeouts; I think BASEBALL is prejudiced toward strikeouts. Pitchers who rack up more of them are more effective. This may lead to pitchers sometimes being erroneously credited as being more effective than they are if they strike out a lot but have control problems, but overall I think the difference is warranted and not the result of unfair prejudice.

        1. Bob Eno (epm)

          Great research, Doom. I appreciate your doing it. It makes your point very well. I think that to include my point, we’d need to add BB rates, which, if they are higher for power pitchers, will balance to some degree the increased number of BiP that finesse pitchers allow (and the increased number of batters faced).

          Here’s what I get as a per 9IP figure for each group (I’ve included both Ks and BBs):

          Finesse pitchers: 5.72 K/9; 2.54 BB/9
          Power pitchers: 8.74 K/9; 3.71 BB/9

          This implies that BiP out rates per 9IP would be:

          Finessers: 21.28
          Powerers: 18.26

          That’s a difference of about three BiP per game. If the BABIP for finessers is about .280 and SLGBIP about .446, then about 0.85 of those BiP add runners, and those runners generate about 1.35 bases per game. Meanwhile, the power pitchers add 1.27 runners/bases per game beyond the finessers via BB.

          Unlike our friend nsb, I’m no statistician, and you have good reason to know that’s true. But based on your hard work and calculations, and this added perspective — assuming, rashly, that it’s right and makes sense — I see the two groups you chose as coming out virtually identical in effectiveness.

          Normally, I’d let this sit and think it over, worried about making another obvious error in arguing with you over stats. But I have to go out and I’m deciding to throw it out there and let you do whatever you will with it. If I’ve gotten on the wrong track again, your finding will be confirmed and my clumsiness will be on view once again (have I no shame? — don’t answer). If I’m not wrong, perhaps you’ll think of additional arguments to swing the balance back to the power pitchers.

          One thing I’ll add (having now written the paragraph, there is, of course, more than one thing). It is certainly true that finesse pitchers rely on their fielders more than power pitchers. I don’t know whether that’s a problem: it’s what the fielders are there for; it’s the way baseball was originally set up. But it is true that they share the work with fielders more than power pitchers. When they take on the added work, power pitchers relieve their fielders (in Ryan’s case, it was said he bored them and weakened their readiness, but I don’t know whether that’s really true), but the value of taking this on themselves is measured, in large part, in terms of the K/BB ratio, not just K. If that ratio is lousy, they might do better to hand some of the work back to the fielders. There’s wide variance in this ratio within each of the two groups you selected — a guy like Strasburg, for example, has none of the weaknesses of power pitchers in this respect (although I believe he does have one high-K weakness you did not respond to: frequent arm injury, like Wood — it’s significant, and we’ve been seeing the Nats — and the Dodgers with Kershaw — pay a price for it in recent years). Palmer has none of the virtues of a finesse pitcher in this particular respect. (Strasburg’s ratio is 4.54, best of both groups; Palmer’s is 1.69, worst of both groups — two reversals of expectations). Palmer apparently made his HoF mark by letting his outstanding fielders do their work. I think that was a good choice. We really don’t know what he would have done on a different team, but it’s possible he would have pitched differently.

          1. Dr. Doom

            Bob,

            I’m going to issue a slight math correction. Hopefully, you’re not too shamed. You’ve made an error I myself have made in similar situations. Hopefully, I can explain; let me know if I’m not making sense. So, yes, the “power” group averaged 8.74 K/9 and the “finesse” group averaged 5.72 K/9. Because we’re using “per 9 innings” math, that means that some outs are accounted for. That means, not that finessers face 21.28 BiP and powerers face 18.26; rather, that’s how many OUTS they need to get after the strikeouts are accounted for. So that means they need to face enough batters that those batters create the remaining 21.28 outs. Assuming a BABIP of .280, that means that only 72% of BIP turn into outs. So we need to get 100/72*(outs on balls in play) to give us the ACTUAL balls in play number. For the finesse group, that’s 29.56 BIP. For the power pitchers, that’s 25.36 BIP, which is actually a difference of 4.2 batters, ALL of whom turn into baserunners (since we’ve now accounted for all 27 outs). That just gets us to the same number of outs (27).
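
            The outs-to-balls-in-play conversion above can be sketched in a few lines of Python. This is only an illustration under the same simplifying assumptions as the comment: every out that is not a strikeout is treated as a ball-in-play out, and 72% of balls in play become outs. The function and variable names are made up for the example.

```python
OUTS_PER_NINE = 27


def bip_per_nine(k_per_nine, bip_out_rate=0.72):
    """Balls in play needed per nine innings to record every non-strikeout out.

    Assumes all non-strikeout outs come on balls in play and that
    `bip_out_rate` of balls in play are turned into outs.
    """
    bip_outs_needed = OUTS_PER_NINE - k_per_nine
    return bip_outs_needed / bip_out_rate


finesse_bip = bip_per_nine(5.72)
power_bip = bip_per_nine(8.74)
gap = finesse_bip - power_bip

print(f"finesse: {finesse_bip:.2f} BiP per 9")
print(f"power:   {power_bip:.2f} BiP per 9")
print(f"gap:     {gap:.1f} extra BiP per 9 for the finesse group")
```

            The key point is the division by the out rate: subtracting strikeouts from 27 gives outs still needed, not batters faced, so the count has to be scaled up to cover the balls in play that fall for hits.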

            Now, those 4.2 extra batters (all of whom become baserunners) actually averaged 1.58 bases per hit (SLG/BA), which is 6.6 total bases… on hits, which means they can ALSO advance other runners. Yes, with the power pitchers, you had 1.27 extra walks, but that’s a lot less significant than 6.6 Total Bases, no? So the power pitchers, as a group, were significantly more effective.

            Another note on “soft contact,” which I didn’t mention before: in these groups of pitchers (again, small sample, so it may be nothing), the finesse pitchers averaged .750 HR/9; the power pitchers averaged .711 HR/9. Considering that they actually face the exact same number of batters in a game (29.56 BiP hitters, plus 2.54 BB, plus 5.72 K for finesse equals 37.82 batters/game; 25.36 BiP hitters, plus 3.71 BB, plus 8.74 K for power equals 37.81 batters/game), there’s really no out there: the balls in play just create more baserunners, all of whom can advance baserunners already there.
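
            The batters-per-game totals in the parenthetical can be checked with a short Python sketch. It reuses the assumptions already made in this exchange (27 outs per nine, a 72% out rate on balls in play, and batters counted as BiP plus walks plus strikeouts, ignoring things like hit batsmen); the function name is hypothetical.

```python
def batters_per_nine(k_per_nine, bb_per_nine, bip_out_rate=0.72):
    """Estimated batters faced per nine innings: BiP + BB + K.

    Balls in play are scaled up from the non-strikeout outs still needed,
    assuming `bip_out_rate` of balls in play become outs.
    """
    bip = (27 - k_per_nine) / bip_out_rate
    return bip + bb_per_nine + k_per_nine


finesse_bf = batters_per_nine(k_per_nine=5.72, bb_per_nine=2.54)
power_bf = batters_per_nine(k_per_nine=8.74, bb_per_nine=3.71)

print(f"finesse: {finesse_bf:.2f} batters per 9")
print(f"power:   {power_bf:.2f} batters per 9")
```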

            Finesse pitchers’ lives are harder because they’re relying more on BiP outs, which invariably and unquestionably leads to more hits. Strikeouts may take more pitches/AB, but they face fewer hitters overall. So while they may throw more pitches, they save themselves from facing 2.9 hitters/9 (that would be the 4.2 extra BiP batters for the finessers, minus the 1.3 extra walks the power pitchers subject themselves to).

            On the arm injury point, it’s a FABULOUS one. It may well be that pitching with finesse costs you more runs, but also earns you a longer career. The way managers treat bullpens now is probably a recognition of this trend: save the arms of the power pitchers by pitching fewer innings; use bullpens full of “expendable” arms, that can be even more effective but will flame out of the league in three years. Rinse and repeat. It’s highly effective for the managers; it’s a MASSIVE bummer for the players. Also, it makes for worse baseball, which is a bummer, too.

            On one other thing, I didn’t use K/BB ratio for this for a couple of reasons. For one, K/BB ratio continues to rise over time. Kershaw, Strasburg, and a bunch of other people are going to rank above Walter Johnson. This is simply because walks have held relatively steady over time, but K continues to rise. Plus, it’s not always a good way to sort players into these two groups. As mentioned, Nolan Ryan does not have a great K/BB ratio (basically 2:1), which is similar to Jim Palmer’s. It wasn’t a particularly useful skill in sorting players, because it tells us more about the era in which a player played than it does about the player’s skill relative to his contemporaries. Guys like Jon Lieber and Scott Baker are in the all-time top-30, not because of special skills, but as finesse pitchers who pitched in an era in which guys just hack at everything.

          2. Bob Eno (epm)

            Great reply, Doom. Lots of stuff here, and it’s all interesting. It looks as though I’m going to have to grant your general point about power pitchers, though in terms of the pWAR argument, I think Mike L captures the issue above by asking why similar results should be evaluated differently based on how the results were achieved.

            Once again, I’m just passing through — I’ll log on this evening to try to digest your ideas more fully and give you a better response.

          3. Bob Eno (epm)

            PS to Doom: When I was a teacher, I’d do the same thing you did: try to save face for students by saying I’d made the same error they did. Much of the time I was making that up (although I certainly committed my share or more), and I’m bearing that in mind in reading your reply — as soon as I made it past the intro sentences, I realized my error, just as in the Mordecai Brown case (except, I have to tell you now that time has passed, in that case I’d actually raced back into my house from walking in the woods because I’d suddenly realized I’d mucked up the “denominator” issue — too late: you were already there).

          4. Dr. Doom

            Haha, yes, the face-saving thing is sometimes a part. But I’ll also admit that juggling these things is difficult. I HAVE, in point of fact, made that error when I first learned about Bill James doing some era-adjustments in the New Bill James Historical Baseball Abstract. Sometimes, you have to adjust based on PAs, sometimes based on Outs… it was all very confusing and I made my share of errors when I first tried. Now, admittedly, it’s been over a decade, but I really DID mean it when I said I’d made the same (or a similar) error.

            As for the Mike L question, “Why should similar results be evaluated differently based on how the results were achieved?” my answer would be that you should check the Maddux/Ford research below. Yes, they achieved outstanding results on batted balls. But those results are fully explained by the teams for which they pitched. If Jim Palmer had been playing in front of the same defenses as Rick Reuschel, do you really believe that Palmer’s rates of getting outs would’ve remained the same? It’s not punishing him to say he had a good defense. In fact, you certainly CAN make the argument that he would’ve pitched very differently if he hadn’t had Mark Belanger behind him. I can buy that… but if I’m going to bet on it one way or the other, I’m more inclined to say that Palmer was lucky to be in the right place at the right time, rather than that he HAD the ability to be a great strikeout pitcher, but chose not to use it. We’re left with the idea that we try to pull a pitcher out of his context and adjust for what he would’ve been against average opponents, in an average ballpark, with average fielders behind him, but that he was otherwise the “same” player. In other words, the WAR framework assumes that “process” is stable and “results” are variable. You’re welcome to assume that the “results” are stable and it’s the “process” that varies, but that’s simply not how most analysts would think about it.

            (PS, super funny about the Mordecai Brown thing. That makes me actually LOL. I once had to run back into an exam in high school when I was thinking over the math problem in my head. I tore the paper from the teacher’s desk and redid the problem. Thankfully, I was not reprimanded, nor accused of cheating – but I think I was the only one done with the test at the time, and the only one who had left, so that helped.)

    2. no statistician but

      Some to me interesting facts on the pitching stat K/9:

      By my count, of the top 500 qualifying seasons for this stat, 352 have occurred in the new millennium, say 70%.

      Of those same 500 or the remaining 148, take your pick, one occurred in 1884 in the Union Association, which existed for that one year and is a major league INO—produced by a pitcher named Hugh Daly. Trivia mavens please take note.

      Of the remaining 499 (or 147) exactly 6 occurred prior to 1960. Of those six, just two made it past the magic strikeout per inning mark of 9.0, both produced by Herb Score in his 1955-56 seasons. Score was basically the beta version of Nolan Ryan, only his career was derailed by the infamous line drive off the bat of Gil McDougald (he’s everywhere, he’s everywhere), a convenient explanation that fails to include the reality that in 1959 Score came back for half a season close to his old level, but then had arm trouble.

      Of the 269 player seasons where the 9.0 mark has been matched or bettered (from 1955 onward, in other words) 192 have occurred from 2000-2018, again circa 70%.

      Prior to Score’s breakthrough, the record had been held from the beginning of time through 1945 by Rube Waddell—8.3889 in 1903. In 1946 both Hal Newhouser at 8.4567 and Bob Feller at 8.4345 outdid Waddell against the rusty returning bats and weak-swinging holdovers from WWII. In 1956 Toothpick Sam Jones also outdid Waddell and set a new NL record at 8.3958. Those four seasons plus the two by Score are the only legitimate ones to appear in the top 500 that happened pre-1960.

      Of the 39 player seasons of 11.0 or better K/9, 7 are happening as we speak, in 2018. Five occurred last year. Three the year before that. Looks to me like it might be 9 in 2019 if the progression holds.

      Satchel Paige, I think, is supposed to have waved his fielders into the dugout on occasion and then struck out the side. The way things are going, by 2025 all the better pitchers will have the option, at least, of following Satchel’s lead in this regard.

      1. Dr. Doom

        I, too, dislike the general trend of baseball; I think everyone does. There’s an interesting article from 5 years ago on Beyond the Box Score about why, if you’d like to dig into the numbers on it. Basically, it comes down to the fact that strikeouts are great for pitchers, and not so bad for batters. The batters are not disincentivized to avoid strikeouts (sorry about the triple negative there; “there’s no reason to avoid strikeouts,” is what I’m saying), but the pitchers are gunning for them more than ever. Eventually, as nsb and I discussed above, I think MLB is going to have to do something about it, unless the players decide to do something on their own… but that’s doubtful. Personally, I think the easiest solution is handle thickness on the bats, but we’ll see.

        1. Bob Eno (epm)

          That’s a really good article, Doom. (The comments help too.) I wonder whether, five years later, it continues to be true that it is the called-strike rate that is rising, rather than the swinging strike.

          One takeaway for me is that the rising K-rate is largely being generated by batters’ strategic approach, rather than by the increase in pitcher skills (although the 2013 article may not fully reflect the way pitching roles have continued to change to maximize velocity). If this is the case, then it may not be that there has been an increase in power pitching so much as an increase in power hitting — of course, the rise of HRs and “barrels” is no secret, and the relation to high K-rates has been recognized since Babe Ruth.

          Another point new to me, although it’s apparently old, is that BABIP are more volatile from season to season for pitchers than for batters. This works in the other direction, explaining pitcher motivation for Ks rather than batter toleration of Ks. (I wonder, though, whether it is, in fact, a reflection of the underlying structural balance of control between pitcher and batter in baseball as much as it is a reflection of differences in the volatility of physical condition relative to pitching/hitting skills — that is, pitcher bodies may be more subject to yearly fluctuation in performative efficiency than hitter bodies, perhaps because of the unusual focus of pitchers on one limb. . . . useless speculation.)

          So, according to this analysis, it’s primarily the hitters who are avoiding the unpredictability of BiP, preferring instead to go with the percentages among the TTO PAs. As players like Lindor and Ramírez on the Indians, small men playing traditionally low-SLG positions, are turned into short-pull-HR power hitters, with their low K-rates rising as a consequence, the pitchers they face are increasingly turned into power pitchers by default.

          One other thing that stood out for me was the example of exceptions: Maddux among pitchers, Pujols (as of 2013) among hitters. Maddux’s relatively (although not all that) stable BABIP is discussed as a rare example of a finesse pitcher whose control over BABIP outcomes was, perhaps, as great as hitters’, while Pujols is an example, perhaps a little more common, of a power hitter who did not need to trade increased Ks for high SLG. Those exceptions seem to me to be indicators of the fact that there are skill/performance thresholds above which a player can escape the normal structural dynamic of the pitcher/hitter balance of control.

          So to bring the conversation back to its beginnings: Was Whitey Ford a pitcher, like Maddux, who was able to exercise more control over BiP outcomes than batters? His BABIP stats appear to me to be more stable over seasons than Maddux’s. If so, then perhaps it is, in fact, the leverage of that control that accounts more fully for Ford’s success, rather than the talents of his fielders, something that pWAR would be unable to reflect.

          1. Dr. Doom

            The question of Whitey Ford and Greg Maddux, and whether what they were doing was a true skill or whether they happened to be the example of flipping a coin a million times and finding two sets of ten where it came up heads ten times consecutively is an interesting one.

            To study it, I would propose this. We look, not just at year-to-year correlation with the pitcher himself, but with his TEAM. In other words, yeah; these guys were better than others at controlling balls in play, and there’s a high year-to-year correlation. But here’s the question: were they outperforming their team, or were they just taking advantage, year after year, of favorable defense?

            (I’m writing this AS I do the research, so I’ll lay it out here.)

            We know how many innings a team pitches. If we multiply by 3, that gives us the total number of outs. From that number, we can subtract strikeouts and home runs, which are plays in which the fielders CAN’T make an out. That gives us OUTS on balls in play. We also know how many hits a team allows (though we’ll have to subtract HR, since those are already accounted for). Dividing the Outs by the Balls in Play (Hits+Outs) should give us a team efficiency; we can then check the individual pitcher’s efficiency to see what that tells us. Okay, here goes:

            Whitey Ford’s teams turned balls in play into outs 74.7% of the time overall; when Whitey Ford was pitching, balls in play were turned into outs 74.9% of the time – BETTER than his teams, but only by a hair.

            Greg Maddux’s teams turned balls in play into outs 72.1% of the time overall; when Greg Maddux was pitching, balls in play were turned into outs 72.7% of the time – again only slightly BETTER than his teams.

            Both Whitey Ford and Greg Maddux did, indeed, turn batted balls into outs at a higher rate than the average pitcher. However, both were extreme beneficiaries of good defensive fortune, as both were barely any better than their teammates at turning batted balls into outs.

            I don’t see anything here that would indicate to me that this is a real skill, at least as illustrated by these two particular pitchers.

          2. Dr. Doom

            I effed up the discussion of the process (the paragraph after the parenthetical). I was just re-reading and realized that I explained it wrong. Let me do it correctly here:

            1. Start with the total innings pitched. Multiply by 3. Subtract strikeouts. This is the total number of BiP outs.
            2. Start with the number of hits. Subtract the number of HR. This is the total number of BiP hits.
            3. Add BiP Hits + BiP Outs. This is the BiP Total.
            4. Divide BiP Outs by BiP Total. This is the Defensive Efficiency.
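
            The four steps above translate directly into a short Python sketch. The function name and the sample stat line are hypothetical (made-up numbers, not Ford’s or Maddux’s), and innings are assumed to be passed as a true fraction (262⅓ as 262 + 1/3) rather than in box-score notation such as 262.1.

```python
def defensive_efficiency(innings, strikeouts, hits, home_runs):
    """Share of balls in play turned into outs, following the four steps."""
    bip_outs = innings * 3 - strikeouts   # step 1: outs not recorded by strikeout
    bip_hits = hits - home_runs           # step 2: hits that stayed in the park
    bip_total = bip_hits + bip_outs       # step 3: all balls in play
    return bip_outs / bip_total           # step 4: defensive efficiency


# Hypothetical season line for one pitcher (made-up numbers).
pitcher_eff = defensive_efficiency(innings=200, strikeouts=150, hits=180, home_runs=20)

# Hypothetical line for that pitcher's whole team over the same season.
team_eff = defensive_efficiency(innings=1450, strikeouts=1100, hits=1400, home_runs=150)

print(f"pitcher {pitcher_eff:.3f} vs team {team_eff:.3f}")
```

            Running the same function on the pitcher’s line and on the team’s line gives the two numbers being compared; the team line here includes the pitcher’s own innings, which is one possible way to set up the comparison.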

            Again, for the Yankees, 1950 and 1953-1967, DefEff is .747, and it was .749 for Whitey Ford in his career.
            Greg Maddux’s assorted teams were .721 in the years Maddux pitched for them; Mad Dog himself .727.

            To me, this just goes to show players who had totally expected BiP results, not players who had a special ability to control what happened on balls in play, or to induce soft contact. Debiting them somewhat for those defenses makes total sense to me, if those defenses are indeed performing exactly the same, irrespective of who’s on the mound.

          3. Dr. Doom

            I got to thinking about Maddux and Ford some more, and wanted to check something out. In the interest of open discussion (and unbiased data), I thought I should share this.

            Perhaps one might argue, “Well, SURE, over the course of their careers, these two players regressed to average, but SURELY they were better in their prime.”

            In Maddux’s prime, 1991-1998, he was consistently better than his team, beating them every year. (That’s actually true for 1988-1990, as well, but the biggest contrast comes when we start in 1991; there’s no reason to winnow down to 1994-1998, as he beats the team by the same margin whether we include 1991-1993 or not.) We would expect a .723 DefEff; Maddux got .743. That’s pretty substantial – a difference of 2% of BiP. Of course, he fell off a cliff in 1999 and was worse than his team every year thereafter, so either he lost that skill or it was an illusion of random variation, and he was neither as much better than his team as a young player, nor as much worse than his team as an old one. Either way.

            Ford also has a period (albeit a much shorter one) in which he consistently beats his team at creating BiP outs. However, in his case, it’s only the period 1954-56, and it’s really only in ’55 that he laps the team (.788-.762). The rest of his career, he does a lot of alternating. Better in 1950, worse in ’53. Worse in ’57, better in ’58, worse in ’59, better in ’60, worse from ’61-66, better in his finale in ’67.

            The thing about these data is this: I’m not sure if the phenomenon of a pitching prime in which batted ball results are better than what you’d expect from an average pitcher on this team is something unique to precision pitchers like Ford and Maddux, or if that’s something universally true about ALL pitchers in their prime. I would suppose that it’s true for power pitchers, too, but it’s WAY too much work to research that. Seriously; I do a lot of these things as a mental break during work, so these research-heavy comments take a long time! Anyway, if you wanted to argue that Greg Maddux, in his prime, was consistently better than his teams at inducing batted ball outs, I can buy it… with the rather large caveats that: 1. that might also be true for Roger Clemens or Nolan Ryan or Randy Johnson in their prime; 2. it still COULD be a fluke; 3. he magically lost that skill suddenly and completely in 1999, and 4. Whitey Ford absolutely did NOT possess this skill.

          4. Bob Eno (epm)

            Doom, I still have some stats I want to find time to calculate in response to your posts here, but I have to say that you’re clearly winning this argument, and doing it with convincing calculations, rather than rhetoric. I don’t think I mind that Ford may not benefit from the verdict of sabermetric injustice that Mike, nsb (a little indirectly), and I were advocating for — as you know, I’m a Brooklyn/Koufax fan and arguing on the side of a Yankee finesse pitcher from the era I most care about is not a natural stance for me. But I really do much prefer winning arguments to losing them.

          5. Dr. Doom

            “But I really do much prefer winning arguments to losing them.” Haha, likewise. And thanks for your kind words; I appreciate it. I’m curious what you were thinking about calculating; I might like to try something myself.

          6. Bob Eno (epm)

            I had several ideas, Doom, but the main one was to check out Ford’s BABIP stats relative to strength-of-schedule issues. As you know, Stengel pitched Ford disproportionately against strong opponents. Although strength of schedule is part of the pWAR calculus, I thought perhaps some of the inconsistency of Ford’s BABIP performance vs. the Yankee staff as a whole might be revealed on closer analysis of that parameter. I also thought I’d do a game-log analysis of Yankee error rates when Ford was pitching, at least for some selected years, but that’s quite a time investment, and I wanted to spend some time exploring volatility in BABIP among individual pitchers, an issue that this string has made very interesting to me. There are other fishing expedition ideas I had for finding some anomalies that might break into your argument, like WPA surprises that I might find and then follow up via game logs — when you get as far down in an argument as I am in this one, it’s tempting to try anything. The fact that I’ve been sitting with a blank Excel sheet open for a day now suggests I don’t really have much expectation that my efforts will do more than confirm your calculations. (But it’s also true that, uncharacteristically, I’m elsewhere working on a deadline item — probably my last professional one — and shouldn’t be wading deep into HHS weeds until it’s done.)

          7. Dr. Doom

            Wow… those are some VERY weedy issues. I will leave you to it. I think that you’re getting to the point where you’re desperate to prove SOME part of your point. But I think, again, that we’re getting to a point where we’re forgetting: I think Whitey Ford is an excellent pitcher, too. I just don’t think WAR is mis-assessing his value – or that, if it is, it’s by an amount that doesn’t concern me.

          8. Bob Eno (epm)

            Not desperate enough to actually try, Doom. And if I do try, it’ll be as a relaxation after my deadline work is done.

            Anyway, there’s more to it than just trying to salvage an argument. The BABIP issue and the relative degrees of control between batter and pitcher that you raised via a link to an article online are really interesting to me. (You remember: it’s a facet of the luck/chance issue I’m so dogmatic about.) Ford is a way to explore a little further and get the feel of those data (I’ve never paid that much attention to them before, and being told the results doesn’t mean I grasp what they signify). If I’m lucky, I’ll wind up dragging out my “Stats for Dummies” and learning to perform a few simple functions new to me in order to work on the issue of BABIP consistency/randomness.

            Anyway, win or lose, this was a good string for me!

          9. Dr. Doom

            Not exhaustive, but I checked BABIPs for the 1952, 1955, and 1958 seasons.

            In 1952, the top three scoring (non-Yankee) teams had BABIPs of .282, .278, .269. The bottom three teams had BABIPs of .276, .262, and .258.

            In 1955, the top three scoring (non-Yankee) teams had BABIPs of .279, .285, .285. The bottom three teams had BABIPs of .283, .271, and .271.

            In 1958, the top three scoring (non-Yankee) teams had BABIPs of .279, .278, .283. The bottom three teams had BABIPs of .278, .261, and .263.

            1955 seems to be the outlier of the three (randomly-chosen) seasons, in that the differences in BABIP were incredibly consistent among the teams. The Yankees themselves were the biggest outliers at .269. In the other seasons, there IS a difference. I would note that, to be complete, we would have to check a couple things, such as whether these games were at home or on the road (BOS always shows up near the top; one assumes that has SOMETHING to do with Fenway, and unless Ford is starting way out of the ordinary at Fenway, I’m not sure how much it affects him).
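
            One way to make that comparison a little more concrete is to average each group and look at the gap. The sketch below is only an illustration: it uses the nine BABIP figures quoted above and nothing else, and the dictionary layout and the helper name `group_gap` are hypothetical choices made for this example.

```python
# BABIP figures quoted above, Yankees excluded:
# year -> (top three scoring teams, bottom three scoring teams)
babip = {
    1952: ([.282, .278, .269], [.276, .262, .258]),
    1955: ([.279, .285, .285], [.283, .271, .271]),
    1958: ([.279, .278, .283], [.278, .261, .263]),
}


def group_gap(top, bottom):
    """Mean BABIP of the top-scoring group minus the mean of the bottom group."""
    return sum(top) / len(top) - sum(bottom) / len(bottom)


gaps = {year: group_gap(top, bottom) for year, (top, bottom) in babip.items()}

for year, gap in gaps.items():
    print(f"{year}: top-3 minus bottom-3 BABIP = {gap:+.3f}")
```

            On these numbers the gap is positive in all three seasons and smallest in 1955, which fits the reading above. Three seasons and six teams per season is a tiny sample, so this is a sanity check rather than a finding.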

            Anyway, here’s what I’d say: there DOES seem to be a difference. HOWEVER, bb-ref WAR actually makes an adjustment for strength of schedule in its WAR calculation. Everything we’re doing here is, in my opinion, questioning/trying to verify those things that baseball-reference is already doing. That’s one of the strengths of WAR as opposed to just ERA+: it HELPS Ford by making that adjustment. The thing those who struggle with Ford’s ranking don’t like is that Ford gets that credit, BUT he gets docked for the strength of his defense and ballpark.

            Finally, I do want to respond to the point made (several times) in this thread that pitchers weren’t trying to do certain things in certain eras. Yes; this is undoubtedly true. The thing I would say about that is #1, you’re not required to compare pitchers across eras. #2, while they may have been using sub-optimal strategies to procure outs, and while EVERYONE may have been doing the same, they still would’ve been better off HAD they engaged those optimal strategies. #3, while it’s true that strikeout rates were lower in earlier eras, the strikeout rates for the top pitchers relative to their league are almost always the same. No, some of the great deadball pitchers didn’t strike out as many per 9 as guys today. But you know what? The best pitchers were still, consistently, near the top of the strikeouts per 9 listing. That’s totally consistent throughout league history. So I don’t think it’s as big of a disadvantage as people are seeing it as.

            BTW, I do want to respond to Mike L’s comment below regarding The Other Yankees in the ’50s, but I have to find the time to do what I want to, and I’m not sure when that’ll be. So don’t hold your breath, but know that I have an idea. The one for this post just came to me like an hour ago, so I wanted to do it quickly; that one’s a little more labor intensive.

          10. Mike L

            Doom, one very quick note to your very comprehensive comment: You said “The thing I would say about that is #1, you’re not required to compare pitchers across eras.”
            But we do compare pitchers across eras, and that’s exactly what WAR is all about for many people–an all-encompassing single number to be used both in-era and across eras. It’s the measurement used by JAWS and I suspect, if we return to our Circle of Greats discussions, part of every thread.

          11. Dr. Doom

            Yeah, WAR can compare pitchers across eras. But WAR is measuring each season in its context. So while WAR can be used across eras, it’s not actually “punishing” pitchers relative to their peers. I didn’t mean WAR – I meant directly comparing K rates.

    3. Mike L

      Just for the hell of it, I decided to look at other pitchers who were fellow rotation-mates with Ford. Allie Reynolds pitched 8 years for the Yankees after coming over from Cleveland. 131-60, ERA of 3.30, ERA+ of 115, FIP of 3.64, total WAR 19.6. Two years jump out: 1951, 3.06 ERA, 126 ERA+, FIP 3.47, fewest hits per nine, led league in shutouts, and WAR of 3.7 (also 3rd in MVP voting) and 1952, league-leading 2.06 ERA, 161 ERA+, FIP 2.89, led league in K’s (with 160!) and 4.7 WAR.

  10. Voomo Zanzibar

    An historical season in progress for a Denver player…
    Lowest ERA in a qualifying season for a Rockie:

    2.88 … Ubaldo
    2.90 … Kyle Freeland (2018…)

    3.47 … Ubaldo
    3.47 … Jhoulys
    3.49 … Jorge de la Rosa
    3.62 … Jhoulys
    _______________

    Highest WAR:

    7.5 … Ubaldo
    6.9 … Freeland (2018…)

    5.9 … Pedro Astacio
    5.7 … Jhoulys
    5.6 … Joe Kennedy
    ___________________

  11. no statistician but

    A two unrelated remarks:

    On baseball bats: is it really true that maple bats have completely replaced ash? If so, how has that in itself affected batters going for the long ball?

    On the Ford issue—which is really the issue of how reliable pWAR is in assessing pitchers from baseball’s middle ages (1893-1920 or so), renaissance (1920-1968 or so), and modern era (1968-1995 or so), as opposed to the post modern era of sabermetrics and TTO: Currently, as Doom remarks, most pitchers are on board the strikeout ship as the ideal craft, but that was hardly the case for most of the pitchers in previous eras. I doubt strongly that the idea crossed many of their minds—that wasn’t how the game was approached. Hence, I think a measurement that posits the currently reigning ideal of the 81-pitch strikeout pitcher as the ideal for those eras as well is bound not to give a wholly accurate assessment of pitcher worth, because its presumption is based on a fallacious premise, that one size fits all, that all pitchers in all eras have desired the same thing and bent their pitching skills to the same end. Take one famous game as an example, the 26-inning Cadore-Oeschger 1-1 tie in 1920. One hundred eighty-six batters came to the plate. Fourteen struck out. The game lasted three hours and fifty minutes, or roughly an hour and fifteen minutes for every nine innings. The ball was put in play 153 times. There were four errors. If Cadore or Oeschger had strikeouts on their minds as the most effective tool, then they surely failed the test, since each K’d the opposition just seven times, a rate of 2.43 (which parenthetically was not that far below the NL league average that year of 2.9). But of course they didn’t have strikeouts on the brain, and therefore to assess the contest or the league or its pitchers with a 27 strikeout/9 innings measure is laughably naive.
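
    The per-nine figures for that game are simple rescalings of the totals quoted above. The sketch below is a minimal illustration using only those totals (26 innings, seven strikeouts per starter, three hours and fifty minutes); the helper name `per_nine` is a hypothetical choice for this example.

```python
def per_nine(count, innings):
    """Scale a counting total to a nine-inning rate."""
    return count * 9 / innings


INNINGS = 26
K_PER_PITCHER = 7            # strikeouts by each starter, as stated above
GAME_MINUTES = 3 * 60 + 50   # three hours and fifty minutes

k9 = per_nine(K_PER_PITCHER, INNINGS)
minutes_per_nine = per_nine(GAME_MINUTES, INNINGS)

print(f"K/9 for each pitcher: {k9:.2f}")
print(f"Minutes per nine innings: {minutes_per_nine:.0f}")
```

    This works out to about 2.4 strikeouts per nine for each pitcher and a pace of roughly 80 minutes per nine innings, in the same neighborhood as the figures quoted above.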

    1. no statistician but

      As in “A one, and a two . . .” Non-baseball question for old timers: Who was famous for using this phrase in the 1950s?

        1. no statistician but

          “Somebody turn off the bubble machine.”

          A line, actually, from the Stan Freberg parody of the Lawrence Welk show. In those bad old days of limited telecast, out in the central Illinois boonies we suffered the choice of either Lawrence Welk or the Gale Storm sitcom (had to look this up). Kitsch and treacle. My parents preferred Welk. My brother and I generally found something else to do.

          1. Mike L

            Gale Storm? My Little Margie? As for Lawrence Welk, the correct pronunciation, according to my grandmother, a huge fan, was Lahryrince Velk.

    2. Bob Eno (epm)

      nsb, You and I are on the same side in this argument. I think your major point here is, as usual, correct. But I think you’re overstating the case.

      There were always power pitchers with 250+ K seasons who served as alternative models: Rube Waddell, Walter Johnson, Dazzy Vance, Lefty Grove, Bob Feller, Herb Score, far too many to mention from 1961 on. That other pitchers didn’t follow their example was likely due to three factors: (1) just as you say, the basic concept of the way baseball was to be played made Ks exceptional and BiP the norm; (2) very few players could carry off the high-K power-pitcher act because they did not grow up and go through training levels building their arms for it — only a few physiologically gifted men could actually pull it off; (3) what had to be pulled off was not just high-K pitching, but high-K complete games. Cadore and Oeschger are famous for their great game, but they were mediocrities. If they’d had Ks on their minds, it might have been that seven in 26-innings was the best either could do.

      I think Score renewed the ideal of the power-pitcher in the mid-’50s, as did Drysdale and Koufax in ’59-’62 (I think Jones did not function much as a model, perhaps because his BB totals were so high), but, as I see it, it was the enlargement of the strike zone in ’63 that permanently changed the way power pitching was viewed, even though that rule was revoked in ’69. From 1963 on, Koufax’s celebrity in particular was an inducement to strong-arm pitchers to try to travel his route, and what was formerly seen as a rare and nature-based exception, became more of an ideal to be realistically pursued through training.

      In terms of pWAR, while I think Ford and others like him were doing exactly what they should have, given their talents and the talents of their teammates, and that their approach fit the norms of their era, there were virtually always power pitchers around doing their thing, and pWAR needs to use a single scale to measure accomplishments, so that a top power pitcher — now following Doom’s calculations, which I have to let supersede mine — is in fact accomplishing a greater portion of his team’s winning work than a similarly positioned finesse pitcher. That’s not a matter of measuring against the wrong ideal or of naivete; it’s just a matter of measuring value in constant terms. The way I approached this before Doom’s arguments, I was convinced constant value terms indicated value equivalence between pitching types and a pWAR injustice; I prefer that, but the math seems to prefer something else. (If further, better calculations switch the math outcomes, I’ll switch positions again: the integrity of the flip-flopper.)

      A quick added note on Score: As I recall it, Score himself always claimed that his lack of success after the ’57 line drive was due to a torn tendon suffered in spring ’58, which caused him to change his motion when he, too eagerly, returned to action prematurely. The new motion caused the permanent damage that ultimately ended his career. While he had a few fine games in the first few weeks of the ’59 season and a couple of promising games a bit later, it was not a return to his old pitching form, and no injury interrupted a succession of strong starts. My own conclusion when I explored this issue in the pre-internet era was that Score was deeply committed to denying that the impact of McDougald’s hit was something he did not fully overcome, and that the arm injury of early ’58 was, despite his insistence to the contrary, probably the result of changes to his mound stance and delivery as a result of the trauma he’d undergone the previous May — that damage was huge: I recall newspaper reports that doctors were not optimistic they could even save his eye. The further, conscious change to his motion that Score made after the tendon began to heal may have made things worse, but I think by mid-’58 Score’s career end was essentially determined, and that, indeed, the root cause was McDougald’s drive. (But I don’t know why anyone would think I had special insight on this; I don’t.)

  12. Dr. Doom

    Mike L, way up the thread, posed a question about how the various WAR-systems are accounting for shifts. For a while, baseball-reference was just throwing away the data on shifts. It came about because of Brett Lawrie, if I recall correctly. It was, and remains, the single most significant accomplishment of Brett Lawrie, who was a VERY touted prospect in the Brewers’ system I was initially sorry to see go. Well… I’m not saying Shaun Marcum was such a great pickup, but Lawrie was really no loss.

    Anyway, at one point, maybe a third of the way through a season (2012, maybe? I think we were still in our first seminary apartment at the time, so I think that’s right), Lawrie was leading the league in WAR with a totally DAZZLING number. Like, he was crushing people. And it was largely in his defensive WAR number. People were VERY skeptical. Suddenly, bb-ref pulled out the data for the shifts, and voila – order was restored. Now, there was a lot less shifting going on in 2012 than there is now, so I don’t know how tenable the idea of “just throw it out” really was/is, but that was the story once upon a time.

    For those interested in some even crunchier numbers, Tom Tango is currently running two AMAZING series on his blog. One of them is about positional adjustment, which is pretty great, but unrelated to this post; still, I’d encourage you to check it out. The other series currently running is about looking at fielder roles and how they might be looked at in a new way. You all might find it interesting. Personally, Tango has shaped SO much of how I think about baseball that it’s hard for me NOT to love the stuff he does, but I think this is a good and interesting way to look at stuff; were it to be adopted by one of the WAR systems, we’d have MUCH better defensive understanding.

    1. Bob Eno (epm)

      To add to Doom’s references, FiveThirtyEight.com has a new article titled, “Baseball Positions Are Starting To Lose Their Meaning” that picks up a related issue: the trend of positioning players not by fielding position skills but in order to maximize lineup strength. On the shift itself, The Fielding Bible III (2012) has several good articles (one of the people commenting on Tango’s blog apparently wrote one of them, though I’m not spotting his name in the book) — Tango’s field diagram looks like a simplified version of many in that book, though its purpose is different. (Just to make things easy, here’s a link to the latest of Tango’s series on positional adjustment — I didn’t know about the series before Doom’s post).

      I’ve talked before about The Fielding Bible series. These are big books of articles and annual (or multi-year) stats produced by Baseball Info Solutions, a company that earns its keep by compiling enormous amounts of micro-data on fielding (generated by banks of video observers for every game — the data is new) and selling its data and conclusions to ball clubs (BIS is responsible for the Rdrs and Rtot data in B-R’s fielding stats). After the data dates beyond current usefulness, they earn a few more bucks by publishing it in these volumes. Bill James has written contributions to the first three volumes. I think it’s terrific stuff — but not so terrific that I buy the books new: I wait till used copies are available cheap (under $10 with shipping) on Amazon. The comment on the Tango blog led me to check whether vol. IV (2015) was now available used and it is (I’m looking forward to it coming on Saturday). The “new” volume will be the first with data compiled during the “StatCast Era” and I’ll be particularly interested to see whether it reports any methodological changes due to the influence of that new technology (which Tango uses). (I wonder whether BIS will survive much longer as a business when teams now receive StatCast data, I presume, without charge.)

      I apologize for this plug for BIS stuff if you’re all already familiar with it.

  13. Doug Post author

    Jose Bautista made his Phillies’ debut yesterday. Since the three division alignment began in 1994, he becomes the first NL player to appear for three teams from the same division in the same season. Kelly Johnson (2014) is the only player to do it in the AL.

    1. Paul E

      Doug,
      The Phillies need OF’ers that can hit…play, etc….That’s why they signed 1B Carlos Santana in the off-season for 3 yrs/$ 60 M???
      They should have traded for Yelich and kept Hoskins on 1B. No kidding, huh?

      What Jose Bautista is doing in a major league uniform at this point in his career is anybody’s guess. I imagine the Mets and Phillies had their reasons – just tough for us laymen to figure out.

  14. Doug Post author

    For those interested, you can download a spreadsheet containing Dr. Doom’s WAR metric from his previous post and his new Wins metric introduced here. Details can be found in the last paragraph of the post.

  15. no statistician but

    I’d like to turn the following remarks on Whitey Ford in a different direction, although some statistics as well as some assumptions are necessarily part of the construct.

    The most basic assumption, to take those first, is that the New York Yankees from 1949 through 1964 were the dominant team in the American League. The second is that they dominated that league and baseball generally in a way that hasn’t been seen before or since, matched in professional sports only by the Boston Celtics’ long run from the mid-1950s through the late 1960s in pro basketball.

    The third is that the team, evolving slowly over the sixteen years in question, was never a powerhouse team on the order of, say, the Cubs of the “ought” years, or the Yankees of the late thirties. It produced no seasons approaching that of the Yankees of 1927 or 1998, to name just two. The fourth is that its persistent dominance, despite the presence early of DiMaggio in his decline and Rizzuto at his peak, and Maris’s brief ascendence later on, was carried by the long, overlapping careers of Berra and Mantle, plus those of a number of ten-year-wonder players, notably Gil McDougald, Hank Bauer, Elston Howard, and Bill Skowron.

    The fifth is that Yankee starting pitching, while often the best in the league from a team standpoint, was done by a shifting cast of characters starting with the older triumvirate of Reynolds, Raschi, and Lopat as anchors through 1953, then Bob Turley for a few years, and finally ending with Ralph Terry. Otherwise Yankee starters tended to have a couple of good years (Tommy Byrne, Don Larsen, Tom Sturdivant, Art Ditmar, Jim Bouton, Bill Stafford) followed by a precipitous fade from view. The exception to the rule is, of course, Whitey Ford. In fact it is Berra, Mantle, AND Ford who did the major lifting and loading for the dynasty.

    In 1950 Ford was the best pitcher on the team by most measures, although his late call-up put him behind the big three in WAR. In 1953 he began a run of dominance that lasted through 1965, in which he led the team’s starters in WAR every year except 1957, when he was injured and pitched only 129 innings, 1959, when Art Ditmar edged him in the stat by .1, and 1965, when he was second to Mel Stottlemyre. The two, in fact, led all Yankees in WAR that year of the team’s fall from power.

    It should be noted parenthetically that outstanding relief pitching marked many of the teams during this span, but the relief staff was even more transitory in tenure than the starters in terms of prominence. Only Johnny Sain and Ryne Duren led the team in saves even two times.

    I urge you, while perusing the following, to keep in mind that the team stats are those of a dominant dynasty, one that was nearly a perennial pennant winner. I’ll be looking at two things, W-L percentage vs teams with winning records, and ERA in road games.

    1950: Team % .576/Ford 1.000; Team ERA 4.37/Ford 2.85
    1953: .576/.750; 3.59/3.20
    1954: .571/.700; 3.43/2.73
    1955: .528/.455; 3.72/3.96
    1956: .557/.706; 3.83/2.63
    1957: .662/.500; 3.47/3.38
    1958: .602/.615; 3.02/1.27
    1959: .455/.333; 3.89/2.67
    1960: .568/.455; 4.15/3.66
    1961: .574/.818; 4.18/3.84
    1962: .583/.556; 4.25/2.99
    1963: .604/.667; 3.52/2.56
    1964: .569/.714; 3.10/2.29
    1965: .378/.500; 3.40/3.70

    Totals: .555/.630; 3.71/2.94

    I’ve fastened onto these particular stats for definite reasons; first to compare how good the Yankees were against winning teams versus how good Ford was in producing wins against them. Pretty darn good, as it turns out. In 9 of 14 seasons he bests the team, and overall he bests the various teams at the rate of .630 to .555. This, of course, fails to consider that, without Ford’s superior percentage included, the team’s would be a few points lower. It may also be needless to point out that the other Yankee pitchers were usually pitching much worse against the teams they had to beat than Ford was.

    On the subject of ERA in road games, the question again is how much difference existed, not in Ford’s performance home vs. road, but in his performance vs his own team on the road. One way of looking at this might be to go to home-road W-L %. Ford owns the highest W-L percentage of any post-1900 pitcher with a normal career at .690 (Spud Chandler’s 1400+ innings shouldn’t cloud the issue. Ford’s the man on this one). Through 1965, Ford’s last as a full-time player, his record was even better: .705, .695 on the road. I haven’t calculated the Yankees’ road W-L % during the Ford years, but since it seldom exceeded .600, the point is moot. Ford’s is far superior. And anyway, to calculate how much Ford really exceeded the team on the road I prefer the comparison above of team ERA vs. his.

    Only in two seasons did Whitey’s away-game ERA fail to beat the team’s, 1955 and 1965, and the margins weren’t large, .26 and .30. In five separate seasons Ford’s away ERA was over a point below that of the Yankees as a team. Seven times it was over half a run better. Career-wise it was .77, three quarters of a run per game, better than the team’s. His home ERA of 2.58 was lower, true, but so was the team’s at home.
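
    For anyone who wants to re-run these comparisons, the season lines above can be tallied directly. This is a minimal sketch that only re-reads the table as posted (team vs. Ford W-L% against winning teams, then team vs. Ford road ERA); the tuple layout and variable names are hypothetical choices for this example, and nothing here is pulled from an outside source.

```python
# (year, team W-L% vs winning teams, Ford W-L% vs winning teams,
#  team road ERA, Ford road ERA), copied from the table above
seasons = [
    (1950, .576, 1.000, 4.37, 2.85),
    (1953, .576, .750, 3.59, 3.20),
    (1954, .571, .700, 3.43, 2.73),
    (1955, .528, .455, 3.72, 3.96),
    (1956, .557, .706, 3.83, 2.63),
    (1957, .662, .500, 3.47, 3.38),
    (1958, .602, .615, 3.02, 1.27),
    (1959, .455, .333, 3.89, 2.67),
    (1960, .568, .455, 4.15, 3.66),
    (1961, .574, .818, 4.18, 3.84),
    (1962, .583, .556, 4.25, 2.99),
    (1963, .604, .667, 3.52, 2.56),
    (1964, .569, .714, 3.10, 2.29),
    (1965, .378, .500, 3.40, 3.70),
]

# Seasons in which Ford's W-L% against winning teams beat the team's.
wl_better = [yr for yr, team_pct, ford_pct, _, _ in seasons if ford_pct > team_pct]

# Seasons in which Ford's road ERA was worse (higher) than the team's.
era_worse = [yr for yr, _, _, team_era, ford_era in seasons if ford_era > team_era]

# Seasons in which Ford's road ERA was more than a full run better.
over_a_run = [yr for yr, _, _, team_era, ford_era in seasons if team_era - ford_era > 1.0]

# Career gap from the Totals line (team 3.71, Ford 2.94).
career_gap = round(3.71 - 2.94, 2)

print(f"W-L% better than team: {len(wl_better)} of {len(seasons)} seasons")
print(f"Road ERA worse than team: {era_worse}")
print(f"Road ERA more than a run better: {over_a_run}")
print(f"Career road ERA gap: {career_gap}")
```

    The same list can be extended with other splits, such as home ERA, if someone wants to check the home/road question more fully.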

    What’s my point? Ford wasn’t just some fairly good, slightly above journeyman level pitcher who happened onto an easy berth that swept his pitching to unsupportable statistical heights, in no small measure because he was a lefty throwing in Yankee Stadium. He excelled far above the dominating team he played for, not simply in beating the lesser competition, but in beating the better competition at a rate far beyond expectation. He didn’t depend on the friendly confines of the Stadium for his success, but carved out one of the most impressive records in history playing on the enemy’s home turf. True, he had the game’s most enduring dynasty to back his efforts, but it backed the efforts of his fellow pitchers over time with far less successful results. WAR pretty much disdains Yankee pitching generally during this era, but even by its measure Ford was the best of the lot year after year after year.

    There’s something to be said for being the best pitcher by far on the longest running baseball dynasty.

    Ford wasn’t too shabby in the World Series either.

    1. Richard Chester

      Here’s a couple of old posts of mine about Ford. Of Ford’s 438 career starts 203 or 46.3% were against teams with greater than a .500 winning percentage. The Yankees were in the first division for 88% of his career starts, meaning that 43% of the teams (not counting the Yankees) during the 8-team league years were above .500 and 44% of the teams (not counting the Yankees) during the 10-team years were above .500. So it looks like he really was not held back for the better teams that much. He was held back from games at Fenway Park. Of his 43 starts against the Red Sox only 12 were at Fenway. His ERA there was 6.16 and his WHIP was 1.848.

      For 1950 and 1953-1960 Ford started 116 games against teams with a .500 or more percentage and 121 against those below .500. During that time span there were 38 teams at .500 or higher and 34 below .500. The Yankees were always above .500 so that left Ford pitching against 29 teams of .500 or higher. Percentage of teams (except Yankees) at .500 or more = 29/63 = .460 and Ford’s percentage of starts against those teams = 116/237 = .489.
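
      The arithmetic in that last paragraph can be laid out step by step. This is just a sketch of the calculation as described above, using only the counts given there; the variable names are hypothetical.

```python
SEASONS = 9             # 1950 and 1953-1960
TEAMS_PER_SEASON = 8    # eight-team league in those years

team_seasons_at_500 = 38     # team-seasons at .500 or higher, Yankees included
team_seasons_below_500 = 34

# Sanity check: the two groups should account for every team-season.
assert team_seasons_at_500 + team_seasons_below_500 == SEASONS * TEAMS_PER_SEASON

# Remove the Yankees, who were above .500 in every one of these seasons.
opponents_at_500 = team_seasons_at_500 - SEASONS      # 29
opponents_total = SEASONS * (TEAMS_PER_SEASON - 1)    # 63

ford_starts_vs_500 = 116
ford_starts_vs_sub_500 = 121
ford_starts = ford_starts_vs_500 + ford_starts_vs_sub_500   # 237

share_of_opponents = opponents_at_500 / opponents_total
share_of_starts = ford_starts_vs_500 / ford_starts

print(f"Opponents at .500 or better: {share_of_opponents:.3f}")
print(f"Ford starts against them:    {share_of_starts:.3f}")
```

      The difference between the two shares, about three percentage points, is the size of the tilt toward stronger opponents in those seasons.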

      And I do believe that he holds the record for most IP in a season without having a runner steal a base off him.

    2. Doug

      The other professional team with a similar dynasty was the Montreal Canadiens, with 15 championships in 24 seasons (1956-79), including 10 in 15 years (1965-79). The first 7 required wins in two post-season series, and the next 8 needed three.

      1. no statistician but

        Is hockey a sport? Thought about the Canadiens, actually, but the NHL was only a six team, one division league through about 1967, and I was thinking the Maple Leafs were pretty strong in those days, too. It’s hard for me to take seriously a sport played on ice with half its US teams in climes that never see snow.

    3. Dr. Doom

      nsb,

      “Aught,” not “ought,” in this context. Not to be a pedant, but your scare quotes indicated to me that you really weren’t sure about the word, so I figured, “Why not let him know?”

      Furthermore, there are some very specious arguments I’d like to point out here.

      I’m not sure I understand the point of any of your first five points. Yes, the Yankees were dominant. Okay. I’m not sure what that has to do with Whitey Ford. He played for them, yes, but that doesn’t make him one of the best pitchers of all-time any more than it makes Moose Skowron or Bob Turley an inner-circle Hall of Famer. What matters is the player’s performance. Andy Pettitte was not in the Hall the last I checked, in spite of his overlap with perhaps the most fabulous period of dominance of any team in the free agent era. Jorge was roundly rejected by voters, as well. Team performance is great, but individuals don’t have to be Great for a team to be. Cumulative performance makes for team greatness; the assessment of players as individuals is a different matter.

      Ford consistently outperformed his teammates here, that’s true… but does that tell us that much when we’re not comparing Ford to other players of his caliber? What was the difference between Randy Johnson (Roger Clemens, Tom Seaver, Bert Blyleven, whoever) and his teammates against teams with winning records? I’d bet he 1.) outperformed his team most every year; 2.) got a greater percentage of his starts against good teams than his teammates; 3.) frequently outperformed his teammates by 2 runs of ERA. I’m not going to research it, because it’s way too much work, but if anyone wanted to, the way to look at this is not, “How did Ford outperform his teammates?”; that’s not really relevant until/unless we assess the quality of the teammates. Otherwise, I don’t know what it’s supposed to be telling us.

      Finally, you end with a strawman. No one, on this thread nor any other I’ve ever seen, has argued that Ford was a journeyman. He was an excellent pitcher. Why would you characterize the argument that way? I’ve, in fact, said the exact opposite in this thread, as the primary arguer “against” Ford. And my argument isn’t that Ford was average. No one thinks he was Jamie Moyer. His career WAR is MOSTLY “low” because he missed two seasons in Korea and retired when he was still a very useable pitcher. Had he played 21-22 seasons (which he certainly still had the talent to do), he’d have 65-70 career WAR, and this discussion would literally have never even happened. He averaged 3.2 WAR per season, which is excellent; it’s just not inner-circle elite. I’m honestly not even sure WHERE those folks in the “Ford is underrated” camp WANT him ranked, or how, and that makes this whole discussion difficult.

      Ford may have more factors complicating his ranking than other players do, and how one untangles those issues may yield some different conclusions. I think we must, to some extent, simply agree to disagree. If I had unlimited time, like I said, I would check 3-5 other pitchers and their percentage of starts and relative ERAs against winning teams to see whether Ford’s results are unusual or not. But that’s just not a priority, because I think I’ve done the lion’s share of the research on this post, and while I love it, my time is not unlimited.

      I’m going to try, when I have a couple hours free, to address the “WAR hates the Yankees pitchers in the Dynasty-Era,” and I have a method half-formulated to do so; it’s just going to take a while to set up, and work has been busy, between actually finding the time to work, and writing OTHER stuff about Whitey Ford. 🙂 This is what I’d rather do than look up other pitchers, as I mentioned in the above paragraph, anyway. Now there’s lots of time playing with the kiddo on a long weekend, so if I find the time, I’ll get to it, because I’m curious about the results, too. Like I said, I’d like to do something to study it, because both you (nsb) and Mike L have brought it up, and it merits study and serious discussion.

      Finally, I’d like to point out that I’m not “against” Ford; if I see strong evidence that WAR is mishandling Ford, I will change my opinion of him. I just haven’t seen strong evidence yet. WPA is the most convincing evidence; yet, as I’ve said before, his WPA is actually “all defensive WPA when Whitey Ford is on the mound,” not “Whitey Ford’s WPA,” since all defensive WPA is attributed to pitchers. Because we know that he played in front of a very, very good defense, I’m wary of putting too much faith in that stat, which allows me to perhaps over-privilege the other things I’ve seen. But for now, I find WAR’s conclusions the most satisfying explanation of Ford’s ability level.

      Reply
      1. no statistician but

        Hey, Doom:

        Chill out. Seriously. I didn’t have you or your arguments in mind at all while I was writing that comment. My aim was simply to point out that a couple of knocks on Ford’s credibility (not from you, I don’t think)—that he was a product of Yankee Stadium and that his record of success was entirely dependent on the team’s record—weren’t consistent with at least these particular facts as I interpreted them. Plus, you can search all the past posts of HHS since I’ve been making a pest of myself, and you won’t find me saying either overtly or by implication that Ford is one of the inner elite pitchers of all time. My comment way above that started the discussion about Ford was sincere, in so far as I don’t think his WAR ranking (85th all time, but only at the moment) does him justice in terms of his abilities considering his high ranking in so many different statistical measures. Saying this is not, however, saying that Ford is a lesser Walter Johnson. Arguing that Ford should rank a considerable measure higher than Billy Pierce, which he doesn’t, per WAR, doesn’t mean that Pierce is to be scorned as a pitcher. Feel free to disagree.

        An anecdote you might find useful: In college I lived my sophomore through senior years in an off campus dump of an old house divided up into rooms and apartments—$30/month, all I could afford, being a scholarship student, and some months more than I could afford. I was permanent there but most of the other male student residents came and went. One semester, when I was particularly disliked by the gang of soon-to-flunk-out partiers who had taken up temporary residence and with whom my relations were often intense, I made a sarcastic remark to a group of them as I was leaving one afternoon to go to my job at the library, something on the order of, “After I’m gone you can say anything you like about me.” One guy’s quick response was this: “You know what we say about you when you’re not here? Nothing at all.” I had to laugh, because he was right. I wasn’t worth their trouble. I and what I say here, Doom, aren’t worth your losing sleep over either.

        Reply
        1. Dr. Doom

          Not that I don’t love being told to “chill out. Seriously,” but I’m not sure what I’ve said here to upset you so. Sorry for whatever it was; I’m not some raving lunatic – I don’t think. But I suppose most raving lunatics wouldn’t describe themselves as such. Also, I feel fairly certain that no post on this website has ever caused me to lose sleep, so I don’t see any reason for your concern in that regard. My two-year-old, who still doesn’t sleep through the night most of the time, is a different story, though. (Sigh) C’est la vie.

          I agree that Ford’s straight WAR ranking underrates his talent… I just don’t think it’s by as much as you think. So let’s say we don’t like where Ford ranks by traditional WAR. Don’t you think a lot of that can simply be chalked up to his missing time in the service and retiring early? An extra 8 WAR (a decently cautious and conservative figure, I would think – 2.5 WAR for each year in the service, another 2 WAR at the end of his career) puts him 30 spots higher in the career WAR ranking, to 55th, in the neighborhood of Justin Verlander, David Cone, Juan Marichal, and Zack Greinke. I don’t know… that sounds pretty good to me, and I think is a reasonable adjustment to make.

          I’m still confused as to why you think anyone is arguing that Ford was solely a product of his environment. I mean, his W-L record is, unquestionably, impacted by his high-quality teammates. And his ERA was, unquestionably, lower because of the ballpark he played in. That said, he still outperforms “average” on either of those metrics. Very good pitcher, yes. Hall of Fame, probably. Circle of Greats… meh. He’s in, and it’s not a great travesty. I wouldn’t have voted for him, but I understand why some others would. He’s in the lower rung of players there; I don’t think there can be any doubt about it.

          PS: You know what stuck out to me most about your post? You once rented an apartment for $30/month?! I… I just… I… wow. Different days.

          Reply
          1. no statistician but

            Mid-1960s, off-campus dumps away from metropolises at the low end, yeah, but it was actually seven dollars a week for a room, $60 a month for a two-person apartment, so $30/person. The cockroaches lived rent free. In graduate school as a follow up in a similarly sized town the year before I got married, I rented a room in a converted house for $35/month with hotplate privileges. I also had a toaster oven, which ranked me above the other three guys on the floor. There was a communal refrigerator in the hall for the four rooms, this being prior to the personal refrigerator era. Here’s the gut-sticker: the sink for washing up after all those hotplate meals was the shared bathtub. Did this all go against the city fire and safety codes? If it did, we were safe, I’d say, since the landlord, who lived in a house across the street, was an alderman on the city council.

            At any rate, no one who lived there felt the arrangements were particularly strange. We were all on tight budgets. The era of borrowing on your future to get an education was just in its infancy then—I knew very few students on loans. Besides, college and living costs were manageable then. A full tuition scholarship, a summer job, and part-time campus work during the school year would just barely see you through. Part-time students were draft eligible, which was also an incentive to buckle up and buckle down. Further, you could actually live on a full grad school grant without hassling a summer job if you managed your money.

          2. Bob Eno (epm)

            Obviously, nsb went to college when a dollar was a dollar and men lit matches with their fingernails. We live in last times now: students vape, nsb’s old library has computer clusters where bookshelves once stood, and old timers recount tales of balls in play to grandkids who’ve never seen one. . . .

            I have a question. In my understanding of park factor calculation, the complexity is such that no attempt is made to adjust the basic R/G data to exclude games pitched by individual pitchers. For example, the Stadium park factor for pitchers during Ford’s tenure there averaged about 94.7, but Ford’s R/G rate in the Stadium was about 89.6% of his R/G in neutral parks (his ERA was 87.7% of his away ERA). This suggests two things to me: (1) if you extract Ford’s Stadium performance to get a truly pitcher-neutral park factor relative to Ford (Ford started about 20%+/- of Stadium games), the park factor would be higher (less favorable to pitchers), meaning that Ford is actually being penalized for his own success in the Stadium; (2) Ford’s Stadium park effect would be about 50% due to the park’s advantage to pitchers, and 50% due to Ford’s ability to make use of the park beyond the degree to which the average pitcher was able to. Ford’s ability to exploit his home park environment beyond the norm is being calculated as a deficit, when, in real life, it is precisely what a pitcher should be trying to do, and should be calculated as a positive. (We had a similar issue arise in a batting context when considering Coors Field effects in Larry Walker’s case.)

            So my question is: Does any of this make sense?
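            The arithmetic behind points (1) and (2) can be sketched in a few lines of Python. This is only a rough illustration, not how park factors are actually computed: the function names are made up for this example, and the second function assumes (very crudely) that the Stadium park factor is a simple games-weighted blend of Ford's home/neutral ratio and everyone else's. Since real park factors count runs by both teams, that blend probably overstates Ford's influence.

```python
def park_share(park_factor, pitcher_home_ratio):
    """Share of a pitcher's home run-suppression that the park factor alone explains.

    Both inputs are on a 100 = neutral scale (e.g. 94.7 and 89.6).
    """
    park_effect = 100.0 - park_factor
    total_effect = 100.0 - pitcher_home_ratio
    return park_effect / total_effect


def park_factor_without_pitcher(park_factor, pitcher_home_ratio, pitcher_share):
    """Crude de-mixing: assumes park_factor is a linear blend of the pitcher's
    ratio (weight pitcher_share) and everyone else's (weight 1 - pitcher_share).
    This linear model is an assumption made for illustration only."""
    return (park_factor - pitcher_share * pitcher_home_ratio) / (1.0 - pitcher_share)


if __name__ == "__main__":
    # Figures quoted in the comment above: PF 94.7, Ford at 89.6%, ~20% of Stadium starts.
    print(f"park share of Ford's home effect: {park_share(94.7, 89.6):.2f}")
    print(f"park factor with Ford removed:    {park_factor_without_pitcher(94.7, 89.6, 0.20):.1f}")
```

            On the quoted figures, the park accounts for about 51% of Ford's home advantage, in line with the 50/50 split in point (2), and under the crude linear assumption the Ford-free park factor comes out near 96 rather than 94.7.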

      2. Mike L

        I don’t see Ford as an inner-circle Hall of Famer. I do see him as better than 53.5 BWAR for his career, and a 99th-ranking among starters by JAWS.
        I agree completely that if his career WAR was 65-70 this entire discussion wouldn’t have taken place–but maybe that’s a good thing, since it’s causing us to look more deeply into what devalues him (and, by extension, other Yankee pitchers of that Dynasty era, or other pitchers). Perhaps WAR is an accurate measure of their true worth, perhaps not.

        Reply
    4. Bob Eno (epm)

      I don’t know whether Richard’s response to nsb is the ideal way to measure how Ford was used. For one thing, the notion that Ford was “leveraged” (scheduled to pitch irregularly to increase starts against tough opponents) only applies to the Stengel years (1950, 1953-60), and that’s where we should be looking for that issue. The second point is that above/below-.500 is not a very accurate way to measure.

      I’ve poked around through some of Ford’s seasons: 1953-56 and 1958-60 under Stengel (leaving out his rookie debut in 1950 and the year he was injured, which could skew the figures), and his great 1961 season under Houk. For each season I’ve calculated his starts against each team, multiplying these by each team’s season-ending W-L Pct., and then taken the weighted average of those percentages to compare to the W-L Pct. of those 7 or 9 teams overall. This is what I find:

      Year    #Starts    League-Yankees Pct.    Ford’s Opponents’ Weighted Pct.
      1953      30             .478                      .500
      1954      28             .471                      .533
      1955      33             .482                      .474
      1956      30             .481                      .498

      1958      29             .486                      .479
      1959      29             .498                      .501
      1960      29             .481                      .510
      1961      39             .481                      .486
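      To make the year-by-year gaps easier to see, here is a small Python sketch that subtracts the league-minus-Yankees percentage from Ford's opponents' weighted percentage for each row of the table. The numbers are copied from the table; the variable and function names are just illustrative.

```python
# (year, starts, league-minus-Yankees pct, Ford's opponents' weighted pct),
# copied from the table in this comment.
ROWS = [
    (1953, 30, 0.478, 0.500),
    (1954, 28, 0.471, 0.533),
    (1955, 33, 0.482, 0.474),
    (1956, 30, 0.481, 0.498),
    (1958, 29, 0.486, 0.479),
    (1959, 29, 0.498, 0.501),
    (1960, 29, 0.481, 0.510),
    (1961, 39, 0.481, 0.486),
]


def gaps(rows):
    """Return {year: opponents' weighted pct minus league-minus-Yankees pct}."""
    return {year: round(opp - league, 3) for year, _starts, league, opp in rows}


if __name__ == "__main__":
    for year, gap in gaps(ROWS).items():
        print(f"{year}: {gap:+.3f}")
```

      A positive gap means Ford's opponents were, on average, stronger than the rest of the league that year; a negative gap means they were weaker.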

      Clearly, 1953-54, Ford’s first two seasons, established a general pattern, with 1956 and 1960 reinforcing it. 1954 was particularly clear, with Ford facing the Indians (.721) and ChiSox (.610) in almost half his starts (13 of 28). In 1955 and 1958, Ford faced slightly weaker than average opponents. He faced about average opponent strength in ’59, an unusual year, where the Yankees were themselves mediocre (how I loved that year!).

      To illustrate another example of how extreme Stengel’s deployment of Ford could be, beyond 1954: in 1960 Ford never started against a seventh-place BoSox team (.422) and started only twice against the cellar-dwelling A’s (.377), but had 7 starts against the second-place Orioles (.578) and 5 against the third-place ChiSox (.565).
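      The weighting itself is simple enough to sketch in Python. The function below is a minimal illustration of a starts-weighted opponent percentage, not Jaffe's exact procedure. The sample input covers only the four 1960 opponents spelled out in the paragraph above (14 of Ford's 29 starts), so the result is a partial figure and not the full-season .510 from the table. The team labels are just dictionary keys chosen for this example.

```python
def weighted_opponent_pct(starts_by_team, pct_by_team):
    """Average opponent W-L pct, weighted by number of starts against each team."""
    total_starts = sum(starts_by_team.values())
    if total_starts == 0:
        raise ValueError("no starts to weight")
    weighted_sum = sum(n * pct_by_team[team] for team, n in starts_by_team.items())
    return weighted_sum / total_starts


# Only the 1960 opponents named in the comment; the other teams are not listed there.
STARTS_1960_PARTIAL = {"BoSox": 0, "A's": 2, "Orioles": 7, "ChiSox": 5}
PCT_1960_PARTIAL = {"BoSox": 0.422, "A's": 0.377, "Orioles": 0.578, "ChiSox": 0.565}

if __name__ == "__main__":
    pct = weighted_opponent_pct(STARTS_1960_PARTIAL, PCT_1960_PARTIAL)
    print(f"weighted pct over these 14 starts: {pct:.3f}")
```

      Over just those 14 starts the weighted figure is about .545, which shows how heavily the Orioles and ChiSox starts pull the average up when the weak teams are nearly skipped.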

      When Houk replaced Stengel in 1961, Ford pitched in a normal set rotation and had normal strength opponents.

      We could explore 1962-67, but like 1961, they are luck of the draw, since only Stengel practiced leveraging with Ford (look how Ford’s GS figure jumped after Stengel left).

      I think this may give a truer picture of the issue of leveraging in Ford’s case.

      Reply
      1. Bob Eno (epm)

        By the way, I should have noted that I borrowed both the term “leveraging” and the methodology for calculating it from Chris Jaffe’s book on managers.

        Reply
  16. Doug Post author

    We were talking about grand slams down by three runs in the previous post. Yesterday, Justin Smoak joined David Bote as players just in the month of August with a 2 out pinch grand slam down 3 runs in the 9th inning or later. There had been just three such searchable home runs previously, two of those also in the month of August, in 1970 (Carl Taylor) and 1973 (Joe Lahoud), and the other by Andre Dawson in April 1991.

    Before this season, Smoak had a .123/.200/.200 slash with one home run with the bases loaded. This year, he’s .286/.375/1.143 with a pair of grannies.

    Reply
      1. Bob Eno (epm)

        Don’t remember Freed at all, but his SABR bio tells an interesting, rather sad story. I appreciate that the SABR bio project includes many players whose careers were short and unsuccessful at the Major League level. If you were looking for models for a serious novel based on baseball, that’s the group you’d probably want to look to.

        Reply
  17. Bob Eno (epm)

    More on Ford.

    I was thinking about some points Doom made about Whitey Ford in relation to his relatively low career WAR number: that he missed two prime years to military service and that he retired a bit early. In addition, Stengel’s leveraging of Ford over a period of 8 prime seasons reduced his IP, and thus his opportunity to earn WAR. So I decided to see how Ford would rank if we calculated career WAR on a rate basis: that is, WAR/9IP. It raised his rank, but not as much as I expected.
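    For anyone who wants to reproduce the rate, it is just career WAR scaled to nine innings. A minimal Python sketch follows. The 53.5 WAR figure is the one cited elsewhere in this thread; the innings total of 3170⅓ is not stated anywhere in the thread and is supplied here as an outside assumption, so treat it as illustrative.

```python
def war_per_9(war, innings_pitched):
    """Career WAR expressed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9.0 * war / innings_pitched


FORD_WAR = 53.5          # career pitching WAR as cited in this thread
FORD_IP = 3170 + 1 / 3   # assumption: career innings, not given in the thread

if __name__ == "__main__":
    print(f"Ford WAR/9IP: {war_per_9(FORD_WAR, FORD_IP):.3f}")
```

    With those inputs the rate rounds to .152.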

    Ford ranks #85 in straight WAR (#79 if you eliminate the five active players on the list, whom we would not want in a rate table, since they still have downward trending career ends ahead, plus Mariano Rivera, who, as a relief pitcher, is simply on a different scale in terms of WAR rate: about 40% ahead of any starter). I cut off my list with #100 on straight WAR (Johan Santana), and there surely are a good number of pitchers lower on that list who would place among the top 92 in rate terms, and should show up on the list I have below, but don’t; making a definitive list is for another day. (The list has only 92 because two additional active pitchers on the Top-100 list for WAR are excluded, beyond the group of six mentioned above.)

    Ford winds up placing #64 in starting pitcher WAR/9IP. Now 64th out of 92 is a lot better than 85 out of 100, but it’s not the dramatic change I was wondering about. In terms of both rank tables, it’s good to bear in mind that there are a number of pre-1893 pitchers (I think I see five ahead of Ford on the list below) who probably should be evaluated differently from those who pitched at 60’6″. (Sorry the alignment on this list isn’t uniform. . . . By the way, Rivera would be at .395; Eckersley and Smoltz, who both come in ahead of Ford, benefit to varying degrees from their years as closers.)

    ……………………………….WAR per 9IP
    1 Pedro Martinez  0.274
    2 Lefty Grove  0.257
    3 Roger Clemens  0.254
    4 Walter Johnson  0.232
    5 Johan Santana  0.227
    6 Randy Johnson  0.225
    7 Curt Schilling  0.222
    8 Roy Halladay  0.214
    9 Dutch Leonard  0.211
    10 Mike Mussina  0.209
    11 Cy Young  0.208
    12 Bret Saberhagen  0.208
    13 Kid Nichols  0.207
    14 Sandy Koufax  0.206
    15 Pete Alexander  0.203
    16 Tom Seaver  0.200
    17 Ed Walsh  0.194
    18 David Cone  0.192
    19 Kevin Appier  0.191
    20 Stan Coveleski  0.191
    21 Dazzy Vance  0.190
    22 Bob Gibson  0.190
    23 Kevin Brown  0.189
    24 Greg Maddux  0.189
    25 Rube Waddell  0.185
    26 Urban Shocker  0.185
    27 Christy Mathewson  0.184
    28 Hal Newhouser  0.183
    29 Eddie Plank  0.177
    30 Dave Stieb  0.177
    31 Wilbur Wood  0.176
    32 Bert Blyleven  0.175
    33 Rick Reuschel  0.173
    34 John Smoltz  0.172
    35 Carl Hubbell  0.172
    36 Dennis Eckersley  0.171
    37 Luis Tiant  0.171
    38 John Clarkson  0.170
    39 Al Spalding  0.167
    40 Tommy Bridges  0.167
    41 Mark Buehrle  0.165
    42 Fergie Jenkins  0.165
    43 Andy Pettitte  0.165
    44 Chuck Finley  0.165
    45 Tim Hudson  0.164
    46 Amos Rusie  0.162
    47 Charlie Buffinton  0.162
    48 Mordecai Brown  0.162
    49 Phil Niekro  0.162
    50 Don Drysdale  0.161
    51 Eddie Cicotte  0.161
    52 Jim McCormick  0.160
    53 Robin Roberts  0.160
    54 Warren Spahn  0.159
    55 Juan Marichal  0.159
    56 Tim Keefe  0.158
    57 Joe McGinnity  0.158
    58 Ted Breitenstein  0.157
    59 Gaylord Perry  0.157
    60 Jim Palmer  0.155
    61 Bob Feller  0.154
    62 Clark Griffith  0.153
    63 Vic Willis  0.152
    64 Whitey Ford  0.152
    65 Tommy Bond  0.151
    66 Tom Glavine  0.151
    67 Red Faber  0.151
    68 Orel Hershiser  0.148
    69 Ted Lyons  0.146
    70 Steve Carlton  0.146
    71 Larry Jackson  0.145
    72 Old Hoss Radbourn  0.145
    73 Jim Bunning  0.145
    74 Billy Pierce  0.145
    75 Silver King  0.145
    76 David Wells  0.141
    77 Nolan Ryan  0.141
    78 Jerry Koosman  0.134
    79 Jack Quinn  0.134
    80 Waite Hoyt  0.128
    81 Pud Galvin  0.125
    82 Frank Tanana  0.124
    83 Bobo Newsom  0.124
    84 Tommy John  0.119
    85 Mickey Welch  0.119
    86 Tony Mullane  0.119
    87 Don Sutton  0.117
    88 Jack Powell  0.114
    89 Red Ruffing  0.114
    90 Eppa Rixey  0.114
    91 Bobby Mathews  0.113
    92 Early Wynn  0.103

    Of course, WAR rate is not a definitive stat. Cy Young fails to break the top ten here, falling just short, a point behind Mike Mussina. But Cy Young’s career is, quite literally, Mike-Mussina-times-two, since Young’s rate stretches over 7300+ IP, more than double Moose’s total of 3500+. On the other hand, if you asterisk Clemens (which I would do), Pedro stands out so far beyond anyone else but Grove that this stat seems a good measure of his special excellence.
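    The “Mike-Mussina-times-two” point can be checked by turning the rates back into totals. The Python sketch below uses the rates from the list and the rounded innings floors quoted above (7300 and 3500), so the totals are rough lower bounds and not exact career WAR figures. The function name is made up for this example.

```python
def implied_war(rate_per_9, innings_pitched):
    """Career WAR implied by a WAR/9IP rate and an innings total."""
    return rate_per_9 * innings_pitched / 9.0


# Rates from the list above; innings are the rounded "7300+" and "3500+" floors.
YOUNG = implied_war(0.208, 7300)
MUSSINA = implied_war(0.209, 3500)

if __name__ == "__main__":
    print(f"Young:   about {YOUNG:.0f} WAR")
    print(f"Mussina: about {MUSSINA:.0f} WAR")
    print(f"ratio:   {YOUNG / MUSSINA:.2f}")
```

    Nearly identical rates, but Young's implied total is a little more than double Mussina's, which is exactly what a rate stat hides.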

    And, boy, does WAR like Kevin Appier — I have not yet penetrated this mystery.

    So the bottom line is that while I’d hoped this might help those of us who feel Ford’s career WAR is out of whack see an easy way towards a mediating position, WAR is not very enthusiastic about Ford by any measure. It really does seem to come down to RA9def and Park Factor. In another post on this string, I suggest that Ford may be overly penalized by Park Factor calculations. . . . (I hope all you Yankee fans appreciate that I’m doing what I can for Ford. This is not in my nature as a Brooklyn fanatic, and I don’t think my mother would approve.)

    Reply
    1. no statistician but

      Bob:

      Of your top 20, ten were peaking late-Eighties onward, corresponding with the drop in complete games. So I don’t think the stat tells too much that isn’t arguable. Sorry, because it’s a good idea.

      As for trying to raise Ford in the WAR list, I’m not sure that that is the way to go at this point anyway. WAR is what it is. The sensible way to look at not just Ford but all players in terms of trying to compare them is to view WAR as an important stat, even the most important stat, but to look at other telling stats independently as well, sift all the evidence, decide what figures seem most significant given the player’s era, career details, and anything else that seems relevant—injuries, military service, whatever. Does this process sound familiar? Because it is the method used collectively right here at HHS in electing players to the COG, where Ford’s rank is no lower than 41st on the pitching side at the moment, 37th in pWAR. Actually, the COG list is an excellent resource to consult, a shorthand approach much better, I think, than WAR, because it is comprehensive and based upon more than just one formula.

      Reply
      1. Bob Eno (epm)

        nsb: 1st par.: good point. I hadn’t thought about that. Perhaps adding a factor reflecting IP/GS might make the rate stat more useful. (I’ve implicitly done that by eliminating Rivera).

        I have no disagreement in what you say in the second paragraph either, both in terms of Ford and general principles. Since the CoG draws the line at 1900, more or less, Ford’s rank by traditional pWAR is much higher (also eliminating players too recent for CoG): I think about #64. On the rate list I think he’d be about #53. Either way, you’re right that the CoG process has evaluated him significantly higher, in light of other measures.

        Reply
        1. no statistician but

          Adam Darowski’s Hall of Stats is kind of idiosyncratic (Eddie Cicotte, Babe Adams) and includes a number of pre-1900 pitchers, but Ford comes in the low 60s by his reckoning (The listing isn’t numbered, and pitchers and position players are thrown together. I’ve gotten three slightly different rankings in three tries with my ancient eyesight, and I’m not trying again).

          In Darowski’s Hall of Consensus, Ford jumps into the top forties among the pitchers. As usual, I can’t get the link to work, but it’s

          http://www.hallofstats.com/consensus

          Reply
          1. Bob Eno (epm)

            nsb, I think the “Hall of Consensus” is a little too skewed for use. For example, to rank among the top group (true consensus), you have to be in the Hall itself, which rules out players like Clemens, Mussina, Schilling . . . It is true that everyone considers Ford to be Hallworthy, regardless of how their Hall is configured. But I don’t think that’s a ranking matter.

            I don’t actually see a ranking for the Hall of Consensus, and I’m guessing you just used the Hall of Stats list and knocked off everyone who wasn’t a consensus choice. That lets Ford in, but barely, with only Ruffing and Koufax below him in the consensus hall, and Clemens et al. sitting outside.

          2. no statistician but

            Bob:

            I wasn’t touting the rankings, just refreshing people’s memories that they exist. Multiple views for comparison are often useful, a platitude for the ages.

          3. Kenton

            Hall of Stats has Ford as 76th among pitchers. (Click on “positions” then “pitchers”.) His hall score is 106, meaning he did 6% more than the 228th (or whatever number of people elected as MLB players is), by the formulae used by HoS, which start with bWAR.

  18. Doug

    Here’s a quick little quiz. The Red Sox, Rockies, Astros, Royals, Marlins, Athletics and Nationals are the only teams so far this season to record what pitching accomplishment? Hint: each team has done it only once.

    Reply
    1. Doug Post author

      Since the discussion has moved on to the newest post, the answer is that these are the only teams this season to get a start of 8+ IP in consecutive games. In none of the cases were the two starts both complete games. Oakland had four of these starts in a 7 day period at the end of May, but hasn’t had one since.

      Cleveland, which didn’t make the list, has the most such starts with twelve, and is the last team to record consecutive complete games in 2016, and the last to record three CGs in a row in 2015. The Cardinals in 2013 are the last team with consecutive shutouts, and the 2004 Twins the last with three shutouts in a row.

      Reply
  19. Bob Eno (epm)

    This is a comment in praise of Doom’s nWAR (not so much DWins, but I don’t mean to knock that stat in any way).

    I was just reading an essay by Bill James in the 2015 version of The Fielding Bible. (The same essay apparently appeared on James’s blog years ago, so some of you may have already seen it; I hadn’t.) In the essay James has one of those brilliant insights that make him so terrific. He tells us that traditional batting and pitching stats are not just numbers; we read them as a language. A regular fan knows what a .300 BA or 40 HR mean, not just the numbers, and knows what 20 wins or a 2.50 ERA mean too. Sabermetricians will view those numbers differently, according to context; for advanced stat people, they are primarily numbers, not a language. And sure, advanced stats give better information, because the “meanings” of conventional key stats become distorted when the main markers stay constant while the context changes. But all of us understand the basic language of major batting and pitching stats because (to improvise from James) they constitute our first baseball language: we’re all native speakers. (Parenthetically, one of James’s points is that traditional fielding stats are just numbers and not a language, and he has a good explanation for why that’s so.)

    As I was reading the essay, I realized that this is precisely what Doom’s nWAR is so brilliant at doing. It translates a number (ERA+, quantified by IP) into a stat we understand as we understand language: W-L. (DWins is neat, but that’s not something it does.) I have a friend who, each year, sends me his list of greatest all time pitchers. His list is an Excel sheet, and it records one number for each pitcher, ERA+*IP. I have always appreciated the concept and the ranking, but the numbers meant nothing other than the ranking (and the distance between contiguous ranks). nWAR takes the same information and renders it meaningful in the context of baseball as a culture with its own language. It’s not the end of the story, any more than W-L, ERA, or WAR — as nsb put it, quoting Socrates, or Sophocles, or Derrida: “Multiple views for comparison are often useful” — but the more I reflect on nWAR, the more I’m impressed with the elegance of Doom’s new tool.
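    For reference, my friend's ERA+*IP number is trivial to compute, which is part of why it reads as “just a number.” Here is a minimal Python sketch, using the single-season Saberhagen 1989 line from Doom's post (180 ERA+, 262⅓ IP) purely as sample input; his actual sheet presumably uses career totals, and the function name is made up for this example.

```python
def era_plus_times_ip(era_plus, innings_pitched):
    """Raw ERA+ x IP score: rewards both quality (ERA+) and volume (IP)."""
    return era_plus * innings_pitched


# Sample input: Saberhagen 1989, figures quoted in the post itself.
SABERHAGEN_1989 = era_plus_times_ip(180, 262 + 1 / 3)

if __name__ == "__main__":
    print(f"Saberhagen 1989 ERA+ x IP: {SABERHAGEN_1989:,.0f}")
```

    The result is 47,220, a figure that ranks pitchers perfectly well but, unlike a W-L line, tells a fan nothing on its face.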

    Reply
    1. Dr. Doom

      Thanks for your kind words, Bob!

      As a parenthetical, on my personal “nWAR” spreadsheet, I also include FIP, because I think it’s a useful comparison to see how different pitchers look, and I include a balanced 50/50 FIP+ and ERA+ ranking. That helps me to see how a pitcher might be viewed differently, and translates those “numbers” into “language,” as you (and James) so elegantly put it. Thanks, and I’m glad you had fun with this stuff!

      Reply
