Regular HHS contributor no statistician but (or nsb) has authored this series of posts on the Hall of Fame, and the perennial questions of which players are there who shouldn’t be, and which aren’t but should. Unlike some debates on this topic, though, nsb applies a metrics-based approach to this task, and invites you to do the same in contributing to the discussion. So, without further ado, here is nsb.
Adam Darowski’s Hall of Stats is a reasoned, statistical attempt to evaluate top performers across baseball history by recognizing any player who scores 100 or better on the scale Darowski and his cohorts have created. The aim is to provide an alternative to, if not a correction of, the amorphous baggy monster that is the official Hall in Cooperstown. Like any such attempt, it shows a few built-in biases and, as a result, generates some results that seem partially illusory, but it does, I think, come as close to an impartial view of evaluation as exists. Importantly, even when I disagree with its rankings, I understand the basis of the disagreement.
In a sub-feature of the Hall of Stats called the Hall of Consensus, Darowski supplies an expanded look at all players who have been enshrined, either by himself, the official Hall of Fame, or a selected group of other ‘personal’ Halls of Fame. In this easy-to-follow listing, one can readily determine the sheep and the goats from Darowski’s point of view, and it is to the goats that I want to bring some attention.
It’s common nowadays to sneer at the notorious ‘Friends of Frisch’ in the Hall of Fame, for example, but how many are there in reality? I find nine—George Kelly, Freddie Lindstrom, Rick Ferrell, Chick Hafey, Jesse Haines, Jim Bottomley, Travis Jackson, Dave Bancroft, and Lloyd Waner. Waner and Ferrell have no direct connection to Frankie Frisch, but they are low performers of that era and so come under the larger umbrella.
A tenth, Ross Youngs, falls into an overlapping category among Darowski’s under 100 group: those players in the Hall of Fame whose careers were blighted by injury or terminated by untimely death, with the result that their cumulative stats fail to attain the requisite level. Among these, only Youngs seems questionable. Addie Joss, Chuck Klein, Dizzy Dean, and Kirby Puckett, all with Hall of Stats ratings of 85 or better, would undoubtedly have surpassed 100 had their careers gone on in a normal manner. Youngs at a 61 rating might or might not have made it.
A third subset is comprised of these 19th Century players of note: Tommy McCarthy, Hugh Duffy, Sam Thompson, Hughie Jennings, Mickey Welch, Bid McPhee, Willie Keeler, and Joe Kelley.
What remains is a group of forty-eight post-1900 players, elected either by baseball writers or various veterans’ committees, who have been or are about to be enshrined in Cooperstown, but who fall short of the Hall of Stats cut-off. These are the players I want to bring forth for discussion.
Catchers: Roger Bresnahan, Ray Schalk, Roy Campanella
First basemen: Frank Chance, Orlando Cepeda, Tony Perez
Second basemen: Johnny Evers, Tony Lazzeri, Billy Herman, Bobby Doerr, Red Schoendienst, Nellie Fox, Bill Mazeroski
Third Basemen: Pie Traynor, George Kell
Shortstops: Rabbit Maranville, Joe Sewell, Phil Rizzuto, Luis Aparicio
Left Fielders: Heinie Manush, Ralph Kiner, Lou Brock, Jim Rice
Center Fielders: Max Carey, Edd Roush, Hack Wilson, Earle Combs, Earl Averill
Right Fielders: Harry Hooper, Sam Rice, Kiki Cuyler, Enos Slaughter, Harold Baines
LH Starters: Rube Marquard, Herb Pennock, Lefty Gomez
RH Starters: Jack Chesbro, Chief Bender, Waite Hoyt, Burleigh Grimes, Bob Lemon, Catfish Hunter, Jack Morris
Relievers: Rollie Fingers, Rich Gossage, Bruce Sutter, Lee Smith, Trevor Hoffman
In subsequent postings, I will provide some comparative statistics for these forty-eight players, occasional observations that seem pertinent, and a challenge to HHS contributors.
For now, consider players not yet in the Hall of Fame, but who make the grade in Adam’s Hall of Stats, our own Circle of Greats, or maybe just your personal favorites. Are any of them better Hall of Fame choices than any of the forty-eight players above? If so, why is that?
Sounds like this series will be fun and a good way to stretch the CoG-type discussions we’ve had. It looks to me as though among these 48, only Campy is in our Circle, and that due to special considerations.
Very telling observation, Bob.
We’ve been deriding the quality of players on our recent CoG ballots, but most of those players look better than many of the forty-eight that nsb has identified.
I think the comparison to Hall of Stats is most useful because, unlike the CoG, the HoS is matched to the total number of HoF players from all eras. Scanning the HoS roster, many who are not in the HoF are expansion era and 19th century players; of the bottom 48 on the HoS roster, only two (Wilhelm and Koufax) are in the CoG.
I guess I’ll start the discussion by lobbying for Bobby Doerr’s case. His 98 Hall of Stats rating is on the cusp of the magic 100, and I suspect he very likely passes 100 with room to spare without his career-ending injury. Just 33 in his final season, Doerr was still making solid contributions (on pace to better his 3.6 WAR of the prior season before his injury), and would likely have had several more solid years. FWIW, Bill James’ career projection of 4.5 more seasons and a total of 66.7 WAR is, if anything, a bit on the conservative side, but Hall-worthy nonetheless.
By accident, Bob Lemon was left out of the list of right handed starters. Sorry.
I’ll add him now.
Doerr benefited greatly from playing at Fenway Park. His home tOPS+ of 125 is the highest for retired players with 3000+ PA. His OPS at home was an excellent .928 but on the road it was a so-so .716.
Good observation, Richard.
Also, Doerr’s 31 tOPS+ at old Yankee Stadium is lowest of any player with 250 PA there.
At old Yankee Stadium Doerr had 3 HR in 559 PA or 1 HR per 186 PA. Even Phil Rizzuto did better than that.
Bob Lemon — what an unusual B-R profile. The guy won 200 games — in 9 seasons as a regular starter he averaged 20 wins per year. He had a .618 W-L Pct. and 119 ERA+. For all this, he earns a total of 37.6 pWAR. (He adds over 10 more from his hitting.)
This gives me a chance to post a comment that I’ve been trying to add for several days on the last CoG string; it was repeatedly rejected by HHS software, and Doug reported to me that it had been reported as “spam” — so, caveat lector!
My comment is prompted by something said by Bill James in the series that Doom referred us to last string. I originally wrote it with reference to Rick Reuschel, who is sort of the polar opposite of Lemon in this respect, so even though Lemon is the topic of this comment, I’m going to refer to both pitchers in adapting this comment to nsb’s string.
In the course of a long analysis of bWAR, James points to a particular example of a pitchers-park team with outstanding defense (Oakland 1980). B-R seems to undervalue the pitching staff, and James comments: “What I THINK has happened here is that the park’s run-suppressing characteristics are being double-counted as if they were also evidence of superior defense, thus adjusting twice for the park.” James leaves this as an open possibility in the instance he discusses, but one of the people commenting on his post, @dackle, generalizes from it and does some interesting calculations.
dackle finds that if you break down park factors by quintile (that is the 20% of parks that rate highest over 100 through the 20% that rate lowest under), you find a clear trend that defensive quality tracks park factor. He concludes as follows:
So, based on BBRef’s calculations, it does appear there is double counting of the park effects and fielding runs. Pitchers in hitters park played in front of defenses assessed at -7.3 runs overall, while the defenses in pitchers parks were worth +6.8 runs on the season. Also it appears that WAR is artificially boosted by this adjustment. Pitchers in hitters parks averaged 2.4 WAR, while those in pitchers parks averaged 2.0. Not a huge difference, but it does appear there is an effect.
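For anyone who wants to try a breakdown like the one dackle describes, here is a minimal sketch in Python. It is not dackle's actual calculation, which the comment does not spell out. The function name, the (park factor, fielding runs) row layout, and the ten sample rows are all made up for illustration; real team-season figures would have to come from B-R.

```python
from statistics import mean


def quintile_means(rows):
    """Average fielding runs within each park-factor quintile.

    rows: list of (park_factor, fielding_runs) pairs, one per team-season.
    Returns five averages, ordered from the highest-PF quintile to the lowest.
    Needs at least five rows so that no quintile is empty.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    n = len(ordered)
    averages = []
    for q in range(5):
        chunk = ordered[q * n // 5:(q + 1) * n // 5]
        averages.append(mean(r[1] for r in chunk))
    return averages


# Hypothetical team-seasons (NOT real data): (park factor, team fielding runs)
sample = [
    (108, -9), (106, -5), (104, -4), (103, -2), (101, 0),
    (100, 1), (98, 2), (97, 4), (95, 6), (93, 8),
]
by_quintile = quintile_means(sample)
print(by_quintile)
```

If fielding runs rise steadily as park factor falls, as they do in this invented sample, that is the pattern dackle reports. The sketch says nothing about whether the real data behave this way.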
Obviously, this could have relevance for Bob Lemon, who pitched for Cleveland in its salad days (plenty of arugula), with outstanding defense in a marked pitchers park. It suggests the reverse of the reservations I expressed about Reuschel last string: “Reuschel was operating at the outer fringe of both park factor and defensive quality, and the tolerances built into the bWAR system were being tested to the extreme.” Lemon was too, although the extremes are the opposite of Reuschel’s. My thinking had to do with distortions that may occur in extreme cases, and the extra problems when there are two extremes reinforcing one another. But the point James and @dackle seem to be making is that the reinforcement may itself be a distortion: that when defense and park factors are aligned, the defense may be a component of the park factor, rather than solely the park (actually, the implication is broader: that the defense is always a double-counted factor when calculating pitcher value).
So far as I know, Park Factors do not include allowances for home team RA9def (perhaps they do, but no description I’ve found mentions it). If that’s the case, I wonder whether it isn’t true that defense is always built into the Park Factor in addition to being an independent component of pitcher WAR. (If it is true, I wonder whether this isn’t something embarrassingly well known to everyone but me.) It seems to me that this could be reasonably addressed — and maybe it already is — by basing Park Factors solely on visiting team performances, plus a fractional component for the home team; in an era of unbalanced schedules, this might require some extra complexity. (It also suffers from a problem similar to RA9def’s positing uniform defensive performance for all team pitchers: it assumes uniform visiting team personnel mixes at every stadium.)
Bottom line: These points directly affect pitchers like Lemon and Reuschel most profoundly. Lemon’s success is obvious from traditional numbers, but B-R regards much of it as an illusion caused by Lemon’s optimal team/park backing. Lemon’s career RA9def is +0.14, while his PF is 96.1. If James and @dackle are on track, then Lemon is being double-penalized. Perhaps he is more deserving of Hall consideration than WAR suggests (even without getting into the issue of his three lost War years). Reuschel’s success is not really visible in his traditional numbers, and he is very much the beneficiary of the calculations made for both RA9def (-0.18) and PF (104.6). This may be why Paul E wrote, “Still amazed by Reuschel’s WAR – I just didn’t see it while it was happening.” I didn’t either, and maybe it wasn’t.
Bob:
Whitey Ford?
No, I’m not he. I’m younger and pleasingly balder. But he would be another example of extreme alignment of Rdef and PF in a direction that would tend to undervalue him.
My own thinking turned to Three Finger Brown.
But before we go too far down this road, I’d like to ask whether any of those here who understand WAR better than I can turn this line of thinking off by pointing to some obvious flaw. If there is none, then I think the implications are very broad, and B-R may need to do another WAR revision. No further comments seem to have emerged on James’s blog, but I think he stumbled on something very important. My problem is that it seems such an obvious point that surely the B-R calculations must already take it into account, mustn’t they? . . .
If a pitcher is “penalized” for having a strong defense in a pitcher’s park, one might expect the same WAR penalty would apply to fielders. Just to pick the most obvious current example, the A’s Matt Chapman has amassed a gaudy 5.7 dWAR (48 Rfield) in just 2000 (exactly) innings, with his park-assisted range factor (0.7 plays per 9 innings better than league) the most obvious component (the “old-fashioned” Total Zone Rating metric has Chapman at 18 defensive runs above average at home, but only 6 above on the road). Doesn’t seem like any double-counting penalty is being applied for defensive WAR.
I kind of like your idea of calculating park effects based only on visiting teams’ play (maybe that’s already done?); that would certainly help with separating park effects from defensive prowess.
Doug, I’ve found the B-R formula for calculating Park Factor on the B-R site. I cannot imagine that it was not winner of the TOA (Total Opacity Award) for the year in which it was first published, although credit seems to go to the now defunct Total Baseball site. (I miss those massive volumes.) I do not actually understand some of the notation being used in the formulas (lots of dash lines and | marks, which mean nothing to me unless they are for some reason typographical substitutes for parentheses and normal numerator/denominator marks).
Nevertheless, one thing that’s clear is that fielding is not factored in (or, rather, factored out), at least in terms of Rdef, Rtot, Rdrs or their components.
I had been thinking of PF primarily in terms of hitters and pitchers — fences, altitude, mound height, and such being the primary contributing factors. It’s hard for me to see how infield grounders and hits are much affected, although groundskeepers do try to model the field to the strengths of the home team fielders, and I’m sure they have some effect. In Chapman’s case, perhaps a major factor would be the size of foul territory, which will affect third baseman putouts: the Oakland Coliseum has more foul territory than any other park.
But although I haven’t been aware that Park Factors actually were calculated for Rtot, prompted by your comment I see now that they are, so there may indeed be a double-counting issue there too. I’m pretty sure that’s not the case for Rdrs, which is based on observation of the nature and difficulty of each play.
Park factors for defense would be most noticeable for foul territory for corner infielders and, slightly, for catchers; and also for outfielders in terms of park dimensions, atmospherics and unusual configurations (even something as subtle as the rounded outfield corners in Kauffman Stadium).
Rdrs, I think, considers the likelihood of a play being made on a batted ball hit with requisite force (weakly to strongly) to a particular part of the field. So, that may be an easy or hard play to make depending on the positioning of the fielder, but he will only receive the credit or demerit based on the “averages” for such batted balls. I hope it’s not the case that catching a pop-up deep in Oakland’s foul territory garners a big Rdrs score because plays are seldom made on such balls that find the seats in most other parks.
I feel like park dimensions still play havoc with this stuff. It probably normalizes out over the entire team with park factor, but at individual positions, I think there are real dirty details. Third base in Oakland is one such place and, as I’ve pointed out before, left field in Fenway and right field in Yankee Stadium are two other oddballs that defy the other positional norms.
I certainly hope nobody is being that blind, and I doubt they are, but unfortunately, I’m not sure there’s any good way to account for these park anomalies that won’t have potential problems.
If you just take those plays out of the reckoning, because they are unplayable in most parks, you have a bunch of problems: Where do you draw the line? Is there enough data to know what balls would be unplayable elsewhere? What if a ball would be playable in 3-4 parks and not just Oakland? What about 1/2 the parks? Do you really want to completely ignore the differences in skill between guys who take advantage of the greater foul territory to make tons of extra outs and those that only get minor value out of it?
Also, even if park is adjusted, couldn’t there be a big advantage to having a park anomaly in your home field because you’d have a lot more practice taking advantage of it or minimizing the disadvantages than your road-team counterpart?
Everything I can think of to try to limit the effects of these, I can’t see how to do it 100% fairly to those who play in or don’t play in those parks.
Doug, I think you’re missing what might be happening. If this double counting is correct, then *pitchers* are getting penalized double for pitching in pitcher’s parks and bonused double for pitching in hitter’s parks, but for *fielders* it would be the opposite, because they would be getting credit for some runs that were really “saved by the park.”
So Matt Chapman’s greater runs at home than away is consistent with this hypothesis. It also is suggesting that players with high rField in pitcher’s parks may be overvalued. If the fielding numbers are fair then I don’t believe this double counting could be happening.
But if fielding numbers are not properly park adjusted, then I believe this double counting would happen, and it would be consistent with pitchers park fielders having better numbers than batter’s park fielders. (and better/worse home away rField splits depending).
Given how the defensive numbers are calculated, I doubt this is a true double counting, but I can certainly imagine that some parks would make it easier or harder to rack up rField, and basically fielders would be getting unfair credit/blame and pitchers the reverse if this hasn’t been adjusted correctly.
The Indians’ top four during Lemon’s time (Lemon plus Feller, Garcia and Wynn) all topped 2000 IP over the years Lemon was active (1946-58), and you could throw a blanket over their ERAs, with a low of 3.21 (Garcia) and a high of 3.32 (Feller). But, there is quite a variance in their WAR per 100 IP, with Wynn (1.54) and Garcia (1.52) leading and quite a bit better than Lemon (1.33) and Feller (1.14).
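The WAR-per-100-IP rate used above is a simple ratio. A minimal sketch in Python, with a made-up function name and made-up inputs rather than any of these pitchers' actual totals:

```python
def war_per_100_ip(war, innings_pitched):
    """Pitching WAR scaled to a 100-inning workload."""
    return war / innings_pitched * 100


# Hypothetical pitcher (not real data): 30.0 WAR over 2000 innings
rate = war_per_100_ip(30.0, 2000.0)
print(f"{rate:.2f}")
```

Scaling to a common workload is what lets pitchers with similar ERAs but different innings totals be compared on value per inning.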
This is a really interesting point, and I absolutely could imagine that there could be a significant effect even if it’s not a true double counting.
One checking factor, though, is FanGraphs’ FIP-based WAR. If you look at Lemon, his fWAR is even lower than his bWAR, while Reuschel’s fWAR is roughly the same as his bWAR.
This isn’t a knockdown argument: I have a lot of suspicions about fWAR completely ignoring pitcher BIP skill, and I think it’s clear that there is *some*.
But it does suggest that some more research is needed before concluding that Lemon and Reuschel’s WAR is being overly affected by park factor.
nsb, I’d like to ask for a point of clarification. Doug mentioned that you’d be providing a series of posts, so I’m not sure whether you’d like us to start picking individual players to comment on now, or would prefer that we wait as you use coming posts to take us through the list in sections.
By the way, judging by last year, the post-CoG period has a very (very) low participation rate, but a good number of comments from those who logged on, so if the same happens this year (e.g., this comment will be the 16th on this string, but from only four posters) don’t be disappointed. I think this is a really good choice of topic.
The next post will consider catchers and first basemen, an easy start. The rest of the infield follows in post three, outfielders in post four, and pitchers lastly. I’d prefer that you wait to make serious arguments until each position and its group member are presented with some basic stats, notably each player’s HOS rating, but some others as well that try to put that rating in perspective.
Doug and I came to the conclusion that to assess the grand mass at one go would be a mistake, hence the breakdown. Thirteen outfielders and fifteen pitchers will be messy enough, when we get to them, if people want to participate seriously.
But this isn’t meant to be a vote or a re-vote. What I’d like to elicit beyond comment on individual players is some discussion about what makes a HOFer, not what’s wrong with the selection process. Working up this idea has made me moderate some of my own views.
For instance, the small hall idea: how small is small? There have been over 19,000 major league players. The COG, which is, in effect, a small hall, has 132 members at this moment, and its growth won’t keep up with the influx of new players, I’m reckoning, even though it now includes only a James Bondian .007 of those 19,000+. The Hall itself has just 230, .012, and that includes many pre-1900 players.
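The two fractions can be checked with a quick calculation. This sketch assumes a round 19,000 as the player total, since the text says only "over 19,000"; the function name is made up for illustration.

```python
def hall_share(members, total_players):
    """Fraction of all major league players who belong to a given hall."""
    return members / total_players


TOTAL_PLAYERS = 19_000  # assumption: the text gives only "over 19,000"

cog_share = hall_share(132, TOTAL_PLAYERS)  # Circle of Greats
hof_share = hall_share(230, TOTAL_PLAYERS)  # Hall of Fame players, as counted in the text

print(f"CoG: {cog_share:.3f}")
print(f"HoF: {hof_share:.3f}")
```

With the true total somewhat above 19,000, both shares would come out slightly smaller.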
Well I already blew it on Lemon, I guess, but I’ll avoid serious talk about the others and hold further comments on Lemon till later.
I’m working on some very intense frivolous arguments in the meantime. I’m not sure anyone will be able to tell the difference.
Your comment was welcome as a good start to the discussion, assuming we have one. Once you see the second part, which goes at specifics, I think you’ll find more to say.
To make some use of the time while we anticipate Part 2, I’ve compiled some stats on the 19th century figures listed as borderline in this post. I’ve included total bWAR, peak 5 season WAR, WAR rate (per 162IP/500PA), Hall of Stats rating, position (for position players), career length (counting 3000 IP or 5000 PA as 1.0), and noting any special considerations. (In this group, for that last point I’ve noted that Mickey Welch pitched during the era when the mound was closer to the plate, and that Hugh Duffy is celebrated for his BA record; normally this would concern things like War years, segregation, catcher, etc.) I’ve also included ERA+ and OPS+, and dWAR for position players as well, since the pair OPS+ and dWAR gives a rough picture of trade-offs between offense and defense.
Here are the figures for these eight 19th century guys:
Pitchers
Player | WAR | Peak5 | WAR rate | ERA+ | HoS | Career | Xtra Factors
Mickey Welch | 62.3 | 40.0 | 2.1 | 113 | 90 | 1.6 | Mound at 55’

Position Players
Player | WAR | Peak5 | WAR rate | Position | OPS+ | dWAR | HoS | Career | Xtra Factors
Hughie Jennings | 42.3 | 35.4 | 3.7 | SS | 118 | 9.0 | 87 | 1.1 | None
Tommy McCarthy | 14.6 | 14.8 | 1.3 | OF | 102 | -3.1 | 28 | 1.1 | None
Hugh Duffy | 43.1 | 22.5 | 2.7 | OF | 123 | -2.5 | 77 | 1.6 | Top-BA
Sam Thompson | 44.4 | 23.8 | 2.7 | OF | 147 | -7.2 | 86 | 1.3 | None
Bid McPhee | 52.4 | 22.0 | 2.8 | 2B | 107 | 16.2 | 98 | 1.9 | None
Willie Keeler | 54.2 | 27.1 | 2.8 | OF | 127 | -9.9 | 99 | 1.9 | None
Joe Kelley | 50.6 | 28.4 | 3.1 | OF | 134 | -7.4 | 99 | 1.6 | None
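The WAR rate and Career columns follow the definitions given before the table: WAR per 162 IP or 500 PA, and career length counting 3000 IP or 5000 PA as 1.0. A minimal sketch in Python, with made-up function names and made-up inputs rather than any listed player's totals:

```python
def war_rate(war, playing_time, unit):
    """WAR per `unit` of playing time (162 IP for pitchers, 500 PA for hitters)."""
    return war / playing_time * unit


def career_length(playing_time, full_career):
    """Career length as a multiple of a 'full' career (3000 IP or 5000 PA = 1.0)."""
    return playing_time / full_career


# Hypothetical hitter (not real data): 40.0 WAR in 6000 PA
hitter_rate = war_rate(40.0, 6000, 500)
hitter_length = career_length(6000, 5000)

# Hypothetical pitcher (not real data): 60.0 WAR in 4500 IP
pitcher_rate = war_rate(60.0, 4500, 162)
pitcher_length = career_length(4500, 3000)

print(f"hitter:  {hitter_rate:.1f} WAR/500 PA, career {hitter_length:.1f}")
print(f"pitcher: {pitcher_rate:.1f} WAR/162 IP, career {pitcher_length:.1f}")
```

Reading the two columns together shows how much of a career WAR total comes from longevity and how much from per-season quality.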
We’re not going to discuss these guys in nsb’s series, but they do illustrate some Hall of Fame issues. (I’m not going to talk about Welch here because I have no basis of comparison for pre-1893 pitchers and can’t really assess his figures at this point.) For example, Tommy McCarthy is clearly in the Hall on the basis of anecdotal stature alone: he is celebrated because he and Duffy were called the “Heavenly Twins” in the outfield of a famous team (the 1890s Beaneaters), and he rides in on Duffy’s coattails two years after Duffy is enshrined. Duffy’s accomplishments are none too persuasive either, but he holds the all-time record for single-season batting average, which is, I think, the sort of thing the Hall voters would have paid a lot of attention to (especially in 1945, when he was elected).
Three of the seven are clearly propelled, in part, because of their association with the Old Orioles, which, together with the two Beaneaters, probably illustrates the importance of anecdotal considerations over stats for the entire group. Nevertheless, two of those guys (Keeler and Kelley) are borderline Hall of Stats calls, which, along with McPhee, indicates that the group is not entirely an exercise in anecdotal nostalgia. (McPhee notably has a very low peak, but played a more challenging position than the other two.)
The one who stands out to me is Jennings. Adam Darowski’s rubric doesn’t really put him in shouting distance of the Hall (Darowski’s balance of WAR and WAA seems to wind up placing more importance on longevity than peak, at least based on these figures, and Jennings’ career was quite short), but the stats I like to use put him in a class well above the others, in terms of both peak and rate (he was also a terrific offensive/defensive combination, with a 118 OPS+ and strong dWAR). The knock on Jennings is clearly the brevity of his career, but of all these position players, he seems to me to have been the true phenomenon and I think the Hall committee made a good call putting him in, although I suspect they put more weight on his managerial success later, which is mentioned on his plaque (along, perhaps, with his colorfulness).
— Since we seem to be a small group of regulars on this string, perhaps I can ask a meta-question that’s been on my mind. I’ve been posting stats like these for a couple of years now, but I can’t tell from the comments, which rarely if ever refer to them, whether they’re at all useful to anyone else. It may be that the categories I use aren’t ones HHSers are interested in. Can you give me a hint?
I like peak-5, though I always forget if it’s a continuous or non-continuous sampling. I don’t like career length normalized like that. Generally I’m less focused on career length. WAR captures a lot of that already. When talking about all-time greats, I like WAR, peak numbers, and WAA+. Probably interesting to pull out defensive value as you’ve done as well.
mosc,
When Bill James did the historical baseball abstract with “Win Shares”, he used career total, WS/162G, peak 5 consecutive seasons, and best 3 seasons. He added his own subjective opinion when comparing/rating the players. Unlike WAR, due to the likely improvement of athletes over time, James made adjustments to all these Win Share totals. That adjustment kind of makes sense when you think about it…..like what are the odds of Doak Walker at 5’10” and 180# winning an NFL MVP trophy in 1988 or 2018 (yeah, I realize he’s been dead a long time but that’s not what I’m talking about)? Or for that matter, Hughie Jennings or Mugsy McGraw being top of the order world beaters in 2018? But, when comparing eras, I don’t believe WAR makes that time-line/era adjustment.
Paul, I think WAR, WAA, ERA+, OPS+, all make the adjustment by treating each season as a closed system and assessing each player’s performance against benchmark aggregates specific to that season.
As I understand it (and I did understand it, once, years ago, when I read “Win Shares”), the platform on which Win Shares is based is significantly different, being anchored not in aggregate averages and norms but in specific outcomes (wins), each player being assessed initially within the framework of team success, with those results the basis for comparisons with other players league-wide. Perhaps the difference is that the B-R measures ask how good the player’s performance was within the framework of the league season, and WS asks what the player actually accomplished in terms of season team goals.
I think James’s time-line adjustment was based on the theory that competitive balance has continually increased over baseball history, and that the great players of the past had swollen stats because they were on the tails of pretty flat bell curves. I think he added adjustments designed to see the curves of all season as more or less the same shape. (I’m trying to recall the concept, not the language; I have no recollection of whether James talked about curves.) So — if I’m recalling it right — it was the old timers who received the downward adjustment, not because they were puny wimps by the standards of today’s gym rats, but because they had the profiles of giants when they were, in fact, sorta puny. (Michael Sullivan made this sort of argument in downgrading pre-1920 players during the recent CoG votes.)
Hi mosc, thanks for the feedback! In earlier versions I’ve done both Peak5 (continuous) and Top5 (non-continuous), but they so rarely were much different that I dropped the latter here, and I’m thinking of switching just to Peak7 (continuous — for 19th century guys, I think I’ll need to turn that into a rate stat too, since season length was so variable). I like to do career length precisely because it suggests how much of WAR is a matter of capturing longevity. I’ll think about adding WAA+, although there’s not too much room, and I think it tends to duplicate the “peak” figures, since they’re capturing much the same thing. Just to clarify (I’ve never been certain), is what you mean by WAA+ simply total WAA excluding negative seasons?
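The difference between Peak5 (continuous) and Top5 (non-continuous) is easy to show in code. This is a minimal sketch in Python with made-up function names and a made-up career, not the spreadsheet actually used for the tables:

```python
def peak_consecutive(season_war, n=5):
    """Best total WAR over any n consecutive seasons (Peak5 when n=5)."""
    if len(season_war) <= n:
        return sum(season_war)
    return max(sum(season_war[i:i + n]) for i in range(len(season_war) - n + 1))


def top_nonconsecutive(season_war, n=5):
    """Total WAR of the n best seasons, wherever they fall (Top5 when n=5)."""
    return sum(sorted(season_war, reverse=True)[:n])


# Hypothetical season-by-season WAR (not real data), in chronological order
seasons = [2.0, 6.0, 1.0, 5.0, 0.5, 4.0, 3.0, 7.0]

peak5 = peak_consecutive(seasons)
top5 = top_nonconsecutive(seasons)
print(peak5, top5)
```

The invented career is deliberately scattered, so the two figures diverge a good deal. For a career with its best seasons bunched together they would be nearly identical, which matches the observation that the two were rarely much different.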
You could use JAWS together with Peak7, as JAWS is based on the average of career WAR and best 7 years WAR, and is intended as an evaluator of HoF worthiness.
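As described, JAWS is the average of career WAR and the WAR from a player's best seven seasons. A minimal sketch in Python, taking "best 7 years" to mean the seven highest single seasons whether or not they are consecutive; the function name and the sample career are made up for illustration.

```python
def jaws(season_war, peak_years=7):
    """Average of career WAR and the sum of the `peak_years` best seasons."""
    career = sum(season_war)
    peak = sum(sorted(season_war, reverse=True)[:peak_years])
    return (career + peak) / 2


# Hypothetical season-by-season WAR (not real data)
seasons = [5.0, 6.0, 7.0, 4.0, 3.0, 2.0, 1.0, 8.0, 0.5, 1.5]
score = jaws(seasons)
print(score)
```

Because the seven peak seasons are counted in both halves of the average, a player's best years carry more weight than the remainder of the career.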
If anyone is interested in “HOF worthiness,” it might be interesting to also look at the CAWS Career Gauge, which is based on Win Shares. At the moment, you can go to seamheads.com and download an 80-page monograph explaining the system (or write to me at profhoban@gmail.com and I will send it). The monograph is an update of my book, DEFINING GREATNESS: A Hall of Fame Handbook (2012). Among other things, the monograph does indicate who belongs in the HOF as well as who is there and does not belong.
New assessment stats are always popular on this site, Mike. Can you specify which link on seamheads you’re referring to? (Looking forward to the update, I just ordered your book.)
By the way, we had a recent discussion on this site about your seamheads post on dWAR and Matt Chapman.
The link is right on the home page of seamheads.com. The monograph (A Century of Modern Baseball) is on the right hand side. Just click on the link and you can download it. You may also note that I have a couple of other recent articles re WAR and JAWS on the site. Charlie Blackmon’s rankings in 2018 are an even better gauge of the apparent “fielding problem” WAR has with certain players.
Just to clarify – the CAWS Career Gauge is not really new. I believe it may predate both WAR and JAWS. You will note from the monograph that Bill James first commented on it (positively) in 2004. My first book about it was published in 2007. The monograph is my latest update.
Thanks for specifying the article, Mike. It was in such plain view that I overlooked it.
On Blackmon (whose ’18 stats I hadn’t looked at before), I think you raise an interesting problem by noting the heavy weight B-R places on fielding.
Looking at range factor alone, Blackmon, playing CF, seems to have reached about 20% fewer balls than the league average for the position, which would translate to the neighborhood of 60 balls dropping that other center fielders would have caught. Blackmon’s really awful Rdrs number (-28) suggests that this is confirmed by observation matched against BIS established norms, though the absolute value of the number incorporates degree of difficulty (probability in terms of norms) in addition to a simple binary. (I don’t recall from Fielding Bible tables the specifics of those calculations.)
Because those 60 missed balls are the mirror image of 60 missed hits, I don’t see the oddity of Blackmon’s low 2018 WAR. His Rbat figure of 14 is far outweighed by his Rdrs -28, and even when the CF positional adjustment is added, that makes him a below average player overall — his WAA of -1.2 on those grounds predicts his WAR of 0.8.
If you simplistically take away 60 hits from Blackmon’s offense (about 250 H+BB+HBP), you reduce his offense (oWAR=3.9 before Rpos/2 is subtracted) about 25% on a baseline of zero, far more on a baseline of replacement value. I’d have estimated a WAR in the range of 1.5, but 0.8 is not outrageous because the difference is far less than the ~50% it seems, given the cushion of replacement value. (Of course, when you get in this range of WAR, small absolute increments will make a big difference in player rank, accounting for the apparently huge plunge you note in your post.)
What really puzzles me is Blackmon’s fWAR. I don’t really understand fWAR or the difference between UZR (as a component of fWAR) and Rdrs (as a component of bWAR), since BIS developed both metrics. But compare Blackmon’s fWAR components for 2014 and 2018. In 2014, Blackmon’s Off is less than a fifth of his 2018 figure (2.8 / 15.2), and his Def is worse than 2018 (-10.4 vs. -10.1), yet the difference in his fWAR is rather minimal (1.3 vs. 2.8). How Blackmon could have quintupled his offense while “improving” his defense and wound up with only about double the WAR (a much smaller increment on a zero base) is really puzzling to me. bWAR is much kinder to Blackmon ’14: his oWAR is close to fWAR’s Off (2.1 to 2.8), but his dWAR is actually about flat (-0.2), based on slightly positive Rdrs, canceled by slightly negative Rpos (he played half his games in RF). How you can get such drastically different results in UZR and Rdrs is a puzzle (both range factor and Rtot tend to confirm Rdrs in this case).
Having written all this in response to an emeritus mathematician, I should note that I currently have math skills at about the level of Blackmon’s CF skills (although, putting modesty aside, I was class leader in arithmetic in fifth grade). That doesn’t stop me from trying, but a couple of regular posters here are used to pointing out that my errors suggest that if my math peak was fifth grade I’m still there.
You can also read Mike’s paper here.
Stay tuned for more on CAWS Career Gauge, which will be published in a discussion forum here on HHS starting next month.
There’s an interesting philosophical discussion there, Doug, involving the significance of 7 years, the trade-offs between contiguous and discontiguous seasons, and the utility of averaging two figures that represent a quantity and a quality measure. I never actually look at the JAWS number myself (I have no feel for it): I look at the separate figures and the rank that B-R’s version of Jaffe’s system kicks up. (And I have to say I’m not in sympathy with the positional adjustment component.) Maybe I’ll try adding it as an experiment, but it would be odd to add it without dropping both WAR and PeakX, since if you have those, you really don’t need JAWS. So perhaps I’ll try both WAA+ and JAWS on an experimental basis next CoG season, dropping Peak, and see how people respond (or whether anyone notices).
The more I think about it, the more I’m interested in thinking of “peak” as “prime”: what were these players like in the hearts of their careers, rather than in terms of a cherry-picked sum of separated seasons. Comeback guys like Tiant, who have camel-backed peaks, will get dinged, but unusual career shapes seem to me better handled in narrative discussion than in a stat.
Weird factoid about Nellie Fox. For a 15 year stretch he had between 11 and 18 SO in each of those years.
Two weird factoids about Joe Sewell. In his last 5539 PA, he had 48 SO. You have to wrap your head around that one. Last year, Yoan Moncada had 47 through April. And, in 1927, Sewell was 3 for 19 in SB.
Seasons with CS more than 2x SO:
29 / 5 … Charlie Hollocher
29 / 8 … Eddie Collins
17 / 8 … Edd Roush
16 / 7 … Joe Sewell
16 / 0 … Herb Washington
14 / 6 … Matt Alexander
11 / 0 … Larry Lintz
10 / 5 … Charlie Hollocher
10 / 1 … Matt Alexander
9 / 0 … Don Hopkins
5 of those 10 played for Oakland between 74-77
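For anyone who wants to check the list above, here is a minimal Python sketch. It assumes the first number on each line is CS and the second is SO, as the heading implies; the variable names are just for illustration.

```python
# Seasons from the list above, read as (CS, SO, player).
# Assumption: the first figure is caught stealing, the second is strikeouts.
seasons = [
    (29, 5, "Charlie Hollocher"),
    (29, 8, "Eddie Collins"),
    (17, 8, "Edd Roush"),
    (16, 7, "Joe Sewell"),
    (16, 0, "Herb Washington"),
    (14, 6, "Matt Alexander"),
    (11, 0, "Larry Lintz"),
    (10, 5, "Charlie Hollocher"),
    (10, 1, "Matt Alexander"),
    (9, 0, "Don Hopkins"),
]

# Seasons where CS is strictly more than twice SO.
strictly_more = [s for s in seasons if s[0] > 2 * s[1]]
# Seasons where CS is exactly twice SO (borderline cases).
exactly_double = [s for s in seasons if s[0] == 2 * s[1]]

print(len(strictly_more), "seasons with CS strictly more than 2x SO")
for cs, so, player in exactly_double:
    print(f"{player}: {cs} CS is exactly 2x {so} SO")
```

On the figures as listed, nine of the ten seasons clear the bar strictly, while Hollocher’s 10 / 5 season sits exactly at 2x, so that line fits an “at least 2x” reading rather than “more than 2x.”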
Don’t forget Allan Lewis, another of Charlie O’s experiments, 4 / 0 in ’73.
Lewis and, more profoundly, Herb Washington in ’74-’75, did more than simply avoid striking out over the course of a season: no pitcher got a single strike past them. Washington’s strikeless career would suggest that he had the greatest batting eye in history, yet, somehow, after 105 games he was gone!
Bob Eno
You may recall Herb W was the fastest 55 meter/indoor sprinter in the world and about as much of a baseball player as Wilma Rudolph. The sprinter/pinch runner was a luxury of the 25 man roster back when teams carried 10 pitchers. I don’t recall if that was a Charlie Finley idea or Alvin Dark or Chuck Tanner (1976? A’s)?
At the time it was peddled as Charlie O.’s idea, and Lewis was his initial experiment. I don’t recall why Rudolph never made it past the low minors, but she was, after all, much older than Washington and her batting eye could have declined before Charlie O. got to her.
Closest thing we’ve come to a Designated Runner since then is Terrance Gore. Though he seems to be an expanded roster luxury (though he has usually made the playoff rosters of teams he’s been on).
He’s definitely the best of the bunch when it comes to SB%, but will he be able to topple Allan Lewis’s 6 hits?
Bob Eno,
I’m still surprised they didn’t pull a Jim Thorpe on Wilma and strip her of her medals in light of her professional minor league baseball career.
The last of Finley’s designated runners was Larry Lintz, who came to Oakland as a lightning fast but weak hitting infielder. Despite having a spaghetti bat and being a potent running threat, Lintz knew how to draw a walk, with a 12.2% walk rate batting mostly 1st or 2nd in his three seasons before joining the A’s. With his one season plus as Finley’s DR, Lintz is the only player with stolen bases exceeding 40% of his times on base in a 750 PA career. Omitting his time in Oakland, Lintz is still one of just four post-1901 players (Vince Coleman, Miguel Dilone and Tom Lawless are the others) with stolen bases exceeding 35% of TOB in a 500 PA career.
Career Assessments – JAWS and CAWS
JAWS calculates the career value of a player in a manner somewhat similar to the CAWS Career Gauge – but with two big differences. (CAWS is an acronym for career assessment/win shares.)
Here are the formulas:
CAWS = CV + .25(CWS – CV) where
CWS = career win shares and CV = core value (10 best WS seasons)
JAWS = 7WAR + .5(WAR – 7WAR) where
WAR = career WAR and 7WAR = peak value (7 best WAR seasons)
You can see that by taking the 10 best seasons CAWS is looking for a longer “quality period” in a career – indicative of a “Hall of Famer.” By taking only 7 seasons, JAWS is far less demanding of a player’s career.
CAWS then adds only 25% of the remaining win shares to give less importance to “longevity” – whereas JAWS adds 50% of the remaining WAR.
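The two formulas share one shape: the sum of the best N seasons, plus a fraction of whatever is left of the career total. A minimal Python sketch of that shape follows; the function names and the season lists are hypothetical, made up for illustration, and not taken from any real player or from either system’s published code.

```python
def core_plus_fraction(season_values, core_seasons, remainder_weight):
    """Best `core_seasons` seasons plus a fraction of the rest of the career."""
    ranked = sorted(season_values, reverse=True)
    core = sum(ranked[:core_seasons])   # CV or 7WAR in the formulas above
    career = sum(season_values)         # CWS or career WAR
    return core + remainder_weight * (career - core)


def caws(season_win_shares):
    # CAWS = CV + .25(CWS - CV), with CV = 10 best win-share seasons
    return core_plus_fraction(season_win_shares, core_seasons=10, remainder_weight=0.25)


def jaws(season_war):
    # JAWS = 7WAR + .5(WAR - 7WAR), with 7WAR = 7 best WAR seasons
    return core_plus_fraction(season_war, core_seasons=7, remainder_weight=0.5)


# Hypothetical careers, invented for illustration only.
example_win_shares = [30, 28, 25, 25, 22, 20, 20, 18, 15, 12, 10, 8]
example_war = [8, 7, 6, 5, 5, 4, 4, 3, 2, 1]

print(caws(example_win_shares))
print(jaws(example_war))
```

Written this way, the JAWS line is algebraically the same as averaging career WAR and 7WAR, which is the “averaging two figures” point raised earlier in the thread.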
Meh, both formulations are arbitrary. I proposed a sort of geometric sum for counting seasonal WAR over a 25-year period to weight peak against career in a more continuous fashion. It comes out very similarly but perhaps values peak (a Koufax-like peak) more substantially.
I really don’t like win shares. Era vs. era comparisons to me are somewhat irrelevant. Relative greatness to the mean is timeless. I do worry that nobody has properly looked at the standard deviation of “value” through the years, which if it has indeed gotten smaller would reflect poorly on earlier players.
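The geometric-sum idea is not spelled out in this thread, so what follows is only a guess at its general shape, sketched in Python. The assumptions are loud ones: seasons are ranked best to worst, each successive season is discounted by a constant ratio (0.9 here, picked arbitrarily), and only the top 25 seasons count. None of those specifics come from the original proposal.

```python
def geometric_war(season_war, ratio=0.9, max_seasons=25):
    """Weighted career value: best season at full weight, each next-best
    season discounted by another factor of `ratio`.

    Assumptions (not from the thread): ranking by season value, the 0.9
    default ratio, and the cap of 25 seasons.
    """
    ranked = sorted(season_war, reverse=True)[:max_seasons]
    return sum(war * ratio ** rank for rank, war in enumerate(ranked))


# Two hypothetical careers with the same total WAR (40), for illustration only.
short_high_peak = [10, 9, 8, 7, 6]
long_flat_career = [4] * 10

print(geometric_war(short_high_peak))
print(geometric_war(long_flat_career))
```

With any ratio below 1, the short high-peak career scores higher than the flat one despite equal totals, which is the continuous peak-versus-career weighting the comment describes.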
mosc: What about the WAR averaging system you suggested several years ago, either here or on the old BR blog?
“Arbitrary?” Of course, both formulations are arbitrary – as are all “advanced sabermetrics.” Once you get past BA, ERA, OBP, etc., assumptions and judgments must be made at some point = “arbitrary.” The real question is HOW ARBITRARY are the assumptions and judgments. WAR is far more arbitrary than Win Shares – as witnessed by the various WAR results out there.
Since Bill James spent the better part of 25 years developing Win Shares and since it is the most mathematically consistent of the metrics of which I am aware, for me it is the “most accurate” at this point in time. But, yes, they are all arbitrary to some extent.
Well, perhaps the different WAR stats are less a measure of the instability of the concept than the fact that, so far as I know, there is as yet no second Win Shares formula (who wants to go head to head with Bill James?).
As I wrote above, I think WAR and Win Shares answer different questions, and I think both are of value, and both have potentially varied methods of yielding reasonable answers, though WS has only one model.
I understand the impulse to collapse peak and total value into a single index, but I really am not sure that doing so is desirable. I prefer to look at total, peak, and rate summary stats together. I don’t feel an impulse to reduce them numerically, only judgmentally. I could use a WS base, but B-R is handier and more encyclopaedic than Baseball Gauge, and so I like juxtaposing WAR, PeakX, and WAR/season, with the guideline of a career length index. I think those stats give you everything you need to argue over, without appearing to close the argument with a single number (though adding just a few other stats can add interest too, which is why I like OPS+/ERA+ and some fielding metric). Collapsing the numbers the way one person wants means she wins the argument, but if the argument just continues through disagreement about how the collapse was done, I’m not sure anything is gained. On the other hand, I enjoy very much looking at, say, JAWS or CAWS in the light of, “Look what happens if I win the argument.” It clarifies what’s at stake in the argument itself, and in a way that’s really fun: clear rank orders of all the familiar names, based on a formula. (And as soon as we’ve enjoyed the new ranked list, the arguments begin.)
“there is as yet no second Win Shares formula.” I confess to being confused by this statement. Why would there be a second formula if you got it right the first time?
Mike H
I’m a fan of the WS system since it doesn’t appear to take fielding prowess (or ineptitude) too far. You know, like if Ron Santo makes 370 assists and Ken Boyer makes 340 assists, how much do those 30 assists (or, 1 ground ball every 5 games) really amount to? dWAR would, in my opinion, blow those 30 assists way out of proportion. Chapman, in my opinion, is a good example…..However, WAR seems to be the metric of choice for the baseball statistics obsessed community.
I do believe the basis of WS is runs created and that’s as good a place to start as any – logically speaking, the final score is our runs versus their runs.
Off the top of your head, are there any ballplayers out there whose WS totals (career) appear to defy common sense or come as a total surprise? Any single season surprises?
Paul,
I certainly agree that the major problem with WAR is that at times it blows the value of fielding way out of proportion.
…..”However, WAR seems to be the metric of choice for the baseball statistics obsessed community.”
I attribute this fact not to the validity of WAR but to the fact that B-R adopted it – and B-R is the site of choice.
The biggest surprise for me is the fact that Gary Sheffield has more than 400 career win shares. I did not expect that.
Mike H
“The biggest surprise for me is the fact that Gary Sheffield has more than 400 career win shares. I did not expect that.”
In a period of 1,748 games at his peak, in ~7,400 PA’s, Sheffield had a 154 OPS+. Dick Allen, in a career of 1,749 games and 7,315 PA’s, had a 156 OPS+. Allen, I believe, is credited with 342 Win Shares for his career. Sheffield had an additional 3,500 PA’s prior to and after his peak. Both are regarded as inferior fielders so, I guess, the system actually works and Sheffield’s total WS shouldn’t be that much of a surprise upon a closer look?
Paul and Mike,
We’ve had some discussions about the value of dWAR over the past year or so, and I think the major thing to bear in mind when considering it for recent seasons is that current dWAR stats are based on Rdrs, and Rdrs is a translation of BIS methodology. My feeling is that if you read the various Fielding Bibles (Vol. III is, I think, the most important in this respect) it’s very hard to dismiss the methodology of BIS and the runs saved/cost calculations that derive from it. It’s worth saying again that the system is Bill James’s innovation, and he remains a consultant of BIS and contributor to their “Bibles.” I’ve gone into some detail on this before — I have the evangelistic urge of the recent convert — but I don’t want to repeat all that again here.
I wonder whether all my shilling for cheap used copies of the Fielding Bibles has lured any fellow posters into getting hold of them (or whether others were familiar with them independent of my encomiums).
Bob,
Did we ever conclude that the system (BIS) was so new that there is no way to give Santo or Boyer or B. Robinson or, for that matter, Yost, Billy Cox, or Puddin Head Jones, enough credit for the work they (POSSIBLY) did at 3b? I mean, nowadays, these “investigators” are making judgment calls in such detail. Is it not conceivable that the lack of such depth of investigation until recently cheats the non-moderns for comparison’s sake? Kind of like, yeah, I know Ty Cobb would only hit .285 today but Ichiro would have been a career .375 hitter in the dead ball era.
I think it cuts both ways, Paul — those guys might have looked worse under the BIS microscope.
The bottom line is that there’s no way to know. We’re stuck with Rtot figures for the pre-Rdrs eras, just as we’re stuck with long periods where stats like SF and CS weren’t tallied — not to mention data gaps that persist to the mid-20th century.
Bob,
“those guys might have looked worse under the BIS microscope.”
Which brings us to the point that Sam Rice and Zach Wheat and Heinie Manush are all in the Hall of Fame because, at time of induction, their career totals and/or peak accomplishments could only be evaluated and compared to their contemporaries and predecessors – not in the light of Al Kaline and Clemente or Reggie Jackson and Tony Gwynn.
Is Cooperstown supposed to “deselect” (Human Resources Department-speak for “$#!^-canning” a co-worker) these lesser lights like all these websites talk about in light of 70 years or more of new stats amassed by near greats and stars? I get the friends of Frankie being a major issue and a real head-scratcher but some of these other guys, I’m fine with.
Paul, I’ve addressed this before. If a fielder really makes 30 more assists than another fielder given all the same balls over the course of a season, that is as valuable as having 30 more hits! Would you really say that there’s basically not much difference between an OBP of .350 and .390 (all else being equal)? That’s what 30 hits do over 700 PAs.
The question about the fielding metrics is NOT whether balls saved are roughly as valuable as the formula says; they CLEARLY ARE if you think about it from the other side for more than 5 seconds.
The average value of a single v. double v. triple v. ROE v. walk v. out is well established to within a fairly small tolerance, and if it weren’t, then pretty much all linear-weights-based offensive stats like wOBA, wRC+, Rbat, etc. would be worthless. Getting an out where another fielder would have let it be a hit or ROE or foul ball has to have the same value as the batter would have gotten.
The question about fielding metrics is whether they are a fair assessment of how often the fielder saved plays vs. other fielders vs. what might be going on due to luck of the ball bounce, pitcher BIP skill, interaction between different fielders, and park factors.
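The 30-hits arithmetic above is easy to check. Here is a minimal Python sketch, assuming for simplicity that the OBP denominator is just plate appearances (the official formula uses AB + BB + HBP + SF); the helper name is hypothetical.

```python
def obp_with_extra_times_on_base(base_obp, plate_appearances, extra):
    """Return OBP after adding `extra` times on base over the same PAs.

    Simplifying assumption: the OBP denominator is plate appearances.
    """
    times_on_base = base_obp * plate_appearances + extra
    return times_on_base / plate_appearances


# A .350 OBP hitter over 700 PAs who reaches base 30 more times.
new_obp = obp_with_extra_times_on_base(0.350, 700, 30)
print(f"{new_obp:.3f}")
```

Thirty extra times on base over 700 PAs adds about 43 points of OBP, which is where the .350 versus roughly .390 comparison comes from.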
Sully,
Is it that easy to tell by the eye test that, for example, Chapman makes the play but Machado doesn’t? Or, for that matter, now that Andujar has set a precedent of piss-poor glove work at the hot corner, is he doomed to never improve? Are those who are observing him for the sake of the BIS exercise going to say that, “Geeze, Andujar just made that play- of course Chapman makes it, too”? Also, the ball down the line that gets snagged by Chapman, is that saving a single or a double? It just seems to me that it’s all a judgment call.
Also, weren’t Palmer and Thorn responsible for all the linear weights business over 35 years ago? And, honestly, I thought all their specific coefficient business was rejected or opposed a long time ago?
Mike, As Fangraphs says in explaining the multiplicity of WAR formulas, “Given how complicated baseball is, you would expect that people would arrive at different solutions to the same problem.” When it comes to “no second Win Shares formula,” I don’t think it’s possible that it’s because Jamesian WS is perfect — even if it were, there are plenty of statheads doltish enough not to recognize the fact and to offer inferior versions. I feel pretty sure that James’s authority is the major discouraging force. “RallyMonkey” is a less intimidating voice to challenge.
In fact, from what I’ve gathered, there is no second Win Shares formula because nearly all the statheads with the chops to make a new one, decided instead that wins above replacement was a more relevant framework than wins above zero.
So effectively the various WAR formulas *are* the competition for win shares, and from what I can tell most active sabermetricians prefer one or more of them to Win Shares if they have to choose.