DOA: Driven-in Over Average

“In Defense of the RBI” was the title of Graham Womack’s provocative post here at HHS on Thursday, a post that led to much interesting discussion on the topic of Runs Batted In and alternative statistics. Let’s suppose we want a stat that does what the RBI stat does, but with some of its flaws narrowed. How might that be done? One approach is after the jump.

Let’s say that we want a stat that measures a hitter’s success at driving in runs. Not his value as a player generally, or as a hitter overall. Not his success at getting on base or his power, except to the extent those abilities directly affect his success in driving in runs. And let’s also suppose we want a counting stat, not a rate stat. Counting stats, after all, have an elegant simplicity: if you want to know who the best player at accomplishing X is, the simplest and most intuitive way to do that is to count up who accomplished X the most times. The RBI is just that, a counting stat that measures a hitter’s success at driving in runs, by counting up the number of runs he drove in. But it is also flawed in that different hitters have dramatically different opportunities to drive in runs, most significantly, as a result of how many base runners a batter’s teammates put on base and where in the batting order he bats. So merely counting up how many runs a hitter drives in may tell us more about his opportunities to drive in runs than about his actual success in doing so. Can we build a stat that counts success in driving in runs, as RBI does, but adjusts for opportunities? Here’s my modest approach to that problem.

In 2011, major league hitters had 4,344 plate appearances with the bases loaded. Those 4,344 PAs resulted in 2,842 runs batted in. That’s .654 RBI per bases loaded plate appearance, on average across the majors as a whole in 2011. Also in 2011, there were 33,073 PAs with just a man on first, nobody on second or third. Those 33,073 PAs with just a man on first resulted in 2,492 runs batted in, for a major league average of .075 RBI per man-on-first plate appearance. We can repeat this exercise for each of the eight man-on-base scenarios: bases empty, man on first, man on second, man on third, men on first and second, men on first and third, men on second and third, and bases loaded. Baseball-reference has splits of this type that you can look up for every season (and for every player and every team) going back to 1948.
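The league-average rates above are simple division, and a minimal Python sketch makes the arithmetic easy to check. Only the two 2011 splits quoted in this post are filled in; the dictionary layout and the names LEAGUE_2011 and rbi_per_pa are my own illustration, not anything Baseball-Reference provides.

```python
# League-wide 2011 totals quoted above: (plate appearances, RBI) per base state.
# Only two of the eight states are given in the post; the other six would be
# filled in the same way from the Baseball-Reference splits.
LEAGUE_2011 = {
    "bases_loaded": (4344, 2842),
    "man_on_first": (33073, 2492),
}


def rbi_per_pa(pa, rbi):
    """League-average RBI per plate appearance for one base state."""
    return rbi / pa


league_rates = {
    state: rbi_per_pa(pa, rbi) for state, (pa, rbi) in LEAGUE_2011.items()
}

for state, rate in league_rates.items():
    print(f"{state}: {rate:.3f} RBI per PA")
```

Rounded to three places, the two rates come out to the .654 and .075 figures used in the rest of the post.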

Now we can check how well a particular hitter did in driving in runs, when compared to the major league average, given the particular opportunities he had to drive in runs. Let’s do an example. Matt Kemp in 2011 came to the plate with the bases loaded 10 times. We know from the numbers mentioned above that on average a major leaguer in 2011 produced 6.54 runs in 10 PAs with the bases loaded. Kemp actually had 14 RBIs in his ten bases-loaded PAs in 2011, about 7.5 more than were produced on average. So let’s credit Matt for about 7.5 runs driven in above average, over his 10 bases loaded PAs. Also in 2011, Kemp had 147 PAs with just a man on first. Again, we know that on average, those PAs would have produced about 11 RBI in the 2011 major leagues (147 PAs x .075 RBI per man-on-first PA on average in the majors = 11.025). But Matt Kemp actually produced 21 RBI in his 147 PAs with a man on first, ten more than the average hitter.
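Here is the same Kemp arithmetic as a short Python sketch, for anyone who wants to rerun it with another player's splits. The function name rbi_over_average and the dictionary layout are my own invention for illustration; the numbers are the ones quoted above, with the league rates rounded to three places as in the text.

```python
# League-average RBI per PA in 2011, rounded as in the text above.
LEAGUE_RATE = {"bases_loaded": 0.654, "man_on_first": 0.075}

# Matt Kemp's 2011 splits quoted above: (plate appearances, actual RBI).
KEMP_2011 = {"bases_loaded": (10, 14), "man_on_first": (147, 21)}


def rbi_over_average(pa, rbi, league_rate):
    """Actual RBI minus the RBI an average hitter would get in the same PAs."""
    expected = pa * league_rate
    return rbi - expected


kemp_diffs = {
    state: rbi_over_average(pa, rbi, LEAGUE_RATE[state])
    for state, (pa, rbi) in KEMP_2011.items()
}

for state, diff in kemp_diffs.items():
    print(f"{state}: {diff:+.2f} RBI versus average")
```

The two differences work out to roughly 7.5 and 10, the figures credited to Kemp above.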

We can repeat this exercise for each of the eight different man-on-base situations for Matt Kemp in 2011. I did that, and found that if Kemp had driven in runs at an average rate in each of the eight man-on-base situations he encountered, given the number of PAs he actually had in each of those situations, he would have ended 2011 with 78.5 RBI. But he really had 126 RBI in 2011, which means Kemp had 47.5 RBI over what a league average hitter would have produced with the same number of opportunities as Kemp had in each man-on-base situation. One might say, then, that Matt had 47.5 DOA, runs Driven-in Over Average. There’s an acronym that might stick in your mind, especially if you’re a fan of police procedurals or film noir.
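Written out as a formula, the calculation described above looks like this. The symbols are my own shorthand, not an established notation: s runs over the eight man-on-base situations, PA_s and RBI_s are the player's plate appearances and RBI in situation s, and r_s is the league-average RBI per plate appearance in that situation.

```latex
\mathrm{DOA}
  = \sum_{s=1}^{8} \left( \mathrm{RBI}_s - \mathrm{PA}_s \cdot r_s \right)
  = \mathrm{RBI}_{\text{total}} - \sum_{s=1}^{8} \mathrm{PA}_s \cdot r_s
```

For Kemp in 2011 the second form is 126 - 78.5 = 47.5.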

Here are the 2011 major league RBI leaders, along with their 2011 DOA numbers. Note how Ryan Braun has a relatively high DOA, and Ryan Howard a relatively low one. Howard had 116 RBI in 2011, and Braun 111. But an average major league hitter, given Ryan H.’s number of chances with each man-on-base situation, would have produced 83.3 RBI while an average major league hitter would have produced only 64.9 RBI with Ryan B.’s opportunities. So Braun ends up with more DOA than Howard, despite having fewer RBI. Braun had more success driving in runs given the opportunities he had to work with than Howard did.

Matt Kemp 126 RBI, 47.5 DOA
Prince Fielder 120 RBI, 39.3 DOA
Curtis Granderson 119 RBI, 44.0 DOA
Robinson Cano 118 RBI, 44.2 DOA
Adrian Gonzalez 117 RBI, 33.5 DOA
Ryan Howard 116 RBI, 32.7 DOA
Ryan Braun 111 RBI, 46.1 DOA
Mark Teixeira 111 RBI, 41.1 DOA
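As a quick check on the list, here is a small Python sketch that recomputes DOA for the three hitters whose expected RBI totals are spelled out in the text, then ranks them by DOA instead of raw RBI. The function name doa and the data layout are just my illustration; expected totals for the other five players are not given in the post, so they are left out.

```python
# Actual 2011 RBI and league-average expected RBI, as quoted in the post.
# Expected figures are given in the text for only these three hitters.
PLAYERS = {
    "Matt Kemp": (126, 78.5),
    "Ryan Howard": (116, 83.3),
    "Ryan Braun": (111, 64.9),
}


def doa(actual_rbi, expected_rbi):
    """Runs Driven-in Over Average: actual RBI minus expected RBI."""
    return round(actual_rbi - expected_rbi, 1)


doa_by_player = {name: doa(rbi, exp) for name, (rbi, exp) in PLAYERS.items()}

# Rank by DOA rather than by raw RBI.
ranking = sorted(doa_by_player, key=doa_by_player.get, reverse=True)
for name in ranking:
    print(f"{name}: {doa_by_player[name]:.1f} DOA")
```

Ranked this way, Braun moves ahead of Howard even though Howard had five more RBI.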

One issue with runs Driven-in Over Average is that it is not easy to calculate in bulk with the standard tools we use here — baseball-reference and the Play Index, fangraphs and so on. I did the calculations for the eight guys above one at a time, and doing a lot more would have been a major effort. But I’m really more going for the concept here than a bulk implementation: an idea about evaluating the specific achievement that goes into RBI but adjusting for opportunities. DOA may not be the next widely accepted stat — indeed its fate may be hinted at by its name. But I hope I’ve at least helped move forward the conversation about RBI that Graham has so eloquently begun.

85 thoughts on “DOA: Driven-in Over Average”

  1. 1
    Neil L. says:

    Whoa, birtelcom, an amazing post. It will take me days to wrap my head around all the implications.

    Is there a simple formula for Driven In Over Average? I mean in terms of widely-available stats.

    Does it require comparison to a league average for a given season or is it only dependent on base-out data over many decades?

  2. 2
    Neil L. says:

    I love the play in the acronym, birtelcom, DOA. As if it is fated to fail before it starts.

    So DOA is an answer to all of the weaknesses in RBI as a counting stat. Hmmm …..

  3. 3
    Mike L says:

    Birtelcom, which of those RBI are “clutch” RBI?

    • 13
      no statistician but says:

      This stat is interesting for what it does, but like RBI, it makes no distinction between a grand slam when the score is already 8-1 and a single that brings a man home from 2nd base to win the game. In fact, also like RBI, it tends to weigh in for the lesser accomplishment re winning the game in that particular instance. Nothing at all wrong with that as long as what it does show is not, like RBI, assumed to mean something else.

      • 33
        bstar says:

        nsb, how is winning the game with a single that brings home the winning run a lesser accomplishment? Wouldn’t WPA suggest otherwise? Also, aren’t hits, doubles, triples, walks, runs, home runs, grand slams, also making no distinction between leverage index? Why do we have to tag RBI or DOA with this marker when most counting stats are inherently leverage independent?

        • 53
          no statistician but says:

          I was implying just the opposite of what you inferred. To me the single is more important but DOA doesn’t show that, but the opposite.

          Winning the game is the point of playing the game at most levels. Most stats don’t distinguish grades of importance re the circumstances in which an event occurs.

          I’ll say here what i’ve said before: clutch hitting exists and it’s important—but it is situational. The DOA stat, like RBI, overlooks the importance of the situation.

          • 56
            birtelcom says:

            Baseball-reference already has a whole table worth of stats for every player that cover game situation leverage, including Win Probability Added. The DOA concept was not intended to replicate any of that. It was only intended to tweak the RBI stat to provide a count of runs driven in that adjusts for the fact that players differ in the number of opportunities they get with different men on base situations that heavily affect how easy or hard it is to rack up RBIs. In-game leverage is an extremely important issue that deserves and gets its own statistical coverage; just not with this particular stat.

          • 60
            no statistician but says:


            I think we’re saying the same thing, just with different emphases. Your original post was about DOA, and my response was directed to it and the whimsical question posed by Mike L.

            Actually, I think the DOA idea gives us a telling distinction, and with some tweaking for things like walks, as discussed by others in this discussion, it could be a real winner, driving in some important runs, metaphorically speaking.

    • 62
      Mike L says:

      Birtelcom, sorry for being flip or whimsical-I do like the new rubric. I also think there’s a core alternate reality regarding what is “clutch”. Your piece does one side-it compares players to league average, which should go a long way, over a career, to analyzing how a player would stack up against an average player given the same number of opportunities. There’s another side, which is how a player does compared to his “non clutch opportunity” self. A .750 OPS player who manages .825 with men on base-is he more “clutch” than the .825 player who performs the same with and without men on base? How about the .825 player who inexplicably falls to .750 in late inning high leverage situations-is he less clutch, or is he just facing a pitcher with A+ stuff.

      Anyway, it’s a terrific effort.

      • 65
        birtelcom says:

        Mike: As you may know, b-ref has a stat called “Clutch” within the Win Probability tables that is actually derived by comparing a player’s performance in high leverage situations to his performance in low-leverage situations. A hitter who is excellent in both high-leverage and low-leverage situations but is a little less excellent in high leverage situations than in low leverage situations will have a negative Clutch number (even though you still may want him up in those important situations because he is still a good hitter).

  4. 4
    Neil L. says:

    Please help me here, birtelcom.

    The season formula for DOA is the sum, over all seven men-on-base situations, of the differences between the player’s RBI in those situations and the league average for that season?

    I wish it were a more easily-calculated statistic but that doesn’t diminish its value.

    DOA would have to be fine-tuned for each year for offensive context and strikeouts et cetera.

  5. 5
    bstar says:

    Fantastic and thought-provoking post, birtelcom, with a good dose of your subtle, witty humor as well. I feel I need to light a pipe and go stroke my long beard and give this one a good poring over. It sure makes a lot of sense on the surface. Well done.

  6. 6
    Neil L. says:

    Interesting that players like Jose Bautista and Miguel Cabrera don’t feature prominently on the DOA list, whereas other players do.

    Or were their DOAs not calculated? (I guess they hope they were DOA …. sorry, couldn’t resist. 🙂 )

  7. 9
    Hank G. says:

    This seems to be an improvement over straight RBI, but it is still limited by opportunities. Who is better at driving in runs, a player that drives in 20 runs over average where the average is 100, or a player who drives in 15 runs over average where the average is 50?

  8. 10
    Richard Chester says:

    birtelcom: Maybe I am wrong but aren’t you supposed to subtract solo HRs from the player’s RBI totals before computing DOA?

    • 16
      birtelcom says:

      I thought about that, but to me a guy who hits a home run does drive himself in and it is reasonable to credit him for that in this context, the same as RBI does. It just depends on what you want out of this sort of stat — if you only want to know how well a player drives in other guys, and don’t want that contaminated by home runs, then that’s another way of doing it. More a matter of taste than a right way or wrong way, I think.

      • 19
        bstar says:

        birtelcom, if we’re to include solo home runs, don’t we need to compute a league-average RBI expectancy based on HR hit with no one on, similar to the way we did with every other runners-on-base situation?

        • 22
          birtelcom says:

          Yes, bases empty is one of the seven different “man-on-base” states that I used in calculating my sample eight guys for 2011.

  9. 11
    Shping says:

    Well done birtelcom.

    DAM — Driven-in Above the Mean?

    Actually i like yours better.

    The only flaw i see is that it doesn’t factor in the raw number of rbi opportunities that a batter gets. Players on higher-scoring teams and in more favorable batting-order-positions will still have an unfair advantage over, say, the leadoff hitter for the Padres, to use an extreme example.

    Of course we know that a leadoff hitter gets fewer rbi opportunities and we don’t expect him to drive in many runs. But it would still be nice to know if someone with a low rbi total was nonetheless efficient at making the most of his opportunities.

    Of course i have no idea how you can include those elements in a pretty good counting stat like this without altering its intent or making it even more complex.

    Maybe i would just have to satisfy myself with looking at two stats: your DOA and BA w/RISP, and drawing my own “clutch” conclusions from there.

    Overall, like i said, kudos for your creation.

    • 14
      bstar says:

      Shping, interesting observations. I’ll apologize to anyone else who’s had to see me post this link a third time, but it’s so apropos I have to. Here’s Others Batted In, from Baseball Prospectus. It measures raw baserunner RBI opportunities (PA_ROB=plate appearances with runners on base) and efficiency at driving other players on base in (OBI%). Disregard the highlighted raw OBI (it’s the same as RBI minus RBIs from scoring yourself on a home run). Anyway, here’s the link (again, sorry) for the 2012 leaders:

      • 18
        Shping says:

        Thanks bstar. I suspected there were stats like that out there. It’s very useful, for example, to see the RBI% broken down by baserunner location. Ichiro, of course, scores very low in driving in runners from first, much better from 2nd or 3rd base. And traditional power hitters, of course, have more balanced numbers by comparison.

        Just wondering if there’s any way to somehow meld these stats with DOA.

        #17 makes some interesting points as well.

  10. 12
    AlbaNate says:

    Am I misunderstanding something, or was Kemp actually 7.46 RsBI above expected with the bases loaded? (14-6.54=7.46)

  11. 15
    Shping says:

    Following up on my own #11 and also #9.

    I’m trying to get a handle on DOA by looking at someone like Ichiro. As many of you know, Seattle is experimenting with him batting 3rd this year. He has always had a reputation for being a good clutch hitter (whatever that means), despite the fact that he is by no means a power hitter. And he plays on a team that definitely struggles to score runs.

    So even if Ichiro rebounds this year and hits .330 with an OPS above .800, I don’t expect him to drive in 100 runs. But should i be impressed if he drives in 90? 85? And using DOA, does he have any chance of scoring high in this area? (Similar questions could be asked about, say, anyone on the light-hitting Dodgers teams of the mid-60s).

    Seems like DOA might be valuable for differentiating among the “usual” rbi candidates like Kemp, Braun, Howard, etc., but i don’t think it does justice to the Ichiros and Ron Fairly’s of the world. Does it, or should it?

    • 32
      birtelcom says:

      Leadoff men and players on low-run-scoring teams will probably never reach the very top of the DOA list, I think, because DOA is still a counting stat and like all counting stats still depends on total opportunities. But the effect I think you will see is, for example, that a leadoff guy for a low-scoring team would inevitably be well below average in RBI for a starting player, whereas with DOA, as long as that guy is a good hitter who knocks in the few runs he has a chance to knock in, he will have a positive DOA that puts him in the top half of the league.

  12. 17
    AlbaNate says:

    I like this idea a lot…one minor quibble is that if a player has a large percentage of his “man on third” plate appearances come with two outs, it would adversely affect his total. I’m not sure that this is a realistic argument, but I would think that where you bat in the order would affect your odds of coming up with two outs, and your odds of driving a runner in from third are greatly decreased if there are two outs.

    Another thing about this that bothers me a bit is that if you walk a lot, or if you are intentionally walked, it will hurt your score. I wonder how Barry Bonds did compared to expectation that year that he was walked so many times. I imagine that he did worse than expected even though he hit so many homeruns.

    • 20
      AlbaNate says:

      If I’m doing the math right, Barry Bonds drove in 12.29 runs fewer than expected in 2004.

      Bonds’s numbers that year were mind-boggling: .609 OBP, .812 SLG, OPS+ of 263, but only 101 RBIs…and 232 walks.

    • 29
      birtelcom says:

      I thought about the man on third, two outs vs. less than two outs issue. If one were really serious about doing this DOA thing, you are right that breaking down all the situations that include a man on third into “two outs” and “less than two outs” situations would be the way to go. That would make the calculation process more complicated, but it would also make it more accurate, because it is indeed meaningfully easier to get an RBI with a man on third and less than two out than with two out. This was a little more detailed than I wanted to get in the initial post, but it is a very good point.

      I had not considered the implication of walks in the way you describe it. I’ll have to give that some thought.

    • 35
      Brendan says:

      Birtelcom, I applaud your logical, well-reasoned approach to the RBI issue. This is a valuable contribution.
      Two random thoughts:
      1) Intentional walks might throw another interesting twist into the mix. I don’t know that my thinking on this matter is clear, but IBB are always intended to set up a situation that is favorable to the defensive team (create a force situation or bring a weak hitter to the plate). Assuming the defensive team is correct in employing the strategy, on balance plate appearances following IBBs will be associated with fewer RBIs. Therefore, those plate appearances (following IBB) might need to be evaluated as a special case or perhaps even subtracted from the dataset since they could be a source of bias influencing the “average” performance in a given on-base situation.
      2) I expect that DOA is highly correlated with OPS (how could it not be?). Maybe I’m unnecessarily complicating things, but if we’re looking for a definition of “clutch” or “RBI guy” perhaps DOA alone is not the measure. Perhaps the measure is something more like DOA exceeding what would have been expected based on the player’s OPS.
      Again, good work!

      • 38
        John Autin says:

        Brendan (point 1) — I would not assume that teams generally make wise decisions about issuing IBBs.

      • 83
        Michael Sullivan says:

        I don’t think you need to do 1. Most of the time, the situation that calls for an IBB is the base/out situation, so having a lower chance will be reflected in the average RBI from that runners on base situation. I suppose it’s marginally more likely that a slow guy gets walked and a few other things that are dependent on more than the base situation, but I suspect these factors are small compared to the main one which birtelcom’s algorithm already accounts for.

  13. 21
    Richard Chester says:

    I did an analysis for Ryan Howard for his 2008 season when he batted .251 and had 146 RBIs. His DOA calculated to 69.4 RBIs. His BA with men on base was .309, considerably above his overall BA. He was especially good with a runner on second (.379) and on first and third (.333).

  14. 23
    Chad Evely says:

    Interesting idea. I just ran your process for all players in 2011 if anyone wants to check out a complete list:

    • 25
      birtelcom says:

      That is fantastic, thank you!

    • 26
      AlbaNate says:

      Well, my thoughts about getting walked a lot and having lower than expected RBIs does not seem to hold up. All the BB leaders are also the players with the most RBIs over expected.

      • 63
        JDV says:

        It still depends on whether the ‘opportunities’ are considered for each AB, or for each PA. If only ABs are being looked at here, your original thought almost certainly holds up.

    • 27
      bstar says:

      Chad, that’s phenomenal. How many internet six-packs of brew would I need to buy you to convince you to run a few more years? What I’m really interested in is Ryan Howard, whose allegedly inflated RBI totals from 2006-09 are kind of a lightning rod for this RBI debate. I would love to know how he compared DOA-wise for those years vs. other NL sluggers. Other metrics have suggested he indeed was quite good in this period at driving runners in.

    • 28
      Neil L. says:

      Awesome list, Chad. I just added your site to my favorites.

      • 31
        Chad Evely says:

        That’s great. Unfortunately I don’t get to post as often as I’d like to (1-year-old daughter with another on the way) but get my fix from all the interesting things they do on this site.

        • 37

          Chad E,

          I have a one-year old daughter, and I have enough time to come here and make a smart ass comment once a day while I am sitting on the toilet.

          Could you briefly explain to a semi-luddite how the hell you created that document?

          Please tell me that there is a way to create all of those hundreds of player-links in an automated manner.

          • 39
            John Autin says:

            Voomo — I’m also dying to know Chad’s methods.

            But creating the player links is no problem. I and others routinely copy out P-I search results into Excel, and they retain the links that they have in the P-I pages.

            Also, there is a page on B-R where you can type or paste player names and then click the button to “Linkify Text.”

          • 43
            Chad Evely says:


            Typically I make my smart-ass comments WHILE sitting on the toilet… it frees up a bunch of time but, trust me, you don’t want to borrow my smart phone.

            I’m a programmer and, being interested in baseball stats, have spent some free time in the last year or so writing code that reads Retrosheet event files which I can then use to do things like this DOA calculation very easily.

            Every player has an ID in Retrosheet and even though it differs from the B-R ID, I can use Sean Lahman’s DB to find the B-R ID of each Retrosheet ID. Once I have the B-R id, the URLs follow a similar template so I can create the hyperlinks. And since I can do each step programmatically, it ends up being as simple as hitting a button and getting an HTML (or Excel or whatever) file out the other end.

            So, there you go… you gotta love the publicly available info on the Internet.
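For the curious, the ID-to-link step described in the comment above might look roughly like the following. This is only a guess at the shape of such code, not Chad's actual program: the two-column lookup stands in for a table exported from the Lahman database, the column names and both IDs are made-up placeholders, and the URL pattern is an assumption about how Baseball-Reference player pages are addressed.

```python
import csv
import io

# Stand-in for an ID lookup exported from the Lahman database.
# Column names and both IDs below are hypothetical placeholders.
LOOKUP_CSV = """retro_id,bbref_id
doej0001,doejo01
"""


def load_id_map(text):
    """Build a dict mapping Retrosheet IDs to Baseball-Reference IDs."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["retro_id"]: row["bbref_id"] for row in reader}


def player_url(bbref_id):
    """Assumed URL pattern: /players/<first letter of ID>/<ID>.shtml"""
    return (
        "https://www.baseball-reference.com/players/"
        f"{bbref_id[0]}/{bbref_id}.shtml"
    )


def player_link(retro_id, name, id_map):
    """Return an HTML anchor for the player, or the plain name if no ID is found."""
    bbref_id = id_map.get(retro_id)
    if bbref_id is None:
        return name
    return f'<a href="{player_url(bbref_id)}">{name}</a>'


id_map = load_id_map(LOOKUP_CSV)
print(player_link("doej0001", "John Doe", id_map))
```

Falling back to the plain name when an ID is missing keeps a single unmatched player from breaking the whole output file.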

  15. 36
    kds says:

    Of course there are 8 possible runners-on-base states; none, 1, 2, 3, 1-2, 1-3, 2-3, 1-2-3.

    In my long post in the last RBI thread I described how for Pujols-Howard ’08 I looked at who had a higher % at each of the 8 states. I subtracted HBP and IBB since these are not choices of the batter. I included UIBB since they are choices, to some extent.

    If you are doing this with some large database that codes all plays, (Retro-sheet or the Lahman database), you may want to just use all 24 base-out states. This will enable you to look at the 3rd base, less than 2 outs situation.

    Since we are starting with official RBI, we are not giving credit for runs scoring on a GIDP.

    • 46
      birtelcom says:

      –Thanks so much for the correction on the runners on base states; yes, of course when you include the no men on scenario, there are eight different states. I’ve corrected the error in the text of the post.
      –Good point that in the comments on Graham’s original post you were already doing something much like this DOA idea. Great minds….
      –One way to run the numbers would indeed be to do all 24 possible base/out states. Some of the base/out states though really represent distinctions that are not likely to be meaningful for this exercise. For example, man on first with none out, man on first with one out and man on first with two outs are three different states but for purposes of the chances of the guy at the plate getting an RBI, there’s little or no difference among them.
      –You are right that because DOA is fundamentally relying on the official RBI stat, runs driven in on a GIDP are excluded in all parts of the calculation. This seems to me an entirely reasonable aspect of the RBI stat.

      • 70
        Evan says:

        It’s one of those interesting quirks of baseball that a batter has a better chance of getting an RBI from a runner on 3rd if there are less than 2 outs (because of SF), but benefits from there being 2 outs when attempting to get an RBI from a runner on 1st or 2nd (because the runner will have a better chance of advancing an extra base on a ball hit in the air). If we take the RBI expectancies for each of the 8 base-occupied situations and compare these figures based upon how many outs there are we can get a sense of whether these differences are significant. I don’t think BB-Ref is set up to make this determination, but perhaps Chad’s data set is.

        If we really want to make ourselves crazy we can worry about park factors, defensive efficiency of the opponents, baserunning ability of the runners and a myriad of other factors that are probably indistinguishable from the statistical noise.

        For those that love a rate stat we can easily convert this into a rate stat by dividing the DOA by the player’s expected RBI number.
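The rate-stat conversion suggested in the comment above (DOA divided by expected RBI) can be sketched in a few lines of Python. The function name doa_rate is hypothetical; the inputs are the Howard and Braun figures from the post.

```python
def doa_rate(actual_rbi, expected_rbi):
    """DOA divided by expected RBI: the fraction by which a hitter beat average."""
    if expected_rbi <= 0:
        raise ValueError("expected_rbi must be positive")
    return (actual_rbi - expected_rbi) / expected_rbi


# 2011 figures from the post: (actual RBI, league-average expected RBI).
howard = doa_rate(116, 83.3)
braun = doa_rate(111, 64.9)

print(f"Howard: {howard:.1%} above expected")
print(f"Braun: {braun:.1%} above expected")
```

On this view Braun beat his expected total by roughly 71 percent and Howard by roughly 39 percent, so the gap between them is wider as a rate than it is as a count.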

  16. 47
    Fireworks says:

    Like the post, birtelcom, but you need a better acronym. RBIOA, or something. DOA might rightly be DOA as per your above comment.

    I would also like if someone could figure out a way to turn DOA into a rate stat, since that corrects for lack of opportunity and playing time.

    As it is, though, I like this stat and would like to see it referenced sometimes (which is why you need something better than ‘DOA’). It would be harder for someone to make the claim that a guy is a ‘run-producer’ or ‘drives in runs’ if he has an underwhelming DOA.

    Also, it’s time for someone to run this stat on Joe Carter.

    • 49
      birtelcom says:

      Yes, I thought about using an acronym less gruesome than DOA. But then there are so many sabermetric acronyms out there, and seemingly a new one popping up somewhere every week, I was looking for something eye-catching. In an environment where zombies, vampires and gladiatorial teenagers are cultural touchstones, why not DOA for a baseball acronym? It’s an acronym that would fit right in on the new spinoff “CSI Miami Marlins”.

      • 51
        Fireworks says:

        You need to tweet this to David Cone. He’s one of the few guys on TV whose job it is to talk baseball that actually cares to support the things he says with (useful, illuminating) data.

  17. 48
    Fireworks says:

    Ignore my last sentence. Replied before looking at Chad’s link.

  18. 50
    Fireworks says:

    Looks like Joe Carter was consistently good at driving in runs. Better than I would’ve expected.

    Of course it goes without saying that his low OBP still meant that his contribution was not quite worth the glowing praise bestowed upon him; he still made a lot of outs and wasn’t on base a ton for the guys behind him to drive in, but nonetheless “he drives in runs” is accurate for Joe Carter.

    Of course that would be tempered a bit with a DOA-related stat that accounted for all the opportunities that ended in a walk. Carter would suffer some and a guy like Bonds would greatly benefit, I’d imagine.

  19. 52
    Timmy Pea says:

    Another awesome post! I am of the opinion that a player may have a special skill that helps him drive in runs. Take away 25 RBI’s from Joe Carter every year and he is a lousy player. Joe’s RBI totals don’t elevate him to great player status, but they do offer some redemption to his career. I think we’ve all seen a slugger have a bad year and limp to 95 or 100 RBIs. He still had a lousy year, and yes as the RBI haters point out anybody batting 4th will probably drive in a few runs regardless. As evidence for a player having a special skill driving in runs I would point to golf and the difference between a putt on the first hole of a tournament and a 4 footer for the win at the Masters. Some guys have it and some guys don’t.

    • 61
      Lawrence Azrin says:

      “I think we’ve all seen a slugger have a bad year and limp to 95 or 100 RBIs” –

      DAVID ORTIZ, 2009:
      99 RBI, but a 101 OPS+ on a .238 BA, 28 HR – literally an “average” batting year by OPS+.

      Next two years: in 2010, 102 RBI on a 137 OPS+; 2011, 96 RBI on a 154 OPS+. Just over 600 PA for all three seasons.

      This is one of innumerable data points showing that raw seasonal RBI totals can be extremely deceiving, and not that useful in evaluating players.

  20. 54
    Dave V. says:

    Great stuff, Chad. One random question – did Don Mattingly get left off the 1980s search? I’ve searched for him by both first and last name and can’t find him listed (I do see him in the 1990s search).

  21. 55
    Dave V. says:

    Oops nevermind my comment @54, as I realized my mistake.

  22. 57
    Doug says:

    Nice work, birtelcom.

    Couple of thoughts.

    While this is a counting stat, it would be an odd sort of counting stat to track over the season for a couple of reasons: (i) it wouldn’t be whole numbers, and (ii) it would fluctuate up and down through the season. All other stats that are referred to as counting stats are whole number (or understood fractions for IP), and only accumulate in one direction.

    To track a stat like this, presumably would have to use the average performance stats from previous season (or avg of 3 previous seasons, or something along those lines). Which could be odd if, for some reason, there was a sudden change in avg performance in a particular year, and you were thus evaluating current performance against an outdated benchmark. And, don’t think you’d want to recalibrate to the current year avg at the end of the season as, potentially, that could change the leaders or rankings in the stat.

    A very simple rate stat along these lines could also be useful:
    – RBI% – RBI / # of RISP during PAs; or
    – RDI% – (RBI – HR) / # of RISP during PAs (RDI = runners driven in)

    In these stats, the batter gets “bonus points” for driving in a runner from first (and himself for RBI%), but isn’t “penalized” for failing to do so.

    • 58
      AlbaNate says:

      WAR is another counting stat that has fractions and that can rise and fall through a season.

      • 59
        Doug says:

        Right you are.

        But that is another example (like DOA) of a different kind of counting stat. The kind that are hard to get your head around (or, my head, anyway) if tracking them during the year.
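The two simple rate stats proposed in comment 57 above, RBI% and RDI%, can be sketched as follows. The function and parameter names are my own, and the season line at the bottom is invented purely to show the calculation; it is not any real player's data.

```python
def rbi_pct(rbi, risp_runners):
    """RBI%: RBI per runner in scoring position during the batter's PAs."""
    if risp_runners <= 0:
        raise ValueError("risp_runners must be positive")
    return rbi / risp_runners


def rdi_pct(rbi, hr, risp_runners):
    """RDI%: runners driven in (RBI minus HR) per runner in scoring position."""
    if risp_runners <= 0:
        raise ValueError("risp_runners must be positive")
    return (rbi - hr) / risp_runners


# Hypothetical season line, invented for illustration only.
example = {"rbi": 100, "hr": 25, "risp_runners": 250}

example_rbi_pct = rbi_pct(example["rbi"], example["risp_runners"])
example_rdi_pct = rdi_pct(example["rbi"], example["hr"], example["risp_runners"])

print(f"RBI%: {example_rbi_pct:.3f}")
print(f"RDI%: {example_rdi_pct:.3f}")
```

Because the denominator counts only runners in scoring position, an RBI that scores a runner from first raises the numerator without adding to the denominator, which is the "bonus points" effect described in that comment.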

  23. 66
    philaphan says:

    Interesting post. Apologies if this was touched on (I skimmed several comments), but wouldn’t a linear-weights-based run expectancy be a better baseline to use than the previous season’s RBI totals? It seems like DOA can only be calculated retroactively, and that players’ numbers will be skewed by having their own RBIs as part of the league average. In the Matt Kemp example, all of his RBIs are driving up the league average (albeit very slightly) that he is being measured against. Again, not sure if this was already discussed or considered, but it seems like an improvement.

  24. 67
    Chad Evely says:

    I finished generating career DOA values for every player with at least one plate appearance between 1956 and 2011. There is one spreadsheet that breaks down by baserunning situation and another by player age… I updated previous blog post with links:

    I think I may modify to incorporate a couple of the mentioned improvements and re-run later tonight. These changes will be:
    – Eliminate plate appearances ending with BB, IBB and HBP from baseline dataset as well as player calculations.
    – Take outs into account. Creating separate situations for 0, 1 or 2 outs will create 24 situations instead of 8… I’m thinking I might combine 0/1 outs into a single category, resulting in only 16 situations… can anyone think of a reason not to combine these situations? Pulling the 2 out situations into their own categories seems like the integral move.
    – I think someone mentioned eliminating solo home runs from the equation. Instead, I think I’m going to remove the RBI for the batter’s own run from every HR, regardless of whether it’s a solo shot or a grand slam, both in the baseline and in the player calculations. It seems like an RBI from a HR would be an RBI no matter when the HR is hit, so it kind of skews what we’re looking for and gives a major advantage to HR hitters.

    birtelcom or anyone else: let me know what you think and I’ll plan on tweaking and re-running later tonight.

    • 68
      bstar says:

      Chad, I agree with IBB and HBP being taken out, but consider this about eliminating all BB from the dataset: that player who drew the walk may have had several good pitches to get a hit and drive runners in, he may have fouled several off, so to completely eliminate these at-bats from consideration is questionable to me. Also, unless the bases are loaded, a walk isn’t going to drive a runner in anyway, similar to the way a single with a man on first is very unlikely to produce an RBI. Are we going to eliminate all singles with a man on first from the equation? Then why eliminate walks? What about singles with no one on? Isn’t this exactly the same as a walk with no one on? Why eliminate BB here but not singles? Just food for thought.

    • 69
      birtelcom says:

      – I see no reason to distinguish between none out and one out, for this purpose. The only situations where the out state really makes any difference are situations that include a man on third (third only, first and third, second and third, bases loaded), and even then the only meaningful distinction is between two outs and less than two outs. So a total of twelve states, I think, will cover all the meaningful distinctions that might come up.
      – I’m personally not a fan of taking one run driven in out of the equation for each homer, but I can understand why you would do it. To me, driving in yourself should count as a run driven in, but I can definitely see the opposite argument; it’s really a personal preference.
      – Whether you use PAs or ABs (shorthand for whether you take walks out or not) will clearly make a big difference. I was noticing in your data that high-walk guys were not doing well, and that may be an unfortunate result. Maybe using ABs instead of PAs is the way to go.
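To check the count of twelve, here is a rough sketch that collapses all 24 base/out combinations using the rule described above: outs are ignored unless there is a runner on third, and then only "two outs" versus "fewer than two" is kept. The function name and labels are invented for the illustration.

```python
from itertools import product


def state_key(first, second, third, outs):
    """Collapse a base/out situation into one of the proposed states."""
    if not third:
        out_label = "any outs"  # outs ignored when nobody is on third
    elif outs == 2:
        out_label = "2 out"
    else:
        out_label = "<2 out"
    return (first, second, third, out_label)


# Enumerate every combination of occupied bases and 0, 1 or 2 outs.
all_states = {
    state_key(f, s, t, o)
    for f, s, t, o in product([False, True], [False, True], [False, True], [0, 1, 2])
}
print(len(all_states))  # 4 states without a man on third + 8 with one = 12
```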

      Chad: being able to see actual bulk results this way has been a totally unexpected bonus. Great work. You should be working for Sean Forman at baseball-reference!

      • 72
        Chad Evely says:

        I’m glad I was able to help visualize this stat a little bit. I’ve always been conflicted about RBIs, understanding the stat has its limitations but realizing that there is some value to what it’s telling us so it’s interesting to help explore an alternate way to measure the skill that RBI is supposed to measure.

    • 71
      Chad Evely says:

      Alright, well, it sounds like we all agree that removing IBB and HBP from the equation and expanding to 12 states is the way to go, so I’ll set that as my base. I just finished setting up the process to allow me to turn on/off whether to include walks and whether to include the RBI from a HR. I’ll run the four different combinations to allow birtelcom to determine which one he thinks best accomplishes what we’re trying to measure… it is his baby after all.

      It’ll take ~1.5 hours to run each setting, so I’ll probably just post them throughout the day tomorrow as they finish running. To give you an idea of the difference made by introducing the 4 new 2-out states, when walks and HRs are included, here are the RBI expectations:
      Third: 0.521
      Third, 2 outs: 0.209
      First/Third: 0.603
      First/Third, 2 outs: 0.309
      Second/Third: 0.666
      Second/Third, 2 outs: 0.349
      Bases Loaded: 0.750
      Bases Loaded, 2 outs: 0.536
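As a rough illustration of how these expectations would feed into DOA (actual RBI minus expected RBI), here is a small sketch. The eight rates are copied from the list above; the dictionary labels, the function names, the hypothetical hitter, and the reading of the unlabeled rows as "fewer than two outs" are all assumptions made for the example.

```python
# League-average RBI per PA, copied from the figures posted above
# (walks and HR RBI included). The keys are made-up labels, and the
# "<2 out" reading of the unlabeled rows is an assumption.
EXPECTED_RBI = {
    ("third", "<2 out"): 0.521,
    ("third", "2 out"): 0.209,
    ("first/third", "<2 out"): 0.603,
    ("first/third", "2 out"): 0.309,
    ("second/third", "<2 out"): 0.666,
    ("second/third", "2 out"): 0.349,
    ("loaded", "<2 out"): 0.750,
    ("loaded", "2 out"): 0.536,
}


def expected_rbi(pa_counts):
    """Sum of (league RBI per PA) x (batter's PAs) over each situation."""
    return sum(EXPECTED_RBI[state] * pas for state, pas in pa_counts.items())


def doa(actual_rbi, pa_counts):
    """Driven-in Over Average: actual RBI minus the league-average expectation."""
    return actual_rbi - expected_rbi(pa_counts)


# Hypothetical hitter: 20 PAs with a man on third and <2 outs,
# 10 PAs with the bases loaded and 2 outs, 18 RBI in those 30 PAs.
sample_pas = {("third", "<2 out"): 20, ("loaded", "2 out"): 10}
print(round(expected_rbi(sample_pas), 2))
print(round(doa(18, sample_pas), 2))
```

A full calculation would also need the four states without a man on third, which are not listed in the comment above.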

  25. 73
    kds says:

    I think whether or not to include UIBB and the batter on HR is mostly just a matter of exactly what questions we are trying to answer. I don’t think there is one clearly right or wrong answer.

    I would like to see tables with Actual_RBI/Expected_RBI as well as the Actual-Expected that have been produced.
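To show why both views might be worth tabulating, here is a tiny sketch with two invented hitters. The names, numbers, and function names are hypothetical; the point is only that the ratio and the difference can rank the same players differently.

```python
def doa(actual, expected):
    """Difference form: actual RBI minus expected RBI."""
    return actual - expected


def rbi_ratio(actual, expected):
    """Ratio form: actual RBI divided by expected RBI."""
    return actual / expected


# (actual RBI, expected RBI) for two made-up hitters.
hitters = {"Full-timer": (100, 80.0), "Part-timer": (30, 20.0)}

for name, (actual, expected) in hitters.items():
    print(name, doa(actual, expected), rbi_ratio(actual, expected))
```

The full-timer leads on the difference, while the part-timer leads on the ratio, since the ratio does not reward playing time.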

    Some stats, OPS+ would be the most familiar, eliminate pitchers batting in figuring out the average. So the average NL team as of Saturday had an OPS+ of 92 because the 100 average is figured without pitchers batting. I wonder if we might not get rid of certain biases if we did this here.

    I think it also might be interesting to look at how the rate stat varies with lineup position. It would not surprise me if cleanup hitters average 1.2 or better. We could then compare “RBI_Ability” = Actual/Expected with expected opportunities at each lineup spot to see if the manager is maximizing his RBI expectation in setting his lineup. (Of course, there are other factors that go into lineup construction.)

    Great work birtelcom and Chad.

    • 75
      birtelcom says:

      PAs by pitchers only constituted about 3.2% of the major league total of PAs last season, and those were after all actual PAs; pretending they didn’t happen seems a little odd to me.

      Does including those pitcher PAs mean that the “average” hitter performs a little worse than what we would get if we were looking at the average “non-pitcher”? Yes, but as long as you apply the same “average” formula to every player, I’m not sure it makes much difference for DOA purposes. Yes, a few more non-pitchers will come out “above average” than “below average,” but if it’s good enough for the children of Lake Wobegon it’s good enough for me. The real value of DOA is in comparing hitters, and as long as we use a common formula for everybody, it shouldn’t make much difference.

      That whole phenomenon in which most NL teams come out with below-100 OPS+ numbers (because the pitchers’ PAs are not being included in calculating the 100 standard despite the fact that they are actually coming to bat) has always given me agita.

  26. 74
    bstar says:

    Good point about eliminating the pitchers’ OPS from the equation. I guess we’ll leave that to birtelcom to decide.

    kds, what are you using to make the distinction between UIBB and regular BB?

    • 78
      kds says:

      BB-IBB. Yeah, we are mixing the batter’s choice with the pitcher’s choice. Ideally we would try to figure out: 1) the absolute minimum walk rate when the pitcher totally wants to avoid a walk (e.g., close game, bases loaded); 2) how many additional walks a pitcher would give up in different base/out/score situations; and 3) how much the identity of the batter matters for the pitcher. Once we’ve done all that, we will be able to estimate how much of each walk was up to the pitcher vs. the batter. I don’t know if we could get the signal-to-noise ratio high enough to be really useful.

      • 79
        bstar says:

        Wow—IBB-BB. So all non-IBB walks are “unintentionally intentional”? That’s a huge leap. I would wager myself that over 75% of all walks the pitcher would immediately want to take back and are not intentional in the least, but that’s just my opinion. I can’t recall the last time I’ve seen a pitcher walk the leadoff hitter in an inning and have it look like it’s something he “intended” to do. In fact, leadoff walks seem more infuriating to both a pitcher and fan than a leadoff single.

      • 80
        bstar says:

        Oh, wait, let me correct myself before you need to reply, kds. UIBB simply means “unintentional walks”; I was misreading it to mean “unintentional intentional walks”. Sorry, now I get it, although I do not think these should be subtracted from RBI opportunities.

        • 81
          birtelcom says:

          I’ve long been a skeptic about the “intentional walk” stat, and I never rely on it for anything substantive. The intentionalness of a walk is on a continuum; it’s not a yes or no issue. It seems to me that if the batter got to first base on four balls it’s a walk, and separating them out into intentional and unintentional categories doesn’t really add anything. But I’m eccentric about some things.

          As for walks and DOA, I certainly understand the argument that “penalizing” a hitter for failing to drive in a man on base by taking a walk is sort of odd, so I get why there was much sentiment for dropping walks from the calculation entirely. Chad’s amazing work lets us look at the results both ways.

  27. 76
    Chad Evely says:

    I just created a new post with links to a whole lot of data. Rather than settle on any of the variations suggested, I tried to incorporate them all and came up with 5 different data sets to give an idea of how the variations affect the results. I think I got all of the big ones in there. Let me know what you think.

  28. 77
    Lawrence Azrin says:

    Alternate title to DOA: “RBI/OA” {RBI/Over Average}?

    • 82
      birtelcom says:

      The guys in the engineering department like it; the guys in the marketing department (just after the second martini, before the third cigarette) say “too many characters”.

      • 84
        kds says:

        Need a like check-box. No! We’re a stats site, we have to be all serious. Oh wait, it’s OK, we can keep stats on the likes.
