Are No-hitters Linked to Strikeout Rates?

“The less often the ball is in play, the more likely a no-hitter becomes.”

Tom Verducci wrote that in a June 2012 Sports Illustrated column titled, “Flurry of no-hitters and perfect games reflect changes in baseball.” You’ve probably heard the same claim elsewhere.

It sounds good in theory, and it may hold for individual games. But on a league-wide basis, it’s not supported by results in the current era, once you look beyond a very small sample. The high-strikeout era as a whole has actually seen a decline in no-hitters, compared to prior rates.

_______________

Mr. Verducci’s column ran right after Matt Cain’s perfect game, the fifth no-hitter in less than half a season (including Seattle’s teamwork job). He focused on a very small period, from 2010 through mid-June 2012, and showed a rate of 1 no-hitter per 414 games. (He counted each matchup as one game.) Then he ominously compared that to the 1901-2009 average of 1 per 794 games, and more ominously to the 1967-69 rate of 1 per 345 games. (Uh-oh, look we’re we’re going….)

Reaction to last year’s flurry was inevitable, and Verducci was not the only one to sound an alarm. But such a small sample maximizes the effect of random distribution, as you can see in the yearly raw numbers. Easy example: There was one no-hitter in 1988-89 combined, then 14 over the next two years. You can’t expect to show anything about no-hitters in just a few years, because they are so rare. A small window shows many anomalies like these:

  • 1951 had the highest individual no-hit rate in the live-ball era, despite a fairly low K rate (9.7%). Then the K rate soared in 1952-55, while the no-hit rate plunged.
  • In 1973 there were 5 no-hitters, the 8th-highest no-hit rate in the live-ball era, despite the lowest overall K rate since 1961. Four of those no-hitters came in the league that had just introduced the DH, and so had the much lower K rate.

Such small samples are meaningless.

In almost a year since Verducci’s column (over 4,000 team-games), we’ve seen just 2 more no-hitters, a very normal rate. We’ve had none yet this year, despite the highest K rate ever. So the individual rate for 2010-present is down from his cited rate of 1/414 games to 1/571 games. But non-events rarely get noticed.

I’m not suggesting that there’s no connection between strikeouts and no-hitters. Obviously, no-hit pitchers are generally strikeout leaders, and especially those with more than one no-no — Ryan, Koufax, Feller, Bunning, Randy Johnson, Verlander, etc.

I only mean to show two things:

(1) Despite the 2010-12 flurry, the current high-strikeout era has actually seen a low rate of no-hitters, compared to historic norms.

(2) A correlation of K rates and no-hit rates can only be seen over a period of many years, and the correlation seems broken during the last 20 to 30 years. For just a small group of years, no-hitters are fundamentally more random than they are predicted by K rates. Their raw numbers are so small that the rate for even a 20-year period can be strongly affected by outliers like Nolan Ryan (4 in a 3-year span) and Sandy Koufax (4 in 4 years).

_______________

Quick notes on my methods:

  1. The period studied is the live-ball era, i.e., 1920-2012. The current season is excluded.
  2. I included combined no-hitters, to remove the effect of variance in complete-game rates. In the graphs, the bars for individual and total no-hitters are overlaid, so that any combined no-hitters will appear as a darker extension of the individual bar.
  3. When I speak of a pitcher game, it means his entire stay in the game. I would have preferred to ignore MLB’s restrictive definition of “no-hitter” and include all games where the starter did not allow a hit for at least 9 innings, regardless of what happened afterwards. However, I can’t get those from the Play Index. For what it’s worth, one source lists 6 such games during the study period: 1934 (Bobo Newsom), 1956 (Johnny Klippstein), 1959 (Harvey Haddix), 1965 (Jim Maloney), 1991 (Mark Gardner) and 1995 (Pedro Martinez). I did include the one non-complete game wherein the starter threw 9 no-hit innings and then left. (Francisco Cordova, this one‘s for you.)
  4. I express no-hit rates per 10,000 team-games; each game played between two teams counts as two team-games. Strikeout rates (or K rates) are a percentage of plate appearances.

_______________

Now vs. Then

The first thing I wanted to see was how our current era of very high K rates compares to previous norms, in terms of no-hit rates. If there is a strong, direct correlation between K rates and no-hitters, we would expect a high rate of no-hitters in this era.

But how should we define this era? I see two possible starting points:

(1) 1986: In 1986, the K rate jumped from 14.0% to 15.4% — a large one-year change, and a K rate not seen since 1968 — and since then it has never dipped below 14.7%.

So here are the rates of no-hitters per 10,000 team games, for 1986-2012 and before:

  • 1986-2012: 4.8 individual (5.2 total); 16.6% K rate for the period
  • 1920-1985: 6.4 individual (6.6 total); 11.5% K rate

A hard right to the theory’s chin! The no-hit rate for this 27-year period is significantly lower than for the previous 66 years of the live-ball era.

As an aside, the period just before this one, 1969-85, had a much higher no-hit rate, 6.5 (6.8), very close to the previous norm, but with just a 13.5% K rate. It also had 5 no-hitters by Nolan Ryan. Even for a long period, outliers have a large impact.

(2) 1995: The period 1995-2012 consists of the 18 highest season K rates in MLB history. (And 2013 looks to be another new record.) No-hitter rates for 1995-2012, and before:

  • 1995-2012: 4.4 individual (4.7 total); 17.2% K rate
  • 1920-1994: 6.1 individual (6.3 total); 12.1% K rate

And a left to the body! Not only is the no-hit rate for 1995-2012 much lower than the previous norm, despite an unprecedented K rate, it’s also lower than for 1986-2012, our alternate definition of the “high-K era.”

Slicing the current era any thinner makes the samples too small to be relevant. Yes, the no-hit rate for 2010-12 was very high, 10.3 per 10,000 team-games — not as high as 1990-92 (11.9) or even 1969-71 (11.1), but still eye-catching. But that eye-catching rate actually underlines my point: Those years stand out starkly against the rest of the high-K era. No other 3-year span in this era reached 7.0, and if you lop off 2010-12, the average was 3.5 for 1995-2009, 4.7 for 1986-2009.

And in terms of K rates, why start a study period in 2010? The K rate has climbed steadily every year starting with 2006. Year by year: 16.8% strikeouts in 2006 (1 no-hitter); 17.1% in 2007 (3 no-hitters); 17.5% in 2008 (2 no-hitters); 18.0% in 2009 (2 no-hitters), 18.5% in 2010 (5 no-hitters). What is the basis for starting with 2010?

I suppose it’s conceivable that there’s a K-rate threshold above which a small rise would have an exponential effect on no-hit rates. But I haven’t seen that argument made.

_______________

But let’s not count out the correlation theory just yet. Let’s look at some graphs of no-hitter rates and strikeout rates over time, to see if we can spot a trend with the naked eye.

Our first graph shows annual rates of no-hitters (the blue bars, graphed against the left axis) and strikeouts (the red line, graphed against the right axis). Again, the no-hit rate is expressed per 10,000 team-games, while the SO% is a percentage of all plate appearances. Individual and total no-hitters are overlaid.

Rates of No-Hitters and Strikeouts by Year, 1920-2012

No-hit and Strikeout Rates, by Year - 1920-2012

Yikes, what a mess! The only point of this graph was to show that you really can’t learn anything from a couple of seasons; there’s too much annual fluctuation.

But now let’s take the same data in rolling 5-year averages. In other words, the 2012 data points are actually the averages for 2008-12, and so forth. (Except that no pre-1920 data was used, so the points for 1920-23 represent the averages from 1920 through that year.)

Rates of No-Hitters and Strikeouts, Rolling 5-Year Average, 1920-2012

No-hit and Strikeout Rates, Rolling 5-Year Average - 1920-2012

Now, that’s a little more interesting. Starting from the left side, there does seem to be some correlation. But it starts to loosen up in the 1980s, and completely breaks down in the ’90s. Very recent trends might indicate a return to correlation, but it’s too soon to say. And of course, we’ve yet to see a no-hitter in 2013.

_______________

Grouping Like with Like

The next method is to forget about time periods and just group the yearly data by K rates. I sorted the seasons by SO%, then divided them into three equal groups of 31 seasons: Low-K (averaging 8.3% strikeouts), Medium-K (13.2%) and High-K (16.6%). The no-hitter rates, per 10,000 team games:

  • Low-K: 4.3 individual (4.3 total); 8.3% K rate
  • Medium-K: 6.1 individual (6.3 total); 13.2% K rate
  • High-K: 6.0 individual (6.4 total); 16.6% K rate

Conflicting pictures here. The difference between the Low-K and Medium-K groups suggests a correlation. But the no-hit rate is about the same for the Medium-K and High-K groups, despite a sizable difference in K rates (26% more in the High-K group) and a large number of seasons.

Now let’s break it down a little finer.

I formed five groups of 18 or 19 seasons: Min-K (averaging 7.7% strikeouts), Low-K (9.8%), Medium-K (13.2%), High-K (15.2%) and Max-K (17.2%, consisting of 1995-2012). The no-hitter rates:

  • Min-K: 3.4 individual (3.4 total); 7.7% K rate
  • Low-K: 6.2 individual (6.2 total); 9.8% K rate
  • Medium-K: 5.9 individual (6.2 total); 13.2% K rate
  • High-K: 7.8 individual (8.2 total); 15.2% K rate
  • Max-K: 4.4 individual (4.7 total); 17.2% K rate

The first four groups suggest a pattern. But what happened to Max? Again, those are the 18 highest season K rates in MLB history. Yet the no-hit rate for those years is far, far less than for the next-closest K group. How could you possibly fit that into a theory that no-hitters and overall strikeout rates remain directly correlated?

_______________

You might see where I’m going, but this is the end of Part 1. In Part 2, I’ll discuss the unstated assumption behind the correlation theory, and why that assumption no longer holds true.

_______________

Postscript

Besides small sample sizes, Mr. Verducci made an odd choice in citing the no-hit rate for the late ’60s. If you read the piece, you might have noticed the years that he labeled as the “late-1960s doldrums.” He chose 1967-69 as a point of no-hit comparison to 2010-12 — even though he’d already noted that 1969 was profoundly different from the prior years: “hitting was so bad they … lowered the mound (1969)….”

To further state the obvious: From 1968 to ’69, the K rate fell from 15.8% to 15.1% (the lowest rate since 1962), while batting average rose from .237 to .248. The number of plate appearances per team-game rose by 2.4%, from 37.2 to 38.1, the largest absolute change since 1904. On top of all that, there was an expansion from 20 to 24 teams.

So why lump together 1969 and 1967-68? I surely don’t know, but it’s hard to miss the fact that 1969 had 6 individual no-hitters, tying the live-ball record, while 1967-68 totaled 8 individual no-hitters, and 1966 had just 1. By using 1967-69, he showed a no-hit rate similar to 2010-12, but still higher than 2010-12 — sort of implying that we’re headed that way. Had he used the more reasonably connected 1966-68, it would have shown that the 2010-12 rate was already much higher than “the late-’60s doldrums,” which might have raised questions about small sample sizes. Even those folks who mistakenly think we’re in a sub-normal scoring context know that we’re nowhere near 1968.

If that 1969 data point meant anything, it would be a direct counterpoint to Verducci’s thesis. But one year, or three years, don’t mean anything when you’re studying something as rare as no-hitters.

Leave a Reply

19 Comments on "Are No-hitters Linked to Strikeout Rates?"

Notify of
avatar
Sort by:   newest | oldest | most voted
Doug
Guest

Tour de force, John.

The high-strikeout era has also corresponded with higher BABIP. This suggests that the strikeouts are just replacing outs that formerly were being made on balls in play. Ergo, fewer balls in play doesn’t necessarily mean fewer hits.

birtelcom
Editor
Which goes to the heart of the problem: strikeouts are only an influence on the probability of no-hitters indirectly, that is, only if they reduce batting averages. It seems like frumpy old batting average, one of the sabermetrician’s least favorite stats, would be the most obvious stat to correlate with no-hitter frequency. Only if K rate actually lowers batting average should it, theoretically, have an (indirect) effect on no-hitter probability. It’s likely that other factors have an important effect on no-hitter frequency. One indisputable fact about no-hitters is that they happen more frequently in September than any other month (21.5%… Read more »
Fireworks
Guest
Verducci is often good. I didn’t see that article but that sounds sloppy as heck. I remember after about a quarter season’s worth of games at NYS in 2009 the HR rate was almost 4 and people were talking about how the park was an embarrassment and how the Yankees had done such a poor job and, well, the HR rate declined the rest of the year and while it is still fairly high it is not at all ridiculous and it helps not to be shortsighted about the fact that the Yanks are a power team and mash homers… Read more »
Fireworks
Guest
Of course I can’t find the link to the ESPN article anymore, however, I did eventually pretty much write the vast majority of the section of the Wikipedia article on Yankee Stadium defending the decline in HR rate. I’ll be so feckless as to repost it here because I don’t post much anymore and I like a good opportunity to get JA to agree with me. “The number of home runs hit at the new stadium slowed significantly as the season progressed,[68] but a new single-season record for most home runs hit at a Yankee home ballpark was nonetheless set… Read more »
Hartvig
Guest

Perfect example of one of the many reasons I do so love this site- challenging conventional wisdom and picking it apart piece by piece until you prove that that which “everyone knows” ain’t necessarily so.

Well done (as always).

Timmy Pea
Guest

Loney hit a homer! I was just thinking, for such a great hitter he doesn’t homer much.

Timmy Pea
Guest

Loney is actually a good hitter, I should not have said great. He’s a perfect fit for the Rays, and I am not surprised he is having a great year. Look for a run by the Rays this summer. If David Price had been on track and not hurt, the Rays are the best team top to bottom in that division. They also have the best manager in baseball.

no statistician but
Guest

JA:

Any figures on the average number of SOs per no hitter over time? Not relevant to the big picture, but of interest, perhaps.

Timmy Pea
Guest

So many players wearing beards. I don’t like this trend.

mosc
Guest
Great piece. My inner statistician really needs to see the sigma squared deviation though. Basically, you’re projecting a standard bell curve for the probability of days between a no hitter and then compare that to the data to see if it’s more… clumpy than a standard random distribution. If it’s clumpy, that would imply some causality link with something (assuming the sample size is big enough, which I think we actually have here given the live ball era). If it’s not particularly clumpy, you would then imply that small sampling size combined with the rarity of the event is giving… Read more »
trackback

[…] have become more prevalent in recent seasons.” This belief is still popular, but it’s unfounded — as long as you don’t cherry-pick the comparison years, or ignore the expansion from […]

wpDiscuz