Blog
In a series of recent posts, including ones detailing my ongoing project to form a valid estimate of the impact of pitchers’ “ball in play” propensities on runs allowed (keep your eyes peeled for an enhanced BIP_RBA_2.0), I’ve been using the metric FIPr. I thought I might as well spell out why I’m using it as opposed to the conventional FIP metric available on Baseball Reference and FanGraphs.
To start, what’s the difference between them? Well, conventional FIP applies invariant non-regression based weights to pitchers’ rates of home-runs allowed, walks, strikeouts, and hit batters and then modifies the sum by an ERA-mimicking constant. FIPr, in contrast, uses linear regression to relate the impact of those elements of performance to runs allowed...
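For reference, the conventional formula is usually written as FIP = (13·HR + 3·(BB + HBP) − 2·K) / IP + constant. Here is a minimal Python sketch of it. The default constant of 3.10 is only an assumed, typical value (the real one is recomputed each season), and the stat line is made up for illustration.

```python
def conventional_fip(hr, bb, hbp, k, ip, constant=3.10):
    """Conventional FIP with fixed (non-regression) weights.

    `constant` is an assumed league value; the published constant
    changes from season to season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical stat line, for illustration only.
example_fip = conventional_fip(hr=20, bb=50, hbp=5, k=200, ip=200.0)
```

The weights never change from pitcher to pitcher or era to era, which is the "invariant" part of the description above.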
You know I’m still perfecting BIP_RBA_v2, of course! But don’t worry, I won’t subject you to yet another report on my progress (as productive, frankly, as I think some of the latest adjustments to this metric have been).
Instead I thought I’d share a standardized career performance list.
Standardized batting-average champs
As you likely recall, standardization is a statistical technique for making performance metrics commensurable over time. Raw metrics aren’t: changes in game conditions unrelated to player skill skew batting averages, OPS, fielding-independent pitching (FIP) and the like. Standardization puts those metrics on a common scale by converting them into units (z-scores) that reflect how many standard deviations a performance...
Well, I think I’ve exhausted myself (and all 17,936 regular readers of this blog) on this one. I don’t expect to say more about BIP_RBA (“balls-in-play runs below average”) for a while (of course, expectations are fragile things. . . ).
For fun, I’ve included the top 10—or really 13—single-season “BIP runs suppressed.” That’s the number of runs a pitcher avoided by virtue of his BIP propensities relative to a pitcher with average ones. The score is calculated by taking the pitcher’s BIP_RBA_v1, which reflects BIP runs below average per game, multiplying it by season innings pitched, and then dividing by 9.
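The calculation just described is simple enough to sketch in a few lines of Python. The function name and the sample numbers are hypothetical; only the formula (per-game rate times innings pitched, divided by 9) comes from the text.

```python
def bip_runs_suppressed(bip_rba_per_game, innings_pitched):
    """Season BIP runs suppressed: per-game BIP_RBA scaled to innings pitched."""
    return bip_rba_per_game * innings_pitched / 9.0


# Hypothetical pitcher: 0.45 BIP runs below average per game over 270 innings.
season_total = bip_runs_suppressed(0.45, 270.0)
```

Dividing by 9 converts innings into nine-inning game equivalents, so the per-game rate becomes a season total.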
But more importantly, here are some general reflections. I’ve divided them into...
As you likely would have guessed, I’ve been obsessively refining the “balls-in-play runs below average”—BIP_RBA—metric. I feel what I have now can be officially designated “version 1.0,” superseding BIP_RBAb, the beta version of this measure.
The most significant revision involved respecifying the “BIP eras.” To compute BIP_RBAb, I fit models separately to seven distinct periods that tracked major fluctuations in MLB-wide run scoring.
After experimenting with a number of alternatives, I concluded the best strategy was to bin seasons based as closely as possible on the contribution that pitcher BIP propensities made to variance in runs scored. Accordingly, every AL/NL pitcher’s season BIP_RBA is based on parameters associated with one of 14...
Okay, I promised to say something about Balls-in-Play Runs Below Average (BIP_RBA) and opponent Batting Average for Balls in Play (oBABIP). But I don’t plan to say a lot.
First, BABIP is an annoyingly uninformative measure. Like ERA, oBABIP merges the combined efforts of a pitcher and his fielders.
Voros McCracken famously posited that the pitcher should in fact get no credit for a low oBABIP (or high one): what happens to a ball after a batter connects with it is entirely a product of fielding proficiency and chance, he argued (for a while anyway). Opponent BABIP is one of the fronts on which the battle over his thesis has been waged.
Indeed, I was motivated to develop BIP_RBA to try to assess the extent to which a pitcher should be credited...
So how about knuckleballers? Do they throw balls that, when hit, are easier to field?
This is often claimed. Their BIP_RBAb seems to support this position.
BIP_RBAb stands for “balls-in-play runs below average,” with b denoting “beta,” because construction of this metric is expected to be refined and improved.
A pitcher’s BIP_RBAb measures the impact of his propensity to confine hitters to infield pop ups and ground balls. As its name implies, the measurement is in terms of runs allowed per game relative to a pitcher with average tendencies in those regards. I described BIP_RBAb derivation in the previous post.
At the bottom of this post is a list of season BIP_RBAbs for pitchers who threw ≥ 100 innings and who have been identified by Rob Neyer as...
This is the third instalment in the “let’s look at BIP pitcher tendencies . . . .”
The overall project is aimed at some big questions in the study of baseball performance: Does any pitcher-specific propensity genuinely limit the quality of balls hit in play? If so, what sorts of pitchers and which pitchers in particular exhibit this propensity? What is the impact of this propensity, if it exists?
I feel reasonably confident in my Retrosheet-derived data, which displays various encouraging indicia of reliability and validity, including reasonable correspondence with (and some interestingly more discerning qualities than) Baseball Info Solutions data from 2002-2024.
So now I’m turning to the task of using the data to estimate the impact on runs...
Are pitchers who suppress the proportion of balls hit in the air more successful?
Somewhat disconcertingly, it sort of depends on who you ask.
Digital data compiled by Baseball Info Solutions (BIS), and reported by FanGraphs, suggest no.
Indeed, if one weren’t paying close attention, one might think the opposite were true. After controlling for fielding-independent pitching and team fielding, an increase in the proportion of BIP that are either line drives or fly balls indicates, counterintuitively, a very tiny decrease (r = –0.06) in runs allowed per game.
But this is a statistical illusion. The zero-order correlation between the proportion of line drives and fly balls and runs allowed is very slightly positive, as you’d expect. ...
I received the benefit of a very generous and profitable consultation with the world’s greatest expert on how to measure and interpret performance spreads in baseball. And there is good news and even better news.
The good news is that he helped me overcome a mistake that was distorting my analysis of the Gould conjecture. For reasons that I won’t bore you with, I had figured that changes in the mean of a performance metric didn’t supply a reason to anticipate any divergence from Gould’s expectation of declining standard deviations. That’s not right, especially where the mean displays a persistent increase over a period of time, which often generates a corresponding increase in the SD. (It’s obviously still fine, though, to use season-specific...
I have a new toy: a dataset of “balls hit in play” that consists of the types of batted balls—infield ground balls, outfield flies, infield popups, etc.—that I coaxed out of the hieroglyphics that constitute Retrosheet play-by-play event codes, using a program for interpreting the same. There is a very high correlation between my results and the “Project Scoresheet” Retrosheet companion codes, which I had been steering clear of because of Sean Smith’s bad experience with them. But his troubles likely had to do with field-position coding, which I’m less focused on for now.
Anyway, the new data is useful for assessing the perpetually disputed Voros McCracken postulate that pitchers have no influence over the fieldability of balls hit in play....
It had to be done.
Well, not really. But I did it because I’m still collecting examples of season-by-season trends in performance standard-deviations for the purpose of testing the Gould conjecture.
In this case, “it” was the compiling of standardized strikeouts per 9IP. That was essentially a natural byproduct of the aforementioned testing, which required computing season-by-season K/9IP weighted means and weighted standard deviations, something I did based on the performance of every AL/NL pitcher who threw ≥ 100 innings from 1900-2024.
With that data I calculated each pitcher’s season-specific K/9IP z-score, which reflects his performance relative to the mean pitcher’s performance that year. Placed on this common, standardized scale, we can...
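A minimal Python sketch of that season-specific standardization follows. It assumes the weights are innings pitched, which the text does not actually specify, and it uses three made-up pitchers to stand in for a season.

```python
import math


def weighted_mean_sd(values, weights):
    """Weighted mean and (population-style) weighted standard deviation."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)


def season_z_scores(k9_values, ip_weights):
    """z-score of each pitcher's K/9IP relative to that season's weighted mean."""
    mean, sd = weighted_mean_sd(k9_values, ip_weights)
    if sd == 0:
        raise ValueError("no spread in this season; z-scores are undefined")
    return [(k9 - mean) / sd for k9 in k9_values]


# Hypothetical one-season sample: K/9IP and innings pitched (the assumed weights).
k9 = [4.0, 6.0, 8.0]
ip = [100.0, 200.0, 100.0]
z = season_z_scores(k9, ip)
```

Running the same function season by season, each time with that season's own mean and standard deviation, is what puts pitchers from different eras on one scale.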
So I thought it would be fun to start the year with another standardized “all-time best seasons,” this one focusing on fielding-independent pitching or FIP.
Before I get to the results, let me quickly review the motivation for, and mechanics of, this sort of analysis.
If one wants to compare historical “best seasons” rate statistics in baseball, it’s usually a mistake to look at raw numbers. Changed game conditions, unrelated to player skills, bend and twist statistics like batting average, on-base percentage, and OPS, obscuring the quality of the performances behind them.
Standardization dispels this variability fog.
It transforms values from a normal distribution into units that reflect the number of standard deviations they are from the...
[Note: take a look at this post. It dispels most if not all of this particular anomaly! This post now stands as testament to the contribution error can make to enlightenment.]
Well, with reluctance and trepidation, I’ve pretty much concluded that Stephen Jay Gould’s (r)evolutionary conjecture on the variability of sports performances is wrong.
You might know (or if you have been following my own journey learned) that Gould famously conjectured that athletic performances can be expected to become less variable as a sport matures. This dynamic reflects selection forces akin to those in evolution. Just as members of a species can be expected to converge on a fitness-promoting gene mutation, so athletes can be expected to converge on a trait that...
I noticed this pretty much by accident as I was examining something else, and I thought it was pretty interesting: Baseball Reference and FanGraphs obviously disagree about the value of the five best single-season pitching performances of baseball’s most high-profile Johnsons.
I think I know what’s going on here. FanGraphs bases its preferred pitcher WAR calculations on fielding-independent pitching (FIP), albeit with some adjustments, whereas Baseball Reference uses an “expected/actual run differential” formula. FIP is a metric that favors pitchers from the last three decades or so, the period in which strikeout rates have surged.
FanGraphs’ choice makes sense for pitchers of the “Great Transformation” era (the period in which the game has...
Okay, so here’s another crazy one. Look!
See it? It’s another Gould-defying trend in baseball-performance standard deviations!
I suppose it makes sense to start with what it’s the standard deviation of.
Basically, it’s a regression-derived fielding-independent pitching (FIP) metric. FIP is an index comprising a pitcher’s strikeout rate, home-run allowed rate, and hit-by-pitch rate. These are all elements of run-avoidance that don’t depend on the quality of the defense that backs the pitcher up.
To calculate a metric of this sort, I regressed runs allowed per game for every pitcher for every season (≥ 100 IP) of AL and NL history against his (season-specific) strikeouts per 9 IP, HR/IP & HBP/IP. The resulting equation generates an expected...
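Here is a rough Python sketch of a regression of this general shape, using ordinary least squares solved through the normal equations. It is not the actual model: the five pitcher-seasons, the "true" coefficients used to generate them, and the function names are all invented for illustration.

```python
def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Build X'X and X'y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / A[i][i]
    return coef


def expected_runs(coef, row):
    """Expected runs allowed per game for one (K/9, HR/IP, HBP/IP) row."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


# Made-up pitcher-seasons: (K per 9 IP, HR/IP, HBP/IP).
rows = [(5.0, 0.10, 0.02), (7.0, 0.08, 0.03), (9.0, 0.12, 0.01),
        (6.0, 0.15, 0.04), (8.0, 0.05, 0.02)]
# Toy outcomes generated from invented coefficients, so the fit is exact.
y = [5.0 - 0.3 * k + 10.0 * hr + 5.0 * hbp for k, hr, hbp in rows]
coef = fit_ols(rows, y)
```

With real data the fit would not be exact, and the gap between a pitcher's actual and expected runs allowed is what remains for fielding, luck, and balls in play to explain.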
This was something that for sure needed inclusion in the fielding paper: an assessment of what Statcast metrics have to say about “fielding shrinkage.” They say: “yup, unh huh, fielding doesn’t matter too much anymore.”
Beyond that, the insight gleaned from adding Statcast data is that Statcast’s fielding measure is not any better than others. Indeed, it explains less variance in team runs allowed than either Smith’s DER (Defensive Efficiency Record; he uses DER as prefix, and I’m calling his measure that now to distinguish it from the “Outs Above Average” label Statcast also uses) or Fielding Bible’s Defensive Runs Saved. The same is true of Ultimate Zone Rating, another digital metric examined...
Download it here. Just a draft, so looking for feedback! Will put up dataset & codebook in due course, but if you head over to data library, all the data are there; analyses have been refined/extended but if you’ve been following along with the posts, you shouldn’t have trouble reconstructing them.
I’m going to show you something weird that I spotted recently. [Note: I’ve learned some important things since I did this post!] Ready? Here it is:
What’s weird about this, you might well be asking.
Well, the pattern—and in particular, the increase in the variability of SLG% over time—contravenes one of the most provocative and, as far as I can tell, most universally accepted conjectures about the tendencies in baseball performance and in a variety of other athletic endeavors.
The conjecture was set forth by the brilliant evolutionary biologist Stephen Jay Gould. He was addressing the extinction of the .400 hitter in baseball. He observed that while the mean batting average, despite erratic short- to medium-term fluctuations, had...
Okay. I lied. . . .
I really, truly did think I was done talking about fielding, at least for a good while.
But then Sean Smith dropped a file with comprehensive OAA—Outs Above Average—scores for 1912-2024. The OAA “Total Fielding Runs” (TFR) measure is pure gold. Gold on steroids.
First, recall what the relationship between fielding and pitching across AL/NL history looked like in previous posts:
That unsightly hole in the 1990s was not a consequence of a breakdown in fielding during that time. Rather it reflected a period in which Smith’s Total Zone Rating (TZR; used by both Baseball Reference and FanGraphs for computing WAR) was degraded by reliance on bad data from Retrosheet’s “Project Scoresheet” companion...
It’s way past time to move on–I know, I know!
But I wanted to add just one more piece to the fielding-measurement picture: the performance of Ultimate Zone Rating, or UZR.
I realize now that what I’ve said up to this point might make it look like I’m talking only about how Baseball Reference (BBR) computes the fielding component of WAR.
So let’s expand the focus a bit.
First of all, let’s consider the difference between how FanGraphs and BBR measure the fielding component of WAR. There isn’t much up to 2002. Up until then, they both use as the basis of their formulas Sean Smith’s Total Zone Ratings. Accordingly, they both do a great job in picking up the substantial amount that fielding contributed to averting runs throughout most of the 20th...
Everyone knows (and is correct to believe) that Brooks Robinson is the greatest fielding third baseman ever. But how should all the other major leaguers who’ve manned the hot corner be ranked in fielding skill?
This is another topic suited for illustrating the impact of rfield inflation. Three of the next four places on Baseball Reference’s list of all-time position leaders in runs saved are occupied by third basemen who played either most of or their entire careers after the late 1990s: Adrian Beltré, Scott Rolen, and the still active Nolan Arenado. There is thus a high likelihood that the impact of their performances has been overstated.
“Rfield inflation” refers to the overvaluation of the consequence of fielding proficiency in the metrics...
I really like OPS. Proposed by John Thorn and Pete Palmer in 1984, it’s still the best indicator of batter run productivity that anyone has devised.
At the risk of poisoning this blog with a lethal dose of theory, what follows are some reflections on how I conceptualize OPS and what I regard as the optimal way to construct it. Plus some measurements (phew!).
1. I like OPS because it is an explanatory measure of a hitter’s propensity to generate runs.
Not all batting metrics, even good ones, are explanatory. Consider weighted on-base average (wOBA), the offensive component of WAR. wOBA consists of a set of correlations, updated every season, between runs and positive batting events. It would be circular to say that a metric derived from runs...
I’m interested in pitching evaluation metrics (who isn’t?!). Recently, I’ve been trying to understand what drove differences in pitcher proficiency throughout most of the twentieth century. We know that strikeouts are pretty much everything now, but that wasn’t so then. What sorts of factors most influenced pitching proficiency in the 1950s, say, and what is their relative power today?
The obvious metric to focus on is “fielding-independent pitching” or FIP. As an index comprising nothing more than the propensity of a pitcher to strike batters out, walk them, hit them with a pitch, and allow home runs, it’s not surprising that the power of FIP to explain differences in runs allowed has soared over the last three decades or so, as both strikeout...
Argh, I’m just stymied!
I’ve been sucked into the BABIP—batting average for balls in play—rabbit hole.
At any period in big league history, variation in BABIP and oBABIP (opponent batting average for balls hit in play) contributes a substantial amount to runs and runs allowed. So naturally, one wants to try to explain them.
Voros McCracken is famous for the thesis that pitchers have no impact on BABIP. If a hitter makes contact with the ball, nothing a pitcher has done is going to affect it, making outcomes a mix of chance and fielding quality.
Well, for sure, variance in BABIP has historically been substantially influenced by fielding proficiency, as measured by rfield (runs saved by fielding).
And I think most would accept now that either pitching...
No, wOBA is not a delicious type of sushi. It is a hitting-production metric that is supposed to be better than OPS and that figures in the calculation of WAR.
I thought this post was going to involve some kind of hyper-theoretical explanation of why, despite being a less successful empirical predictor of runs scored, OPS has theoretical virtues that make me prefer it to wOBA.
But I don’t think there’s any need for that. Because as far as I can tell, wOBA isn’t superior to OPS as a predictive measure after all.
wOBA stands for “weighted on-base average.” It was devised in the classic work The Book: Playing the Percentages in Baseball by Tango, Lichtman, and Dolphin (TLD), who, as I said, proposed it as a superior alternative to the much...
This is my third and (I think) last post on the declining impact of fielding proficiency.
I said in the first that I’d offer a practical illustration of how much less benefit a team of skillful glove men enjoys over a team of mediocre ones.
The illustration involves the 1967 AL pennant race.
It was a wild one! Three teams—the Red Sox, Twins, and Tigers—all entered the last weekend of the season with plausible paths to victory (and none of them involved “blue walls”. . . ). When the dust settled, the Red Sox emerged victorious, a game ahead of both the Tigers (who’d win the World Series the next year) and the Twins (largely the same team that had won the AL flag in ’65 and would win the AL Western Division in ’69). The White Sox, too,...
If the World Series is over, it must be HOF season. Who will be elected?
Obviously, Ichiro.
But I want to focus here on the candidates before the Classic Era Committee. And on one player on its ballot: Dick Allen.
Allen was considered by Era committees twice before, in 2014 and 2021. Both times he finished 1 vote short of induction.
Actually, it is insane that Allen hasn’t been elected yet. Insane that he wasn’t elected by the BBWAA. Insane that he wasn’t elected by either of the Era Committees to consider him.
Plenty have addressed his obvious qualifications. But I want to pile on. So . . .
Take a look at these players:
Aside from Allen, they are all in the HOF. Indeed, all the HOF members were BBWAA electees.
Now consider: of these players,...
In a previous post, I presented data showing that the contribution of fielding to variance in run prevention had declined about 75% in the last few decades. I now want to look at a related phenomenon.
I’ll call it rfield inflation. It looks like this:
“Rfield” is the fielding component of WAR. It represents “the value in runs of all aspects of the player’s fielding.”
In the 1950s and ’60s, one unit of rfield equated to one run saved, as it is supposed to. But thereafter, it tanked. In the 2024 season, one unit of rfield was worth about 0.5 runs saved.
Click on this if you want a reminder of the shrinking value of fielding over time
This is only an indirect consequence of the increasing contribution of fielding-independent pitching or...
Bobby Witt Jr. had a good year at the plate, winning the AL batting crown with a .331 average. But exactly how good was it?
As I’ve discussed in previous posts, one can’t answer this sort of question based solely on the raw value of a metric like batting average or OPS or whathaveyou. The reason is that extraneous variability, in the form of fluctuating means and standard deviations unrelated to hitting proficiency, makes those measures non-comparable across seasons.
A simple and valid way to restore comparability is standardization: one transforms the measure in question—here, a player’s batting average—into a so-called z-score, which reflects how many standard deviations it is from the mean for that season. This puts the measure (again,...
1. The bottom-line of this post is simple and stark: the impact of differences in the quality of major league fielders has been precipitously falling for decades and is at this point of negligible consequence for how teams fare over the course of a season.
It’ll take some space to lay out the empirical evidence, and I’ll try to confine myself here to the essentials. I have uploaded the data and analysis script relied on so others can assess the soundness of my conclusion and investigate alternative theories.
2. Before I start, though, I feel impelled to acknowledge the modest role I played in prying this inference loose from my own data.
I initially intended only to compare the relative predictive power of Baseball Reference’s pre- and post-2003...
The short answer is yes. For the longer answer that makes the case—read on.
This post is a successor to the one on who had the highest-ever single-season batting average. I’m now using the same methods to determine the highest single-season OPSs.
First propounded by John Thorn and Pete Palmer in 1984, OPS (On Base Percentage plus Slugging) remains the single best predictor of run production at the team level.
As one can readily see, league-wide OPSs, like league-wide BAs, have jumped around over time. That means one can’t use the raw value of OPS (or BA) to rank the quality of the performances that this metric measures: variability rooted in changing game conditions, rather than fluctuations in batters’ skills, makes direct comparisons...
The bbcardstats-eries is ready to fill the void created by the widespread lack of interest in this year’s World Series!
Here’s how it will work:
Select a team, 1 player per position, from draft pool below. Total team salary cannot exceed $33. Post your roster and team name in a comment responding to the Sunday Challenge™ tournament announcement or in the comment space below. Players can be drafted by multiple teams. One team per participant (we know how you think).
The draft period, and thus the tournament entry window, will close at noon on Monday, Oct. 28, or when 32 teams have been submitted, whichever comes first.
The entered teams will be assigned randomly to brackets. Bracketed opponents play best-of-7 series with winning teams...
This is pretty self-explanatory:
It’s well known that the pitching-mechanics revolution has made it possible for everyone and his grandmother to throw 100 MPH fastballs, and that this development is ripping pitchers’ arms to shreds.
Note too that according to Jon Roegele, Tommy John surgeries are under-reported. Whether the upward trend is understated too is hard to say; unless for some reason under-reporting is becoming increasingly common, then the rate at which the surgery is increasing probably isn’t more dramatic than what is reflected in Roegele’s count.
Is this trend sustainable? One wouldn’t think so. But MLB is stuck in an intractable “race to the bottom”—or race to the surgeon’s table. Obviously, this sort of carnage...
The two players not only best in the game today but likely the best in over half a century. No hyperbole here. None. To find a series featuring opposing players of this stature, I think you have to go back to 1909, when Honus Wagner’s Pittsburgh Pirates played Ty Cobb’s Detroit Tigers.
The franchises of course are storied, as is their rivalry in the Fall Classic over multiple decades, multiple generations of passion for America’s game. If this doesn’t excite you, you should report immediately to the ER to make sure you are still alive.
I mean, wow!
So coming off its awe-inspiring success in forecasting the A- and NLCS series (okay, that’s a tad hyperbolic!), BBCS-imulator 9000 is back to tell us what to expect in this upcoming World Series....
The simulation routine used to predict the Super Rookies vs. ’27 Yankees series is going to have to evolve if it has a future. But why not at least give the BBC&S-imulator 9000 a chance at some real-world predictions first?
Using the same model and Monte Carlo simulation algorithm as before, I took a shot at predicting/handicapping the LCS’s. The matchups aren’t nearly as close as the Rookie-Yanks series was predicted to be.
Start with the NL. The WAR-per-game figures for the Dodgers and Mets are 0.286 and 0.219, respectively. The model rates that as a 57% heads-up, single-game advantage for the Dodgers.
Poor Gil is caught in the middle on this one …
Naturally that translates into an even bigger edge in a seven-game series. The Dodgers won...
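To see why a single-game edge grows over a series, here is a small closed-form Python sketch. It is not the Monte Carlo routine used for these posts: it assumes every game is independent and that the favorite's win probability stays fixed at the single-game figure (no home-field or pitching-matchup effects).

```python
from math import comb


def series_win_prob(p_game, wins_needed=4):
    """Chance of taking a best-of-(2*wins_needed - 1) series.

    Sums over the number of losses (0 to wins_needed - 1) the winner
    absorbs before clinching; the clinching game is always a win.
    """
    q = 1.0 - p_game
    return sum(
        comb(wins_needed - 1 + losses, losses) * p_game ** wins_needed * q ** losses
        for losses in range(wins_needed)
    )


# 57% single-game favorite, as in the NL matchup above.
favorite_series_prob = series_win_prob(0.57)
```

Under those simplifying assumptions, a 57% single-game favorite wins a seven-game series roughly 65% of the time, while a coin-flip matchup stays at exactly 50%.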
Who had the best-ever season batting average?
There have been 36 .400 hitters in MLB history. But there hasn’t been one in over 80 seasons. The last AL/NL player to reach this mark was Ted Williams, who batted .406 in 1941, and it seems plausible (not inevitable!) that no one will ever do it again.
The seeming extinction of the .400 hitter is associated with one of the most popular baseball-statistics parlor games: the construction of a season-for-season batting average exchange rate.
The motivation is the conviction that the disappearance of .400 hitters can’t be attributed entirely to the superior hitting acumen of the old timers. Game conditions have changed in myriad ways. So is there some defensible method for...
Okay, after the last post, it had to be done: the all-time best collection of rookie performers ever versus the best team in MLB history—the 1927 Yankees!
Could these Super Rookies really compete? I built a Monte Carlo simulation to figure that out. But I’ll get to that in a minute.
First let’s take a look at the starting lineups, and see how the two squads match up!
On the mound for the Rookies—appropriately enough—is the 1910 Yankees’ Russ Ford. Ford compiled a 26-6 record that season with a 1.65 ERA—good enough for the all-time highest WAR for any rookie (11.4). We will treat him as inexhaustible, and let him pitch every game.
The ’27 Yankees aren’t going to be restricted to one pitcher in this series. But if they were it would be their...
There have been 150 ROY winners since the inception of the award in 1947. Nineteen of those players are in the Hall of Fame.
Does that seem low to you?
Let’s ignore the ROYs who aren’t yet HOF eligible–about 25 who are either still active or who retired before 2019. (Some of them are actually shoo-ins–like Mike Trout, Justin Verlander, Albert Pujols, and Ichiro Suzuki. There are also some, like Lou Whitaker and Dick Allen, who should be in already and likely will be inducted via an Era Committee vote someday.)
That’s about 15%.
About 1.5% (273 of 18,000) of the men who have appeared in an MLB game are members of the Hall of Fame.
So just winning Rookie of the Year means that one is 10x more likely than average...
So this is definitely the place to start. With these two enchanting cards.
They supply everything that this endeavor—a tiny little site; a wholly personal tribute to baseball, but glad you are here—is about.
Start with what could seem like the most prosaic element of these cards’ appeal: when turned over, they report the entire careers of these two baseball giants. Every season is there. All the times Clemente topped the .300 threshold. All the times Mickey belted 30-, 40-, and 50-plus home runs.
Plus the totals. Five hundred and thirty-six home runs. Three thousand hits; 3,000 exactly. . . .
Having it all there really does make them special. That’s because baseball, more than any other team sport, conveys the magnitude of its stars’ accomplishment...