I really like OPS. Proposed by John Thorn and Pete Palmer in 1984, it’s still the best indicator of batter run productivity that anyone has devised.
At the risk of poisoning this blog with a lethal dose of theory, what follows are some reflections on how I conceptualize OPS and what I regard as the optimal way to construct it. Plus some measurements (phew!).
1. I like OPS because it is an explanatory measure of a hitter’s propensity to generate runs.
Not all batting metrics, even good ones, are explanatory. Consider weighted on-base average (wOBA), the offensive component of WAR. wOBA consists of a set of correlations, updated every season, between runs and positive batting events. It would be circular to say that a metric derived from runs in this fashion explains what causes them. It is more in the nature of a summary of a hitter’s role in runs scored.
In contrast, runs scored don’t figure in the calculation of OPS. It’s in the nature of a latent variable measure—a quantifiable proxy for a not directly observable phenomenon, here hitting proficiency. Having a demonstrated relationship with runs but without being derived from them, it can be understood to be tapping into the hitting characteristic that causes runs to happen.
It’s just a matter of personal taste, I guess, but I feel like I’ve learned more about how the world works when I’m supplied with a statistical construct that helps explain and predict interesting consequences rather than one that summarizes them after the fact—even if the summary is extremely systematic and admirably compact at the same time, like wOBA.
2. The conventional method for calculating OPS—simply adding a hitter’s on-base percentage to his slugging average—has understandably been criticized. On-base percentage (OBP) and slugging (SLG) are measured in different units. Imagine how strange it would be to add weight and height to construct an index of a person’s potential as a boxer!
But this defect is super easy to fix. All one has to do is standardize OBP and SLG. Standardization rescales the values in a distribution so that each one reflects the number of standard deviations it lies above or below the distribution’s mean. So transformed, batters’ OBPs and SLGs are put on a common, unitless scale.
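Standardization is simple enough to sketch in a few lines of code. In the illustration below the OBP and SLG figures are made up; in practice one would standardize within each league-season, as the analysis later in this post does:

```python
# Minimal sketch of standardizing OBP and SLG to form sOPS.
# The batter values below are invented for illustration only.

def standardize(values):
    """Convert raw values to z-scores: (x - mean) / standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return [(x - mean) / sd for x in values]

obp = [0.320, 0.355, 0.290, 0.410]   # hypothetical on-base percentages
slg = [0.400, 0.480, 0.350, 0.550]   # hypothetical slugging averages

z_obp = standardize(obp)
z_slg = standardize(slg)

# sOPS: the sum of the standardized components, now on a common scale
sops = [a + b for a, b in zip(z_obp, z_slg)]
```

After the transformation each component has mean 0 and standard deviation 1, so neither’s raw units dominate the sum.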
It is sometimes objected that summing OBP and SLG inappropriately treats them as equally important elements of run production. This, in my view, is not a convincing criticism.
It is perfectly normal in statistics to sum the indicators—the observable proxies—of a latent variable. Doing so generates a composite index or scale that more accurately measures the latent variable than any of the indicators individually: the aspects of the indicators that correlate with the latent variable reinforce one another to form a stronger signal, while the noise associated with each—the aspects that don’t correlate with the latent variable—cancel one another out. The individual indicators’ correlations aren’t being “summed”; they are being used to form a common covariance structure, which in turn supplies the measurement power of the resulting index.
This measurement strategy does require that the combined indicators strongly covary with one another. OBP and SLG do. Measured on a season-by-season basis, they display a Cronbach’s α of 0.82, a degree of consistency that makes it perfectly kosher to fuse them together to form a latent-variable measure.
3. Baseball Reference’s OPS+ metric uses a different approach to solve the apples-and-oranges problem associated with the conventional measure of OPS. BBR forms OPS+ by adding a hitter’s OBP and SLG after each has been divided by the league average for those metrics (it then further transforms the sum onto an intuitively appealing scale centered on 100). A form of mean normalization, this technique also creates a common scale, although it lacks some of the nice properties of standardization (particularly for assessing performances over time, a point demonstrated by Michael Schell). BBR’s OPS+ also adjusts for the impact of differences in players’ home parks.
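As a sketch of the mean-normalization step, here is the commonly cited form of BBR’s formula, with the park adjustment omitted (treat the exact details as an assumption; BBR’s published calculation folds park factors into the league averages):

```python
# Sketch of the OPS+ calculation, without park adjustment.
def ops_plus(obp, slg, lg_obp, lg_slg):
    # Divide each component by its league average, sum, then rescale
    # so that a league-average hitter lands at 100.
    return 100 * (obp / lg_obp + slg / lg_slg - 1)

# A hitter matching the league averages scores exactly 100:
print(ops_plus(0.320, 0.400, 0.320, 0.400))   # prints 100.0
```

Each point above or below 100 thus reflects a combined percentage-point edge (or deficit) over the league in the two components.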
I decided to examine the relative explanatory power of BBR’s OPS+ and a variant of OPS formed by adding the standardized forms of OBP and SLG. I used BBR’s team-level OPS+ measure for every AL and NL season from 1900 to 2024. I likewise formed the standardized OPS measure, which I’ll call sOPS, on a team-level basis, season by season. To assess them as measures of run production, I regressed team runs scored (again, season by season) on OPS+ and sOPS, respectively, as well as on the conventionally formed OPS that consists of the sums of untransformed OBP and SLG.
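The comparison boils down to regressing team runs on each candidate metric and comparing variance explained. A minimal sketch (with one predictor, R² is just the squared correlation; the team-season values below are invented for illustration):

```python
# Share of variance in y explained by a simple linear regression on x.
def r_squared(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

team_sops = [-1.8, -0.4, 0.1, 0.9, 2.2]   # hypothetical team sOPS values
team_runs = [610, 675, 700, 741, 822]     # hypothetical team runs scored

r2 = r_squared(team_sops, team_runs)      # variance in runs explained by sOPS
```

Running this season by season for each metric and averaging (or pooling) the R² values yields the comparisons reported below.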
As can be seen, sOPS performed the best. Across the run of seasons, it explained an impressive 89% of the variance in team runs scored. BBR’s OPS+, in contrast, explained 69%.
Interestingly, the conventional OPS measure finished a close second to sOPS, explaining 87% of the variance in runs scored season by season over the history of the American and National Leagues. It’s not the way one should form a latent variable index, but summing the raw OBP and SLG turns out to work very well.
Nevertheless, there are likely to be some differences of note between OPS and sOPS. I’ll report on my investigations of that another time! But when I do use sOPS for my own research, I’ll always be sure to note the relationship between the results it generates and those obtained with the conventional OPS measure, so as to be sure there is no mistake that it’s still Thorn and Palmer’s insight illuminating the path ahead.
I’m not sure why BBR’s OPS+ performed less well. Maybe the problem is the way BBR integrates OPS with ballpark effects? Such adjustments can definitely improve predictive accuracy, but incorporating them is not straightforward; depending on how they are fused onto OPS, they could easily detract from the overall power of that hitting metric rather than boost it.
The BBR analysts are great at what they do, and I expect that, as they themselves use empirical methods to validate their metrics, they’ll refine OPS+ to make it even more valuable.
As always, you can find my data and analysis script in the site’s stats-and-data library.