R^2 lost in translation: from runs produced to WAR

So let’s continue our foray into estimating runs produced. I now want to examine what happens when schemes for measuring individual hitter run production get translated into offensive measures of WAR. It’s not a pretty picture.

But first a short recap:

1. I started by constructing a regression model that generates an estimate of player run production based on OPS. I then showed that this method, when scaled up, results in a very strong prediction of team runs scored.

2. I next looked at three other schemes for measuring individual player run production: Bill James’ “runs created,” FanGraphs weighted Runs Created, and Baseball Reference’s rbat. I found that at the individual level, these measures are highly concordant—to the point of being seemingly interchangeable. I did observe, however, that rbat seems to generate more conservative estimates than the three other schemes.

So now the question is what happens when these schemes get translated into estimated team “wins.” Specifically, I want to know whether the various transformations that FanGraphs and Baseball Reference use to generate estimations of WAR retain the explanatory power of their runs-produced measures with respect to team offense.

For comparison, I wanted to do the same thing for the OPS-derived metric. So I turned its “runs produced” measure (OPSrp) into a “wins above average” measure (OPS_WAA). (At the bottom, I’m reproducing and enlarging the glossary from a couple of posts ago, in case you understandably find yourself losing track of the labels.) Here’s how I did it:

First, I used James’s Pythagorean Formula (JPF) to estimate the winning percentage for every AL/NL team from 1900 to 2024 based on the differential between the OPSrp runs it was predicted to score and the actual number of runs the team allowed. By multiplying the number of games the team played by that winning percentage, I backed out the number of wins the team would have.

Second, I calculated a winning percentage for every team by applying the JPF to the differential between the season-average OPSrp runs scored and the runs the team in question allowed. I then backed out a number of wins by multiplying games played by that winning percentage.

Third, I subtracted the latter number of estimated wins from the former for each team. That difference represents the number of wins above average for each team associated with any given team’s OPSrp runs above average. Hence OPS_WAA, for estimated OPSrp wins above average.
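Those three steps can be sketched in a few lines of Python (the field names here are my own hypothetical placeholders, not anything from the actual dataset):

```python
def pythag_wpct(rs, ra, exp=2):
    """James's Pythagorean formula: expected winning percentage."""
    return rs**exp / (rs**exp + ra**exp)

def ops_waa(team):
    """Wins above average implied by a team's OPS-estimated runs produced.

    `team` is a dict with hypothetical keys:
      ops_rp       OPS-estimated runs the team was predicted to score
      lg_ops_rp    the season-average OPS-estimated runs scored
      runs_allowed, games
    """
    # First: expected wins given the team's own OPSrp runs
    wins_team = team["games"] * pythag_wpct(team["ops_rp"], team["runs_allowed"])
    # Second: expected wins if the team had an average OPSrp offense
    wins_avg = team["games"] * pythag_wpct(team["lg_ops_rp"], team["runs_allowed"])
    # Third: the difference is OPS_WAA
    return wins_team - wins_avg

# e.g., a team projected for 800 runs in a 750-run league, allowing 700:
team = {"ops_rp": 800, "lg_ops_rp": 750, "runs_allowed": 700, "games": 162}
print(round(ops_waa(team), 1))  # ≈5.2 wins above average
```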

Okay. So take a look at these figures. The top graph examines the percentage of variance in team runs scored explained by the teams’ estimated runs above average under a given scheme; the bottom, the percentage of variance explained by the corresponding measure of “wins”: OPS_WAA in the case of OPSrpaa, and offensive WAR in the case of FanGraphs’ and Baseball Reference’s respective runs-above-average estimates (wRAA and rbat).

“Holy cow!”

The most obvious thing is how big a hit FanGraphs takes when it converts its runs-produced-above-average measure into a WAR measure. The former consistently explained over 80%, and often as much as 90%, of the variance in runs scored; the latter is consistently below 75%. I should add, too, that the FanGraphs offensive WAR measure includes estimated base-running runs that aren’t part of FanGraphs’ wRAA batting runs above average, so the loss of power is likely understated (assuming, that is, that the base-running metric actually adds something to offensive WAR’s explanatory power; we can’t know whether it does without empirically testing that measure, too).

OPS_WAA forfeited a little bit of power but not nearly so much.

What this all amounts to is that FanGraphs started out with a really great runs-scored measure, one essentially as strong as the OPS-estimated runs-produced system, but then somehow made a large chunk of that measure’s explanatory power disappear in its WAR computation.

The story is a bit more complicated for Baseball Reference.

Actually, its runs-above-average measure—rbat—performed less well than either the FanGraphs or the OPS-estimated runs-above-average scores. That’s a bit surprising given its concordance with those measures at the individual level. Either it doesn’t “scale up” from individual players as well, or Baseball Reference’s team rbat scores don’t match up perfectly with its player ones. I suspect the former, but because BBR’s Stathead service doesn’t indicate what fraction of an individual’s rbat score should be assigned to which team when a player appears for more than one in a season, curious members of the public can’t aggregate the individual player scores and compare them to the reported team scores. (The Lahman database, in contrast, makes it really easy to compute team-specific performance totals for players who appeared for multiple franchises in a season.)

Even so, it turns out that its team offensive WAR scores are slightly less powerful still than its (batting-only) runs-produced-above-average measure. The relationship between them is uneven (maybe because WAR, unlike rbat, includes base running?), but overall BBR offensive WAR explains 61% of the variance in team runs across seasons, while rbat explains 67%.

Why do the WAR measures, particularly FanGraphs’, lose so much explanatory power compared to their raw runs-above-average measures?

This outcome can’t be blamed entirely on the noise introduced by the imprecision of translating runs into wins. If that were the whole story, the OPS-derived system would have suffered a comparable dilution of power.

I can’t be sure exactly what the problem is, but I have some hunches.

One is that FanGraphs and Baseball Reference both use uniform, season-specific “runs-to-wins” conversion factors. This blunts the precision of the conversion because the win value of runs scored isn’t in fact uniform: it depends, a good deal, on the size of a team’s runs scored/runs allowed differential.
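A quick way to see the non-uniformity is to compute the marginal win value of one extra run under the Pythagorean formula for teams with different run differentials (a back-of-the-envelope sketch, not either site’s actual conversion):

```python
def pythag_wins(rs, ra, games=162):
    """Expected wins under the exponent-2 Pythagorean formula."""
    return games * rs**2 / (rs**2 + ra**2)

def marginal_win_value(rs, ra):
    """Extra wins from one additional run scored, runs allowed held fixed."""
    return pythag_wins(rs + 1, ra) - pythag_wins(rs, ra)

# One run is worth noticeably more to an even team than a lopsided one:
print(marginal_win_value(750, 750))  # ~0.108 wins
print(marginal_win_value(950, 650))  # ~0.074 wins
```

A single season-wide runs-to-wins factor necessarily averages over these differences.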

Another problem is likely the use of a wins above replacement rather than a wins above average baseline.

Lots of ink has been spilled on which should be used as a matter of theory.

But as an empirical matter, it is clear that replacement is an inferior choice. One (only one) of the problems is that it artificially squeezes the normal distribution of player performances into a right-skewed one with a floor close to zero. This drains variance, and hence predictive power, from differences in the performances of roughly average players, who necessarily make up the densest part of the run-productivity distribution. . . .
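Here’s a toy simulation of that squeezing effect. This illustrates the general statistical point about flooring a distribution, not the actual WAR computation; the baseline offset and spread are made-up numbers:

```python
import random
import statistics

random.seed(0)

# Simulated "true" player run contributions, roughly normal around average (0)
runs_above_avg = [random.gauss(0, 20) for _ in range(10_000)]

# A crude replacement-style transform: re-baseline ~20 runs lower and floor
# at zero, so all below-replacement performances look identical
runs_above_repl = [max(x + 20, 0) for x in runs_above_avg]

print(statistics.stdev(runs_above_avg))   # ~20
print(statistics.stdev(runs_above_repl))  # noticeably smaller: the floor
                                          # compresses the distribution
```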

Of course, these are just surmises. We could empirically test them. I will, in fact. But later—I’ve said more than enough for now!

As always, you can find the data for these analyses in the library. Check them out, see what you think, and then share your impressions: I want to learn something from you!

Glossary

Baseball Reference Offensive WAR. This is the portion of team WAR associated with a team’s hitting and base running. The site’s oWAR doesn’t convey this figure—because that value includes players’ “positional adjustment.” But it can be straightforwardly derived by subtracting from a team’s total WAR the WAR totals that Baseball Reference reports for team pitching and for team fielding.
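That derivation is just a subtraction; as a trivial sketch (the numbers below are made up):

```python
def bbr_offensive_war(team_war, pitching_war, fielding_war):
    """Back out offensive WAR from Baseball Reference team totals."""
    return team_war - pitching_war - fielding_war

# e.g., a team with 45 total WAR, 18 pitching WAR, and 3 fielding WAR:
print(bbr_offensive_war(45.0, 18.0, 3.0))  # 24.0
```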

FanGraphs Offensive WAR. This is the portion of team WAR associated with a team’s offense—including both its estimated batting and its base-running runs. FanGraphs doesn’t explicitly supply this tally, but you can readily calculate it, consistent with the site’s posted formula, from the information in a team’s “Value” tab and on FanGraphs’ seasonal “Guts” page.

J_RC. An estimated measure of an individual batter’s runs produced. Developed by Bill James, it involves a formula based on various offensive events.

J_RC_aa. An estimated measure of an individual batter’s runs produced above average. Derived by subtracting from the hitter’s J_RC the J_RC of a hitter with the same number of plate appearances and the league mean J_RC per plate appearance.

JPF. The Bill James “Pythagorean formula” for winning percentage: Winning percentage = Runs scored² / (Runs scored² + Runs allowed²). The formula generates an R² for win percentage of about 0.90 across AL/NL history. There are alternatives that adjust the exponent based on changes in overall league scoring, but the differences between their estimates and the JPF estimates are so trivial as to be irrelevant.

OPSrp. OPS‐estimated runs produced. An individual player estimate of runs produced derived from the number of team runs we’d expect to be scored based on the player’s OPS and plate appearances. Team measure consists of aggregation of individual player scores.

OPSrpaa. OPS-estimated runs produced above average. Derived by subtracting from a player’s runs produced the number of runs a player with the mean season OPS would produce in the same number of plate appearances. Team measure consists of aggregation of individual player scores.

OPS_WAA. The estimated additional “wins” a team derives from the difference between its OPS-estimated runs produced (OPSrp) and the runs produced by a team with the league-average OPSrp for a particular season.

Rbat. Baseball Reference’s measure of a hitter’s estimated runs produced “above average.” Team measure consists of aggregation of individual player scores.

wRAA. FanGraphs’ wRC “above average.”

wRC. FanGraphs’ wOBA-derived measure of a hitter’s estimated runs produced.
