A couple of weeks ago, I wrote about a new statistic I developed called genuinely Fielding Independent Pitching (gFIP for short). I came up with this metric because I didn’t understand why FIP, which purports to be a “fielding independent” pitching statistic, was calculated using innings pitched. Innings pitched, of course, are measured in outs, which are usually recorded by fielders.
You can read the full piece here (I highly recommend it, although I am a little biased), but this is the gist of it:
“Strikeouts, walks, HBP, and home runs are the only results of a plate appearance that are not influenced by defense or luck on balls in play. That being the case, those four numbers should be the only inputs used to calculate FIP… Therefore, instead of innings pitched, the ideal FIP denominator is HR+BB+HBP+K. This sum roughly represents the total amount of work a pitcher has done, and it is uninfluenced by the defense that played behind him.”
In my original piece, I focused my investigation on starting pitchers. Starters pitch more, and therefore they provide a larger sample size of data to use for research. However, the truth of the matter is that gFIP might be even more valuable for evaluating relievers.
As I outlined in the original article, HR+BB+HBP+K is highly correlated with innings pitched. For this reason, the FIP and gFIP leaderboards look very similar; most starting pitchers have a very similar FIP and gFIP.
However, over a smaller sample size, there’s more room for variance. In other words, some relief pitchers are going to have a much more substantial difference between these two numbers.
A Very Simple Example
Imagine two pitchers. We’ll call them Reliever A and Reliever B.
Reliever A took the mound and induced ground balls from the first three hitters. All three ground balls found a gap and went into the outfield for base hits. With the bases loaded, Reliever A struck out the next three batters to end the inning.
Reliever B took the mound and also induced ground balls from the first three hitters. All three ground balls were picked up by the defense and converted into outs. Because that was such a quick inning, Reliever B was able to come back out for a second inning, where he struck out the side.
Here’s that information again for you:
- Reliever A: 1 IP, 3 H, 3 K, 6 batters faced
- Reliever B: 2 IP, 0 H, 3 K, 6 batters faced
Both pitchers faced six batters. Both pitchers allowed three ground balls and recorded three strikeouts. Now take a look at their respective FIPs after those outings, using the FIP constant from 2021.
- Reliever A: -2.83 FIP
- Reliever B: 0.17 FIP
That’s a huge difference. It’s bigger than the difference between 2021 Zack Wheeler and 2021 Patrick Corbin. And yet the only difference between Reliever A’s outing and Reliever B’s outing is what happened to those three ground balls after they were put into play.
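The arithmetic behind those two figures can be sketched in a few lines of Python. This is a rough illustration, not a reference implementation: the 3.17 constant is an assumption chosen because it reproduces the numbers quoted above, and `gfip_ratio` is a hypothetical helper that leaves out any gFIP constant, since none is given in this piece.

```python
# 2021 FIP constant; an assumption here, chosen because it reproduces
# the -2.83 and 0.17 figures quoted in the text.
FIP_CONSTANT_2021 = 3.17


def fip_numerator(hr, bb, hbp, k):
    """Shared numerator: 13*HR + 3*(BB + HBP) - 2*K."""
    return 13 * hr + 3 * (bb + hbp) - 2 * k


def fip(hr, bb, hbp, k, innings, constant=FIP_CONSTANT_2021):
    """Standard FIP: the numerator divided by innings pitched, plus the constant."""
    return fip_numerator(hr, bb, hbp, k) / innings + constant


def gfip_ratio(hr, bb, hbp, k):
    """Unscaled gFIP ratio: the numerator divided by HR + BB + HBP + K.

    Hypothetical helper: no gFIP constant is added because none is given here.
    """
    return fip_numerator(hr, bb, hbp, k) / (hr + bb + hbp + k)


# Both relievers: 0 HR, 0 BB, 0 HBP, 3 K. Only the innings differ.
reliever_a_fip = fip(0, 0, 0, 3, innings=1)
reliever_b_fip = fip(0, 0, 0, 3, innings=2)
reliever_a_gfip = gfip_ratio(0, 0, 0, 3)
reliever_b_gfip = gfip_ratio(0, 0, 0, 3)

print(f"Reliever A: FIP {reliever_a_fip:.2f}, gFIP ratio {reliever_a_gfip:.2f}")
print(f"Reliever B: FIP {reliever_b_fip:.2f}, gFIP ratio {reliever_b_gfip:.2f}")
```

The two FIPs land three runs apart, while the unscaled gFIP ratio is identical for both relievers, because nothing that happened to the ground balls enters the calculation.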
This example uses a tiny sample size. No serious baseball analyst would use FIP to evaluate a pitcher after just six batters faced.
But the point of this example is to demonstrate that in a tiny sample size, innings pitched are a less meaningful statistic that can be more heavily influenced by balls in play.
gFIP for Relievers in 2021
The following table shows the FIP and gFIP for each qualifying relief pitcher in the 2021 season. Prepare to do a lot of scrolling here.
As was the case with starting pitchers, the FIP and gFIP leaderboards do not look that different. The very best pitchers by one metric are the same as the very best pitchers by the other.
However, as was expected, there is a more noticeable difference between reliever FIP/gFIP and starter FIP/gFIP. The average starter in 2021 had a gFIP 23 ticks different than his FIP (in one direction or the other), while the average reliever had a gFIP 31 ticks different than his FIP.
In my original piece, I argued that gFIP wasn’t really a brand new statistic, because the gFIP leaderboards ended up looking so similar to the FIP leaderboards. When it comes to relievers, there is a slightly better case to be made that gFIP is a truly separate metric, as you can see from the chart.
gFIP Is Not Perfect
One thing you might have noticed from the leaderboard above is that Josh Hader did not rank as the best reliever in baseball last season according to gFIP. This set off some alarm bells in my head, because I am immediately wary of any metric for relief pitchers that doesn’t put Hader first.
It’s a good reminder that gFIP is not a perfect, comprehensive pitching metric, nor is it meant to be. It is, however, a good metric that is completely independent of fielding, and that is a valuable statistic to have.
So, rather than writing off this whole statistic because Hader isn’t ranked first, let’s take a look at exactly why Emmanuel Clase, and not Hader, is the number one pitcher by gFIP.
The first step in calculating FIP (or gFIP) is to find the numerator: 13 × HR + 3 × (BB + HBP) − 2 × K.
For Hader, that number is -87 while for Clase that number is -74. Hader is leading so far.
The next step is to find the denominator. In this case, a lower denominator is better. Both numerators are negative, and dividing a negative number by a larger positive number moves the result closer to zero. Therefore, the lower the denominator is, the lower the FIP/gFIP will stay for these two pitchers.
Hader pitched 11 fewer innings than Clase in 2021, so Hader has the lower FIP denominator. Therefore, because he has the lower FIP numerator and the lower FIP denominator, he has the lower FIP.
However, Hader recorded far more strikeouts and walks than Clase in 2021, and so his gFIP denominator is much higher than Clase’s. Therefore, Clase ends up with the slightly lower gFIP.
Clase had the better strikeout-to-walk ratio and home run rate in 2021, and so he was rewarded with the better gFIP. Hader, on the other hand, strikes out such a high percentage of the batters he faces that the defense frequently does not have to make a play. Clase has more impressive strikeout numbers when compared to his walk and home run numbers, but Hader has more impressive strikeout numbers when compared to his total innings pitched.
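To see how much the denominator matters, here is a small Python sketch that uses only the two numerators quoted above. The denominators in the example calls are made-up round numbers, not either pitcher's real totals, and the function names are hypothetical. It also assumes that any constant added afterward is the same for both pitchers, so it cannot change which one ranks lower.

```python
# Numerators quoted in the text for the 2021 season.
HADER_NUMERATOR = -87
CLASE_NUMERATOR = -74


def ratio(numerator, denominator):
    """Numerator over denominator, before any constant is added."""
    return numerator / denominator


def clase_leads(hader_denominator, clase_denominator):
    """True when Clase's ratio is lower (better) than Hader's."""
    return (ratio(CLASE_NUMERATOR, clase_denominator)
            < ratio(HADER_NUMERATOR, hader_denominator))


# Clase comes out ahead once Hader's denominator exceeds Clase's by
# more than this factor (87 / 74, roughly 1.18).
crossover = HADER_NUMERATOR / CLASE_NUMERATOR

# Illustrative denominators only; these are NOT real season totals.
print(clase_leads(hader_denominator=100, clase_denominator=100))  # equal workloads
print(clase_leads(hader_denominator=130, clase_denominator=100))  # Hader 30% larger
```

With equal denominators Hader keeps the lead, which is the FIP situation in miniature (his innings total is actually the smaller one). Once his denominator is more than about 18 percent larger than Clase's, as the gFIP denominator turns out to be, the ranking flips.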
In the end, however, all that really matters is that Hader and Clase are both excellent pitchers, and this is reflected by both FIP and gFIP.
gFIP for Relievers in 2022
To finish things off, here is a gFIP leaderboard for qualified relievers as of June 07, 2022 (before game time). This can be used to help you deepen your understanding of each of these pitchers.
Keep in mind, however, that it is still early in the season and several qualified relievers have faced as few as 60-70 batters. With a couple of good or bad appearances, a reliever could shoot up or down these leaderboards. Many of the numbers on this leaderboard could be a little out of date by the time you’re reading this.
In addition, remember that gFIP is not a perfect pitching stat, nor is it designed to be. Used in conjunction with other statistics, however, it can be one more useful tool in your belt. For instance, while this is certainly not meant to prove that Joe Mantiply is better than Josh Hader, it is one more indication that Mantiply is absolutely the real deal.
This metric doesn’t pass the smell test for me: a pitcher who strikes out ten batters and allows ten ordinary balls in play is dominant; but a pitcher who strikes out ten batters and allows a hundred ordinary balls in play is pedestrian. Both will have perfect gFIPs.
What you describe as cFIP in the other article works better, but that’s mostly a dressed-up version of K% and BB%.
I think you make a very valid point, and the stat is imperfect in that regard. I do sort of see it as a more advanced version of K/BB ratio, with home runs added to the mix, rather than a perfectly comprehensive pitching metric or a perfect estimator of ERA — that’s sort of the same way I feel about regular FIP.
While it does work out as a pretty good predictor of future ERA (which makes sense, since there is some excellent research from Pitcher List showing that simple K/BB ratio is one of the best predictors of future ERA), it’s definitely a flawed metric, and I’m hoping to write more about those flaws in the future.
My ultimate problem has been that it’s very hard to make a defense-independent pitching metric that isn’t flawed, because pitching and defense will always be intertwined.