Projecting Minor League Hitters Using Statistics

Which minor league statistics are best to use when projecting the future performance of hitting prospects?

Predicting which prospects will be successful in the major leagues is hard. With so many factors to consider, even the best prospect rankings are littered with a mix of successes and failures. Performance is just one part of the equation, although isolating it has a few advantages.

Purely stat-based projections provide a more objective measure of value, with subjectivity lying only in the choices made when constructing the model. This isn’t to say that performance is the only measure that should be considered; rather, it should be used in conjunction with scouting reports, which contain information the system can’t see. A computer can also constantly churn out thousands of updated player projections, whereas asking any person to do this would surely drive them insane. I’ll talk a bit about the methodology of the model, but for those just interested in the projections, they can be found toward the bottom of this article. For the sake of mobile readers, the rankings in this article are condensed to a simplified Top 100, but there will also be a spreadsheet linked with more detailed projections.

To start, I wanted to find which minor league statistics were statistically significant in predicting major league performance. For hitters, I narrowed it down to just six: Age, ISO, K%, BB%, Spd, and wRC+. These are standardized and input into a series of logistic models trained on historical minor league data. Since there is some collinearity between the input variables, an L2 regularization penalty is applied. Multiple seasons are factored in, but recent performance and statistics at higher minor league levels are weighted more heavily.
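As a rough illustration of the standardize-then-regularize idea, here is a minimal sketch in plain Python. It is not the actual model: it fits a single binary logistic regression for one hypothetical WAR threshold, uses only three of the six inputs, and runs on invented toy rows. The function names, learning rate, penalty strength, and data are all assumptions made for the example.

```python
import math

def standardize(rows):
    """Z-score each column so all inputs share a comparable scale."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_l2(X, y, l2=1.0, lr=0.1, epochs=2000):
    """Binary logistic regression via gradient descent with an L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        # The L2 term shrinks the weights toward zero; the intercept is not penalized.
        w = [wj - lr * (gw + l2 * wj) / n for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability of clearing the threshold for one standardized row."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Invented toy rows (age, K%, ISO); label 1 = cleared a hypothetical WAR threshold.
TOY_X = [(19, 15.0, 0.200), (20, 18.0, 0.180), (21, 20.0, 0.170),
         (22, 27.0, 0.120), (23, 30.0, 0.110), (24, 25.0, 0.100)]
TOY_Y = [1, 1, 1, 0, 0, 0]

X = standardize(TOY_X)
w, b = fit_logistic_l2(X, TOY_Y, l2=1.0)
print([round(wj, 2) for wj in w], round(predict_proba(w, b, X[0]), 2))
```

Because the toy inputs move together (the younger hitters also strike out less and hit for more power), an unpenalized fit keeps inflating the weights. Raising `l2` keeps them small, which is the usual reason to reach for an L2 penalty when inputs are collinear.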

The output of the model changed over several iterations, but I eventually settled on average WAR over a player’s best three seasons before age 30. Total WAR over the same time span was also considered, though I found better results using a three-year peak because of factors like playing time and injuries adding noise to the data. It’s also worth noting here that since these are WAR-based outputs, this list isn’t strictly a fantasy list.
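For concreteness, the three-year peak target could be computed along these lines. This is a sketch under stated assumptions: the `peak_war` name and the `(age, war)` input shape are made up for the example, and counting missing seasons as zero for players with fewer than three qualifying years is my assumption, not something the methodology above specifies.

```python
def peak_war(seasons, n_best=3, max_age=30):
    """Average WAR over the n_best best seasons played before max_age.

    `seasons` is a list of (age, war) pairs. Assumption: when a player has
    fewer than n_best qualifying seasons, the missing ones count as 0 WAR.
    """
    eligible = sorted((war for age, war in seasons if age < max_age),
                      reverse=True)
    top = eligible[:n_best]
    return sum(top) / n_best

# Hypothetical career: the age-31 season is ignored, and the best three
# seasons before 30 (5.0, 4.0, 3.0) are averaged.
career = [(24, 2.0), (25, 5.0), (26, 4.0), (27, 3.0), (31, 8.0)]
print(peak_war(career))
```

Using the best three seasons rather than a career total means one lost year to injury or a platoon role changes the target much less.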

The logistic models predict the probability of a hitter’s peak ending up in seven different categories: never playing in MLB, under 0.5 WAR, 0.5-1.5 WAR, 1.5-2.5 WAR, 2.5-3.5 WAR, 3.5-4.5 WAR, and more than 4.5 WAR. The full probability charts for each player can be found in the spreadsheet link at the bottom of the article, with percentages indicating the cumulative probability a player has of reaching each threshold. The xWAR column condenses these probabilities into a single number by taking a weighted sum across the categories, so it is best read as an average outcome that weighs a player’s risk against his upside.
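One way to read the probability charts and the xWAR column is sketched below. The per-category WAR values in `BUCKET_WAR` are hypothetical placeholders (the article does not state the actual weights), and the example probabilities are invented, so treat this as an illustration of the arithmetic only.

```python
BUCKETS = ["No MLB", "<0.5", "0.5-1.5", "1.5-2.5", "2.5-3.5", "3.5-4.5", ">4.5"]

# Hypothetical WAR value assigned to each category (assumption, not from
# the article). Players who never reach the majors count as 0 WAR.
BUCKET_WAR = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

def reach_probabilities(probs):
    """Cumulative chance of landing in each category or any higher one.

    Returns six values: P(reaches MLB), P(0.5+ WAR), ..., P(4.5+ WAR).
    """
    out, running = [], 0.0
    for p in reversed(probs[1:]):
        running += p
        out.append(running)
    return list(reversed(out))

def xwar(probs, bucket_war=BUCKET_WAR):
    """Probability-weighted sum of the per-category WAR values."""
    return sum(p * w for p, w in zip(probs, bucket_war))

# Invented probabilities for one player; they sum to 1 across the categories.
EXAMPLE = [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02]

for label, p in zip(BUCKETS[1:], reach_probabilities(EXAMPLE)):
    print(f"{label:>8} or better: {p:.0%}")
print(f"xWAR: {xwar(EXAMPLE):.2f}")
```

With these made-up inputs the player has a 70% chance of reaching the majors but an xWAR under 1.0, which shows why the single-number rankings look conservative next to a player's ceiling.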

A few key principles the model abides by from the training sample:

  • Age (relative to level), K%, and ISO are the most predictive minor league statistics
  • High strikeout rates in the low minors are extremely concerning
  • High strikeout rates in the high minors are a red flag but less important if the player is young for the level
  • Walk rates are less important in the low minors and more important in the high minors
  • Players who are very slow tend to have minimal defensive value, and players who are fast tend to end up at premium positions

These may seem obvious, but quantifying how much each of these matters answers the question of how to weight different factors.

There are a couple of weaknesses with the model, but if I were aiming for perfection on the first try, I’d never end up releasing it. First, speed does an OK job as a proxy for defensive ability, but it’s flawed in many cases, especially for evaluating catchers. I tested adding positions as inputs, but they had minimal predictive effect since players often change positions as they approach the majors. Eventually I’d like to add some more defensive statistics. The other weak point is projecting college players in the low minors. Recently drafted college players tend to be relatively old for their levels until they reach Double-A, so they generally don’t get great projections until then. I’d also like to add some categorical variables for a player’s pre-professional background. And of course, there have to be pitching projections as well, this being Pitcher List and all.

But without further ado, here are the Top 100 hitting prospects based on the projections:

Rank Player xWAR
1 Wander Franco 3.6
2 Dylan Carlson 3.2
3 Drew Waters 3.0
4 Luis Robert 3.0
5 Gavin Lux 2.9
6 Jo Adell 2.5
7 Bo Bichette 2.5
8 Luis Urias 2.4
9 Carter Kieboom 2.4
10 Trent Grisham 2.3
11 Cristian Pache 2.3
12 Isaac Paredes 2.1
13 Kyle Tucker 2.1
14 Alejandro Kirk 2.0
15 Gabriel Moreno 2.0
16 Vidal Brujan 1.9
17 Jarred Kelenic 1.9
18 Lucius Fox 1.8
19 Luis Campusano 1.8
20 Andres Gimenez 1.7
21 Daulton Varsho 1.7
22 Josh Naylor 1.6
23 Nick Madrigal 1.6
24 Khalil Lee 1.6
25 Nate Lowe 1.6
26 Willi Castro 1.6
27 Alex Kirilloff 1.6
28 Miguel Vargas 1.6
29 Alek Thomas 1.6
30 Brendan Rodgers 1.5
31 Ty France 1.5
32 Ke’Bryan Hayes 1.5
33 Tyler Freeman 1.5
34 Mickey Moniak 1.5
35 Taylor Trammell 1.5
36 Jeter Downs 1.4
37 Heliot Ramos 1.4
38 Xavier Edwards 1.4
39 Josh VanMeter 1.4
40 Yusniel Diaz 1.4
41 Luis Garcia 1.4
42 Brice Turang 1.3
43 Keibert Ruiz 1.3
44 Cole Tucker 1.3
45 Oneil Cruz 1.3
46 Brennen Davis 1.3
47 Josh Lowe 1.2
48 Ryan Vilade 1.2
49 Abraham Toro 1.2
50 Elehuris Montero 1.2
51 Nico Hoerner 1.2
52 Isan Diaz 1.2
53 Malcom Nunez 1.1
54 Yonny Hernandez 1.1
55 Kevin Padlo 1.1
56 Geraldo Perdomo 1.1
57 Daniel Johnson 1.1
58 Leody Taveras 1.1
59 Jose Devers 1.1
60 Nick Allen 1.1
61 Triston Casas 1.1
62 Mason Martin 1.1
63 Joe McCarthy 1.1
64 Akil Baddoo 1.1
65 Luis Santana 1.0
66 Ryan McKenna 1.0
67 Nick Gordon 1.0
68 Jazz Chisholm 1.0
69 Lolo Sanchez 1.0
70 Domingo Leyba 1.0
71 Seth Beer 1.0
72 Samad Taylor 1.0
73 Daz Cameron 0.9
74 Eguy Rosario 0.9
75 Randy Arozarena 0.9
76 Joshua Rojas 0.9
77 Brandon Marsh 0.9
78 Jason Martin 0.9
79 Ryan Mountcastle 0.9
80 Jake Cronenworth 0.9
81 Josh Stephen 0.9
82 Myles Straw 0.9
83 Jarren Duran 0.9
84 Alec Bohm 0.9
85 Thairo Estrada 0.8
86 Luis Guillorme 0.8
87 Jared Oliva 0.8
88 Jaylin Davis 0.8
89 Royce Lewis 0.8
90 Ji-Hwan Bae 0.8
91 Gabriel Maciel 0.8
92 Andrew Velazquez 0.8
93 Travis Blankenhorn 0.8
94 DJ Peters 0.8
95 Edward Olivares 0.8
96 Austin Hays 0.8
97 Adam Haseley 0.8
98 Luis Barrera 0.8
99 Jonathan Arauz 0.8
100 Tucupita Marcano 0.8

Wander Franco unsurprisingly ranks No. 1 overall. He checks off every box in the model inputs and continues to put up elite numbers as an 18-year-old in High-A. I mainly want to focus on players where the projections differ from traditional rankings though, starting with No. 2 and No. 3.

Dylan Carlson has been getting some attention this year for his breakout campaign, but he’s really loved by the projections. At just 20 years old in Double-A, Carlson has put up a 143 wRC+ with strikeout and walk rates of 19.4% and 10.7%. With 17 home runs and 13 steals, he’s very good at everything even if there isn’t one elite tool.

There are a couple of things not to like with Drew Waters, but his age and combination of power and speed place him at No. 3. The good is a 155 wRC+ with 47 extra-base hits and 13 steals as a 20-year-old in Double-A, like Carlson. However, he’s also got a 26.1% strikeout rate and .451 BABIP. Waters does profile as a high-BABIP hitter with plenty of doubles and speed, and good minor league hitters also tend to put up very high BABIPs. The strikeout rate is more concerning, though since he’s so young for Double-A, he gets somewhat of a pass.

The rest of the Top 10 is fairly tame until we get to Trent Grisham. He’s massively improved his stock with a 152 wRC+ in Double-A and a 160 wRC+ in Triple-A while walking nearly as much as he’s struck out. Totaling 23 homers and 12 stolen bases so far, he’s a potential five-category producer in fantasy if he can find playing time in the crowded Brewers outfield.

For the sake of brevity, the last two players I’ll touch on are the pair of Blue Jays catchers at No. 14 and No. 15: Alejandro Kirk and Gabriel Moreno. They’re both young for their levels and have put up outstanding offensive numbers with little fanfare. Scouts don’t love the 20-year-old Kirk’s build at 5’9″, 225 pounds, but he compares to Willians Astudillo of the Twins with a little less contact ability but more power upside and a high walk rate. Moreno, 19, looks the opposite of Kirk at 5’11″, 165 pounds, though he’s similarly put up a 155 wRC+ with a 9.4% strikeout rate in A-ball.

Full projections of all 1,780 rookie-eligible minor league hitters with at least 100 plate appearances this season and 200 plate appearances in their career can be found here. In future articles, I’d like to also release historical projections and scores for the test predictions when I can spend more words discussing them.

(Photo by Cliff Welch/Icon Sportswire)

Alex Isherwood

Creator of @ProspectBot and former FantasyPros writer. Studying computer science and mathematics at William & Mary.

9 responses to “Projecting Minor League Hitters Using Statistics”

  1. Jeremy says:

    Is there a way to only show expected offensive WAR? Some guys (like Pache) project to put up a lot of value defensively but that doesn’t matter for fantasy purposes. If I’m looking at guys to stash it’s only the top expected offensive performers.

    • Alex Isherwood says:

      Not at the moment, but more fantasy-specific outputs could potentially be something to add in the future. Generally, the slower guys with good offensive profiles like Nate Lowe/Seth Beer would get a boost for fantasy and you might want to drop the Isaac Paredes types who play up the middle and have contact-driven profiles without great power or speed.

  2. Dave says:

    So, the training sample was using the MiLB stats of players with several years of MLB experience and comparing the two to identify the most closely aligned stats as the best indicators. Is that correct? Also, is contact% available in the minors? I always thought that was a somewhat useful indicator.

    • Alex Isherwood says:

      Yep that’s exactly right. I know SwStr% is available for minor league players now on FanGraphs, but it doesn’t go back far enough to have a good sample for some minor league levels. There’s a lot of overlap with K% also.

      Edit: I misread the first question — the MiLB stats are fit against 3-year MLB peak WAR, not the corresponding MLB stats.

  3. Jordan Rosenblum says:

    Awesome research–love seeing the Paredes and Kirk hype…How’d you decide which MLB hitters to include in the training model? Did you include all hitters above a certain 3-year WAR peak threshold? I’ve considered a similar project, but I’ve struggled on what to do with missing data (busts who never reach the majors). I suppose imputing replacement level might work. Excluding busts entirely would make the projections too optimistic, and imputing zero WAR would make them look too pessimistic.

    Also, did you use dummy variables to capture statistics in different leagues?

    • Jordan Rosenblum says:

      Also, if you’re curious to compare (I had a lot of fun comparing!), we also created a stats-only top offensive prospects list by peak translated MLB wOBA.


    • Alex Isherwood says:

      Thanks for reading Jordan, appreciate the feedback. My sample included all minor league hitters with at least 200 PAs for the years far back enough with available data, so many of them didn’t make the major leagues. The actual model outputs (linked at the bottom of the article) are the probabilities of a player reaching each of the peak WAR buckets I talked about in the article. I treated players who never made the majors as 0 WAR. For the sake of condensing the probabilities into one number for rankings, the xWAR category takes the weighted sum of all the probabilities for each player, so it’s not necessarily saying ‘Player X projects to be exactly this good’; it’s more like ‘weighing the risk/upside of Player X, this is an average outcome’. That does make it more pessimistic than only looking at major league players, but I think it’s also more realistic.

      There’s three logistic regression models that go into the projections — one for AAA/AA, one for A+/A, and one for A-/R+/R-/DSL. Different statistics matter more/less at different levels in the minors, but they’re grouped mainly due to having small samples of some of the higher WAR buckets at certain levels. There’s dummy variables for separating the individual levels within each model.

      Thanks for sharing your article too, definitely interesting to compare the differences. Looks like some pretty cool stuff.

      • Jordan Rosenblum says:

        Excellent responses, thanks for elaborating on the methodology…and yeah I agree zero WAR for the non-MLB guys is the best approach after hearing your response. My suggestion was to “impute replacement level” anyway, which I somehow didn’t realize is the same thing as zero war (replacement level=zero war!).

        Anyway, super interesting stuff, helping fill the void Katoh left years ago :) Appreciate you checking out our ranking as well!
