St. Louis Cardinals: Improving Upon The New Hit Probability From Statcast

Mar 14, 2016; Jupiter, FL, USA; St. Louis Cardinals left fielder Randal Grichuk (15) makes a diving catch against the Minnesota Twins during the game at Roger Dean Stadium. The Twins defeated the Cardinals 5-3. Mandatory Credit: Scott Rovak-USA TODAY Sports

On March 4th, Statcast revealed a new “Hit Probability” metric. This metric gives St. Louis Cardinals fans a new way to watch and measure a hitter’s performance.

Spring Training is well underway for the St. Louis Cardinals, but the most interesting statistical development for the upcoming season came at the MIT Sloan Sports Analytics Conference, where Statcast revealed a new Hit Probability statistic. In their words, hit probability will tell us “based on the exit velocity and launch angle of the batted ball, how likely was the ball to land for a hit?”

This statistic will be expressed as a percentage, where 0% is never a hit and 100% is always a hit. You can use these probabilities to estimate batting average on contact, BABIP, SLG%, OPS-on-contact, and wOBA-on-contact.
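To make that concrete, here is a minimal Python sketch of how per-ball hit probabilities could roll up into an expected batting average on contact. The function name `expected_bacon` and the five sample probabilities are hypothetical, made up purely for illustration, and this is not Statcast's own calculation.

```python
def expected_bacon(hit_probs):
    """Expected batting average on contact: the mean of per-ball hit probabilities."""
    if not hit_probs:
        raise ValueError("need at least one batted ball")
    return sum(hit_probs) / len(hit_probs)


# Hypothetical hit probabilities (0.0 to 1.0) for five batted balls
sample_probs = [0.90, 0.10, 0.55, 0.05, 0.40]
sample_bacon = expected_bacon(sample_probs)
print(f"Expected BACON: {sample_bacon:.3f}")
```

The same averaging idea would extend to slugging or wOBA on contact by weighting each batted ball by an expected value per outcome rather than a plain hit probability.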

The days of evaluating a player based on batting average, and even on traditional sabermetric measures like wOBA and wRC+, are numbered. Instead, we’ll be evaluating players based on their expected wOBA and expected wRC+.

I’ve previously applied this concept to see if Stephen Piscotty really slumped as the team’s primary cleanup hitter last year. I’ve also used it to evaluate contact quality allowed by Mike Leake, as well as to show how Aledmys Diaz can grow offensively.

The analysis boils down to exactly how much the hitter can control. Both Statcast and my own earlier work have assumed that means exit velocity and vertical launch angle.

That’s really only two parts of the equation, though. It’s missing the last piece: horizontal spray angle, the left-to-right direction the ball comes off the bat, from foul line to foul line. A hitter has control over how hard he hits the ball and whether he hits a grounder, line drive, or fly ball. He also has control over whether he pulls the ball, goes up the middle, or takes it to the opposite field.

Unfortunately, horizontal spray angle isn’t measured by Statcast, at least publicly. However, we can divide the field into parts, and use the x- and y-coordinates (x,y) of the batted ball to determine the path it was traveling on. A little more math, and we can determine which part of the field a batted ball falls in.

I’m still in the very early stages of this model and analysis, so I’ve only divided the field into six parts. Additionally, I assumed home plate was at point (125,0) on the field, which may be inaccurate. Measures of exactly where home plate is vary, but this estimate gives me a starting point to work from.
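For readers who want to try the "little more math" themselves, the sketch below shows one way to turn a batted ball's (x,y) location into an estimated spray angle. It assumes home plate at (125,0) as described above, and it further assumes that y grows toward center field and x grows toward right field once the coordinates have been adjusted. The name `estimated_spray_angle` is hypothetical, and this is an illustration rather than the exact calculation behind the model.

```python
import math

# Assumed home plate location, matching the starting point used in the text
HOME_X, HOME_Y = 125.0, 0.0


def estimated_spray_angle(x, y):
    """Angle in degrees between the batted ball's path and dead center field.

    Negative values are toward left field, positive toward right field.
    Assumes y increases toward center field and x increases toward right field.
    """
    dx = x - HOME_X
    dy = y - HOME_Y
    if dx == 0 and dy == 0:
        raise ValueError("ball location coincides with home plate")
    # atan2(dx, dy) measures the angle away from the straight-up-the-middle line
    return math.degrees(math.atan2(dx, dy))


print(estimated_spray_angle(125.0, 100.0))  # straight up the middle
print(estimated_spray_angle(25.0, 100.0))   # along the left field line
```

Under these assumptions the foul lines sit at about -45 and +45 degrees, so anything outside that range would be foul territory.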

Further, I had to make adjustments to the y-coordinate data to fit my model. Once I did that, I used a center point of x=125 and standardized the coordinates using the average and standard deviations.
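As a rough illustration of the centering and standardizing step, the sketch below shifts x so that 125 becomes zero and then converts the values to z-scores. The helper name `standardize` and the three sample x values are hypothetical, and using the population standard deviation is an assumption; the y-coordinate adjustments mentioned above are not reproduced here because their details aren't given.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to z-scores: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all values are identical; cannot standardize")
    return [(v - mu) / sigma for v in values]


# Hypothetical x-coordinates, centered on the assumed home plate x of 125
xs = [60.0, 125.0, 190.0]
centered = [x - 125.0 for x in xs]
z = standardize(centered)
print(z)
```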

The result is something like the following. I apologize that this is still a very crude conception:

[Figure: the field divided into six horizontal spray bins]

Here, there are six horizontal spray bins, each with an interior angle of fifteen degrees. Using those bins, we can estimate the probability of a hit at a given estimated horizontal spray angle (eHSA or spray angle) using exit velocity (EV) and launch angle (LA).
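The binning itself can be sketched in a few lines. This assumes the six fifteen-degree bins run from -45 degrees (left field line) to +45 degrees (right field line), and that a ball landing exactly on a boundary goes into the bin to its right; both the boundary rule and the names `spray_bin` and `bin_label` are hypothetical choices made for this example.

```python
BIN_WIDTH = 15.0                     # degrees per bin, as described in the text
MIN_ANGLE, MAX_ANGLE = -45.0, 45.0   # assumed foul-line angles
NUM_BINS = 6


def spray_bin(angle_deg):
    """Return a bin index from 0 (left field line) to 5 (right field line).

    Returns None for angles outside fair territory.
    """
    if angle_deg < MIN_ANGLE or angle_deg > MAX_ANGLE:
        return None
    index = int((angle_deg - MIN_ANGLE) // BIN_WIDTH)
    # An angle of exactly +45 would compute to index 6, so fold it into the last bin
    return min(index, NUM_BINS - 1)


def bin_label(index):
    """Human-readable angle range for a bin index, e.g. '-45 to -30'."""
    low = MIN_ANGLE + index * BIN_WIDTH
    return f"{low:.0f} to {low + BIN_WIDTH:.0f}"


for angle in (-40.0, -10.0, 20.0):
    print(angle, "->", bin_label(spray_bin(angle)))
```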

If you’re familiar with some of my recent work, you’ve likely seen the BACON matrix that I developed in the article I linked above. If not, here’s the matrix again, with a brief explanation of what it means:

[Figure: BACON matrix of hit probability by exit velocity and launch angle]

Red areas are where batted balls are most likely to go for hits, and blue areas are where batted balls are least likely to be hits. The blue area at medium exit velocities and fly ball launch angles is the batted ball donut hole. Essentially, these batted balls are generally lazy fly balls, while harder hit balls at those angles go for home runs and weaker hit balls at those angles fall in for bloop hits.

Of course, that model didn’t include eHSA. It should make sense that the direction of a batted ball at a given EV and LA impacts whether it’s a hit or not. For example, a fly ball at an EV and LA combination that results in a batted ball distance of 380 feet is almost always a home run to left field, but is almost always an out to center field. To fully capture hit probability, we have to take direction into account.

By classifying each batted ball with tracked x- and y-coordinates into these sections of the field, we can get a hit probability matrix for each section. Here, I standardized the EV and LA data, but included a few reference points so you can see an estimate of MPH and LA at any point. Again, these models need some more cleaning, but the overall picture is easy enough to see.
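One simple way to build a matrix like this is to group batted balls by spray bin, then by EV and LA cell, and take the share of balls in each cell that went for hits. The sketch below does that with raw MPH and degrees in 5-unit cells rather than standardized values, which is a simplification. The function name `hit_probability_matrix`, the cell sizes, and the four sample batted balls are all hypothetical.

```python
from collections import defaultdict


def hit_probability_matrix(batted_balls, ev_step=5.0, la_step=5.0):
    """Empirical hit rate per (EV, LA) cell, computed separately for each spray bin.

    batted_balls: iterable of (spray_bin, exit_velocity_mph, launch_angle_deg, is_hit).
    Returns {spray_bin: {(ev_cell, la_cell): hit_rate}}.
    """
    # counts[spray_bin][cell] holds [hits, total]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for sbin, ev, la, is_hit in batted_balls:
        cell = (int(ev // ev_step), int(la // la_step))
        tally = counts[sbin][cell]
        if is_hit:
            tally[0] += 1
        tally[1] += 1
    return {
        sbin: {cell: hits / total for cell, (hits, total) in cells.items()}
        for sbin, cells in counts.items()
    }


# Hypothetical batted balls: (spray bin, EV in MPH, LA in degrees, hit?)
sample = [
    (0, 101.0, 27.0, True),
    (0, 102.0, 28.0, True),
    (2, 101.0, 27.0, False),
    (2, 103.0, 26.0, True),
]
matrix = hit_probability_matrix(sample)
for sbin, cells in sorted(matrix.items()):
    print(sbin, cells)
```

With only a handful of balls per cell the rates are very noisy, so in practice the cells would need to be wide enough, or the rates smoothed, for the picture to be readable.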

First, here is the matrix for eHSA of -45 to -30 degrees, which is the section closest to the left field line.

[Figure: hit probability matrix for eHSA of -45 to -30 degrees]

Here, we see a similar “Nike swoosh” shape, like in the total matrix above. It’s a little less prominent, which may be a product of having less data (~9,000 tracked balls compared to ~110,000). It also might be because of the shorter depth in left field, leading to more home runs (dark red area in the upper right). It could also be due to worse defenders in LF or a whole host of other factors.

Next, the matrix for eHSA of -15 to 0 degrees, which is the left half of center field when looking from home plate.

[Figure: hit probability matrix for eHSA of -15 to 0 degrees]

As you can see, the ‘home run’ area shifts right. This makes sense: to get the ball over the 400 foot fence in center field, you have to hit it harder and at more precise launch angles than you do to LF. Overall, a batted ball to this part of the field is less likely to be a hit than one hit down the line. Again, that should be fairly intuitive, though the exact differences could and likely do arise from many factors.

These differences show a much more prominent donut hole in center field. In other words, there are more EV and LA combinations hit to CF that lead to outs than there are combinations hit to LF.

This explains why fly ball hitters fare better the more they pull their fly balls. It’s likely why St. Louis Cardinals like Jedd Gyorko or other MLB players like Brian Dozier can overperform an expected home run total that is based only on EV and LA, without including eHSA. It’s probably part of the reason Brandon Belt hit fly balls at the 9th highest rate in the league, but still had fewer than 20 homers in 2016.

During this season, you’re going to see a lot of Statcast hit probability highlights. Those hit probabilities, as of now, do not include a measure for horizontal spray angle. So when you see a ball hit to center field, and Statcast says that a batted ball at that EV and LA is a hit 60% of the time, remember the importance of spray angle. Maybe that ball is a hit 80% of the time when it’s hit to left field, and only 40% of the time when it’s hit to center.

Thanks for reading! Hopefully, I’ll have time soon to further refine the model and start applying it to players and our St. Louis Cardinals!