The Z Files: Not What You Think It Means

This article is part of our The Z Files series.

Words mean things. My favorite words are those that connote very specific meanings. I don't get paid by the word, so when I'm able to make a cogent point with a paucity of words, I'm happy.

"Regression" used to be one of my favorite words. When sabermetric analysis was initially introduced into the fantasy realm, regression had a meaning rooted in statistics – movement toward the mean. When an analyst said a player would regress, this specifically meant an element out of the player's control would move toward the mean of said element. Regression wasn't necessarily a bad thing, as the movement could be in either direction.

The problem is regression has another, more general meaning – a return to a former or less developed state. The connotation here is negative. Over time, instead of the pointed statistical meaning, regress became synonymous with "play worse."

For a scientist turned fantasy analyst, this was frustrating. When I used the term, my intent was to contend that something out of the player's control would regress to the mean. Unfortunately, others heard "play worse" and rightfully wanted to know "How much worse?"

It took a while, but eventually I realized I needed to use more words to get my explicit point across. Regress wasn't relaying the proper message.

That is, until recently. About a week ago, I had the pleasure of chatting with friend and colleague Ron Shandler. We were at an event earlier that day where the word regress was used frequently, and I mentioned how I cringed a little bit at every utterance. Ron recalled I had written about this in the past and said the word I was looking for is "normalize."

Have you ever had the feeling of giddy elation that a lingering nuisance is finally solved simultaneously with extreme anger and embarrassment that you didn't think of the solution yourself?

Yeah, me neither.

OK, that's exactly what happened. The two emotions cancelled each other out and I stood there motionless, my mouth agape, unable to form syllables. Eventually I was able to muster a head nod followed by "Yeah, right, why didn't I think of that?"
CTRL F – regress; replace – normalize.

OK, cool, I can go back to using fewer words to make my points. This is especially advantageous since suggesting a player is a candidate to normalize entails a whole lot more than it did in the past.

In large part, Voros McCracken's DIPS theory bridged sabermetrics and fantasy baseball analysis, DIPS being an acronym for Defense Independent Pitching Statistics. In short, McCracken discovered that the batting average on balls in play (BABIP) for all pitchers clustered around .300 (now about .294). It didn't matter if it was Pedro Martinez or Tippy Martinez; over the course of a career, a pitcher's BABIP would be about .300. So for years, when a pitcher had a BABIP higher or lower than that in a particular season, the party line was it was due to regress the following year. Um, sorry…normalize.

As data collection and accessibility improved, so did this simplistic analysis. The first major upgrade was breaking batted balls into fly balls, grounders and line drives, with the subsequent elucidation that the BABIP of fly balls was considerably lower than that of groundballs. As such, a fly-ball pitcher generally sported a lower BABIP than a groundball specialist. This set a different mean for each type of pitcher. That is, fly-ball pitchers normalized to a mark below .300 while ground-ball hurlers settled at a higher BABIP.
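
One way to express a batted-ball-adjusted mean is a weighted average: each batted-ball type's league BABIP, weighted by the pitcher's share of balls in play of that type. The sketch below assumes the caller supplies current league rates per type; the function name and the type labels ("gb", "fb", "ld") are hypothetical, and it ignores finer subsets such as bunts and pop-ups.

```python
def expected_babip(mix: dict[str, float], type_babip: dict[str, float]) -> float:
    """Batted-ball-adjusted BABIP baseline (hypothetical helper).

    mix:        the pitcher's share of balls in play by type (must sum to 1).
    type_babip: league BABIP for each type, supplied by the caller.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-6:
        raise ValueError("batted-ball shares must sum to 1")
    missing = set(mix) - set(type_babip)
    if missing:
        raise KeyError(f"no league BABIP for: {sorted(missing)}")
    # Weighted average: share of each type times that type's league BABIP.
    return sum(share * type_babip[kind] for kind, share in mix.items())
```

Because fly balls carry a lower league BABIP than grounders, shifting a pitcher's mix from grounders toward fly balls lowers the baseline this returns, which is the effect described above.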

Jumping ahead to the present, not only are batted balls deemed well hit, medium hit or softly hit, but the classifications have also been refined to include bunts, infield grounders and infield pop-ups. Each of these subsets has a global BABIP. Ongoing research is examining how much control a pitcher has over hard and weak contact. As expected, the numbers confirm the BABIP of hard-hit balls is the highest, regardless of the classification. If it can be shown that a pitcher can limit hard contact, that can be deemed a skill and factored into how we normalize his BABIP. The sample of available data is relatively small, not to mention the classifications involve a lot of subjectivity. However, there's growing evidence that inducing weak contact (or minimizing hard contact) is a skill, explaining some career BABIP marks that were previously assumed to be lucky or unlucky.

For what it's worth, the next major breakthrough is already underway: eliminating the subjective classification of batted balls in favor of electronic measurement (velocity off the bat, launch angle, etc.). This will fine-tune results even more. Of course, it also means it will be that much longer before there's ample data to yield statistically significant results.

Thus far, the BABIP discussion has focused on pitchers. As most of you know, hitters do not cluster around the same mean; each establishes his own baseline. Early research in this area suggested pitchers exhibit more control over fly balls and grounders while hitters control whether or not they hit a line drive. The rudimentary conclusion was a hitter's baseline BABIP was in large part proportional to his line-drive rate. As more data was collected, it became evident other factors were involved, such as a hitter's speed and power. Now, the new electronic data is also being used to get a better idea of a hitter's skills. As an example, early indications are that velocity off the bat is a repeatable skill.

BABIP isn't the only metric that can normalize. Home runs per fly ball (HR/FB) has an element out of the player's control that can lead to good luck or misfortune. It's necessary to first correct for park effects, but as with BABIP, pitchers tend to nestle around the same mean, presently about 11 percent HR/FB. Again, hitters establish their own mean. When looking at a batter's home run totals from year to year, it's necessary to look at the components: HR/FB and the percentage of fly balls hit. A drop or gain in power may stem not from HR/FB but from lofting more or fewer batted balls.
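
The component check can be done with simple arithmetic: home runs equal fly balls times HR/FB, so a year-over-year change splits cleanly into a fly-ball-volume piece and an HR/FB piece. The Python sketch below is hypothetical and assumes park effects have already been removed from the inputs; fly-ball counts stand in for "percent of fly balls hit" times batted balls.

```python
def decompose_hr_change(fb1: int, hr1: int, fb2: int, hr2: int) -> tuple[float, float]:
    """Split the change in home runs between two seasons into two parts.

    Uses HR = FB * (HR/FB). The fly-ball part is valued at last year's HR/FB;
    the HR/FB part is valued at this year's fly-ball count. Algebraically the
    two parts sum to hr2 - hr1 (up to floating-point rounding).
    """
    if fb1 <= 0 or fb2 <= 0:
        raise ValueError("fly-ball counts must be positive")
    rate1, rate2 = hr1 / fb1, hr2 / fb2
    from_fly_balls = (fb2 - fb1) * rate1   # more or fewer balls lofted
    from_hr_per_fb = fb2 * (rate2 - rate1)  # more or fewer of them leaving the yard
    return from_fly_balls, from_hr_per_fb
```

A gain driven mostly by the HR/FB term is the piece worth checking against the hitter's established HR/FB before assuming it will stick.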

A pitcher's left-on-base percentage (LOB%), also called strand rate, is a candidate to normalize. I prefer to call it LOB% since, back in the day, strand rate was a proprietary term coined by the aforementioned Ron Shandler, though it's since become mainstream. The formulas are very similar, both capturing the same principle. The sequencing of base hits is largely random. However, runs generally score when hits come in clusters. In other words, how many hits a pitcher allows is a skill, but when they're allowed involves some happenstance. If, over the course of a season, the timing of hits is such that a disproportionate number result in runs scoring, a pitcher's ERA is artificially inflated by a low LOB%. Conversely, sometimes a pitcher happens to allow an unusual number of hits with no ducks on the pond, meaning he surrenders fewer with men on base. Here, fewer runners come around to score, so the ERA is artificially deflated via a high LOB%. Please note that LOB% and the raw number of baserunners are related but distinct entities that need to be looked at in tandem when discerning why the actual ERA differs from that expected based solely on skills (like FIP, xFIP, SIERA, etc.).

The average pitcher will sport a LOB% around 72 percent. The elite can sustain a mark closer to 78 percent, in large part because a high strikeout rate limits the balls in play that can lead to tallies. A pitcher's LOB% should be expected to normalize, which helps identify pitchers due for an ERA correction.
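
For those who want to compute LOB% themselves, here is a minimal sketch using the widely published FanGraphs-style formula, LOB% = (H + BB + HBP - R) / (H + BB + HBP - 1.4 × HR), as we understand it. The 72 percent baseline comes from the paragraph above; the three-point tolerance and the function names are illustrative assumptions, and a high-strikeout arm may merit a baseline closer to 78 percent.

```python
LEAGUE_LOB = 0.72  # average LOB% cited in the article


def lob_pct(h: int, bb: int, hbp: int, r: int, hr: int) -> float:
    """Left-on-base percentage: (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)."""
    baserunners = h + bb + hbp
    denominator = baserunners - 1.4 * hr
    if denominator <= 0:
        raise ValueError("not enough baserunners to compute LOB%")
    return (baserunners - r) / denominator


def era_outlook(lob: float, baseline: float = LEAGUE_LOB, tolerance: float = 0.03) -> str:
    """Direction ERA would move if LOB% normalizes toward the baseline.

    tolerance is an assumed cutoff, not an established threshold.
    """
    if lob < baseline - tolerance:
        return "ERA likely to drop"   # too many runners scored; LOB% should rise
    if lob > baseline + tolerance:
        return "ERA likely to rise"   # too many runners stranded; LOB% should fall
    return "in line with baseline"
```

Passing `baseline=0.78` for an elite strikeout pitcher keeps a legitimately high LOB% from being flagged as luck.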

One of the more telling newfangled pitching stats is swinging-strike rate and what it portends for strikeouts, as the two are strongly correlated. Each season there are hurlers whose whiffs aren't in line with the number of swinging strikes induced. When evaluating these pitchers, it's best to base your strikeout expectation on their swinging-strike rate and not the actual number of punchouts.
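
A simple way to build that expectation is to fit a straight line of strikeout rate on swinging-strike rate across a pool of pitchers, then compare each pitcher's actual strikeout rate with the fitted one. This is a hypothetical sketch, not an established model: it assumes a linear relationship and that you supply your own league data; the function names and the data layout are illustrative.

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("swinging-strike rates must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def strikeout_gaps(pitchers: dict[str, tuple[float, float]],
                   slope: float, intercept: float) -> dict[str, float]:
    """Actual K% minus K% implied by SwStr% for each pitcher.

    pitchers maps a name to (swinging-strike rate, strikeout rate).
    A large positive gap suggests strikeouts running ahead of the whiffs;
    a large negative gap suggests the reverse.
    """
    return {name: k - (slope * swstr + intercept)
            for name, (swstr, k) in pitchers.items()}
```

Fitting on league data and scoring individual pitchers against that fit keeps the expectation anchored to swinging strikes rather than to the strikeout total being questioned.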

Circling back to the notion that hits are mostly random events, a hitter can enjoy some luck when it comes to getting hits with runners in scoring position. Research has largely shown clutch hitting to be a myth. A hitter's average with runners in scoring position should be pretty close to that with men on base. The league average with men on base is generally .005 to .010 (five to 10 points) higher than with the bases empty because a pitcher's peripherals decline when working from the stretch. If a hitter's split is significantly more or less than that, he's been lucky or unlucky with respect to RBI. This split is not a repeatable skill; average with runners in scoring position will normalize and thus needs to be accounted for when estimating the player's future RBI.
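
The comparison described here reduces to a few lines. This hypothetical Python sketch compares a hitter's average with runners in scoring position to his bases-empty average and checks the gap against the .005 to .010 league bump cited above; the 20-point slack standing in for "significantly" is an assumption, and small samples deserve extra caution.

```python
LEAGUE_BUMP = (0.005, 0.010)  # men-on vs bases-empty AVG gap cited in the article


def risp_read(avg_risp: float, avg_bases_empty: float,
              bump: tuple[float, float] = LEAGUE_BUMP,
              slack: float = 0.020) -> str:
    """Label a hitter's RISP split relative to the expected league bump.

    slack is an assumed cutoff for "significantly" more or less than expected.
    """
    gap = avg_risp - avg_bases_empty
    low, high = bump
    if gap > high + slack:
        return "lucky"     # expect RISP average, and RBI pace, to normalize down
    if gap < low - slack:
        return "unlucky"   # expect RISP average, and RBI pace, to normalize up
    return "normal"
```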

Next week we'll segue from the abstract to the practical by featuring players whose numbers from last year should normalize. This will help identify players the market may mistakenly rank, presenting buying opportunities as well as noting those to avoid.

ABOUT THE AUTHOR
Todd Zola
Todd has been writing about fantasy baseball since 1997. He won NL Tout Wars and Mixed LABR in 2016 and is a multi-time league winner in the National Fantasy Baseball Championship. Todd is now setting his sights even higher: The Rotowire Staff League. Lord Zola, as he's known in the industry, won the 2013 FSWA Fantasy Baseball Article of the Year award and was named the 2017 FSWA Fantasy Baseball Writer of the Year. Todd is a five-time FSWA awards finalist.