Mets All-Time Top Catcher

The Mets have had many players behind the plate, “the game’s most demanding position,” according to Jesse Yomtov, starting with Hobie Landrith, who on April 11, 1962, caught the first pitch thrown by a Mets starter (Roger Craig).

Five catchers have stood out.

To choose them, five statistics were primarily used: WAR, WPA, RE24, Total Bases, and Times on Base (excluding by error), with WAR and WPA the two dominant ones, in that order. In addition, their selection was based solely on their time with the Mets, not on their overall career, as a player could have played for multiple teams.

Among the Mets’ top five catchers, two are in the Hall of Fame: Mike Piazza and Gary Carter. Piazza played eight seasons for the Mets after playing seven on the Dodgers, Carter five after playing 11 for Montreal. Filling out the list are Jerry Grote, who played 12 seasons in the Big Apple, John Stearns, who played 10, and Todd Hundley, who played nine.

Sources: Stathead Baseball and Baseball Reference

Grote came closest to Piazza in Times on Base, only 91 behind; however, as a Met, Grote played four more seasons than Piazza, who averaged getting on base 183.6 times a season versus 114.8 for Grote.

Based only on their Mets WAR number, the top two are Piazza and Stearns; however, when WPA and RE24 are taken into account, the difference between the two becomes quite significant. And Piazza separates himself even more from the others in Total Bases, having 607 more than the second-most, Grote’s 1,278. But then, in his Mets career, Piazza amassed a .542 SLG. No one else in the group came within 100 points of that number.

  • Piazza had the third-highest JAWS rating among all catchers.

Twitter Poll

I found the tweet below after I completed the above write-up and was not surprised by Piazza’s landslide victory. He was one of the Mets’ most popular players.

Another stat, TOB/TB, helps lengthen Piazza’s lead over the rest of the field. Written about in 2016 by Rob Mains, the TOB/TB Number is calculated using this formula:

  • Multiply Times on Base by Total Bases.
  • Double it.
  • Divide the result by the sum of Times on Base and Total Bases.

Piazza’s TOB/TB number of 1,651 was 325 points ahead of Grote’s; the average for the top five catchers was 1,170.
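The three steps in the formula amount to the harmonic mean of Times on Base and Total Bases. Here is a minimal sketch in R. The function name tob_tb is made up for illustration, and the inputs are back-calculated from figures quoted earlier in this post (Total Bases from 1,278 and the 607 gap, Times on Base from the per-season averages multiplied by seasons played and rounded), so treat them as approximations rather than official totals.

```r
# Harmonic mean of Times on Base (tob) and Total Bases (tb).
# tob_tb is a hypothetical helper name, not part of any package.
tob_tb <- function(tob, tb) {
  2 * tob * tb / (tob + tb)
}

# Inputs inferred from this post's own figures (assumptions, rounded):
#   Piazza: TOB ~ 183.6 * 8  = 1469, TB = 1278 + 607 = 1885
#   Grote:  TOB ~ 114.8 * 12 = 1378, TB = 1278
piazza <- tob_tb(1469, 1885)
grote  <- tob_tb(1378, 1278)

round(piazza)                 # 1651
round(grote)                  # 1326
round(piazza) - round(grote)  # 325
```

Because a harmonic mean is pulled toward the smaller of its two inputs, a player needs to be strong in both Times on Base and Total Bases to score well.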

Others’ Views

Tim Boyle, in his catcher comparison, made this comment about Mike Piazza:

“Piazza didn’t have a reputation for playing well defensively. As the years went on, he got worse. I’m not so sure anyone holds this against him. Piazza was far too amazing at the plate for anyone to criticize him for his weaknesses behind it.”

In contrast, Jennifer Khedaroo viewed Piazza’s defensive skill differently, writing:

“In terms of defense, Piazza played well year after year. He was consistently in the top five for putouts, assists, double plays turned and runners caught stealing.”

And though Harold Friend agreed that Piazza was a better hitter than Gary Carter, he still pushed Piazza into second place among the best Mets catchers, Carter’s defensive skill giving him the edge:

“Gary Carter was the most valuable Mets catcher. Piazza will always be rated as the greater player, but Carter was more valuable to the Mets. Gary Carter was (and is) a world champion.

Piazza was the greatest hitting catcher ever. Although he was a good defensive player his first few seasons with the Los Angeles Dodgers, he was a defensive liability during his tenure with the Mets.”

Overall, Friend wrote, “Carter provided great defense, handled an excellent pitching staff magnificently and was a timely clutch hitter.”

In response to Friend, in my opinion the best measure of clutch hitting is WPA. For that stat, Piazza’s score was more than 10 times higher than Carter’s.

With regard to Piazza’s ability behind the plate, in an nj.com article, its author, Brendan Kuty, wrote that Hall of Famer Tom Glavine “said Piazza’s reputation as a bad defensive catcher is undeserved.”

“He did a lot of things well behind the plate,” Glavine said. “Yeah, he wasn’t the greatest thrower. That unfortunately translated into people thinking that some of his other game wasn’t as good as it was. He called a good game. He received the ball fine. He blocked balls fine.

But so often catchers are defined defensively on how well they throw and there’s much more that goes into just being a good defensive catcher than being able to throw. That aspect of his game, for whatever reason, garnered the extra attention and overshadowed the other aspects of his game.” (from Kuty article)

New Statcast leaderboard hits a grand slam

The latest feature added to Baseball Savant focuses on one of baseball’s most exciting plays: the home run. However, its creator, Daren Willman, tweeted, “Not all home runs are created equal.”

The leaderboard’s startup screen shows all those batters in 2020 who hit at least one long ball that would have been a home run in at least one of Major League Baseball’s 30 ballparks.

On August 9, before any of the day’s games have been played, Yankees slugger Aaron Judge is Major League Baseball’s home run leader with eight. In the Home Runs Leaderboard, if you click anywhere on a player’s row except on his name, details on all his homers in the season you choose will appear, each homer listed on a separate row.

Click on Judge’s row. Below his name should be a table showing, for each long ball that Judge hit on a given date, the ballparks where it would have been a homer. For example, on August 8 in Tampa Bay, the first long ball that Judge hit (against Sean Gilmartin) would have been a four-bagger in every ballpark, but the second long ball he hit (against Nick Anderson) would have been a homer in only 18 parks (video).

Therefore, for a long ball to qualify for (be included in) the Home Runs Leaderboard it must have been able to be a home run in at least one MLB stadium even if it was not a homer in the ballpark in which it was hit. Those batted balls are labelled as “Doubters,” “Mostly Gone,” or “No Doubters.”

  • If a batted ball would be a homer in fewer than 8 ballparks, it is a “doubter.”
  • If it would be a homer in 8 to 29 parks, it is “mostly gone.”
  • If it would be a home run in every stadium, it is a “no doubter.”
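The three rules can be sketched as a small R function. This is only an illustration of the thresholds listed in this post, not Baseball Savant’s own code; classify_hr is a made-up name, and the input is assumed to be the number of parks (1 to 30) in which the ball would have been a home run.

```r
# Label a batted ball by the number of MLB parks (out of 30) in which it
# would have been a home run. classify_hr is a hypothetical helper name.
classify_hr <- function(parks) {
  stopifnot(all(parks >= 1 & parks <= 30))
  ifelse(parks < 8, "Doubter",
         ifelse(parks < 30, "Mostly Gone", "No Doubter"))
}

classify_hr(c(1, 7, 8, 18, 29, 30))
# "Doubter" "Doubter" "Mostly Gone" "Mostly Gone" "Mostly Gone" "No Doubter"
```

Under these rules, Judge’s 18-park long ball against Nick Anderson would be labelled “Mostly Gone.”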

That is why, if you sum those three columns (“Doubters,” “Mostly Gone,” “No Doubters”), the total can be more than what is in the “Actual HR” column, which is the total number of homers the player hit, as occurs with Fernando Tatis Jr.’s numbers. He had six actual homers but one “doubter,” three “mostly gone,” and six “no doubters,” for a total of 10.

Finally, home run data is available for batters, pitchers, and teams for both 2019 and 2020.

Here is a sample of the kinds of questions that Savant’s Home Runs Leaderboard can answer.

Which player has the most “could-be” homers that could only be a home run in one stadium?

Which Mets player has hit the most actual and “almost” homers so far in 2020? Notice that one of Davis’ “homers” was a non-homer. I label that one a “Could Be” homer.

Who has hit the most “no doubt” home runs this season?

In 2020, which pitcher has given up the most “no doubters”?

The Home Runs Leaderboard is a great resource with eye-catching visuals for statistically minded baseball fans. One thing that could make it even better is the ability to get team data by both division and league. For example, if I now select “Mets” and “Pitchers,” I only get the results for the qualifying Mets pitchers.

Statcast Detective: Throw ’em high

In my first post on high-ball pitchers, I presented the top 10 high-ballers in 2019. At the bottom of the list, based on the number of high-balls thrown, was Astros starter Jordan Lyles, but his high-ball pitch percentage of 28.4% pushed him into the two-spot. Given that the league average in 2019 was 16.8%, Lyles was about 69% above average (28.4 / 16.8 ≈ 1.69).

The strike zone has changed over the years. According to the OFFICIAL BASEBALL RULES, 2019 Edition,

The STRIKE ZONE is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap. The Strike Zone shall be determined from the batter’s stance as the batter is prepared to swing at a pitched ball.

To appreciate what a high pitch is you need to know the strike zone’s dimensions.

Mike Fast wrote that there are two ways to dimension the strike zone.

One is to use fixed heights for the top and bottom boundaries of the zone for all batters, regardless of the height or stance of the batter.  The most commonly used fixed heights are 1.5 feet for the bottom of the zone and 3.5 feet for the top of the zone.

As the second way involves the statistical technique of normalization, it will not be covered in this post. Those mathematically inclined can find an explanation here.

As Fast’s article was published in 2011, I checked the strike zone a second way. Statcast’s “Plate Z” setting shows a pitch’s height. In 2019, the average height of a pitch in the high-ball zone was 3.65 feet. In all zones, the average height was 2.25 feet. Since 2015, the average heights have been 2.25, 2.26, 2.41, 2.25, and 2.25, so they have been almost identical in four of the past five years.
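To make the fixed-height approach concrete, here is a minimal R sketch that labels a pitch by height alone, using the 1.5-foot and 3.5-foot boundaries quoted from Fast. The function name classify_height is made up for illustration, and the sketch ignores horizontal location entirely, so it is a simplification rather than a full strike-zone test.

```r
# Fixed strike-zone boundaries quoted from Mike Fast, in feet.
zone_bottom <- 1.5
zone_top    <- 3.5

# classify_height is a hypothetical helper; it looks only at pitch height
# (the value Statcast reports as Plate Z) and ignores horizontal location.
classify_height <- function(plate_z) {
  cut(plate_z,
      breaks = c(-Inf, zone_bottom, zone_top, Inf),
      labels = c("below zone", "in zone", "above zone"))
}

# 2019 averages from this post: all pitches, then high-ball-zone pitches.
classify_height(c(2.25, 3.65))
# in zone, above zone
```

By this yardstick, the average 2019 pitch sat comfortably inside the zone, while the average pitch in the high-ball zone was above the fixed 3.5-foot top.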

This is a HIGH pitch.

And some batters can clobber pitches even when they are high.

Lindor’s homer, his 18th, was hit off a pitch 3.53 feet high and was the only homer he hit that season off a pitch at least three feet above the plate. In 2018, the average pitch height of his homers was 2.16 feet.

To be continued

Exploring Baseball with R #1

Every year since 1956 at least one pitcher has won the Cy Young Award starting with the Brooklyn Dodgers’ Don Newcombe. Many questions can be asked. How many pitchers won the award more than once? Which pitchers achieved that feat? Who won it the most times?

In this post, I focus on just one question: How many players won the Cy Young Award? I will share how I got the answer using the R programming language. I will also be using both RStudio and Sean Lahman’s Baseball Database, an excellent resource. A basic familiarity with R, dplyr, and RStudio is assumed. (Note: As you progress through this post, have RStudio open.)

Here is an introduction to the database.

The database contains multiple files in table format. The table containing the player awards data is AwardsPlayers.RData. It has six variables:

playerID       Player ID code
awardID        Name of award won
yearID         Year
lgID           League
tie            Award was a tie (Y or N)
notes          Notes about the award

What AwardsPlayers.RData does not have are the players’ names, though it has each player’s ID. Their names are in People.RData. The People table contains 24 variables. In the partial display of its variables, notice that it too contains playerID.

playerID       A unique code assigned to each player.
birthYear      Year player was born
birthMonth     Month player was born
birthDay       Day player was born
birthCountry   Country where player was born
birthState     State where player was born
birthCity      City where player was born
nameFirst      Player's first name
nameLast       Player's last name
weight         Player's weight in pounds
height         Player's height in inches
bats           Player's batting hand (left, right, or both)        
throws         Player's throwing hand (left or right)

Fortunately, the playerID field links together the data in the tables.

For this tutorial, you need to download from Lahman’s database the “2019 – R Package”. When the webpage appears, click “data.”

You will see a list of downloadable files. The files are in RData format, a format created for use in R. For this tutorial, download these two files: AwardsPlayers.RData and People.RData. The latter is not shown in the image below, which contains a partial file list.

After you click AwardsPlayers.RData, what is shown below will appear. Click View Raw. The file will download to your device. (Note: The way I show you to do something in this tutorial is often not the only way to do it.)

Click the downloaded file, which in this case is AwardsPlayers.RData. On my Mac, it downloaded into the Downloads folder.

When this appears, click Yes.

This should appear in your RStudio Console:

load("/Users/Home/Downloads/AwardsPlayers.RData")

Here is how the AwardsPlayers.RData looks in RStudio’s Global Environment. It is now available for you to work on.
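If you prefer to load the files from a script rather than by clicking, load() does the same job. The sketch below is self-contained: it first saves a one-row stand-in data frame to a temporary file, so it runs without the real download. With the real files, you would pass the path to your own Downloads folder instead.

```r
# Write a tiny stand-in data frame to a temporary .RData file so the sketch
# runs without the real download. This is toy data, not the Lahman table.
AwardsPlayers <- data.frame(playerID = "zitoba01",
                            awardID  = "Cy Young Award",
                            yearID   = 2002L)
tmp <- tempfile(fileext = ".RData")
save(AwardsPlayers, file = tmp)
rm(AwardsPlayers)

# load() restores objects under their saved names and (invisibly) returns
# those names, which is a handy check on what a file contained.
loaded <- load(tmp)
loaded              # "AwardsPlayers"
str(AwardsPlayers)
```

Note that load() overwrites any existing object with the same name, which is one reason to work on copies afterward.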

Now, in RStudio the two downloaded tables are R data frames. Next, I created an R Markdown file and made copies of both data frames.

AP <- AwardsPlayers
P <- People

Merge AP and P into a new data frame: AP_P. The column common to both, playerID, serves as the link.

AP_P <- merge(AP, P, by="playerID")
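It helps to know what merge() does with rows that have no partner in the other table. The toy example below uses made-up miniature versions of the two tables (the ID "nobody01" is invented) to show that the default is an inner join and that all.x = TRUE keeps every award row.

```r
# Toy stand-ins for the two Lahman tables (only a few columns and rows).
# "nobody01" is a made-up ID with no match in the People stand-in.
AP <- data.frame(
  playerID = c("zitoba01", "zitoba01", "nobody01"),
  awardID  = c("Cy Young Award", "Hutch Award", "Cy Young Award"),
  yearID   = c(2002L, 2012L, 1900L)
)
P <- data.frame(
  playerID  = c("zitoba01", "zobribe01"),
  nameFirst = c("Barry", "Ben"),
  nameLast  = c("Zito", "Zobrist")
)

# Default merge(): an inner join, so "nobody01" (no match in P) and
# "zobribe01" (no awards in AP) are both dropped.
inner <- merge(AP, P, by = "playerID")
nrow(inner)   # 2

# all.x = TRUE keeps every award row; unmatched names become NA.
left <- merge(AP, P, by = "playerID", all.x = TRUE)
nrow(left)    # 3
```

Comparing nrow(AP) with nrow(AP_P) after the real merge is a quick way to check whether any award rows were dropped.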

View the merged data frame’s variables.

library(dplyr)  # glimpse() is provided by dplyr, so load it first
glimpse(AP_P)

To view the data, while in the Console type

View(AP_P)

Activate the Tidyverse library, reduce the number of columns, and display the last 10 rows. Note: If you have not used it before, you may need to install it using the R code on the next line.

install.packages("tidyverse")  # needed only once; skip if already installed
library(tidyverse)
AP_P %>% select(nameFirst, nameLast, playerID, yearID, awardID) %>% tail(10)

Select five columns in AP_P, and display the last 10 rows in AP_P.

       nameFirst   nameLast    playerID    yearID   awardID
       <chr>       <chr>       <chr>       <int>    <chr>
6227   Ryan        Zimmerman   zimmery01   2010     Silver Slugger
6228   Ryan        Zimmerman   zimmery01   2011     Lou Gehrig Memorial Award
6229   Ryan        Zimmerman   zimmery01   2009     Silver Slugger
6230   Richie      Zisk        ziskri01    1981     TSN All-Star
6231   Richie      Zisk        ziskri01    1974     TSN All-Star
6232   Barry       Zito        zitoba01    2012     Hutch Award
6233   Barry       Zito        zitoba01    2002     Cy Young Award
6234   Barry       Zito        zitoba01    2002     TSN Pitcher of the Year
6235   Barry       Zito        zitoba01    2002     TSN All-Star
6236   Ben         Zobrist     zobribe01   2016     World Series MVP

The next step is to combine the nameFirst and nameLast columns in a new column, fullname. The paste function automatically inserts a space between the names.

AP_P$fullname <- paste(AP_P$nameFirst, AP_P$nameLast)
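One thing to watch for with paste() is that a missing value is turned into the literal text "NA". The sketch below uses made-up names to show the behavior and one possible workaround; whether it matters for your data depends on whether any first names are missing.

```r
first <- c("Barry", NA)
last  <- c("Zito", "Smith")   # "Smith" is a made-up example

paste(first, last)            # "Barry Zito" "NA Smith"
paste(first, last, sep = "")  # "BarryZito" "NASmith"

# One way to avoid the literal "NA": fall back to the last name alone.
fullname <- ifelse(is.na(first), last, paste(first, last))
fullname                      # "Barry Zito" "Smith"
```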

The five columns to be displayed are selected and the last 15 observations in the AP_P data frame are displayed.

AP_P %>% select(playerID, fullname, awardID, yearID, lgID) %>% tail(15)

playerID    fullname         awardID                     yearID   lgID
zimmery01   Ryan Zimmerman   Lou Gehrig Memorial Award   2011     ML
zimmery01   Ryan Zimmerman   Silver Slugger              2009     NL
ziskri01    Richie Zisk      TSN All-Star                1981     AL
ziskri01    Richie Zisk      TSN All-Star                1974     NL
zitoba01    Barry Zito       Hutch Award                 2012     ML
zitoba01    Barry Zito       Cy Young Award              2002     AL
zitoba01    Barry Zito       TSN Pitcher of the Year     2002     AL
zitoba01    Barry Zito       TSN All-Star                2002     AL
zobribe01   Ben Zobrist      World Series MVP            2016     ML

In this table only the last nine observations are shown.

Go into the View window and set the settings you see below in the first row. Notice that the last year for the Cy Young Award is 2017, thus two years are missing.

Partial screenshot of View window

Add the missing data to the AP dataset.

# tie is a true missing value (NA) here, not the text string "NA"
AP <- add_row(AP, playerID = "degroja01", awardID = "Cy Young Award", yearID = 2018, lgID = "NL", tie = NA, notes = "P")
AP <- add_row(AP, playerID = "snellwa01", awardID = "Cy Young Award", yearID = 2018, lgID = "AL", tie = NA, notes = "P")
AP <- add_row(AP, playerID = "degroja01", awardID = "Cy Young Award", yearID = 2019, lgID = "NL", tie = NA, notes = "P")
AP <- add_row(AP, playerID = "verlaju01", awardID = "Cy Young Award", yearID = 2019, lgID = "AL", tie = NA, notes = "P")
AP <- add_row(AP, playerID = "verlaju01", awardID = "TSN All-Star", yearID = 2019, lgID = "AL", tie = NA, notes = "P")
AP <- add_row(AP, playerID = "snellwa01", awardID = "TSN All-Star", yearID = 2018, lgID = "AL", tie = NA, notes = "P")

Exercise: Update the AP_P data frame with the new data added to the AP dataset.

How many players won the Cy Young Award? After piping the AP_P data frame to the select function, we filter it to keep only the rows for the Cy Young Award. That result is then sorted by year (in ascending order), and the number of rows for that awardID is counted.

AP_P %>%
  select(playerID, fullname, yearID, lgID, awardID) %>%
  filter(awardID == "Cy Young Award") %>%
  arrange(yearID) %>%
  count(awardID)

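Through 2019, the Cy Young Award has been won 118 times. Because count(awardID) tallies award rows, a pitcher who won more than once is counted once for each win, so the number of distinct winners is smaller than 118.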
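Counting rows and counting distinct pitchers are different questions. The base R sketch below shows the difference on a four-row toy table built from the 2018 and 2019 winners added earlier; it is not the full data set, so the numbers are for illustration only. The same idea applied to the Cy Young rows of AP_P would give the number of distinct winners and the list of repeat winners.

```r
# Toy award rows: the 2018 and 2019 Cy Young winners added earlier in this
# post. This is a small illustrative subset, not the real table.
cy <- data.frame(
  playerID = c("degroja01", "snellwa01", "degroja01", "verlaju01"),
  yearID   = c(2018L, 2018L, 2019L, 2019L),
  awardID  = "Cy Young Award"
)

nrow(cy)                      # 4 awards handed out
length(unique(cy$playerID))   # 3 distinct winners

# Wins per pitcher, most first; answers "who won it more than once?"
wins <- sort(table(cy$playerID), decreasing = TRUE)
wins[wins > 1]                # degroja01 with 2 wins in this subset
```

In dplyr, n_distinct(playerID) inside summarise() gives the same distinct count.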

If you find any errors in this post, please let me know. If you have any technical questions about R, please ask them on stackoverflow.com.