Unpuzzling R: Consecutive Years

When doing a statistical analysis involving baseball, I needed to find out for how many consecutive years a player has played for a team. In this article, I reveal one way of doing that using R. One of the R programming language’s amazing capabilities is how much you can accomplish with just a small amount of code¹.

In the diagram below, data is shown for three players. The stint (number of years) shown in the year column is consecutive for only Player 3. Player 1 did not play in the season after his second year, and Player 2 skipped a season after his third year.

Input

Below is how the output looks. Player 1 skipped a year after his 1963 season, which is why there is a 2 in the yrDiff column. Player 2 played continuously from his first through third years, so a “1” is in the yrDiff column for each of those years, but he did not play in 1969; thus there is a two-year gap between 1968 and 1970. Player 3 played continuously during his two years with the team.

Output

library(dplyr)

player_data %>%
  mutate(yrDiff = ifelse(is.na(year - lag(year)), 1, year - lag(year)))

In this code, dplyr is used. The mutate function creates a new variable named yrDiff. To compute each value, it takes the current row’s year and subtracts the previous row’s year, which lag(year) supplies. Because 1962 is the first data item in the column, nothing precedes it, so lag(year) returns NA and the subtraction yields NA as well. The is.na check, which asks, “Is the result Not Available?”, therefore returns TRUE. When the is.na result is TRUE, yrDiff is set to 1; when it is FALSE, which means lag(year) found a number, yrDiff is set to the year - lag(year) result.
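The step-by-step logic described above can be traced on a single player. This is a minimal sketch, assuming only Player 1's three seasons from the example; the intermediate names (prev_year, gap, first_row) are introduced here for illustration and are not part of the original code.

```r
library(dplyr)

# Player 1's seasons from the example input
year <- c(1962, 1963, 1965)

prev_year <- lag(year)          # NA, 1962, 1963: nothing precedes the first row
gap       <- year - prev_year   # NA, 1, 2
first_row <- is.na(gap)         # TRUE, FALSE, FALSE

# Use 1 where there is no previous year, otherwise use the gap itself
yrDiff <- ifelse(first_row, 1, gap)
yrDiff                          # 1 1 2
```

The ifelse call is what turns the leading NA into a 1, so the first season in the column is treated the same as a season that directly follows another.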

To represent the input in R, you need the code below.

Input Code

player_data <- data.frame(player = c(1,1,1,2,2,2,2,3,3), year = c(1962,1963,1965,1966,1967,1968,1970,1971,1972))
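With the input defined, the whole example can be run end to end. The sketch below goes one step beyond the article's code, so treat it as an assumption rather than the author's method: it groups by player so that lag never reaches back into another player's rows, then labels each unbroken run of seasons with a hypothetical streak_id column and reports the longest run per player.

```r
library(dplyr)

player_data <- data.frame(
  player = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  year   = c(1962, 1963, 1965, 1966, 1967, 1968, 1970, 1971, 1972)
)

streaks <- player_data %>%
  arrange(player, year) %>%
  group_by(player) %>%
  mutate(
    yrDiff    = ifelse(is.na(year - lag(year)), 1, year - lag(year)),
    # streak_id (illustrative name) increases by 1 each time a gap appears
    streak_id = cumsum(yrDiff > 1)
  ) %>%
  group_by(player, streak_id) %>%
  summarize(streak_len = n(), .groups = "drop") %>%
  group_by(player) %>%
  summarize(longest_streak = max(streak_len))

streaks$longest_streak   # 2 3 2
```

For this input the ungrouped and grouped versions produce the same yrDiff values, but only because each player's first season happens to fall one year after the previous player's last season.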

Let’s look at a real-world example. I recently investigated several baseball-related questions that show the power of R. I obtained the data from stathead.com, formatted it in Apple Numbers, and then imported it into RStudio. The dataset contained 657 observations.

Among the results I obtained was how many games each pitcher started. This R code accomplished that:

allStarters |>
  group_by(Player) |>
  summarize(SumSt = sum(GS)) |>
  arrange(desc(SumSt))

Tom Seaver started 395 games, followed by Jerry Koosman with 346 starts and Dwight Gooden with 303. No other Mets pitcher had 300-plus starts.
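To try the same summary without the Stathead data, here is a small sketch on a made-up table. Everything in allStarters_demo (the pitcher labels, years, and start counts) is hypothetical and chosen only to show the shape of the result; the column names Player, Year, and GS follow the ones used in the article's code.

```r
library(dplyr)

# Hypothetical stand-in for the real dataset: one row per pitcher-season
allStarters_demo <- data.frame(
  Player = c("Pitcher A", "Pitcher A", "Pitcher B", "Pitcher B", "Pitcher C"),
  Year   = c(1990, 1991, 1990, 1992, 1991),
  GS     = c(30, 32, 10, 12, 25),
  stringsAsFactors = FALSE
)

starts_by_pitcher <- allStarters_demo |>
  group_by(Player) |>
  summarize(SumSt = sum(GS)) |>   # total starts per pitcher
  arrange(desc(SumSt))            # most starts first

starts_by_pitcher$Player   # "Pitcher A" "Pitcher C" "Pitcher B"
starts_by_pitcher$SumSt    # 62 25 22
```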

To learn how many starts each pitcher had, I grouped each one’s data.

allStarters |>
   group_by(Player)

Two hundred ninety-two pitchers were grouped, each with the years he started games arranged in ascending order. The diagram below shows a sample of the output.

Next, I included the previously discussed mutate code to determine for each pitcher which years were consecutive.

allStarters |>
   arrange(Player, Year) |>
   group_by(Player) |>
   mutate(yrDiff = ifelse(is.na(Year - lag(Year)), 1, Year - lag(Year))) |>
   relocate(yrDiff, .after = Year)

Here is a sample of that code’s output:

In the first yrDiff row for Al Jackson, the “1” means that he started games in 1965 and that 1965 was either the first season he started for the Mets or a season that followed starts in 1964; whereas, the “3” in the yrDiff column for 1968 means that it had been three seasons since he last started a Mets game.

R is a great tool for those interested in doing the statistical analysis of baseball data. To use R effectively, there is a lot to learn; however, I have found the payoff to be well worth the effort expended to get it.


1 It is assumed that you have had some interaction with R or another programming language.

Mets Best Starters by Decade

To win a baseball game, a team needs to outscore its opponents. To do that, it needs to prevent the other team from scoring as many runs as it does. The leader of the prevention part is the pitcher.

No batter leads the offense the same way that a pitcher leads the defense. The pitcher and the catcher are involved in the most plays in a game, but the pitcher plays a bigger role because what he does initiates the majority of a game’s plays.

A measure of a pitcher’s success in limiting other teams’ run scoring is the RE24 stat. An RE24 of zero means the player is average. On some websites, the higher a pitcher’s RE24, AKA run value, the better the pitcher performed, so a value of +24 would be much better than -24.

Sites that express it that way are Baseball Reference, FanGraphs, and Stathead, with Baseball Reference now calling the RE24 for pitchers “Base-Out Runs Saved.” On other sites, such as Baseball Savant, it is the opposite: the lower a pitcher’s run value, the better. A value of -24 would be much better than +24.

Further, the complexity of the RE24 calculation has increased substantially since its early days when it was based on just base/out states and outs. For example, today on Baseball Savant, there is a Pitch Arsenal Stats Leaderboard giving a pitcher’s run value based on pitch type (e.g., changeup) “and on the runners on base, out, [and] ball and strike count,” and a Swing & Take Leaderboard giving for a pitcher a run value based on a pitch’s “outcome (ball, strike, home run, etc).”

In the chart below, the Mets top two starters in each decade based upon their RE24 totals (base-out state) in that decade are shown. The decade leaders are Tom Seaver (twice), Dwight Gooden, Rick Reed, Al Leiter, and (so far in this decade) Jacob deGrom (twice). Those five would make a starting rotation that few Mets fans would complain about.

The second-place finishers include Jerry Koosman, Jon Matlack, Sid Fernandez, Bret Saberhagen, Johan Santana, R. A. Dickey, and Marcus Stroman. Further, Matlack had a higher RE24 than did the first-place finisher in two other full decades: the 1990s and 2000s. Even the second-place finishers would make a strong starting rotation.

One pitcher yet to throw a pitch for the Mets, but who is now a member of the team, Max Scherzer, has in his 14 years in Major League Baseball accumulated an RE24 of 318.5. In that timespan, only two other pitchers have accumulated a higher RE24: Justin Verlander is at 327.22, and Clayton Kershaw is at 431.64.

And in the decade from 2010 to 2019, Scherzer remains in third place with Jacob deGrom in eighth and Carlos Carrasco 33rd.

Mets All-Time Top Catcher

The Mets have had a lot of players behind the plate, “the game’s most demanding position,” according to Jesse Yomtov, starting with Hobie Landrith who, on April 11, 1962, caught the first pitch thrown by a Mets’ starter (Roger Craig).

Five catchers have stood out.

To choose them, five statistics were primarily used: WAR, WPA, RE24, Total Bases, and Times on Base (excluding by error), with WAR and WPA the two dominant ones in that order. In addition, their selection was based solely on their time with the Mets, not on their overall career, as a player could have played for multiple teams.

Among the Mets’ top five catchers, two are in the Hall of Fame: Mike Piazza and Gary Carter. Piazza played eight seasons for the Mets after playing seven for the Dodgers, Carter five after playing 11 for Montreal. Filling out the list are Jerry Grote, who played 12 seasons in the Big Apple, John Stearns, who played 10, and Todd Hundley, who played nine.

Sources: Stathead Baseball and Baseball Reference

Grote came closest to Piazza in Times on Base, only 91 apart; however, as a Met, Grote played four more seasons than Piazza, who averaged getting on base 183.6 times a season versus 114.8 for Grote.

Based only on their Mets WAR number, the top two are Piazza and Stearns; however, when WPA and RE24 are taken into account, the difference between the two becomes quite significant. And Piazza separates himself even more from the others in Total Bases, having 607 more than the second-most — Grote’s 1278. But then, in his Mets career, Piazza amassed a .542 SLG. No one else in the group came within 100 points of that number.

  • Piazza had the third-highest JAWS rating among all catchers.

Twitter Poll

I found the tweet below after I completed the above write-up and was not surprised by Piazza’s landslide victory. He was one of the Mets most popular players.

Another stat, TOB/TB, helps lengthen Piazza’s lead over the rest of the field. Written about in 2016 by Rob Mains, the TOB/TB Number is calculated using this formula:

  • Multiply Times on Base by Total Bases.
  • Double it.
  • Divide the result by the sum of Times on Base and Total Bases.

Piazza’s TOB/TB number of 1,651 was 325 points ahead of Grote’s, with the average for the top five catchers at 1,170.
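The three-step formula can be written as a short R function. This is a sketch under stated assumptions: the function name tob_tb is made up for illustration, the Total Bases inputs come from the figures above (Grote's 1,278, and 1,278 + 607 = 1,885 for Piazza), and the Times on Base inputs are approximations inferred from the per-season averages quoted earlier (183.6 x 8 and 114.8 x 12, rounded to whole numbers).

```r
# TOB/TB number: multiply Times on Base by Total Bases, double it,
# then divide by the sum of the two
tob_tb <- function(tob, tb) {
  2 * tob * tb / (tob + tb)
}

# Inferred inputs (assumptions; see the note above)
piazza <- tob_tb(tob = 1469, tb = 1885)
grote  <- tob_tb(tob = 1378, tb = 1278)

round(piazza)                  # 1651
round(grote)                   # 1326
round(piazza) - round(grote)   # 325
```

Because the formula is twice the product divided by the sum, a player needs both a high Times on Base and a high Total Bases count to score well; a large value in one cannot fully offset a small value in the other.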

Others’ Views

Tim Boyle, in his catcher comparison, made this comment about Mike Piazza:

“Piazza didn’t have a reputation for playing well defensively. As the years went on, he got worse. I’m not so sure anyone holds this against him. Piazza was far too amazing at the plate for anyone to criticize him for his weaknesses behind it.”

In contrast, Jennifer Khedaroo viewed Piazza’s defensive skill differently, writing:

“In terms of defense, Piazza played well year after year. He was consistently in the top five for putouts, assists, double plays turned and runners caught stealing.”

And though Harold Friend agreed that Piazza was a better hitter than Gary Carter, he still pushed Piazza into second place among the best Mets catchers, Carter’s defensive skill giving him the edge:

“Gary Carter was the most valuable Mets catcher. Piazza will always be rated as the greater player, but Carter was more valuable to the Mets. Gary Carter was (and is) a world champion.

Piazza was the greatest hitting catcher ever. Although he was a good defensive player his first few seasons with the Los Angeles Dodgers, he was a defensive liability during his tenure with the Mets.”

Overall, Friend wrote, “Carter provided great defense, handled an excellent pitching staff magnificently and was a timely clutch hitter.”

In response to Friend, in my opinion the best measure of clutch hitting is WPA. For that stat, Piazza’s score was more than 10 times higher than Carter’s.

With regard to Piazza’s ability behind the plate, in an nj.com article, its author, Brendan Kuty, wrote that Hall of Famer Tom Glavine “said Piazza’s reputation as a bad defensive catcher is undeserved.”

“He did a lot of things well behind the plate,” Glavine said. “Yeah, he wasn’t the greatest thrower. That unfortunately translated into people thinking that some of this other game wasn’t as good as it was. He called a good game. He received the ball fine. He blocked balls fine.

But so often catchers are defined defensively on how well they throw and there’s much more that goes into just being a good defensive catcher than being able to throw. That aspect of his game, for whatever reason, garnered the extra attention and overshadowed the other aspects of his game.” (from Kuty article)

Writing About Sports: Stengel At Bat

During Casey Stengel’s 14-year playing career, he stepped into the batter’s box 4,871 times; however, six at bats stood out.

First At Bat: Sept. 17, 1912

On September 17, 1912, Casey Stengel made his major league debut at Washington Park, a ballpark located in Park Slope, a Brooklyn neighborhood. According to Barry Petchesky, “From 1898 to 1912, Washington Park was the home of the team alternately nicknamed the Bridegrooms, Superbas and Trolley Dodgers,” though in the game’s boxscore, the team is called the “Brooklyn Dodgers.”

Washington Park — National Baseball Hall of Fame

Petchesky wrote, “The field did not seem to be beloved in its time. The nearby canal gave off a constant stench, and as a late-season call-up, Casey Stengel, once remembered, ‘the mosquitoes was something fierce.’”

Batting second and playing centerfield, Stengel singled in his first at bat. In the game, he got four hits in four at bats, all singles, walked once, drove in two runs, stole two bases, and had a putout. He finished his first season with a .316 BA, .466 OBP, and .852 OPS.

The Sporting News exclaimed: “Charlie Stengel has come into the league with a tremendous crash, and appears to be the real thing.”

Stengel played the most years with the Brooklyn Dodgers (6) and the New York Giants (3); however, at the plate he was most successful with the Giants, averaging .349/.413/.524.

On the basepath, he showed both smarts and speed. In both 1913 and 1914 with the Brooklyn Dodgers, he stole 19 bases in an era when base theft was commonplace. In 1913, the Dodgers had 188 stolen bases and in 1914, 173. In both seasons, seven Dodgers stole at least 10.

Second At Bat: April 5, 1913

Three weeks after hitting the first home run at Ebbets Field in the new ballpark’s first game (an exhibition against the Yankees), Stengel hit the first four-bagger in a regular season game at the Flatbush ballpark. The Brooklyn Superbas, as the Dodgers were then known, defeated the Giants, 5–3.

Photo from Library of Congress

Third & Fourth At Bats: May 1, 1913

Stengel is among a select group of players who hit two inside-the-park homers in the same game, notching that feat on May 1, 1913 while batting leadoff for the Brooklyn Superbas. In a New York Times writeup of the game, the story’s headline declared “STENGEL’S HITTING LANDS CLOSE GAME,” its author, unidentified, referring to him as “Charley Stengel.”

The first blast reached the “centre field wall.” No mention is made of it being a close play, unlike his second four-bagger to “deep left centre.” It “just grazed the tips of glove of Outfielder Manns’ as it hurried along to the outfield wall.” But because of Boston’s defensive effort, “Stengel barely made the circuit.”

Both homers were off Otto Hess.

The last time a major leaguer hit two inside-the-park homers in a game was 1986. Greg Gagne hit two and just missed a third. On his last try, he had to settle for a triple.

Fifth At Bat: May 7, 1923

The fifth at bat showed that Stengel could not only hit with a bat but also with his fists.

On May 7, 1923, the New York Giants played in Philadelphia against the Phillies, the team for whom Stengel played in 1920 and part of 1921 before being traded to the Giants. On the mound for the Phillies was Lefty Weinert, who had been with the team since 1919.

In Stengel’s first at bat against Weinert, instead of using the bat to hit the ball, he used it as a weapon. According to a New York Times news report published on May 8, this transpired:

The fight in the fourth was precipitated by a belief on Stengel’s part that Weinert had tried to “bean” him. Thoroughly aroused, Casey threw his bat in Weinert’s direction and then rushed out to the box. In an instant the two players were swinging at each other, while other players of both teams gathered around them and policemen poured out of the stands.

Robert Creamer, in his book Stengel: His Life and Times, provided more details about the fight.

“The Giants scored six runs in the first inning (Stengel drove in one of them with a single) and knocked out the starting pitcher. A left-hander named Phil Weinert came in to pitch for the Phils. Weinert was big and fast and wild. He hit Casey with a pitch in the second inning, and when Stengel came to bat again in the fourth Weinert threw a fastball close to his head. Casey threw his bat angrily at the pitcher and ran toward him. Weinert was four or five inches taller than Stengel, outweighed him by twenty pounds and was more than ten years younger, but when they tangled and fell to the ground Casey was on top, swinging. Art Fletcher, a former Giant shortstop who was managing the Phils, grabbed Casey with a forearm under his chin and dragged him away from Weinert. Stengel struggled to get loose and back into the fight, but several policemen came on the field and two of them took Casey in hand. Still fuming, Stengel reluctantly allowed himself to be taken off the field. Next morning he learned that he had been suspended for ten days by National League president John Heydler.”

Stengel’s affinity for fighting with his fists did not end when his playing career did. While managing the Brooklyn Dodgers in a game at Ebbets Field, he fought with St. Louis Cardinals shortstop Leo Durocher in the “runway behind the dugout,” according to Roscoe McGowen in the May 13, 1936 edition of The New York Times.

Sixth At Bat: October 10, 1923

The first player to hit a World Series home run at Yankee Stadium was not Babe Ruth. It was Casey Stengel, who in 1923 played for the New York Giants, whose home field was the Polo Grounds, a ballpark that the Yankees played in from 1913 through 1922.

That homer ranks as one of Stengel’s biggest baseball moments as a player. Damon Runyon memorialized it in his story, “Stengel’s Homer Wins It for Giants, 5–4,” which I wrote about here.

Runyon’s piece began with these lines:

This is the way old “Casey” Stengel ran yesterday afternoon, running his home run home.

This is the way old “Casey” Stengel ran running his home run home to a Giant victory by a score of 5 to 4 in the first game of the World Series of 1923.

Stengel’s inside-the-park homer in the ninth broke a 4–4 tie. He was 32 years old. Only two teammates, Hank Gowdy and Heinie Groh, were older.

In the bottom of the ninth as the Giants returned to their positions to defend their lead, Stengel did not leave the dugout. Instead, Bill Cunningham headed toward centerfield.

It was the ninth time the Giants and Yankees had met in a World Series game, and the ninth time the Yankees did not win.