--- title: "Run Scoring Trends" author: "Martin Monkman" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Run Scoring Trends} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Run scoring trends: Sub-titles and captions with ggplot2 This vignette demonstrates how to create a plot (using `ggplot2`) showing Major League Baseball runscoring trends since the 1901 season. First, we load the necessary packages: `Lahman` (containing the baseball data), `ggplot2` to create the plots, and the data carpentry package `dplyr` (note that `ggplot2` and `dplyr` are included in the `tidyverse` package): ```{r Setup, message=FALSE} # package load library(Lahman) library(ggplot2) library(dplyr) ``` #### Read and summarize the data For this example, we'll use the data table `Teams` in the Lahman database. Once it's loaded, the data are filtered and summarized using dplyr. - filters from 1901 (the establishment of the American League) to the most recent year, - filters out (using `!=`) the Federal League - after grouping by the year, summarizes the total number of runs scored, runs allowed, and games played - finally, calculates the league runs (`leagueRPG`) and runs allowed (`leagueRAPG`) per game ```{r, message=FALSE} data(Teams) MLB_RPG <- Teams %>% filter(yearID > 1900, lgID != "FL") %>% group_by(yearID) %>% summarise(R=sum(R), RA=sum(RA), G=sum(G)) %>% mutate(leagueRPG=R/G, leagueRAPG=RA/G) ``` #### A basic plot You may have heard that run scoring in Major League Baseball has been down in recent years--or is it going back up? This is a perfect opportunity to visualize the date; what can we see in a plot? For the first version of the plot, we'll make a basic X-Y plot, where the X axis has the years and the Y axis has the average number of runs scored. With `ggplot2`, it's easy to add a trend line (using the `geom_smooth` option). The `scale_x_continuous` options set the limits and breaks of the axes. ```{r, message=FALSE} MLBRPGplot <- ggplot(MLB_RPG, aes(x=yearID, y=leagueRPG)) + geom_point() + geom_smooth(span = 0.25) + scale_x_continuous(breaks = seq(1900, 2015, by = 20)) + scale_y_continuous(limits = c(3, 6), breaks = seq(3, 6, by = 1)) MLBRPGplot ``` The way we would set the title, along with X and Y axis labels, would be something like this. ```{r message=FALSE} MLBRPGplot + ggtitle("MLB run scoring, 1901-2016") + theme(plot.title = element_text(hjust=0, size=16)) + xlab("year") + ylab("team runs per game") ``` ### Adding a subtitle: the function So now we have a nice looking dot plot showing the average number of runs scored per game for the years 1901-2016. (The 2016 data is the most recent that has been added to the database.) But a popular feature of charts--particularly in magazines--is a subtitle that has a summary of what the chart shows and/or what the author wants to emphasize. In this case, we could legitimately say something like any of the following: - The peak of run scoring in the 2000 season has been followed by a steady drop - Teams scored 20% fewer runs in 2016 than in 2000 - Team run scoring has fallen to just over 4 runs per game from the 2000 peak of 5 runs - Run scoring has been falling for 15 years, reversing a 30 year upward trend I like this last one, drawing attention not only to the recent decline but also the longer trend that started with the low-scoring environment of 1968. How can we add a subtitle to our chart that does that, as well as a caption that acknowledges the source of the data? The new `labs` function, available starting with `ggplot2` version 2.2.0, lets us do that. Note that `labs` contains the title, subtitle, caption, as well as the X and Y axis labels. ```{r, message=FALSE} MLBRPGplot + labs(title = "MLB run scoring, 1901-2016", subtitle = "Run scoring in 2016 was the highest in seven years", caption = "Source: the Lahman baseball database", x = "year", y = "team runs per game") ``` Easy. For more information about the `labs` function in `ggplot2`, the ["Modify axis, legend, and plot labels"](https://ggplot2.tidyverse.org/reference/labs.html) reference page within the [`ggplot2`](https://ggplot2.tidyverse.org/) site, part of the [Tidyverse](https://www.tidyverse.org/). *** This vignette is an update of the blog posts: * [Major League Baseball run scoring trends with R's Lahman package](https://bayesball.blogspot.com/2013/06/major-league-baseball-run-scoring.html) * [Run scoring trends: using Shiny to create dynamic charts and tables in R](https://bayesball.blogspot.com/2015/01/run-scoring-trends-using-shiny-to.html) * [github.com/MonkmanMH/MLBrunscoring_shiny](https://github.com/MonkmanMH/MLBrunscoring_shiny)