We load all the packages used in the examples.

```
library(ggpmisc)
library(tibble)
library(dplyr)
library(quantreg)
<- requireNamespace("nlme", quietly = TRUE)
eval_nlme if (eval_nlme) library(nlme)
<- requireNamespace("broom", quietly = TRUE)
eval_broom if (eval_broom) library(broom)
<- requireNamespace("broom.mixed", quietly = TRUE)
eval_broom_mixed if (eval_broom_mixed) library(broom.mixed)
```

As we will use text and labels on the plotting area we change the default theme to an uncluttered one.

`<- theme_set(theme_bw()) old_theme `

Common plot annotations based on model fits are model equations,
tests of significance and various indicators of goodness of fit. What
annotations are most useful depend on whether both *x* and
*y* are mapped to continuous variables, or *y* to a
continuous variable, and *x* to a factor. In some cases it may be
desirable to add an ANOVA table or a summary table as plot insets. Other
annotations computed from data are various summaries and features such
as location of local or global maxima and minima (Please see help pages
for `stat_peaks()`

and `stat_valleys()`

). Several
additional statistics for plot annotations computed from
`data`

are documented in the vignettes and help pages of
package ‘ggpp’ even if these statistics are imported and re-exported by
‘ggpmisc’.

Supporting computed and model-fit based annotations in ggplots can be implemented as new statistics that do computations on the plot data and pass the results to existing geometries. These new statistics are only a convenience in the sense that similar plots can be achieved by fitting models and operating on the fitted model objects before plotting and adding plot layers by passing the model fit data as arguments to individual layer functions in a plot. This approach also requires additional code to assemble character strings to be parsed into R expressions. We find such an approach rather tedious and error prone, so we have developed the statistics described in this vignette.

The statistics defined in ‘ggpmisc’ even if usable with native geometries from package ‘ggplot2’, in most cases have as default geometries defined in package ‘ggpp’. These new geometries are better suited for plot annotations than those from ‘ggplot2’.

There is always a compromise between simplicity and flexibility in which the choice of default arguments plays a key role obtaining a balance. I have aimed to make everyday use simple without constraining the flexibility inherent to the Grammar of Graphics. The assumption is that users desire/need the flexibility inherent in the layered approach of the Grammar of Graphics, and that the layer functions provided will be combined with those from ‘ggplot2’, ‘ggpp’ and other packages extending the Grammar of Graphics as implemented in ‘ggplot2’.

Statistics made available by package ‘ggpmisc’ that help with
reporting the results of model fits are `stat_poly_line()`

,
`stat_poly_eq()`

, `stat_ma_line()`

,
`stat_ma_eq()`

, `stat_quant_line()`

,
`stat_quant_band()`

, `stat_quant_eq()`

,
`stat_fit_residuals()`

, `stat_fit_deviations()`

,
`stat_fit_glance()`

, `stat_fit_augment()`

,
`stat_fit_tidy()`

and `stat_fit_tb()`

. In
addition, `stat_correlation()`

helps with the reporting of
Pearson, Kendall and Spearman correlations.

All these statistics support grouping and faceting. In the case of
those that return character strings to be mapped to the
`label`

aesthetic they return suitable `x`

,
`y`

, `hjust`

and `vjust`

values to
avoid overlaps. The automatic computation of these positions can be
adjusted through arguments passed to these statistics and also
overridden by user-supplied manual positions.

The table below summarizes their use and the type plot layer geometries they are most useful to combine with. The four statistics at the bottom of the table require in their use more effort as the mapping to aesthetics and conversion of numeric values into character strings must be coded in the layer function call, but on the other hand they support all model fit objects catered by package ‘broom’ or any of the packages extending ‘broom’, including user-fined methods.

Statistic | `after_stat` values (default geometry) |
Methods |
---|---|---|

`stat_poly_eq()` |
equation, R^{2}, P, etc.
(`text_npc` ) |
lm, rlm (weight aesthetic fully supported) |

`stat_ma_eq()` |
equation, R^{2}, P, etc.
(`text_npc` ) |
lmodel2: MA, SMA, RMA, OLS |

`stat_quant_eq()` |
equation, P, etc. (`text_npc` ) |
rq (any number of quantiles) |

`stat_correlation()` |
correlation, P-value, CI
(`text_npc` ) |
Pearson ( t), Kendall (z),
Spearman (S) |

`stat_poly_line()` |
line + conf. (`smooth` ) |
lm, rlm (weight aesthetic fully supported) |

`stat_ma_line()` |
line + conf. (`smooth` ) |
lmodel2: MA, SMA, RMA, OLS |

`stat_quant_line()` |
line + conf. (`smooth` ) |
rq, rqss (any number of quantiles) |

`stat_quant_band()` |
median + quartiles (`smooth` ) |
rq, rqss (two or three quantiles) |

`stat_fit_residuals()` |
residuals (`point` ) |
lm, rlm (weight aesthetic fully supported) |

`stat_fit_deviations()` |
deviations from observations (`segment` ) |
lm, rlm (weight aesthetic fully supported) |

`stat_fit_glance()` |
equation, R^{2}, P, etc.
(`text_npc` ) |
all those supported by ‘broom’ |

`stat_fit_augment()` |
predicted and other values (`smooth` ) |
all those supported by ‘broom’ |

`stat_fit_tidy()` |
fit results, e.g., for equation (`text_npc` ) |
all those supported by ‘broom’ |

`stat_fit_tb()` |
ANOVA and summary tables (`table_npc` ) |
all those supported by ‘broom’ |

The three statistics with names ending in `eq`

return by
default ready formatted character strings that can be mapped to the
`label`

aesthetic individually or concatenated using using
`paste()`

or `sprintf()`

functions within the call
to `aes()`

. By default the character strings are formatted
ready to be parsed as R expressions. Optionally they can be formatted as
R Markdown, \(\LaTeX\) or plain text as
shown in the table below.

The returned values for `output.type`

different from
`numeric`

are shown below. The default mapping of the label
aesthetics is indicated with an asterisk. The values for `x`

and `y`

are for the default location of the labels. The
returned data frame has one row per group of observations in each panel
of the plot. (1) depends on argument passed to `method`

.

Variable | Mode | stat_poly_eq() | stat_ma_eq() | stat_quant_eq() | stat_correlation() |
---|---|---|---|---|---|

eq.label | character | y | y | y* | n |

rr.label | character | y* | y* | n | n |

adj.rr.label | character | y | n | n | n |

cor.label | character | n | n | n | y(1) |

rho.label | character | n | n | y | y(1) |

tau.label | character | n | n | n | y(1) |

r.label | character | n | n | n | y* |

f.value.label | character | y | n | n | n |

t.value.label | character | n | n | n | y(1) |

z.value.label | character | y | n | n | y(1) |

S.value.label | character | y | n | n | y(1) |

p.value.label | character | y | y | n | y |

AIC.label | character | y | n | y | n |

BIC.label | character | y | n | n | n |

n.label | character | y | y | y | y |

grp.label | character | y | y | y | y |

r.squared | numeric | y | y | n | n |

adj.r.squared | numeric | y | n | n | n |

rho | numeric | n | n | y | y(1) |

tau | numeric | n | n | n | y(1) |

cor | numeric | n | n | n | y(1) |

p.value | numeric | y | y | n | y |

n | numeric | y | y | y | y |

rq.method | character | n | n | y | n |

method | character | n | n | n | y |

test | character | n | n | n | y |

quantile | numeric | n | n | y | n |

quantile.f | factor | n | n | y | n |

x | numeric | y | y | y | y |

y | numeric | y | y | y | y |

Even though the number of significant digits and some other
formatting can be adjusted by passing arguments, in some cases it maybe
necessary for the user to do the conversion to character strings. For
these special cases, it is possible to have numeric values. The values
returned for `output.type`

equal to `numeric`

are
shown below. The returned data frame has one row per group of
observations in each panel of the plot. (1) depends on argument passed
to `method`

.

Variable | Mode | stat_poly_eq() | stat_ma_eq() | stat_quant_eq() | stat_correlation() |
---|---|---|---|---|---|

coef.ls | list | y | y | y | n |

r.squared | numeric | y | y | n | n |

adj.r.squared | numeric | y | n | n | n |

rho | numeric | n | n | y | y(1) |

tau | numeric | n | n | n | y(1) |

cor | numeric | n | n | n | y(1) |

f.value | numeric | y | n | n | n |

f.df1 | numeric | y | n | n | n |

f.df2 | numeric | y | n | n | n |

t.value | numeric | n | n | n | y(1) |

df | numeric | n | n | n | y(1) |

z.value | numeric | n | n | n | y(1) |

S.value | numeric | n | n | n | y(1) |

p.value | numeric | y | y | n | y |

AIC | numeric | y | y | y | n |

BIC | numeric | y | y | n | n |

n | numeric | y | y | y | y |

b_0.constant | numeric | y | y | y | n |

b_0 | numeric | y | y | y | n |

b_1 | numeric | y | y | y | n |

b_2…b_n | numeric | y | y | y | n |

rq.method | character | n | n | y | n |

method | character | n | n | n | y |

test | character | n | n | n | y |

quantile | numeric | n | n | y | n |

quantile.f | factor | n | n | y | n |

x | numeric | y | y | y | y |

y | numeric | y | y | y | y |

The three statistics with names ending in `line`

and the
one with name ending in `band`

return numeric data suitable
for plotting lines or lines plus bands, with variables `x`

,
`y`

, `ymin`

and `ymax`

. They return a
basic set of variables which can be expanded by setting
`mf.values = TRUE`

adding `n`

,
`r.squared`

, `adj.r.squared`

and
`p.value`

when available, in fact, the same numeric variables
that the corresponding `stat_<name>_eq()`

statistics
add to the label data. The number of rows per group and panel in the
plot is that given by `n`

with default
`n = 80`

.

Statistics `stat_peaks()`

and `stat_valleys()`

can be used to locate and annotate global and local maxima and minima in
the variable mapped to the `y`

aesthetic.

When plotting omics data we usually want to highlight observations
based on the outcome of tests of significance for each one of 100’s or
1000’s genes or metabolites. We define some scales, as wrappers on
continuous `x`

and `y`

scales from package
`ggplot2`

that are suitable for differences in abundance
expressed as fold changes, *P*-value and FDR.

Scales `scale_x_logFC()`

and `scale_y_logFC()`

are suitable for plotting of log fold change data. Scales
`scale_x_Pvalue()`

, `scale_y_Pvalue()`

,
`scale_x_FDR()`

and `scale_y_FDR()`

are suitable
for plotting *p*-values and adjusted *p*-values or false
discovery rate (FDR). Default arguments are suitable for volcano and
quadrant plots as used for transcriptomics, metabolomics and similar
data.

Encoding of test outcomes into binary and ternary colour scales is
also frequent when plotting omics data, so as a convenience we define
wrappers on the color, fill and shape scales from ‘ggplot2’ as well as
defined functions suitable for converting binary and ternary outcomes
and numeric *P*-values into factors.

Scales `scale_colour_outcome()`

,
`scale_fill_outcome()`

and `scale_shape_outcome()`

and functions `outome2factor()`

,
`threshold2factor()`

, `xy_outcomes2factor()`

and
`xy_thresholds2factor()`

used together make it easy to map
ternary numeric outputs and logical binary outcomes to color, fill and
shape aesthetics. Default arguments are suitable for volcano, quadrant
and other plots as used for genomics, metabolomics and similar data.

The statistic `stat_correlation()`

computes one of
Pearson, Kendall or Spearman correlation coefficients and tests if the
differ from zero. This is done by calling
`stats::cor.test()`

, with all of its formal parameters
supported. Depending on the argument passed to parameter
`output.type`

, only numeric values or numeric values plus
formatted character strings are returned. The default behaviour is to
generate labels as character strings using R’s *expression*
syntax, suitable to be *parsed*. This stat can also output labels
using other mark-up languages, including \(\LaTeX\) and Markdown, or just numeric
values. Nevertheless, all examples below use expression syntax, which is
the most commonly used.

We first generate a set of artificial data suitable for the plotting examples in this and subsequent sections.

```
set.seed(4321)
# generate artificial data
<- 1:100
x <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
y <- data.frame(x,
my.data
y, group = c("A", "B"),
y2 = y * c(0.5,2),
block = c("a", "a", "b", "b"),
wt = sqrt(x))
```

For the first example we use defaults to add an annotation with Pearson’s correlation coefficient.

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_correlation()
```

Grouping is supported.

```
ggplot(my.data, aes(x, y, color = group)) +
geom_point() +
stat_correlation()
```

We can also compute Spearman’s rank correlation. (The symbol used for it
is the letter rho to distinguish it from Pearson’s correlation for which
*R* or *r* are used as symbols.)

```
ggplot(my.data, aes(x, y, color = group)) +
geom_point() +
stat_correlation(method = "spearman")
```

Statistic `stat_correlation()`

generates multiple labels as
listed in the tables above. We can combine them freely within a call to
`aes()`

to customize the annotations.

```
ggplot(my.data, aes(x, y, color = group)) +
geom_point() +
stat_correlation(mapping = aes(label = paste(after_stat(cor.label),
after_stat(t.value.label),
after_stat(p.value.label),
after_stat(n.label),
sep = '*"; "*')))
```

Facets are also supported.

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_correlation() +
facet_wrap(~group)
```

Using the numeric values returned it is possible to set other aesthetics on-the-fly.

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_correlation(mapping = aes(color = ifelse(after_stat(cor) > 0.9,
"red", "black"))) +
scale_color_identity() +
facet_wrap(~group)
```

Above we use the value of the correlation coefficient, but other numeric
values such as `p.value`

can be used similarly. In addition
to color, any aesthetics recognized by the geometry used such as
`alpha`

and `face`

can be also used. The
statistics described below for adding equations of the fitted models
have user interfaces very similar to that of
`stat_correaltion()`

so examples can be easily adapted.
Please, see the next section for examples of the use of different
geometries instead of the default one, and other advanced examples.

A frequently used annotation in plots showing fitted lines is the
display of the parameters estimates from the model fit as an equation.
`stat_poly_eq()`

automates this for models of *y* on
*x* or *x* on *y* fitted with function
`lm()`

, and `MASS:rlm()`

or a user-supplied model
fitting function and arguments. By default this *statistic*
behaves similarly as `ggplot2::stat_smooth()`

with
`method = "lm"`

for *y* on *x* fits but adds
support for model formulas of *x* on *y* (in addition to
the use of `orientation = "y"`

). Its default behaviour is to
generate labels as character strings using R’s *expression*
syntax. This stat can also output equations using different mark-up
languages, including \(\LaTeX\) and
Markdown, or just numeric values selected by the argument passed to
parameter `output.type`

. Nevertheless, all examples below use
expression syntax.

The statistic `stat_poly_line()`

adds a layer identical to
that from `ggplot2::stat_smooth()`

but has the same user
interface and default arguments as `stat_poly_eq()`

.

Other *statistics* described in later sections of this
vignette are even more flexible allowing various annotations for model
fit functions supported by package ‘broom’ or ‘broom.mixed’.

We first generate a set of artificial data suitable for the plotting examples.

```
set.seed(4321)
# generate artificial data
<- 1:100
x <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
y <- data.frame(x,
my.data
y, group = c("A", "B"),
y2 = y * c(0.5,2),
block = c("a", "a", "b", "b"),
wt = sqrt(x))
```

First one example using defaults. The best practice, ensuring that
the formula used in both *statistics* is the same is assign the
formula to a variable, as shown here. This is because the same model is
fit twice to the same `data`

, once in
`stat_smooth`

and once in `stat_poly_eq`

.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(formula = formula)
```

A ready formatted equation is also returned as a string that needs to
be parsed into an *expression* for display.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)), formula = formula)
```

`stat_poly_eq()`

makes available several character strings
in the returned data frame in separate columns: `eq.label`

,
`rr.label`

, `adj.rr.label`

,
`AIC.label`

, `BIC.label`

,
`f.value.label`

, `p.value.label`

and
`n`

. One of these, `rr.label`

, is used by default,
but `aes()`

can be used to map a different one to the
`label`

aesthetic. Here we show the adjusted coefficient of
determination and AIC. We call `stat_poly_eq()`

twice to be
able to separately control the position and size of each label.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(adj.rr.label)), formula = formula) +
stat_poly_eq(aes(label = after_stat(AIC.label)),
label.x = "right", label.y = "bottom", size = 3,
formula = formula)
```

Within `aes()`

it is also possible to *compute* new
labels based on those returned plus “arbitrary” text. The labels
returned by default are meant to be *parsed* into expressions, so
any text added should be valid for a string that will be parsed.
Inserting a comma plus white space in an expression requires some
trickery in the argument passed to `sep`

. Do note the need to
*escape* the embedded quotation marks as `\"`

.
Combining the labels in this way ensures correct alignment. To insert
only white space `sep = "~~~~~"`

can be used, with each tilde
character, `~`

, adding a rather small amount of white space.
We show only here, not to clutter the remaining examples, how to change
the axis labels to ensure consistency with the typesetting of the
equation.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = paste(after_stat(eq.label), after_stat(adj.rr.label),
sep = "*\", \"*")),
formula = formula) +
labs(x = expression(italic(x)), y = expression(italic(y)))
```

Above we combined two character-string labels, but it is possible to add additional ones and even character strings constants. In this example we use several labels instead of just two, and separate them with various character strings. We also change the size of the text in the label.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rr.label), "*\", \"*",
after_stat(f.value.label), "*\", and \"*",
after_stat(p.value.label), "*\".\"",
sep = "")),
formula = formula, size = 3)
```

It is also possible to format the text using `plain()`

,
`italic()`

, `bold()`

or `bolditalic()`

as described in `plotmath`

.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = paste(after_stat(eq.label), after_stat(adj.rr.label),
sep = "~~italic(\"with\")~~")),
formula = formula)
```

As these are expressions, to include two lines of text, we need
either to add `stat_poly_eq()`

twice, passing an argument to
`label.y`

to reposition one of the labels (as shown above) or
use (as shown here) `atop()`

within the expression to create
a single label with two lines of text.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = paste("atop(", after_stat(AIC.label), ",", after_stat(BIC.label), ")", sep = "")),
formula = formula)
```

Next, one example of how to remove the left hand side
(*lhs*).

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)),
eq.with.lhs = FALSE,
formula = formula)
```

Replacing the *lhs*.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)),
eq.with.lhs = "italic(hat(y))~`=`~",
formula = formula)
```

And a final example replacing both the *lhs* and the variable
symbol used on the *rhs*.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)),
eq.with.lhs = "italic(h)~`=`~",
eq.x.rhs = "~italic(z)",
formula = formula) +
labs(x = expression(italic(z)), y = expression(italic(h)))
```

As any valid R expression can be used, Greek letters are also supported, as well as the inclusion in the label of variable transformations used in the model formula or applied to the data. In addition, in the next example we insert white space in between the parameter estimates and the variable symbol in the equation.

```
<- y ~ poly(x, 2, raw = TRUE)
formula ggplot(my.data, aes(x, log10(y + 1e6))) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)),
eq.with.lhs = "plain(log)[10](italic(delta)+10^6)~`=`~",
eq.x.rhs = "~Omega",
formula = formula) +
labs(y = expression(plain(log)[10](italic(delta)+10^6)), x = expression(Omega))
```

A couple of additional examples of polynomials of different orders, and specified in different ways.

Higher order polynomial. When using function `poly()`

it
is necessary to set `raw = TRUE`

.

```
<- y ~ poly(x, 5, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)), formula = formula)
```

Intercept forced to zero.

```
<- y ~ x + I(x^2) + I(x^3) - 1
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)), formula = formula)
```

Even when labels are returned, numeric values are also returned for
`r.square`

, `p.value`

and `n`

. This can
be used in `aes()`

for example to show the equation only if a
certain condition like a minimum value for `r.square`

has
been reached.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = ifelse(after_stat(adj.r.squared > 0.3),
paste(after_stat(eq.label), after_stat(adj.rr.label),
sep = "*\", \"*"),
after_stat(adj.rr.label))),
formula = formula) +
labs(x = expression(italic(x)), y = expression(italic(y)))
```

We give below several examples to demonstrate how other components of
the `ggplot`

object affect the behaviour of this
statistic.

Facets work as expected either with fixed or free scales. Although below we had to adjust the size of the font used for the equation.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y2)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)), size = 3,
formula = formula) +
facet_wrap(~group)
```

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y2)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)), size = 3,
formula = formula) +
facet_wrap(~group, scales = "free_y")
```

Grouping, in this example using the colour aesthetic also works as expected. We can use justification and supply an absolute location for the equation, but the default frequently works well as in the example below.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y2, colour = group)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)), formula = formula, vstep = 0.06)
```

To add a label to the equation for each group, we map it to a
pseudo-aesthetic named `grp.label`

. (Equation labelling uses
a “back door” in ‘ggplot2’ that may be closed in future versions,
meaning that this approach to labelling may stop working in the
future.)

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y2, colour = group, grp.label = group)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(paste("bold(", grp.label, "*\":\")~~",
sep = ""))),
eq.label, formula = formula)
```

Being able to label equations allows us to dispense with the use of color, which in many cases is needed for printed output.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y2, linetype = group, grp.label = group)) +
geom_point() +
stat_poly_line(formula = formula, color = "black") +
stat_poly_eq(aes(label = after_stat(paste("bold(", grp.label, "*':')~~~",
sep = ""))),
eq.label, formula = formula)
```

User supplied label positions relative to the ranges of the
*x* and *y* scales are also supported, both through string
constants and numeric values in the range 0 to 1, when using the default
`geom_text_npc()`

.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y2, colour = group)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)),
formula = formula,
label.x = "centre")
```

The default locations are now based on normalized coordinates, and
consequently these defaults work even when the range of the *x*
and *y* scales varies from panel to panel.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y2, fill = block)) +
geom_point(shape = 21, size = 3) +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(rr.label)), size = 3,
geom = "label_npc", alpha = 0.33,
formula = formula) +
facet_wrap(~group, scales = "free_y")
```

If grouping is not the same within each panel created by faceting, the automatic location of labels results in “holes” for the factor levels not present in a given panel. In this case, consistent positioning requires passing explicitly the positions for each individual group. In the plot below the simultaneous mapping to both color and fill aesthetics creates four groups, two with data in panel A and the other two in panel B.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y2, colour = group, fill = block)) +
geom_point(shape = 21, size = 3) +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(rr.label)), size = 3, alpha = 0.2,
geom = "label_npc", label.y = c(0.95, 0.85, 0.95, 0.85),
formula = formula) +
facet_wrap(~group, scales = "free_y")
```

It is possible to use `geom_text()`

and
`geom_label()`

but in this case, if label coordinates are
given explicitly they should be expressed in native data coordinates.
When multiple labels need to be positioned a vector of coordinates can
be used as shown here for `label.x`

and
`label.y`

.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y2, colour = group)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(geom = "text", aes(label = after_stat(eq.label)),
label.x = c(100, 90), label.y = c(-1e4, 2.1e6), hjust = "inward",
formula = formula)
```

**Note:** Automatic positioning using
`geom_text()`

and `geom_label()`

is not supported
when faceting uses free scales. In this case
`geom_text_npc()`

and/or `geom_label_npc()`

must
be used.

Occasionally, possibly as a demonstration in teaching, we may want to
compare a fit of *x* on *y* and one of *y* on
*x* in the same plot. As is the case for model fitting functions
themselves this is simply indicated by the model passed as argument to
`formula`

, or consistently with `stat_smooth()`

through parameter `orientation`

.

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_poly_line(formula = y ~ x, color = "blue") +
stat_poly_eq(aes(label = after_stat(eq.label)), color = "blue") +
stat_poly_line(formula = y ~ x, color = "red", orientation = "y") +
stat_poly_eq(aes(label = after_stat(eq.label)), color = "red", orientation = "y",
label.y = 0.9)
```

Package ‘lmodel2’ implements model II fits for major axis (MA),
standard major axis (SMA) and ranged (RMA) regression for linear
relationships. These approaches of regression are used when both
variables are subject to random variation and the intention is not to
estimate *y* from *x* but instead to describe the slope of
the relationship between them. A typical example in biology are
allometric relationships.

In the figure above we see that the ordinary least squares (OLS)
linear regression of *y* on *x* and *x* on
*y* differ. Major axis regression provides an estimate of the
slope that is independent of the direction of the relationship relation.
\(R^2\) remains the same in all cases,
while the equation changes it describes the same line.

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_ma_line() +
stat_ma_eq(aes(label = after_stat(eq.label)))
```

```
## RMA was not requested: it will not be computed.
##
## RMA was not requested: it will not be computed.
```

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_ma_line(color = "blue") +
stat_ma_eq(aes(label = after_stat(eq.label)), color = "blue") +
stat_ma_line(color = "red", orientation = "y") +
stat_ma_eq(aes(label = after_stat(eq.label)), color = "red", orientation = "y",
label.y = 0.9)
```

```
## RMA was not requested: it will not be computed.
##
## RMA was not requested: it will not be computed.
##
## RMA was not requested: it will not be computed.
##
## RMA was not requested: it will not be computed.
```

Package ‘quantreg’ implements quantile regression for multiple types
of models: linear models with function `rq()`

, non-linear
models with function `nlreg()`

, and additive models with
function `rqss()`

. Each of these functions supports different
methods of estimation. As with the statistics for polynomials described
in the previous section, all the statistics described in the current
section support linear models while `stat_quant_line()`

and
`stat_quant_band()`

also support additive models. (Support
for `nlreg()`

is not yet implemented.)

In the case of quantile regression it is frequently the case that
multiple quantile regressions are fitted. This case is not supported by
`stat_poly_eq()`

and we need to use
`stat_quant_eq()`

. This statistics is very similar to
`stat_poly_eq()`

both in behaviour and parameters. It
supports quantile regression of *y* on *x* or *x*
on *y* fitted with function `quantreg::rq`

.

The statistic `stat_quant_line()`

adds a layer similar to
that created by `ggplot2::stat_smooth()`

including confidence
bands for quantile regression, by default for *p* = 0.95.
However, it has the same user interface and default arguments as
`stat_quant_eq()`

, and supports both *x* and
*y* as explanatory variable in `formula`

and also
`orientation`

. Confidence intervals for the fits to
individual quantiles fulfil the same purpose as in OLS fits: providing a
boundary for the *estimated line* computed for each individual
quantile. They are not shown by default, unless a single quantile is
passed as argument, but can be enabled with `se = TRUE`

and
disabled with `se = FALSE`

.

The statistic `stat_quant_band()`

creates a line plus band
plot that is, by default equivalent to the box of a box plot showing
regressions as a band for quartiles plus a line for the median media. In
this case the band informs about the spread of the *observations*
line the box in box plots does. Which three quantiles are fitted can be
set through parameter `quantiles`

but in contrast to
`stat_quant_line()`

and `stat_quant_eq()`

only a
vector of three quantiles is accepted.

Similarly to `ggplot2::stat_quantiles()`

,
`stat_quant_eq()`

and `stat_quant_line()`

accept a
`numeric`

vector as argument to `quantiles`

and
output one equation or one line per group and quantile. Stat
`stat_quant_eq()`

also returns column `grp.label`

containing by default a character string indicating the value of the
quantile.

Here we use the default value for quantiles,
`c(0.25, 0.50, 0.75)`

, equivalent to those used for the box
of box plots, and we obtain a plot with a line for median regression and
a band highlighting the space between the quartiles.

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_band(formula = y ~ poly(x, 2))
```

Overriding the values for `quantiles`

we find a boundary
containing 90% of the observations.

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(formula = y ~ poly(x, 2), quantiles = c(0.1, 0.9))
```

A line for the median regression plus a confidence band for it is the
default for a single quantile. (This is similar to the behaviour of
`ggplot2::stat_smooth()`

.)

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(formula = y ~ poly(x, 2), quantiles = 0.5)
```

With `stat_quant_eq()`

we add the fitted equations to the
first example in this section (we also override the use of color).

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_band(formula = formula, color = "black", fill = "grey60") +
stat_quant_eq(aes(label = paste(after_stat(grp.label), "*\": \"*",
after_stat(eq.label), sep = "")),
formula = formula) +
theme_classic()
```

Grouping and faceting are supported by `stat_quant_eq()`

,
`stat_quant_line()`

and `stat_quant_band()`

. Below
are some examples of grouping.

```
ggplot(my.data, aes(x, y, color = group)) +
geom_point() +
stat_quant_line(formula = formula) +
stat_quant_eq(aes(label = paste(after_stat(grp.label), "*\": \"*",
after_stat(eq.label), sep = "")),
formula = formula)
```

This can be combined with a group label by mapping in
`aes()`

a character variable to `grp.label`

.

```
ggplot(my.data, aes(x, y, group = group, linetype = group,
shape = group, grp.label = group)) +
geom_point() +
stat_quantile(formula = formula, color = "black") +
stat_quant_eq(aes(label = paste(after_stat(grp.label), "*\": \"*",
after_stat(eq.label), sep = "")),
formula = formula, quantiles = c(0.05, 0.95)) +
theme_classic()
```

In some cases *double quantile regression* is an informative
method to assess reciprocal constraints. For this a fit of *x* on
*y* and one of *y* on *x* in the same plot are
computed for a large quantile. As is the case for model fitting
functions themselves this is simply indicated by the model passed as
argument to `formula`

, or consistently with
`ggplot2::stat_smooth()`

through parameter
`orientation`

.

```
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(formula = y ~ x, color = "blue", quantiles = 0.05) +
stat_quant_eq(aes(label = after_stat(eq.label)), formula = y ~ x, color = "blue",
quantiles = 0.05) +
stat_quant_line(formula = x ~ y, color = "red", quantiles = 0.95) +
stat_quant_eq(aes(label = after_stat(eq.label)), formula = x ~ y, color = "red",
quantiles = 0.95, label.y = 0.9)
```

Please, see the section on `stat_poly_eq()`

for additional
examples, as here we have highlighted mainly the differences between
these two statistics.

Sometimes we need to quickly plot residuals matching fits plotted
with `geom_smooth()`

using grouping and facets, so a new
(simple) statistic was born. In teaching it is specially important to
avoid distractions and plots residuals from different types of models
consistently.

`stat_fit_residuals()`

can be used to plot the residuals
form models of *y* on *x* or *x* on *y*
fitted with function `lm`

, `MASS:rlm`

,
`quantreg::rq`

(only median regression) or a user-supplied
model fitting function and arguments. In the returned values the
response variable is replaced by the residuals from the fit. The default
geom is `"point"`

.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y, colour = group)) +
geom_hline(yintercept = 0, linetype = "dashed") +
stat_fit_residuals(formula = formula)
```

We can of course also map the weights to aesthetics, here we use
`size`

.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y, colour = group)) +
geom_hline(yintercept = 0, linetype = "dashed") +
stat_fit_residuals(formula = formula,
method = "rlm",
mapping = aes(size = sqrt(after_stat(weights))),
alpha = 2/3)
```

Weighted residuals are available, but in this case they do not differ as we have not mapped any variable to the aesthetic.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y, colour = group)) +
geom_hline(yintercept = 0, linetype = "dashed") +
stat_fit_residuals(formula = formula, weighted = TRUE)
```

When teaching we may need to highlight residuals in plots used in
slides and notes. Statistic `stat_fit_deviations`

makes it
easy to achieve this for the residuals from models of *y* on
*x* or *x* on *y* fitted with function
`lm`

, `MASS:rlm`

, `quantreg::rq`

(only
median regression) or a user-supplied model fitting function and
arguments. The default geom is `"segment"`

and the values of
the the observations and the fitted values are mapped to aesthetics so
that each deviation is displayed as a segment.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_smooth(method = "lm", formula = formula) +
stat_fit_deviations(formula = formula, colour = "red") +
geom_point()
```

As the geometry used by default is `geom_segment()`

,
additional aesthetics can be mapped or set to constant values. Here we
add arrowheads.

```
<- y ~ poly(x, 3, raw = TRUE)
formula ggplot(my.data, aes(x, y)) +
geom_smooth(method = "lm", formula = formula) +
geom_point() +
stat_fit_deviations(formula = formula, colour = "red",
arrow = arrow(length = unit(0.015, "npc"),
ends = "both"))
```

When weights are available, either supplied by the user, or computed
as part of the fit, they are returned in `data`

. Having
weights available allows encoding them using colour. We here use a
robust regression fit with `MASS::rlm`

.

```
<- my.data
my.data.outlier 6, "y"] <- my.data.outlier[6, "y"] * 5
my.data.outlier[ggplot(my.data.outlier, aes(x, y)) +
stat_smooth(method = MASS::rlm, formula = formula) +
stat_fit_deviations(formula = formula, method = "rlm",
mapping = aes(colour = after_stat(weights)),
show.legend = TRUE) +
scale_color_gradient(low = "red", high = "blue", limits = c(0, 1)) +
geom_point()
```

Package ‘broom’ provides consistently formatted output from different
model fitting functions. These made it possible to write a
model-annotation statistic that is very flexible but that requires
additional input from the user to generate the character strings to be
mapped to the `label`

aesthetic.

As we have above given some simple examples, we here exemplify this
statistic in a plot with grouping, and assemble a label for the
*P*-value using a string parsed into a expression. We also change
the default position of the labels.

```
# formula <- y ~ poly(x, 3, raw = TRUE)
# broom::augment does not handle poly() correctly!
<- y ~ x + I(x^2) + I(x^3)
formula ggplot(my.data, aes(x, y, colour = group)) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_fit_glance(method = "lm",
method.args = list(formula = formula),
label.x = "right",
label.y = "bottom",
aes(label = paste("italic(P)*\"-value = \"*",
signif(after_stat(p.value), digits = 4),
sep = "")),
parse = TRUE)
```

It is also possible to fit a non-linear model with
`method = "nls"`

, and any other model for which a
`glance()`

method exists. Do consult the documentation for
package ‘broom’. Here we fit the Michaelis-Menten equation to reaction
rate versus concentration data.

```
<- y ~ SSmicmen(x, Vm, K)
micmen.formula ggplot(Puromycin, aes(conc, rate, colour = state)) +
geom_point() +
geom_smooth(method = "nls",
formula = micmen.formula,
se = FALSE) +
stat_fit_glance(method = "nls",
method.args = list(formula = micmen.formula),
aes(label = paste("AIC = ", signif(after_stat(AIC), digits = 3),
", BIC = ", signif(after_stat(BIC), digits = 3),
sep = "")),
label.x = "centre", label.y = "bottom")
```

This stat makes it possible to add the equation for any fitted model
for which `broom::tidy()`

is implemented. Alternatively,
individual values such as estimates for the fitted parameters, standard
errors, or *P*-values can be added to a plot. However, the user
has to explicitly construct the labels within `aes()`

. This
statistic respects grouping based on aesthetics, and reshapes the output
of `tidy()`

so that the values for a given group are in a
single row in the returned `data`

.

As first example we fit a non-linear model, the Michaelis-Menten
equation of reaction kinetics, using `nls()`

. We use the
self-starting function `SSmicmen()`

available in R.

```
<- y ~ SSmicmen(x, Vm, K)
micmen.formula ggplot(Puromycin, aes(conc, rate, colour = state)) +
geom_point() +
geom_smooth(method = "nls",
formula = micmen.formula,
se = FALSE) +
stat_fit_tidy(method = "nls",
method.args = list(formula = micmen.formula),
label.x = "right",
label.y = "bottom",
aes(label = paste("V[m]~`=`~", signif(after_stat(Vm_estimate), digits = 3),
"%+-%", signif(after_stat(Vm_se), digits = 2),
"~~~~K~`=`~", signif(after_stat(K_estimate), digits = 3),
"%+-%", signif(after_stat(K_se), digits = 2),
sep = "")),
parse = TRUE)
```

Using paste we can build a string that can be parsed into an R expression, in this case for a non-linear equation.

```
<- y ~ SSmicmen(x, Vm, K)
micmen.formula ggplot(Puromycin, aes(conc, rate, colour = state)) +
geom_point() +
geom_smooth(method = "nls",
formula = micmen.formula,
se = FALSE) +
stat_fit_tidy(method = "nls",
method.args = list(formula = micmen.formula),
size = 3,
label.x = "center",
label.y = "bottom",
vstep = 0.12,
aes(label = paste("V~`=`~frac(", signif(after_stat(Vm_estimate), digits = 2), "~C,",
signif(after_stat(K_estimate), digits = 2), "+C)",
sep = "")),
parse = TRUE) +
labs(x = "C", y = "V")
```

What if we would need a more specific *statistic*, similar to
`stat_poly_eq()`

? We can use `stat_fit_tidy()`

as
the basis for its definition.

```
<- function(vstep = 0.12,
stat_micmen_eq size = 3,
...) {stat_fit_tidy(method = "nls",
method.args = list(formula = micmen.formula),
aes(label = paste("V~`=`~frac(", signif(after_stat(Vm_estimate), digits = 2), "~C,",
signif(after_stat(K_estimate), digits = 2), "+C)",
sep = "")),
parse = TRUE,
vstep = vstep,
size = size,
...) }
```

The code for the figure is now simpler, and still produces the same figure (not shown).

```
<- y ~ SSmicmen(x, Vm, K)
micmen.formula ggplot(Puromycin, aes(conc, rate, colour = state)) +
geom_point() +
geom_smooth(method = "nls",
formula = micmen.formula,
se = FALSE) +
stat_micmen_eq(label.x = "center",
label.y = "bottom") +
labs(x = "C", y = "V")
```

As a second example we show a quantile regression fit using function
`rq()`

from package ‘quantreg’.

```
<- y ~ x
my_formula
ggplot(mpg, aes(displ, 1 / hwy)) +
geom_point() +
geom_quantile(quantiles = 0.5, formula = my_formula) +
stat_fit_tidy(method = "rq",
method.args = list(formula = y ~ x, tau = 0.5),
tidy.args = list(se.type = "nid"),
mapping = aes(label = sprintf('y~"="~%.3g+%.3g~x*", with "*italic(P)~"="~%.3f',
after_stat(Intercept_estimate),
after_stat(x_estimate),
after_stat(x_p.value))),
parse = TRUE)
```

We can define a `stat_rq_eq()`

if we need to add similar
equations to several plots. In this example we retain the ability of the
user to override most of the default default arguments.

```
<-
stat_rq_eqn function(formula = y ~ x,
tau = 0.5,
method = "br",
mapping = aes(label = sprintf('y~"="~%.3g+%.3g~x*", with "*italic(P)~"="~%.3f',
after_stat(Intercept_estimate),
after_stat(x_estimate),
after_stat(x_p.value))),
parse = TRUE,
...) {<- list(formula = formula, tau = tau, method = method)
method.args stat_fit_tidy(method = "rq",
method.args = method.args,
tidy.args = list(se.type = "nid"),
mapping = mapping,
parse = parse,
...) }
```

And the code of the figure now as simple as. Figure not shown, as is identical to the one above.

```
ggplot(mpg, aes(displ, 1 / hwy)) +
geom_point() +
geom_quantile(quantiles = 0.5, formula = my_formula) +
stat_rq_eqn(tau = 0.5, formula = my_formula)
```

This stat makes it possible to add summary or ANOVA tables for any
fitted model for which `broom::tidy()`

is implemented. The
output from `tidy()`

is embedded as a single list value
within the returned `data`

, an object of class
`tibble`

. This statistic **ignores grouping**
based on aesthetics. This allows fitting models when `x`

or
`y`

is a factor (as in such cases `ggplot`

splits
the data into groups, one for each level of the factor, which is needed
for example for `stat_summary()`

to work as expected). By
default, the `"table"`

geometry is used. The use of
`geom_table()`

is described in a separate section of this
User Guide. Table themes and mapped aesthetics are supported.

The default output of `stat_fit_tb`

is the default output
from `tidy(mf)`

where `mf`

is the fitted
model.

```
<- y ~ x + I(x^2) + I(x^3)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_fit_tb(method = "lm",
method.args = list(formula = formula),
tb.vars = c(Parameter = "term",
Estimate = "estimate",
"s.e." = "std.error",
"italic(t)" = "statistic",
"italic(P)" = "p.value"),
label.y = "top", label.x = "left",
parse = TRUE)
```

When `tb.type = "fit.anova"`

the output returned is that
from `tidy(anova(mf))`

where `mf`

is the fitted
model. Here we also show how to replace names of columns and terms, and
exclude one column, in this case, the mean squares.

```
<- y ~ x + I(x^2) + I(x^3)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_fit_tb(method = "lm",
method.args = list(formula = formula),
tb.type = "fit.anova",
tb.vars = c(Effect = "term",
df = "df",
"italic(F)" = "statistic",
"italic(P)" = "p.value"),
tb.params = c(x = 1, "x^2" = 2, "x^3" = 3, Resid = 4),
label.y = "top", label.x = "left",
parse = TRUE)
```

When `tb.type = "fit.coefs"`

the output returned is that
of `tidy(mf)`

after selecting the `term`

and
`estimate`

columns.

```
<- y ~ x + I(x^2) + I(x^3)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_fit_tb(method = "lm",
method.args = list(formula = formula),
tb.type = "fit.coefs", parse = TRUE,
label.y = "center", label.x = "left")
```

Faceting works as expected, but grouping is ignored as mentioned
above. In this case, the color aesthetic is not applied to the text of
the tables. Furthermore, if `label.x.npc`

or
`label.y.npc`

are passed numeric vectors of length > 1,
the corresponding values are obeyed by the different panels.

```
<- y ~ SSmicmen(x, Vm, K)
micmen.formula ggplot(Puromycin, aes(conc, rate, colour = state)) +
facet_wrap(~state) +
geom_point() +
geom_smooth(method = "nls",
formula = micmen.formula,
se = FALSE) +
stat_fit_tb(method = "nls",
method.args = list(formula = micmen.formula),
tb.type = "fit.coefs",
label.x = 0.9,
label.y = c(0.75, 0.2)) +
theme(legend.position = "none") +
labs(x = "C", y = "V")
```

The data in the example below are split by `ggplot`

into
six groups based on the levels of the `feed`

factor. However,
as `stat_fit_tb()`

ignores groupings, we can still fit a
linear model to all the data in the panel.

```
ggplot(chickwts, aes(factor(feed), weight)) +
stat_summary(fun.data = "mean_se") +
stat_fit_tb(tb.type = "fit.anova",
label.x = "center",
label.y = "bottom") +
expand_limits(y = 0)
```

We can flip the system of coordinates, if desired.

```
ggplot(chickwts, aes(factor(feed), weight)) +
stat_summary(fun.data = "mean_se") +
stat_fit_tb(tb.type = "fit.anova", label.x = "left", size = 3) +
scale_x_discrete(expand = expansion(mult = c(0.2, 0.5))) +
coord_flip()
```

It is also possible to rotate the table using `angle`

.
Here we also show how to replace the column headers with strings to be
parsed into R expressions.

```
ggplot(chickwts, aes(factor(feed), weight)) +
stat_summary(fun.data = "mean_se") +
stat_fit_tb(tb.type = "fit.anova",
angle = 90, size = 3,
label.x = "right", label.y = "center",
hjust = 0.5, vjust = 0,
tb.vars = c(Effect = "term",
"df",
"M.S." = "meansq",
"italic(F)" = "statistic",
"italic(P)" = "p.value"),
parse = TRUE) +
scale_x_discrete(expand = expansion(mult = c(0.1, 0.35))) +
expand_limits(y = 0)
```

**Experimental!** Use `ggplot2::stat_smooth`

instead of `stat_fit_augment`

if possible.

For a single panel and no grouping, there is little advantage in
using this statistic compared to the examples in the documentation of
package ‘broom’. With grouping and faceting
`stat_fit_augment`

may occasionally be more convenient than
`ggplot2::stat_smooth`

because of its additional
flexibility.

```
# formula <- y ~ poly(x, 3, raw = TRUE)
# broom::augment does not handle poly correctly!
<- y ~ x + I(x^2) + I(x^3)
formula ggplot(my.data, aes(x, y)) +
geom_point() +
stat_fit_augment(method = "lm",
method.args = list(formula = formula))
```

```
<- y ~ x + I(x^2) + I(x^3)
formula ggplot(my.data, aes(x, y, colour = group)) +
geom_point() +
stat_fit_augment(method = "lm",
method.args = list(formula = formula))
```

We can override the variable returned as `y`

to be any of
the variables in the data frame returned by `broom::augment`

while still preserving the original `y`

values.

```
<- y ~ x + I(x^2) + I(x^3)
formula ggplot(my.data, aes(x, y)) +
stat_fit_augment(method = "lm",
method.args = list(formula = formula),
geom = "point",
y.out = ".resid")
```

```
<- y ~ x + I(x^2) + I(x^3)
formula ggplot(my.data, aes(x, y, colour = group)) +
stat_fit_augment(method = "lm",
method.args = list(formula = formula),
geom = "point",
y.out = ".std.resid")
```

We can use any model fitting method for which `augment`

is
implemented.

```
<- list(formula = y ~ k * e ^ x,
args start = list(k = 1, e = 2))
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
stat_fit_augment(method = "nls",
method.args = args)
```

```
<- list(formula = y ~ k * e ^ x,
args start = list(k = 1, e = 2))
ggplot(mtcars, aes(wt, mpg)) +
stat_fit_augment(method = "nls",
method.args = args,
geom = "point",
y.out = ".resid")
```

**Note:** The tidiers for mixed models have moved to
package ‘broom.mixed’!

```
<- list(model = y ~ SSlogis(x, Asym, xmid, scal),
args fixed = Asym + xmid + scal ~1,
random = Asym ~1 | group,
start = c(Asym = 200, xmid = 725, scal = 350))
ggplot(Orange, aes(age, circumference, colour = Tree)) +
geom_point() +
stat_fit_augment(method = "nlme",
method.args = args,
augment.args = list(data = quote(data)))
```

Volcano and quadrant plots are scatter plots with some peculiarities.
Creating these plots from scratch using ‘ggplot2’ can be time consuming,
but allows flexibility in design. Rather than providing a ‘canned’
function to produce volcano plots, package ‘ggpmisc’ provides several
building blocks that facilitate the production of volcano and quadrant
plots. These are wrappers on *scales* from packages ‘scales’ and
‘ggplot2’ with suitable defaults plus helper functions to create factors
from numeric outcomes.

Manual scales for color and fill aesthetics with defaults suitable for the three way outcomes from statistical tests.

Scales for *x* or *y* aesthetics mapped to
*P*-values, FDR (false discovery rate) or log FC (logarithm of
fold-change) as used in volcano and quadrant plots of transcriptomics
and metabolomics data.

A simple function to expand scale limits to be symmetric around zero. Can be passed as argument to parameter limits of continuous scales from packages ‘ggpmisc’, ‘ggplot2’ or ‘scales’ (and extensions).

Volcano plots are frequently used for transcriptomics and
metabolomics. They are used to show *P*-values or FDR (false
discovery rate) as a function of effect size, which can be either an
increase or a decrease. Effect sizes are expressed as fold-changes
compared to a control or reference condition. Colors are frequently used
to highlight significant responses. Counts of significant increases and
decreases are frequently also added.

Outcomes encoded as -1, 0 or 1, as seen in the tibble below need to
be converted into factors with suitable labels for levels. This can be
easily achieved with function `outcome2factor()`

.

`head(volcano_example.df) `

```
## tag gene outcome logFC PValue genotype
## 1 AT1G01040 ASU1 0 -0.15284466 0.35266997 Ler
## 2 AT1G01290 ASG4 0 -0.30057068 0.05471732 Ler
## 3 AT1G01560 ATSBT1.1 0 -0.57783350 0.06681310 Ler
## 4 AT1G01790 AtSAM1 0 -0.04729662 0.74054263 Ler
## 5 AT1G02130 AtTRM82 0 -0.14279891 0.29597519 Ler
## 6 AT1G02560 PRP39 0 0.23320752 0.07487043 Ler
```

Most frequently fold-change data is available log-transformed, using either 2 or 10 as base. In general it is more informative to use tick labels expressed as un-transformed fold-change.

By default `scale_x_logFC()`

and
`scale_y_logFC()`

expect the logFC data log2-transformed, but
this can be overridden. The default use of untransformed fold-change for
tick labels can also be overridden. Scale `scale_x_logFC()`

in addition by default expands the scale limits to make them symmetric
around zero. If `%unit`

is included in the name character
string, suitable units are appended as shown in the example below.

Scales `scale_y_Pvalue()`

and
`scale_x_Pvalue()`

have suitable defaults for name and
labels, while `scale_colour_outcome()`

provides suitable
defaults for the colors mapped to the outcomes. To change the labels and
title of the `key`

or `guide`

pass suitable
arguments to parameters `name`

and `labels`

of
these scales. These x and y scales by default *squish* off-limits
(out-of-bounds) observations towards the limits.

```
%>%
volcano_example.df mutate(., outcome.fct = outcome2factor(outcome)) %>%
ggplot(., aes(logFC, PValue, colour = outcome.fct)) +
geom_point() +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome() +
stat_quadrant_counts(data = . %>% filter(outcome != 0))
```

By default `outcome2factor()`

creates a factor with three
levels as in the example above, but this default can be overridden as
shown below.

```
%>%
volcano_example.df mutate(., outcome.fct = outcome2factor(outcome, n.levels = 2)) %>%
ggplot(., aes(logFC, PValue, colour = outcome.fct)) +
geom_point() +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome() +
stat_quadrant_counts(data = . %>% filter(outcome != 0))
```

Quadrant plots are scatter plots with comparable variables on both axes and usually include the four quadrants. When used for transcriptomics and metabolomics, they are used to compare responses expressed as fold-change to two different conditions or treatments. They are useful when many responses are measured as in transcriptomics (many different genes) or metabolomics (many different metabolites).

A single panel quadrant plot is as easy to produce as a volcano plot
using scales `scale_x_logFC()`

and
`scale_y_logFC()`

. The data includes two different outcomes
and two different log fold-change variables. See the previous section on
volcano plots for details. In this example we use as shape a filled
circle and map one of the outcomes to color and the other to fill, using
the two matched scales `scale_colour_outcome()`

and
`scale_fill_outcome()`

.

`head(quadrant_example.df)`

```
## tag gene outcome.x outcome.y logFC.x logFC.y genotype
## 1 AT5G12290 AtMC9 0 0 -0.3226849 0.32887081 Ler
## 2 AT5G47040 ATFRO2 0 0 -0.1067945 -0.18828653 Ler
## 3 AT5G57560 14-3-3EPSILON 0 0 0.2232457 0.74447768 Ler
## 4 AT5G24110 ATPTEN1 0 0 -0.7253487 0.06952586 Ler
## 5 AT2G41290 RHF2A 0 0 0.4435049 -0.32347741 Ler
## 6 AT4G36650 GAE5 0 0 -0.1410215 0.18697968 Ler
```

In this plot we do not include those genes whose change in transcript abundance is uncertain under both x and y conditions.

```
%>%
quadrant_example.df mutate(.,
outcome.x.fct = outcome2factor(outcome.x),
outcome.y.fct = outcome2factor(outcome.y),
outcome.xy.fct = xy_outcomes2factor(outcome.x, outcome.y)) %>%
filter(outcome.xy.fct != "none") %>%
ggplot(., aes(logFC.x, logFC.y, colour = outcome.x.fct, fill = outcome.y.fct)) +
geom_quadrant_lines(linetype = "dotted") +
stat_quadrant_counts(size = 3, colour = "white") +
geom_point(shape = "circle filled") +
scale_x_logFC(name = "Transcript abundance for x%unit") +
scale_y_logFC(name = "Transcript abundance for y%unit") +
scale_colour_outcome() +
scale_fill_outcome() +
theme_dark()
```

To plot in separate panels those observations that are significant
along both *x* and *y* axes, *x* axis, *y*
axis, or none, with quadrants merged takes more effort. We first define
two helper functions to add counts and quadrant lines to each of the
four panels.

```
<- function(...) {
all_quadrant_counts list(
stat_quadrant_counts(data = . %>% filter(outcome.xy.fct == "xy"), ...),
stat_quadrant_counts(data = . %>% filter(outcome.xy.fct == "x"), pool.along = "y", ...),
stat_quadrant_counts(data = . %>% filter(outcome.xy.fct == "y"), pool.along = "x", ...),
stat_quadrant_counts(data = . %>% filter(outcome.xy.fct == "none"), quadrants = 0L, ...)
) }
```

```
<- function(...) {
all_quadrant_lines list(
geom_hline(data = data.frame(outcome.xy.fct = factor(c("xy", "x", "y", "none"),
levels = c("xy", "x", "y", "none")),
yintercept = c(0, NA, 0, NA)),
aes_(yintercept = ~yintercept),
na.rm = TRUE,
...),geom_vline(data = data.frame(outcome.xy.fct = factor(c("xy", "x", "y", "none"),
levels = c("xy", "x", "y", "none")),
xintercept = c(0, 0, NA, NA)),
aes_(xintercept = ~xintercept),
na.rm = TRUE,
...)
) }
```

And use these functions to build the final plot, in this case including all genes.

```
%>%
quadrant_example.df mutate(.,
outcome.x.fct = outcome2factor(outcome.x),
outcome.y.fct = outcome2factor(outcome.y),
outcome.xy.fct = xy_outcomes2factor(outcome.x, outcome.y)) %>%
ggplot(., aes(logFC.x, logFC.y, colour = outcome.x.fct, fill = outcome.y.fct)) +
geom_point(shape = 21) +
all_quadrant_lines(linetype = "dotted") +
all_quadrant_counts(size = 3, colour = "white") +
scale_x_logFC(name = "Transcript abundance for x%unit") +
scale_y_logFC(name = "Transcript abundance for y%unit") +
scale_colour_outcome() +
scale_fill_outcome() +
facet_wrap(~outcome.xy.fct) +
theme_dark()
```