12  ggplot2 syntax

Author

Jarad Niemi

To use the ggplot2 graphics system, you need a long data frame. How to obtain this data frame is discussed in the Data Wrangling module. Here we show the syntax for constructing a ggplot2 graphic once you have an appropriate data frame.

library("tidyverse")
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

12.1 Example

# Example
ggplot(data = mtcars) +
  geom_point(
    mapping = aes(x = disp, 
                  y = mpg)
  ) 

12.1.1 Template

# ggplot2 template
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
    mapping = aes(<MAPPINGS>),
    stat = <STAT>,
    position = <POSITION>) +
  <OTHER FUNCTIONS>()
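
As a minimal sketch of how the template maps onto real code, the example below fills in every slot for the mtcars scatterplot. It relies on "identity" being the default for both stat and position in geom_point(), so spelling them out should give the same plot as omitting them. theme_bw() is only an arbitrary stand-in for <OTHER FUNCTIONS>, and the name p_template is made up for this illustration.

```r
library("ggplot2")

# Every slot of the template filled in explicitly
p_template <- ggplot(data = mtcars) +        # <DATA>
  geom_point(                                # <GEOM_FUNCTION>
    mapping  = aes(x = disp, y = mpg),       # <MAPPINGS>
    stat     = "identity",                   # <STAT>: use the data as is
    position = "identity") +                 # <POSITION>: no adjustment
  theme_bw()                                 # <OTHER FUNCTIONS> (arbitrary choice)

p_template
```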

12.2 Data

Suppose we want to make a plot of the data in the warpbreaks data set. You can find out what information is in the warpbreaks data set by looking at its help file, ?warpbreaks. For now, we need to know the names of the variables in the data set. It is also helpful to know what type each variable is. We find this information using the following commands.

# Investigate warpbreaks data set
names(warpbreaks)
[1] "breaks"  "wool"    "tension"
head(warpbreaks)
  breaks wool tension
1     26    A       L
2     30    A       L
3     54    A       L
4     25    A       L
5     70    A       L
6     52    A       L
summary(warpbreaks)
     breaks      wool   tension
 Min.   :10.00   A:27   L:18   
 1st Qu.:18.25   B:27   M:18   
 Median :26.00          H:18   
 Mean   :28.15                 
 3rd Qu.:34.00                 
 Max.   :70.00                 
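
If the variable types are the main interest, the following sketch is one possible shortcut: str() and class() are base R functions that report the type of each column directly, without having to infer it from the summary.

```r
# Compact overview: one line per variable, including its type
str(warpbreaks)

# Just the class of each column
sapply(warpbreaks, class)
```

Here breaks is numeric while wool and tension are factors, which matters later when choosing aesthetics such as shape.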

Suppose we would like to make a plot of the number of breaks (y-axis) vs the tension. To do so, we can use the code below.

# Warpbreaks scatter plot
ggplot(data = warpbreaks) +                # <DATA>
  geom_point(                              # <GEOM_FUNCTION>
    mapping = aes(y = breaks,              # <MAPPINGS>
                  x = tension) 
  )

12.3 Position

The points are aligned in vertical lines and may therefore be covering each other up. We will use jittering, which adds a little randomness to the position of each point, to reduce this overlap.

# Warpbreaks scatterplot with jitter
ggplot(data = warpbreaks) + 
  geom_point(
    mapping = aes(y = breaks, 
                  x = tension),
    position = position_jitter())           # <POSITION>
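
By default, position_jitter() moves points both horizontally and vertically, which slightly changes the plotted number of breaks. The sketch below shows one way to control this with the width, height, and seed arguments of position_jitter(). The values 0.2 and 1 are arbitrary choices for illustration, and p_jitter is a made-up name.

```r
library("ggplot2")

# Jitter horizontally only, so the plotted number of breaks stays exact
p_jitter <- ggplot(data = warpbreaks) +
  geom_point(
    mapping  = aes(y = breaks, x = tension),
    position = position_jitter(
      width  = 0.2,   # horizontal spread (arbitrary value)
      height = 0,     # no vertical noise
      seed   = 1))    # arbitrary seed so the jitter is reproducible

p_jitter
```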

12.4 Stat

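The stat argument is rarely needed in a scatterplot, but it provides essential functionality for some other types of plots. Here, stat = "sum" counts the observations at each location and uses that count for the point size.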

# Warpbreaks scatterplot with point size as the sum
ggplot(data = warpbreaks) + 
  geom_point(
    mapping = aes(y = breaks, x = tension),
    stat = "sum")                           # <STAT>

12.5 Geom

Let’s switch the type of plot to a boxplot.

# Example boxplot
ggplot(data = warpbreaks) + 
  geom_boxplot(                # <GEOM_FUNCTION>
    mapping = aes(y = breaks, 
                  x = tension)
  )

Here is another way to specify a scatterplot with jitter.

# Example jitter <GEOM>
ggplot(data = warpbreaks) + 
  geom_jitter(                 # <GEOM_FUNCTION>
    mapping = aes(y = breaks, 
                  x = tension)
  )

12.5.1 Smoothers

Recall the original plot.

# One layer
ggplot(data = mtcars) +
  geom_point(
    mapping = aes(x = disp, 
                  y = mpg))

We can add layers to this plot by using multiple calls.

# Two layers
ggplot(data = mtcars) +
  geom_point(                 # <GEOM_FUNCTION>
    mapping = aes(x = disp, 
                  y = mpg)) +
  geom_smooth(                # <GEOM_FUNCTION>
    mapping = aes(x = disp, 
                  y = mpg))
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Since we are using the same mappings in both calls, we can include them in the original ggplot call instead.

# Two layers
ggplot(data = mtcars,
       mapping = aes(x = disp, 
                  y = mpg)) +
  geom_point() +               # <GEOM_FUNCTION> 
  geom_smooth()                # <GEOM_FUNCTION>
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
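
The same inheritance works for any combination of geoms. As a small sketch using the warpbreaks data from earlier, the boxplot and jitter layers below both pick up the mappings given in ggplot(). Setting outlier.shape = NA and the jitter width of 0.2 are illustrative choices, not part of the original examples, and p_layers is a made-up name.

```r
library("ggplot2")

# Mappings given in ggplot() are inherited by every layer
p_layers <- ggplot(data = warpbreaks,
                   mapping = aes(x = tension, y = breaks)) +
  geom_boxplot(outlier.shape = NA) +     # hide outliers; the raw points show them
  geom_jitter(width = 0.2, height = 0)   # raw data drawn on top of the boxes

p_layers
```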

Let’s add color to represent another variable.

# Color
ggplot(data = mtcars,
       mapping = aes(
         x     = disp, 
         y     = mpg, 
         color = factor(vs))) + # <MAPPING>
  geom_point() +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

To draw linear regression lines instead of the default smoother, set method = "lm".

# Regression layer
ggplot(data = mtcars,
       mapping = aes(
         x     = disp, 
         y     = mpg, 
         color = factor(vs))) + 
  geom_point() +
  geom_smooth(method = "lm") # <GEOM_FUNCTION> option
`geom_smooth()` using formula = 'y ~ x'
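
Because color is mapped to factor(vs), geom_smooth() fits one line per group. The sketch below reproduces those fits with lm() so the intercepts and slopes can be read off, and redraws the plot with se = FALSE to drop the confidence bands. The name fits is made up for this illustration.

```r
library("ggplot2")

# One simple linear regression of mpg on disp per engine type (vs)
fits <- lapply(split(mtcars, mtcars$vs),
               function(d) coef(lm(mpg ~ disp, data = d)))
fits  # intercept and slope for vs = 0 and vs = 1

# Same regression layer as above, without the confidence bands
ggplot(data = mtcars,
       mapping = aes(x = disp, y = mpg, color = factor(vs))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```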

12.6 Mappings

12.6.1 Axes

# x and y axes
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x = disp, # <MAPPINGS>
          y = mpg)
    ) 

# x switched to hp
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x = hp,   # new 
          y = mpg)) 

# y replaced with wt
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x = hp, 
          y = wt)) 

12.6.2 Colors

# color wt
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          color = wt)) 

# color cyl
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          color = cyl)) 

# color vs
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          color = vs)) 

For categorical variables (even if they are coded numerically), we want to have discrete colors.

# color numerically coded variable as discrete
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          color = factor(cyl)))

# another example
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          color = factor(vs)))
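
Wrapping a variable in factor() inside every aes() call gets repetitive. An alternative, sketched below, is to convert the numerically coded variables once in the data frame. The name mtcars_f is hypothetical and used only for this illustration.

```r
library("dplyr")
library("ggplot2")

# Convert the numerically coded variables to factors once, up front
mtcars_f <- mtcars %>%
  mutate(cyl = factor(cyl),
         vs  = factor(vs))

# color = cyl now gives discrete colors without calling factor() in aes()
ggplot(data = mtcars_f) +
  geom_point(mapping = aes(x = disp, y = mpg, color = cyl))
```

A side effect of this approach is that the legend title reads cyl rather than factor(cyl).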

12.6.3 Shapes

# shape vs
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          shape = factor(vs)))

# shape cyl
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          shape = factor(cyl)))

If you try to use a continuous variable for shapes, you will receive an error.

# shape cyl (as numeric)
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          shape = cyl))
Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `scale_f()`:
! A continuous variable cannot be mapped to the shape aesthetic
ℹ choose a different aesthetic or use `scale_shape_binned()`

12.6.4 Colors and Shapes

# colors and shapes
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          color = wt, 
          shape = factor(cyl)))

12.7 Coordinates

12.7.1 Logarithmic axes

# Logarithmic axes
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          color = wt, 
          shape = factor(cyl))) +
  
  # <OTHER FUNCTIONS>
  scale_x_log10() +
  scale_y_log10()
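
To make concrete what scale_x_log10() does, the sketch below compares it with taking log10() of the variable by hand. Both approaches should place the points at the same positions; the difference is that the scale labels the axis in the original units. The names p_scale and p_manual are made up for this comparison.

```r
library("ggplot2")

# Log scale applied by ggplot2
p_scale <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = disp, y = mpg)) +
  scale_x_log10()

# Log transformation applied by hand in the mapping
p_manual <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = log10(disp), y = mpg))

# Compare the x positions that each plot actually uses
all.equal(as.numeric(layer_data(p_scale)$x),
          as.numeric(layer_data(p_manual)$x))
```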

12.7.2 Axis limits

# Axis limits
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          color = wt, 
          shape = factor(cyl))) +
  
  # <OTHER FUNCTIONS>
  xlim(0, 500) +
  ylim(0, 50)
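
One caveat, stated here as general ggplot2 behavior rather than something the plot above demonstrates: xlim() and ylim() set scale limits, so observations outside them are dropped (with a warning) before any statistics are computed, whereas coord_cartesian() only zooms the view. The sketch below contrasts the two. The limits 100 and 300 are arbitrary values chosen so that some cars fall outside, and all object names are made up.

```r
library("ggplot2")

p_base <- ggplot(data = mtcars, mapping = aes(x = disp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm")

# Scale limits: cars outside 100-300 are removed, so the line is refit
p_limits <- p_base + xlim(100, 300)

# Coordinate limits: all cars are used for the fit, the view is just zoomed
p_zoom <- p_base + coord_cartesian(xlim = c(100, 300))

# Number of cars that fall outside the limits and are dropped by p_limits
n_dropped <- sum(mtcars$disp < 100 | mtcars$disp > 300)
n_dropped
```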

12.8 Facets

Faceting splits the data into subsets and draws one small plot (panel) for each subset.

12.8.1 Facet wrap

For one categorical variable with many levels, use facet_wrap.

# facet_wrap
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x = disp, 
          y = mpg, 
          color = factor(vs))) +
  
  # <OTHER FUNCTIONS>
  facet_wrap( ~ cyl)
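
The layout of the wrapped panels can be adjusted. As a sketch, the ncol and scales arguments of facet_wrap() below stack the panels in one column and let each panel use its own x range. Both settings are illustrative choices, and p_wrap is a made-up name.

```r
library("ggplot2")

p_wrap <- ggplot(data = mtcars) +
  geom_point(
    mapping = aes(x = disp, y = mpg, color = factor(vs))) +
  facet_wrap(~ cyl,
             ncol   = 1,          # stack the panels in a single column
             scales = "free_x")   # let each panel choose its own x range

p_wrap
```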

12.8.2 Facet grid

For two categorical variables with every combination of their levels, use facet_grid.

# facet_grid
ggplot(data = mtcars) +
  geom_point(
    mapping = 
      aes(x     = disp, 
          y     = mpg, 
          color = factor(vs))) +
  
  # <OTHER FUNCTIONS>
  facet_grid(vs ~ cyl)

12.9 Summary

Putting this all together,

library("tidyverse")
d <- mtcars %>%
  mutate(
    Engine = ifelse(vs, "straight", "V-shaped"),
    Transmission = ifelse(am, "manual", "automatic"),
    Horsepower = hp
  )

ggplot(data = d,
       mapping = aes(x = disp, 
                     y = mpg, 
                     color = Horsepower, 
                     shape = Transmission)) +
  
  geom_point() +
  geom_smooth() + 
  
  scale_x_log10() +
  scale_y_log10() +
  
  labs(
    x = "Displacement (cu.in.)",
    y = "Miles / gallon",
    title = "1974 Motor Trend Road Tests"
  ) +
  
  facet_grid(Engine ~ cyl, 
             labeller = label_both)
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'