28  Poisson Regression

Author

Jarad Niemi

library("tidyverse"); theme_set(theme_bw())
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("Sleuth3")
library("ggResidpanel")

Poisson regression can be used when your response variable is a count with no clear maximum. As in linear regression, explanatory variables can be continuous, categorical, or both. Here we will consider models with up to two explanatory variables.

When there are two (or more) explanatory variables, we can consider a main effects model or models with interactions. The main effects model will result in parallel lines or line segments while the interaction model will result in more flexible lines or line segments. These lines will be straight when plotting the response variable on a logarithmic axis.

When counts are zero, the logarithm is undefined. Thus it is common to add 1 to all counts when plotting on a logarithmic axis.
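A quick base R sketch of the issue and the `+ 1` shift (the shift is only for plotting; models are still fit to the raw counts):

```r
# The logarithm of a zero count is -Inf,
# which cannot be placed on a logarithmic axis
counts <- c(0, 1, 4, 10)
log10(counts)      # first element is -Inf

# Shift all counts by 1 before plotting on a logarithmic axis
log10(counts + 1)  # all elements are finite
```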

28.1 Model

Let \(Y_i\) be the count of some phenomenon over some amount of time and/or space. The Poisson regression model assumes \[ Y_i \stackrel{ind}{\sim} Po(\lambda_i), \quad \eta_i = \log\left(\lambda_i \right) = \beta_0 + \beta_1X_{i,1} + \cdots + \beta_{p} X_{i,p} \] where \(Po\) designates a Poisson random variable.

If we invert, then \[ E[Y_i] = Var[Y_i] = \lambda_i = e^{\beta_0 + \beta_1X_{i,1} + \cdots + \beta_{p} X_{i,p}} = e^{\beta_0}e^{\beta_1X_{i,1}}\cdots e^{\beta_p X_{i,p}}. \] Thus, there is an exponential relationship between the explanatory variables and the expected response.
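The mean-variance equality implied by the Poisson assumption can be checked by simulation; this is a small sketch with arbitrarily chosen coefficient values:

```r
set.seed(20240801) # arbitrary seed for reproducibility

# Arbitrary coefficient values for illustration
beta0 <- 1.8
beta1 <- 0.26
x     <- 2

lambda <- exp(beta0 + beta1 * x) # invert the log link
y      <- rpois(1e5, lambda)

mean(y) # approximately lambda
var(y)  # approximately lambda as well
```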

28.1.1 Assumptions

The assumptions in a Poisson regression are

  • observations are independent
  • observations have a Poisson distribution (each with its own mean)

and

  • the relationship between the expected response and the explanatory variables is given by the exponential expression above.

28.1.2 Diagnostics

Note that there are no error terms as there were in regression. Nonetheless, we have a notion of residuals, i.e. \[ r_i = y_i - \widehat{y}_i = y_i - e^{\widehat\beta_0 + \widehat\beta_1X_{i,1} + \cdots + \widehat\beta_{p} X_{i,p}}. \] Because each observation has a different variance (due to the mean-variance relationship), we often standardize a residual by dividing by its estimated standard deviation, i.e. \[ r^{standardized}_i = \frac{r_i}{\sqrt{\widehat{y}_i }}. \] Diagnostic plots are generally produced using these standardized residuals.
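These standardized residuals are what R calls Pearson residuals. A small sketch with simulated data (the data and coefficient values are made up purely for illustration):

```r
set.seed(20240802) # arbitrary seed; simulated data for illustration

x <- runif(100)
y <- rpois(100, lambda = exp(1 + 0.5 * x))
m <- glm(y ~ x, family = "poisson")

# Manual standardized residuals
yhat  <- fitted(m)
r_std <- (y - yhat) / sqrt(yhat)

# Agrees with R's built-in Pearson residuals
all.equal(unname(r_std), unname(residuals(m, type = "pearson")))
```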

# Preferred diagnostic plots
my_plots <- c("resid",
              "index",
              "qq",
              "cookd")

28.1.3 Interpretation

When no interactions are in the model, an exponentiated regression coefficient is the multiplicative effect on the mean of the response for a 1 unit change in the explanatory variable when all other explanatory variables are held constant. Mathematically, we can see this through \[ E[Y_i|X_{i,p}=x+1] = e^{\beta_p} \times E[Y_i|X_{i,p}=x]. \] If \(\beta_p\) is positive, then \(e^{\beta_p} > 1\) and the mean goes up. If \(\beta_p\) is negative, then \(e^{\beta_p} < 1\) and the mean goes down.

If the explanatory variable is continuous, then this creates a line with an exponential relationship. If the explanatory variable is categorical, then \(X\) is a dummy variable and this is then the multiplicative effect when going from the reference level to the new level of the categorical variable.
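A quick numerical check of the multiplicative interpretation, with arbitrarily chosen coefficient values:

```r
# Arbitrary coefficient values for illustration
beta0 <- 1.0
beta1 <- 0.3
x     <- 2

mean_at_x   <- exp(beta0 + beta1 * x)
mean_at_xp1 <- exp(beta0 + beta1 * (x + 1))

# Ratio of means for a 1 unit increase in x
mean_at_xp1 / mean_at_x # equals exp(beta1)
exp(beta1)
```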

28.1.4 Inference

Individual regression coefficients are evaluated using \(z\)-statistics derived from the Central Limit Theorem and asymptotic arguments. This approach is known as a Wald test. Similarly, confidence intervals and prediction intervals are constructed using the same assumptions and are thus referred to as Wald intervals.
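The Wald \(z\)-statistic is simply the estimate divided by its standard error. A sketch using the `case0801` fit from later in this chapter:

```r
library("Sleuth3")

m <- glm(Species ~ log(Area), data = case0801, family = "poisson")

# Wald z-statistic: estimate / standard error
cf <- summary(m)$coefficients
z  <- cf[, "Estimate"] / cf[, "Std. Error"]

# Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(-abs(z))

cbind(z, p) # matches the z value and Pr(>|z|) columns of summary(m)
```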

Collections of regression coefficients are evaluated with a drop-in-deviance test, where the residual deviances from the models with and without these terms are compared using a \(\chi^2\)-test.
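A drop-in-deviance test can be computed by hand or with `anova()`; a sketch using the `case0801` data:

```r
library("Sleuth3")

# Models without and with log(Area)
m0 <- glm(Species ~ 1,         data = case0801, family = "poisson")
m1 <- glm(Species ~ log(Area), data = case0801, family = "poisson")

# Drop-in-deviance statistic and chi-squared p-value
d  <- deviance(m0) - deviance(m1)
df <- df.residual(m0) - df.residual(m1)
pchisq(d, df = df, lower.tail = FALSE)

# Equivalent built-in test
anova(m0, m1, test = "Chi")
```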

28.2 Continuous variable

First we will consider simple Poisson regression where there is a single continuous explanatory variable.

# case0801
g <- ggplot(case0801,
       aes(x = Area,
           y = Species)) +
  geom_point() +
  scale_x_log10() 

# Through trial and error plotting 
# x and y on logarithmic axis looks the most linear
g + scale_y_log10()

Fit a Poisson regression model.

# Poisson regression
m <- glm(Species ~ log(Area),
         data = case0801,
         family = "poisson")

# Summary output
summary(m)

Call:
glm(formula = Species ~ log(Area), family = "poisson", data = case0801)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.84158    0.20679   8.906   <2e-16 ***
log(Area)    0.26251    0.02216  11.845   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 224.0801  on 6  degrees of freedom
Residual deviance:   5.0141  on 5  degrees of freedom
AIC: 46.119

Number of Fisher Scoring iterations: 4
# Coefficient
m$coef[2]
log(Area) 
 0.262509 
exp(m$coef[2]) # multiplicative effect of a 1 unit increase in log(Area)
log(Area) 
 1.300188 
# Confidence intervals
exp(confint(m)) 
Waiting for profiling to be done...
               2.5 %   97.5 %
(Intercept) 4.123222 9.286372
log(Area)   1.246857 1.360208

Diagnostics

my_plots <- c("resid",
              "index",
              "qq",
              "cookd")

resid_panel(m, 
            plots = my_plots,
            qqbands = TRUE,
            smoother = TRUE)
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

Predictions

# Construct data frame for prediction
nd <- data.frame(
  Area = seq(
    from   = min(log(case0801$Area)),
    to     = max(log(case0801$Area)),
    length = 1001)
  ) |>
  mutate(
    Area = exp(Area)
  )
  
# Create predictions and combine with data
p <- 
  bind_cols(
    nd, 
    
    predict(m,               
            newdata = nd,
            se.fit = TRUE) |>
      as.data.frame() |>
      
      # Manually construct confidence intervals
      mutate(
        lwr = fit - qnorm(0.975) * se.fit,
        upr = fit + qnorm(0.975) * se.fit,
        
        # Exponentiate to get to response scale
        Species = exp(fit),
        lwr     = exp(lwr),
        upr     = exp(upr)
      ) 
  )

# Plot
pg <- g + 
  geom_ribbon(
    data = p,
    mapping = aes(
      ymin = lwr,
      ymax = upr
    ),
    alpha = 0.2
  ) +
  geom_line(
    data = p,
    color = "blue"
  ) 

pg

We can also plot on the link scale which provides insight into the linear nature of the model.

pg + scale_y_log10()

You can use geom_smooth() to plot the mean function.

# To use geom_smooth
# set the method and the family argument
gs <- g + geom_smooth(
  method      = 'glm', 
  method.args = list(family = 'poisson'))

gs
`geom_smooth()` using formula = 'y ~ x'

# But you should not combine geom_smooth() with a log axis:
# the model is refit to the transformed response,
# producing warnings and an incorrect fit
gs + scale_y_log10()
`geom_smooth()` using formula = 'y ~ x'
Warning in dpois(y, mu, log = TRUE): non-integer x = 2.033424
Warning in dpois(y, mu, log = TRUE): non-integer x = 1.653213
Warning in dpois(y, mu, log = TRUE): non-integer x = 1.724276
Warning in dpois(y, mu, log = TRUE): non-integer x = 1.204120
Warning in dpois(y, mu, log = TRUE): non-integer x = 1.041393
Warning in dpois(y, mu, log = TRUE): non-integer x = 0.845098

28.3 Categorical variable

Poisson regression can be used with categorical variables.

# Construct data frame with only 1 categorical variable
woolA <- warpbreaks |>
  filter(wool == "A")
# Plot
g <- ggplot(woolA,
            aes(x = tension,
                y = breaks)) +
  geom_jitter(width = 0.1, height = 0)

g + scale_y_log10()

Fit regression model

# Fit Poisson regression model
m <- glm(breaks ~ tension,
         data = woolA,
         family = 'poisson')

# Model summary
summary(m)

Call:
glm(formula = breaks ~ tension, family = "poisson", data = woolA)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.79674    0.04994  76.030  < 2e-16 ***
tensionM    -0.61868    0.08440  -7.330 2.30e-13 ***
tensionH    -0.59580    0.08378  -7.112 1.15e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 194.97  on 26  degrees of freedom
Residual deviance: 119.62  on 24  degrees of freedom
AIC: 264.99

Number of Fisher Scoring iterations: 4
# Confidence intervals
exp(confint(m))
Waiting for profiling to be done...
                 2.5 %     97.5 %
(Intercept) 40.3356421 49.0597741
tensionM     0.4558586  0.6347433
tensionH     0.4670131  0.6486862

Calculate predictions

# Create data frame with unique explanatory variable values
nd <- woolA |>
  select(tension) |>
  unique()

# Predict
p <- bind_cols(
  nd,
  predict(m,
          newdata = nd,
          se.fit = TRUE)|>
      as.data.frame() |>
      
      # Manually construct confidence intervals
      mutate(
        lwr = fit - qnorm(0.975) * se.fit,
        upr = fit + qnorm(0.975) * se.fit,
        
        # Exponentiate to get to response scale
        breaks = exp(fit),
        lwr     = exp(lwr),
        upr     = exp(upr)
      ) 
  )

Use geom_pointrange() to display uncertainty in group means.

# Plot group estimates
pg <- g + geom_pointrange(
  data = p,
  mapping = aes(
    ymin = lwr,
    ymax = upr
  ),
  color = "#F8766D"
)

pg

# Consider logarithmic axis
pg + scale_y_log10() 

28.4 Two categorical variables

Poisson regression can be used with two categorical variables.

# Plot
g <- ggplot(warpbreaks,
            aes(x     = tension,
                y     = breaks,
                color = wool,
                shape = wool)) +
  geom_point(
    position = position_jitterdodge(
      jitter.width = 0.1, jitter.height = 0,
      dodge.width  = 0.1))

g + scale_y_log10()

28.4.1 Main effects

Fit main effects regression model

# Fit main effects Poisson regression model
m <- glm(breaks ~ tension + wool,
         data   = warpbreaks,
         family = 'poisson')

# Model summary
summary(m)

Call:
glm(formula = breaks ~ tension + wool, family = "poisson", data = warpbreaks)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.69196    0.04541  81.302  < 2e-16 ***
tensionM    -0.32132    0.06027  -5.332 9.73e-08 ***
tensionH    -0.51849    0.06396  -8.107 5.21e-16 ***
woolB       -0.20599    0.05157  -3.994 6.49e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 297.37  on 53  degrees of freedom
Residual deviance: 210.39  on 50  degrees of freedom
AIC: 493.06

Number of Fisher Scoring iterations: 4
# Confidence intervals
exp(confint(m))
Waiting for profiling to be done...
                 2.5 %     97.5 %
(Intercept) 36.6684652 43.8135426
tensionM     0.6441360  0.8158394
tensionH     0.5248964  0.6745203
woolB        0.7354572  0.9002669

Calculate predictions

# Create data frame with unique explanatory variable values
nd <- warpbreaks |>
  select(tension, wool) |>
  unique()

# Predict
p <- bind_cols(
  nd,
  predict(m,
          newdata = nd,
          se.fit = TRUE)|>
      as.data.frame() |>
      
      # Manually construct confidence intervals
      mutate(
        lwr = fit - qnorm(0.975) * se.fit,
        upr = fit + qnorm(0.975) * se.fit,
        
        # Exponentiate to get to response scale
        breaks = exp(fit),
        lwr    = exp(lwr),
        upr    = exp(upr)
      ) 
  )

Use geom_pointrange() to display uncertainty in group means.

# Plot group estimates
pg <- g + geom_pointrange(
  data = p,
  mapping = aes(
    ymin = lwr,
    ymax = upr
  ),
  position = position_dodge(width = 0.1)
) +
  geom_line(
    data = p,
    mapping = aes(
      group = wool
    )
  )

pg

# Line segments are parallel
# on logarithmic scale
pg + scale_y_log10()

28.4.2 Interaction

Fit regression model

# Fit Poisson regression model w/ interaction
m <- glm(breaks ~ tension * wool, # interaction
         data   = warpbreaks,
         family = 'poisson')

# Model summary
summary(m)

Call:
glm(formula = breaks ~ tension * wool, family = "poisson", data = warpbreaks)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     3.79674    0.04994  76.030  < 2e-16 ***
tensionM       -0.61868    0.08440  -7.330 2.30e-13 ***
tensionH       -0.59580    0.08378  -7.112 1.15e-12 ***
woolB          -0.45663    0.08019  -5.694 1.24e-08 ***
tensionM:woolB  0.63818    0.12215   5.224 1.75e-07 ***
tensionH:woolB  0.18836    0.12990   1.450    0.147    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 297.37  on 53  degrees of freedom
Residual deviance: 182.31  on 48  degrees of freedom
AIC: 468.97

Number of Fisher Scoring iterations: 4
# Analysis of deviance
anova(m, test = "Chi")
Analysis of Deviance Table

Model: poisson, link: log

Response: breaks

Terms added sequentially (first to last)

             Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                            53     297.37              
tension       2   70.942        51     226.43 3.938e-16 ***
wool          1   16.039        50     210.39 6.206e-05 ***
tension:wool  2   28.087        48     182.31 7.962e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Calculate predictions

# Predict
p <- bind_cols(
  nd,
  predict(m,
          newdata = nd,
          se.fit = TRUE)|>
      as.data.frame() |>
      
      # Manually construct confidence intervals
      mutate(
        lwr = fit - qnorm(0.975) * se.fit,
        upr = fit + qnorm(0.975) * se.fit,
        
        # Exponentiate to get to response scale
        breaks = exp(fit),
        lwr    = exp(lwr),
        upr    = exp(upr)
      ) 
  )

Use geom_pointrange() to display uncertainty in group means.

# Plot group estimates
pg <- g + geom_pointrange(
  data = p,
  mapping = aes(
    ymin = lwr,
    ymax = upr
  ),
  position = position_dodge(width = 0.1)
) +
  geom_line(
    data = p,
    mapping = aes(
      group = wool
    )
  )

pg

# With the interaction, line segments
# need not be parallel on the logarithmic scale
pg + scale_y_log10()

With an interaction, the line segments need not be parallel.

28.5 Continuous and categorical variable

# Create categorical variables
d <- ex2224 |>
  mutate(
    Operator = factor(Operator),
    Operator = fct_recode(Operator,
      air            = "1",
      solenoid       = "2",
      `motor-driven` = "3",
      manual         = "4"
    )
  )
# Plot data
g <- ggplot(d,
       aes(x = Time,
           y = Failures,
           color = Operator,
           shape = Operator)) +
  geom_jitter() +
  scale_x_log10()

g + 
  scale_y_log10()
Warning: Transformation introduced infinite values in continuous y-axis

28.5.1 Main effects

Fit main effects model

# Fit model
m <- glm(Failures ~ Operator + log(Time),
         data     = d,
         family   = 'poisson')

# Model summary
summary(m)

Call:
glm(formula = Failures ~ Operator + log(Time), family = "poisson", 
    data = d)

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)           0.01936    0.17769   0.109  0.91325    
Operatorsolenoid     -0.65929    0.51167  -1.289  0.19757    
Operatormotor-driven -0.27843    0.18543  -1.502  0.13320    
Operatormanual       -1.44738    0.46308  -3.126  0.00177 ** 
log(Time)             0.54250    0.08166   6.643 3.07e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 413.76  on 89  degrees of freedom
Residual deviance: 345.41  on 85  degrees of freedom
AIC: 459.75

Number of Fisher Scoring iterations: 6
# Drop-in-deviance test
anova(m, test = "Chi")
Analysis of Deviance Table

Model: poisson, link: log

Response: Failures

Terms added sequentially (first to last)

          Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                         89     413.76              
Operator   3   27.047        86     386.71 5.755e-06 ***
log(Time)  1   41.302        85     345.41 1.304e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Create prediction data frame
nd <- expand.grid(
  Operator = unique(d$Operator),
  Time = seq(
    from   = min(d$Time),
    to     = max(d$Time),
    length = 1001)
)

# Predictions
p <- bind_cols(
  nd,
  predict(m,
          newdata = nd,
          se.fit = TRUE)|>
      as.data.frame() |>
      
      # Manually construct confidence intervals
      mutate(
        lwr = fit - qnorm(0.975) * se.fit,
        upr = fit + qnorm(0.975) * se.fit,
        
        # Exponentiate to get to response scale
        Failures = exp(fit),
        lwr      = exp(lwr),
        upr      = exp(upr)
      ) 
  )

Plot model predictions

pg <- g + 
  geom_line(
    data = p,
    mapping = aes(
      group = Operator
    ))
pg

pg +
  scale_y_log10()
Warning: Transformation introduced infinite values in continuous y-axis

On the logarithmic axis, the main effects model has parallel lines.

28.5.2 Interaction

Fit regression model

# Fit interaction model
m <- glm(Failures ~ Operator * log(Time),
         data     = d,
         family   = 'poisson')

# Model summary
summary(m)

Call:
glm(formula = Failures ~ Operator * log(Time), family = "poisson", 
    data = d)

Coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                    -0.05390    0.19426  -0.277   0.7814    
Operatorsolenoid                0.35890    0.70768   0.507   0.6121    
Operatormotor-driven           -0.07884    0.36002  -0.219   0.8267    
Operatormanual                 -1.39858    0.76127  -1.837   0.0662 .  
log(Time)                       0.58366    0.09028   6.465 1.01e-10 ***
Operatorsolenoid:log(Time)     -0.95896    0.67204  -1.427   0.1536    
Operatormotor-driven:log(Time) -0.14235    0.23427  -0.608   0.5434    
Operatormanual:log(Time)       -0.01793    0.55905  -0.032   0.9744    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 413.76  on 89  degrees of freedom
Residual deviance: 342.51  on 82  degrees of freedom
AIC: 462.85

Number of Fisher Scoring iterations: 6
# Drop-in-deviance test
anova(m, test = "Chi")
Analysis of Deviance Table

Model: poisson, link: log

Response: Failures

Terms added sequentially (first to last)

                   Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                  89     413.76              
Operator            3   27.047        86     386.71 5.755e-06 ***
log(Time)           1   41.302        85     345.41 1.304e-10 ***
Operator:log(Time)  3    2.898        82     342.51    0.4076    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Make predictions.

# Predictions
p <- bind_cols(
  nd,
  predict(m,
          newdata = nd,
          se.fit = TRUE)|>
      as.data.frame() |>
      
      # Manually construct confidence intervals
      mutate(
        lwr = fit - qnorm(0.975) * se.fit,
        upr = fit + qnorm(0.975) * se.fit,
        
        # Exponentiate to get to response scale
        Failures = exp(fit),
        lwr      = exp(lwr),
        upr      = exp(upr)
      ) 
  )

Plot model predictions

pg <- g + 
  geom_line(
    data = p,
    mapping = aes(
      group = Operator
    ))
pg

pg +
  scale_y_log10()
Warning: Transformation introduced infinite values in continuous y-axis

28.6 Two continuous variables

dim(case2202)
[1] 47  4
summary(case2202)
      Site       Salamanders        PctCover       ForestAge    
 Min.   : 1.0   Min.   : 0.000   Min.   : 0.00   Min.   :  0.0  
 1st Qu.:12.5   1st Qu.: 0.000   1st Qu.:18.00   1st Qu.: 29.5  
 Median :24.0   Median : 1.000   Median :83.00   Median : 64.0  
 Mean   :24.0   Mean   : 2.468   Mean   :58.98   Mean   :168.8  
 3rd Qu.:35.5   3rd Qu.: 3.000   3rd Qu.:88.00   3rd Qu.:266.5  
 Max.   :47.0   Max.   :13.000   Max.   :93.00   Max.   :675.0  

The minimum values include zero. The logarithm of zero is undefined which will create difficulties in plotting on logarithmic axes. One way to get around this is to add a small value before taking the logarithm. A common choice is to add the smallest non-zero value for that variable.

# Function to extract minimum non-zero value
min_nonzero <- function(x) { min(x[x>0]) }

# Find minimum nonzero
mnzForestAge   <- min_nonzero(case2202$ForestAge)
mnzPctCover    <- min_nonzero(case2202$PctCover)

g <- ggplot(case2202,
       aes(x     = ForestAge   + mnzForestAge,
           y     = Salamanders + 1,
           color = PctCover)) +
  geom_point() 

g +
  scale_x_log10() + 
  scale_y_log10()

28.6.1 Main effects

Fit regression model

# Main effects model
m <- glm(Salamanders ~ 
           log(ForestAge + mnzForestAge) + 
           log(PctCover + mnzPctCover),
         data = case2202,
         family = 'poisson')

summary(m)

Call:
glm(formula = Salamanders ~ log(ForestAge + mnzForestAge) + log(PctCover + 
    mnzPctCover), family = "poisson", data = case2202)

Coefficients:
                              Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   -3.84934    0.98282  -3.917 8.98e-05 ***
log(ForestAge + mnzForestAge) -0.00752    0.10510  -0.072    0.943    
log(PctCover + mnzPctCover)    1.16057    0.26653   4.354 1.33e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 190.22  on 46  degrees of freedom
Residual deviance: 129.58  on 44  degrees of freedom
AIC: 220.64

Number of Fisher Scoring iterations: 6

When both explanatory variables are continuous, select specific values of the variable that is not on the x-axis to draw lines for.

# Create prediction data frame
nd <- expand.grid(
  PctCover = seq(
    from   = min(case2202$PctCover),
    to     = max(case2202$PctCover),
    length = 3),
  ForestAge = seq(
    from   = min(case2202$ForestAge),
    to     = max(case2202$ForestAge),
    length = 1001)
)

# Predictions
p <- bind_cols(
  nd,
  predict(m,
          newdata = nd,
          se.fit = TRUE)|>
      as.data.frame() |>
      
      # Manually construct confidence intervals
      mutate(
        lwr = fit - qnorm(0.975) * se.fit,
        upr = fit + qnorm(0.975) * se.fit,
        
        # Exponentiate to get to response scale
        Salamanders = exp(fit),
        lwr         = exp(lwr),
        upr         = exp(upr)
      ) 
  )

Plot model predictions

pg <- g + 
  geom_line(
    data = p,
    mapping = aes(
      group = PctCover
    ))
pg

pg +
  scale_x_log10() + 
  scale_y_log10()

On the logarithmic axis, the main effects model has parallel lines.

28.6.2 Interaction

Fit regression model

# Interaction model
m <- glm(Salamanders ~ 
           log(ForestAge + mnzForestAge) * # <- interaction 
           log(PctCover + mnzPctCover),
         data = case2202,
         family = 'poisson')

summary(m)

Call:
glm(formula = Salamanders ~ log(ForestAge + mnzForestAge) * log(PctCover + 
    mnzPctCover), family = "poisson", data = case2202)

Coefficients:
                                                          Estimate Std. Error
(Intercept)                                                 0.6389     1.5579
log(ForestAge + mnzForestAge)                              -1.9670     0.7703
log(PctCover + mnzPctCover)                                 0.1779     0.3620
log(ForestAge + mnzForestAge):log(PctCover + mnzPctCover)   0.4349     0.1680
                                                          z value Pr(>|z|)   
(Intercept)                                                 0.410  0.68173   
log(ForestAge + mnzForestAge)                              -2.554  0.01066 * 
log(PctCover + mnzPctCover)                                 0.492  0.62306   
log(ForestAge + mnzForestAge):log(PctCover + mnzPctCover)   2.588  0.00965 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 190.22  on 46  degrees of freedom
Residual deviance: 123.65  on 43  degrees of freedom
AIC: 216.71

Number of Fisher Scoring iterations: 6

Not all of the main effects are significant, but the interaction is. By convention, we include the main effects whenever we include their interaction.

Make predictions.

# Predictions
p <- bind_cols(
  nd,
  predict(m,
          newdata = nd,
          se.fit = TRUE)|>
      as.data.frame() |>
      
      # Manually construct confidence intervals
      mutate(
        lwr = fit - qnorm(0.975) * se.fit,
        upr = fit + qnorm(0.975) * se.fit,
        
        # Exponentiate to get to response scale
        Salamanders = exp(fit),
        lwr         = exp(lwr),
        upr         = exp(upr)
      ) 
  )

Plot model predictions

pg <- g + 
  geom_line(
    data = p,
    mapping = aes(
      group = PctCover
    ))
pg

pg +
  scale_x_log10() + 
  scale_y_log10()