35  Functions

Author

Jarad Niemi

R is a functional programming language.

class(log)
[1] "function"

35.1 Function basics

Functions take in some input and return some output. The inputs are the arguments to the function and the output is the return value.

35.1.1 Arguments

log(10)
[1] 2.302585
log(x = 10)
[1] 2.302585
log(10, base = exp(1))
[1] 2.302585
log(10, base = 10)
[1] 1
log(x = 10, base = 10)
[1] 1

Take a look at the arguments.

args(log)
function (x, base = exp(1)) 
NULL

35.1.1.1 Default arguments

In the log function, the default value for the base argument is exp(1).

all.equal(
  log(10),
  log(10, base = exp(1))
)
[1] TRUE

35.1.1.2 Positional matching

log(10, exp(1))
[1] 2.302585
log(exp(1), 10)
[1] 0.4342945

35.1.1.3 Name matching

log(x = 10, base = exp(1))
[1] 2.302585
log(base = exp(1), x = 10)
[1] 2.302585

35.1.1.4 Partial matching

log(10, b = exp(1))
[1] 2.302585
log(10, ba = exp(1))
[1] 2.302585
log(10, bas = exp(1))
[1] 2.302585
log(10, base = exp(1))
[1] 2.302585
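
The three kinds of matching shown above can be combined in one call. R first matches exact names, then partial names, and finally fills the remaining arguments by position. The sketch below uses a hypothetical function, scale_by(), whose arguments factor and flip share the prefix "f", to show both a successful partial match and an ambiguous one.

```r
# Hypothetical function (not part of the original notes)
scale_by <- function(value, factor = 1, flip = FALSE) {
  result <- value * factor
  if (flip) result <- -result
  result
}

# Named arguments are matched first; the unnamed 2 then fills value
scale_by(factor = 10, 2)

# "fa" uniquely identifies factor and "fl" uniquely identifies flip
scale_by(2, fa = 10, fl = TRUE)

# "f" could be factor or flip, so R stops with an error;
# tryCatch() captures the error message instead of halting
ambiguous <- tryCatch(
  scale_by(2, f = 10),
  error = function(e) conditionMessage(e)
)
ambiguous
```

Because partial matching can fail or change meaning when a function gains new arguments, writing out full argument names is usually the safer habit.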

35.1.1.5 R objects as input

y <- 100
log(y)
[1] 4.60517

35.1.2 Return value

class(log(10))
[1] "numeric"
class(as.data.frame(10))
[1] "data.frame"
class(all.equal(1,1))
[1] "logical"
class(all.equal(1,2))
[1] "character"
m <- lm(len ~ dose, data = ToothGrowth)
class(m)
[1] "lm"
class(summary(m))
[1] "summary.lm"
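
The all.equal() calls above return a logical in one case and a character description in the other, so the class of a return value can depend on the input. A minimal sketch of one way to deal with that, using base R's isTRUE() to obtain a single TRUE/FALSE either way (the object names are chosen only for illustration):

```r
# all.equal() returns TRUE when the values match,
# otherwise a character description of the difference
same    <- all.equal(1, 1)
differs <- all.equal(1, 2)

# isTRUE() turns either result into a single logical value
isTRUE(same)
isTRUE(differs)

# A return value can be stored and reused like any other object
result <- log(100, base = 10)
result * 2
```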

35.2 Building functions

35.2.1 Function definition

# Create a function
add <- function(x, y) {
  x + y
}

# Argument by order
add(1, 2)
[1] 3
# Argument by name
add(x = 1, y = 2)
[1] 3
# Vector arguments
add(1:2, 3:4)
[1] 4 6
# Vector and scalar argument
add(1:2, 3)
[1] 4 5
add(1:2, 3:5)
Warning in x + y: longer object length is not a multiple of shorter object
length
[1] 4 6 6
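
The vector-and-scalar call and the warning above both come from R's recycling rule: the shorter vector is repeated until it matches the length of the longer one, and R warns only when the longer length is not a multiple of the shorter. A short sketch, repeating the definition of add() so the block is self-contained:

```r
# Same definition as above
add <- function(x, y) {
  x + y
}

# 3 is recycled to c(3, 3)
add(1:2, 3)

# 1:2 is recycled to c(1, 2, 1, 2); no warning because 4 is a multiple of 2
add(1:4, 1:2)
```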

35.2.2 Default arguments

# Define function
add <- function(x = 1, y = 2) {
  x + y
}

# Uses both default arguments
add()
[1] 3
# Only uses second default argument
add(3)
[1] 5
# Specify argument by name
add(y = 5)
[1] 6

35.2.3 Explicit return

R functions return the value of the last expression evaluated, but better practice is to return explicitly using the return() function.

# Create function with explicit return
add <- function(x, y) {
  return(x + y)
}

# Run function
add(1, 2)
[1] 3

Suppose you want to return TRUE or FALSE depending on whether a specific character is in a string. As soon as you find the character, you can immediately return TRUE. If you don’t find the character, you can return FALSE.

# Define function to check string for a character
is_char_in_string <- function(string, char) {
  for (i in seq_len(nchar(string))) {
    if (char == substr(string, i, i))
      return(TRUE)
  }
  return(FALSE)
}

# Examples
is_char_in_string("this is my string", "a")
[1] FALSE
is_char_in_string("this is my string", "s")
[1] TRUE
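
The loop above is there to illustrate an early return. For this particular task, base R's grepl() with fixed = TRUE should give the same answers without a loop. A minimal sketch, where is_char_in_string2() is a hypothetical name:

```r
# Hypothetical alternative to the loop-based version
is_char_in_string2 <- function(string, char) {
  # fixed = TRUE treats char as a literal string, not a regular expression
  grepl(char, string, fixed = TRUE)
}

is_char_in_string2("this is my string", "a")
is_char_in_string2("this is my string", "s")
```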

35.2.4 Error handling

Errors raised by the underlying functions you use are passed along automatically.

# Error since "a" is not numeric
add(1, "a")
Error in x + y: non-numeric argument to binary operator
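
If you would rather handle such an error than have it stop your script, base R's tryCatch() can intercept it. A minimal sketch, assuming you want NA back when the addition fails; safe_add() is a hypothetical name:

```r
# Hypothetical wrapper that converts an error into NA
safe_add <- function(x, y) {
  tryCatch(
    x + y,
    error = function(e) {
      # Report the original error text, then fall back to NA
      message("Could not add: ", conditionMessage(e))
      NA
    }
  )
}

safe_add(1, 2)    # no error, so the sum is returned
safe_add(1, "a")  # error is caught, a message is shown, NA is returned
```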

35.2.4.1 message()

There are a variety of ways to communicate to the user. Use message() for communicating a message.

# Create function with message
add <- function(x, y) {
  message(paste(x, "+", y, "="))
  return(x + y)
}

# Example message
add(1, 2)
1 + 2 =
[1] 3
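
One reason to prefer message() over print() or cat() is that the caller can silence it. A small sketch using base R's suppressMessages(); add_verbose() is a hypothetical name used so that add() above is left unchanged:

```r
# Hypothetical copy of add() that announces what it is doing
add_verbose <- function(x, y) {
  message(x, " + ", y, " =")  # message() pastes its arguments together
  return(x + y)
}

# Shows the message and returns the sum
add_verbose(1, 2)

# Same computation, but the message is suppressed
quiet <- suppressMessages(add_verbose(1, 2))
quiet
```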

35.2.4.2 warning()

Use warning() when you think the results may not be what the user is expecting.

# Create function with warning
add <- function(x, y) {
  if (length(x) != length(y))
    warning("'x' and 'y' have unequal length.")
  return(x + y)
}

# No warning
add(1, 2)
[1] 3
# Warning
add(1:2, 3)
Warning in add(1:2, 3): 'x' and 'y' have unequal length.
[1] 4 5

35.2.4.3 stop()

Use stop() when it is clear the user is not doing what they intend.

# Define function with stop()
add <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y))
    stop("Either 'x' or 'y' or both are not numeric!")
  return(x + y)
}

# No Error
add(1, 2)
[1] 3
# Error
add("a", 2)
Error in add("a", 2): Either 'x' or 'y' or both are not numeric!

As shown previously, this would have caused an error in our original add() function. Perhaps this version of the error message is more helpful.

35.2.4.4 stopifnot()

The stopifnot() function can be used to construct reasonably informative error messages.

# stopifnot() example
add <- function(x, y) {
  stopifnot(is.numeric(x))
  stopifnot(is.numeric(y))
  return(x + y)
}

# No error
add(1, 2)
[1] 3

Now with an error.

add(1, "b")
Error in add(1, "b"): is.numeric(y) is not TRUE
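
Since R 4.0.0, stopifnot() also accepts named expressions, and the name is used as the error message. A minimal sketch of that form; add_checked() is a hypothetical name and the message wording is only an example:

```r
# Hypothetical version of add() with custom stopifnot() messages
add_checked <- function(x, y) {
  stopifnot(
    "'x' must be numeric" = is.numeric(x),
    "'y' must be numeric" = is.numeric(y)
  )
  return(x + y)
}

# No error
add_checked(1, 2)

# Capture the error message instead of stopping
msg <- tryCatch(
  add_checked(1, "b"),
  error = function(e) conditionMessage(e)
)
msg
```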

35.3 Function issues

These are some issues I want you to be aware of so you (I hope) avoid issues in the future.

35.3.1 Argument vs object assignment

Argument values should be specified using =. R also lets you use <- inside a function call, but this creates an object in your workspace and then passes its value by position. Generally, this should be avoided.

# Define function
my_fancy_function <- function(x, y) {
  return(x + y*100)
}

What is the result of the following?

# Weird assignment
my_fancy_function(y <- 5, x <- 4)
[1] 405

What happened? We assigned y the value 5 and x the value 4 outside the function. Then, we passed y (5) as the first argument of the function and x (4) as the second argument of the function.

This was equivalent to

# Prefer assignment outside the function
y <- 5
x <- 4
my_fancy_function(x = y, y = x)
[1] 405

So, when assigning function arguments, use =. Also, it is probably helpful to avoid naming objects the same name as the argument names.

35.3.2 Scoping

If an object is not defined inside a function, R will look for it outside the function.

# Define function with missing object
f <- function() {
  return(y) # y was never defined
}

What is the result of the following?

# What will the result be?
f()
[1] 5

Basically, R searches through a series of environments to find the variable called y. Here it finds the y that we assigned the value 5 earlier.

But if you change an object’s value inside the function, the change is not retained outside the function.

# Create function
f <- function() {
  a <- a + 1 # change object's value
  print(a)
}

# Create an object
a <- 1
f()
[1] 2
a # object's value is not changed outside the function
[1] 1
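
Taken together, the two examples above suggest a safer pattern: pass what the function needs as arguments, and return what it changes. A minimal sketch with hypothetical function names:

```r
y <- 5

# Works only because a y happens to exist outside the function
read_outside <- function() {
  y
}

# Safer: the dependency is an explicit argument
read_argument <- function(y) {
  y
}

# To change an object, return the new value and reassign it
increment <- function(a) {
  a + 1
}

read_outside()
read_argument(10)

a <- 1
a <- increment(a)  # the change is kept because we reassign a
a
```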

35.3.3 Closure errors

Sometimes you get baffling error messages due to closure errors or special errors.

mean[1]
Error in mean[1]: object of type 'closure' is not subsettable
log[1]
Error in log[1]: object of type 'special' is not subsettable

This happens because typeof() reports functions as closure or special.

# typeof
typeof(mean)
[1] "closure"
typeof(log)
[1] "special"

You will see closure errors much more commonly than special errors. Both of these errors indicate problems using a function.
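
A common way to run into the closure error, sketched here under the assumption that you meant to use a data.frame called df but never created it: df is also the name of a function in the stats package (the F density), so R finds the function instead of your data.

```r
# No object called df has been created yet, so df is the stats function
is.function(df)

# Subsetting a function fails; capture the message rather than stopping
msg <- tryCatch(
  df$len,
  error = function(e) conditionMessage(e)
)
msg

# Once a data.frame named df exists, the same code works
df <- ToothGrowth
head(df$len, 3)
```

When you see this error, check whether the name you are subsetting refers to the object you think it does.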

35.3.4 Generic functions

Generic functions in R will use the class() of the first argument to determine what specific version of the function will be used.

35.3.4.1 mean()

The mean() function is an example of a generic function.

# UseMethod() indicates a generic function
print(mean)
function (x, ...) 
UseMethod("mean")
<bytecode: 0x106a43188>
<environment: namespace:base>

Take a look at the help file.

# Look at the help file
?mean

Notice the words “Generic function”. This means what the function does will depend on the class of the first argument to the function.

# Using mean() on different object types
mean(1:5)
[1] 3
mean(as.Date(c("2023-01-01","2022-01-01")))
[1] "2022-07-02"

I bring up generic functions primarily to point out that it can be hard to track down the appropriate helpfile. Generally you will look up <function>.<class>.

For example,

# Determine the class
class(as.Date(c("2023-01-01","2022-01-01")))
[1] "Date"

So you need to look up the helpfile for mean.Date().

# Look up the function
?mean.Date

This opens a help page, but not one that describes what mean() does for dates. Because the lookup went somewhere, this appears to be the intended behavior. Why it isn’t documented, I have no idea.

# Integer
class(1:5)
[1] "integer"
# Try mean.integer help
?mean.integer

There is typically a default method that will be used if a specific method can’t be found.

# Try mean.default
?mean.default
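
To see how dispatch works, it can help to build a small generic yourself. The sketch below is illustrative only: describe() and its methods are hypothetical names, not functions from base R. The last line uses the real methods() function to list the methods available for mean().

```r
# Hypothetical generic: dispatches on the class of its first argument
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method used for numeric vectors
describe.numeric <- function(x, ...) {
  paste("numeric vector of length", length(x))
}

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1])
}

describe(c(1.5, 2))  # uses describe.numeric
describe("a")        # no describe.character, so describe.default is used

# List the methods R knows about for an existing generic
methods("mean")
```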

35.3.4.2 summary()

Another function that we have used for multiple different data types is summary().

# Various uses of the generic summary function
summary(ToothGrowth$len)                    # numeric
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   4.20   13.07   19.25   18.81   25.27   33.90 
summary(ToothGrowth$supp)                   # factor
OJ VC 
30 30 
summary(ToothGrowth)                        # data.frame
      len        supp         dose      
 Min.   : 4.20   OJ:30   Min.   :0.500  
 1st Qu.:13.07   VC:30   1st Qu.:0.500  
 Median :19.25           Median :1.000  
 Mean   :18.81           Mean   :1.167  
 3rd Qu.:25.27           3rd Qu.:2.000  
 Max.   :33.90           Max.   :2.000  
summary(lm(len ~ supp, data = ToothGrowth)) # lm object

Call:
lm(formula = len ~ supp, data = ToothGrowth)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.7633  -5.7633   0.4367   5.5867  16.9367 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   20.663      1.366  15.127   <2e-16 ***
suppVC        -3.700      1.932  -1.915   0.0604 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.482 on 58 degrees of freedom
Multiple R-squared:  0.05948,   Adjusted R-squared:  0.04327 
F-statistic: 3.668 on 1 and 58 DF,  p-value: 0.06039

Take a look at the helpfiles for the different summary functions.

# Summary function helpfiles
?summary
?summary.numeric
?summary.factor
?summary.data.frame
?summary.lm

35.3.5 ... argument

Some functions have a ... argument. This argument captures any number of additional arguments, which the function either uses itself or passes on to other functions.

# Sum helpfile
?sum

The sum() function adds up everything passed through ....

# Sum scalars
sum(1,2,3)
[1] 6
# Sum a vector
sum(5:6)
[1] 11
# Sum scalars and vector
sum(1,2,3,5:6)
[1] 17

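Typos in argument names are not caught. The misspelled argument is absorbed by ... and, for sum(), simply added to the total, so the NA is never removed.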

# Typo in argument name
sum(c(1,2,NA), na.mr = TRUE) # vs 
[1] NA
sum(c(1,2,NA), na.rm = TRUE) 
[1] 3
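
You can use ... in your own functions to pass arguments along to another function. A minimal sketch, where mean_of() is a hypothetical wrapper around mean(); the final line shows what sum() does with a misspelled argument name when no NA is involved.

```r
# Hypothetical wrapper: anything in ... is handed on to mean()
mean_of <- function(x, ...) {
  mean(x, ...)
}

mean_of(c(1, 2, NA))                # NA, since na.rm defaults to FALSE
mean_of(c(1, 2, NA), na.rm = TRUE)  # 1.5

# In sum(), a misspelled argument name is one more value to add:
# TRUE counts as 1, so the result is 4
sum(1, 2, na.mr = TRUE)
```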

35.4 Suggestions

  • Define functions for tasks that you do 2 or more times
  • Use informative names (verbs)
  • Use consistent return values
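
As one illustration of these suggestions, the sketch below wraps a repeated calculation in a function with a verb-based name that always returns a numeric vector the same length as its input. The function rescale_to_unit() is a hypothetical example, not something defined elsewhere in these notes.

```r
# Hypothetical helper: rescale a numeric vector to the interval [0, 1]
rescale_to_unit <- function(x) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = TRUE)
  return((x - rng[1]) / (rng[2] - rng[1]))
}

# The same task done twice, without copying and pasting the formula
scaled_len  <- rescale_to_unit(ToothGrowth$len)
scaled_dose <- rescale_to_unit(ToothGrowth$dose)

range(scaled_len)
range(scaled_dose)
```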