34  Objects

Author

Jarad Niemi

This page provides an introduction to commonly used R object types and their dimensions. The page begins with a discussion of object dimensions including scalars, vectors, matrices, and arrays. It then discusses some of the basic object types including logical, numeric, and character. The page finishes with a discussion of advanced objects including factors, dates, data frames, and lists.

34.1 Dimensions

34.1.1 Scalars

So far we have dealt with all R objects as scalars, i.e. a single value of that data type. For example, the following are all scalars.

# Scalars
TRUE
[1] TRUE
2.5
[1] 2.5
"a scalar"
[1] "a scalar"

Note that “a scalar” is considered a scalar character.

34.1.2 Vectors

If we want to combine multiple scalars, we will typically put them into a vector. We construct vectors using the c() function.

# Vectors
c(TRUE, FALSE)
[1]  TRUE FALSE
c(2.5, 6, 7.2)
[1] 2.5 6.0 7.2
c("a", "character", "vector")
[1] "a"         "character" "vector"   

Vector length can be determined using the length() function.

# Vector length
vl <- c(TRUE, FALSE)
length(vl)
[1] 2
vn <- c(1, 2, 3, 4)
length(vn)
[1] 4
vc <- c("a", "character", "vector")
length(vc)
[1] 3

The data type can be determined using the class() function.

# Vector type
class(vl)
[1] "logical"
class(vn) 
[1] "numeric"
class(vc)
[1] "character"

Sometimes it is useful to give names to elements of the vector. Names can be assigned using the names() function.

# Surface area of Great Lakes
area <- c(82100, 57800, 59600, 25670, 19010) # km^2

# Add names
names(area) <- c("Superior", "Ontario", "Huron", "Michigan", "Erie")

area
Superior  Ontario    Huron Michigan     Erie 
   82100    57800    59600    25670    19010 

Sometimes these names are created from a function.

# Named vector
table(OrchardSprays$treatment)

A B C D E F G H 
8 8 8 8 8 8 8 8 

34.1.2.1 Accessing

Vector elements can be accessed using indices or a logical vector. Indexing in R starts at 1 (rather than 0 as in some languages), and negative indices exclude elements. To use a logical vector, supply TRUE for each element to keep and FALSE for each element to drop.

# Construct vector
v <- c("one", "two", "three")

# Access using indices
v[1] 
[1] "one"
v[-2]
[1] "one"   "three"
v[2:3]
[1] "two"   "three"
v[-c(1,2)]
[1] "three"
# Access using logical vector
v[c( TRUE, FALSE, FALSE)]
[1] "one"
v[c( TRUE, FALSE,  TRUE)]
[1] "one"   "three"
v[c(FALSE,  TRUE,  TRUE)]
[1] "two"   "three"
v[c(FALSE, FALSE, FALSE)]
character(0)

When using indices, you can repeat elements.

# Index repeats
v[c(1,1,2,2,3)]
[1] "one"   "one"   "two"   "two"   "three"

Care is needed when using a logical vector because the logical vector will be (if possible) replicated to be the same length as the original vector. This replication happens silently, without a warning.

# TRUE is replicated 3 times
v[TRUE]
[1] "one"   "two"   "three"
# New vector
v <- LETTERS[1:4]
v[c(TRUE, FALSE)] # Replicated 
[1] "A" "C"

If the vector has names, the names can be used to access member elements.

area[c("Ontario", "Erie")]
Ontario    Erie 
  57800   19010 
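
In practice, the logical vector used for accessing is usually produced by a comparison rather than typed by hand. The following is a minimal sketch using a made-up vector (the object names x and big and the values are illustrative only).

```r
# Hypothetical numeric vector for illustration
x <- c(3, 8, 1, 9)

# A comparison returns a logical vector of the same length
x > 2
# [1]  TRUE  TRUE FALSE  TRUE

# Use the logical vector to keep only the matching elements
big <- x[x > 2]
big
# [1] 3 8 9

# which() converts the logical vector to indices
which(x > 2)
# [1] 1 2 4
```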

34.1.3 Matrices

We can extend from vector, a one-dimensional object, to a matrix, a two-dimensional object. There are a variety of ways to construct a matrix including matrix(), cbind(), and rbind().

# Construct a matrix 
matrix(1:6, nrow = 3, ncol = 2)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
# Construct a matrix inputting elements by rows
matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
# Construct a matrix by binding columns
cbind(1:3, 4:6)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
# Construct a matrix by binding rows
rbind(1:3, 4:6)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

The dimensions of the matrix can be found using dim(), nrow(), and ncol().

# Dimensions
m <- matrix(1:6, nrow = 2)

dim(m)  # row by column
[1] 2 3
nrow(m)
[1] 2
ncol(m)
[1] 3

The type of matrix can be determined using typeof().

# Matrix type
typeof(matrix(c(TRUE, FALSE), nrow = 1))
[1] "logical"
typeof(matrix(c(1L, 2L),      nrow = 1))
[1] "integer"
typeof(matrix(c("a", "b"),    nrow = 1))
[1] "character"

A matrix can have row and column names.

# Construct a matrix
m <- cbind(
  area,
  c(12100, 4920, 3540, 484, 1640)
)

# Assign row and column names
rownames(m) # taken from area
[1] "Superior" "Ontario"  "Huron"    "Michigan" "Erie"    
colnames(m)[2] <- "volume"

# Print matrix
m
          area volume
Superior 82100  12100
Ontario  57800   4920
Huron    59600   3540
Michigan 25670    484
Erie     19010   1640

34.1.3.1 Accessing

Matrices can be accessed in the same way vectors were, but now there are two dimensions. If a dimension is blank, then we will get all elements in that dimension.

# Matrix accessing
m[1,2]
[1] 12100
m[1:2,]
          area volume
Superior 82100  12100
Ontario  57800   4920
m[-3,]
          area volume
Superior 82100  12100
Ontario  57800   4920
Michigan 25670    484
Erie     19010   1640
m[,"volume"]
Superior  Ontario    Huron Michigan     Erie 
   12100     4920     3540      484     1640 
m["Ontario",]
  area volume 
 57800   4920 

It is common to forget the comma, in which case you get something you probably don’t expect.

# Forgot the comma
m[4]
[1] 25670
m[7]
[1] 4920
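
With a single index, R treats the matrix as one long vector with the columns stacked on top of each other, which is why m[7] above is the element in row 2 of column 2 of a matrix with 5 rows. A minimal sketch with a small made-up matrix (the object names mm, i, j, and k are illustrative):

```r
# Small made-up matrix with 5 rows and 2 columns
mm <- matrix(11:20, nrow = 5)

# Row 2, column 2
i <- 2
j <- 2
mm[i, j]
# [1] 17

# The same element using a single index: columns are stacked,
# so column j starts after (j - 1) * nrow(mm) elements
k <- i + (j - 1) * nrow(mm)
k
# [1] 7
mm[k]
# [1] 17
```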

34.1.4 Arrays

Arrays can be constructed that extend beyond the two-dimensional objects to higher dimensions. These are not commonly used and thus we will only show a quick example of how to construct one.

# Construct an array
a <- array(1:12,
           dim = 4:2)

# Print the array
a
, , 1

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

, , 2

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
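
Note that 1:12 has only 12 values while a 4 x 3 x 2 array has 24 elements, so the values were recycled and both slices above are identical. A minimal sketch that supplies all 24 values and accesses elements using three indices (the object name a3 is illustrative):

```r
# Supply one value per element: 4 * 3 * 2 = 24
a3 <- array(1:24, dim = c(4, 3, 2))

dim(a3)
# [1] 4 3 2

# Row 2, column 3 of the first and second slices
a3[2, 3, 1]
# [1] 10
a3[2, 3, 2]
# [1] 22

# A blank dimension returns all elements in that dimension
a3[1, , 2]
# [1] 13 17 21
```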

34.2 Basic Types

34.2.1 Logical

Logical (or boolean) variables have only two values: TRUE and FALSE.

# Logical
TRUE
[1] TRUE
FALSE
[1] FALSE
is.logical(TRUE)
[1] TRUE
is.logical(FALSE)
[1] TRUE

Other variable types are not logicals.

is.logical(1)
[1] FALSE
is.logical("a")
[1] FALSE

34.2.1.1 Use TRUE and FALSE

By default, the objects T and F are assigned the values TRUE and FALSE.

# Use TRUE and FALSE
T
[1] TRUE
F
[1] FALSE
is.logical(T)
[1] TRUE
is.logical(F)
[1] TRUE

Caution: the objects T and F can be redefined.

# Confusing redefinition
T <- "a"

is.logical(T)
[1] FALSE
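
In contrast, TRUE and FALSE are reserved words and cannot be reassigned, which is why they are the safer choice. A minimal sketch (the object name result is illustrative, and tryCatch() is used only so that the script continues after the error):

```r
# Redefine T, then remove the redefinition
T <- "a"
is.logical(T)
# [1] FALSE
rm(T)          # T again refers to the built-in value
is.logical(T)
# [1] TRUE

# Attempting to reassign TRUE produces an error
result <- tryCatch(
  eval(parse(text = 'TRUE <- "a"')),
  error = function(e) "error: TRUE cannot be reassigned"
)
result
# [1] "error: TRUE cannot be reassigned"
```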

34.2.2 Numeric

Numeric is the default data type for any number.

# Numeric
4
[1] 4
is.numeric(4)
[1] TRUE
2.5
[1] 2.5
is.numeric(2.5)
[1] TRUE
is.numeric(pi)
[1] TRUE
x <- sqrt(2)
is.numeric(x)
[1] TRUE

34.2.2.1 Constructing

Numeric vectors can be constructed in a variety of ways.

# Sequential integer vector construction
2:5 
[1] 2 3 4 5
5:1
[1] 5 4 3 2 1
-1:3
[1] -1  0  1  2  3
1:5.99
[1] 1 2 3 4 5
# General vector construction
seq(from = 2, to = 10, by = 2)
[1]  2  4  6  8 10
seq(from = 0, to = 1,  by = 0.2)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
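
Two other constructions that are often useful are seq() with the length.out argument and rep() for repeating values. A minimal sketch:

```r
# Specify the number of elements rather than the step size
seq(from = 0, to = 1, length.out = 5)
# [1] 0.00 0.25 0.50 0.75 1.00

# Repeat the whole vector
rep(c(1, 2), times = 2)
# [1] 1 2 1 2

# Repeat each element
rep(c(1, 2), each = 2)
# [1] 1 1 2 2
```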

34.2.2.2 Integer

In R, a numeric can be thought of as equivalent to a double in other languages. You can also explicitly create and use integers. To create an integer, append L to the number.

# Integer
3L
[1] 3
is.integer(1L)
[1] TRUE

I mention integers so that if you see code where a number is followed by an L you will know what is going on. Otherwise, this distinction is rarely important because 1) an integer is treated as a numeric by R and 2) when a calculation requires it (e.g. taking a square root), the integer is automatically converted to a numeric.

# Integer conversion
x <- 3L
is.integer(x)
[1] TRUE
is.numeric(x)
[1] TRUE
is.integer(sqrt(x^2))
[1] FALSE
is.numeric(sqrt(x^2))
[1] TRUE
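
Whether the result of a calculation is still an integer depends on the operation. A minimal sketch using class() to check the results:

```r
# Adding two integers gives an integer
class(3L + 2L)
# [1] "integer"

# Division always gives a numeric (double)
class(3L / 2L)
# [1] "numeric"

# Mixing an integer with a numeric gives a numeric
class(3L + 2.5)
# [1] "numeric"
```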

34.2.3 Character

A character is the default data type for any non-logical and non-numeric value. A character variable must be enclosed in quotes. If not, R will look for an object with the same name.

# Character
is.character("a")
[1] TRUE
is.character("2b")
[1] TRUE
is.character("2023-07-06")
[1] TRUE
is.character("Is this really a character?")
[1] TRUE

As you can see from the example above, the character refers to single characters but also strings of multiple characters.

To construct character vectors, the paste() and paste0() functions are often useful. By default, paste() uses a space as the separator while paste0() uses no separator.

# Example paste() usage
paste( "group", 1:2)
[1] "group 1" "group 2"
paste0("group", 1:2)
[1] "group1" "group2"
paste( "group", 1:2, sep = "-")
[1] "group-1" "group-2"
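
The sep argument separates the pieces within each element. To combine all elements of a vector into a single string, paste() also has a collapse argument. A minimal sketch:

```r
# Collapse a character vector into one string
paste(c("a", "b", "c"), collapse = ", ")
# [1] "a, b, c"

# sep and collapse can be combined
paste("group", 1:2, sep = "-", collapse = " and ")
# [1] "group-1 and group-2"
```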

To extract individual characters, use the substr() function.

# Extract characters
s <- "This is my very long character."
substr(s, start = 1, stop = 4)
[1] "This"
substr(s, start = nchar(s)-9, stop = nchar(s))
[1] "character."

As we will see, characters form the basis for many advanced data types including dates and factors.

34.3 Advanced Types

Here we introduce a variety of advanced data types including factors, dates, data frames, and lists.

34.3.1 Factors

Factors are a special type of character vector that allows the user more control over how the elements of the vector are used in visualizations and modeling.

When sorted, character vectors are ordered alphabetically with (in most locales) lowercase letters coming before their uppercase equivalents.

# Character vector sorting examples
v <- c("A", "B", "a", "b")
sort(v)
[1] "a" "A" "b" "B"
v <- c("1", "2", "10")
sort(v)
[1] "1"  "10" "2" 
# Alphabetical ordering may not be desired
treatment <- c(
  paste0("A_dose", c(10,20,100)),
  paste0("D_dose", c(10,20,100)),
  "control")

sort(treatment)
[1] "A_dose10"  "A_dose100" "A_dose20"  "control"   "D_dose10"  "D_dose100"
[7] "D_dose20" 

If we are creating a visualization or performing an analysis, the order we probably wanted was “control” followed by treatment A doses 10, 20, and 100 followed by treatment D doses 10, 20, and 100.

To create a factor vector use the factor() function.

# Construct factor
f <- factor(c("A","A","B","B","B"))

# Check if `f` is a factor
is.factor(f)
[1] TRUE
# Check class of `f`
class(f)
[1] "factor"
# Look at `f`
f
[1] A A B B B
Levels: A B

Compare the output above to a character vector:

# Convert factor to character
as.character(f)
[1] "A" "A" "B" "B" "B"

Notice that the factor output has no quotes and has a second line that indicates the levels of the factor.

The internal representation of factors in R is an integer vector with a lookup table for the levels of the factor.

# Numeric vector
as.numeric(f)
[1] 1 1 2 2 2
# Lookup table
levels(f)
[1] "A" "B"

You can use the levels function to change the values for the factor.

# Change levels
levels(f) <- c("C","D")
f
[1] C C D D D
Levels: C D

Rather than changing the levels this way, I suggest you use forcats::fct_recode() or something similar.

34.3.1.1 Reorder levels

By default, factor levels will be ordered alphabetically. For example,

# Factor default ordering
f <- factor(rep(c("Dose10","Dose20","Dose100"), each = 2))
f
[1] Dose10  Dose10  Dose20  Dose20  Dose100 Dose100
Levels: Dose10 Dose100 Dose20
levels(f)
[1] "Dose10"  "Dose100" "Dose20" 

This ordering will affect all aspects of statistical analysis including descriptive statistics and visualizations. For example,

# Descriptive statistics
summary(f)
 Dose10 Dose100  Dose20 
      2       2       2 

Thus, you may want the factor levels in a different order. To set the order when the factor is constructed, use the levels argument.

# Construct a factor in order
f_in_order <- factor(f, levels = c("Dose10", "Dose20", "Dose100"))

# Check factor level order
f_in_order
[1] Dose10  Dose10  Dose20  Dose20  Dose100 Dose100
Levels: Dose10 Dose20 Dose100
levels(f_in_order)
[1] "Dose10"  "Dose20"  "Dose100"
summary(f_in_order)
 Dose10  Dose20 Dose100 
      2       2       2 
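
The same approach gives the desired order for the treatment vector constructed earlier. A minimal sketch that re-creates that vector so the code is self-contained (the object names trt_levels and treatment_f are illustrative):

```r
# Re-create the treatment vector from the sorting example
treatment <- c(
  paste0("A_dose", c(10, 20, 100)),
  paste0("D_dose", c(10, 20, 100)),
  "control")

# Desired order: control first, then doses in increasing order
trt_levels <- c(
  "control",
  paste0("A_dose", c(10, 20, 100)),
  paste0("D_dose", c(10, 20, 100)))

treatment_f <- factor(treatment, levels = trt_levels)

# Sorting a factor uses the level order, not alphabetical order
as.character(sort(treatment_f))
# [1] "control"   "A_dose10"  "A_dose20"  "A_dose100" "D_dose10"  "D_dose20"
# [7] "D_dose100"
```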

In the factor function, there is an argument called ordered. This serves a different purpose and is beyond the scope of this course. Thus, just ignore the ordered argument.

Sometimes you just need to move one level to the first position. This is particularly true when performing a regression analysis and you are trying to set the reference level.

To move a single level to the first position, use the relevel function. For example,

# Construct a factor
f <- factor(c("Control","A10","A20","A30"))

# Check factor order
levels(f)
[1] "A10"     "A20"     "A30"     "Control"
# Put `Control` as first factor level
f <- relevel(f, ref = "Control")
levels(f)
[1] "Control" "A10"     "A20"     "A30"    

In regression modeling, the first factor level will be treated as the reference level, i.e. the level associated with the intercept.
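
To see the effect on a regression, the following minimal sketch uses the built-in warpbreaks data set, whose tension factor has levels L, M, and H. The object names wb, fit_L, and fit_H are illustrative.

```r
# Copy of the built-in data set
wb <- warpbreaks
levels(wb$tension)
# [1] "L" "M" "H"

# Default: L is the reference level, so it has no coefficient of its own
fit_L <- lm(breaks ~ tension, data = wb)
names(coef(fit_L))
# [1] "(Intercept)" "tensionM"    "tensionH"

# Make H the reference level
wb$tension <- relevel(wb$tension, ref = "H")
fit_H <- lm(breaks ~ tension, data = wb)
names(coef(fit_H))
# [1] "(Intercept)" "tensionL"    "tensionM"
```

In each fit, the level without its own coefficient is the reference level.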

34.3.2 Dates

Dates can be extremely difficult to work with due to issues ranging from inconsistency in how dates are formatted to dealing with time zones.

For any organization that utilizes dates, those dates should be recorded using the ISO 8601 standard. Specifically, the date should be represented in

YYYY-MM-DD

where YYYY represents the 4 digit year, MM represents the 2 digit month, and DD represents the 2 digit day. For example, 2023-07-02 is July 2, 2023. Note that the leading zeros are required.

Dates start off as a character.

# Character
d1 <- "2023-07-02"
d1
[1] "2023-07-02"
class(d1)
[1] "character"

This date can be converted to a Date object by using the as.Date() function.

# Date
d2 <- as.Date(d1)
d2
[1] "2023-07-02"
class(d2)
[1] "Date"

The date will be printed out (by default) using the YYYY-MM-DD standard.

# ISO Format
as.Date("2023-07-02")
[1] "2023-07-02"

Most other formats will not give the date you are expecting.

# Alternative formats
as.Date("07-02-2023")
[1] "0007-02-20"
as.Date("02/07/2023")
[1] "0002-07-20"
as.Date("07-02-23")
[1] "0007-02-23"
as.Date("02/07/23")
[1] "0002-07-23"

If you need to read dates that are not in the standard YYYY-MM-DD format, you can use the format argument to specify the correct format.

# Read dates using format
as.Date("07-02-2023", format = "%m-%d-%Y")
[1] "2023-07-02"
as.Date("07/02/2023", format = "%m/%d/%Y")
[1] "2023-07-02"
as.Date("07-02-23",   format = "%m-%d-%y")
[1] "2023-07-02"
as.Date("07/02/23",   format = "%m/%d/%y")
[1] "2023-07-02"

Read the help files for strftime and strptime for more options for setting the format of date (and time) objects.

When printing out the date, the default is to print out using the YYYY-MM-DD standard. If you want to print out in a different format, you will need to specify the desired format using the format function.

# Format date output
d2 # default
[1] "2023-07-02"
format(d2, "%m/%d/%y")
[1] "07/02/23"
format(d2, "%a %b %d, %Y")
[1] "Sun Jul 02, 2023"
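
One benefit of converting to a Date object is that R can then do arithmetic with dates, because a Date is stored as the number of days since 1970-01-01. A minimal sketch (the object names d_start and d_end are illustrative):

```r
d_start <- as.Date("2023-07-02")
d_end   <- as.Date("2023-12-25")

# Internal representation: days since 1970-01-01
as.numeric(as.Date("1970-01-01"))
# [1] 0

# Add days to a date
d_start + 30
# [1] "2023-08-01"

# Number of days between two dates
as.numeric(d_end - d_start)
# [1] 176

# Sequence of dates
seq(d_start, by = "week", length.out = 3)
# [1] "2023-07-02" "2023-07-09" "2023-07-16"
```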

34.3.3 Data Frames

Vectors, matrices, and arrays can only have one type of data.

# Mixing types
c(TRUE, FALSE, 2)   # logicals converted to numeric
[1] 1 0 2
c("a", 2, pi)       # numeric converted to character
[1] "a"                "2"                "3.14159265358979"
c(TRUE, "a", FALSE) # logicals converted to character
[1] "TRUE"  "a"     "FALSE"
c(TRUE, "a", 2)     # everything converted to character
[1] "TRUE" "a"    "2"   
c(c(TRUE, 1), "a")  # logical converted to numeric first
[1] "1" "1" "a"

Data frames are similar to matrices, but they allow each column to be a different data type.

Construct a data frame using the data.frame function.

# Construct data frame
d <- data.frame(
  var1 = c(1:2),
  var2 = c(TRUE, FALSE),
  var3 = c("a","b")
)

is.data.frame(d)
[1] TRUE

The techniques to access matrix elements can also be used with data frames.

# Access using indices
d[1, 2]
[1] TRUE
d[1,  ]
  var1 var2 var3
1    1 TRUE    a
d[ ,-2]
  var1 var3
1    1    a
2    2    b
# Access using names
rownames(d)
[1] "1" "2"
colnames(d)
[1] "var1" "var2" "var3"
d$var1
[1] 1 2
d[,c("var2","var3")]
   var2 var3
1  TRUE    a
2 FALSE    b
# Type conversion
is.vector(d$var1) # converted to vector
[1] TRUE
is.data.frame(d[,c("var2","var3")]) # still a data frame
[1] TRUE
is.vector(    d[,"var1"])
[1] TRUE
is.data.frame(d[,"var1"])
[1] FALSE
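
To check the type of every column at once, or to keep a single column as a data frame, the following minimal sketch may help. It re-creates the data frame d so that it is self-contained; stringsAsFactors = FALSE is set explicitly because versions of R before 4.0 converted character columns to factors by default.

```r
d <- data.frame(
  var1 = c(1:2),
  var2 = c(TRUE, FALSE),
  var3 = c("a", "b"),
  stringsAsFactors = FALSE
)

# Class of each column
sapply(d, class)
#        var1        var2        var3 
#   "integer"   "logical" "character" 

# drop = FALSE keeps a single column as a data frame
is.data.frame(d[, "var1", drop = FALSE])
# [1] TRUE

# Dimensions work as for matrices
dim(d)
# [1] 2 3
```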

There are a couple of other functions that make constructing data frames easier. The expand.grid() function constructs a data frame with every combination of the variables provided.

# Every combination
eg <- expand.grid(
  var1 = c(1:2),
  var2 = c(TRUE, FALSE),
  var3 = c("a","b")
)

eg
  var1  var2 var3
1    1  TRUE    a
2    2  TRUE    a
3    1 FALSE    a
4    2 FALSE    a
5    1  TRUE    b
6    2  TRUE    b
7    1 FALSE    b
8    2 FALSE    b
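
Two details of expand.grid() are worth checking: the number of rows is the product of the number of values for each variable, and character variables are converted to factors by default (the stringsAsFactors argument of expand.grid() defaults to TRUE). A minimal sketch (the object name eg_chr is illustrative):

```r
eg <- expand.grid(
  var1 = c(1:2),
  var2 = c(TRUE, FALSE),
  var3 = c("a", "b")
)

# 2 * 2 * 2 combinations
nrow(eg)
# [1] 8

# Character input became a factor
class(eg$var3)
# [1] "factor"

# Keep characters as characters
eg_chr <- expand.grid(var3 = c("a", "b"), stringsAsFactors = FALSE)
class(eg_chr$var3)
# [1] "character"
```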

Another function is tribble() from the tibble package, which allows us to construct data frames row by row in a more readable way. The first row gives the column names, each preceded by ~.

# Construct data frame by rows
d2 <- tibble::tribble(
  ~var1, ~var2, ~var3,
      1,  TRUE,   "a",
      2,  FALSE,  "b"
)

is.data.frame(d2)
[1] TRUE

Generally tidyverse functions are constructed to 1) have a data.frame as the first argument and 2) return a data.frame. This allows for a tidyverse data pipeline.

34.3.4 Lists

Vectors in R can only contain one data type. A list is an alternative type of vector that can contain any other type of object in each element of the list.

To construct a list, you can use the list() function.

# Construct a simple list
l <- list(1, "a", TRUE)

# Check if this is a list
is.list(l)
[1] TRUE

List elements can be named.

# By default there are no names
names(l) 
NULL
# Assign some names
names(l) <- c("one", "two", "three")

# View the assigned names
names(l)
[1] "one"   "two"   "three"

If the list elements are named, they can be accessed using those names and a $ sign.

l$one # first element of the list
[1] 1
l$two # second element of the list
[1] "a"

Lists can also be constructed by using the names during construction.

# Construct a list using names
l <- list(
  numbers    = c(1, 2, 3.5),
  characters = c("a", "character", "vector", "in", "a", "list"),
  logicals   = c(TRUE, FALSE, TRUE)
)

l[[1]]       # first list element
[1] 1.0 2.0 3.5
l$characters # list element named `characters`
[1] "a"         "character" "vector"    "in"        "a"         "list"     

As we have already seen, list elements can be any type of object. Thus you can have lists within lists.

# Construct a list within a list
l <- list(
  this = 1,
  that = list(a = "list element within a list")
)

is.list(l)
[1] TRUE

Lists can be accessed in a variety of ways similar to vectors, but you need to use double square brackets.

# Access using indices
l[[1]]
[1] 1
l[[2]]
$a
[1] "list element within a list"
# Access using names
l[['that']]
$a
[1] "list element within a list"
l[['that']][['a']]
[1] "list element within a list"
# Accessing using $
l$that
$a
[1] "list element within a list"
l$that$a
[1] "list element within a list"
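
Single square brackets also work on a list, but they return a (shorter) list rather than the element itself. A minimal sketch that re-creates the list from above:

```r
l <- list(
  this = 1,
  that = list(a = "list element within a list")
)

# Single brackets return a list containing the element
is.list(l[1])
# [1] TRUE
class(l[1])
# [1] "list"

# Double brackets return the element itself
is.list(l[[1]])
# [1] FALSE
class(l[[1]])
# [1] "numeric"
```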

Many functions that perform statistical analyses return output that is a list.

# Regression
m <- lm(breaks ~ wool + tension, 
        data = warpbreaks)
is.list(m)
[1] TRUE
s <- summary(m)
is.list(s)
[1] TRUE
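
Because these objects are lists, their elements can be inspected with names() and extracted with $. A minimal sketch (the object names fit and fit_summary are illustrative):

```r
fit <- lm(breaks ~ wool + tension,
          data = warpbreaks)

# Check that the fitted model stores an element called coefficients
"coefficients" %in% names(fit)
# [1] TRUE

# Extract an element by name
names(fit$coefficients)
# [1] "(Intercept)" "woolB"       "tensionM"    "tensionH"

# The summary is also a list; r.squared is one of its elements
fit_summary <- summary(fit)
is.numeric(fit_summary$r.squared)
# [1] TRUE
```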