8  Scripts

Author

Jarad Niemi

R scripts are the primary file type for running analyses in R. By convention, R scripts are plain text files that use the .R extension.

In this chapter, we will run through a few R scripts as examples to demonstrate a data science pipeline. On aspect that will not be represented very well is the feedback loops that occur during data analysis.

8.1 Code

Here is a basic script that utilizes R to calculate the area and circumference of a circle.

Copy-and-paste this following code into a new script and then run the code.

radius <- 3

area          <- pi * radius^2 
circumference <- 2 * pi * radius

area
circumference

For readability in scripts, it is often helpful to align code. Notice the area and circumference lines are aligned at the arrow. This helps the reader focus on the important aspects of the code: the new objects being constructed and the formulas for their calculation.

8.2 Comments

The previous script was sufficiently short and used informative variable names and thus did not require the user to include any comments. Most code should include sufficient comments so that a later user (who might be yourself) can understand what the code is doing. The script below includes comments to help the user understand what is being done.

Copy-and-paste this following code into a new script and then run the code.

# Cone dimensions
r <- 3 # radius
h <- 8 # height

pi * r^2 * h / 3 # volume

surface_area <- pi * r * (r + sqrt(h^2 + r^2))
base_area    <- pi * r^2

# lateral area
pi * r * sqrt(h^2 + r^2) # direct calculation
surface_area - base_area # by calculating difference

A comment is indicated by the # character and anything after that character is not executed. Thus we can put the comment on its own line or we can put the comment at the end of a line.

The first comment indicates to the user that we are working with a cone and not a cylinder (which would also have a radius and height).

The next comments indicate that we will use r and h as shorthand for radius and height. This allows the calculation lines to not be so wide and use formulas we have memorized (at some point in the past).

8.3 Sections

It is helpful to break your code up into sections. You can do this simply by creating a simple comments, e.g.

# Section

But this can often get overlooked quite easily. A more dramatic section uses a # followed by a repeating symbol. Here are a couple of examples

################################################################################
# Section
################################################################################
#-------------------------------------------------------------------------------
# Section
#-------------------------------------------------------------------------------
#===============================================================================
# Section
#===============================================================================

8.4 Metadata

Metadata can be included at the beginning of a script as a set of comments. I encourage you to always include the author (yourself), date(s) the file was written, and the purpose of the script. So metadata might look something like the following

# Author:  Jarad Niemi
# Date:    2024-01-24
# Purpose: Demonstrate the use of metadata

8.5 Packages

R packages used within an R script should come at the beginning of the file but after the metadata. For example,

# Author:  Jarad Niemi
# Date:    2024-01-24
# Purpose: Demonstrate the loading of a package immediately after the metadata
#-------------------------------------------------------------------------------
library("tidyverse")

8.6 Whitespace

R ignores whitespace, but you can use this to your advantage to create easy to read code. For example, compare the following two scripts that do exactly the same thing.

library("tidyverse")
ChickWeight|>filter(Time==0)|>group_by(Diet)|>summarize(mean=mean(weight),sd=sd(weight),n=n())
library("tidyverse")
ChickWeight |>
  filter(Time == 0) |>
  group_by(Diet) |>
  summarize(
    mean = mean(weight),
    sd   = sd(weight),
    n    = n())

Which of these two scripts is easier to read? Can you guess what the scripts do (even though we haven’t talked about any of this functionality)?

8.7 Style

There are a number of aspects of style that relate to writing a good R script. I suggest you take a look at the tidyverse style guide for advice.

In RStudio, you can utilize

  • Code > Reindent Lines
  • Code > Reformat Code

to help reformat code. From this post, Reindent Lines, as the name suggests, will only Reindent Lines while Reformat Code is more aggressive.

An alternative is to use the styler package to help with a variety of tasks related to making your code easily readable.