1 Synopsis

You can add notes about your code in comments
There are a number of different types of data in R:
- Numerics, aka numbers
- Characters, aka text strings
- Logicals, aka Booleans, aka true/false
- Nulls, aka missing data
Data can be assigned to variables for easier use
An ordered collection of data points of the same type is called a vector
Actions can be performed using functions
Data can be converted to different types

2 Comments

If you’re brand new to R and coding, the first thing to know about is comments.

These are lines in your program that do absolutely nothing: they are ignored when your program is run. But they are very useful for making notes for yourself and, especially, for others who will have to read your code.

You write a comment by starting it with a hash “#”. Everything from the hash to the end of the line is ignored:

# This is a comment.
# It will be ignored by your program.
# But you can use it to write notes that explain what you are doing.
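
A comment does not have to take up a whole line. As a small sketch (the sum is only there to give the comment something to follow), anything after a hash on a line of code is ignored as well:

```r
# A comment on its own line
5 + 7  # a comment after some code: R ignores everything from the '#' onwards
## [1] 12
```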

If you are using RStudio, the keyboard shortcuts to comment and uncomment lines are:

Windows/Linux: Ctrl+Shift+C

macOS: Cmd+Shift+C

Don’t underestimate the power of comments! Just because your code makes sense to you now, while you’re writing it, doesn’t mean it always will. It’s surprisingly common to re-open a piece of code you haven’t worked on for months and find you have completely forgotten what your thought process was, so use your comments as notes to yourself! Similarly, if someone else needs to use your code they will first need to work out what it does, which is an almost impossible task if you don’t explain yourself.

3 Numerics

As the name suggests, these are numbers. If you type a plain number and run it, R will output it to the console. On this page, console output is shown on the lines starting with “## [1]”:

# This is just a number
27.03

## [1] 27.03

# You can use numbers to do arithmetic
5 + 7

## [1] 12

The [1] is the position of the first value shown on that line of output. Here there is only one value in the output, so the line starts with [1].
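
Addition is only one of the arithmetic operators. The following sketch lists the other common ones built into R, with the result of each noted in a comment:

```r
# Subtraction, multiplication and division
7 - 2     # 5
3 * 4     # 12
10 / 4    # 2.5
# Exponentiation (2 to the power of 3)
2 ^ 3     # 8
# Remainder after division, and whole-number division
10 %% 3   # 1
10 %/% 3  # 3
```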

4 Variables

A number typed on its own is printed and then forgotten. To use it again later you need to assign it to a variable:

# Assigning numbers to variables
x <- 5
y <- 7

In the above code snippet, the number “5” is being assigned to the variable “x” and the number “7” is being assigned to the variable “y”. The arrow notation “<-” tells you that whatever is on the right is being assigned to the variable on the left. R also accepts a right-pointing arrow (“5 -> x”), which assigns the value on the left to the variable on the right, but it is rarely used and the left-pointing form is the convention.

Notice that when you assign a number to a variable it is no longer output to the console; there was no “## [1]” line below the above snippet. Variable assignment suppresses output!
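
If you want to assign a value and see it at the same time, one option is to wrap the whole assignment in round brackets. A minimal sketch (the variable name w is only for illustration):

```r
# Plain assignment: nothing is printed
w <- 5
# Wrapping the assignment in round brackets assigns AND prints the value
(w <- 5 + 7)
## [1] 12
```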

You can use these numbers by calling the variables:

# You can use variables to do arithmetic by calling them
# If you assign the answer to a third variable (z), the output will be
# suppressed:
z <- x + y
# If you don't assign the answer to a variable, it will output to the console:
x + y

## [1] 12

4.1 The Equals Sign

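Sometimes, instead of using the arrow notation, R programmers will use an equals sign (=) when assigning values to variables. For simple assignments like the ones below this behaves the same as the leftwards arrow. The two are not interchangeable everywhere, though: inside the round brackets of a function call, “=” names an argument rather than creating a variable. Common R style guides (for example the tidyverse style guide) recommend “<-”, but people who are used to coding in other languages such as Python are often more used to the equals sign, so you will see it frequently.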

# For a simple assignment, the equals sign works like the leftwards arrow
x = 6
# You may find this to be a better notation
x = x + 1
# Of course, the answer will be the same
x

## [1] 7

Here’s something that often confuses beginner coders: the statement x = x + 1. In mathematics, this statement doesn’t make any sense! There’s no value of x that will satisfy the equation! However, in programming, this statement is not an equation but an assignment: you are giving the variable x a new value equal to its current value plus one. Take a look at the above snippet again:

# Give the variable 'x' a value of six
x = 6
# Give the variable 'x' a value that is equal to the value of 'x', plus one
x = x + 1
# 'x' should now equal seven. Check that it does by calling it and outputting
# its value to the console:
x

## [1] 7
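
One place where the equals sign and the arrow behave differently is inside the round brackets of a function call (functions are introduced further down this page). The sketch below uses the built-in round() function; the variable names rounded and value are only there for illustration. Inside a call, “=” attaches a value to one of the function's named arguments and creates no variable, whereas “<-” creates a variable as a side effect:

```r
# 'digits = 1' names an argument of round(); no variable called 'digits' is made
rounded <- round(2.567, digits = 1)
rounded
## [1] 2.6
# Check the current environment only: there is no 'digits' variable
exists("digits", inherits = FALSE)
## [1] FALSE

# With the arrow, a variable IS created while the call runs
rounded <- round(value <- 2.567)
value
## [1] 2.567
```

Because of this side effect, assigning with an arrow inside a function call is usually best avoided.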

4.2 Variable Names

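Note: there are rules that have to be followed when making variable names:

- They may only contain letters, numbers, full stops and underscores (so no spaces)
- They may not start with a number or an underscore
- They may not be one of R’s reserved words, such as TRUE, NULL or if
- They are case-sensitive: height and Height are different variables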

Additionally, there are a few conventions that programmers tend to adhere to:

- Variable names should be all lowercase (unless it makes sense to use uppercase)
- Words should be separated with underscores
- If the value has a unit, show it in the variable name

The following are GOOD examples of variable names:

# Lowercase
people <- 34
# Use underscores to separate words
final_answer <- 100
# A person's height in centimetres
height_cm <- 170
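
To see the naming rules in action without triggering an error, one option is base R's make.names() function, which takes a piece of text and returns the nearest valid variable name. This is only a sketch for checking names; the example strings and the variables score and Score are made up:

```r
# A valid name is returned unchanged
make.names("height_cm")
## [1] "height_cm"
# Starts with a number and contains a space: R alters it to make it valid
make.names("2nd place")
## [1] "X2nd.place"
# Names are case-sensitive: these are two different variables
score <- 10
Score <- 20
score == Score
## [1] FALSE
```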

5 Characters (aka Strings)

Characters are pieces of text: sequences of letters, digits, spaces and symbols. As such they will often be referred to as ‘strings’. You write a string by enclosing the text in single or double quotation marks:

# This is a string, representing text
my_string <- "Hello, World!"

Note: a number can be written as a character, but if you do this it will be interpreted by your program as text, not as a number! This is something that confuses many beginner coders:

# This is a string, so you will not be able to use it to do maths!
x <- "5"
# If you try to run the following code, you will get an error:
x + 2
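
To see the error without stopping your script, you could wrap the calculation in tryCatch(), which runs some code and lets you decide what happens if it fails. This is a sketch rather than something you need at this stage; the names five_text and result are illustrative. In an English-language R session the captured message reads “non-numeric argument to binary operator”:

```r
five_text <- "5"
# Try the addition; if it fails, keep the error message instead of stopping
result <- tryCatch(
  five_text + 2,
  error = function(e) conditionMessage(e)
)
print(result)
```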

6 Logicals (aka Booleans)

These are TRUE and FALSE:

passed_test <- TRUE
largest_prime_number <- FALSE

Note that the values “TRUE” and “FALSE” above are not strings because we have not used quotation marks. They are treated as their own data type by R.

Booleans become useful when we start to use “if statements” later on. Essentially, they are used to test things: we can ask R if something is true or not.
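
Logicals usually appear as the answer to a question rather than being typed in directly. As a short sketch (the variable name score and the pass mark of 50 are invented for illustration), comparison operators return TRUE or FALSE, and the operators &, | and ! combine or flip them:

```r
score <- 72
# Is the score at least 50?
passed_test <- score >= 50
passed_test
## [1] TRUE
# Equality is tested with a double equals sign
score == 100
## [1] FALSE
# AND (&), OR (|) and NOT (!)
passed_test & score == 100
## [1] FALSE
passed_test | score == 100
## [1] TRUE
!passed_test
## [1] FALSE
```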

7 Nulls

Nulls, as the name suggests, represent nothing: a missing, undefined or empty value. It may seem weird that you can set a variable to nothing, but it becomes useful when you’re dealing with missing data.

In R, there are three such values:

- NaN - “not a number”, the result of an undefined calculation such as 0/0. This is treated as a number/numeric.
- NA - “not available”, R’s marker for missing data. On its own this is treated as a logical (so you have TRUE, FALSE and NA as the three logical values), but it takes on the type of whatever vector it is placed in.
- NULL - the empty object. This is treated as a data type of its own.

not_a_number <- NaN
not_available <- NA
null_character <- NULL
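
As a brief sketch of how these values behave in practice, base R provides a matching test function for each of them:

```r
# NaN appears when a calculation has no defined answer
0 / 0
## [1] NaN
# NA is contagious: arithmetic on a missing value gives a missing value
NA + 1
## [1] NA
# Test for each kind with the matching is.*() function
is.nan(0 / 0)
## [1] TRUE
is.na(NA)
## [1] TRUE
is.null(NULL)
## [1] TRUE
# NaN also counts as "not available", and NULL has length zero
is.na(NaN)
## [1] TRUE
length(NULL)
## [1] 0
```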

8 Built-In Functions

Functions are bits of code that can be used again and again just by calling the function’s name. You can tell that a function is being used because its name will be followed by (round brackets).

The first function you should know about is the print() function. This function takes a value (here, a string) and outputs it to the console:

print("Hello, World!")

## [1] "Hello, World!"

See what happened there? We called the function print() by using its name and round brackets. Then we told it what to print - the string 'Hello, World!'. This string is the argument of the function. You pass an argument to a function.
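
Other built-in functions are called in the same way. The sketch below shows three of them; the last call passes two arguments, the second one by name:

```r
# One argument: the square root of 16
sqrt(16)
## [1] 4
# One argument: the number of characters in a string
nchar("Hello, World!")
## [1] 13
# Two arguments, the second passed by name: round to 2 decimal places
round(3.14159, digits = 2)
## [1] 3.14
```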

Let’s look at a different function, typeof(). This can take any piece of data as an argument and will return (i.e. output) the type of data that the argument is. We can then print that answer to the console with the print() function. Take a look:

# The type of a number depends on how it is stored in the computer's memory. A
# "double" means it is stored as a double-precision value.
print(typeof(7))
# As mentioned earlier, a string is made up of characters
print(typeof("Hello, World!"))
# A logical
print(typeof(TRUE))
# A numeric null, stored as a double-precision value
print(typeof(NaN))
# A logical null
print(typeof(NA))
# A pure null
print(typeof(NULL))

## [1] "double"
## [1] "character"
## [1] "logical"
## [1] "double"
## [1] "logical"
## [1] "NULL"
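
The comment about storage can be made concrete. As a sketch, adding an L suffix asks R to store a whole number as an integer rather than a double; both still count as numerics, which the built-in is.numeric() function confirms:

```r
# A plain number is stored as a double
typeof(7)
## [1] "double"
# The L suffix stores it as an integer instead
typeof(7L)
## [1] "integer"
# Both are numerics
is.numeric(7)
## [1] TRUE
is.numeric(7L)
## [1] TRUE
```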

9 Vectors

A very common built-in function in R is c(), short for “combine” (it is also often described as “concatenate”); the name is a single letter because it is used so often. It joins all the arguments that are passed to it into one object. This object, made up of values of the same type in order, is called a vector.

# A vector of numerics
x <- c(4, 5, 6, 7, 8)
# A vector of strings
y <- c("One", "Two", "Three")
# A vector of logicals
z <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

The typeof() function returns the data type of the elements of the vector:

# A vector of numerics
x <- c(4, 5, 6, 7, 8)
print(typeof(x))

## [1] "double"
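
Once values are in a vector you can work with all of them at once. A minimal sketch of some common operations (the vector name marks is illustrative):

```r
marks <- c(4, 5, 6, 7, 8)
# How many elements are in the vector?
length(marks)
## [1] 5
# Pick out an element by position (R counts from 1)
marks[2]
## [1] 5
# Arithmetic is applied to every element at once
marks * 2
## [1]  8 10 12 14 16
# Some functions summarise the whole vector
sum(marks)
## [1] 30
```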

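Note: every element of a vector must have the same type. If you combine elements with different data types into the same vector, R coerces them all to the most flexible type present. When one of them is a string, as below, everything becomes a character: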

# A mixed vector
x <- c(1, "Two", TRUE)
print(x)

## [1] "1"    "Two"  "TRUE"

print(typeof(x))

## [1] "character"
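
Coercion does not always end in characters; it depends on which types are mixed. A small sketch of the ordering R uses (logical, then numeric, then character):

```r
# Logical + numeric: TRUE/FALSE become 1/0, and the vector is numeric
typeof(c(1, TRUE, FALSE))
## [1] "double"
c(1, TRUE, FALSE)
## [1] 1 1 0
# Anything + character: everything becomes a character
typeof(c(TRUE, "Two"))
## [1] "character"
```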

10 Type Conversion

The last functions we will learn about on this page are the as.character() and as.numeric() functions. These convert numerics to characters and characters to numerics, respectively:

# Start with a number
x <- 15
# Convert it to a character
x <- as.character(x)
# Check that it is indeed now a character
print(typeof(x))

## [1] "character"

# Now take a character that is the text version of a number
y <- "15"
# Convert it to a numeric
y <- as.numeric(y)
# Check that it is indeed now a double-precision numeric
print(typeof(y))

## [1] "double"
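
Conversion only works when the text actually looks like a number. As a sketch of what happens otherwise, and of a couple of related conversions (the variable name not_a_number is illustrative):

```r
# Text that is not a number cannot be converted: the result is NA
# (R also raises a warning, silenced here with suppressWarnings())
not_a_number <- suppressWarnings(as.numeric("fifteen"))
not_a_number
## [1] NA
# Logicals convert to 1 and 0
as.numeric(TRUE)
## [1] 1
# Text can be converted to a logical as well
as.logical("TRUE")
## [1] TRUE
```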

11 Summary

You can add notes about your code in comments
There are a number of different types of data in R:
- Numerics, aka numbers
- Characters, aka text strings
- Logicals, aka Booleans, aka true/false
- Nulls, aka missing data
Data can be assigned to variables for easier use
An ordered collection of data points of the same type is called a vector
Actions can be performed using functions
Data can be converted to different types

Introduction to R: Comments, Variables and Data Types