1 Concatenate Strings

There are a number of different ways to combine strings depending on the exact functionality you want:

Use paste() or paste0() if you want a single combined string to be returned
Use c() if you want a vector that keeps each string as a separate element
Use cat() if you want nothing useful to be returned (i.e. you just want the string printed to the screen)

1.1 Return a Character Object

paste() combines strings by adding a space in between them. The result (which is a character object) can then be assigned to a variable:

st <- paste("Hello", "World")
print(st)

## [1] "Hello World"

The default behaviour of adding a space in between each string can be changed by using the sep keyword argument. For example, you can have a comma and a space between each string by doing the following:

st <- paste("Hello", "World", sep = ", ")
print(st)

## [1] "Hello, World"

If you want no space between your strings, you can either use paste() with an empty separator (sep = "") or use paste0(), which does this by default:

st <- paste("Hello", "World", sep = "")
print(st)

## [1] "HelloWorld"

st <- paste0("Hello", "World")
print(st)

## [1] "HelloWorld"
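Both functions are also vectorised and accept a collapse argument, which the examples above do not use. The following is a minimal sketch of that behaviour in base R; the variable names are purely illustrative.

```r
# paste() recycles shorter arguments, giving one result per element
labels <- paste("sample", 1:3, sep = "_")
print(labels)  # "sample_1" "sample_2" "sample_3"

# collapse joins the elements of a vector into one single string
joined <- paste(c("a", "b", "c"), collapse = ", ")
print(joined)  # "a, b, c"
```

Use sep to control what goes between arguments and collapse to control what goes between the elements of the result.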

1.2 Return a Vector

The c() command (short for “combine” or “concatenate”) is similar to paste() in that it also combines objects and allows you to assign the result to a variable. The difference is that c() keeps each string as a separate element of a vector (here, a character vector of length two), whereas paste() joins them into a single string:

st <- c("Hello", "World")
print(st)

## [1] "Hello" "World"

1.3 Print with no Return

The cat() command is short for “concatenate and print”. Like paste(), it joins its arguments with a single space by default and accepts the sep keyword argument to change this default:

cat("Hello", "World")

## Hello World

The difference between paste() and cat() is that cat() does not return a useful value (it returns NULL invisibly); it only prints to the console. This means that its output cannot be saved to a variable like this:

a <- cat("Hello", "World")

The assignment above still prints Hello World to the console, but the code below prints NULL because cat() returned nothing useful and so no string was saved to the a variable:

print(a)

## NULL
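If you do need the text that cat() would print, one option is capture.output() from the utils package, which is attached by default in a standard R session. This is only a sketch; in most cases paste() is the simpler choice when you want the string itself.

```r
# cat() does not add a line break by itself, so include "\n" if you want one
cat("Hello", "World\n")

# capture.output() collects what cat() would have printed to the console
captured <- capture.output(cat("Hello", "World", sep = ", "))
print(captured)  # "Hello, World"
```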

2 Split Strings

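Use strsplit() to do the opposite of concatenation. It returns a list with one element per input string, which is why the output below starts with [[1]]: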

st <- "Split the words in a sentence."
st <- strsplit(st, " ")
print(st)

## [[1]]
## [1] "Split"     "the"       "words"     "in"        "a"         "sentence."
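Because the result is a list, you usually need [[1]] to get at the words themselves. The sketch below shows this, and also that paste() with collapse reverses the split; fixed = TRUE is used so that the separator is treated as plain text rather than as a regular expression.

```r
st <- "Split the words in a sentence."

# Take the first (and only) list element to get a character vector
words <- strsplit(st, " ", fixed = TRUE)[[1]]
print(words[2])       # "the"
print(length(words))  # 6

# Joining the words with a space gives back the original string
rejoined <- paste(words, collapse = " ")
print(identical(rejoined, st))  # TRUE
```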

3 Search, Find, Lookup

You can:

Search a string to see if it contains a certain letter or sub-string
Find where that letter or sub-string is located within the string
Look up what letter or sub-string is at a certain location within a string

3.1 Search for a Character or Sub-string

Does a string contain the letter(s) you are looking for? Search the string and return a Boolean (true or false) to find out. This is done with the grepl() command:

# Does the letter "o" appear in "Hello World"?
grepl("o", "Hello World", fixed = TRUE)

## [1] TRUE

In case you were wondering, this function’s name is short for “globally search for a regular expression and print matching lines - logical”. You can now see why we shorten it!

You may have noticed that we set the keyword argument fixed to TRUE - this causes the grepl() function to search for exactly the string that was provided. If this is omitted the default behaviour is to treat the string being searched for as a regular expression, which may create unexpected results.
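As an illustration of those unexpected results, consider searching for a full stop. In a regular expression "." matches any single character, so the two calls below disagree:

```r
# Without fixed = TRUE the pattern "." is a regular expression (any character)
regex_match <- grepl(".", "Hello World")
print(regex_match)    # TRUE

# With fixed = TRUE the pattern is a literal full stop, which is not present
literal_match <- grepl(".", "Hello World", fixed = TRUE)
print(literal_match)  # FALSE
```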

3.1.1 Search for a Character or Sub-string at the Start or End of a String

Check whether a string starts or ends with a given sub-string by using startsWith() and endsWith():

startsWith("alphabet", "a")

## [1] TRUE

endsWith("Filename.csv", ".csv")

## [1] TRUE
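Both functions are vectorised, so they can be used to filter a vector of strings. The filenames below are made up for the sake of the example:

```r
# Hypothetical filenames, used for illustration only
files <- c("data.csv", "notes.txt", "results.csv")

# endsWith() returns one TRUE/FALSE per element
is_csv <- endsWith(files, ".csv")
print(is_csv)     # TRUE FALSE TRUE

# Use the logical vector to keep only the matching filenames
csv_files <- files[is_csv]
print(csv_files)  # "data.csv" "results.csv"
```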

3.2 Find a Character or Sub-string

Now that we know our string contains the letter we are looking for, get the indices (positions) where that certain character is by using the gregexpr() command:

st <- "Hello World"
idx_o <- unlist(gregexpr(pattern = "o", st))
print(idx_o)

## [1] 5 8

In this example, the letter “o” can be found at both position 5 and 8 in the string “Hello World”.
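Two related details are worth knowing: gregexpr() reports -1 when there is no match, and regexpr() returns only the first match. A short sketch (as.integer() is used here just to drop the extra attributes that regexpr() attaches):

```r
st <- "Hello World"

# No "z" in the string, so the result is -1 rather than an empty vector
idx_z <- unlist(gregexpr(pattern = "z", st, fixed = TRUE))
print(idx_z)    # -1

# regexpr() stops at the first match
first_o <- as.integer(regexpr("o", st, fixed = TRUE))
print(first_o)  # 5
```

Checking for -1 before using the indices avoids surprising results further down the line.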

3.3 Lookup the Characters at Particular Indices

You can see what characters are at a certain location within a string by using the substr() command. This takes the start and stop keyword arguments, which are pretty self-explanatory (they tell the command where to start and stop looking):

st <- "Hello World"
sub <- substr(st, start = 6, stop = 8)
print(sub)

## [1] " Wo"

You can look up all the characters from a given start point to the end of a string by using the total number of characters in the string (nchar()) as the stop point:

sub <- substr(st, start = 6, stop = nchar(st))
print(sub)

## [1] " World"

Similarly, you can look up all the characters from the start of a string until a given stop point by using “1” as the start point:

sub <- substr(st, start = 1, stop = 8)
print(sub)

## [1] "Hello Wo"
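Base R also has substring(), whose last argument defaults to a very large number, so the stop point can be left out when you want everything up to the end. Splitting on an empty string is another way to index individual characters. A small sketch:

```r
st <- "Hello World"

# substring() runs to the end of the string when no stop point is given
tail_part <- substring(st, 7)
print(tail_part)  # "World"

# Split into single characters, then index them like any other vector
chars <- strsplit(st, "")[[1]]
picked <- chars[c(1, 7)]
print(picked)     # "H" "W"
```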

4 Delete, Overwrite, Replace

You can:

Delete all the characters before or after a given point in a string
Overwrite the characters at a given location within a string
Replace all occurrences of a certain letter or sub-string within a string with a given replacement

4.1 Delete Characters

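This is again done using the substr() function, but this time the start or stop point is the position of a character (found with gregexpr()) rather than a hard-coded index. Note that the character itself is kept in both examples below.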

Remove the text before a certain character:

st <- "Hello World"
idx_o <- unlist(gregexpr(pattern = "o", st))
sub <- substr(st, start = idx_o[1], stop = nchar(st))
print(sub)

## [1] "o World"

Remove the text after a certain character:

sub <- substr(st, start = 1, stop = idx_o[1])
print(sub)

## [1] "Hello"

As a more practical example, here’s how to remove the extension from a filename:

# Remove extension from filename
filename <- "My_File.txt"
idx <- unlist(gregexpr(pattern = "\\.", filename))
filename_root <- substr(filename, start = 1, stop = idx[1] - 1)
print(filename_root)

## [1] "My_File"
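The approach above cuts at the first full stop, which is not what you want for a name such as "archive.tar.gz". Two possible alternatives are sketched below: a regular expression anchored to the end of the string, and file_path_sans_ext() from the tools package that ships with R. The filenames are examples only.

```r
# Remove only the last extension by anchoring the pattern to the end ($)
filename <- "archive.tar.gz"
root_regex <- sub("\\.[^.]*$", "", filename)
print(root_regex)  # "archive.tar"

# The tools package provides a helper for the same job
root_tools <- tools::file_path_sans_ext("My_File.txt")
print(root_tools)  # "My_File"
```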

4.2 Overwrite Characters

Use the str_sub() command from the stringr library:

library(stringr)

st <- "Hello World"
str_sub(st, 6, 6) <- "_"
# Overwrite the single character at position 8
str_sub(st, 8, 8) <- "O"
print(st)

## [1] "Hello_WOrld"
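If you would rather not depend on stringr, base R's substr() also has a replacement form. It overwrites characters in place and never changes the length of the string:

```r
st <- "Hello World"

# Overwrite the character at position 6 (the space) with an underscore
substr(st, 6, 6) <- "_"
print(st)  # "Hello_World"
```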

4.3 Replace Characters

Use the gsub() command:

st <- "Hello World"
st <- gsub("ello", "i", st)
print(st)

## [1] "Hi World"
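gsub() replaces every match, whereas its sibling sub() replaces only the first one. Both treat the pattern as a regular expression unless fixed = TRUE is set. A brief comparison:

```r
st <- "Hello World"

# sub() replaces the first match only
first_only <- sub("o", "0", st)
print(first_only)   # "Hell0 World"

# gsub() replaces all matches
all_matches <- gsub("o", "0", st)
print(all_matches)  # "Hell0 W0rld"

# fixed = TRUE treats "." as a literal full stop, not as "any character"
no_dots <- gsub(".", "", "3.14", fixed = TRUE)
print(no_dots)      # "314"
```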

4.4 Trim White Space

White space is what is created by the space bar or the tab key. You can trim a string by removing any white space at its start and end using the trimws() function:

st <- "  Hello World  "
print(trimws(st))

## [1] "Hello World"
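trimws() trims both ends by default, but its which argument lets you trim one side only. A short sketch:

```r
st <- "  Hello World  "

# Trim only the leading white space
left_trimmed <- trimws(st, which = "left")
print(left_trimmed)   # "Hello World  "

# Trim only the trailing white space
right_trimmed <- trimws(st, which = "right")
print(right_trimmed)  # "  Hello World"
```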

5 Change the Case

A string can be UPPERCASE, lowercase, Sentence case or Title Case:

st <- "The quick, brown Fox jumped over the lazy Dog"
print(str_to_upper(st))

## [1] "THE QUICK, BROWN FOX JUMPED OVER THE LAZY DOG"

print(str_to_lower(st))

## [1] "the quick, brown fox jumped over the lazy dog"

print(str_to_sentence(st))

## [1] "The quick, brown fox jumped over the lazy dog"

print(str_to_title(st))

## [1] "The Quick, Brown Fox Jumped Over The Lazy Dog"

These functions come from the stringr library, so library(stringr) needs to have been run first (as was done in section 4.2).
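Upper and lower case are also available in base R via toupper() and tolower(). Base R has no built-in sentence case, but it can be approximated by combining these with substr(), as in this sketch:

```r
st <- "The quick, brown Fox"

upper <- toupper(st)
print(upper)     # "THE QUICK, BROWN FOX"

lower <- tolower(st)
print(lower)     # "the quick, brown fox"

# Approximate sentence case: capitalise the first letter, lower-case the rest
sentence <- paste0(toupper(substr(st, 1, 1)), tolower(substring(st, 2)))
print(sentence)  # "The quick, brown fox"
```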

6 Formatted Output

If you want to insert a variable into a string using a specific format, use the sprintf() function (string print formatted). Despite its name, it returns the formatted string rather than printing it. Use the “%” character to indicate where you want the variable to be inserted, followed by a letter that indicates what type of format to use.

6.1 Number formats

When working with numbers, use the “f” for ‘floating-point’ in conjunction with “%”:

x <- 1
sprintf("x = %f", x)

## [1] "x = 1.000000"

Use a decimal point and a number to set the number of decimal places you see:

x <- 1
sprintf("x = %.2f", x)

## [1] "x = 1.00"

Use a number before the decimal point to set the ‘width’ of the number, i.e. the total number of characters it takes up (including the decimal point and the decimal places). If the width is set to be wider than the number is long, the extra space will be filled up with blanks:

x <- 1
sprintf("x = %8.2f", x)

## [1] "x =     1.00"
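A few other format specifiers follow the same pattern. The sketch below uses “d” for whole numbers, a leading zero in the width to pad with zeros instead of blanks, and “%%” to produce a literal percent sign:

```r
# "d" formats integers (note the L suffix, which makes 3 an integer)
count_text <- sprintf("%d items", 3L)
print(count_text)    # "3 items"

# A leading 0 in the width pads with zeros rather than blanks
padded <- sprintf("%03d", 7L)
print(padded)        # "007"

# "%%" inserts a literal percent sign
percent_text <- sprintf("%.1f%%", 12.345)
print(percent_text)  # "12.3%"
```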

6.2 String formats

Instead of the “f”, use an “s” together with the “%”.

You can print to the console immediately:

st <- "Hello world"
sprintf("Output text is: %s", st)

## [1] "Output text is: Hello world"

…or you can write the value to a variable and print it to the console at a later stage:

st <- "Hello world"
st <- sprintf("Output text is: %s", st)
print(st[1])

## [1] "Output text is: Hello world"

6.3 Using formatted output with a dataframe

If you extract a single string from any vector or dataframe, it will be treated in the same way by the sprintf() function:

df <- data.frame(
    name = c("Alfa", "Bravo", "Charlie")
)
sprintf("Output text is: %s", df$name[1])

## [1] "Output text is: Alfa"
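sprintf() is vectorised, so you can also pass whole columns and get one formatted string per row. In this sketch the score column is invented for illustration, and stringsAsFactors = FALSE is set explicitly so that the names stay as plain strings on older versions of R:

```r
# Example dataframe; the score column is hypothetical
df <- data.frame(
    name = c("Alfa", "Bravo", "Charlie"),
    score = c(1.5, 2.25, 3),
    stringsAsFactors = FALSE
)

# One formatted string is produced for each row
rows <- sprintf("%s scored %.2f", df$name, df$score)
print(rows)
```

This should give "Alfa scored 1.50", "Bravo scored 2.25" and "Charlie scored 3.00".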

7 Unicode and Emoji

Unicode characters can be written by using the \U escape followed by their hexadecimal Unicode code point. Some can then simply be printed:

st1 <- "\U03BC"  # mu
st2 <- "\U03B5"  # epsilon
st3 <- "\U03C9"  # omega
print(c(st1, st2, st3))

## [1] "μ" "ε" "ω"
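You can convert between a character and its code point with the base R functions utf8ToInt() and intToUtf8(). The sketch below also shows that a single Unicode character can take up more than one byte:

```r
mu <- "\U03BC"

# Code point as a decimal integer (hexadecimal 03BC is 956)
code <- utf8ToInt(mu)
print(code)     # 956

# Convert the code point back into a character
mu_again <- intToUtf8(code)
print(utf8ToInt(mu_again) == code)  # TRUE

# The character is stored as two bytes in UTF-8
n_bytes <- nchar(mu, type = "bytes")
print(n_bytes)  # 2
```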

Others (e.g. emojis) may not print correctly on every system with print(); the utf8_print() function from the utf8 library handles them more reliably:

st1 <- "\U2705"  # white heavy check mark
st2 <- "\U274c"  # cross mark
st3 <- "\U0001f609"  # winking face
utf8::utf8_print(c(st1, st2, st3))

## [1] "✅" "❌" "😉"
