There are a number of different ways to combine strings depending on the exact functionality you want:
- paste() or paste0() if you want a character object to be returned
- c() if you want a vector to be returned
- cat() if you want nothing to be returned (i.e. you just want the string to print to the screen)
paste() combines strings by adding a space in between them. The result (which is a character object) can then be assigned to a variable:
st <- paste("Hello", "World")
print(st)
## [1] "Hello World"
The default behaviour of adding a space in between each string can be changed by using the sep keyword argument. For example, you can have a comma and a space between each string by doing the following:
st <- paste("Hello", "World", sep = ", ")
print(st)
## [1] "Hello, World"
If you want no space between your strings, you can either use paste() with an empty separator (sep = "") or use paste0(), which does this by default:
st <- paste("Hello", "World", sep = "")
print(st)
## [1] "HelloWorld"
st <- paste0("Hello", "World")
print(st)
## [1] "HelloWorld"
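paste() also works on whole vectors. As a minimal sketch (the variable names here are illustrative and not part of the examples above), sep is applied between corresponding elements, while the collapse argument then joins the elements of the result into one string:

```r
first <- c("Hello", "Howdy")
second <- c("World", "Earth")

# sep joins corresponding elements, giving one string per pair
pairs <- paste(first, second, sep = " ")
print(pairs)
## [1] "Hello World" "Howdy Earth"

# collapse then joins the whole result into a single string
joined <- paste(first, second, sep = " ", collapse = "; ")
print(joined)
## [1] "Hello World; Howdy Earth"
```

This is handy when a vector of values needs to end up in a single sentence or label.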
The c() command (short for “combine”, often described as “concatenate”) is similar to paste() in that it also combines objects and allows you to assign the result to a variable, but c() returns a vector with one element per input instead of a single combined string:
st <- c("Hello", "World")
print(st)
## [1] "Hello" "World"
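One way to see the difference between the two results is to count their elements. The sketch below uses illustrative variable names (combined and separate) that do not appear elsewhere in this text:

```r
combined <- paste("Hello", "World")  # one string
separate <- c("Hello", "World")      # a vector of two strings

length(combined)
## [1] 1
length(separate)
## [1] 2

# nchar() counts the characters in each element
nchar(combined)
## [1] 11
nchar(separate)
## [1] 5 5
```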
The cat() command (short for “conCATenate”) also joins its arguments together, also uses a single space by default and can also use the sep keyword argument to change this default:
cat("Hello", "World")
## Hello World
The difference between paste() and cat() is that cat() does not return the string (it returns NULL invisibly); it only prints to the console. This means that its output cannot be saved to a variable like this:
a <- cat("Hello", "World")
The code below prints NULL because the cat() function did not return the string, and so only NULL was saved to the a variable:
print(a)
## NULL
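If the text is needed both on screen and in a variable, one option is to build it with paste() first and then hand the result to cat(). This is only a sketch (msg is an illustrative name); note that cat() does not add a line break by itself, so "\n" is appended explicitly:

```r
# Build the string once so that it can be reused
msg <- paste("Hello", "World")

# Print it to the console, followed by a line break
cat(msg, "\n", sep = "")
## Hello World

# The variable still holds the string
print(msg)
## [1] "Hello World"
```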
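Use strsplit() to do the opposite of concatenation. It returns a list with one element per input string, which is why the output below starts with [[1]]: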
st <- "Split the words in a sentence."
st <- strsplit(st, " ")
print(st)
## [[1]]
## [1] "Split" "the" "words" "in" "a" "sentence."
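Because the result is a list, the individual words are usually easier to work with once the first list element has been extracted. A short sketch, with words and rejoined as illustrative names:

```r
st <- "Split the words in a sentence."

# Take the first (and only) list element to get a plain character vector
words <- strsplit(st, " ")[[1]]
length(words)
## [1] 6
words[2]
## [1] "the"

# paste() with collapse reverses the split
rejoined <- paste(words, collapse = " ")
identical(rejoined, st)
## [1] TRUE
```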
Does a string contain the letter(s) you are looking for? Search the string and return a Boolean (true or false) to find out. This is done with the grepl() command:
# Does the letter "o" appear in "Hello World"?
grepl("o", "Hello World", fixed = TRUE)
## [1] TRUE
In case you were wondering, this function’s name is short for “globally search for a regular expression and print matching lines - logical”. You can now see why we shorten it!
You may have noticed that we set the keyword argument fixed to TRUE - this causes the grepl() function to search for exactly the string that was provided. If this is omitted, the default behaviour is to treat the string being searched for as a regular expression, which may create unexpected results.
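As an illustration of such an unexpected result, consider searching for a full stop. In a regular expression "." matches any character, so the two calls below disagree:

```r
# As a regular expression, "." matches any character
grepl(".", "Hello World")
## [1] TRUE

# With fixed = TRUE, "." only matches a literal full stop
grepl(".", "Hello World", fixed = TRUE)
## [1] FALSE
grepl(".", "Hello World.", fixed = TRUE)
## [1] TRUE
```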
Does a string start with or end with a certain substring? Use startsWith() and endsWith():
startsWith("alphabet", "a")
## [1] TRUE
endsWith("Filename.csv", ".csv")
## [1] TRUE
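Both functions are case-sensitive and work on whole vectors at once. The sketch below uses made-up file names to show this, along with one possible way of ignoring case:

```r
files <- c("data.csv", "notes.txt", "DATA.CSV")

# One TRUE/FALSE per element; the upper-case name does not match
is_csv <- endsWith(files, ".csv")
print(is_csv)
## [1]  TRUE FALSE FALSE

# Lower-casing first makes the check case-insensitive
is_csv_any_case <- endsWith(tolower(files), ".csv")
print(is_csv_any_case)
## [1]  TRUE FALSE  TRUE
```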
Now that we know our string contains the letter we are looking for, get the indices (positions) where that character occurs by using the gregexpr() command:
st <- "Hello World"
idx_o <- unlist(gregexpr(pattern = "o", st))
print(idx_o)
## [1] 5 8
In this example, the letter “o” can be found at both position 5 and 8 in the string “Hello World”.
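When the pattern does not occur at all, gregexpr() reports a position of -1 rather than an empty result, so it can be worth checking for this before using the indices. A minimal sketch (idx_z is an illustrative name):

```r
st <- "Hello World"
idx_z <- unlist(gregexpr(pattern = "z", st))
print(idx_z)
## [1] -1

# Guard against the no-match case before using the indices
if (idx_z[1] == -1) {
  cat("No match found\n")
}
## No match found
```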
You can see what characters are at a certain location within a string by using the substr() command. This takes the start and stop keyword arguments, which tell the command where to start and stop looking (both positions are included in the result):
st <- "Hello World"
sub <- substr(st, start = 6, stop = 8)
print(sub)
## [1] " Wo"
You can look up all the characters from a given start point to the end of a string by using the total number of characters in the string (nchar()) as the stop point:
sub <- substr(st, start = 6, stop = nchar(st))
print(sub)
## [1] " World"
Similarly, you can look up all the characters from the start of a string until a given stop point by using “1” as the start point:
sub <- substr(st, start = 1, stop = 8)
print(sub)
## [1] "Hello Wo"
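Base R also has the closely related substring() function, whose end position defaults to the end of the string. The sketch below shows it next to a way of taking the last few characters with substr() and nchar():

```r
st <- "Hello World"

# From position 7 to the end of the string (no stop point needed)
substring(st, 7)
## [1] "World"

# From the start of the string up to position 5
substring(st, 1, 5)
## [1] "Hello"

# The last three characters, counted back from nchar()
substr(st, start = nchar(st) - 2, stop = nchar(st))
## [1] "rld"
```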
You can also remove the text before or after a certain character. This is again done using the substr() function, but this time the start or stop position comes from looking up a character rather than from a fixed index.
Remove the text before a certain character:
st <- "Hello World"
idx_o <- unlist(gregexpr(pattern = "o", st))
sub <- substr(st, start = idx_o[1], stop = nchar(st))
print(sub)
## [1] "o World"
Remove the text after a certain character:
sub <- substr(st, start = 1, stop = idx_o[1])
print(sub)
## [1] "Hello"
As a more practical example, here’s how to remove the extension from a filename:
# Remove extension from filename
filename <- "My_File.txt"
idx <- unlist(gregexpr(pattern='\\.', filename))
filename_root <- substr(filename, start = 1, stop = idx[1] - 1)
print(filename_root)
## [1] "My_File"
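The example above cuts at the first full stop, which would not give the intended result for a name containing several. A sketch of one alternative is to cut at the last match instead; the file name below is made up for illustration, and the file_path_sans_ext() helper from the tools package (which ships with R) is shown for comparison:

```r
filename <- "My.Data.File.txt"
idx <- unlist(gregexpr(pattern = "\\.", filename))
print(idx)
## [1]  3  8 13

# Use the position of the last full stop rather than the first
last_dot <- idx[length(idx)]
filename_root <- substr(filename, start = 1, stop = last_dot - 1)
print(filename_root)
## [1] "My.Data.File"

# The tools package has a helper for the same task
tools::file_path_sans_ext(filename)
## [1] "My.Data.File"
```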
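To replace or insert characters at a given position, use the str_sub() command from the stringr library. Assigning to a range replaces those characters, while assigning to a range whose end is one less than its start (such as 8, 7) inserts at that position without overwriting anything: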
library(stringr)
st <- "Hello World"
str_sub(st, 6, 6) <- "_"
str_sub(st, 8, 7) <- "O"
print(st)
## [1] "Hello_WOorld"
To replace one substring with another, use the gsub() command:
st <- "Hello World"
st <- gsub("ello", "i", st)
print(st)
## [1] "Hi World"
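gsub() replaces every match, whereas its sibling sub() replaces only the first one. Both treat the pattern as a regular expression unless fixed = TRUE is set. A brief sketch of the difference:

```r
st <- "Hello World"

# sub() replaces the first match only
sub("o", "0", st)
## [1] "Hell0 World"

# gsub() replaces every match
gsub("o", "0", st)
## [1] "Hell0 W0rld"

# fixed = TRUE treats the pattern literally, so "." means a full stop
gsub(".", "_", "My.File.txt", fixed = TRUE)
## [1] "My_File_txt"
```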
White space is what is created by the space bar or the tab key. You can trim a string by removing any white space at its start and end using the trimws() function:
st <- " Hello World "
print(trimws(st))
## [1] "Hello World"
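trimws() also accepts a which argument for trimming only one side. A small sketch, using two spaces on each side so that the effect is visible in the output:

```r
st <- "  Hello World  "

# Remove white space from the start only
trimws(st, which = "left")
## [1] "Hello World  "

# Remove white space from the end only
trimws(st, which = "right")
## [1] "  Hello World"
```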
A string can be UPPERCASE, lowercase, Sentence case or Title Case:
st <- "The quick, brown Fox jumped over the lazy Dog"
print(str_to_upper(st))
## [1] "THE QUICK, BROWN FOX JUMPED OVER THE LAZY DOG"
print(str_to_lower(st))
## [1] "the quick, brown fox jumped over the lazy dog"
print(str_to_sentence(st))
## [1] "The quick, brown fox jumped over the lazy dog"
print(str_to_title(st))
## [1] "The Quick, Brown Fox Jumped Over The Lazy Dog"
These functions come from the stringr library, so it needs to be loaded with library(stringr) before they are used.
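If stringr is not available, base R covers the first two cases with toupper() and tolower(). A minimal sketch:

```r
st <- "The quick, brown Fox jumped over the lazy Dog"

# Base R equivalent of str_to_upper()
toupper(st)
## [1] "THE QUICK, BROWN FOX JUMPED OVER THE LAZY DOG"

# Base R equivalent of str_to_lower()
tolower(st)
## [1] "the quick, brown fox jumped over the lazy dog"
```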
If you want to print a variable as part of a string using a specific format, use the sprintf() function (string print formatted) along with the “%” character to indicate where you want the variable to be inserted. You also need a letter after the “%” to indicate what type of format to use.
When working with numbers, use the “f” for ‘floating-point’ in conjunction with “%”:
x <- 1
sprintf("x = %f", x)
## [1] "x = 1.000000"
Use a decimal point and a number to set the number of decimal places you see:
x <- 1
sprintf("x = %.2f", x)
## [1] "x = 1.00"
Use a number before the decimal point to set the ‘width’ of the number, i.e. how many characters it takes up in total (including the decimal point). If the width is set to be wider than the number is long, the extra space will be filled with blanks on the left:
x <- 1
sprintf("x = %8.2f", x)
## [1] "x =     1.00"
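A few other common variations on the number format are sketched below, using illustrative values for x and n: zero-padding, left-alignment, whole numbers with “d”, and several values in one call.

```r
x <- 3.14159
n <- 42L

# A 0 before the width pads with zeros instead of blanks
sprintf("x = %08.2f", x)
## [1] "x = 00003.14"

# A minus sign left-aligns the number within the width
sprintf("x = %-8.2f|", x)
## [1] "x = 3.14    |"

# "d" formats whole numbers (integers)
sprintf("n = %d", n)
## [1] "n = 42"

# Several values can be inserted in one call, in order
sprintf("x = %.1f and n = %d", x, n)
## [1] "x = 3.1 and n = 42"
```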
When working with strings, use an “s” instead of the “f” together with the “%”.
You can print to the console immediately:
st <- "Hello world"
sprintf("Output text is: %s", st)
## [1] "Output text is: Hello world"
…or you can write the value to a variable and print it to the console at a later stage:
st <- "Hello world"
st <- sprintf("Output text is: %s", st)
print(st[1])
## [1] "Output text is: Hello world"
If you extract a single string from any vector or data frame, it will be treated in the same way by the sprintf() function:
df <- data.frame(
  # Use = (not <-) so that the column is actually called "name"
  name = c("Alfa", "Bravo", "Charlie")
)
sprintf("Output text is: %s", df$name[1])
## [1] "Output text is: Alfa"
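sprintf() is also vectorised, so passing a whole column produces one formatted string per row. A short sketch, assuming a recent version of R in which data.frame() keeps strings as characters (labels is an illustrative name):

```r
df <- data.frame(
  name = c("Alfa", "Bravo", "Charlie")
)

# One output string is produced for each element of the column
labels <- sprintf("Output text is: %s", df$name)
length(labels)
## [1] 3
labels[3]
## [1] "Output text is: Charlie"
```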
Unicode characters can be written by using \U followed by their Unicode code point in hexadecimal. Some can then simply be printed:
st1 <- "\U03BC" # mu
st2 <- "\U03B5" # epsilon
st3 <- "\U03C9" # omega
print(c(st1, st2, st3))
## [1] "μ" "ε" "ω"
Others (e.g. emojis) may not print this easily on every system and can need the special utf8_print() function from the utf8 library:
st1 <- "\U2705" # white heavy check mark
st2 <- "\U274c" # cross mark
st3 <- "\U0001f609" # winking face
utf8::utf8_print(c(st1, st2, st3))
## [1] "✅" "❌" "😉"
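To go between a character and its code point, base R provides utf8ToInt() and intToUtf8(). The sketch below looks up the code point of mu and converts it back (mu and code are illustrative names):

```r
mu <- "\U03BC"

# Look up the code point as a decimal integer
code <- utf8ToInt(mu)
print(code)
## [1] 956

# Show it in the usual hexadecimal notation
sprintf("U+%04X", code)
## [1] "U+03BC"

# Convert the integer back into the character
intToUtf8(code) == mu
## [1] TRUE
```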