⇦ Back

As introduced in the data types page, ‘strings’ are characters that have been ‘strung’ together to form words.

1 Types of Strings

1.1 String Literals

A string literal is a ‘normal string’ - it’s the type you will most often encounter in coding. You can write a string by using single or double quotation marks, or two matched sets of three double or single quotation marks:

# These are strings, representing text
st = "Hello, World!"
st = 'Hello, World!'
st = """Hello, World!"""
st = '''Hello, World!'''

Strings that take up more than one line can only be defined using the latter two methods:

# This is a string that crosses lines
st = '''Hello,
World!'''
print(st)
## Hello,
## World!

Or, alternatively, a ‘newline character’ can be used. In this case, one pair of single or double quotation marks can be used:

# This is a string that crosses lines
st = 'Hello,\nWorld!'
print(st)
## Hello,
## World!

If you want to include quotation marks in a string, you need to define the string using one of the other types of quotation marks. For example, if you want to have double quotation marks in your string you need to use single quotation marks to define it. If you try to use double quotation marks to define it Python will get confused as to whether you have created one string or two:

# Correct
st = 'This string contains a "quote"'
# Incorrect - will create an error! The letters "quote" are outside your string
# st = "This string contains a "quote""

Python strings are particularly robust: they can handle many foreign characters:

# Foreign characters entered explicitly
print('À Æ Ç Ë Õ Ω 💩')
## À Æ Ç Ë Õ Ω 💩

One thing to take note of is the backslash character (“\”). In string literals, this used as an escape character. This means that whatever immediately follows a backslash is NOT simply interpreted as text and, if it has been used correctly, it will actually be interpreted as an instruction to insert a special character:

# These following strings all contain escape characters that cause special
# characters to be inserted
print('\" Quotation marks')
print('\\ A backslash')
print('\t A tab')
print('\n A newline')
print('\xe1 \xA9 Foreign characters entered as hex code')
print('\u00df \u03A9 Foreign characters entered as Unicode')
## " Quotation marks
## \ A backslash
##   A tab
## 
##  A newline
## á © Foreign characters entered as hex code
## ß Ω Foreign characters entered as Unicode

1.2 Raw String Literals

A raw string literal is exactly the same as a regular string literal except that the backslash is NOT an escape character - everything is interpreted as ‘raw’ text. A raw string literal is created by putting an “r” or “R” in front of the string:

# These characters are not escaped
print(r'\" \\ \t \n \xe1 \xA9 \u00df \u03A9 💩')
## \" \\ \t \n \xe1 \xA9 \u00df \u03A9 💩

1.3 F-Strings

These strings are used for formatting and are created by prepending the letter “f” or “F”. An f-string can display the value of a variable directly in it, and it uses curly brackets to indicate the position of the variable:

st = 'Hello, World!'
print(f'My string is: {st}')
## My string is: Hello, World!

More information about these can be found on the f-string page.

1.4 Raw F-Strings

As the name suggests, these combine raw strings and f-strings, ie they are f-strings where backslashes are not treated as escape characters. As you would expect, they are created by prepending “fr” or “FR”:

st = 'Hello, World!'
print(fr'\n {st}')
## \n Hello, World!

2 Concatenate Strings

Combining strings in Python is very easy, simply use “+” to add them together!

st1 = 'Alpha'
st2 = 'bet'
st = st1 + st2
print(st)
## Alphabet

If the strings are in a list, they can be combined using the .join() method:

ls = ['Alpha', 'bet']
word = ''.join(x for x in ls)
print(word)
## Alphabet

2.1 Displaying Strings Together

Note that it is also possible to display two strings as one without actually combining them: in the next example the same strings Alpha and bet are shown side-by-side whilst remaining two objects:

print(st1, st2, sep='')
## Alphabet

The above was achieved by changing the default behaviour of the print() function - which usually displays elements with a space in between them - through the use of the sep keyword argument. This determines what characters separate the elements being printed. Note that this doesn’t affect the strings themself, it merely changes how they are shown in the console on that particular occasion:

print(st1, st2, sep=', ')
## Alpha, bet

Similarly, the end keyword argument determines what characters end a string when it gets displayed. By default, this is a ‘new line’ character, ie everything gets shown on a new line when you use the print() function. This can be changed to get a result similar to the above:

print(st1, end='')
print(st2)
## Alphabet

2.2 File Paths

If you are dealing with files and folders on your computer it means you will be using file paths. The temptation will be to do something like this:

# Define the location of a file on your computer
home_dir = 'home'
new_dir = 'New Folder'
file_name = 'My File.py'
# Create a file path
file_path = home_dir + '/' + new_dir + '/' + file_name
print(file_path)
## home/New Folder/My File.py

However, this is bad practice. In addition to being cumbersome, this method will not work if someone runs your code on a Windows computer (because on Windows the slashes between elements in a path point the other way to those on Unix-based OSs like macOS and Linux). The best way to do this is actually to import the os package and use a function specially created for this purpose:

import os

# Create a proper file path
file_path = os.path.join(home_dir, new_dir, file_name)
print(file_path)
## home/New Folder/My File.py    > this will appear on macOS and Linux
## home\New Folder\My File.py    > this will appear on Windows

3 Split Strings

Use the .split() method to split a string at a certain character or sub-string. Remember that Python uses zero-indexing and so the number 0 refers to the first element in something whilst 1 refers to the second element and so on.

3.1 One Split

Split a string into two parts by splitting it once. In this case, split a file named Filename.py at the full stop to get its name and its extension:

st = 'Filename.png'
file_name = st.split('.')[0]
extension = st.split('.')[1]
print('File name:', file_name, '    Extension:', extension)
## File name: Filename     Extension: png

Notice that this approach caused us to lose the dot. When we use the .split() method it does not include the character we are splitting on in either of the sections.

3.2 Split into a List

Divide an articulated string up into its constituent parts:

st = '/User/username/Downloads'
ls = st.split('/')
print(ls)
## ['', 'User', 'username', 'Downloads']

4 Search, Find, Lookup

You can:

  • Search a string to see if it contains a certain letter or sub-string
  • Find where that letter or sub-string is located within the string
  • Lookup what letter or sub-string is at a certain location within a string

4.1 Search for a Character or Sub-string

Does a string contain the letter(s) you are looking for? Search the string and return a Boolean (true or false) to find out. This is done with the in statement:

# Do the letters "lena" appear in "Filename.png"?
boolean = 'lena' in 'Filename.png'
print(boolean)
## True

There are special methods to search for sub-strings at the start and at the end of strings:

# Does this website use SSL encryption? 
address = 'https://docs.python.org/3/library/string.html'
boolean = address.startswith('https')
print(boolean)
## True
# Is this file a PNG?
filename = 'My File.png'
boolean = filename.endswith('.png')
print(boolean)
## True

You can test for multiple prefixes or suffixes by creating a ‘tuple’ of sub-strings to search for:

# Is this file a PNG or a JPG?
filename = 'My File.docx'
# Create a tuple of sub-strings to search for
image_file_extensions = ('.png', '.jpg')
boolean = filename.endswith(image_file_extensions)
print(boolean)
## False

4.2 Find a Character or Sub-string

Now that we know our string contains the letter(s) we are looking for, get the indices (positions) where they are by using the correct method. You have two options here - .index() and .find() - with the difference being how they behave if they do not find what they are looking for:

  • .index() will return a ValueError if it does not find what it is looking for, which (under normal circumstances) will immediately cause your script to stop running
  • .find() will return “-1” if it does not find what it is looking for, meaning that your script will continue to run

If they do find what they are looking for, they will both return the index of the first occurrence of that sub-string (remembering that Python uses zero-indexing and so the first element is number 0):

st = '/User/username/Downloads'
# First appearance of a sub-string - return "ValueError" on failure
idx = st.index('/')
# First appearance of a sub-string - return "-1" on failure
idx = st.find('/')
print(idx)
## 0

The index of the last occurrence of a sub-string can be found with either .rindex() or .rfind() - the differences between these two is the same as before. As their names suggest, these methods search from the Right:

# Last appearance of a sub-string - return "ValueError" on failure
idx = st.rindex('/')
# Last appearance of a sub-string - return "-1" on failure
idx = st.rfind('/')
print(idx)
## 14

All occurrences can be found by using a more complicated statement which requires the re (regular expression) package:

import re

# Find all occurrences of a sub-string - return an empty string on failure
idx = [m.start() for m in re.finditer('/', st)]
print(idx)
## [0, 5, 14]

4.2.1 Insert a Character or Sub-String

The above methods which find locations within strings can be used to insert characters:

st = 'Hello World'
# Insert a comma between the words
idx = st.index(' ')
st = st[:idx] + ',' + st[idx:]
print(st)
## Hello, World

4.3 Lookup the Characters at Particular Indices

You can see what characters are at a certain location within a string by indexing it. Strings in Python are basically ‘lists of characters’ and thus they can be indexed in a similar way to lists:

st = 'Hello, World!'
# Return the first, fourth and eighth characters
print(st[0], st[3], st[7])
## H l W

Use negative numbers to refer to positions of elements starting from the right-hand-side (ie element “-1” is the last character, “-2” is the second-last, etc):

# Return the last, third-last and eighth-last characters
print(st[-1], st[-3], st[-8])
## ! l ,

Use a colon to refer to a range of characters. The end of the range (ie the second number) must be one more than the index of the last character you want. In Python, you index up until but not including a certain index:

# Return the second to fifth characters
print(st[1:5])
## ello

To return all characters from an index until the end of the string, leave the end of the range blank. To return all characters up until an index, leave the start of the range blank:

# Return the first word and the second word
print('First word:', st[:5], '   Second word:', st[7:])
## First word: Hello    Second word: World!

5 Delete, Overwrite, Replace

You can:

  • Delete all the characters before or after a given point in a string
  • Not overwrite the characters at a given location within a string
  • Replace all occurrences of a certain letter or sub-string within a string with a given replacement

5.1 Delete Characters

To delete characters at, before, after or between certain indexes simply index the string:

st = 'Hello, World!'
# Delete character at index 5
st1 = st[:5] + st[5 + 1:]
# Delete characters before index 7
st2 = st[7:]
# Delete characters after index 4
st3 = st[:5]
# Delete characters between index 4 and 7
st4 = st[7:][:5]

print(st1, st2, st3, st4, sep=', ')
## Hello World!, World!, Hello, World

To delete characters at, before, after or between certain sub-strings your first need to find the sub-string, get its index, then perform the deletion:

st = 'Hello, World!'

# Find the comma
idx_comma = st.find(',')
# Delete all characters after the comma
st = st[:idx_comma]

print(st)
## Hello

5.2 Overwrite Characters

This is not possible in Python. Strings are “immutable” which means that they cannot be changed. The only thing you can do is create a new string that omits the part you want to overwrite:

st = 'Hello, World!'
st = st[0] + 'i' + st[5:]
print(st)
## Hi, World!

5.3 Replace Characters

Instead of using indexes, sub-strings can be searched for and replaced by other sub-strings using the .replace() method:

st = 'Seven + 8 = 15'
st = st.replace('Seven', '7')
print(st)
## 7 + 8 = 15

This is useful if you have characters that cause problems in certain instances. For example, file names cannot contain slashes, so you can remove them as follows:

st = '01/02/03'
st = st.replace('/', '_')
print(st)
## 01_02_03

Similarly, if you are exporting something to Latex you will need to add escape characters:

st = '01_02_03'
st = st.replace('_', r'\_')
print(st)
## 01\_02\_03

5.4 Remove Characters From the Start or the End

The .removeprefix() and .removesuffix() methods do exactly what they say on the tin: they remove characters from the front and from the end of a string, respectively. This is useful for things like file extensions:

filename = 'Document.docx'
fileroot = filename.removesuffix('.docx')

print(fileroot)
## Document

5.5 Trim White Space

Leading and trailing white space (space and tab characters) can be removed with the .strip() method:

st = ' Hello  '
st = st.strip()
print('|' + st + '|')
## |Hello|

5.6 Remove Duplicated White Space

Here’s a string that is formatted to look like a table:

st = """
Year NDay Pos0 Pos1 Pos2 Pos3 Pos4
2015    4    8   11   13   14   18
2016    4   18   18   19   18   17
2017    4   17   20   25   26   27
2018    4   27   26   26   26   25
2019    4   25   24   23   22   21
"""

Replace the duplicated spaces with single spaces by using a combination of .join() and .split() like so:

line = ' '.join(st.split())
print(line)
## Year NDay Pos0 Pos1 Pos2 Pos3 Pos4 2015 4 8 11 13 14 18 2016 4 18 18 19 18 17 2017 4 17 20 25 26 27 2018 4 27 26 26 26 25 2019 4 25 24 23 22 21

Here’s how to do it while keeping the linebreaks in:

# Remove duplicate whitespace and convert to list of strings
st = [' '.join(row.split()) for row in st.split('\n')]
# Add linebreak characters back in
st = [row + '\n' for row in st]
# Collapse to string
st = ''.join(st)
print(st)
## Year NDay Pos0 Pos1 Pos2 Pos3 Pos4
## 2015 4 8 11 13 14 18
## 2016 4 18 18 19 18 17
## 2017 4 17 20 25 26 27
## 2018 4 27 26 26 26 25
## 2019 4 25 24 23 22 21

6 Change the Case

A string can be UPPERCASE, lowercase, Sentence case or Title Case:

# lowercase
st = 'Filename.PNG'
filename = st.split('.', 1)[0]
extension = st.split('.', 1)[1].lower()
print(filename + '.' + extension)
## Filename.png
# UPPERCASE
st = 'Hello, World!'
print(st.upper())
## HELLO, WORLD!
# Sentence case
st = 'hello. greetings to the world.'
print('. '.join(x.capitalize() for x in st.split('. ')))
## Hello. Greetings to the world.
# Title Case
st = 'alice in wonderland'
print(st.title())
## Alice In Wonderland

6.1 Pascal Case

‘Pascal case’ is when there are no spaces between words and, instead, every word is capitalised is order to make it readable (LikeThisForExample). Here is a function that converts a string written in this manner into sentence case:

def pascal_to_sentance_case(w):
    """Convert a string in Pascal case into sentence case."""
    new_word = ''
    for i, word in enumerate(re.findall('([A-Z][a-z]*)', w)):
        if i == 0:
            new_word = new_word + word
        else:
            new_word = new_word + ' ' + word.lower()
    return new_word


# Turn Pascal case into sentence case
new_word = pascal_to_sentance_case('PascalCase')
print(new_word)
## Pascal case

⇦ Back