Using an array is an efficient way to handle multiple elements. It is a better choice than a list (or a list-of-lists) if you are doing calculations with a set of numbers, but not as good if you are working with strings or doing general data handling and cleaning as opposed to mathematics.
Note that a vector is the name for a one-dimensional array and a matrix is a two-dimensional array. These do not exist as separate data types in Python - there are only arrays - but you can still call an array a vector or a matrix if it fits the definition.
You have two options for creating arrays:
Python has a built-in module called array that can be used as follows:
import array
ar = array.array('i', [1, 2, 3, 4, 5, 6, 7])
print(ar)
## array('i', [1, 2, 3, 4, 5, 6, 7])
print(type(ar))
## <class 'array.array'>
This object ar is more memory-efficient than a list would have been. The i that has been used is the type code, which specifies the data type that applies to all elements in the array. In this case they will be signed integers. For all the available type codes, see the documentation.
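As a rough illustration of the memory claim and of what the type code enforces, here is a minimal sketch. The variable names are made up for this example, and the exact byte counts depend on your platform and Python build; on a typical 64-bit CPython build the array should come out smaller than the list.

```python
import array
import sys

values = list(range(1000))
as_array = array.array('i', values)

# Bytes used per element for the 'i' type code (platform-dependent)
print(as_array.itemsize)

# getsizeof() measures only the container itself; a list stores references
# to separate int objects, whose own memory is not counted here
list_bytes = sys.getsizeof(values)
array_bytes = sys.getsizeof(as_array)
print(array_bytes < list_bytes)

# Every element must match the type code, so a float is rejected
try:
    as_array.append(1.5)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
## True
```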
However, this module has a drawback: it only allows you to create one-dimensional arrays. It also does not have a particularly large set of mathematical operations that are compatible with this data type. For greater functionality we need to use the NumPy package.
If we have NumPy installed, we can create an array manually by first creating a list (or a list-of-lists) of numbers and then using the array() function to convert it:
import numpy as np
ls = [1, 2, 3, 4, 5, 6, 7]
ar = np.array(ls)
print(ar)
## [1 2 3 4 5 6 7]
print(type(ar))
## <class 'numpy.ndarray'>
Note the data type of this object: it is an ndarray. It is a 1D array (a vector); here’s a 2D array (a matrix):
ls_of_ls = [
[1, 2, 3, 4, 5, 6, 7],
[1, 2, 3, 4, 5, 6, 7],
[1, 2, 3, 4, 5, 6, 7]
]
ar = np.array(ls_of_ls)
print(ar)
## [[1 2 3 4 5 6 7]
## [1 2 3 4 5 6 7]
## [1 2 3 4 5 6 7]]
To create an array of all ones for a given size:
ar = np.ones(5)
print(ar)
## [1. 1. 1. 1. 1.]
…and of all zeroes for a given size:
ar = np.zeros(9)
print(ar)
## [0. 0. 0. 0. 0. 0. 0. 0. 0.]
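Both functions also accept a shape with more than one dimension and an optional data type. The following sketch shows this; the variable names are chosen only for illustration.

```python
import numpy as np

# Pass a tuple (rows, columns) to get a two-dimensional array
zeros_2d = np.zeros((2, 3))
print(zeros_2d)
## [[0. 0. 0.]
##  [0. 0. 0.]]

# The default data type is float; dtype=int gives integers instead
ones_int = np.ones(4, dtype=int)
print(ones_int)
## [1 1 1 1]
```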
For the rest of this page we will use NumPy arrays.
Indexing is done with square brackets. Remember that Python uses
zero-indexing, meaning that the first element is at index 0, the second
is at index 1 and so on. This means that the number at index 4 of the
list/array [10, 20, 30, 40, 50, 60, 70]
is “50”:
ar = np.array([10, 20, 30, 40, 50, 60, 70])
print(ar[4])
## 50
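Square brackets also support negative indexes, slices and (for two-dimensional arrays) a row index together with a column index. A short sketch follows; the array called grid is introduced here purely as an example.

```python
import numpy as np

ar = np.array([10, 20, 30, 40, 50, 60, 70])

# Negative indexes count backwards from the end
print(ar[-1])
## 70

# A slice start:stop includes the start index but excludes the stop index
print(ar[1:4])
## [20 30 40]

# Two-dimensional arrays take a row index and a column index
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid[1, 2])
## 6
```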
Use the enumerate() function to iterate over both the values in an array and their indexes. This can be used, for example, to find the index of the first value in an array to meet a certain condition. In the below example, the first occurrence of the number “40” is searched for using the next() function and it is found at index 3 in the array:
# Index of first value meeting a condition
idx = next(i for i, v in enumerate(ar) if v == 40)
print(idx)
## 3
For the record, the actual value (as opposed to the index) can be returned in a similar fashion:
# First value meeting a condition
value = next(v for i, v in enumerate(ar) if v == 40)
print(value)
## 40
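One thing to be aware of is that next() raises a StopIteration error if nothing matches. The sketch below shows two possible ways around that: giving next() a default value, and using NumPy’s flatnonzero() function on a Boolean mask. This is one option among several rather than the only approach.

```python
import numpy as np

ar = np.array([10, 20, 30, 40, 50, 60, 70])

# next() accepts a default that is returned when nothing matches
idx = next((i for i, v in enumerate(ar) if v == 45), None)
print(idx)
## None

# flatnonzero() returns the indexes of all True entries in a mask
matches = np.flatnonzero(ar >= 40)
print(matches)
## [3 4 5 6]

# Take the first match, if there is one
first = matches[0] if matches.size > 0 else None
print(first)
## 3
```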
If an array is already sorted into numerical order, the searchsorted() function finds the index where, if you were to insert a given number, the numerical order would be maintained (it does not sort the array for you):
# Index where a number can be inserted while maintaining order
idx = np.searchsorted(ar, 35)
print(idx)
## 3
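If the array is not yet in order, it can be sorted first. The sketch below sorts an example array (made up for this illustration), finds the insertion point and then inserts the number there using NumPy’s insert() function.

```python
import numpy as np

unsorted = np.array([50, 10, 40, 20, 30])

# np.sort() returns a sorted copy and leaves the original untouched
ordered = np.sort(unsorted)
print(ordered)
## [10 20 30 40 50]

# Find the insertion point for 35, then insert it there
idx = np.searchsorted(ordered, 35)
print(idx)
## 3
ordered = np.insert(ordered, idx, 35)
print(ordered)
## [10 20 30 35 40 50]
```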
When indexing an array with a Boolean mask (a list of trues and falses) it has the effect of filtering the array:
ar = np.array([10, 20, 30, 40, 50, 60, 70])
# Create a Boolean mask
mask = [False, False, False, False, True, True, True]
# Filter the array
filtered = ar[mask]
print(filtered)
## [50 60 70]
This is more usually used when searching for values that meet certain criteria, eg here’s how to get the values that are greater than 40:
# Filter the array
filtered = ar[ar > 40]
print(filtered)
## [50 60 70]
Logical operators can be used to combine Boolean masks and create more complicated filters, eg getting the values that are either above 50 or below 30:
# Filter the array
filtered = ar[(ar > 50) | (ar < 30)]
print(filtered)
## [10 20 60 70]
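The other element-wise logical operators work in the same way: & means “and” while ~ means “not”. Here is a brief sketch; note that each comparison needs its own round brackets because these operators take precedence over comparisons.

```python
import numpy as np

ar = np.array([10, 20, 30, 40, 50, 60, 70])

# & keeps values where both conditions hold
between = ar[(ar >= 30) & (ar <= 50)]
print(between)
## [30 40 50]

# ~ inverts a mask, keeping everything the mask would have removed
outside = ar[~((ar >= 30) & (ar <= 50))]
print(outside)
## [10 20 60 70]
```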
Add to an array using NumPy’s append() function. You can append a single element, an array of elements or a list of them:
ar = np.array([10, 20, 30, 40, 50, 60, 70])
# Append a single number
ar = np.append(ar, 80)
print(ar)
## [10 20 30 40 50 60 70 80]
# Append an array
ar = np.append(ar, np.array([90, 100, 110]))
print(ar)
## [ 10 20 30 40 50 60 70 80 90 100 110]
# Append a list
ar = np.append(ar, [120, 130, 140])
print(ar)
## [ 10 20 30 40 50 60 70 80 90 100 110 120 130 140]
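Two details are worth seeing in action: append() returns a new array rather than changing the existing one and, when no axis is given, it flattens a two-dimensional array before appending. The following sketch uses example names invented for this purpose.

```python
import numpy as np

original = np.array([1, 2, 3])

# append() returns a new array; the original is left unchanged
extended = np.append(original, 4)
print(original)
## [1 2 3]
print(extended)
## [1 2 3 4]

# With a 2D array and no axis argument, everything is flattened first
grid = np.array([[1, 2], [3, 4]])
flat = np.append(grid, [5, 6])
print(flat)
## [1 2 3 4 5 6]
```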
To append elements as a new row you can use the concatenate() function, although this can be a bit confusing. If you simply concatenate two one-dimensional arrays it will give the same result as the append() function (although note that you need to use an extra set of round brackets when specifying the arrays to concatenate):
ar1 = np.array([1, 2, 3])
ar2 = np.array([4, 5, 6])
# Concatenate two one-dimensional arrays
ar = np.concatenate((ar1, ar2)) # Note the double round brackets
print(ar)
## [1 2 3 4 5 6]
The reason you need to use two sets of round brackets is that concatenate() takes the arrays as a single argument: a sequence (here a tuple) containing all the arrays to join. Its second argument is a keyword argument called “axis”. If we leave it out (as we did in the previous example) it takes the default value of 0, which joins the arrays along their first dimension. The append() function, by contrast, defaults to an axis of None, which flattens the arrays before joining them. For one-dimensional arrays the two behaviours give the same result, which is why the output of the above example matched that of the append() function. Here is the example again but with the axis keyword argument explicitly set to None:
# Flatten arrays before concatenating them
ar = np.concatenate((ar1, ar2), axis=None)
print(ar)
## [1 2 3 4 5 6]
As you can see, the output of the above example is identical to that of the one before it, and both match the output of the append() function.
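One way to check how concatenate() treats the axis argument is to compare the shapes it produces for two-dimensional inputs. In this sketch (the names a and b are just for illustration), leaving the axis out stacks the rows, whereas axis=None flattens the data.

```python
import numpy as np

a = np.array([[1, 2, 3]])
b = np.array([[4, 5, 6]])

# No axis given: the arrays are joined along the first dimension
default = np.concatenate((a, b))
print(default.shape)
## (2, 3)

# axis=None: the arrays are flattened before being joined
flattened = np.concatenate((a, b), axis=None)
print(flattened.shape)
## (6,)
```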
If we want to take control of the behaviour we need to specify which axis to concatenate along, either the “0” axis or the “1” axis. In our previous examples, our arrays have only had one dimension each and with one-dimensional data we only have the option to concatenate along the 0th axis:
ar1 = np.array([1, 2, 3])
ar2 = np.array([4, 5, 6])
# Concatenate arrays horizontally
ar = np.concatenate((ar1, ar2), axis=0)
print(ar)
## [1 2 3 4 5 6]
If we changed the above to axis=1 the script would fail because it would be looking for an extra dimension to the data which doesn’t exist. If we instead used two-dimensional data (ie arrays made from lists-of-lists) we could do the following:
ar1 = np.array([[1, 2, 3]]) # Note the double square brackets creating a list-of-lists
ar2 = np.array([[4, 5, 6]]) # Note the double square brackets creating a list-of-lists
# Concatenate arrays vertically
ar = np.concatenate((ar1, ar2), axis=0)
print(ar)
## [[1 2 3]
## [4 5 6]]
Note that axis=0 doesn’t consistently mean “concatenate horizontally” or “concatenate vertically”. It means “concatenate along the first dimension”: for one-dimensional arrays that is the only dimension there is, whereas for two-dimensional arrays it is the rows. This means that, for this data, axis=1 will concatenate horizontally and we will yet again produce 1 2 3 4 5 6 as an output (although this time the data will be two-dimensional, with a single row, as shown by the double square brackets in the output):
# Concatenate arrays horizontally
ar = np.concatenate((ar1, ar2), axis=1)
print(ar)
## [[1 2 3 4 5 6]]
If you recall our earlier concatenate() example you’ll remember that specifying the axis as None will flatten the arrays before concatenating, resulting in one-dimensional data (note the single square brackets in the output):
# Flatten arrays before concatenating them
ar = np.concatenate((ar1, ar2), axis=None)
print(ar)
## [1 2 3 4 5 6]
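If keeping track of axis numbers feels error-prone, NumPy also provides the vstack() and hstack() functions. The sketch below suggests how they might be used as more readable alternatives for two-dimensional data; they are offered here as an option, not as a replacement for concatenate().

```python
import numpy as np

ar1 = np.array([[1, 2, 3]])
ar2 = np.array([[4, 5, 6]])

# Stack vertically (for 2D data this matches axis=0)
stacked = np.vstack((ar1, ar2))
print(stacked)
## [[1 2 3]
##  [4 5 6]]

# Stack horizontally (for 2D data this matches axis=1)
joined = np.hstack((ar1, ar2))
print(joined)
## [[1 2 3 4 5 6]]
```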
Finally, here is a way to do a ‘line-break’: convert a one-dimensional array into a two-dimensional one:
ar = np.array([1, 2, 3, 4, 5, 6])
# Perform a line-break
ar = np.concatenate(([ar[0:3]], [ar[3:]]), axis=0)
print(ar)
## [[1 2 3]
## [4 5 6]]
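The same ‘line-break’ can also be achieved with the reshape() method, which may be simpler when the rows are all the same length. A minimal sketch:

```python
import numpy as np

ar = np.array([1, 2, 3, 4, 5, 6])

# Reshape into 2 rows of 3 columns
reshaped = ar.reshape(2, 3)
print(reshaped)
## [[1 2 3]
##  [4 5 6]]

# Using -1 lets NumPy work out that dimension from the array's size
inferred = ar.reshape(-1, 3)
print(inferred.shape)
## (2, 3)
```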
Flipping an array along its diagonal (transposing it, so that rows become columns) can be done with the .T attribute:
ar = np.array([
[10, 20, 30],
[40, 50, 60]
])
# Original array
print(ar)
## [[10 20 30]
## [40 50 60]]
# Transposed array
print(ar.T)
## [[10 40]
## [20 50]
## [30 60]]
The .transpose() method also works:
# Transposed array
print(ar.transpose())
## [[10 40]
## [20 50]
## [30 60]]
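Be aware that transposing a one-dimensional array has no effect because there is only one axis to work with. The sketch below shows this, along with one possible way of getting a column instead: reshaping the array into two dimensions.

```python
import numpy as np

row = np.array([1, 2, 3])

# Transposing a 1D array leaves its shape unchanged
print(row.T.shape)
## (3,)

# Reshape to 3 rows and 1 column to get a column vector
column = row.reshape(-1, 1)
print(column)
## [[1]
##  [2]
##  [3]]
print(column.shape)
## (3, 1)
```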
Here’s how to use transposition to concatenate arrays in the exact way you want:
ar1 = np.array([[1, 2], [3, 4]])
ar2 = np.array([[5, 6]])
ar = np.concatenate((ar1, ar2.T), axis=1)
print(ar)
## [[1 2 5]
## [3 4 6]]
Arrays can be added, subtracted, multiplied, etc.
ar1 = np.array([10, 20, 30])
ar2 = np.array([10, 20, 30])
# Add elements
ar = ar1 + ar2
print(ar)
## [20 40 60]
# Subtract elements
ar = ar - ar2
print(ar)
## [10 20 30]
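Multiplication, division and exponentiation follow the same element-by-element pattern. A short sketch, using a different second array so the results are easier to tell apart:

```python
import numpy as np

ar1 = np.array([10, 20, 30])
ar2 = np.array([1, 2, 3])

# Multiply elements
product = ar1 * ar2
print(product)
## [10 40 90]

# Divide elements (the result is always an array of floats)
quotient = ar1 / ar2
print(quotient)
## [10. 10. 10.]

# Raise each element to a power
squares = ar2 ** 2
print(squares)
## [1 4 9]
```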
Here’s the difference between multiplying a list by 2 versus multiplying an array by 2:
# Multiply a list by 2
ls = [1, 2, 3] * 2
print(ls)
## [1, 2, 3, 1, 2, 3]
# Multiply an array by 2
ar = np.array([1, 2, 3]) * 2
print(ar)
## [2 4 6]
As you can see, multiplication works on a list as an object whereas it works on the elements of an array.
Matrix multiplication (the dot product) of two arrays can be performed with NumPy’s dot() function:
ar1 = np.array([
[1, 2],
[3, 4],
[5, 6]
])
ar2 = np.array([
[5, 6, 7],
[7, 8, 9]
])
# Perform dot multiplication
ar = np.dot(ar1, ar2)
print(ar)
## [[19 22 25]
## [43 50 57]
## [67 78 89]]
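For two-dimensional arrays, the @ operator gives the same result as dot(). The sketch below also shows how the shapes combine: the number of columns in the first array must match the number of rows in the second, and swapping the order changes the shape of the result.

```python
import numpy as np

ar1 = np.array([[1, 2], [3, 4], [5, 6]])  # 3 rows, 2 columns
ar2 = np.array([[5, 6, 7], [7, 8, 9]])    # 2 rows, 3 columns

# (3 x 2) @ (2 x 3) gives a 3 x 3 result
product = ar1 @ ar2
print(product.shape)
## (3, 3)
print(np.array_equal(product, np.dot(ar1, ar2)))
## True

# Swapping the order gives (2 x 3) @ (3 x 2), a 2 x 2 result
swapped = ar2 @ ar1
print(swapped.shape)
## (2, 2)
```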
ar1 = np.array([
[1, 2],
[3, 4]
])
ar2 = np.array([
[5, 6],
[7, 8]
])
# Perform cross multiplication on each pair of rows
# (NumPy 2.0 and later warn that 2-element vectors are deprecated here)
ar = np.cross(ar1, ar2)
print(ar)
## [-4 -4]
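The cross product is most commonly used with three-element vectors, where the result is a vector perpendicular to both inputs. Here is a minimal sketch using the unit vectors along the x- and y-axes (the names x and y are just for illustration); note that swapping the order flips the sign.

```python
import numpy as np

x = np.array([1, 0, 0])
y = np.array([0, 1, 0])

# x cross y points along the z-axis
z = np.cross(x, y)
print(z)
## [0 0 1]

# Swapping the order reverses the direction
z_reversed = np.cross(y, x)
print(z_reversed)
## [ 0  0 -1]
```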