⇦ Back

Sometimes, you’ll come across datasets where certain columns contain only the values “False”, “None”, “NaN” or “0” and you’ll want to remove them. This isn’t entirely straightforward to do, so this page explores a few options at your disposal.

Just to be clear, this page is about removing entire columns that contain only data that isn’t useful. For removing columns or rows that partially contain missing data, check out the .dropna() method over here.

1 Example Data

Here’s the data frame that will be used for the examples on this page:

import pandas as pd
import numpy as np

# Set Pandas display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 120)

# Create a data frame
raw = pd.DataFrame({
    'only_text': ['Value 1', 'Value 2', 'Value 3', 'Value 4'],
    'only_trues': [True, True, True, True],
    'text_and_nones': ['Value 1', 'Value 2', None, None],
    'only_falses': [False, False, False, False],
    'only_nones': [None, None, None, None],
    'only_nans': [np.NaN, np.NaN, np.NaN, np.NaN],
    'only_zeroes': [0, 0, 0, 0],
    'falses_and_nones': [False, False, None, None],
    'falses_and_nans': [False, False, np.NaN, np.NaN],
    'falses_and_zeroes': [False, False, 0, 0],
    'nones_and_nans': [None, None, np.NaN, np.NaN],
    'nones_and_zeroes': [None, None, 0, 0],
    'nans_and_zeroes': [np.NaN, np.NaN, 0, 0],
}, dtype='object')

print(raw)
##   only_text only_trues text_and_nones only_falses only_nones only_nans only_zeroes falses_and_nones falses_and_nans  \
## 0   Value 1       True        Value 1       False       None       NaN           0            False           False   
## 1   Value 2       True        Value 2       False       None       NaN           0            False           False   
## 2   Value 3       True           None       False       None       NaN           0             None             NaN   
## 3   Value 4       True           None       False       None       NaN           0             None             NaN   
## 
##   falses_and_zeroes nones_and_nans nones_and_zeroes nans_and_zeroes  
## 0             False           None             None             NaN  
## 1             False           None             None             NaN  
## 2                 0            NaN                0               0  
## 3                 0            NaN                0               0

The above code uses the Pandas and Numpy packages. Install these from the terminal with:

  • python3.11 -m pip install pandas
  • python3.11 -m pip install numpy

Where python3.11 is the version of Python you are using (in my case it’s 3.11 but replace this with what you have).

As you can see, the example data frame includes columns that have all the different combinations of data that you might want to remove - Falses, Nones, NaNs and zeroes - as well as a few columns at the start that don’t only contain data to remove.

2 Drop Columns That Only Contain One Particular Unwanted Value

2.1 Drop Columns That Only Contain “False”

To find the elements which are “False”:

  • Use the .any() method to return whether any elements in a column are “True”
  • Take the inverse of this with ~ to get whether all elements in a column are “False”
  • Use the bool_only=True keyword argument to only look at Boolean values
    • This ensures that zeroes will not be interpreted as Falses, which is the default behaviour

The columns that are found can then be removed with the .drop() method:

# Make a copy of the data frame
df = raw.copy()

# Find which columns contain only Booleans and only Falses. Use the `bool_only=True`
# option with the `any()` method to have it only recognise Booleans:
bool_cols = ~df.any(bool_only=True)
only_falses = bool_cols[bool_cols]
# Get the names of these columns
cols_only_falses = only_falses.index.to_list()
# Remove these columns
df = df.drop(cols_only_falses, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_falses'}

The only_falses column is the only one to have been dropped.

The same output can be achieved using a loop:

# Make a copy of the data frame
df = raw.copy()

# Drop the columns that only contain 'False'
for col in list(df.any(bool_only=True)[~df.any(bool_only=True)].index):
    df = df.drop(col, axis=1)
    
# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_falses'}

Column only_falses has been dropped while all the others remain.

2.2 Drop Columns That Only Contain “None”

The secret to finding the values that are “None” as opposed to “NaN” is to check their data types, which can be done by applying the type function as a map across the entire data frame:

# Make a copy of the data frame
df = raw.copy()

# Find the nones
nones = df.applymap(type) == type(None)
# Find columns with only nones
cols = nones.all()[nones.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_nones'}

The only_nones column has been dropped while all the others have been left alone.

2.3 Drop Columns That Only Contain “NaN”

We need to find the values that are “NA” using the .isna() method, but this will return values that are “None” as well. These can be removed by doing the opposite of what we did above: using the .applymap() method to look for values that do not have their type equal to “None”:

# Make a copy of the data frame
df = raw.copy()

# Find the nans
nans = (df.isna()) & (df.applymap(type) != type(None))
# Find columns with only nans
cols = nans.all()[nans.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_nans'}

only_nans is the only column to be removed.

2.4 Drop Columns That Only Contain “0”

We can search the data frame for values that are equal to “0”, but this will return values that are “False” as well! Remove those by using .applymap() once again, this time to find values that are integers:

# Make a copy of the data frame
df = raw.copy()

# Find the zeroes
zeroes = (df == 0) & (df.applymap(type) == int)
# Find columns with only zeroes
cols = zeroes.all()[zeroes.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_zeroes'}

We have removed the only_zeroes column and left the rest.

3 Drop Columns That Only Contain Two Particular Unwanted Values

3.1 Drop Columns That Only Contain “False” and “None”

Find the Falses by looking for values that are equal to “False” but which are not equal to “0” (as zeroes are interpreted as Falses by default). Find the Nones by searching for values of the same type as “None”. Remove the columns that only contain one or a combination of both of these:

# Make a copy of the data frame
df = raw.copy()

# Find the falses
falses = (df == False) & (df.astype(str) != '0')
# Find the nones
nones = df.applymap(type) == type(None)
# Find both the falses and the nones
falses_nones = falses | nones
# Find columns with only falses and nones
cols = falses_nones.all()[falses_nones.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'falses_and_nones', 'only_falses', 'only_nones'}

3.2 Drop Columns That Only Contain “False” and “NaN”

Find the Falses and the NaNs as has been described above:

# Make a copy of the data frame
df = raw.copy()

# Find the falses
falses = (df == False) & (df.astype(str) != '0')
# Find the nans
nans = (df.isna()) & (df.applymap(type) != type(None))
# Find both the falses and the nans
falses_nans = falses | nans
# Find columns with only falses and nans
cols = falses_nans.all()[falses_nans.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'falses_and_nans', 'only_falses', 'only_nans'}

The columns that contain only Falses, NaNs or a combination of the two - only_falses, only_nans and falses_and_nans - have been removed.

3.3 Drop Columns That Only Contain “False” and “0”

Because “False” and “0” both evaluate to “0” in Python, we can simply search for columns where all the values are equal to “0” using the .all() method:

# Make a copy of the data frame
df = raw.copy()

# Find columns that only contain falses and zeroes
only_falses_zeroes = (df == 0).all()
falses_zeroes_cols = only_falses_zeroes[only_falses_zeroes].index.to_list()
# Remove these columns which are all nones
df = df.drop(falses_zeroes_cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_zeroes', 'only_falses', 'falses_and_zeroes'}

The columns that contain only Falses, zeroes or a combination of the two - only_falses, only_zeroes and falses_and_zeroes - have been removed.

3.4 Drop Columns That Only Contain “None” and “NaN”

Because “None” and “NaN” are both considered to be ‘Nulls’, we can simply use the .isnull() method to search for them and the .all() method to search for the columns containing only them:

# Make a copy of the data frame
df = raw.copy()

# Drop the columns that only contain Nones and/or NaNs
for col in df:
    if df[col].isnull().all():
        df = df.drop(col, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_nones', 'only_nans', 'nones_and_nans'}

Columns only_nones, only_nans and nones_and_nans have been dropped.

Equivalently, the .dropna() method can be used to drop ‘NAs’. Specifically, it can drop columns where all the values are NAs:

# Make a copy of the data frame
df = raw.copy()

# Drop the columns that only contain 'None' and/or 'NaN'
df = df.dropna(axis='columns', how='all')

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_nones', 'only_nans', 'nones_and_nans'}

In general, the .dropna() method is used to remove missing data (which means “None”s and “NaN”s). It has a how keyword argument which determines how it chooses what to remove: in our case we want it to remove columns where all values are NAs (“None”s and “NaN”s).

3.5 Drop Columns That Only Contain “None” and “0”

Find the Nones by applying the type function as a map to the data frame via the .applymap() method and searching for the values that return the same type as None. Find the zeroes by looking for which values become “0” when converted to a string (this avoids values which are equal to “False” which, usually, are considered to be the same as the number “0” by Python):

# Make a copy of the data frame
df = raw.copy()

# Find the nones
nones = df.applymap(type) == type(None)
# Find the zeroes
zeroes = df.astype(str) == '0'
# Find both the nones and the zeroes
nones_zeroes = nones | zeroes
# Find columns with only nones and nones
cols = nones_zeroes.all()[nones_zeroes.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_nones', 'only_zeroes', 'nones_and_zeroes'}

The columns with only Nones, zeroes or a combination thereof have been removed.

3.6 Drop Columns That Only Contain “NaN” and “0”

Find the values that are “NaN” or “None” by using the .isna() method, then remove the Nones by doing the opposite of what we did above: using the .applymap() method to look for values that do not have their type equal to “None”. Find the zeroes as above:

# Make a copy of the data frame
df = raw.copy()

# Find the nans
nans = (df.isna()) & (df.applymap(type) != type(None))
# Find the zeroes
zeroes = df.astype(str) == '0'
# Find both the nans and the zeroes
nans_zeroes = nans | zeroes
# Find columns with only nans and zeroes
cols = nans_zeroes.all()[nans_zeroes.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'nans_and_zeroes', 'only_zeroes', 'only_nans'}

4 Drop Columns That Only Contain Three Particular Unwanted Values

4.1 Drop Columns That Only Contain “False”, “None” and “NaN”

Find the Falses, Nones and NaNs using methods described above and combine them:

# Make a copy of the data frame
df = raw.copy()

# Find the falses
falses = (df == False) & (df.astype(str) != '0')
# Find the nones
nones = df.applymap(type) == type(None)
# Find the nans
nans = (df.isna()) & (df.applymap(type) != type(None))
# Combine the above
falses_nones_nans = falses | nones | nans
# Find columns with only nones and nones
cols = falses_nones_nans.all()[falses_nones_nans.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_falses', 'falses_and_nones', 'nones_and_nans', 'falses_and_nans', 'only_nones', 'only_nans'}

All columns that contain only Falses, Nones, NaNs or a combination thereof have been removed.

4.2 Drop Columns That Only Contain “False”, “None” and “0”

Converting a copy of each column to ‘Boolean’ format will change all instances of “None” to “False”, but instances of “NaN” will be unaffected:

# Make a copy of the data frame
df = raw.copy()

# Drop the columns that only contain 'False', 'None' and 'NaN'
for col in list(df):
    if not df[col].astype('bool').any():
        df = df.drop(col, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_falses', 'falses_and_nones', 'nones_and_zeroes', 'only_zeroes', 'only_nones', 'falses_and_zeroes'}

Columns only_falses, only_nones, only_zeroes, falses_and_nones, falses_and_zeroes and nones_and_zeroes have been dropped, but none that contain NaNs.

4.3 Drop Columns That Only Contain “False”, “NaN” and “0”

Find the Falses, NaNs and zeroes separately:

# Make a copy of the data frame
df = raw.copy()

# Find the falses
falses = (df == False) & (df.astype(str) != '0')
# Find the nans
nans = (df.isna()) & (df.applymap(type) != type(None))
# Find the zeroes
zeroes = df.astype(str) == '0'
# Combine the above
falses_nans_zeroes = falses | nans | zeroes
# Find columns with only nones and nones
cols = falses_nans_zeroes.all()[falses_nans_zeroes.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'nans_and_zeroes', 'only_nans', 'only_falses', 'only_zeroes', 'falses_and_nans', 'falses_and_zeroes'}

4.4 Drop Columns That Only Contain “None”, “NaN” and “0”

Find the Nones, NaNs and zeroes separately:

# Make a copy of the data frame
df = raw.copy()

# Find the nones
nones = df.applymap(type) == type(None)
# Find the nans
nans = (df.isna()) & (df.applymap(type) != type(None))
# Find the zeroes
zeroes = df.astype(str) == '0'
# Combine the above
nones_nans_zeroes = nones | nans | zeroes
# Find columns with only nones and nones
cols = nones_nans_zeroes.all()[nones_nans_zeroes.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'nans_and_zeroes', 'nones_and_zeroes', 'nones_and_nans', 'only_zeroes', 'only_nones', 'only_nans'}

No columns containing “False” have been removed.

5 Drop Columns That Only Contain Four Particular Unwanted Values

5.1 Drop Columns That Only Contain “False”, “None”, “NaN” and “0”

The .any() method is going to be the key to finding which columns are empty: it looks at each column to see if there are any elements in it which are:

  • True
  • Non-NaN
  • Non-zero
  • Non-empty

Let’s take a look:

# Make a copy of the data frame
df = raw.copy()

print(df.any())
## only_text             True
## only_trues            True
## text_and_nones        True
## only_falses          False
## only_nones           False
## only_nans            False
## only_zeroes          False
## falses_and_nones     False
## falses_and_nans      False
## falses_and_zeroes    False
## nones_and_nans       False
## nones_and_zeroes     False
## nans_and_zeroes      False
## dtype: bool

Of course, what we are actually interested in is the columns where none of the elements are True, non-NaN, non-zero or non-empty. This is the opposite of what the .any() method looks for, so we can use the tilda (~) operator which returns the opposite of what it was given:

print(~df.any())
## only_text            False
## only_trues           False
## text_and_nones       False
## only_falses           True
## only_nones            True
## only_nans             True
## only_zeroes           True
## falses_and_nones      True
## falses_and_nans       True
## falses_and_zeroes     True
## nones_and_nans        True
## nones_and_zeroes      True
## nans_and_zeroes       True
## dtype: bool

What we have is a series indicating which columns are empty! We can use the Boolean values (the True/False values) from this series to index this series, leaving us with only the columns that need removing:

print(~df.any()[~df.any()])
## only_falses          True
## only_nones           True
## only_nans            True
## only_zeroes          True
## falses_and_nones     True
## falses_and_nans      True
## falses_and_zeroes    True
## nones_and_nans       True
## nones_and_zeroes     True
## nans_and_zeroes      True
## dtype: bool

Now we can take the column names (which are the indices of this series) and drop them from the original data frame:

# Drop the columns that only contain 'False', 'None', 'NaN' and '0'
subset = df.drop(df.any()[~df.any()].index, axis=1)

print(subset)
##   only_text only_trues text_and_nones
## 0   Value 1       True        Value 1
## 1   Value 2       True        Value 2
## 2   Value 3       True           None
## 3   Value 4       True           None

This is the result we were looking for: the original data frame minus the columns that only contain non-values.

5.2 Using a Loop

The same output as above can be obtained by looping over the columns and testing them individually using the .any() method:

# Make a copy of the data frame
df = raw.copy()

# Drop the columns that only contain Falses, Nones and/or NaNs
for col in df:
    if not df[col].any():
        df = df.drop(col, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'nans_and_zeroes', 'only_nans', 'only_falses', 'falses_and_nones', 'nones_and_zeroes', 'nones_and_nans', 'only_zeroes', 'falses_and_nans', 'only_nones', 'falses_and_zeroes'}
# Show the output
print(df)
##   only_text only_trues text_and_nones
## 0   Value 1       True        Value 1
## 1   Value 2       True        Value 2
## 2   Value 3       True           None
## 3   Value 4       True           None

6 Note: ‘Category’ Columns Can Be a Problem

If there is a column that is of the ‘category’ type then the above methods do not always work as expected. To demonstrate, let’s convert one of the columns to this type:

# Change the type of the falses_and_nones column
raw['falses_and_nones'] = raw['falses_and_nones'].astype('category')

print(raw)
##   only_text only_trues text_and_nones only_falses only_nones only_nans only_zeroes falses_and_nones falses_and_nans  \
## 0   Value 1       True        Value 1       False       None       NaN           0            False           False   
## 1   Value 2       True        Value 2       False       None       NaN           0            False           False   
## 2   Value 3       True           None       False       None       NaN           0              NaN             NaN   
## 3   Value 4       True           None       False       None       NaN           0              NaN             NaN   
## 
##   falses_and_zeroes nones_and_nans nones_and_zeroes nans_and_zeroes  
## 0             False           None             None             NaN  
## 1             False           None             None             NaN  
## 2                 0            NaN                0               0  
## 3                 0            NaN                0               0

Note that the “None” values in the falses_and_nones column has been changed to “NaN”s by the re-typing process.

6.1 Drop Columns That Only Contain “False”, “None”, “NaN” and “0”

Now, let’s attempt the first method we used above to drop columns containing only “False”, “None”, “NaN” and “0” (ie using df.any()):

# Make a copy of the data frame
df = raw.copy()

# Drop the columns that only contain 'False', 'None', 'NaN' and '0'
df = df.drop(df.any()[~df.any()].index, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'falses_and_zeroes', 'nones_and_zeroes', 'falses_and_nans', 'nans_and_zeroes', 'only_nans', 'only_zeroes', 'only_nones', 'only_falses', 'nones_and_nans'}

This has not worked as expected because falses_and_nones (the ‘category’ column) has not been removed!

Using .fillna() to fill all “None” and “NaN” values with “False” does not work either, regardless of whether you have bool_only=True or not:

# Make a copy of the data frame
df = raw.copy()

# Drop the columns that only contain 'False', 'None', 'NaN' and '0'
df = df.fillna(False)
for col in list(df.any()[~df.any()].index):
    df = df.drop(col, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_falses', 'falses_and_nans', 'nones_and_nans', 'only_zeroes', 'nones_and_zeroes', 'only_nones', 'falses_and_zeroes', 'nans_and_zeroes', 'only_nans'}
# Make a copy of the data frame
df = raw.copy()

# Drop the columns that only contain 'False'
df = df.fillna(False)
for col in list(df.any(bool_only=True)[~df.any(bool_only=True)].index):
    df = df.drop(col, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_falses', 'nones_and_nans', 'falses_and_nans', 'only_nones', 'only_nans'}

In both cases, falses_and_nones has not been removed.

6.2 Drop Columns That Only Contain “False” and “None”

Again, we will attempt the method that worked previously:

# Make a copy of the data frame
df = raw.copy()

# Find the falses
falses = (df == False) & (df.astype(str) != '0')
# Find the nones
nones = df.applymap(type) == type(None)
# Find both the falses and the nones
falses_nones = falses | nones
# Find columns with only falses and nones
cols = falses_nones.all()[falses_nones.all()].index.to_list()
# Drop these columns
df = df.drop(cols, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_nones', 'only_falses'}

…and again this has not worked as expected! The falses_and_nones column which we converted to the ‘category’ type is still not being removed.

6.3 Drop Columns That Only Contain “False”

Now let’s try the method that drops columns containing only “False” (ie using bool_only=True):

# Make a copy of the data frame
df = raw.copy()

# Find which columns contain only Booleans and only Falses. Use the `bool_only=True`
# option with the `any()` method to have it only recognise Booleans:
bool_cols = ~df.any(bool_only=True)
only_falses = bool_cols[bool_cols]
# Get the names of these columns
cols_only_falses = only_falses.index.to_list()
# Remove these columns
df = df.drop(cols_only_falses, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'only_falses'}

This has worked as expected!

6.4 Solution: Just Avoid Columns of Type ‘Category’

Iterate though the columns and treat them differently depending on whether they are categorical or not:

# Make a copy of the data frame
df = raw.copy()

# Drop the columns that only contain 'False', 'None', 'NaN' and/or '0'
for col in list(df):
    if str(df[col].dtype) != 'category':
        # If the column is not categorical, check if it has any True values...
        if not df[col].any():
            # ...if not, remove it
            df = df.drop(col, axis=1)
    else:
        # If the column is categorical, convert it to Boolean THEN check it...
        if not df[col].astype('bool').any():
            # ...if there are no True values, remove it
            df = df.drop(col, axis=1)

# Which columns have been removed?
print(set(list(raw)) - set(list(df)))
## {'nans_and_zeroes', 'only_nans', 'only_falses', 'falses_and_nones', 'nones_and_zeroes', 'nones_and_nans', 'only_zeroes', 'falses_and_nans', 'only_nones', 'falses_and_zeroes'}

As expected, all columns that only contain “False”, “None”, “NaN” or “0” (or a combination thereof) have been removed.

⇦ Back