You can remove NaN
from pandas.DataFrame
and pandas.Series
with the dropna()
method.
- pandas.DataFrame.dropna — pandas 2.0.3 documentation
- pandas.Series.dropna — pandas 2.0.3 documentation
Contents
- Remove rows/columns where all elements are NaN: how='all'
- Remove rows/columns that contain at least one NaN: how='any' (default)
- Remove rows/columns according to the number of non-missing values: thresh
- Remove based on specific rows/columns: subset
- Update the original object: inplace
- For pandas.Series
While this article primarily deals with NaN
(Not a Number), it's important to note that in pandas, None
is also treated as a missing value.
- Missing values in pandas (nan, None, pd.NA)
See the following article on extracting, replacing, and counting missing values.
- pandas: Find rows/columns with NaN (missing values)
- pandas: Replace NaN (missing values) with fillna()
- pandas: Detect and count NaN (missing values) with isnull(), isna()
The sample code in this article uses pandas version 2.0.3
. As an example, read a CSV file with missing values.
import pandas as pdprint(pd.__version__)# 2.0.3df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')print(df)# name age state point other# 0 Alice 24.0 NY NaN NaN# 1 NaN NaN NaN NaN NaN# 2 Charlie NaN CA NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 4 Ellen NaN CA 88.0 NaN# 5 Frank 30.0 NaN NaN NaN
source: pandas_nan_dropna.py
Remove rows/columns where all elements are NaN
: how='all'
By setting how='all'
, rows where all elements are NaN
are removed.
print(df.dropna(how='all'))# name age state point other# 0 Alice 24.0 NY NaN NaN# 2 Charlie NaN CA NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 4 Ellen NaN CA 88.0 NaN# 5 Frank 30.0 NaN NaN NaN
source: pandas_nan_dropna.py
If axis
is set to 1
or 'columns'
, columns where all elements are NaN
are removed.
print(df.dropna(how='all', axis=1))# name age state point# 0 Alice 24.0 NY NaN# 1 NaN NaN NaN NaN# 2 Charlie NaN CA NaN# 3 Dave 68.0 TX 70.0# 4 Ellen NaN CA 88.0# 5 Frank 30.0 NaN NaN
source: pandas_nan_dropna.py
Note that if axis
is set to 0
or 'index'
, rows are removed. Since the default value of axis
is 0
, rows are removed if omitted, as shown in the first example.
In former versions, both rows and columns were removed with axis=[0, 1]
, but since version 1.0.0
, axis
can no longer be specified with a list or tuple. If you want to remove both rows and columns, you can repeatedly apply dropna()
.
# print(df.dropna(how='all', axis=[0, 1]))# TypeError: supplying multiple axes to axis is no longer supported.print(df.dropna(how='all').dropna(how='all', axis=1))# name age state point# 0 Alice 24.0 NY NaN# 2 Charlie NaN CA NaN# 3 Dave 68.0 TX 70.0# 4 Ellen NaN CA 88.0# 5 Frank 30.0 NaN NaN
source: pandas_nan_dropna.py
Remove rows/columns that contain at least one NaN
: how='any'
(default)
To use as an example, remove rows and columns where all values are NaN
.
df2 = df.dropna(how='all').dropna(how='all', axis=1)print(df2)# name age state point# 0 Alice 24.0 NY NaN# 2 Charlie NaN CA NaN# 3 Dave 68.0 TX 70.0# 4 Ellen NaN CA 88.0# 5 Frank 30.0 NaN NaN
source: pandas_nan_dropna.py
By setting how='any'
, rows that contain at least one NaN
are removed. Since the default value of how
is 'any'
, the result is the same even if omitted.
print(df2.dropna(how='any'))# name age state point# 3 Dave 68.0 TX 70.0print(df2.dropna())# name age state point# 3 Dave 68.0 TX 70.0
source: pandas_nan_dropna.py
If axis
is set to 1
or 'columns'
, columns that contain at least one NaN
are removed.
print(df2.dropna(axis=1))# name# 0 Alice# 2 Charlie# 3 Dave# 4 Ellen# 5 Frank
source: pandas_nan_dropna.py
Remove rows/columns according to the number of non-missing values: thresh
With the thresh
argument, you can remove rows and columns according to the number of non-missing values.
For example, if thresh=3
, the rows that contain more than three non-missing values remain, and the other rows are removed.
print(df.dropna(thresh=3))# name age state point other# 0 Alice 24.0 NY NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 4 Ellen NaN CA 88.0 NaN
source: pandas_nan_dropna.py
If axis
is set to 1
or 'columns'
, columns are removed.
print(df.dropna(thresh=3, axis=1))# name age state# 0 Alice 24.0 NY# 1 NaN NaN NaN# 2 Charlie NaN CA# 3 Dave 68.0 TX# 4 Ellen NaN CA# 5 Frank 30.0 NaN
source: pandas_nan_dropna.py
Remove based on specific rows/columns: subset
If you want to remove based on specific rows and columns, specify a list of rows/columns labels (names) to the subset
argument of dropna()
. Even if you want to set only one label, you need to specify it as a list, like subset=['name']
.
Since the default is how='any'
and axis=0
, rows with NaN
in the columns specified by subset
are removed.
print(df.dropna(subset=['age']))# name age state point other# 0 Alice 24.0 NY NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 5 Frank 30.0 NaN NaN NaNprint(df.dropna(subset=['age', 'state']))# name age state point other# 0 Alice 24.0 NY NaN NaN# 3 Dave 68.0 TX 70.0 NaN
source: pandas_nan_dropna.py
If how
is set to 'all'
, rows with NaN
in all specified columns are removed.
print(df.dropna(subset=['age', 'state'], how='all'))# name age state point other# 0 Alice 24.0 NY NaN NaN# 2 Charlie NaN CA NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 4 Ellen NaN CA 88.0 NaN# 5 Frank 30.0 NaN NaN NaN
source: pandas_nan_dropna.py
If axis
is set to 1
or 'columns'
, columns are removed.
print(df.dropna(subset=[0, 4], axis=1))# name state# 0 Alice NY# 1 NaN NaN# 2 Charlie CA# 3 Dave TX# 4 Ellen CA# 5 Frank NaNprint(df.dropna(subset=[0, 4], axis=1, how='all'))# name age state point# 0 Alice 24.0 NY NaN# 1 NaN NaN NaN NaN# 2 Charlie NaN CA NaN# 3 Dave 68.0 TX 70.0# 4 Ellen NaN CA 88.0# 5 Frank 30.0 NaN NaN
source: pandas_nan_dropna.py
An error is raised if a non-existent row or column name is specified. An error is also raised if you set axis=1
but specify column names or set axis=0
(default) but specify row names.
# print(df.dropna(subset=['age', 'state', 'xxx']))# KeyError: ['xxx']# print(df.dropna(subset=['age', 'state'], axis=1))# KeyError: ['age', 'state']
source: pandas_nan_dropna.py
Update the original object: inplace
As shown in the examples above, by default, a new object is returned, and the original object is not changed, but if inplace=True
, the original object itself is updated.
df.dropna(subset=['age'], inplace=True)print(df)# name age state point other# 0 Alice 24.0 NY NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 5 Frank 30.0 NaN NaN NaN
source: pandas_nan_dropna.py
For pandas.Series
The only valid argument for dropna()
of pandas.Series
is inplace
. Since it is one-dimensional data, the elements with NaN
are simply removed.
s = pd.read_csv('data/src/sample_pandas_normal_nan.csv')['age']print(s)# 0 24.0# 1 NaN# 2 NaN# 3 68.0# 4 NaN# 5 30.0# Name: age, dtype: float64print(s.dropna())# 0 24.0# 3 68.0# 5 30.0# Name: age, dtype: float64s.dropna(inplace=True)print(s)# 0 24.0# 3 68.0# 5 30.0# Name: age, dtype: float64
source: pandas_nan_dropna.py