Python ---- Pandas: Missing Value Processing

May 30, 2021 Article blog

Preface

Hello, everyone in front of the screen. Year after year, day after day, the extraordinary 2020 is about to pass in a flash, and the editor would like to wish you all a happy New Year's Day in advance. Today I'll share several ways to handle missing values when analyzing data with pandas. Recommended lessons: Python Automation Office, Python3 Advanced: Data Analysis and Visualization.

When the data we are working with contains missing values, pandas provides a full set of methods for dealing with them, including:

isnull() - finds the missing values;
notnull() - finds the non-missing values;
dropna() - drops the missing values;
fillna() - fills in the missing values.

Let's see how each one is used.

I. isnull()

isnull() is used to find out where the missing values are. It returns a Boolean mask of the same shape as the data, with True marking each missing value. Here is an example:

import pandas as pd
import numpy as np

data = pd.DataFrame({'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'], 'age': [18, np.nan, 99, None]})
data

Executing the above code produces the following data:

        name   age
0  W3CSCHOOL  18.0
1        NaN   NaN
2       JAVA  99.0
3     PYTHON   NaN

Here we can see that whether we mark a missing entry with np.nan or None when creating the DataFrame, it is displayed as NaN.
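To make the None-to-NaN conversion concrete, here is a minimal sketch on a single numeric column. The variable name ages is only an illustration and is not part of the original example.

```python
import numpy as np
import pandas as pd

# A numeric column built from a mix of np.nan and None (hypothetical name).
ages = pd.Series([18, np.nan, 99, None])

# None cannot live in an integer column, so pandas stores the
# values as floats and represents both markers as NaN.
print(ages.dtype)
print(ages.isnull().tolist())

# pd.isna treats None and np.nan the same way.
print(pd.isna(None), pd.isna(np.nan))
```

Because both markers end up as NaN, the methods below do not need to distinguish between them.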

data.isnull()

    name    age
0  False  False
1   True   True
2  False  False
3  False   True
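The Boolean mask is mostly useful when it is aggregated. The sketch below rebuilds the same DataFrame and counts the missing values per column and flags the affected rows; the names missing_per_column and rows_with_missing are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'],
    'age': [18, np.nan, 99, None],
})

# True counts as 1 when summed, so this gives the number of
# missing values in each column.
missing_per_column = data.isnull().sum()

# True for every row that contains at least one missing value.
rows_with_missing = data.isnull().any(axis=1)

print(missing_per_column)
print(rows_with_missing.tolist())
```

With this data, the name column has one missing value and the age column has two.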

II. notnull()

notnull() is the opposite of isnull(): it finds the non-missing values and marks them with True. Here is an example:

data.notnull()

    name    age
0   True   True
1  False  False
2   True   True
3   True  False
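A common use of the notnull() mask is to select rows by Boolean indexing. This is a small sketch on the same data; named_rows and complete_rows are hypothetical names chosen for the illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'],
    'age': [18, np.nan, 99, None],
})

# Keep only the rows whose 'name' is present.
named_rows = data[data['name'].notnull()]

# Keep only the rows in which every column is present.
complete_rows = data[data.notnull().all(axis=1)]

print(named_rows)
print(complete_rows)
```

The original row labels are preserved, so the first selection keeps rows 0, 2 and 3, and the second keeps rows 0 and 2.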

III. dropna()

dropna(), as its name suggests, drops the missing values.

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters:

axis: 0 by default; 0 (or 'index') drops rows that contain missing values, 1 (or 'columns') drops columns;
how: 'any' or 'all', 'any' by default; 'any' drops a row (column) as soon as it contains one missing value, and 'all' drops it only when all of its values are missing;
thresh: the minimum number of non-missing values a row (column) needs in order to be kept; rows (columns) with fewer are dropped;
subset: a list of labels along the other axis to check for missing values, for example the columns to inspect when dropping rows;
inplace: as with other functions, whether to modify the original DataFrame instead of returning a new one.

Here's an example:

data.dropna(axis=1,thresh=3)

        name
0  W3CSCHOOL
1        NaN
2       JAVA
3     PYTHON

data.dropna(axis=0,how='all')

        name   age
0  W3CSCHOOL  18.0
2       JAVA  99.0
3     PYTHON   NaN

data.dropna(subset = ['name'])

        name   age
0  W3CSCHOOL  18.0
2       JAVA  99.0
3     PYTHON   NaN
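The examples above can be complemented by a sketch of the default behaviour and of thresh applied to rows. It also shows that, with inplace left at False, the original DataFrame is not modified. The names strict and lenient are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'],
    'age': [18, np.nan, 99, None],
})

# Default: axis=0, how='any' -> drop every row with a missing value.
strict = data.dropna()

# Keep rows that have at least one non-missing value.
lenient = data.dropna(thresh=1)

print(strict.index.tolist())
print(lenient.index.tolist())

# dropna returned new objects; 'data' still has all four rows.
print(len(data))
```

If consecutive row labels are wanted after dropping, calling reset_index(drop=True) on the result is one option.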

IV. fillna()

The purpose of fillna() is to fill in the missing values.

DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

Parameters:

value: the value used to fill the missing entries; it can be a scalar, or a dict or Series giving a separate fill value for each column;
method: None by default; the available fill methods are 'backfill', 'bfill', 'pad' and 'ffill', where 'pad' and 'ffill' fill with the previous valid value (forward fill), and 'backfill' and 'bfill' fill with the next valid value (backward fill). This parameter is deprecated in recent pandas releases (2.1 and later) in favor of DataFrame.ffill() and DataFrame.bfill();
axis: the axis along which the missing values are filled, set in the same way as the axis parameter above;
inplace: whether to modify the original DataFrame, as with the inplace parameter above;
limit: the maximum number of missing values to fill;
downcast: describes how to downcast the result to a smaller dtype where possible; it is not commonly used.

Here's an example:

data.fillna(0)

        name   age
0  W3CSCHOOL  18.0
1          0   0.0
2       JAVA  99.0
3     PYTHON   0.0

data.fillna(method='ffill')

        name   age
0  W3CSCHOOL  18.0
1  W3CSCHOOL  18.0
2       JAVA  99.0
3     PYTHON  99.0
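Filling every column with the same value is rarely what an analysis needs, and recent pandas releases deprecate the method argument of fillna(). The sketch below shows two alternatives on the same data: a per-column dict of fill values, and the dedicated ffill() and bfill() methods. The fill choices ('UNKNOWN' and the column mean) are assumptions made for illustration, not recommendations from the original article.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'],
    'age': [18, np.nan, 99, None],
})

# A dict gives each column its own fill value.
# The mean ignores NaN, so it is (18 + 99) / 2 here.
filled = data.fillna({'name': 'UNKNOWN', 'age': data['age'].mean()})

# Forward fill: copy the previous valid value downwards.
forward = data.ffill()

# Backward fill: copy the next valid value upwards. The last
# 'age' has no later value, so it stays missing.
backward = data.bfill()

print(filled)
print(forward)
print(backward)
```

Note that a forward fill leaves leading missing values untouched and a backward fill leaves trailing ones untouched, so a final fillna() with a constant may still be needed.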
