Coding With Fun

Python Pandas: Missing Value Processing


May 30, 2021




Preface

Hello, everyone in front of the screen! Day after day, year after year, and in a flash the fantastic 2020 is about to pass. The editor wishes you all a happy New Year's Day in advance. Today, I'd like to share several ways to handle missing values when analyzing data with pandas. Recommended lessons: Python Automation Office, Python3 Advanced: Data Analysis and Visualization.

When we encounter missing values while processing data, pandas provides a complete set of methods for dealing with them, including:

  • isnull() - find missing values;
  • notnull() - find non-missing values;
  • dropna() - drop missing values;
  • fillna() - fill in missing values.

Let's see how each one is used.

I. isnull()

isnull() finds where values are missing: it returns a Boolean mask of the same shape as the data, with True marking each missing value. Here is an example:

import pandas as pd
import numpy as np

data = pd.DataFrame({'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'], 'age': [18, np.nan, 99, None]})

data

Running the code above displays the following DataFrame:

        name   age
0  W3CSCHOOL  18.0
1        NaN   NaN
2       JAVA  99.0
3     PYTHON   NaN

Here we can see that in the numeric age column, both np.nan and None end up as NaN. (In an object-dtype column, a None would be stored as None rather than NaN, but pandas still treats it as missing.)
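A quick way to check how None and np.nan are stored in different column types is the sketch below. It assumes only pandas and NumPy; the Series names ages and names are illustrative, and the object column is created with an explicit dtype=object so that None is kept as-is.

```python
import numpy as np
import pandas as pd

# Numeric column: None is converted to NaN and the dtype becomes float64
ages = pd.Series([18, None, 99])
print(ages.dtype)     # float64
print(ages.tolist())  # [18.0, nan, 99.0]

# Object column: None and np.nan are both stored as-is
names = pd.Series(['JAVA', None, np.nan], dtype=object)
print(names.tolist())  # ['JAVA', None, nan]

# isnull() treats both None and np.nan as missing
print(names.isnull().tolist())  # [False, True, True]
```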

data.isnull()

    name    age
0  False  False
1   True   True
2  False  False
3  False   True
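Because True counts as 1 when summed, the mask returned by isnull() can also be used to count missing values. A minimal sketch on the same sample data (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Same sample data as above
data = pd.DataFrame({'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'],
                     'age': [18, np.nan, 99, None]})

mask = data.isnull()

# Summing the Boolean mask counts missing values per column: name -> 1, age -> 2
missing_per_column = mask.sum()
print(missing_per_column)

# any(axis=1) is True for rows containing at least one missing value
rows_with_missing = data[mask.any(axis=1)]
print(rows_with_missing.index.tolist())  # [1, 3]
```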

II. notnull()

notnull() is the opposite of isnull(): it finds the non-missing values, marking them True and the missing values False. Here is an example:

data.notnull()

    name    age
0   True   True
1  False  False
2   True   True
3   True  False
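The mask from notnull() is often used directly as a row filter. A short sketch, assuming the same sample data, that keeps only the rows whose age is present:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'],
                     'age': [18, np.nan, 99, None]})

# Boolean indexing: keep rows where 'age' is not missing
has_age = data[data['age'].notnull()]
print(has_age)
#         name   age
# 0  W3CSCHOOL  18.0
# 2       JAVA  99.0

# notnull() is the element-wise negation of isnull()
print(data.notnull().equals(~data.isnull()))  # True
```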

III. dropna()

dropna() drops the rows or columns that contain missing values. Its signature is:

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters:

  • axis: whether to drop rows or columns; 0 or 'index' (the default) drops rows, 1 or 'columns' drops columns;
  • how: 'any' or 'all', default 'any'; 'any' drops a row (column) if it contains at least one missing value, while 'all' drops it only if all of its values are missing;
  • thresh: an integer; a row (column) is kept only if it has at least thresh non-missing values, otherwise it is dropped;
  • subset: a list of labels along the other axis to check for missing values; for example, when dropping rows, only the listed columns are considered;
  • inplace: as with other pandas methods, whether to modify the original DataFrame in place (the method then returns None) instead of returning a new one.
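To compare the how, thresh and axis options side by side, here is a small sketch on the same sample data; each call returns a new DataFrame and leaves data untouched:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'],
                     'age': [18, np.nan, 99, None]})

# how='any' (default): drop a row if any of its values is missing
any_rows = data.dropna()
# how='all': drop a row only if every value in it is missing
all_rows = data.dropna(how='all')
# thresh=2: keep only rows with at least 2 non-missing values
thresh_rows = data.dropna(thresh=2)
# axis=1: drop columns (not rows) that contain any missing value
any_cols = data.dropna(axis=1)

print(any_rows.index.tolist())     # [0, 2]
print(all_rows.index.tolist())     # [0, 2, 3]
print(thresh_rows.index.tolist())  # [0, 2]
print(any_cols.columns.tolist())   # []
```

Both columns contain at least one NaN, so axis=1 with the default how='any' removes every column.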

Here's an example:

data.dropna(axis=1,thresh=3)

        name
0  W3CSCHOOL
1        NaN
2       JAVA
3     PYTHON

data.dropna(axis=0,how='all')

        name   age
0  W3CSCHOOL  18.0
2       JAVA  99.0
3     PYTHON   NaN

data.dropna(subset = ['name'])

        name   age
0  W3CSCHOOL  18.0
2       JAVA  99.0
3     PYTHON   NaN
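The effect of inplace can be seen by comparing return values. A minimal sketch using subset=['age'] on the same sample data (the names cleaned and result are illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'],
                     'age': [18, np.nan, 99, None]})

# Without inplace, dropna() returns a new DataFrame and data keeps all 4 rows
cleaned = data.dropna(subset=['age'])
print(len(data), len(cleaned))  # 4 2

# With inplace=True, data itself is modified and the call returns None
result = data.dropna(subset=['age'], inplace=True)
print(result)               # None
print(data.index.tolist())  # [0, 2]
```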

IV. fillna()

fillna() fills in missing values. Its signature is:

DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

Parameters:

  • value: the value used to fill missing entries; a scalar, or a dict/Series/DataFrame giving a fill value per column
  • method: None by default; the available fill methods are 'backfill', 'bfill', 'pad' and 'ffill'. 'backfill' and 'bfill' fill each gap with the next valid value, while 'pad' and 'ffill' fill it with the previous valid value (this parameter is deprecated since pandas 2.1 in favor of DataFrame.bfill() and DataFrame.ffill())
  • axis: the axis along which to fill missing values, set the same way as in dropna() above
  • inplace: whether to modify the original DataFrame, the same as in dropna() above
  • limit: with method, the maximum number of consecutive missing values to fill; without method, the maximum number of missing values to fill along the axis
  • downcast: a dict of item -> dtype specifying what to downcast to if possible; rarely used
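As a sketch of the value and limit parameters, assuming the same sample data: value can be a dict that gives each column its own fill value (the mean age is used here purely as an illustration), and limit caps how many missing values are filled.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'],
                     'age': [18, np.nan, 99, None]})

# A dict maps column name -> fill value; mean() skips NaN, so it is (18 + 99) / 2
filled = data.fillna({'name': 'UNKNOWN', 'age': data['age'].mean()})
print(filled)
#         name   age
# 0  W3CSCHOOL  18.0
# 1    UNKNOWN  58.5
# 2       JAVA  99.0
# 3     PYTHON  58.5

# limit=1: fill at most one missing value in each column
first_only = data.fillna({'name': 'UNKNOWN', 'age': 0}, limit=1)
print(first_only['age'].tolist())  # [18.0, 0.0, 99.0, nan]
```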

Here's an example:

data.fillna(0)

        name   age
0  W3CSCHOOL  18.0
1          0   0.0
2       JAVA  99.0
3     PYTHON   0.0

data.fillna(method='ffill')  # equivalent to data.ffill()

        name   age
0  W3CSCHOOL  18.0
1  W3CSCHOOL  18.0
2       JAVA  99.0
3     PYTHON  99.0
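The dedicated DataFrame.ffill() and DataFrame.bfill() methods perform the same fills as method='ffill' and method='bfill'. A short sketch on the same sample data shows the difference in direction:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'name': ['W3CSCHOOL', np.nan, 'JAVA', 'PYTHON'],
                     'age': [18, np.nan, 99, None]})

# Forward fill: each NaN takes the last valid value above it
forward = data.ffill()
# Backward fill: each NaN takes the next valid value below it
backward = data.bfill()

print(forward['age'].tolist())   # [18.0, 18.0, 99.0, 99.0]
print(backward['age'].tolist())  # [18.0, 99.0, 99.0, nan]
print(backward.loc[1, 'name'])   # JAVA
```

bfill() leaves the last age as NaN because there is no later value to copy from.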