Python ---- A First Look at Pandas

May 30, 2021 Article blog

1. Preface

2. First, the pandas workflow

3. Second, creating data in pandas

4. Third, looking up data in a df

5. Fourth, adding data to a df

6. Fifth, deleting data

7. Sixth, modifying data

8. Seventh, statistical analysis

9. Eighth, reading Excel files

Preface

Pandas is a NumPy-based tool created for data analysis tasks. It provides standard data models and the tools needed to manipulate large data sets efficiently, along with a large number of functions and methods for processing data quickly and easily. Recommended lessons: Pandas Chinese Tutorials, Python3 Advanced: Data Analysis and Visualization.

First, the pandas workflow

Add, delete, modify, and look up table data;
Process multiple tables together;
Clean data: handle missing values, duplicate values, and outliers, and standardize and convert data;
Reproduce Excel's special operations: generate pivot tables and cross tables;
Complete the statistical analysis.

Second, creating data in pandas

1, import the pandas library

import pandas as pd

2, build a DataFrame from table-structured data

columns: column index; index: row index; values: element data

Mode 1:

df = pd.DataFrame(
    data=[['Alex', 20, 'Male', '0831'], ['Tom', 30, 'Female', '0830']],
    index=['a', 'b'],  # optional; row labels default to integers starting from 0, or you can specify your own
    columns=['name', 'age', 'sex', 'class'],
)  # build the table
print(df)  # print the data

   name  age     sex class
a  Alex   20    Male  0831
b   Tom   30  Female  0830

Mode 2:

df1 = pd.DataFrame(data={'name': ['Tom', 'Alex'], 'age': [18, 20], 'sex': ['Male', 'Female'], 'class': ['0831', '0831']})
print(df1)  # no index is specified, so the rows are numbered from 0 by default

   name  age     sex class
0   Tom   18    Male  0831
1  Alex   20  Female  0831
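The two construction modes can be compared side by side. The following is a minimal sketch, assuming the same two students are used in both modes; the variable names `df_from_dict` and `same_values` are only for illustration.

```python
import pandas as pd

# Mode 1: a list of rows, with explicit row labels and column names.
df = pd.DataFrame(
    data=[["Alex", 20, "Male", "0831"], ["Tom", 30, "Female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

# Mode 2: a dict that maps each column name to that column's values.
# No index is given, so pandas numbers the rows 0, 1, ...
df_from_dict = pd.DataFrame(
    data={
        "name": ["Alex", "Tom"],
        "age": [20, 30],
        "sex": ["Male", "Female"],
        "class": ["0831", "0830"],
    }
)

# The cell values are identical; only the row labels differ.
same_values = (df.to_numpy() == df_from_dict.to_numpy()).all()
print(same_values)
print(list(df.index), list(df_from_dict.index))
```

Mode 1 reads naturally when the data arrives row by row, while Mode 2 fits data that is already organized by column.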

3, the properties of a DataFrame

Because pandas is built on NumPy, a DataFrame also has the attributes of NumPy's ndarray.

df.shape: the structure (rows, columns)

df.ndim: the number of dimensions

df.size: the number of elements

df.dtypes: the data type of the elements in each column

df.columns: the column index

df.index: the row index

df.values: the element data
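As a quick check of these attributes, here is a small sketch that prints each one for the two-row table from Mode 1 (rebuilt here so the snippet runs on its own).

```python
import pandas as pd

df = pd.DataFrame(
    data=[["Alex", 20, "Male", "0831"], ["Tom", 30, "Female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

print(df.shape)          # (number of rows, number of columns)
print(df.ndim)           # number of axes; a DataFrame always has 2
print(df.size)           # rows * columns
print(df.dtypes)         # one dtype per column
print(list(df.columns))  # column labels
print(list(df.index))    # row labels
print(df.values)         # the underlying data as a NumPy array
```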

Third, looking up data in a df

1, select one column

Selecting a single column, df1['name'], is a one-dimensional slice and returns a Series

print(df1['name'])  # slice out one column

0     Tom
1    Alex

2, select multiple columns

print(df1[['name', 'age']])

   name  age
0   Tom   18
1  Alex   20

print(type(df1[['name', 'age']]))  # a DataFrame, which is two-dimensional; a Series is one-dimensional, with only one axis
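The difference between single and double brackets is easy to verify. This sketch, which uses a cut-down version of df1 with only two columns, shows that a single label gives a Series while a list of labels gives a DataFrame, even when the list has one item.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Tom", "Alex"], "age": [18, 20]})

one_column = df1["name"]            # single label -> Series
two_columns = df1[["name", "age"]]  # list of labels -> DataFrame
still_a_frame = df1[["name"]]       # a one-item list still gives a DataFrame

print(type(one_column).__name__, one_column.ndim)
print(type(two_columns).__name__, two_columns.ndim)
print(type(still_a_frame).__name__, still_a_frame.shape)
```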

3, slicing by index

Method 1:

print(df[['name', 'age']][:2])  # rows are sliced by position; a specific row cannot be picked by label this way

   name  age
a  Alex   20
b   Tom   30

Method 2:

Index slicing with loc: df.loc[row label or condition, column label]

print(df.loc['a', 'name'])

Alex

df.loc['a', ['name']]  # <class 'pandas.core.series.Series'>: if one of the row and column arguments is a string and the other is a list, the result is one-dimensional
df.loc[['a'], ['name']]  # <class 'pandas.core.frame.DataFrame'>: if both arguments are lists, the result is two-dimensional
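The dimensionality rule for loc can be sketched as follows, using the two-row table from Mode 1. The variable names `scalar`, `series`, and `frame` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["Alex", 20, "Male", "0831"], ["Tom", 30, "Female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

scalar = df.loc["a", "name"]     # label, label -> a single value
series = df.loc["a", ["name"]]   # label, list  -> Series (one-dimensional)
frame = df.loc[["a"], ["name"]]  # list, list   -> DataFrame (two-dimensional)

print(scalar)
print(type(series).__name__)
print(type(frame).__name__, frame.shape)
```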

4, conditional indexing: boolean slicing

mask = df['age'] > 18  # True for every student older than 18, False otherwise
mask2 = df['sex'] == 'Female'  # True for every female student
mask3 = mask & mask2  # combine the two masks with &; the keyword and cannot be used here
print(mask3)

a    False
b     True
dtype: bool

print(df.loc[mask3, :])  # slice the data with the mask

  name  age     sex class
b  Tom   30  Female  0830
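Masks can also be combined with `|` (or) and negated with `~` (not). The sketch below assumes the same two-row table; each comparison is wrapped in parentheses because `&` and `|` bind more tightly than `>` and `==`.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["Alex", 20, "Male", "0831"], ["Tom", 30, "Female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

# AND: older than 18 and female.
older_women = df.loc[(df["age"] > 18) & (df["sex"] == "Female"), :]

# OR: older than 25, or male.
either = df.loc[(df["age"] > 25) | (df["sex"] == "Male"), "name"]

# NOT: everyone who is not female.
not_female = df.loc[~(df["sex"] == "Female"), "name"]

print(older_women)
print(list(either))
print(list(not_female))
```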

5, positional indexing: df.iloc[row positions, column positions]; slices are closed on the left and open on the right

print(df.iloc[:1, :])

   name  age   sex class
a  Alex   20  Male  0831
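The closed-left, open-right rule applies to iloc only; label slices with loc include both endpoints. A short sketch of the contrast, on the same two-row table:

```python
import pandas as pd

df = pd.DataFrame(
    data=[["Alex", 20, "Male", "0831"], ["Tom", 30, "Female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

by_position = df.iloc[0:1, :]   # position 0 only; the stop position 1 is excluded
by_label = df.loc["a":"b", :]   # label slices include the stop label 'b'
first_two_cols = df.iloc[:, :2]  # columns at positions 0 and 1
last_cell = df.iloc[-1, -1]     # negative positions count from the end

print(list(by_position.index))
print(list(by_label.index))
print(list(first_two_cols.columns))
print(last_cell)
```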

Fourth, adding data to a df

1, add a column by key-value assignment

# Two ways: assign a list with one value per row, e.g. df['address'] = ['Beijing', 'Shanghai'], or assign a single value such as 'Beijing', which fills every row
df['address'] = 'Beijing'

   name  age     sex class  address
a  Alex   20    Male  0831  Beijing
b   Tom   30  Female  0830  Beijing
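Both ways of adding a column can be tried in one place. In this sketch the column names `city`, `age_next_year`, and `bad` are hypothetical, added only to show a per-row list, a column derived from an existing one, and what happens when the list length does not match the number of rows.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["Alex", 20, "Male", "0831"], ["Tom", 30, "Female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

df["address"] = "Beijing"              # a single value fills every row
df["city"] = ["Beijing", "Shanghai"]   # a list supplies one value per row
df["age_next_year"] = df["age"] + 1    # a column computed from another column

# A list whose length differs from the number of rows is rejected.
length_mismatch_rejected = False
try:
    df["bad"] = ["only one value"]
except ValueError:
    length_mismatch_rejected = True

print(df)
print(length_mismatch_rejected)
```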

2, add rows with append

df_mini = pd.DataFrame(data={
    'name': ['jerry', 'make'],
    'age': [15, 18],
    'sex': ['Male', 'Female'],
    'class': ['0831', '0770'],
    'address': ['Beijing', 'Henan'],
}, index=['a', 'b'])
df4 = df.append(df_mini)  # DataFrame.append was removed in pandas 2.0; on those versions use pd.concat([df, df_mini])
print(df4)

    name  age     sex class  address
a   Alex   20    Male  0831  Beijing
b    Tom   30  Female  0830  Beijing
a  jerry   15    Male  0831  Beijing
b   make   18  Female  0770    Henan
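On pandas 2.0 and later, where `DataFrame.append` is no longer available, the same row-wise combination can be done with `pd.concat`. This is a sketch of that substitute, not the call used above; `df4_renumbered` and `rows_labelled_a` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ["Alex", 20, "Male", "0831", "Beijing"],
        ["Tom", 30, "Female", "0830", "Beijing"],
    ],
    index=["a", "b"],
    columns=["name", "age", "sex", "class", "address"],
)
df_mini = pd.DataFrame(
    data={
        "name": ["jerry", "make"],
        "age": [15, 18],
        "sex": ["Male", "Female"],
        "class": ["0831", "0770"],
        "address": ["Beijing", "Henan"],
    },
    index=["a", "b"],
)

# Stack the rows of df_mini under df; the original row labels are kept,
# so 'a' and 'b' each appear twice.
df4 = pd.concat([df, df_mini])

# ignore_index=True discards the old labels and renumbers the rows from 0.
df4_renumbered = pd.concat([df, df_mini], ignore_index=True)

# With duplicate labels, selecting 'a' returns every matching row.
rows_labelled_a = df4.loc["a"]

print(df4)
print(list(df4_renumbered.index))
print(list(rows_labelled_a["name"]))
```

Keeping duplicate labels matches the printed df4 above; renumbering is usually safer when the labels carry no meaning.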

Fifth, deleting data

axis: whether to delete rows (axis=0) or columns (axis=1)

inplace: whether to modify the original table

a = df4.drop(labels=['address', 'class'], axis=1)  # deletes columns; without inplace=True the result must be assigned to a variable
df4.drop(labels=['a'], axis=0, inplace=True)  # deletes every row labelled 'a' from df4 itself
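The effect of `inplace` can be checked directly. This sketch rebuilds the four-row df4 so it runs on its own; `trimmed` and `result` are illustrative names.

```python
import pandas as pd

df4 = pd.DataFrame(
    data=[
        ["Alex", 20, "Male", "0831", "Beijing"],
        ["Tom", 30, "Female", "0830", "Beijing"],
        ["jerry", 15, "Male", "0831", "Beijing"],
        ["make", 18, "Female", "0770", "Henan"],
    ],
    index=["a", "b", "a", "b"],
    columns=["name", "age", "sex", "class", "address"],
)

# Without inplace, drop returns a new table and leaves df4 untouched.
trimmed = df4.drop(labels=["address", "class"], axis=1)

# With inplace=True, df4 itself is changed and the call returns None.
# Both rows labelled 'a' are removed.
result = df4.drop(labels=["a"], axis=0, inplace=True)

print(list(trimmed.columns))
print(result)
print(df4)
```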

Sixth, modifying data

Slice out the specified data, then assign the new value to it

df4.loc[df4['name'] == 'Tom', 'class'] = 'has problems'
print(df4)

    name  age     sex         class  address
a   Alex   20    Male          0831  Beijing
b    Tom   30  Female  has problems  Beijing
a  jerry   15    Male          0831  Beijing
b   make   18  Female          0770    Henan
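The slice-then-assign pattern works for any condition and column. In this sketch the first assignment repeats the edit above, and the second (raising every age below 18 to 18) is a hypothetical extra edit added only to show a numeric condition.

```python
import pandas as pd

df4 = pd.DataFrame(
    data=[
        ["Alex", 20, "Male", "0831", "Beijing"],
        ["Tom", 30, "Female", "0830", "Beijing"],
        ["jerry", 15, "Male", "0831", "Beijing"],
        ["make", 18, "Female", "0770", "Henan"],
    ],
    index=["a", "b", "a", "b"],
    columns=["name", "age", "sex", "class", "address"],
)

# Select the rows with a boolean condition and the column by label,
# then assign; df4 is modified directly.
df4.loc[df4["name"] == "Tom", "class"] = "has problems"

# Hypothetical second edit: clamp ages below 18 up to 18.
df4.loc[df4["age"] < 18, "age"] = 18

print(df4)
```

The assignment statement itself has no useful return value, so print df4 afterwards rather than the result of the assignment.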

Seventh, statistical analysis

1, ten statistical methods carried over from NumPy

min() argmin() max() argmax() std() var() sum() mean() cumsum() cumprod()

2, calling them in pandas

df['age'].min()
df['age'].max()
df['age'].argsort()

3, mode, non-null count, frequency

df['age'].mode()  # the most frequent value(s), returned as a Series
df['age'].count()  # the number of non-null elements
df['name'].value_counts()  # how many times each value occurs

Tom      1
make     1
Alex     1
jerry    1
Name: name, dtype: int64
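The Series-level statistics above can be run together on a small table. This sketch assumes the four students used throughout, with a default integer index; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alex", "Tom", "jerry", "make"],
        "age": [20, 30, 15, 18],
        "sex": ["Male", "Female", "Male", "Female"],
    }
)

age_min = df["age"].min()
age_max = df["age"].max()
order = df["age"].argsort()            # positions that would sort the ages ascending
sex_mode = df["sex"].mode()            # every value tied for most frequent, sorted
n_non_null = df["age"].count()         # number of non-null ages
sex_counts = df["sex"].value_counts()  # occurrences of each value

print(age_min, age_max)
print(list(order))
print(list(sex_mode))
print(n_non_null)
print(sex_counts)
```

Note that mode() returns a Series rather than a single value, because several values can be tied for most frequent.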

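4, for the df type

df[['age']].idxmax(axis=1)  # horizontal comparison: for each row, the column holding the largest value
df[['age']].idxmax(axis=0)  # vertical comparison: for each column, the row holding the largest value
df.mode()  # the mode of every column; columns with fewer modes are padded with NaN

    name  age     sex class  address
0   Alex   15  Female  0831  Beijing
1    Tom   18    Male   NaN      NaN
2  jerry   20     NaN   NaN      NaN
3   make   30     NaN   NaN      NaN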
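The two axis directions are easier to see on a table with more than one numeric column. The `scores` table below, with its `math` and `english` columns, is a hypothetical example and not part of the data above.

```python
import pandas as pd

# Hypothetical numeric table: two students, two subjects.
scores = pd.DataFrame(
    {"math": [90, 60], "english": [70, 80]},
    index=["a", "b"],
)

# axis=1 compares across the columns of each row:
# the answer is a column label per row.
best_subject_per_row = scores.idxmax(axis=1)

# axis=0 compares down the rows of each column:
# the answer is a row label per column.
best_row_per_subject = scores.idxmax(axis=0)

print(best_subject_per_row)
print(best_row_per_subject)
```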

5, summary statistics with describe

df['age'].describe()

# age
# count   4.00  number of non-null values
# mean   20.75  average
# std     6.50  standard deviation
# min    15.00  minimum
# 25%    17.25  first quartile
# 50%    19.00  median (second quartile)
# 75%    22.50  third quartile
# max    30.00  maximum

df['name'].describe()

# count: number of non-null values
# unique: number of distinct values after removing duplicates
# top: the most frequent value (the mode)
# freq: how many times the most frequent value occurs
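The describe output is itself a Series, so individual statistics can be read from it by label. A small sketch, assuming the four ages and names used above:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alex", "Tom", "jerry", "make"],
        "age": [20, 30, 15, 18],
    }
)

# Numeric column: count, mean, std, min, quartiles, max.
age_summary = df["age"].describe()

# Text column: count, unique, top, freq.
name_summary = df["name"].describe()

print(age_summary)
print(age_summary["mean"], age_summary["50%"])
print(list(name_summary.index))
```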

Eighth, reading Excel files

Pandas can read many file formats; here is how to read Excel data:

pd.read_excel(r'file path')
