Coding With Fun
Home Docker Django Node.js Articles Python pip guide FAQ Policy

Don't worry about multiple Excel tables anymore, Python seconds handle real fragrances!


May 31, 2021 Article blog


Table of contents


This article comes from the public number: Python Technology Author: Pieson Sauce

Why are non-programmers learning Python more and more? I nstead of trying to climb through some website data, they may encounter a lot of data analysis processing problems at work that can be solved easily and efficiently with Python. This article provides a practical example of how Python can be used to effectively solve complex problems in practice.

background

Xiaoming works for an outdoor sports franchise that owns several brands and involves many segments of the industry. X iaoming works as a data analyst for this company, usually through Excel to do data analysis. Today the boss gave him the task of sifting through the group's top five brands and sales for the most recent year before work.

Isn't that a minute-by-minute thing for an Excel boss? Xiaoming didn't take it seriously, and it wasn't until his marketing colleagues sent him the original data file that he realized it wasn't that simple:

 Don't worry about multiple Excel tables anymore, Python seconds handle real fragrances!1

It's not an imaginary sort of top five. There are a total of 90 files, and in general terms, he either copies the contents of all the files to a table for a classification summary, or summarizes each table, and then summarizes the final results.

Think about the workload, think about the deadline, Xiaoming scratched his head, feel to gradually bald.

Idea analysis

This kind of physical work, writing procedures to solve is the easiest. Xiaoming thought of his programmer's good friend at this time, so he threw the question to the small paragraph.

A small section of his leftover hair, said: so easy, just find Master Pan.

Xiaoming said: Can't you figure it out? And find someone else!

A little bitter smile said: No, no, No, Pan master is Python inside a library of processing data, called Pandas, commonly known as Pan master.

Xiaoming said: I don't care what master is not master, say how long it will take to get it done.

The paragraph said: Give me a few minutes to write the program, and then run for a few seconds!

XiaoMing sent his expression of worshipping the big man.

The small paragraph thought a little, sorted out the program ideas:

  • Calculate the sales for each row of each table, using the "Number of visitors conversion rate" for the unit price.
  • Summarize sales by brand in each form.
  • Summarize the results of all tables into a general table
  • Summarize and sort sales by brand in the general table

encode

Step zero, read Excel:

import pandas as pd


df = pd.read_excel("./tables/" + name)

As a first step, calculate the sales within each table:

df['销售额'] = df['访客数'] * df['转化率'] * df['客单价']

In the second step, summarize sales based on the brand in each form:

df_sum = df.groupby('品牌')['销售额'].sum().reset_index()

In the third step, summarize the results of all the tables into a general table:

result = pd.DataFrame()
result = pd.concat([result, df_sum])

In the fourth step, summarize and sort sales by brand in the general table:

final = result.groupby('品牌')['销售额'].sum().reset_index().sort_values('销售额', ascending=False)

Finally, let's look at the complete program:

import pandas as pd
import os


result = pd.DataFrame()


for name in os.listdir("./tables"):
    try:
        df = pd.read_excel("./tables/" + name)
        df['销售额'] = df['访客数'] * df['转化率'] * df['客单价']
        df_sum = df.groupby('品牌')['销售额'].sum().reset_index()
        result = pd.concat([result, df_sum])
    except:
        print(name)
        pass


final = result.groupby('品牌')['销售额'].sum().reset_index().sort_values('销售额', ascending=False)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
print(final.head())

The end result is something like this:

       品牌           销售额
15   品牌-5 1078060923.62
8   品牌-17 1064495314.96
4   品牌-13 1038560274.21
3   品牌-12 1026115153.00
13   品牌-3 1006908609.07

You can see that the final top five has come out, and the whole program is still running fast.

A few minutes later, the small section of the results to Xiaoming sent past, Xiaoming moved to the inner cow full of face, call another day please eat, worship teachers to learn art!

summary

The main purpose of this article is to show you the charm of Pandas through a practical case, especially in this form processing, can be said to be too convenient. A nyone who has written a program may have a somewhat familiar feel, and this approach is a bit like a SQL query statement. Not only does Master Pan make our programs simpler and more efficient to handle, but it's also very friendly to non-programmers who need to work with tables on a regularly, and it's easier to get up and and about.

Above is W3Cschool编程狮 about no longer giving up worry about multiple Excel table summaries, Python seconds processing is really fragrant! Related to the introduction, I hope to help you.