Jun 01, 2021 Article blog
This article is reproduced from Zhihu ID: Charles (Bai Lu), from his personal Zhihu column.
Use Python to crawl member data from the FishC forum and run a simple analysis on it (forum members are nicknamed "fish oil"). To avoid disrupting the forum or putting unnecessary load on its server, the crawler code contains a lot of sleep statements, and it took about a week to piece together roughly 400,000 records (after cleaning, most of them turned out to be invalid T_T). The data is provided in the related files, so please do not crawl it again. The forum does not seem to have any anti-crawling measures, so the crawler code has little learning value; a quick glance is enough.
OK, let's get started.
Python version: 3.6.4
Related modules:
requests module;
fake_useragent module;
pyecharts module;
and some modules that ship with Python.
Install Python and add it to the environment variables, then use pip to install the required modules.
To be honest, my crawler articles are usually rather thin. Skimming other people's articles now and then, I noticed that they basically follow the same process: data acquisition, cleaning, and finally visual analysis. So I intend to imitate that a little here, although my writing may still be casual and extremely unprofessional.
Data acquisition:
This part is very simple. On the FishC forum, every member's profile URL is identical except for the uid:
Request the URL of each member's profile one by one, and save the returned content:
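A minimal sketch of what such a request-and-save loop might look like is given here. It is not the original code: the URL template is a hypothetical placeholder rather than the forum's real address, and the names build_url, fetch_profile and crawl as well as the delay values are assumptions for illustration.

```python
import random
import time
from pathlib import Path

# Hypothetical placeholder; substitute the real profile URL pattern.
PROFILE_URL = "https://example.com/profile?uid={uid}"


def build_url(uid, template=PROFILE_URL):
    """Profile URLs differ only in the uid, so one template is enough."""
    return template.format(uid=uid)


def fetch_profile(uid, timeout=10):
    """Download one profile page using a random User-Agent header."""
    # Imported lazily so the rest of the sketch runs without these packages.
    import requests
    from fake_useragent import UserAgent

    headers = {"User-Agent": UserAgent().random}
    resp = requests.get(build_url(uid), headers=headers, timeout=timeout)
    resp.raise_for_status()
    return resp.text


def crawl(uids, out_dir, fetch=fetch_profile, min_delay=1.0, max_delay=3.0):
    """Save each profile as <uid>.html, sleeping between requests."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = 0
    for uid in uids:
        target = out / f"{uid}.html"
        if target.exists():
            continue  # already downloaded, so an interrupted run can resume
        try:
            html = fetch(uid)
        except Exception as exc:  # keep going if a single uid fails
            print(f"uid {uid} failed: {exc}")
        else:
            target.write_text(html, encoding="utf-8")
            saved += 1
        # Be polite to the server: wait a random interval before the next uid.
        time.sleep(random.uniform(min_delay, max_delay))
    return saved
```

Skipping files that already exist means a crawl that runs for days can be stopped and restarted without requesting the same pages twice.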
The data obtained in the end looks roughly like this, about 400,000 records in total, although most of them are invalid.
Data cleaning:
Next, we extract some useful data from each member's profile, such as gender, birthday, place of birth, education, and so on, as follows:
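As a rough sketch of this extraction step, assume each field is rendered as `<li><em>Label</em>Value</li>`. That markup, the English labels and the names extract_fields and clean are assumptions for illustration; the real pages may be structured differently, and a proper HTML parser would be more robust than a regular expression.

```python
import re

# Assumed markup for one profile field: <li><em>Label</em>Value</li>
FIELD_RE = re.compile(r"<li>\s*<em>(?P<label>[^<]+)</em>\s*(?P<value>[^<]*)</li>")

# Hypothetical field labels, lower-cased for matching.
WANTED = ("gender", "birthday", "birthplace", "education")


def extract_fields(html, wanted=WANTED):
    """Return the wanted fields that are present and non-empty in one page."""
    found = {}
    for match in FIELD_RE.finditer(html):
        label = match.group("label").strip().lower()
        value = match.group("value").strip()
        if label in wanted and value:
            found[label] = value
    return found


def clean(pages):
    """Map {uid: html} to a list of rows, dropping pages with no usable field."""
    rows = []
    for uid, html in pages.items():
        fields = extract_fields(html)
        if fields:
            rows.append({"uid": uid, **fields})
    return rows
```

Pages for invalid uids or empty profiles yield no fields and are dropped, which is how a large crawl can shrink to a much smaller usable set.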
The data remaining at the end looks roughly like this:
Too real: in the end only a little over 10,000 records were left. I then went to check, and found that a lot of members' home pages looked like this:
And many uids belong to invalid users:
I originally wanted to crawl again, but it would have dragged on for several more days, which felt like too much trouble, so I let it go. 10,000 records are still data, and I suspect a full crawl would not yield much more anyway.
Visual analysis of data:
First, let's look at the male-to-female ratio among FishC forum members:
Emmmm, so the forum does have girls after all; I always thought there were none.
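A ratio like this comes down to counting one field and drawing a pie chart. Here is a small sketch under a few assumptions: the rows are dicts with a hypothetical "gender" key, count_field and render_pie are illustrative names, and the chart code uses the pyecharts 1.x style API, which may differ from the version the original code was written for.

```python
from collections import Counter


def count_field(rows, field):
    """Count how often each non-empty value of `field` occurs."""
    return Counter(row[field] for row in rows if row.get(field))


def render_pie(counts, title, path):
    """Render the counts as a pie chart to an HTML file (pyecharts 1.x style)."""
    # Imported lazily so the counting part works without pyecharts installed.
    from pyecharts import options as opts
    from pyecharts.charts import Pie

    pie = Pie()
    pie.add("", [[label, n] for label, n in counts.most_common()])
    pie.set_global_opts(title_opts=opts.TitleOpts(title=title))
    return pie.render(path)
```

The same count_field helper would also cover the education and province breakdowns, just with a different field name.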
OK, let's look at the distribution of education levels among forum members:
There are nearly 200 Ph.D.s, which is impressive.
OK, let's look at the distribution of forum members by province; only members within China are counted here:
FishC forum members come from all over the country. Guangdong Province has the most, of course; my guess is that the forum's owner is Cantonese.
Next, let's look at the age distribution of FishC forum members:
It looks like most forum members are post-90s students, although most of the post-90s generation should have graduated by now.
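Grouping members into generations such as "post-90s" only needs the birth year. A minimal sketch, assuming the birthday strings contain a four-digit year somewhere (the exact format on the forum is not shown here, and birth_decade and decade_distribution are illustrative names):

```python
import re
from collections import Counter

# First four-digit year starting with 19 or 20 found in the string.
YEAR_RE = re.compile(r"(?:19|20)\d{2}")


def birth_decade(birthday):
    """Map a birthday string to a label like 'post-90s', or None if no year."""
    match = YEAR_RE.search(birthday or "")
    if not match:
        return None
    year = int(match.group(0))
    decade = (year % 100) // 10 * 10
    return f"post-{decade:02d}s"


def decade_distribution(birthdays):
    """Count members per birth decade, ignoring unparseable birthdays."""
    return Counter(d for d in map(birth_decade, birthdays) if d)
```

For example, birth_decade("1995-03-02") gives "post-90s", and a birthday with no recognizable year is simply left out of the distribution.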
Finally, let's see which forum members are the "richest", that is, which members hold the most fish coins and C coins. The statistical results are shown in the following figure:
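A ranking like this is a plain top-N selection. The sketch below assumes each row stores the coin balance as an integer under a hypothetical key such as "fish_coins"; top_holders is an illustrative name, not a function from the original code.

```python
import heapq


def top_holders(rows, field, n=10):
    """Return the n rows with the largest integer value in `field`."""
    # Skip rows where the balance is missing or was not parsed to an int.
    valid = [row for row in rows if isinstance(row.get(field), int)]
    return heapq.nlargest(n, valid, key=lambda row: row[field])
```

heapq.nlargest avoids sorting the whole list when only a handful of top entries is needed, although with about 10,000 rows a full sort would work just as well.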
I originally wanted to analyze things like which members have the highest technical score, but it is mealtime, so that's it T_T. Interested readers can download the data and continue the analysis themselves.
Well, that's all. At least the steps for writing this kind of article are imitated correctly. The full source code and data are in the related files.
The code was tested and working as of September 14, 2018.