May 31, 2021
Python 3 consolidates Python 2's urllib and urllib2 libraries into a single urllib library. So what is the urllib library in Python 3? How do you use it? And what are its most common functions?
Urllib is Python's built-in HTTP request library. It requires no installation, can be used directly, and is a staple for crawler developers.
Urllib is divided into four functional modules: urllib.request (opening and reading URLs), urllib.error (exceptions raised by urllib.request), urllib.parse (parsing and building URLs), and urllib.robotparser (parsing robots.txt files).
1, urllib.request.urlopen() function
urlopen() creates a file-like object that represents the remote URL; you can then operate on it like a local file to fetch the remote data. The syntax is as follows:
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
from urllib import request

response = request.urlopen('http://www.baidu.com')  # GET request
print(response.read().decode('utf-8'))  # read the response body and decode it
The object returned by urlopen() offers several common methods and attributes: read(), readline(), and readlines() to fetch the body, plus getcode() for the status code and geturl() for the final URL.
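As a sketch of those methods in action, the snippet below spins up a throwaway local HTTP server (purely so the example runs without internet access; any real URL behaves the same way) and inspects the object urlopen() returns:

```python
import http.server
import threading
from urllib import request

# A throwaway local server so the example works offline.
class HelloHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        payload = b'hello from local server'
        self.send_response(200)
        self.send_header('Content-Length', str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = 'http://127.0.0.1:%d/' % server.server_address[1]
response = request.urlopen(url)
status = response.getcode()             # 200
final_url = response.geturl()           # the URL that was actually fetched
body = response.read().decode('utf-8')  # the response body
print(status, final_url, body)
server.shutdown()
```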
2, urllib.request.urlretrieve() function
This function makes it easy to save a file from a web page locally. The syntax is as follows:
urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)
from urllib import request

request.urlretrieve('http://www.baidu.com/', 'baidu.html')  # download Baidu's home page to a local file
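The reporthook parameter receives progress callbacks as (blocks transferred so far, block size, total size). A minimal sketch, using a local file:// URL so it runs without network access (the file names below are arbitrary):

```python
import pathlib
import tempfile
from urllib import request

# Create a local source file and expose it via a file:// URL,
# so the "download" works offline.
tmp = pathlib.Path(tempfile.mkdtemp())
src = tmp / 'source.txt'
src.write_text('hello urlretrieve', encoding='utf-8')

def show_progress(block_num, block_size, total_size):
    # Called once per block: (blocks so far, bytes per block, total bytes)
    print('block %d, %d bytes/block, %d bytes total' % (block_num, block_size, total_size))

dest = tmp / 'copy.txt'
path, headers = request.urlretrieve(src.as_uri(), str(dest), reporthook=show_progress)
print(dest.read_text(encoding='utf-8'))
```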
3, urllib.parse.urlencode() function
urlencode converts dictionary data to URL-encoded data. The syntax is as follows:
urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)
from urllib import parse
data = {'Name': 'W3cschool', 'Say': 'Hello W3cschool', 'Age': 100}
qs = parse.urlencode(data)
print(qs)
# Name=W3cschool&Say=Hello+W3cschool&Age=100
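When a value is a sequence, urlencode() by default quotes its str() representation; passing doseq=True emits one key=value pair per element instead (the tag key below is just an illustration):

```python
from urllib import parse

params = {'tag': ['python', 'crawler']}
print(parse.urlencode(params))              # the whole list is quoted as one string
print(parse.urlencode(params, doseq=True))  # tag=python&tag=crawler
```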
4, urllib.parse.parse_qs() function
It decodes URL-encoded query-string parameters back into a dictionary. The syntax is as follows:
urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')
from urllib import parse
data = {'Name': 'W3cschool', 'Say': 'Hello W3cschool', 'Age': 100}
qs = parse.urlencode(data)
print(qs)
# Name=W3cschool&Say=Hello+W3cschool&Age=100
print(parse.parse_qs(qs))
# {'Name': ['W3cschool'], 'Say': ['Hello W3cschool'], 'Age': ['100']}
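By default parse_qs() drops parameters whose value is empty; keep_blank_values=True preserves them. A quick sketch (the query string is made up for illustration):

```python
from urllib import parse

qs = 'a=&b=1'
print(parse.parse_qs(qs))                          # {'b': ['1']} - empty 'a' dropped
print(parse.parse_qs(qs, keep_blank_values=True))  # {'a': [''], 'b': ['1']}
```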
5, urllib.parse.parse_qsl() function
Its basic usage matches the parse_qs() function; the difference is that urllib.parse.parse_qs() returns a dictionary while urllib.parse.parse_qsl() returns a list of (key, value) tuples. The syntax is as follows:
urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')
from urllib import parse
data = {'Name': 'W3cschool', 'Say': 'Hello W3cschool', 'Age': 100}
qs = parse.urlencode(data)
print(parse.parse_qsl(qs))
# [('Name', 'W3cschool'), ('Say', 'Hello W3cschool'), ('Age', '100')]
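Because parse_qsl() preserves order and repeated keys, it round-trips cleanly through urlencode(), a property parse_qs() loses for duplicate keys. A sketch with a made-up query string:

```python
from urllib import parse

qs = 'tag=python&tag=crawler&page=1'
pairs = parse.parse_qsl(qs)
print(pairs)                   # [('tag', 'python'), ('tag', 'crawler'), ('page', '1')]
print(parse.urlencode(pairs))  # tag=python&tag=crawler&page=1
```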
6, urllib.parse.urlparse() and urllib.parse.urlsplit() functions
Sometimes, given a URL, you want to split it into its components; urlparse or urlsplit does the job. The syntax is as follows:
urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)
urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)
urlparse and urlsplit are basically the same.
The only difference is that the result of urlparse has a params attribute, while the result of urlsplit does not.
from urllib import parse
url = 'http://www.baidu.com/index.html;user?id=S#comment'
result = parse.urlparse(url)
# result = parse.urlsplit(url)
print(result)
print(result.scheme)
print(result.netloc)
print(result.path)
print(result.params)  # urlparse has a params attribute; urlsplit does not
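Side by side on the same URL, urlparse() pulls the ;user segment out as params, while urlsplit() leaves it attached to the path:

```python
from urllib import parse

url = 'http://www.baidu.com/index.html;user?id=S#comment'
p = parse.urlparse(url)
s = parse.urlsplit(url)
print(p.path, p.params)  # /index.html user
print(s.path)            # /index.html;user
```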
7, urllib.error module
The urllib.error module defines the exceptions raised by urllib.request. If a request fails, urllib.request raises an exception defined in the error module.
from urllib import request, error

try:
    resp = request.urlopen('https://w3cschool.c/index.html')  # invalid domain
except error.URLError as e:
    print(e.reason)
# [Errno 11001] getaddrinfo failed
HTTPError is a subclass of URLError, designed to handle HTTP request errors. It has three attributes: code (the status code), reason (the cause of the error), and headers (the response headers).
from urllib import request, error

try:
    response = request.urlopen('http://www.baidu.com/no-such-page.html')  # request a page that does not exist
except error.HTTPError as e:
    print(e.code)
# 404
Of course, most of the time URLError and HTTPError are combined to handle exceptions: first catch HTTPError to get the status code, the reason for the error, and the response headers; if the error is not of that type, catch URLError and print its cause; finally, the else branch handles the normal flow.
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request('http://www.baidu.cnc/')
try:
    response = urlopen(req)
except HTTPError as e:
    print('The (www.baidu.cnc) server could not fulfil the request.')
    print('Error code:', e.code)
except URLError as e:
    print('We cannot connect to the server.')
    print('Cause:', e.reason)
else:
    print('Connection successful!')
    print(response.read().decode('utf-8'))
The above are the most commonly used functions in the urllib library. I hope you, in front of the screen, will try them out for yourself: combining theory with practice is the best way to learn! Recommended reading: Python static crawlers, Python Scrapy web crawlers.
Finally, let's summarize what the common status codes mean:
200: the request succeeded
301: the resource has been permanently moved to a new URL
302: the resource has been temporarily moved to a new URL
400: bad request; the server could not understand it
403: the server refuses to fulfil the request
404: the requested resource was not found
500: internal server error
503: the server is temporarily unable to handle the request