May 10, 2021 Python2
XML refers to the extensible markup language (e X tensible M arkup L anguage). You can take the XML tutorial from this site
XML is designed to transfer and store data.
XML is a set of rules that define semantic tags that divide documents into parts and identify them.
It is also a meta-tag language, which defines the synth language used to define other semantic, structured markup languages related to a particular domain.
Common XML programming interfaces are DOM and SAX, which handle XML files in different ways and, of course, in different situations.
python has three ways to resolve XML, SAX, DOM, and ElementTree:
The pyhton standard library contains SAX parsers, which use an event-driven model to process XML files by triggering events one by one during the resolution of XML and calling user-defined callback functions.
The XML data is parsed into a tree in memory, and the XML is operated by the operation of the tree.
ElementTree is like a lightweight DOM with a convenient and friendly API. Code availability is good, fast, and consumes less memory.
Note: Because doM needs to map XML data to the tree in memory, one is slower, the other is memory-consuming, while SAX streams XML files faster and uses less memory, but requires the user to implement a callback function.
The XML instance file movies used in this section .xml as follows:
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
<type>War, Thriller</type>
<format>DVD</format>
<year>2003</year>
<rating>PG</rating>
<stars>10</stars>
<description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
<type>Anime, Science Fiction</type>
<format>DVD</format>
<year>1989</year>
<rating>R</rating>
<stars>8</stars>
<description>A schientific fiction</description>
</movie>
<movie title="Trigun">
<type>Anime, Action</type>
<format>DVD</format>
<episodes>4</episodes>
<rating>PG</rating>
<stars>10</stars>
<description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
<type>Comedy</type>
<format>VHS</format>
<rating>PG</rating>
<stars>2</stars>
<description>Viewable boredom</description>
</movie>
</collection>
SAX is an event-driven API.
Parsing an XML document with SAX involves two parts: the parser and the event processor.
The parser is responsible for reading the XML document and sending events to the event processor, such as the element starting to end the event with the element;
The event handler, on the other hand, is responsible for handling the transmitted XML data accordingly.
Using sax in python to handle xml starts with the introduction of the parse function in xml.sax and contentHandler in xml.sax.handler.
Characters (content) method
Timing of call:
Starting with the line, there are characters before the label is encountered, and the value of content is these strings.
From one label, before the next label is encountered, there are characters, and the value of content is these strings.
From a label, before encountering a line end character, there are characters, and the value of content is these strings.
A label can be a start label or an end label.
StartDocument() method
Called when the document starts.
EndDocument() method
The parser is called when it reaches the end of the document.
StartElement (name, attrs) method
When an XML start label is called, name is the name of the label and attrs is the property value dictionary of the label.
EndElement (name) method
Called when an XML end label is encountered.
The following method creates a new parser object and returns it.
xml.sax.make_parser( [parser_list] )
Description of the parameters:
Here's how to create an SAX parser and parse the xml document:
xml.sax.parse( xmlfile, contenthandler[, errorhandler])
Description of the parameters:
The parseString method creates an XML parser and parses the xml string:
xml.sax.parseString(xmlstring, contenthandler[, errorhandler])
Description of the parameters:
#coding=utf-8
#!/usr/bin/python
import xml.sax
class MovieHandler( xml.sax.ContentHandler ):
def __init__(self):
self.CurrentData = ""
self.type = ""
self.format = ""
self.year = ""
self.rating = ""
self.stars = ""
self.description = ""
# 元素开始事件处理
def startElement(self, tag, attributes):
self.CurrentData = tag
if tag == "movie":
print "*****Movie*****"
title = attributes["title"]
print "Title:", title
# 元素结束事件处理
def endElement(self, tag):
if self.CurrentData == "type":
print "Type:", self.type
elif self.CurrentData == "format":
print "Format:", self.format
elif self.CurrentData == "year":
print "Year:", self.year
elif self.CurrentData == "rating":
print "Rating:", self.rating
elif self.CurrentData == "stars":
print "Stars:", self.stars
elif self.CurrentData == "description":
print "Description:", self.description
self.CurrentData = ""
# 内容事件处理
def characters(self, content):
if self.CurrentData == "type":
self.type = content
elif self.CurrentData == "format":
self.format = content
elif self.CurrentData == "year":
self.year = content
elif self.CurrentData == "rating":
self.rating = content
elif self.CurrentData == "stars":
self.stars = content
elif self.CurrentData == "description":
self.description = content
if ( __name__ == "__main__"):
# 创建一个 XMLReader
parser = xml.sax.make_parser()
# turn off namepsaces
parser.setFeature(xml.sax.handler.feature_namespaces, 0)
# 重写 ContextHandler
Handler = MovieHandler()
parser.setContentHandler( Handler )
parser.parse("movies.xml")
The above code executes as follows:
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom
For the full SAX API documentation, please refer to Python SAX APIs
Document Object Model, or DOM, is the standard programming interface recommended by the W3C organization to handle extensable labeling languages.
When a DOM parser parses an XML document, it reads the entire document at once, keeps all the elements in the document in a tree structure in memory, and then you can use the different functions provided by the DOM to read or modify the contents and structure of the document, or you can write the modified contents to the xml file.
In python, the x ml file is parsed with x ml.dom.minidom, as follows:
#coding=utf-8
#!/usr/bin/python
from xml.dom.minidom import parse
import xml.dom.minidom
# 使用minidom解析器打开 XML 文档
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
print "Root element : %s" % collection.getAttribute("shelf")
# 在集合中获取所有电影
movies = collection.getElementsByTagName("movie")
# 打印每部电影的详细信息
for movie in movies:
print "*****Movie*****"
if movie.hasAttribute("title"):
print "Title: %s" % movie.getAttribute("title")
type = movie.getElementsByTagName('type')[0]
print "Type: %s" % type.childNodes[0].data
format = movie.getElementsByTagName('format')[0]
print "Format: %s" % format.childNodes[0].data
rating = movie.getElementsByTagName('rating')[0]
print "Rating: %s" % rating.childNodes[0].data
description = movie.getElementsByTagName('description')[0]
print "Description: %s" % description.childNodes[0].data
The results of the above procedures are as follows:
Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Description: Viewable boredom
For the complete DOM API documentation, check out Python DOM APIs.