May 12, 2021 Ruby
1. Ruby XML, XSLT and XPath tutorials
3. XML parser structure and API
XML stands for eXtensible Markup Language.
It is a subset of the Standard Generalized Markup Language (SGML), a markup language used to mark up electronic documents so that they have structure.
It can be used to tag data and define data types, and it is a source language that lets users define their own markup languages. It is well suited to transmission over the World Wide Web, providing a unified way to describe and exchange structured data independent of applications or vendors.
For more information, check out our XML tutorial.
XML parsers mainly come in two kinds: DOM (tree-based) and SAX (event-based).
In Ruby, XML documents can be parsed with the REXML library.
REXML is an XML toolkit for Ruby, written in pure Ruby, that conforms to the XML 1.0 specification.
REXML has been part of the Ruby standard library since Ruby 1.8 (since Ruby 3.0 it ships as a bundled gem).
The library is loaded from the path rexml/document.
All methods and classes are encapsulated in the REXML module.
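As a minimal sketch of what this namespacing looks like in practice, the following parses an inline string and refers to each class by its fully qualified REXML:: name rather than including the module. The sample XML, the SAMPLE_XML constant and the shelf_name helper are made up for illustration.

```ruby
require 'rexml/document'

# Inline sample document, made up for this sketch.
SAMPLE_XML = '<collection shelf="New Arrivals"><movie title="Trigun"/></collection>'

# Hypothetical helper: returns the shelf attribute of the root element.
def shelf_name(xml)
  doc = REXML::Document.new(xml)   # fully qualified, so no `include REXML` is needed
  doc.root.attributes["shelf"]
end

puts shelf_name(SAMPLE_XML)

# Elements of the parsed tree are instances of classes inside the REXML module.
puts REXML::Document.new(SAMPLE_XML).root.class
```

Using the qualified names keeps Document, Element and XPath out of the top-level namespace, which can help in larger programs where those names might clash with other constants.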
REXML parsers have the following advantages over other parsers:
Here is the XML used in the examples, saved as movies.xml:
<collection shelf="New Arrivals">
  <movie title="Enemy Behind">
    <type>War, Thriller</type>
    <format>DVD</format>
    <year>2003</year>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Talk about a US-Japan war</description>
  </movie>
  <movie title="Transformers">
    <type>Anime, Science Fiction</type>
    <format>DVD</format>
    <year>1989</year>
    <rating>R</rating>
    <stars>8</stars>
    <description>A schientific fiction</description>
  </movie>
  <movie title="Trigun">
    <type>Anime, Action</type>
    <format>DVD</format>
    <episodes>4</episodes>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Vash the Stampede!</description>
  </movie>
  <movie title="Ishtar">
    <type>Comedy</type>
    <format>VHS</format>
    <rating>PG</rating>
    <stars>2</stars>
    <description>Viewable boredom</description>
  </movie>
</collection>
Let's parse the XML data as a tree first. We begin by requiring the rexml/document library; for convenience, REXML is usually included into the top-level namespace:
#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# Print the title of every movie
xmldoc.elements.each("collection/movie") { |e|
  puts "Movie Title : " + e.attributes["title"]
}

# Print the type of every movie
xmldoc.elements.each("collection/movie/type") { |e|
  puts "Movie Type : " + e.text
}

# Print the description of every movie
xmldoc.elements.each("collection/movie/description") { |e|
  puts "Movie Description : " + e.text
}
The output of the above examples is:
Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom

SAX-like Parsing:
We work with the same data file, movies.xml. SAX-style parsing is not recommended for small files; the following is only a simple example:
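#!/usr/bin/ruby -w

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
  include REXML::StreamListener

  # Called for every opening tag with its name and attribute hash
  def tag_start(*args)
    puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
  end

  # Called for every run of character data
  def text(data)
    # \w matches word characters, not whitespace: this skips any text that
    # contains an empty or single-word line, i.e. the indentation between
    # tags and also values such as "DVD"
    return if data =~ /^\w*$/
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts " text : #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("movies.xml")
Document.parse_stream(xmlfile, list)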
The above output is:
tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
 text : "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text : "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
 text : "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text : "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
 text : "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text : "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text : "Viewable boredom"
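A stream listener does not have to print anything; it can also accumulate state as the events arrive. The sketch below, which assumes an inline sample document and a hypothetical TagCounter class and count_tags helper, tallies how often each element name occurs without building a tree.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that counts how often each element name occurs.
class TagCounter
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Called once per opening tag; attrs is a Hash of attribute names to values.
  def tag_start(name, attrs)
    @counts[name] += 1
  end
end

# Inline sample data, made up for this sketch.
SAX_SAMPLE = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format></movie>
    <movie title="Ishtar"><format>VHS</format></movie>
  </collection>
XML

# Streams the XML through a TagCounter and returns the tally as a Hash.
def count_tags(xml)
  listener = TagCounter.new
  REXML::Document.parse_stream(xml, listener)
  listener.counts
end

p count_tags(SAX_SAMPLE)
```

Because only the running tally is kept in memory, this style tends to suit large inputs better than loading the whole document.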
Another way to work with XML is XPath, a language for finding information in XML documents (see: XPath tutorial).
XPath is the XML Path Language, used to address parts of an XML document. It is based on XML's tree structure and provides the ability to find nodes in the data tree.
Ruby supports XPath through REXML's XPath class, which performs tree-based (Document Object Model) parsing.
#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Information for the first movie
movie = XPath.first(xmldoc, "//movie")
p movie

# Print all movie types
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Get all movie formats, returned as an array
names = XPath.match(xmldoc, "//format").map {|x| x.text }
p names
The output of the above examples is:
<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
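XPath expressions can also filter nodes with predicates. The following is a small sketch under stated assumptions: the sample XML is inline, and the helpers titles_with_format and stars_for are hypothetical names introduced only for illustration. The values interpolated into the expressions are assumed to be trusted and free of quote characters.

```ruby
require 'rexml/document'

# Inline sample data, made up for this sketch.
XPATH_SAMPLE = <<~XML
  <collection>
    <movie title="Enemy Behind"><format>DVD</format><stars>10</stars></movie>
    <movie title="Ishtar"><format>VHS</format><stars>2</stars></movie>
  </collection>
XML

# Titles of all movies whose <format> child has the given text.
def titles_with_format(xml, format)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//movie[format='#{format}']").map do |movie|
    movie.attributes["title"]
  end
end

# Star rating (as an Integer) of the movie with the given title, or nil if absent.
def stars_for(xml, title)
  doc = REXML::Document.new(xml)
  node = REXML::XPath.first(doc, "//movie[@title='#{title}']/stars")
  node && node.text.to_i
end

p titles_with_format(XPATH_SAMPLE, "DVD")
p stars_for(XPATH_SAMPLE, "Ishtar")
```

The first helper uses a predicate on a child element, the second a predicate on an attribute followed by a child step.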
Ruby has two XSLT parsers, which are briefly described below:
This parser was written and is maintained by Masayoshi Takahashi. It is written primarily for the Linux operating system and requires the following libraries:
XSLT4R needs XMLScan to operate. XMLScan is included in the XSLT4R archive and is itself a 100% Ruby module. These modules can be installed using the standard Ruby installation method, ruby install.rb.
XSLT4R is invoked from the command line as follows:
ruby xslt.rb stylesheet.xsl document.xml [arguments]
If you want to use XSLT4R in your application, you can require xslt and pass in the parameters you need. Here's an example:
require "xslt"

# Read each file into a single string
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/....' }

sheet = XSLT::Stylesheet.new( stylesheet, arguments )

# output to StdOut
sheet.apply( xml_doc )

# output to 'str'
str = ""
sheet.output = [ str ]
sheet.apply( xml_doc )