
Ruby XML, XSLT and XPath tutorials


May 12, 2021 Ruby




What is XML?

XML stands for eXtensible Markup Language.

XML is a subset of the Standard Generalized Markup Language (SGML). It is a markup language used to mark up electronic documents so that they have structure.

It can be used to tag data and define data types, and it is a meta-language that lets users define their own markup languages. It is well suited to transmission over the World Wide Web, providing a uniform way to describe and exchange structured data independent of applications or vendors.

For more information, check out our XML tutorial


XML parser structure and API

XML parsers fall into two main types: DOM and SAX.

  • A SAX parser is event-based. It scans the XML document from start to finish and, each time it encounters a syntactic construct (such as a start tag or a run of text), calls the event handler for that construct, sending an event to the application.
  • A DOM (Document Object Model) parser builds the document's hierarchical structure as a tree in memory, with each node represented as an object. Once parsing is complete, the entire DOM tree of the document is held in memory.
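
The difference between the two styles can be sketched with REXML, the library introduced in the next section. This is a minimal illustration, not a benchmark: the inline document and the ItemCounter class are made up for the example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Made-up sample input for the illustration.
source = '<list><item>a</item><item>b</item><item>c</item></list>'

# DOM style: build the whole tree in memory, then navigate it.
doc = REXML::Document.new(source)
dom_count = doc.root.elements.to_a('item').size

# SAX style: the parser scans the input once and calls a handler
# for each construct; nothing is kept unless the handler stores it.
class ItemCounter
  include REXML::StreamListener
  attr_reader :count

  def initialize
    @count = 0
  end

  # Called once for every start tag the parser encounters.
  def tag_start(name, _attributes)
    @count += 1 if name == 'item'
  end
end

counter = ItemCounter.new
REXML::Document.parse_stream(source, counter)

puts "DOM count: #{dom_count}, SAX count: #{counter.count}"
```

Both approaches arrive at the same count; the DOM version keeps the full tree available for further queries, while the SAX version retains only the counter.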

Parsing and creating XML in Ruby

The REXML library can be used to parse XML documents in Ruby.

REXML is an XML toolkit written in pure Ruby that adheres to the XML 1.0 specification.

REXML has been included in the Ruby standard library since Ruby 1.8. (Since Ruby 3.0 it ships as a bundled gem.)

To load the REXML library, require rexml/document.

All methods and classes are encapsulated in the REXML module.
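
The examples in this tutorial focus on parsing, so here is a small sketch of the other half of the heading: creating XML. It refers to the classes through the REXML module rather than including it, and the element names and values are simply borrowed from the sample file used later.

```ruby
require 'rexml/document'

# Build a small document from scratch; every class is reached
# through the REXML module instead of using `include REXML`.
doc = REXML::Document.new
doc << REXML::XMLDecl.new('1.0', 'UTF-8')

collection = doc.add_element('collection', 'shelf' => 'New Arrivals')
movie = collection.add_element('movie', 'title' => 'Trigun')
movie.add_element('format').text = 'DVD'
movie.add_element('stars').text = '10'

# Serialize into a string (compact form, no added indentation).
output = +''
doc.write(output)
puts output
```

Passing a file handle instead of a string to write would save the document to disk.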

REXML has the following advantages over other parsers:

  • It is written 100% in Ruby.
  • It supports both SAX-style and DOM-style parsing.
  • It is lightweight: less than 2000 lines of code.
  • Its methods and classes are easy to understand.
  • It has a SAX2-based API and full XPath support.
  • It ships with Ruby, so no separate installation is required.

Here is the sample XML used in the examples, saved as movies.xml:

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>

DOM parser

Let's parse the XML data with the DOM-style API first. We start by requiring the rexml/document library; for convenience, REXML is usually included into the top-level namespace:

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# The following prints the movie titles
xmldoc.elements.each("collection/movie"){ 
   |e| puts "Movie Title : " + e.attributes["title"] 
}

# The following prints all movie types
xmldoc.elements.each("collection/movie/type") {
   |e| puts "Movie Type : " + e.text 
}

# The following prints all movie descriptions
xmldoc.elements.each("collection/movie/description") {
   |e| puts "Movie Description : " + e.text 
}

The output of the above examples is:

Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
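
Not every movie in the sample has the same children (Trigun and Ishtar have no year element), and calling text on a missing element would raise an error. The following sketch shows one way to collect each movie into a hash while tolerating a missing child. It uses a shortened inline copy of the data so that it runs on its own; the hash keys are chosen only for illustration.

```ruby
require 'rexml/document'

# Shortened inline copy of the sample data.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind">
      <format>DVD</format>
      <year>2003</year>
      <stars>10</stars>
    </movie>
    <movie title="Ishtar">
      <format>VHS</format>
      <stars>2</stars>
    </movie>
  </collection>
XML

doc = REXML::Document.new(xml)

movies = doc.root.elements.to_a('movie').map do |movie|
  year = movie.elements['year']   # nil when the child is absent
  {
    title: movie.attributes['title'],
    format: movie.elements['format'].text,
    year: year && year.text.to_i,
    stars: movie.elements['stars'].text.to_i
  }
end

movies.each { |m| p m }
```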

SAX parser

We will process the same data file, movies.xml. SAX-style parsing is not usually recommended for a file this small, but it serves as a simple example:

#!/usr/bin/ruby -w

require 'rexml/document'
require 'rexml/streamlistener'
include REXML


class MyListener
  include REXML::StreamListener
  def tag_start(*args)
    puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
  end

  def text(data)
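    # Skips the whitespace between tags; because \w matches word characters,
    # this also skips single-word text such as "DVD" (see the output below).
    return if data =~ /^\w*$/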
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts "  text   :   #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("movies.xml")
Document.parse_stream(xmlfile, list)

The output of the above example is:

tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
  text   :   "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
  text   :   "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
  text   :   "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "Viewable boredom"
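
Note that the output above shows no text for elements such as format or rating, because the listener's pattern also matches single-word text. As a variation, the sketch below skips only whitespace-only text and uses tag_end to keep track of which elements are currently open. The PathListener class and the inline sample are made up for the illustration.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: records each piece of text together with
# the path of the element that contains it.
class PathListener
  include REXML::StreamListener
  attr_reader :entries

  def initialize
    @stack = []     # names of the elements currently open
    @entries = []   # [path, text] pairs in document order
  end

  def tag_start(name, _attributes)
    @stack.push(name)
  end

  def tag_end(_name)
    @stack.pop
  end

  def text(data)
    return if data =~ /\A\s*\z/   # skip whitespace-only text
    @entries << [@stack.join('/'), data]
  end
end

source = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Ishtar">
      <type>Comedy</type>
      <format>VHS</format>
    </movie>
  </collection>
XML

listener = PathListener.new
REXML::Document.parse_stream(source, listener)
listener.entries.each { |path, text| puts "#{path}: #{text}" }
```

Because the listener stores only what it needs, memory use stays small no matter how large the input is, which is the main reason to choose stream parsing.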

XPath and Ruby

We can query XML with XPath, a language for finding information in XML documents (see: XPath tutorial).

XPath, the XML Path Language, is used to address parts of an XML document. It is based on XML's tree structure and provides the ability to find nodes in that tree.

Ruby supports XPath through REXML's XPath class, which works on the tree-based (Document Object Model) representation.

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Information about the first movie
movie = XPath.first(xmldoc, "//movie")
p movie

# Print all movie types
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Get the formats of all movies, returned as an array
names = XPath.match(xmldoc, "//format").map {|x| x.text }
p names

The output of the above examples is:

<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
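
XPath expressions can also filter nodes with predicates. The sketch below selects movies by the text of a child element and by an attribute value. It uses a shortened inline copy of the sample data so that it runs on its own.

```ruby
require 'rexml/document'

# Shortened inline copy of the sample data.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format><stars>10</stars></movie>
    <movie title="Transformers"><format>DVD</format><stars>8</stars></movie>
    <movie title="Ishtar"><format>VHS</format><stars>2</stars></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Predicate on the text of a child element.
dvd_movies = REXML::XPath.match(doc, "//movie[format='DVD']")
dvd_titles = dvd_movies.map { |m| m.attributes['title'] }
p dvd_titles

# Predicate on an attribute, then a step down to a child element.
ishtar_format = REXML::XPath.first(doc, "//movie[@title='Ishtar']/format")
puts ishtar_format.text
```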

XSLT and Ruby

Two XSLT processors are available for Ruby; they are briefly described below:

Ruby-Sablotron

This parser is written and maintained by Masayoshi Takahashi. It is written primarily for the Linux operating system and requires the following libraries:

  • Sablot
  • Iconv
  • Expat

XSLT4R

XSLT4R was written by Michael Neumann. It provides a simple command-line interface and can also be used from within third-party applications to transform XML documents.

XSLT4R needs XMLScan to operate. XMLScan is included in the XSLT4R archive and is also a 100% Ruby module. These modules can be installed using the standard Ruby installation method, ruby install.rb.

The XSLT4R command-line syntax is as follows:

ruby xslt.rb stylesheet.xsl document.xml [arguments]

If you want to use XSLT4R from within your application, you can require xslt and pass in the parameters you need. Here's an example:

require "xslt"

# Read each file into a single string
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/....' }

sheet = XSLT::Stylesheet.new( stylesheet, arguments )

# output to StdOut
sheet.apply( xml_doc )

# output to 'str'
str = ""
sheet.output = [ str ]
sheet.apply( xml_doc )
