
Common ways to parse XML documents in Java, in detail (with examples)


May 10, 2021 Java



XML (eXtensible Markup Language) is chosen by many engineers as a carrier for data transmission and has become a common data exchange format. It is platform-independent, language-independent, and system-independent, which makes data integration and interaction much more convenient. XML is parsed in the same way across languages; only the syntax of the implementation differs. There are many ways to parse XML in Java, but four are mainstream: DOM, SAX, JDOM, and DOM4J.


The jar packages for these four methods can be obtained as follows:

① DOM: now bundled with the Java JDK (in the xml-apis.jar package)

② SAX: http://sourceforge.net/projects/sax/

③ JDOM: http://jdom.org/downloads/index.html

④ DOM4J: http://sourceforge.net/projects/dom4j/




Here's an example of these four methods:


xml file:

<?xml version="1.0" encoding="GB2312"?>
<RESULT>
<VALUE>
   <NO>A1234</NO>
   <ADDR>四川省XX县XX镇XX路X段XX号</ADDR>
</VALUE>
<VALUE>
   <NO>B1234</NO>
   <ADDR>四川省XX市XX乡XX村XX组</ADDR>
</VALUE>
</RESULT>


1. Use DOM (JAXP Crimson Parser)

DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers look for specific information in the tree. Analyzing the structure usually requires loading the entire document and constructing the hierarchy before any work can be done. Because it is based on a hierarchy of information, the DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, you can modify it, so your application can change both the data and the structure. You can also navigate up and down the tree at any time, rather than making a single pass as with SAX. DOM is also much simpler to use.


How to implement it:

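import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class MyXMLReader {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Parsing builds the whole document tree in memory.
            Document doc = builder.parse(f);
            NodeList values = doc.getElementsByTagName("VALUE");
            // Look the NO and ADDR lists up once instead of on every iteration.
            NodeList numbers = doc.getElementsByTagName("NO");
            NodeList addresses = doc.getElementsByTagName("ADDR");
            for (int i = 0; i < values.getLength(); i++) {
                System.out.print("License plate: " + numbers.item(i).getFirstChild().getNodeValue());
                System.out.println(" Owner address: " + addresses.item(i).getFirstChild().getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }
}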
Advantages
(1) The application can change both the data and the structure.
(2) Access is bidirectional: you can navigate up and down the tree at any time and read or modify any part of the data.


Weaknesses
The entire XML document usually has to be loaded to build the hierarchy, which consumes a lot of resources.
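
The two advantages listed above can be illustrated with a minimal sketch that uses only the DOM API shipped with the JDK. The class name DomNavigationSketch, the inline sample string, and the NOTE element are assumptions made for illustration; they are not part of the original example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomNavigationSketch {

    // Hypothetical inline sample shaped like the article's XML file.
    static final String XML =
        "<RESULT>"
        + "<VALUE><NO>A1234</NO><ADDR>Road X</ADDR></VALUE>"
        + "<VALUE><NO>B1234</NO><ADDR>Village Y</ADDR></VALUE>"
        + "</RESULT>";

    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
    }

    // Bidirectional access: start at the second NO element, go up to its
    // VALUE parent, sideways to the previous VALUE, then down to its NO child.
    static String previousRecordNo(Document doc) {
        Node secondNo = doc.getElementsByTagName("NO").item(1);
        Node value = secondNo.getParentNode();
        Node previousValue = value.getPreviousSibling();
        return previousValue.getFirstChild().getTextContent();
    }

    // In-memory modification: append a NOTE child to every VALUE element
    // and return how many elements were changed.
    static int addNotes(Document doc) {
        NodeList values = doc.getElementsByTagName("VALUE");
        for (int i = 0; i < values.getLength(); i++) {
            Element note = doc.createElement("NOTE");
            note.setTextContent("checked");
            values.item(i).appendChild(note);
        }
        return values.getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(previousRecordNo(doc)); // prints A1234
        System.out.println(addNotes(doc));         // prints 2
    }
}
```

The sample string contains no whitespace between elements, so getPreviousSibling() and getFirstChild() land directly on elements; with a pretty-printed file they would hit whitespace text nodes first.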


2. Use SAX

The benefits of SAX processing are very similar to those of streaming media. Analysis can begin immediately, rather than waiting for all the data to be processed. Also, because the application only examines the data as it is read, it does not need to store the data in memory. This is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing once a condition is met. In general, SAX is also much faster than its alternative, DOM.

Should you choose DOM or SAX? For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision. DOM accesses an XML document by building a tree structure, while SAX uses an event model.

The DOM parser transforms an XML document into a tree containing its contents, which can then be traversed. The advantage of the DOM parsing model is that programming is easy: developers only need to invoke the tree-building instructions and then use the navigation APIs to reach the required tree nodes. Elements in the tree can be easily added and modified. However, because the DOM parser has to process the entire XML document, its performance and memory requirements are high, especially for large XML files. Because of its traversal capabilities, the DOM parser is often used in services where XML documents need to change frequently.

The SAX parser uses an event-based model: it triggers a series of events while parsing an XML document, and when a given tag is found it invokes a callback method to tell that method that the tag it handles has been found. SAX's memory requirements are usually low because it lets developers decide which tags to process, which helps especially when only part of the data in the document is needed. However, coding with a SAX parser can be difficult, and it is hard to access several different pieces of data in the same document at the same time.
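
To make the event model concrete, here is a minimal sketch that records the callbacks a SAX parser fires for a small inline document. The class name SaxEventLogSketch and the "start:", "text:", and "end:" labels are assumptions for illustration. Text is buffered and flushed at element boundaries because a parser is allowed to deliver one run of text in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventLogSketch extends DefaultHandler {

    private final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flushText();
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per text run, so only accumulate here.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end:" + qName);
    }

    // Records the buffered text (if it is not just whitespace) and clears the buffer.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) {
            events.add("text:" + s);
        }
        text.setLength(0);
    }

    // Parses the given XML string and returns the recorded events in order.
    static List<String> eventsFor(String xml) {
        SaxEventLogSketch handler = new SaxEventLogSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        // Hypothetical one-record sample shaped like the article's XML file.
        String xml = "<RESULT><VALUE><NO>A1234</NO><ADDR>Road X</ADDR></VALUE></RESULT>";
        for (String event : eventsFor(xml)) {
            System.out.println(event);
        }
    }
}
```

The handler never sees the document as a whole; it only sees one event at a time, which is why any parent/child bookkeeping has to be done by the application.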

How to implement it:
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyXMLReader extends DefaultHandler {
    // Names of the currently open elements; the top is the element being read.
    private final Deque<String> tags = new ArrayDeque<>();

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        try {
            SAXParserFactory sf = SAXParserFactory.newInstance();
            SAXParser sp = sf.newSAXParser();
            MyXMLReader reader = new MyXMLReader();
            sp.parse(new File("data_10k.xml"), reader);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        tags.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Pop on close so that text following </NO> is not attributed to NO.
        tags.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String tag = tags.peek();
        if ("NO".equals(tag)) {
            System.out.print("License plate: " + new String(ch, start, length));
        } else if ("ADDR".equals(tag)) {
            System.out.println(" Address: " + new String(ch, start, length));
        }
    }
}
Advantages
(1) Analysis can begin immediately, without waiting for all the data to be processed.
(2) Data is examined only as it is read and does not need to be kept in memory.
(3) Parsing can stop as soon as a condition is met; the whole document need not be parsed.
(4) It is efficient and fast, and can parse documents larger than system memory.

Weaknesses
(1) The application is responsible for its own tag-handling logic (for example, maintaining parent/child relationships), so the more complex the document, the more complex the program.
(2) Navigation is one-way: there is no way to locate a position in the document hierarchy, it is hard to access different parts of the same document at the same time, and XPath is not supported.
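
Advantage (3) is often implemented by throwing an exception from a callback once the wanted data has been seen. The following is a minimal sketch of that idiom, not the only way to do it. The names SaxEarlyStopSketch, StopParsing, and Finder are hypothetical, and the sketch assumes the parser propagates a SAXException thrown by the handler, as the parser bundled with the JDK does.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxEarlyStopSketch {

    // Hypothetical marker exception, used only to abort the parse early.
    static final class StopParsing extends SAXException {
        private static final long serialVersionUID = 1L;
        StopParsing() { super("target record found"); }
    }

    static final class Finder extends DefaultHandler {
        private final String targetNo;
        private final StringBuilder text = new StringBuilder();
        private boolean matched;
        String address;     // stays null if the target record is never found
        int elementsSeen;   // how many start tags were processed before stopping

        Finder(String targetNo) { this.targetNo = targetNo; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            elementsSeen++;
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if ("NO".equals(qName)) {
                matched = targetNo.equals(text.toString().trim());
            } else if ("ADDR".equals(qName) && matched) {
                address = text.toString().trim();
                throw new StopParsing(); // nothing after this point is parsed
            }
        }
    }

    static Finder find(String xml, String targetNo) {
        Finder finder = new Finder(targetNo);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), finder);
        } catch (StopParsing expected) {
            // Normal early exit: the rest of the document was never read.
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
        return finder;
    }

    public static void main(String[] args) {
        // Hypothetical two-record sample shaped like the article's XML file.
        String xml = "<RESULT>"
            + "<VALUE><NO>A1234</NO><ADDR>Road X</ADDR></VALUE>"
            + "<VALUE><NO>B1234</NO><ADDR>Village Y</ADDR></VALUE>"
            + "</RESULT>";
        Finder f = find(xml, "A1234");
        System.out.println(f.address + " after " + f.elementsSeen + " start tags");
    }
}
```

Because the first record matches, the second VALUE element is never reported to the handler.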


3. Use JDOM

The purpose of JDOM is to be a Java-specific document model that simplifies interaction with XML and is faster than using DOM. As the first Java-specific model, JDOM has been heavily publicized and promoted. It is being considered for eventual use as a Java Standard Extension through Java Specification Request JSR-102. JDOM has been under development since early 2000.

JDOM differs from DOM in two main ways. First, JDOM uses only concrete classes, not interfaces. This simplifies the API in some ways but also limits flexibility. Second, the API makes extensive use of the Collections classes, which simplifies things for Java developers who are already familiar with them.

The JDOM documentation states that its goal is to "solve 80% (or more) of Java/XML problems with 20% (or less) of the effort" (where the 20% presumably refers to the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find its API much easier to understand than DOM's. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, it still requires a solid understanding of XML to do anything beyond the basics (or even to understand the errors in some cases). That may well be more meaningful work than learning the DOM or JDOM interfaces.

JDOM itself does not contain a parser. It typically uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously constructed DOM representation as input). It includes converters that output the JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

How to implement it:
import java.io.File;
import java.util.List;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

public class MyXMLReader {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        try {
            SAXBuilder builder = new SAXBuilder();
            Document doc = builder.build(new File("data_10k.xml"));
            Element root = doc.getRootElement();
            // getChildren() returns the VALUE elements as a java.util.List.
            List<?> allChildren = root.getChildren();
            for (Object child : allChildren) {
                Element value = (Element) child;
                System.out.print("License plate: " + value.getChild("NO").getText());
                System.out.println(" Owner address: " + value.getChild("ADDR").getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }
}
Advantages
(1) Using concrete classes instead of interfaces simplifies the API compared with DOM.
(2) Extensive use of the Java collection classes makes it convenient for Java developers.

Weaknesses
(1) Limited flexibility.
(2) Poor performance.

4. Use DOM4J

Although DOM4J is the result of completely independent development, it started out as a kind of intelligent fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also offers the option of building a document representation that can be accessed in parallel through the DOM4J API and the standard DOM interfaces. It has been under development since the second half of 2000.

To support all of these features, DOM4J uses interfaces and abstract base classes. DOM4J makes extensive use of the Collections classes in its API, but in many cases it also provides alternative methods that allow better performance or a more direct coding style. The direct benefit is that, although DOM4J pays the price of a more complex API, it offers much more flexibility than JDOM.

While adding flexibility, XPath integration, and support for large documents, DOM4J shares JDOM's goals: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, with the goal of handling essentially all Java/XML problems. In pursuing that goal, it places less emphasis than JDOM on preventing incorrect application behavior.

DOM4J is a very good Java XML API: it performs well, is powerful and easy to use, and is open source. More and more Java software uses DOM4J to read and write XML; notably, even Sun's JAXM uses DOM4J.

How to implement it:
import java.io.File;
import java.util.Iterator;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class MyXMLReader {

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            SAXReader reader = new SAXReader();
            Document doc = reader.read(f);
            Element root = doc.getRootElement();
            // Iterate only over the VALUE children of the root element.
            for (Iterator<?> i = root.elementIterator("VALUE"); i.hasNext();) {
                Element value = (Element) i.next();
                System.out.print("License plate: " + value.elementText("NO"));
                System.out.println(" Owner address: " + value.elementText("ADDR"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }
}
Advantages
(1) Extensive use of the Java collection classes makes it convenient for Java developers, and alternative methods are provided for better performance.
(2) Supports XPath.
(3) Performs very well.

Weaknesses
(1) Heavy use of interfaces makes the API more complex.


Comprehensive comparison of the four methods

1. DOM4J performs best; even Sun's JAXM uses DOM4J. DOM4J is heavily used in many open source projects, for example the well-known Hibernate, which uses DOM4J to read its XML configuration files. If portability is not a concern, use DOM4J.
2. JDOM and DOM perform poorly in performance tests and run out of memory when tested with 10 MB documents, but they are portable. DOM and JDOM are still worth considering for small documents. Although the JDOM developers have said they expect to focus on performance before the official release, from a performance standpoint it is hard to recommend. DOM, on the other hand, remains a very good choice. DOM implementations are widely used in many programming languages. It is also the basis for many other XML-related standards, and because it is an official W3C recommendation (as opposed to the nonstandard Java-specific models), it may be required in some types of projects (such as using DOM in JavaScript).
3. SAX performs well, thanks to its event-driven parsing. SAX examines the incoming XML stream without loading it into memory (although parts of the document are, of course, temporarily buffered in memory while the stream is read).
Recommendation: use DOM4J if the XML document is large and portability is not a concern, use JDOM if the XML document is small, and consider SAX if the data needs to be processed promptly and does not need to be kept. In any case, the old saying holds: what suits you is best. If time permits, try all four methods and then choose the one that fits your needs.
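
For readers who want to try the comparison themselves, here is a rough sketch limited to the two parsers that ship with the JDK, DOM and SAX. The class name ParserTimingSketch and the generated document are assumptions (the article's data_10k.xml is not available), and a single timed run like this is only an informal check, not a benchmark.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParserTimingSketch {

    // Builds a synthetic document with the given number of VALUE records.
    static String generate(int records) {
        StringBuilder sb = new StringBuilder("<RESULT>");
        for (int i = 0; i < records; i++) {
            sb.append("<VALUE><NO>A").append(i).append("</NO><ADDR>Road ")
              .append(i).append("</ADDR></VALUE>");
        }
        return sb.append("</RESULT>").toString();
    }

    // DOM: build the whole tree, then count the VALUE elements.
    static int countWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("VALUE").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    // SAX: count VALUE start tags as they stream past; nothing is retained.
    static int countWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("VALUE".equals(qName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = generate(10_000);
        long t0 = System.nanoTime();
        int dom = countWithDom(xml);
        long t1 = System.nanoTime();
        int sax = countWithSax(xml);
        long t2 = System.nanoTime();
        System.out.println("DOM: " + dom + " records in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("SAX: " + sax + " records in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Timings vary with the JVM, warm-up, and document size, so the numbers are only meaningful relative to each other on the same machine.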

Read the XML configuration file

First, obtain a factory instance through DocumentBuilderFactory.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setIgnoringElementContentWhitespace(true);

Then create a document object
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(xmlPath); // parse the XML file with DOM

Finally, traverse the node list and extract the values
NodeList sonlist = doc.getElementsByTagName("son");
for (int i = 0; i < sonlist.getLength(); i++) { // process each son element
    Element son = (Element) sonlist.item(i);

    // Walk the children, skipping anything that is not an element.
    for (Node node = son.getFirstChild(); node != null; node = node.getNextSibling()) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName();
            String value = node.getTextContent(); // also safe for empty elements
            System.out.println(name + " : " + value);
        }
    }
}

Full example:
public static void getFamilyMemebers() {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setIgnoringElementContentWhitespace(true);
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(xmlPath); // parse the XML file with DOM

        NodeList sonlist = doc.getElementsByTagName("son");
        for (int i = 0; i < sonlist.getLength(); i++) { // process each son element
            Element son = (Element) sonlist.item(i);

            // Walk the children, skipping anything that is not an element.
            for (Node node = son.getFirstChild(); node != null; node = node.getNextSibling()) {
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    String name = node.getNodeName();
                    String value = node.getTextContent(); // also safe for empty elements
                    System.out.println(name + " : " + value);
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
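
The file behind xmlPath is not shown in this article. The sketch below assumes a document with a father root and son children that each hold name and age elements, a shape inferred from the element names used in the code; the names and ages in the sample are invented, and the class name FamilyReadSketch is hypothetical. It also shows why the ELEMENT_NODE check matters: according to the Javadoc, setIgnoringElementContentWhitespace requires the parser to be in validating mode, so without a DTD the indentation between elements still appears as text nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FamilyReadSketch {

    // Assumed sample document; the names and ages are made up.
    static final String XML =
        "<father>\n"
        + "  <son id=\"001\">\n    <name>Tom</name>\n    <age>20</age>\n  </son>\n"
        + "  <son id=\"002\">\n    <name>Jerry</name>\n    <age>18</age>\n  </son>\n"
        + "</father>";

    // Returns one "name : value" line per child element of every son.
    static List<String> readMembers(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList sons = doc.getElementsByTagName("son");
            for (int i = 0; i < sons.getLength(); i++) {
                Element son = (Element) sons.item(i);
                for (Node node = son.getFirstChild(); node != null; node = node.getNextSibling()) {
                    // Whitespace between elements shows up as text nodes; skip it.
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        lines.add(node.getNodeName() + " : " + node.getTextContent());
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : readMembers(XML)) {
            System.out.println(line);
        }
    }
}
```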

Add nodes to the XML file

The steps are much the same: get the root node, create a new node, add its child elements, and finally append the new node to the root node
Element root = xmldoc.getDocumentElement();

// create the new son node
Element son = xmldoc.createElement("son");
son.setAttribute("id", "004");

Element name = xmldoc.createElement("name");
name.setTextContent("小儿子"); // "youngest son"
son.appendChild(name);

// the age element must be named "age", not "name"
Element age = xmldoc.createElement("age");
age.setTextContent("0");
son.appendChild(age);

root.appendChild(son);

Finally, don't forget to save the document, which overwrites the source file
TransformerFactory factory = TransformerFactory.newInstance();
Transformer former = factory.newTransformer();
former.transform(new DOMSource(xmldoc), new StreamResult(new File(xmlPath)));

Full example:
public static void createSon() {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setIgnoringElementContentWhitespace(false);

    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document xmldoc = db.parse(xmlPath);

        Element root = xmldoc.getDocumentElement();

        // create the new son node
        Element son = xmldoc.createElement("son");
        son.setAttribute("id", "004");

        Element name = xmldoc.createElement("name");
        name.setTextContent("小儿子"); // "youngest son"
        son.appendChild(name);

        // the age element must be named "age", not "name"
        Element age = xmldoc.createElement("age");
        age.setTextContent("0");
        son.appendChild(age);

        root.appendChild(son);

        // save, overwriting the source file
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer former = factory.newTransformer();
        former.transform(new DOMSource(xmldoc), new StreamResult(new File(xmlPath)));
    } catch (Exception e) {
        e.printStackTrace();
    }
}
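
Since the save step overwrites the source file, it can be handy to inspect the result first. The sketch below serializes a document to a string instead of a file, using the same Transformer API plus two standard output properties. The class name SerializeSketch, the in-memory document, and the name "Tim" are assumptions for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SerializeSketch {

    // Builds a small document in memory: father > son[id=004] > name, age.
    static Document buildFamily() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element father = doc.createElement("father");
            doc.appendChild(father);

            Element son = doc.createElement("son");
            son.setAttribute("id", "004");
            father.appendChild(son);

            Element name = doc.createElement("name");
            name.setTextContent("Tim"); // made-up value
            son.appendChild(name);

            Element age = doc.createElement("age");
            age.setTextContent("0");
            son.appendChild(age);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build the document", e);
        }
    }

    // Serializes to a String, so no file is touched.
    static String serialize(Document doc) {
        try {
            Transformer former = TransformerFactory.newInstance().newTransformer();
            former.setOutputProperty(OutputKeys.INDENT, "yes");
            former.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            former.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildFamily()));
    }
}
```

The exact indentation of the output depends on the JDK version, so the sketch makes no promise about whitespace.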


Modify the node information in XML

The target node is obtained with XPath
public static Node selectSingleNode(String express, Element source) {
    Node result = null;
    XPathFactory xpathFactory = XPathFactory.newInstance();
    XPath xpath = xpathFactory.newXPath();
    try {
        result = (Node) xpath.evaluate(express, source, XPathConstants.NODE);
    } catch (XPathExpressionException e) {
        e.printStackTrace();
    }

    return result;
}
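
The helper above returns a single node. When an expression can match several nodes, the same javax.xml.xpath API can return all of them with XPathConstants.NODESET. The following is a minimal sketch under assumed data: the class name XPathQuerySketch and the sample document (its names and ages) are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathQuerySketch {

    // Assumed sample document; the names and ages are made up.
    static final String XML =
        "<father>"
        + "<son id=\"001\"><name>Tom</name><age>20</age></son>"
        + "<son id=\"002\"><name>Jerry</name><age>18</age></son>"
        + "</father>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
    }

    // Returns the text content of every node matched by the expression.
    static List<String> selectTexts(String expression, Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(selectTexts("/father/son/name", doc));           // all names
        System.out.println(selectTexts("/father/son[age > 18]/name", doc)); // filtered by age
        System.out.println(selectTexts("/father/son/@id", doc));            // attribute values
    }
}
```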

Get the target node, modify it, and save the file when you're done
Element root = xmldoc.getDocumentElement();

Element per = (Element) selectSingleNode("/father/son[@id='001']", root);
per.getElementsByTagName("age").item(0).setTextContent("27");

TransformerFactory factory = TransformerFactory.newInstance();
Transformer former = factory.newTransformer();
former.transform(new DOMSource(xmldoc), new StreamResult(new File(xmlPath)));

Full example:
public static void modifySon() {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setIgnoringElementContentWhitespace(true);
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document xmldoc = db.parse(xmlPath);

        Element root = xmldoc.getDocumentElement();

        Element per = (Element) selectSingleNode("/father/son[@id='001']", root);
        per.getElementsByTagName("age").item(0).setTextContent("27");

        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer former = factory.newTransformer();
        former.transform(new DOMSource(xmldoc), new StreamResult(new File(xmlPath)));
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Remove nodes from XML

Get the target node via XPath, delete it, and save the file
Element root = xmldoc.getDocumentElement();

Element son = (Element) selectSingleNode("/father/son[@id='002']", root);
root.removeChild(son);

TransformerFactory factory = TransformerFactory.newInstance();
Transformer former = factory.newTransformer();
former.transform(new DOMSource(xmldoc), new StreamResult(new File(xmlPath)));

Full example:
public static void discardSon() {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setIgnoringElementContentWhitespace(true);

    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document xmldoc = db.parse(xmlPath);

        Element root = xmldoc.getDocumentElement();

        Element son = (Element) selectSingleNode("/father/son[@id='002']", root);
        root.removeChild(son);

        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer former = factory.newTransformer();
        former.transform(new DOMSource(xmldoc), new StreamResult(new File(xmlPath)));
    } catch (Exception e) {
        e.printStackTrace();
    }
}
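
The removal code above assumes that the XPath expression finds a node and that the node is a direct child of the root. A slightly more defensive variant, sketched below, checks for a missing match and removes the node through its own parent. The class name RemoveNodeSketch, the method removeFirstMatch, and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RemoveNodeSketch {

    // Assumed sample document; the names and ages are made up.
    static final String XML =
        "<father>"
        + "<son id=\"001\"><name>Tom</name><age>20</age></son>"
        + "<son id=\"002\"><name>Jerry</name><age>18</age></son>"
        + "</father>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
    }

    // Removes the first node matched by the expression.
    // Returns false instead of failing when nothing matches.
    static boolean removeFirstMatch(String expression, Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
            if (node == null || node.getParentNode() == null) {
                return false;
            }
            // Removing through the node's own parent works at any depth.
            node.getParentNode().removeChild(node);
            return true;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(removeFirstMatch("/father/son[@id='002']", doc)); // prints true
        System.out.println(removeFirstMatch("/father/son[@id='002']", doc)); // prints false
        System.out.println(doc.getElementsByTagName("son").getLength());     // prints 1
    }
}
```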