XML encoding


May 27, 2021 19:00 XML


XML encoding

When an XML processor reads an XML document, it decodes the file's bytes according to the document's character encoding, so the encoding should be specified in the XML declaration.

XML documents can contain non-ASCII characters, such as the Norwegian æ, ø, å or the French ê, è, é.

To avoid errors, specify the encoding in the XML declaration, or save the XML file as Unicode.
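
As a minimal sketch of this advice, the Python snippet below writes a small document with an explicit encoding and a matching XML declaration, then reads it back. It uses the standard library's xml.etree.ElementTree; the file name note.xml and the choice of ISO-8859-1 are assumptions made only for illustration.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Build a small document containing non-ASCII characters.
note = ET.Element("note")
ET.SubElement(note, "message").text = "Norwegian: æøå. French: êèé"

# Write it with an explicit encoding and a matching XML declaration.
# The file name and the choice of ISO-8859-1 are illustrative only.
path = str(Path(tempfile.mkdtemp()) / "note.xml")
ET.ElementTree(note).write(path, encoding="ISO-8859-1", xml_declaration=True)

# The first line of the file now names the encoding used for the bytes.
declaration = Path(path).read_bytes().splitlines()[0].decode("ascii")
print(declaration)

# Because the declaration and the bytes agree, the text is recovered intact.
message = ET.parse(path).getroot().findtext("message")
print(message)
```

Letting the library write both the bytes and the declaration is one way to keep the two from drifting apart.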


XML encoding errors

If you load an XML document, you can get two different errors that indicate encoding problems:

An invalid character was found in text content

You get this error if your XML contains non-ASCII characters and the file was saved as single-byte ANSI (or ASCII) with no encoding specified.

Example: a single-byte XML file with an encoding attribute.

Example: the same single-byte XML file without an encoding attribute.
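
The two single-byte cases can be reproduced with a short Python sketch. It assumes the standard library parser (xml.etree.ElementTree, built on Expat) rather than a browser, so the wording of the error differs from the message quoted above, and it uses ISO-8859-1 as a stand-in for a single-byte ANSI save. The helper try_parse is a hypothetical name introduced for this example.

```python
import xml.etree.ElementTree as ET

body = "<note><message>Norwegian: æøå. French: êèé</message></note>"

# Both documents are stored with one byte per character (ISO-8859-1).
with_attribute = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>' + body
).encode("iso-8859-1")
without_attribute = ('<?xml version="1.0"?>' + body).encode("iso-8859-1")


def try_parse(data: bytes) -> str:
    """Return the message text, or a description of the parse error."""
    try:
        return ET.fromstring(data).findtext("message")
    except ET.ParseError as err:
        return f"ParseError: {err}"


# With the attribute, the parser knows how to decode the bytes.
result_with = try_parse(with_attribute)
# Without it, the parser assumes UTF-8 and rejects the single-byte characters.
result_without = try_parse(without_attribute)

print(result_with)
print(result_without)
```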

Switch from current encoding to specified encoding not supported

You get this error if your XML file was saved as double-byte Unicode (UTF-16) but declares a single-byte encoding (Windows-1252, ISO-8859-1) or UTF-8.

You also get this error if your XML file was saved as single-byte ANSI (or ASCII) but declares a double-byte encoding (UTF-16).

Example: a double-byte XML file without an encoding attribute.

Example: the same double-byte XML file with a single-byte encoding declared.
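
A similar sketch covers the mismatch between the width of the stored bytes and the declared encoding. Again this assumes Python's xml.etree.ElementTree, whose error text differs from the browser message above; parse_error is a hypothetical helper, and the sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_error(data: bytes) -> Optional[str]:
    """Return the parser's error message, or None if the data parses."""
    try:
        ET.fromstring(data)
        return None
    except ET.ParseError as err:
        return str(err)


# Case 1: the bytes are UTF-16 (two bytes per character, with a byte order
# mark), but the declaration claims a single-byte encoding.
utf16_declared_latin1 = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<note><message>Norwegian: æøå</message></note>"
).encode("utf-16")

# Case 2: the bytes are plain ASCII, but the declaration claims UTF-16.
ascii_declared_utf16 = (
    b'<?xml version="1.0" encoding="UTF-16"?>'
    b"<note><message>plain text</message></note>"
)

error_case_1 = parse_error(utf16_declared_latin1)
error_case_2 = parse_error(ascii_declared_utf16)

print("UTF-16 bytes, single-byte declaration:", error_case_1)
print("ASCII bytes, UTF-16 declaration:", error_case_2)
```

In both cases the parser has already worked out the byte width from the start of the file, so it refuses a declaration that contradicts it.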


Windows Notepad

Older versions of Windows Notepad save files as single-byte ANSI (a Windows code page such as Windows-1252) by default; Windows 10 version 1903 and later default to UTF-8.

If you select Save As..., you can choose ANSI, UTF-8, Unicode (UTF-16), or Unicode big endian.

Save the following XML as ANSI, as UTF-8, and as Unicode (note that the document does not contain an encoding attribute).

<?xml version="1.0"?>
<note>
<from>Jani</from>
<to>Tove</to>
<message>Norwegian: æøå. French: êèé</message>
</note>

Try dragging each file into your browser and look at the result. Different browsers display different results.

Experiment with different encodings:
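
The same experiment can be approximated without Notepad or a browser. The sketch below encodes the document three ways and hands the bytes to Python's xml.etree.ElementTree. The codec names are assumptions: cp1252 stands in for ANSI on a Western European Windows system, and Python's utf-16 codec (which adds a byte order mark) stands in for Notepad's Unicode option.

```python
import xml.etree.ElementTree as ET

# The sample document, with no encoding attribute in the declaration.
document = (
    '<?xml version="1.0"?>\n'
    "<note>\n"
    "<from>Jani</from>\n"
    "<to>Tove</to>\n"
    "<message>Norwegian: æøå. French: êèé</message>\n"
    "</note>\n"
)

# Notepad save option -> Python codec used to imitate it (assumed mapping).
save_formats = {"ANSI": "cp1252", "UTF-8": "utf-8", "Unicode": "utf-16"}

results = {}
for label, codec in save_formats.items():
    data = document.encode(codec)
    try:
        results[label] = ET.fromstring(data).findtext("message")
    except ET.ParseError as err:
        results[label] = f"ParseError: {err}"

for label, outcome in results.items():
    print(f"{label:8} {outcome}")
```

With this parser, only the ANSI bytes fail: a document without an encoding attribute is read as UTF-8 unless a byte order mark says otherwise.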

<?xml version="1.0" encoding="us-ascii"?>
<?xml version="1.0" encoding="windows-1252"?>
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-16"?>

Try:

Saving the file with the correct encoding

Saving the file with the wrong encoding
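
To see what "the correct encoding" looks like for each declaration listed above, the following sketch serializes one document five times with Python's xml.etree.ElementTree, letting the library write the bytes and the declaration together, and then parses each result back. It assumes Python 3.8 or later (for the xml_declaration argument of tostring); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

text = "Norwegian: æøå. French: êèé"
note = ET.Element("note")
ET.SubElement(note, "message").text = text

round_trips = {}
for encoding in ["us-ascii", "windows-1252", "ISO-8859-1", "UTF-8", "UTF-16"]:
    # The serializer writes the declaration and encodes the bytes to match.
    # Characters the encoding cannot represent (for example æ in us-ascii)
    # are written as numeric character references such as &#230;.
    data = ET.tostring(note, encoding=encoding, xml_declaration=True)
    parsed = ET.fromstring(data).findtext("message")
    round_trips[encoding] = parsed == text
    print(f"{encoding:>12}: {len(data)} bytes, round trip ok: {parsed == text}")
```

The byte counts differ from one encoding to the next, but the parsed text is the same whenever the declaration and the bytes agree.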



Conclusion

  • Always use the encoding attribute
  • Use an editor that supports encoding
  • Make sure you know what encoding the editor uses
  • Use the same encoding in your encoding attribute