7 APPENDIX B: Whitespace processing
Whitespace processing is performed once at document load time:
- A Mac or DOS line break ('\r' or "\r\n") is normalized to a
single newline character ('\n').
- If an element has the xml:space attribute set to
preserve, all the whitespace characters it contains are
preserved.
Note that, unlike other attributes, the xml:space and
xml:lang attributes of an element may be "inherited" from an
ancestor element.
- Otherwise, the default behavior of XXE is to replace consecutive
whitespace characters (' ', '\t', '\n', '\r') with a single space character
(' ').
- If a document has a DTD and a mix element (an element described
in the DTD as being able to contain text interspersed with child elements) is
contained in a structure element (an element described in the DTD as
only containing child elements), any space characters at the beginning and at
the end of the mix element are trimmed.
This trimming applies to mix elements of the following kinds:
- elements with a mixed content model,
- elements with ANY as their content model,
- elements which are referenced but not declared in the DTD.
- If a document has no DTD and the "guess what spaces are not useful"
option has been checked in the Options dialog box (see
Options->Options), whitespace
characters are trimmed from elements containing child elements separated by
whitespace.
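The first three rules above (line break normalization, xml:space inheritance, and whitespace compression) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not XXE's actual implementation: the function names are hypothetical, the DTD-based and "guess" trimming rules are deliberately left out, and the standard xml.etree.ElementTree module stands in for the editor's own document model.

```python
import re
import xml.etree.ElementTree as ET

# Name under which ElementTree exposes the xml:space attribute.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# The four whitespace characters listed in the rules above.
_WS_RUN = re.compile(r"[ \t\n\r]+")


def normalize_line_breaks(text):
    """Turn DOS ("\\r\\n") and Mac ("\\r") line breaks into "\\n"."""
    # Replace "\r\n" first so a DOS line break yields one "\n", not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def collapse(text):
    """Replace each run of whitespace characters by a single space."""
    return _WS_RUN.sub(" ", text)


def process(elem, inherited="default"):
    """Collapse whitespace in elem unless xml:space="preserve" applies.

    The mode is taken from the element itself, or inherited from the
    nearest ancestor that specifies xml:space (hypothetical helper).
    """
    mode = elem.get(XML_SPACE, inherited)
    if mode != "preserve" and elem.text:
        elem.text = collapse(elem.text)
    for child in elem:
        process(child, mode)
        # Text following a child is part of elem's content, so it is
        # governed by elem's mode, not by the child's.
        if mode != "preserve" and child.tail:
            child.tail = collapse(child.tail)


doc = ET.fromstring(
    '<doc><p>a  \n\t b</p>'
    '<pre xml:space="preserve">x   y<b> z  </b></pre></doc>'
)
process(doc)
```

After the call, the text of p has been collapsed, while the text of pre and of its child b is left untouched because b inherits the preserve setting from pre.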
The xml:space attribute is very important for XXE because it is the
only way to skip the compression and trimming of whitespace characters. Do
not forget to add support for this standard XML attribute to the DTDs you
define.
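As an illustration of what such support might look like, the sketch below embeds a small DTD that declares xml:space on one element with a fixed value of preserve, then uses Python's standard expat binding to check that a parser reports the defaulted attribute. The element names doc and listing and the helper declared_attributes are hypothetical, and the sketch assumes that the editor honors attribute defaults supplied by the DTD, which this appendix does not state.

```python
from xml.parsers import expat

# Hypothetical DTD: "doc" and "listing" are illustrative names only.
DOCUMENT = """<!DOCTYPE doc [
<!ELEMENT doc (listing)*>
<!ELEMENT listing (#PCDATA)>
<!ATTLIST listing xml:space (preserve) #FIXED "preserve">
]>
<doc><listing>  two  spaces  </listing></doc>"""


def declared_attributes(xml_text):
    """Return {element name: attributes} as reported by the parser."""
    seen = {}
    # No namespace processing, so the attribute name stays "xml:space".
    parser = expat.ParserCreate()

    def on_start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return seen


attributes = declared_attributes(DOCUMENT)
```

Here attributes["listing"] contains xml:space with the value preserve even though the document instance never writes the attribute, because the parser fills it in from the declaration.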