The SAX interface

SAX is an event-based standard interface for XML parsers. The Qt interface follows the design of the SAX2 Java implementation. Its naming scheme was adapted to fit the Qt naming conventions. Details on SAX2 can be found at http://www.saxproject.org.

Support for SAX2 filters and the reader factory is under development. The Qt implementation does not include the SAX1 compatibility classes present in the Java interface.

Introduction to SAX2

The SAX2 interface is an event-driven mechanism for providing the user with document information. An "event" in this context is something the parser reports, for example that it has encountered a start tag or an end tag.

To make this less abstract, consider the following example:


  <quote>A quotation.</quote>

Whilst reading the above document, a SAX2 parser (usually referred to as a "reader") would trigger three events:

  1. A start tag occurs (<quote>).
  2. Character data (i.e. text) is found ("A quotation.").
  3. An end tag is parsed (</quote>).

Each time such an event occurs the parser reports it; you can set up event handlers to respond to these events.
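
The three events above can be sketched in plain C++ without any Qt classes. The Event struct and the scanEvents() function are hypothetical names introduced only for this illustration. The scanner assumes a well-formed fragment with no attributes, comments or entities, so it shows the event idea rather than how a Qt reader is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical event record used only in this sketch (not a Qt type).
struct Event {
    enum Kind { StartTag, Characters, EndTag } kind;
    std::string text;  // tag name, or the character data itself
};

// Scans a tiny well-formed fragment and returns its events in document order.
std::vector<Event> scanEvents(const std::string &xml)
{
    std::vector<Event> events;
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            assert(close != std::string::npos);  // the sketch assumes well-formed input
            const bool isEnd = (close > pos + 1 && xml[pos + 1] == '/');
            const std::size_t nameStart = pos + (isEnd ? 2 : 1);
            events.push_back({isEnd ? Event::EndTag : Event::StartTag,
                              xml.substr(nameStart, close - nameStart)});
            pos = close + 1;
        } else {
            // Everything up to the next '<' is reported as character data.
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos)
                next = xml.size();
            events.push_back({Event::Characters, xml.substr(pos, next - pos)});
            pos = next;
        }
    }
    return events;
}
```

For the <quote> document, scanEvents() returns a start tag event, a character data event and an end tag event, in that order. Nothing is kept once the events have been handed over, which is what makes the approach cheap.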

Whilst this is a fast and simple approach to reading XML documents, manipulation is difficult because the data is not stored; it is handled and then discarded serially. The DOM interface, by contrast, reads in and stores the whole document in a tree structure; this takes more memory, but makes it easier to manipulate the document's structure.

The Qt XML module provides an abstract class, QXmlReader, that defines the interface for potential SAX2 readers. Qt includes a reader implementation, QXmlSimpleReader, that is easy to adapt through subclassing.

The reader reports parsing events through special handler classes:

  Handler class        Description
  QXmlContentHandler   Reports events related to the content of a document (e.g. the start tag or characters).
  QXmlDTDHandler       Reports events related to the DTD (e.g. notation declarations).
  QXmlErrorHandler     Reports errors or warnings that occurred during parsing.
  QXmlEntityResolver   Reports external entities during parsing and allows users to resolve external entities themselves instead of leaving it to the reader.
  QXmlDeclHandler      Reports further DTD related events (e.g. attribute declarations).
  QXmlLexicalHandler   Reports events related to the lexical structure of the document (the beginning of the DTD, comments etc.).

These classes are abstract classes describing the interface. The QXmlDefaultHandler class provides a "do nothing" default implementation for all of them. Users therefore only need to reimplement the QXmlDefaultHandler functions they are interested in.
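
The "do nothing by default, reimplement what you need" pattern can be sketched without Qt as follows. DefaultHandler, TextCollector and reportQuoteEvents() are hypothetical names for this illustration, and the callback signatures are deliberately simplified; they are not the signatures of the Qt handler classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a "do nothing" default handler (not the Qt class).
class DefaultHandler {
public:
    virtual ~DefaultHandler() = default;
    // Every callback returns true, meaning "carry on parsing".
    virtual bool startElement(const std::string &) { return true; }
    virtual bool characters(const std::string &) { return true; }
    virtual bool endElement(const std::string &) { return true; }
};

// Reimplements only the two callbacks it is interested in.
class TextCollector : public DefaultHandler {
public:
    bool startElement(const std::string &) override { ++elementCount; return true; }
    bool characters(const std::string &text) override { collected += text; return true; }

    int elementCount = 0;
    std::string collected;
};

// Plays the role of the reader for the <quote> example: it reports the
// three events, in document order, to whichever handler it is given.
bool reportQuoteEvents(DefaultHandler &handler)
{
    return handler.startElement("quote")
        && handler.characters("A quotation.")
        && handler.endElement("quote");
}

// Usage: the inherited endElement() does nothing, the other two record data.
void textCollectorExample()
{
    TextCollector collector;
    const bool finished = reportQuoteEvents(collector);
    assert(finished);
    assert(collector.elementCount == 1);
    assert(collector.collected == "A quotation.");
}
```

Because the base class already answers every event, the subclass stays short and new event types can be handled later without touching the existing reimplementations.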

Input XML data is read through a dedicated class, QXmlInputSource.

Apart from those already mentioned, the following SAX2 support classes provide additional useful functionality:

  Class                  Description
  QXmlAttributes         Used to pass attributes in a start element event.
  QXmlLocator            Used to obtain the actual parsing position of an event.
  QXmlNamespaceSupport   Used to implement namespace support for a reader. Note that namespaces do not change the parsing behavior. They are only reported through the handler.

The SAX Bookmarks example illustrates how to subclass QXmlDefaultHandler to read an XML bookmark file (XBEL) and how to generate XML by hand.

SAX2 Features

The behavior of an XML reader depends on its support for certain optional features. For example, a reader may have the feature "report attributes used for namespace declarations and prefixes along with the local name of a tag". Like every other feature this has a unique name represented by a URI: it is called http://xml.org/sax/features/namespace-prefixes.

The Qt SAX2 implementation can report whether the reader has particular functionality using the QXmlReader::hasFeature() function. Available features can be tested with QXmlReader::feature(), and switched on or off using QXmlReader::setFeature().

Consider the following example:


  <document xmlns:book = 'http://example.com/fnord/book/'
            xmlns      = 'http://example.com/fnord/' >

A reader that does not support the http://xml.org/sax/features/namespace-prefixes feature would report the element name document but not its attributes xmlns:book and xmlns with their values. A reader that supports the feature reports the namespace attributes only if the feature is switched on.

Other features include http://xml.org/sax/features/namespaces (namespace processing, implies http://xml.org/sax/features/namespace-prefixes) and http://xml.org/sax/features/validation (the ability to report validation errors).

Whilst SAX2 leaves it to the user to define and implement whatever features are required, support for http://xml.org/sax/features/namespaces (and thus http://xml.org/sax/features/namespace-prefixes) is mandatory. QXmlSimpleReader, Qt's implementation of QXmlReader, supports both and can therefore do namespace processing.

QXmlSimpleReader is not validating, so it does not support http://xml.org/sax/features/validation.
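
The feature functions described in this section can be pictured as lookups in a table keyed by the feature URI. The sketch below is a plain C++ model, not the Qt class: FeatureTable is a hypothetical name, its member names merely echo hasFeature(), feature() and setFeature(), and the initial values follow the QXmlSimpleReader defaults given in this document.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a reader's feature table (not the Qt class).
class FeatureTable {
public:
    FeatureTable()
    {
        // Only the two namespace features are known to this model.
        values["http://xml.org/sax/features/namespaces"] = true;
        values["http://xml.org/sax/features/namespace-prefixes"] = false;
    }

    bool hasFeature(const std::string &name) const { return values.count(name) != 0; }

    // Returns the current state; *ok reports whether the feature is known.
    bool feature(const std::string &name, bool *ok = nullptr) const
    {
        const auto it = values.find(name);
        if (ok)
            *ok = (it != values.end());
        return it != values.end() && it->second;
    }

    // Unknown features are ignored rather than created.
    void setFeature(const std::string &name, bool value)
    {
        const auto it = values.find(name);
        if (it != values.end())
            it->second = value;
    }

private:
    std::map<std::string, bool> values;
};

// Usage: query, switch on, and probe a feature the model does not know.
void featureTableExample()
{
    const std::string prefixes = "http://xml.org/sax/features/namespace-prefixes";
    FeatureTable table;
    assert(table.hasFeature(prefixes));
    assert(!table.feature(prefixes));
    table.setFeature(prefixes, true);
    assert(table.feature(prefixes));
    // A non-validating reader does not offer the validation feature at all.
    assert(!table.hasFeature("http://xml.org/sax/features/validation"));
}
```

Checking hasFeature() before calling setFeature() is the safer order, since a reader is free not to support an optional feature.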

Namespace Support via Features

As we have seen in the previous section, we can configure the behavior of the reader when it comes to namespace processing. This is done by setting and unsetting the http://xml.org/sax/features/namespaces and http://xml.org/sax/features/namespace-prefixes features.

They influence the reporting behavior in the following way:

  1. Namespace prefixes and local parts of elements and attributes can be reported.
  2. The qualified names of elements and attributes are reported.
  3. QXmlContentHandler::startPrefixMapping() and QXmlContentHandler::endPrefixMapping() are called by the reader.
  4. Attributes that declare namespaces (i.e. the attribute xmlns and attributes starting with xmlns:) are reported.

Consider the following element:


  <author xmlns:fnord = 'http://example.com/fnord/'
               title="Ms"
               fnord:title="Goddess"
               name="Eris Kallisti"/>

With http://xml.org/sax/features/namespace-prefixes set to true the reader will report four attributes; with it set to false the reader reports only three, because the xmlns:fnord attribute that declares the namespace is then "invisible" to the reader.
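
The attribute counts for the author element can be reproduced with a small filter. This is a plain C++ sketch with hypothetical helper names (isNamespaceDeclaration() and reportedAttributes()); it models only which attribute names are reported, not how a Qt reader stores them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True for "xmlns" itself and for names starting with "xmlns:".
bool isNamespaceDeclaration(const std::string &qName)
{
    return qName == "xmlns" || qName.compare(0, 6, "xmlns:") == 0;
}

// Returns the attribute names that would be reported for the given
// setting of the namespace-prefixes feature.
std::vector<std::string> reportedAttributes(const std::vector<std::string> &qNames,
                                            bool namespacePrefixes)
{
    std::vector<std::string> reported;
    for (const std::string &qName : qNames) {
        if (namespacePrefixes || !isNamespaceDeclaration(qName))
            reported.push_back(qName);
    }
    return reported;
}

// Usage: the four attributes of the <author> element from the text.
void authorAttributesExample()
{
    const std::vector<std::string> attrs = {"xmlns:fnord", "title", "fnord:title", "name"};
    assert(reportedAttributes(attrs, true).size() == 4);
    assert(reportedAttributes(attrs, false).size() == 3);
}
```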

The http://xml.org/sax/features/namespaces feature is responsible for reporting local names, namespace prefixes and URIs. With http://xml.org/sax/features/namespaces set to true the parser will report title as the local name of the fnord:title attribute, fnord being the namespace prefix and http://example.com/fnord/ as the namespace URI. When http://xml.org/sax/features/namespaces is false none of them are reported.
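
What "reporting the local name, prefix and URI" amounts to for an attribute can be sketched as a split at the colon followed by a prefix lookup. ResolvedName and resolveAttributeName() are hypothetical names, and the prefix table is passed in by hand here; in Qt this bookkeeping is the job of the reader and QXmlNamespaceSupport.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical result type: the three parts reported for a qualified name.
struct ResolvedName {
    std::string prefix;
    std::string localName;
    std::string uri;
};

// Splits an attribute's qualified name and looks up the prefix.
// An attribute without a prefix is treated as being in no namespace.
ResolvedName resolveAttributeName(const std::string &qName,
                                  const std::map<std::string, std::string> &prefixToUri)
{
    ResolvedName result;
    const std::size_t colon = qName.find(':');
    if (colon == std::string::npos) {
        result.localName = qName;
        return result;
    }
    result.prefix = qName.substr(0, colon);
    result.localName = qName.substr(colon + 1);
    const auto it = prefixToUri.find(result.prefix);
    if (it != prefixToUri.end())
        result.uri = it->second;
    return result;
}

// Usage: the fnord:title attribute of the <author> element from the text.
void resolveExample()
{
    const std::map<std::string, std::string> scope = {{"fnord", "http://example.com/fnord/"}};
    const ResolvedName name = resolveAttributeName("fnord:title", scope);
    assert(name.prefix == "fnord");
    assert(name.localName == "title");
    assert(name.uri == "http://example.com/fnord/");
}
```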

In the current implementation the Qt XML classes follow the definition that the prefix xmlns itself isn't associated with any namespace at all (see http://www.w3.org/TR/1999/REC-xml-names-19990114/#ns-using). Therefore, even with http://xml.org/sax/features/namespaces and http://xml.org/sax/features/namespace-prefixes both set to true, the reader won't return a local name, a namespace prefix or a namespace URI for xmlns:fnord.

This might be changed in the future to follow the W3C suggestion (http://www.w3.org/2000/xmlns/) of associating the xmlns prefix with the namespace http://www.w3.org/2000/xmlns/.

As the SAX2 standard suggests, QXmlSimpleReader defaults to having http://xml.org/sax/features/namespaces set to true and http://xml.org/sax/features/namespace-prefixes set to false. When changing this behavior using QXmlSimpleReader::setFeature() note that the combination of both features set to false is illegal.

Summary

QXmlSimpleReader implements the following behavior:

  (namespaces, namespace-prefixes)   Namespace prefix and local part   Qualified names   Prefix mapping   xmlns attributes
  (true, false)                      Yes                               Yes*              Yes              No
  (true, true)                       Yes                               Yes               Yes              Yes
  (false, true)                      No*                               Yes               No*              Yes
  (false, false)                     Illegal

The behavior of the entries marked with an asterisk (*) is not specified by SAX.
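
The summary can also be written down as a small lookup function, which makes the pattern in the table explicit. Reporting and reportingFor() are hypothetical names; the function simply restates the table for QXmlSimpleReader and says nothing about other readers.

```cpp
#include <cassert>

// Hypothetical record of what gets reported for one feature combination.
struct Reporting {
    bool legal;               // false only for (false, false)
    bool prefixAndLocalPart;  // namespace prefix and local part
    bool qualifiedNames;
    bool prefixMapping;       // startPrefixMapping()/endPrefixMapping() calls
    bool xmlnsAttributes;
};

Reporting reportingFor(bool namespaces, bool namespacePrefixes)
{
    if (!namespaces && !namespacePrefixes)
        return {false, false, false, false, false};
    // Prefix/local part and prefix mapping follow the namespaces feature,
    // xmlns attributes follow namespace-prefixes, and qualified names are
    // reported in every legal combination.
    return {true, namespaces, true, namespaces, namespacePrefixes};
}

// Usage: the default combination and the illegal one.
void reportingExample()
{
    const Reporting defaults = reportingFor(true, false);
    assert(defaults.legal && defaults.prefixMapping && !defaults.xmlnsAttributes);
    assert(!reportingFor(false, false).legal);
}
```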

Properties

Properties are a more general concept than features. They also have a unique name, represented as a URI, but their value is a void*, so nearly anything can be used as a property value. This flexibility carries some danger: there is no means of ensuring type safety, so users must take care to pass the right type. Properties are useful if a reader supports special handler classes.

The URIs used for features and properties often look like URLs, e.g. http://xml.org/sax/features/namespaces. This does not mean that the data required is at this address. It is simply a way of defining unique names.

Anyone can define and use new SAX2 properties for their readers. Property support is not mandatory.

To set or query properties the following functions are provided: QXmlReader::setProperty(), QXmlReader::property() and QXmlReader::hasProperty().
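
The type-safety caveat is easiest to see in code. The following plain C++ model is not the Qt class: PropertyTable is a hypothetical name, its members only echo setProperty(), property() and hasProperty(), and the property URI is made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a reader's property table (not the Qt class).
class PropertyTable {
public:
    bool hasProperty(const std::string &name) const { return values.count(name) != 0; }

    // Returns the stored pointer, or nullptr with *ok == false if unknown.
    void *property(const std::string &name, bool *ok = nullptr) const
    {
        const auto it = values.find(name);
        if (ok)
            *ok = (it != values.end());
        return it != values.end() ? it->second : nullptr;
    }

    // Stores the raw pointer; the table neither owns nor type-checks it.
    void setProperty(const std::string &name, void *value) { values[name] = value; }

private:
    std::map<std::string, void *> values;
};

// Usage: store an int by address and read it back.
void propertyTableExample()
{
    // Made-up property name; it is only a unique identifier.
    const std::string name = "http://example.com/sax/properties/max-depth";
    int maxDepth = 16;
    PropertyTable table;
    table.setProperty(name, &maxDepth);

    bool ok = false;
    void *raw = table.property(name, &ok);
    assert(ok);
    // The caller must know the stored type; the compiler cannot check this cast.
    assert(*static_cast<int *>(raw) == 16);
}
```

Casting the returned pointer to any type other than the one that was stored would compile just as well, which is exactly the risk described above. The pointed-to object must also outlive every use of the property.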