XML-Data reduced

DRAFT
Last update:
4 July 1998
Version 0.23

This version:
Not posted yet.
Editors:
Charles Frankston (Microsoft) cfranks@microsoft.com
Henry S. Thompson (University of Edinburgh) ht@cogsci.ed.ac.uk
Ashok Malhotra (IBM) petsa@us.ibm.com

Status of this document

This note is a refinement of the January 1998 XML-Data submission http://www.w3.org/TR/1998/NOTE-XML-data-0105/.

Abstract

The XML-Data submission contained many new ideas that an XML schema language could support. This document refines and subsets those ideas down to a more manageable size in order to allow faster progress toward adopting a new schema language for XML. Some of the inconsistencies in the XML-Data submission are cleaned up, and some changes have been made based on comments received since the XML-Data submission was posted.

Requirements

We will call a formal expression of the structure of XML documents, and of the constraints on the text they contain, a schema.  There is broad recognition that XML's existing "Document Type Definition" (DTD) language is an inadequate and/or inappropriate language for expressing what many of the current and anticipated applications of XML need to include in schemas. XML-Data provides an alternative approach, using XML instance syntax, to address these needs.  The following table lists the principal requirements for a schema language, each matched to the feature(s) of this proposal that satisfy it:

Requirement

Feature

Special tools should not be required for maintaining schema documents.
  • XML-Data schemas use XML instance syntax.
Users of XML or HTML should not have to learn a new syntax to express schema information.
  • XML-Data schemas use XML instance syntax.
Schemas should be extensible, i.e. it should be possible for an application to specialize a schema with application-specific information, such as additional constraints.
  • XML-Data schemas use XML instance syntax.
  • XML-Data schema definitions are "open" by default, allowing annotation with additional elements and attributes.
The schema language should be simple enough to encourage implementation in all XML processors.
  • XML-Data schemas use only XML instance syntax, obviating the need to implement a new special syntax parser.
  • The XML-Data (reduced) language is reasonably simple and concise.
  • XML-Data (reduced) can be processed by a simple XML processor, without requiring additional tools, such as an RDF processor.
Web-based applications, such as e-commerce, require additional data-validation beyond what can be expressed in XML DTDs today.
  • XML-Data defines a set of primitive data-types roughly comparable to those found in relational databases and programming languages.
  • XML-Data can express range constraints (min/max).
Web-based applications require standardized encodings in order to facilitate document interchange.
  • XML-Data defines standard encoding formats for the primitive data types it defines.
Individual documents in web-based applications are often composed of parts defined in several sources. The schema language must support this.
  • XML-Data provides integral namespace support.
Re-use of content model definitions should be easier (than it is today with Parameter Entities).
  • XML-Data supports class-based content model inheritance.
  • XML-Data supports attribute definitions that can be shared by multiple elements.
The schema language must be upward compatible with XML 1.0.
  • XML-Data is a superset of XML 1.0 DTD expressiveness.

 

The Schema language

Schema document structure

An XML-Data schema is a well-formed XML document.  It must be a valid instance of the XML Schema DTD.

Prolog

The prolog of an XML-Data schema must define all namespaces used within the schema. The namespace for the XML-Data schema language itself is defined by a URN, using a namespace declaration:

<?xml:namespace ns="urn:w3-org:xmlschema" prefix="s"?>

Note that because the schema language described in this document differs in detail from the language used in the January 1998 submission, the URN used here is different from the one identified in the submission, both in value and form.

The rest of the examples in this document will use the prefix "s" to refer to the XML-Data schema namespace. This is purely for convenience and brevity; any prefix may be used in actual schemas as long as the ns part of the namespace declaration uses the URN given above.

The prolog in an XML-Data schema document should also contain namespace declarations for any other schemas that the schema being defined refers to. A particularly useful namespace is that which defines the built-in datatypes for XML-Data. (See Datatypes). This namespace is also referred to by a URN:

<?xml:namespace ns="urn:w3-org:xmldatatypes" prefix="dt"?>

The prolog in an XML-Data schema may, but need not, identify the XML Schema DTD:

<!DOCTYPE s:Schema SYSTEM "http://www.w3.org/XML/???">

Note on terminology and notation

Throughout this proposal, the word 'element' on its own refers to a particular bit of an XML document, either bounded by a matched pair of start- and end-tags, or consisting entirely of an empty-element tag.  In contrast, the phrase 'element type' refers to the type of which all elements sharing a name are instances.  Element types are declared in schemas; elements occur in documents.  We use bold face for element type names: Schema.  This terminology is consistent with that of the XML 1.0 Recommendation.  We go beyond the Recommendation in also using 'attribute' and 'attribute type' in a similar way.  We use italics for attribute type names: href.

Document element

Following the prolog, the actual definitions in an XML-Data schema document are contained within a Schema document element:

<s:Schema name="myschema">
  <!-- place your top level declarations here. -->
</s:Schema>

The document element then contains any number of "top-level" declarations. See the Scope section for the distinction between top-level and local declaration scope.

Top-level declarations

The ElementType, AttributeType, Entity and Notation element types are used to declare the major components of the structure of documents. All top-level declarations must have an explicit name attribute, which uniquely identifies the component defined. It is an error for more than one top-level declaration of the same type to use the same value for name within a single schema.

Element and Attribute type definitions

Element types are declared with the ElementType element type. The example below declares an element type called myname. This declaration constrains myname elements to be empty; see the Content Type section below for more options here:

<s:ElementType name="myname" content="empty"/>

Similarly, attribute types may be declared with the AttributeType element type.

The dt:type attribute on an AttributeType element specifies a type constraint on the allowed content of attributes of that type. All types available in an XML 1.0 DTD for attribute declarations are available, as is an additional set of datatypes useful in databases and programming languages. See Datatypes below for a complete list of datatypes and their definitions.  The default value for dt:type is "string".

The example below declares an attribute type called myattr. This declaration constrains myattr attributes to contain only a name token (in DTD terms: NMTOKEN):

<s:AttributeType name="myattr" dt:type="nmtoken"/>
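
As a rough sketch of how the pieces described so far fit together, the following shows a complete, minimal schema document: the two namespace declarations from the prolog, the Schema document element, and the myname and myattr declarations from this section. The XML declaration on the first line and the schema name "myschema" are illustrative choices, not requirements stated in this proposal.

```xml
<?xml version="1.0"?>
<?xml:namespace ns="urn:w3-org:xmlschema" prefix="s"?>
<?xml:namespace ns="urn:w3-org:xmldatatypes" prefix="dt"?>

<s:Schema name="myschema">
  <!-- An element type whose instances must be empty. -->
  <s:ElementType name="myname" content="empty"/>
  <!-- An attribute type whose values must be name tokens. -->
  <s:AttributeType name="myattr" dt:type="nmtoken"/>
</s:Schema>
```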

Default Value

It is possible to specify a default value for an attribute when its type is defined or referenced, using a default attribute on either AttributeType or attribute:

<s:AttributeType name="Country" default="Oleanna"/>
<s:ElementType name="fullname">
  <s:attribute type="formality" default="informal"/>   
</s:ElementType>

The default provided on an attribute takes precedence over any provided on the AttributeType it refers to.
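
To illustrate the precedence rule, the sketch below gives the formality attribute type a default of its own and overrides it in one of the two element types that use it. The value "formal" and the signature element type are invented for this illustration.

```xml
<!-- Default supplied where the attribute type is defined. -->
<s:AttributeType name="formality" default="formal"/>

<!-- Default supplied at the point of reference; this one wins for fullname. -->
<s:ElementType name="fullname">
  <s:attribute type="formality" default="informal"/>
</s:ElementType>

<!-- No default at the point of reference; the AttributeType default applies. -->
<s:ElementType name="signature">
  <s:attribute type="formality"/>
</s:ElementType>
```

Under these declarations, a fullname element written without a formality attribute would be treated as having formality="informal", while a signature element written without one would fall back to "formal".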

There is a required attribute on either AttributeType or attribute which gives further control over attribute values.  If required has the value "yes", applications can rely on always getting a value for this attribute.  If a default is also specified, that will always be the value, and documents containing other values are in error.  If no default is specified, each element whose type is declared to have the attribute must have a value for it:

<s:ElementType name="quote" content="textOnly">
  <s:attribute type="language" required="yes"/>
</s:ElementType>
<s:ElementType name="myLink">
  <s:attribute type="xml:link" default="simple" required="yes"/>   
</s:ElementType>

This means that every quote element must have a language attribute, and every myLink element must either lack the xml:link attribute altogether (in which case its value defaults to "simple") or must have an xml:link attribute with value "simple":

<myLink xml:link="simple" . . .>. . .</myLink>

See the Schema Validity section for discussion of the details of how default and required on attribute and AttributeType interact.

Content Type

The content intended for elements falls into four categories:  none, text only, sub-elements only, and a mixture of text and sub-elements.  This choice is expressed as an attribute named content on ElementType, with values "empty", "textOnly", "eltOnly" and "mixed" respectively.  The default is "mixed".
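
A minimal sketch of the four content values side by side may help. The element type names used here (br, title, chapter, para) are hypothetical and do not appear elsewhere in this proposal.

```xml
<!-- No content at all. -->
<s:ElementType name="br" content="empty"/>

<!-- Text only, no sub-elements. -->
<s:ElementType name="title" content="textOnly"/>

<!-- Sub-elements only, no text. -->
<s:ElementType name="chapter" content="eltOnly">
  <s:element type="title"/>
</s:ElementType>

<!-- Text and sub-elements intermixed; this is also the default. -->
<s:ElementType name="para" content="mixed">
  <s:element type="br"/>
</s:ElementType>
```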

Element Content

Element type declarations may constrain the content and/or attributes which appear in elements of the named type by referring to other element type or attribute type declarations.  The following example references the myname element type and the myattr attribute type:

<s:ElementType name="fullname">
  <s:element type="myname"/>
  <s:attribute type="myattr"/>   
</s:ElementType>

The following would then be a schema-valid fullname element:

<fullname myattr="anyname">
  <myname>Any PCDATA style string.</myname>
</fullname>

Order constraints

The ElementType declaration also supports an order attribute to specify the allowed pattern of the elements whose types are referenced in it. Possible values for order are "seq", "one", "all" or "many".

seq
Sub-elements must appear in the same sequential order as the elements referenced within the ElementType (the default for "eltOnly" content);
one
Like the "or" content model in an XML 1.0 DTD.  One sub-element of a type referenced within the ElementType must appear;
all
Corresponds to the "and" content model of SGML: an element of each type referenced within the ElementType must appear as a sub-element, but the sub-elements may appear in any order.
many
Corresponds to a starred or-group in an XML 1.0 DTD.  Any number of sub-elements drawn from the types referenced within the ElementType may appear in any order (the default for "mixed" content).

In XML-Data, as in XML 1.0, the order of appearance of attributes in an element is not constrained, and a given attribute may appear no more than once in an element.
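
As an informal illustration of the difference between "all" and "many", the two hypothetical element types below (pair and bag) reference the same sub-element types under different order values. The element types a and b are assumed to be declared elsewhere with empty content.

```xml
<!-- Exactly one a and one b, in either order. -->
<s:ElementType name="pair" content="eltOnly" order="all">
  <s:element type="a"/>
  <s:element type="b"/>
</s:ElementType>

<!-- Any number of a and b elements, in any order. -->
<s:ElementType name="bag" content="eltOnly" order="many">
  <s:element type="a"/>
  <s:element type="b"/>
</s:ElementType>
```

On this reading, each of the following elements would be schema-valid:

```xml
<pair> <b/> <a/> </pair>
<bag> <b/> <b/> <a/> </bag>
<bag/>
```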

Further constraints on element content can be expressed by grouping element references inside a group element. The group element type has an order attribute which takes the same values as the order attribute of the ElementType element type. The default for the order of group is the same as for the order of  ElementTypes, based on the value of content of the enclosing ElementType.   An example of the use of group:

<s:ElementType name="q" order="seq">
  <s:element type="a"/>
  <s:element type="b"/>
  <s:group order="one">
    <s:element type="d"/>
    <s:element type="e"/>
    <s:element type="f"/>
  </s:group>
</s:ElementType>

Given the above declaration, the only schema-valid orderings of the elements a, b, d, e and f when occurring within a q element are as follows:

<q> <a/> <b/> <d/> </q>
<q> <a/> <b/> <e/> </q>
<q> <a/> <b/> <f/> </q>

Sub-elements allowed in content via a group in a declaration are ordered as a group under the control of the order attribute of the enclosing ElementType (or group).  So for example given

<s:ElementType name="q" order="all">
  <s:element type="a"/>
  <s:group order="seq">
    <s:element type="d"/>
    <s:element type="e"/>
  </s:group>
</s:ElementType>

the only schema-valid orderings of the elements a, d and e when occurring within a q element are as follows:

<q> <a/> <d/> <e/> </q>
<q> <d/> <e/> <a/> </q>

Cardinality constraints

So far, we have described content constraints as if each element (or group of elements) which is allowed to occur will occur exactly once.  Specific control over this is provided by the minOccurs and maxOccurs attributes, which may appear on element and group elements.  The value of minOccurs, as the name implies, states the minimum number of times an element may appear. A value of "0" makes an element optional. minOccurs has a default value of "1". maxOccurs specifies the maximum number of times an element may appear. The default for maxOccurs is also "1", so an element with neither a minOccurs nor a maxOccurs must appear once and only once in a content model. The special maxOccurs value "*" means there is no upper limit on the number of times an element may appear.

minOccurs | maxOccurs | How many times should this element or group appear?
Not specified or 1 | Not specified or 1 | Precisely once. ("Required")
0 | Not specified or 1 | Not at all or once. ("Optional")
N > 1 | M > N | At least N times, but no more than M times.
N > 1 | M < N | Must not appear at all. Processor might issue warning.
0 | "*" | Any number of times. ("ZEROORMORE")
1 | "*" | At least once. ("ONEORMORE")
N > 0 | "*" | At least N times.
Any value | 0 | Must not appear at all.
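
The sketch below applies several of these combinations in one declaration, using the minOccurs and maxOccurs attributes as described in this section. The element type names (purchase, customer, note, item, coupon, voucher) are hypothetical, and the referenced types are assumed to be declared elsewhere with empty content.

```xml
<s:ElementType name="purchase" content="eltOnly" order="seq">
  <!-- Neither attribute given: exactly once. -->
  <s:element type="customer"/>
  <!-- Optional: not at all or once. -->
  <s:element type="note" minOccurs="0"/>
  <!-- At least once, with no upper limit. -->
  <s:element type="item" minOccurs="1" maxOccurs="*"/>
  <!-- The group as a whole may occur zero, one or two times. -->
  <s:group order="one" minOccurs="0" maxOccurs="2">
    <s:element type="coupon"/>
    <s:element type="voucher"/>
  </s:group>
</s:ElementType>
```

Both of the following would then be expected to be schema-valid:

```xml
<purchase> <customer/> <item/> </purchase>
<purchase> <customer/> <note/> <item/> <item/> <coupon/> <voucher/> </purchase>
```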

Open content model

By default, elements not declared with "empty" or "textOnly" content may contain attributes and sub-elements not referred to in their declaration.   This is referred to as an "open" content model.  The alternative is a "closed" content model, in which only the attributes and sub-elements referred to in an element type's declaration may appear.  To require a closed model, use the model attribute on ElementType with the value "closed".  For example:

<s:ElementType name="fullname" model="closed">
  <s:element type="myname"/>
  <s:attribute type="myattr"/>   
</s:ElementType>

With this declaration, fullname elements must contain exactly one myname sub-element and may only contain myattr attributes.

The default value of model is provided by the value of model on the enclosing Schema element, which in turn defaults to "open".

Note that text content is only allowed when content is "textOnly" or "mixed", whether model is "open" or "closed".
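
For contrast, here is a sketch of the same element type left open (the model attribute is omitted, so it is assumed to inherit "open" from the Schema element). The nickname attribute and title sub-element in the instance are invented for this illustration and are not referred to in the declaration.

```xml
<s:ElementType name="fullname" content="eltOnly">
  <s:element type="myname"/>
  <s:attribute type="myattr"/>
</s:ElementType>
```

```xml
<fullname myattr="anyname" nickname="Al">
  <myname/>
  <title/>
</fullname>
```

Under the open declaration this instance would be acceptable, since the extra attribute and sub-element are simply not constrained. Under the model="closed" declaration shown earlier, the same instance would not be schema-valid.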

WHAT do we do about attributes?  Either covered by model, or always closed, or always open, or with their own controlling attribute?  To make the extends invariant simple to state (if P extends Q, then all meta-valid instances of Q would also be meta-valid instances of P if we replaced "Q" with "P" in the instance), we couldn't use 'always closed', but that is unfortunately the one that seems natural to me!  I guess I'll go with 'covered by model' for now.

Note that content='eltOnly' with model='open' replaces AnyElement.

Mixed

So far we have seen how to constrain elements to no content, or text content, or sub-element content.  The value "mixed" for the content attribute allows text and sub-elements to be mixed.  When an ElementType declaration has "mixed" content the default for the order attribute is "many".  This reconstructs XML 1.0's restriction on mixed content (but does not enforce it--you can construct schemas which have no equivalent in XML 1.0 terms).  For example,

<s:ElementType name="paragraph" content="mixed">
 <s:element type="strong"/>
 <s:element type="emph"/>
 <s:element type="link"/>
</s:ElementType>

allows any combination of text and strong, emph and link elements within paragraphs.

Note that the order attribute only applies to the sub-elements, so that e.g.

<s:ElementType name="bookDescription" content="mixed" order="one">
 <s:element type="author"/>
 <s:element type="editor"/>
</s:ElementType>

allows bookDescriptions with exactly one author or one editor somewhere in their content, along with any amount of text.   Since the order attribute applies only to sub-elements, the SGML pernicious mixed content problem cannot arise, and the following are schema-valid per the above:

<bookDescription>This is the most recent book
by <author>LeGuin</author>, and I really like it.</bookDescription>

<bookDescription>
<author>Joyce</author> wrote this towards the end of his life.
</bookDescription>

There is no way to constrain an element to have either element content or text content, the source of the SGML mixed content problem.

Inheritance and sub-classing

An element type may be declared to re-use the content model declarations of other element types through the use of the extends element type. An extends element is effectively replaced by the entire content model of the element type it names.   For example:

<s:ElementType name="polygon" content="eltOnly">
  <s:attribute type="n" required="yes"/>
  <s:attribute type="regularity"/>  
  <s:element type="diagonals"/>
</s:ElementType>

<s:ElementType name="regularPolygon" content="eltOnly">
  <s:attribute type="regularity" default="regular" required="yes"/>
  <s:element type="side"/>
  <s:extends type="polygon"/>
</s:ElementType>

A legal instance of regularPolygon (in this case an empty equilateral triangle 3mm on a side) might be:

<regularPolygon n="3">
  <side><dimension unit='mm'>3</dimension></side>
  <diagonals/>
</regularPolygon>

Using extends also allows instances of the extending element type to occur anywhere the extended type is allowed.  In the above example this means that any content model that allows polygon will also now allow regularPolygon. Furthermore, attributes declared on the extended element type may also occur on the extending element type, so in the example n can (in fact must) now appear on regularPolygon.  For example, if in addition to the above example we have:

<s:ElementType name="picture">
  <s:element type="polygon" occurs="oneOrMore"/>
</s:ElementType>

then the following is schema-valid:

<picture>
 <polygon n="3" regularity="irregular">...</polygon>
 <regularPolygon n="3">...</regularPolygon>
</picture>

We restrict the use of extends to cases where the merger of the two content models involved is straightforward.  For each extends:

  1. Either the extended element type must have an "open" model or the extending element type must have no content at all, either explicit or inherited from other extends;
  2. If the extending element type has explicit content, the values of the order attribute must be consistent.   The following table shows all the allowed values (if the extended element type has order with value 'one', no extension is possible):

    Extended | Extending
    seq | seq
    all | all; seq
    many | seq; one; all; many

  3. The values of the content attribute must be consistent, as follows:

    Extended | Extending
    empty | empty
    textOnly | textOnly; empty
    eltOnly | eltOnly
    mixed | mixed; textOnly; eltOnly

  4. Allowed attributes and datatype constraints (see Datatypes) are cumulative, that is, all apply.  Attributes of the same name are merged:  the only difference allowed is that an attribute in the extending declaration may provide and/or require a default where the extended declaration does not.  Multiple datatype constraints, whether for content or an attribute, must be intelligibly combinable (see Datatypes).
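
As a sketch of an extension that appears to satisfy these restrictions, consider the hypothetical element types below (contact, workContact, phone, email and company are names invented here; the last three are assumed to be declared elsewhere with empty content). The extended type has the default "open" model, its order is "all" while the extending type uses "seq", and both have "eltOnly" content.

```xml
<s:ElementType name="contact" content="eltOnly" order="all">
  <s:element type="phone"/>
  <s:element type="email"/>
</s:ElementType>

<s:ElementType name="workContact" content="eltOnly" order="seq">
  <s:element type="company"/>
  <s:extends type="contact"/>
</s:ElementType>
```

```xml
<workContact> <company/> <phone/> <email/> </workContact>
```

If the tag of this instance were changed to contact, it would still be expected to be schema-valid: the phone and email sub-elements are present, and the company sub-element is tolerated by the open model of contact.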

Consistent with the above remark about the extending element type being allowed anywhere the extended one is, the guiding principle is that anything allowed by the extending declaration would also be allowed by the extended one if the tag was changed.   Thus if we rename regularPolygon to polygon in the first example above, we get a schema-valid polygon:

<polygon n="3">
  <side><dimension unit='mm'>3</dimension></side>
  <diagonals/>
</polygon>

It's OK as a polygon, because it has everything a polygon requires (n attribute, diagonals sub-element), and the side sub-element is OK because polygon has (by default) an "open" content model.

Note that other than allowing multiple extends, this is even simpler than before, with no notion of merging identical occurrences of the same element.  Effectively, the extended content model is dropped in as a group in the relevant place in the extending model.

Entity and Notation Declarations

All entities are declared using Entity at the top level. The declaration for an internal entity simply contains its content:

<s:Entity name="copyright">Copyright Microsoft Corporation © 1998</s:Entity>

An external entity declaration contains systemID and publicID attributes declaring the location of the entity:

<s:Entity name="ISOlat1" systemID="isolat1.ent"
          publicID="ISO 8879-1986//ENTITIES Added Latin 1//EN"/>

A declaration without a notation attribute is for a parsed entity (see the XML 1.0 Recommendation).  If a notation attribute is present, the entity declared is unparsed:

<s:Entity name="LTG" notation="#gif"
        systemID="http://www.ltg.ed.ac.uk/~ht/logo-transp.gif"/>

Notation declarations are similar to external entity declarations, except that following XML 1.0 the systemID attribute is not required, and of course no notation attribute is allowed:

<s:Notation name="gif"
          publicID="-//Compuserv Information Service//NOTATION
                       Graphics Interchange Format//EN"/>

Scope

Within a single Schema there are two different kinds of declaration scopes. 'Top-level' scope refers to any element types, attribute types, entities, or notations declared immediately within the Schema document element. Element types or attribute types declared at top-level scope may be referred to in the content declaration of any element type in the same schema, or any element type in any other schema that references that schema via the use of namespaces. Here is an example schema:

The urn:schemas-microsoft-com:ppt schema:

<?xml:namespace ns="urn:w3-org:xmlschema" prefix="s"?>
<?xml:namespace ns="urn:w3-org:xmldatatypes" prefix="dt"?>

<s:Schema name="people">

  <s:ElementType name="name" content="textOnly"/>

  <s:AttributeType name="age" dt:type="int"/>

  <s:ElementType name="building">
    <s:element type="name"/>
    <s:attribute type="age"/>
  </s:ElementType>

  <s:ElementType name="person">
    <s:AttributeType name="gender">
      <s:datatype dt:type="enumeration" dt:values="M F"/>
    </s:AttributeType>
    <s:attribute type="gender"/>
    <s:element type="name"/>
  </s:ElementType>

</s:Schema>

In the above schema, name, age, person and building are declared with top-level scope. All element types within this schema, or within other schemas referencing urn:schemas-microsoft-com:ppt, may refer to any of these. However, gender is declared within the scope of the person element type declaration, and may only be referenced (as here) within that declaration.   Any number of local element type and attribute type declarations may appear at the beginning of an ElementType element.  In case of any conflict, a local declaration takes precedence over a global declaration with the same name.
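
The precedence rule for conflicting names can be sketched as follows. The schema name "tasks" and the status, task and memo declarations are hypothetical and serve only to show a local declaration shadowing a top-level one.

```xml
<?xml:namespace ns="urn:w3-org:xmlschema" prefix="s"?>
<?xml:namespace ns="urn:w3-org:xmldatatypes" prefix="dt"?>

<s:Schema name="tasks">

  <!-- Top-level declaration: status is an unconstrained string. -->
  <s:AttributeType name="status" dt:type="string"/>

  <s:ElementType name="task" content="textOnly">
    <!-- Local declaration with the same name; it applies inside task only. -->
    <s:AttributeType name="status">
      <s:datatype dt:type="enumeration" dt:values="open done"/>
    </s:AttributeType>
    <s:attribute type="status"/>
  </s:ElementType>

  <s:ElementType name="memo" content="textOnly">
    <!-- No local declaration here, so this refers to the top-level status. -->
    <s:attribute type="status"/>
  </s:ElementType>

</s:Schema>
```

On this reading, a status attribute on a task element is limited to "open" or "done", while a status attribute on a memo element may hold any string.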

Referring to other Schemas by using namespaces

We have shown that the content model of an element can be constrained by referring to other element type declarations (either from an attribute, element, or extends element) within the same schema. We can also refer to declarations from other schemas simply by using namespaces to identify the other schema, and using a prefix part in our reference to the ElementType or AttributeType.  For example, we can refer to the person element declared in the previous example:

<?xml:namespace prefix='pp' ns='urn:schemas-microsoft-com:ppt'?>
<s:ElementType name="crowd">
  <s:element type="pp:person" occurs="oneOrMore"/>
</s:ElementType>

Datatypes

Attribute types, and element types with 'textOnly' content, can constrain their values/contents to be instances of a particular datatype.   XML 1.0 defines about 10 datatypes, which may only be used to constrain attribute values, and essentially one datatype, PCDATA, that can be used for element content. Here we propose a much richer set of datatypes, available equally well for attribute and element content.

Datatypes are referenced from the datatype namespace. In order to use this namespace in a schema, it must be declared (see the example in Prolog). We have assumed throughout this document that the datatype namespace has been assigned the dt prefix, but of course it could be assigned any prefix.

The dt:type attribute is used to reference primitive datatypes. These are tabulated below; the final column shows allowed combinations when using extends.   A pair of datatype constraints is allowed if either they are identical, or the extended type is "string" (the universal supertype), or the extended type is listed in the "Extends" column below in the row for the extending type, e.g. "idref" is allowed for an element type which extends an "idrefs" element type.  The underlying reason is as for extending in general: any element allowed by the extending declaration is also allowed by the extended declaration.

Name | Examples | Parse type | Extends
id | X | XML ID |
idref | X | XML IDREF | idrefs
idrefs | X Y Z | XML IDREFS |
entity | Foo | XML ENTITY | entities
entities | Foo Bar | XML ENTITIES |
nmtoken | Name | XML NMTOKEN | nmtokens
nmtokens | Name1 Name2 | XML NMTOKENS |
enumeration | Red | XML ENUMERATION |
notation | GIF | XML NOTATION |
string | Omwnuma legatai wn onoma monon koinon, o de kata tounoma logos thV ousiaV eteros, oion zuon o te anqropoV kai to gegrammenon. | pcdata |
number | 15, 3.14, -123.456E+10 | A number, with no limit on digits, may potentially have a leading sign, fractional digits, and optionally an exponent. Punctuation as in US English. |
int | 1, 58502, -13 | A number, with optional sign, no fractions, no exponent. | number
fixed.14.4 | 12.0044 | Same as "number" but no more than 14 digits to the left of the decimal point, and no more than 4 to the right. | number
boolean | 0, 1 (1=="true") | "1" or "0" | number; int
dateTime | 1988-04-07T18:39:09 | A date in a subset of ISO 8601 format, with optional time and no optional zone. Fractional seconds may be as precise as nanoseconds.  (See XMLDate.htm). | dateTime.tz
dateTime.tz | 1988-04-07T18:39:09-08:00 | A date in a subset of ISO 8601 format, with optional time and optional zone. Fractional seconds may be as precise as nanoseconds. (See XMLDate.htm). |
date | 1994-11-05 | A date in a subset of ISO 8601 format (no time).  (See XMLDate.htm). | dateTime; dateTime.tz
time | 08:15:27 | A time in a subset of ISO 8601 format, with no date and no time zone. (See XMLDate.htm). | time.tz
time.tz | 08:15:27-05:00 | A time in a subset of ISO 8601 format, with no date but optional time zone. (See XMLDate.htm). |
i1, byte | 1, 127, -128 | A number, with optional sign, no fractions, no exponent. | i2, i4, i8, int, number
i2 | 1, 703, -32768 | A number, with optional sign, no fractions, no exponent. | i4, i8, int, number
i4, int | 1, 703, -32768, 148343, -1000000000 | A number, with optional sign, no fractions, no exponent. | i8, int, number
i8 | 1, 703, -32768, 1483433434334, -1000000000000000 | A number, with optional sign, no fractions, no exponent. | int, number
ui1 | 1, 255 | A number, unsigned, no fractions, no exponent. | ui2, ui4, ui8, int, number
ui2 | 1, 255, 65535 | A number, unsigned, no fractions, no exponent. | ui4, ui8, int, number
ui4 | 1, 703, 3000000000 | A number, unsigned, no fractions, no exponent. | ui8, int, number
ui8 | 1483433434334 | A number, unsigned, no fractions, no exponent. | int, number
r4 |  | Same as float. | r8, float
r8, float | .314159265358979E+1 | Same as for "number."  (Note same parse type, not the same datatype!) | float
uuid | 333C7BC4-460F-11D0-BC04-0080C7055A83 | Hexadecimal digits representing octets, optional embedded hyphens which should be ignored. |
uri | urn:schemas-microsoft-com:Office9; http://www.ics.uci.edu/pub/ietf/uri/; http://www.ietf.org/html.charters/urn-charter.html | Universal Resource Identifier |
bin.hex |  | Hexadecimal digits representing octets |
bin.base64 |  | MIME style Base64 encoded binary blob. |
char |  | String (only one character long) |
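
A small sketch of the "Extends" column in use: the table lists number in the row for int, so an element type constrained to int should be able to extend one constrained to number. The element type names quantity and count are hypothetical.

```xml
<!-- The extended type accepts any number. -->
<s:ElementType name="quantity" content="textOnly" dt:type="number"/>

<!-- The extending type narrows the constraint to int, which the table permits. -->
<s:ElementType name="count" content="textOnly" dt:type="int">
  <s:extends type="quantity"/>
</s:ElementType>
```

An instance such as <count>42</count> would remain acceptable if its tag were changed to quantity, which is the invariant the combination rule is meant to preserve. The reverse arrangement (a number type extending an int type) is not listed in the table and would presumably be rejected.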

Declaring datatype constraints

We have already seen how to use the dt:type attribute to declare constraints on attribute types.  The same syntax can be used for element types, provided they have 'textOnly' content:

<s:ElementType name="birthday" content='textOnly' dt:type="dateTime"/>

If further parameterisation (see next section) is required in the case of attribute types or element types, a datatype element can be used as a carrier for the dt:type and other attributes, in which case no dt:type attribute should appear on the type declaring element itself.
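
The two forms can be compared in the following sketch. The initial element type is hypothetical, and dt:maxLength is one of the parameters described in the next section.

```xml
<!-- Shorthand form: the type is given directly on the declaration. -->
<s:ElementType name="birthday" content="textOnly" dt:type="dateTime"/>

<!-- Carrier form: dt:type moves to a datatype element, which can also hold
     further parameters. No dt:type appears on ElementType itself. -->
<s:ElementType name="initial" content="textOnly">
  <s:datatype dt:type="string" dt:maxLength="1"/>
</s:ElementType>
```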

Constraints

There are various constraints that one may place on a datatype. These constraints are generally interpreted in a datatype-specific fashion.

Maximum length

dt:maxLength can be used to limit the instance length of string, number, bin.hex, and bin.base64. For string and number, the maximum length is specified as a number of characters. For bin.hex and bin.base64, the maximum length is specified as the number of bytes in the binary object. The length is inclusive, i.e. a string may be as long as the maxLength, but no longer. An example of maxLength on a string:

<s:AttributeType name="city">
  <s:datatype dt:type="string" dt:maxLength="22"/>
</s:AttributeType>

The above example says that the value of city attributes may be from zero to 22 characters long.
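
For the binary datatypes the limit counts bytes, not characters, as the following sketch suggests. The thumbprint element type is a hypothetical name.

```xml
<!-- At most 4 bytes of binary data, written as hexadecimal digits. -->
<s:ElementType name="thumbprint" content="textOnly">
  <s:datatype dt:type="bin.hex" dt:maxLength="4"/>
</s:ElementType>
```

```xml
<thumbprint>0A1B2C3D</thumbprint>
```

The instance is eight characters long but encodes only four octets, so it would be expected to satisfy the constraint.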

Min/Max

For some datatypes, minimum and maximum values may be specified. For example:

<s:ElementType name="age" content="textOnly">
  <s:datatype dt:type="int" dt:min="0" dt:max="150"/>
</s:ElementType>

dt:min and dt:max are inclusive, so the above example allows all values between 0 and 150, including 0 and 150. There are separate attributes to specify exclusive minimum and maximum:

<s:ElementType name="age" content="textOnly">
  <s:datatype dt:type="int" dt:minExclusive="0" dt:maxExclusive="150"/>
</s:ElementType>

The above example allows all values strictly between 0 and 150, but not 0 and 150 themselves.
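
The boundary behaviour of the two styles can be seen side by side in the sketch below. The percentage and strictPercentage element types are hypothetical names.

```xml
<!-- Inclusive bounds: 0 and 100 are both allowed. -->
<s:ElementType name="percentage" content="textOnly">
  <s:datatype dt:type="int" dt:min="0" dt:max="100"/>
</s:ElementType>

<!-- Exclusive bounds: only 1 through 99 are allowed. -->
<s:ElementType name="strictPercentage" content="textOnly">
  <s:datatype dt:type="int" dt:minExclusive="0" dt:maxExclusive="100"/>
</s:ElementType>
```

```xml
<percentage>100</percentage>
<strictPercentage>99</strictPercentage>
```

Both instances would be expected to be schema-valid, whereas <strictPercentage>100</strictPercentage> would not be.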

Enumerations

The dt:values attribute is used for defining enumerations:

<s:ElementType name="colors" content="textOnly">
  <s:datatype dt:type="enumeration" dt:values="red green blue"/>
</s:ElementType>
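
Enumerations can be attached to attribute types in the same way. The sketch below uses hypothetical shirt and size names, and assumes that the allowed values are separated by spaces, as in the example above.

```xml
<s:ElementType name="shirt" content="empty">
  <s:AttributeType name="size">
    <s:datatype dt:type="enumeration" dt:values="small medium large"/>
  </s:AttributeType>
  <s:attribute type="size" required="yes"/>
</s:ElementType>
```

```xml
<shirt size="medium"/>
```

A value outside the list, such as size="huge", would not be schema-valid.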

Picture Constraints

The dt:picture attribute can be used for defining constraints on the format of all datatypes that have a parse type of "number".  The value of the attribute is a string, similar to a COBOL picture clause, that constrains the format of the number. A picture is an alphanumeric string consisting of character symbols. Each symbol, which is usually one character but may be two characters, is a placeholder that stands for a set of characters. For example, the picture "A" stands for a single alphabetic character.

The following is a list of picture symbols and their meanings.

 $123,45.90 satisfies picture $999,99.99 
 $123,45.90 satisfies picture XXXX,XX.XX
 123-45-5678 satisfies picture 999-99-9999 (Social Security Number)
 24E80 satisfies picture 99E99 (floating point)
 23.45 satisfies picture 99.99
 2345 satisfies picture 99V99 (translates to 23.45)
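
A sketch of how a picture might be attached to a declaration follows. It assumes that dt:picture sits on the datatype element alongside dt:type, like the other constraint attributes, and that the symbol "9" stands for exactly one digit, as the examples above suggest. The price element type is a hypothetical name.

```xml
<s:ElementType name="price" content="textOnly">
  <s:datatype dt:type="number" dt:picture="99.99"/>
</s:ElementType>
```

```xml
<price>23.45</price>
```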

Unique Values

In current XML, the ID AttributeType is unique within a document. We feel that unique AttributeTypes are very important and would like to extend the concept to any named AttributeType with the ability to specify the scope of the uniqueness. Particular implementations can use unique AttributeTypes to define keys to speed up searches.   An attribute that is to be unique is marked by a dt:uniqueIn property, which specifies the ElementType that the attribute is to be unique across.

<s:AttributeType name="SerialNumber">
  <s:datatype dt:type="int" dt:uniqueIn="Company"/>
</s:AttributeType>

The above example indicates that the SerialNumber AttributeType must be unique across all occurrences of Company.
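
One possible reading of this constraint, offered only as an assumption since the scope rule is not spelled out further, is that no two SerialNumber values may coincide within the same Company element. The Company and Product element types below are hypothetical declarations written to make that reading concrete.

```xml
<s:ElementType name="Company" content="eltOnly" order="many">
  <s:element type="Product"/>
</s:ElementType>

<s:ElementType name="Product" content="empty">
  <s:attribute type="SerialNumber" required="yes"/>
</s:ElementType>
```

```xml
<Company>
  <Product SerialNumber="1001"/>
  <Product SerialNumber="1002"/>
</Company>
```

Under that reading, the instance above would be acceptable, while a second Product with SerialNumber="1001" inside the same Company would violate the constraint.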


Appendices

Description

The Description element type may be used to provide documentary information about any declaration. This does not add constraints to the schema in any way, but   schema design tools should provide means to create and inspect descriptions.

Schema Validity

What does it mean for a well-formed XML document to be schema-valid, i.e. to satisfy the constraints expressed in a schema as defined in this proposal?  Basically, for elements whose type is model="closed", what you see is what you get:   there's an isomorphic regexp over element names with parens for groups, operators from the order attribute and exponents from the minOccurs and maxOccurs attributes.   For model="open", I think what we do is throw out all elements whose types are not referenced in the content model, and what's left has to match it.

[not complete]

DTD

<!-- XML Data reduced v0.21 Schema -->
<?xml:namespace ns="urn:w3-org:xmldatatypes" prefix="dt"?>
<!ELEMENT Schema        ((ElementType|AttributeType|
                          Entity|Notation|Description)*)>
<!ATTLIST Schema
                name    CDATA     #REQUIRED
                model   (open|closed) 'open'>

<!-- Element Type Declarations -->
<!ELEMENT ElementType        (datatype?,
                             (AttributeType|ElementType)*,
                             attribute*,
                             (element|group|extends)*)>
<!ATTLIST ElementType
                name    CDATA      #REQUIRED
                content (empty|textOnly|eltOnly|mixed) 'mixed'
                model   (open|closed) #IMPLIED
                dt:type CDATA      #IMPLIED
                order   (seq|one|all|many) #IMPLIED>
<!-- If no model value, inherits from Schema, i.e. default is 'open' -->
<!-- If content is 'mixed', default order is 'many'; if 'eltOnly', 'seq' -->
<!-- dt:type allowed only if content=textOnly -->
<!-- datatype daughter allowed only if no dt:type and content=textOnly -->

<!-- Attribute Type Declarations -->
<!ELEMENT AttributeType      (datatype?)>
<!ATTLIST AttributeType
                name    CDATA #REQUIRED
                dt:type CDATA 'string'
                default CDATA #IMPLIED
                required (yes|no) 'no'>
<!-- Content allowed only if dt:type is absent -->

<!-- Elements allowed in a content model -->
<!ELEMENT element           EMPTY>
<!ATTLIST element
                type    CDATA   #REQUIRED
                occurs  CDATA '1:1'>
<!-- The occurs attribute here and for group consists of a pair of
     numbers specifying a closed interval.  A missing second number
     is understood to mean infinity.  Symbolic shorthands for
     the most common cases are understood as follows:
      required     1:1
      optional     0:1
      oneOrMore    1:
      zeroOrMore   0:  -->

<!-- A group in a content model -->
<!ELEMENT group          ((group|element|extends)+)>
<!ATTLIST group
                order    (seq|one|all|many) #IMPLIED
                occurs  CDATA '1:1'>
<!-- If surrounding content is 'mixed', default order is 'many';
                            if 'eltOnly', 'seq' -->

<!ELEMENT extends        EMPTY>
<!ATTLIST extends
               type     CDATA #REQUIRED>

<!ELEMENT attribute      EMPTY>
<!ATTLIST attribute
                type    CDATA #REQUIRED
                default CDATA #IMPLIED
                required (yes|no) #IMPLIED>
<!-- If required is not specified, inherits from type -->

<!-- Datatypes -->
<!ELEMENT datatype       EMPTY>
<!ATTLIST datatype
               dt:type  CDATA #REQUIRED
               dt:maxLength NMTOKEN #IMPLIED
               dt:values CDATA #IMPLIED
               dt:max   CDATA #IMPLIED
               dt:min CDATA #IMPLIED
               dt:maxExclusive CDATA #IMPLIED
               dt:minExclusive CDATA #IMPLIED>
<!-- values iff type=ENUMERATION -->
<!-- min and minExclusive cannot both appear -->
<!-- max and maxExclusive cannot both appear -->

<!-- Entity Declarations -->
<!ELEMENT Entity     (#PCDATA)>
<!ATTLIST Entity
                name     CDATA #REQUIRED
                notation CDATA #IMPLIED
                systemId CDATA #IMPLIED
                publicId CDATA #IMPLIED>
<!-- The entity is external iff there is a systemId -->
<!-- publicId is not allowed unless systemId is also present -->
<!-- notation is not allowed unless systemId is also present -->
<!-- The entity will be treated as binary if a notation is present -->
<!-- systemId and publicId (if present) must have the required syntax -->

<!-- Notation Declarations -->
<!ELEMENT Notation        EMPTY>
<!ATTLIST Notation
                name    CDATA #REQUIRED
                systemId CDATA #REQUIRED
                publicId CDATA #IMPLIED>
<!-- systemId and publicId (if present) must have the required syntax -->

<!-- Descriptions -->
<!ELEMENT Description             (#PCDATA)>
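
The occurs attribute described in the DTD comments above (a closed interval with an optional second number, plus four symbolic shorthands) could be interpreted as in the following Python sketch. The function name parse_occurs and the use of None for "infinity" are assumptions made for illustration.

```python
# Symbolic shorthands listed in the DTD comment for the occurs attribute.
SHORTHANDS = {
    "required": "1:1",
    "optional": "0:1",
    "oneOrMore": "1:",
    "zeroOrMore": "0:",
}


def parse_occurs(value="1:1"):
    """Return (low, high) for an occurs value; high is None for infinity.

    The default argument mirrors the DTD default of '1:1'.
    """
    value = SHORTHANDS.get(value, value)
    low_text, separator, high_text = value.partition(":")
    if not separator:
        raise ValueError("occurs must be a shorthand or 'low:high'")
    low = int(low_text)
    # A missing second number is understood to mean infinity.
    high = int(high_text) if high_text else None
    if high is not None and high < low:
        raise ValueError("upper bound is below lower bound")
    return low, high
```

For example, parse_occurs("oneOrMore") and parse_occurs("1:") both give (1, None), and parse_occurs("2:5") gives (2, 5).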

Change history:

Date Version Change
June 18, 1998 0.13 Make <default> and <requiredValue> apply to AttributeType & ElementType and not references to same.
June 22, 1998 0.14 Wording improvements and bug fixes from Henry Thompson.
June 24, 1998 0.15 Began terminology consistency pass: ht
June 25, 1998 0.16 more consistency work, undid v0.13 change!: ht
June 30, 1998 0.17 alternative approach to (data)typing, changed attribute defaulting, changed order values and semantics (sigh), consistent story about extends: ht
July 1, 1998 0.18 Clarification of mixed content semantics, syntax for datatyping: ht
July 2, 1998 0.19 DTD now complete, namespace example added, local scope cleaned up: ht
July 2, 1998 0.20 Bumped version number to deal with merge issues: cfranks
July 3, 1998 0.21 Spelling errors, lower-cased SGML datatypes, fixed Front Page/Word figure problems (I hope): ht
July 3, 1998 0.22 Put back minOccurs, maxOccurs.  Add pictures, and unique attributes stuff from IBM: cfranks
July 4, 1998 0.23 Formatting issues.