NeuroImaging Markup Language (NIML)

Base Level Specification

Robert W Cox, PhD -- 21 Feb 2002
**Please note that this document is a DRAFT, and extremely subject to revision**

Introduction

The purpose of this specification is to define a flexible, extensible, and self-describing format for encoding structured data for neuroimaging applications. The largest component of such information is the image data itself, but the images themselves are of limited use unless some auxiliary data (e.g., voxel dimensions, image orientation, timing information) are attached.

Another motivation for this specification is to work towards defining a standard and protocol for neuroimaging applications to exchange smallish pieces of data. If the community moves towards the development of interoperating software tools, it will be important for these applications not only to share image files, but also to "talk" to each other interactively and to exchange small chunks of information or commands (e.g., "jump to coordinates (32,47,-13)").

This base level specification details how collections of disparate information can be packaged together. The body of this document describes the format for the data.

A C API for reading, writing, and storing information using this standard and protocol is described in the appendices. At this writing, a mostly-complete (but weakly-tested) implementation is available.

Individual data elements (1D or 2D tables of numbers and/or strings) are encoded in an XML-inspired format. An entire data collection consists of a number of data elements grouped together. One or more higher level documents will specify the structure and contents of prototypical neuroimaging data sets, and describe a communications standard for interoperating neuroimaging applications.

[** The higher level documents are only a gleam in my mind's eye.**]
XML note: The software that parses data formatted in the way specified herein is partly an XML processor and partly an application, in the jargon of the XML specification. For details about XML, the best place to start is the annotated XML specification: http://www.xml.com/axml/axml.html . The XML notes herein are intended to provide asides useful to someone who already knows something about XML.
XML note: Except for binary data, it will be possible to encode data in this format in a well-formed XML document (but not DTD-validated, thanks in part to the ni_typedef elements, which allow new NIML element types to be defined in the NIML document itself). Places in this specification where care must be taken to ensure XML well-formedness will be pointed out.
XML note: Documents formed according to this specification will not be fully general XML, since many features of XML (e.g., arbitrary nesting, CDATA, general DTDs, Unicode, entities) will not be supported. This is one reason why software that reads the type of data specified herein is only partly an XML processor.
XML note: Why not use a general XML processor as a front-end to this software (e.g., expat, available at http://www.jclark.com/xml/expat.html)? Mainly because I see a need for binary data to be included, since a typical MRI data set is 10-100 Mbytes. Expansion to a pure text form seems excessive just to conform to the XML specification, especially in standalone neuroimaging applications that otherwise don't care about XML at all. Nor do I think that the XML solution to binary data (reference to an external unparsed entity) is adequate, since that will make it impossible to package up all the data for a neuroimaging data set into one file or one data transmission stream.

Glossary

The definitions of some terms used in this specification are given here. For some terms, the equivalent XML construct is given in parentheses.


A Simple Example: One Data Element

A data element consists of a header in angle brackets "<...>", a data stream that follows the ">", and a token "</>" that closes the data stream:
 <vector ni_type=float ni_form=text ni_dimen=3> 1.3 2.2 -3.7 </> 
where the components of the above element are:
< opens the element header
vector gives the type of the data element (almost any string)
ni_type=float says that the data stream for this element should be read into 4-byte floats
ni_form=text says that the data stream is stored in text format (the default)
ni_dimen=3 says that there are 3 floats that follow (default is 1)
> is the end of the header; data stream starts at the next byte
 1.3 2.2 -3.7  is the data stream to be decoded into numerical values
</> signifies the end of the data stream for this element
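
As an illustration only, the element above can be picked apart with a few lines of Python. This is a minimal sketch, not the C API described in the appendices: the name parse_simple_element is hypothetical, and the code assumes unquoted attributes, an internal text data stream of floats, and no empty-element ("/>") form.

```python
import re

# Header: "<", element name, optional attributes, ">"; then the data stream up to "</".
# Assumption: attribute values are unquoted and the element is not empty ("/>").
ELEMENT_RE = re.compile(r"<([A-Za-z][A-Za-z0-9_.\-]*)\s*([^>]*)>(.*?)</", re.S)

def parse_simple_element(text):
    """Return (name, attributes, values) for one text-format element of floats."""
    m = ELEMENT_RE.search(text)  # anything before the opening "<" is skipped
    if m is None:
        raise ValueError("no data element found")
    name, header, stream = m.groups()
    # Each whitespace-separated header item is expected to look like attname=string.
    attrs = dict(pair.split("=", 1) for pair in header.split())
    count = int(attrs.get("ni_dimen", "1"))  # ni_dimen defaults to 1
    values = [float(tok) for tok in stream.split()[:count]]
    return name, attrs, values

print(parse_simple_element(
    "<vector ni_type=float ni_form=text ni_dimen=3> 1.3 2.2 -3.7 </>"))
```

The sketch reads only ni_dimen values from the stream and ignores anything else before the end token.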

Data Element Format

Bytes before the opening "<" are skipped. The "<" marks the start of the element header, which describes the contents of the data element and the data stream that will populate the data element.

XML note: An XML processor should pass through whitespace that appears between elements. This NIML specification says to ignore such whitespace (and anything else between elements, for that matter), which is one reason that NIML processing software is an "XML application" (interpreting the input) as well as an "XML processor" (making the input available to the application).

Element name:
Immediately after the opening "<" is the element name (e.g., this could be used to mark a data structure's type or class name). Later, a mechanism for specifying data element subtypes is given, and a number of predefined subtypes is listed.

Names:
The allowable characters in an element (or attribute) Name are "A-Z", "a-z", "0-9", and the special characters underscore, period, and hyphen ("_", ".", and "-"). The first character in a Name must be alphabetic. The first whitespace or other non-Name character found ends the Name. (Whitespace is defined by XML to be the characters blank, newline, carriage-return, and horizontal tab.) The maximum legal length of a Name is 255 characters. Some examples:

  Z_zzza-...         legal
  _Ethel_            illegal (can't start with "_")
  In:the:beginning   illegal (can't use ":" in a Name)
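
The Name rule can be checked mechanically. The following is a small sketch of such a check; the helper is_legal_name is a hypothetical name, not part of any NIML library.

```python
import re

# First character alphabetic, then letters, digits, "_", ".", or "-";
# at most 255 characters in total (1 leading + up to 254 following).
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.\-]{0,254}")

def is_legal_name(s):
    """Return True if s satisfies the NIML Name rule described above."""
    return NAME_RE.fullmatch(s) is not None

for candidate in ("Z_zzza-...", "_Ethel_", "In:the:beginning"):
    print(candidate, is_legal_name(candidate))
```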

XML note: The characters that are allowed in a Name are taken from the XML specification, with the exception of the colon. The XML namespace specification (http://www.w3.org/TR/REC-xml-names/) reserves the use of the colon in Names for namespace identification. XML allows Names to start with underscore "_", but NIML does not. XML does not put a maximum length on a Name. NIML documents will be encoded strictly in 8-bit characters, with the first 128 characters being US-ASCII (no Unicode or UTF-8 for NIML). This restriction means that it would be legal to use one of the ISO-8859-* character sets for non-English languages in an NIML file, but this would raise serious portability issues (since the values from 128..255 are interpreted differently in these different character sets).

Reserved Names:
Element and Attribute Names that start with the characters "ni_" are reserved for expansion of this specification. The following are the reserved Names currently in use:

Name         Purpose
ni_type      Attribute: specifies type of data to read
ni_form      Attribute: specifies format of data stream
ni_dimen     Attribute: specifies number of values to read
ni_delta     Attribute: specifies coordinate spacing between data values on a uniform grid
ni_units     Attribute: specifies units used in ni_delta
ni_origin    Attribute: specifies coordinate offsets for data values on a uniform grid
ni_axes      Attribute: specifies axis orientations for data values on a uniform grid
ni_url       Attribute: specifies external location of data stream for a data element
ni_typedef   Element: defines a new data element subtype
ni_name      Attribute: provides a name for an ni_typedef-ed element
ni_group     Element: provides a way to group multiple elements together
ni_include   Element: provides a way to read an external file into an input stream

Elements with no data (Empty elements):
The minimal element is an element name with no attributes or data stream. Such a construct could be used as a flag or command to the receiving application. For example,

  <close/>
could be used in a transmission as a command to indicate that the transmission's I/O channel should be closed. The fact that the element header closes with "/>" indicates that there is no internal data stream (i.e., this is an "empty element", in the XML jargon). Note that the "/" character is not a legal Name character, so that the element name ends with the "e" in "close".

XML note: The XML specification allows empty elements to also be of the form "<name></name>", and implies that this form is to be indistinguishable from the form "<name/>". An NIML empty element must follow the latter form, closing the element header with "/>".

Attributes:
Following the element name is a sequence of attributes in the general form "attname=string". For data elements, some of these attributes give information about how to interpret the data stream (internal or external) into data structures. The order of the attributes is not important for the parsing operations. Attributes are separated by whitespace. As mentioned earlier, attnames that start with the characters "ni_" are reserved for expansion of this specification. In addition to the predefined attributes described below, the element header may include other attributes. These will not be interpreted by the input processor, but will be passed through to the application, in the order in which they are encountered in the element header.

XML note: XML requires that no two attributes in the same element have the same attname. NIML does not enforce this requirement, but if you wish to produce a well-formed XML document, then you need to be aware of this restriction. XML allows whitespace to occur around the "=" that separates the attname from the string. NIML does not allow this whitespace; the next character after attname must be "=", and the next character after that must be a Name character or a quote character.

Strings:
Strings that are sequences of Name characters, not necessarily starting with a letter, can be present on the right hand side (RHS) of an attribute (or in a data stream) without being enclosed in quotes. Strings with other characters must be present in a quoted form, using "double quote" or 'single quote' (apostrophe) characters. If the non-whitespace character that starts a String is " (or '), then the string is assumed to be in quoted form, and everything up to the next " (or ') character is included in the string. Whitespace characters, including newlines, are included in the String value, but the quoting characters are not. In keeping with the XML specification, the following end-of-line character sequences will be "normalized" to the Unix-standard single 0x0A byte (LF character):

Hexadecimal   Character Names   Systems
0x0D 0x0A     CR LF             Microsoft standard
0x0D          CR                Macintosh standard
N.B.: This definition of how Strings are to be formatted also applies to String input in the data stream section of a data element.
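
A minimal sketch of this end-of-line normalization, assuming the String is held as a Python bytes object (the helper name normalize_eol is hypothetical):

```python
def normalize_eol(data):
    """Map CR LF and bare CR to a single LF (0x0A)."""
    # Replace the two-byte CR LF sequence first, so that it collapses to one LF
    # instead of being turned into two LFs by the bare-CR replacement.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

print(normalize_eol(b"line one\r\nline two\rline three\n"))
```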

XML note: In XML, attributes must be in a quoted string format. Thus, if an application wishes to write an NIML file to be a well-formed XML file, it should use attributes in the form attname="string", even if the string contains no whitespace. Also, XML specifies that the RHS of most attributes should be normalized by replacing all sequences of contiguous whitespace characters by a single blank. NIML does not require this step; however, all predefined NIML attribute values contain no whitespace.

In keeping with the XML roots of this specification, the following escape sequences representing single characters will be recognized in Strings:

Escape   Translation        Note
&lt;     < (less than)      Required
&gt;     > (greater than)   Required
&quot;   " (quote)          Required
&amp;    & (ampersand)      Required
&apos;   ' (apostrophe)     Optional
Characters marked as Required can only be represented in a String by the escape sequence. Characters marked as Optional can be represented by the escape or by themselves. (Since none of these are Name characters, they can only be present in quoted Strings.)

Some example attributes:

  ni_type=5f.i.S
  ni_type='5float,int,String'
  ni_url="http://zork.bork.gork/fork/spoon/pork.ork#1024-$"
  command="cat fred &gt; 'ethel'"
All but the first have their RHS in quoted form, since these String values contain non-Name characters. In the last one, the RHS String value will be passed to the application as "cat fred > 'ethel'".
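
The attribute and String rules can be sketched as a small header scanner. This is an illustration under stated assumptions, not the reference parser: the names unescape and parse_attributes are hypothetical, and malformed input (for example, whitespace around "=") is silently skipped rather than reported.

```python
import re

ESCAPES = {"&lt;": "<", "&gt;": ">", "&quot;": '"', "&amp;": "&", "&apos;": "'"}
_ESCAPE_RE = re.compile("|".join(re.escape(e) for e in ESCAPES))

# attname=value, where the value is double-quoted, single-quoted,
# or an unquoted run of Name characters (not necessarily starting with a letter).
_ATTR_RE = re.compile(
    r"""([A-Za-z][A-Za-z0-9_.\-]*)=(?:"([^"]*)"|'([^']*)'|([A-Za-z0-9_.\-]+))"""
)

def unescape(s):
    """Translate the five recognized escapes in a single left-to-right pass."""
    return _ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], s)

def parse_attributes(header):
    """Return [(attname, value), ...] in the order encountered; duplicates are kept."""
    pairs = []
    for m in _ATTR_RE.finditer(header):
        # Exactly one of the three value groups is set; it may be an empty string.
        raw = next(g for g in m.groups()[1:] if g is not None)
        pairs.append((m.group(1), unescape(raw)))
    return pairs

header = 'ni_type=5f.i.S command="cat fred &gt; \'ethel\'"'
for attname, value in parse_attributes(header):
    print(attname, "->", value)
```

Returning an ordered list of pairs rather than a dictionary reflects two points made above: NIML does not forbid repeated attnames, and attributes are passed to the application in the order in which they are encountered.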

XML note: Other XML-defined escapes (such as "&#x3A;" for insertion of a single character specified in hexadecimal) should not be used, since they are not required to be recognized by NIML processor software.

Comma-Separated Substrings as Attribute Values:
In some specific cases, the RHS value of a predefined attribute is described as being a list of comma-separated substrings. An example of such a string (which must be quoted) is "float,int,short". This String can be broken into 3 substrings "float", "int", and "short". This construction is used to specify multiple parameters to attributes that are designed to process them (e.g., ni_dimen). However, when the attribute String value is actually passed to the application, it will not be broken into substrings.

External data streams and the ni_url attribute:
Input bytes that occur between the closing ">" of the element header and the opening "<" of the end token are called the internal data stream. It is also possible for a data element to specify that its data stream shall be read from an external source rather than from the input bytes immediately following the ">" that closes the element header. The external source is specified with the ni_url attribute, as in

 <TheKing ni_url="http://www.elvis.com/" />
This specifies that the contents of the data at the given URL be taken as the data stream for this data element. If ni_url is used, then the data element header must end with "/>", since there can be no internal data stream present after the header if there is an external data stream. An external data stream does not end at its first "</", but continues until the end of the data read from the URL.

The types of URLs that can be specified in a ni_url attribute depend on the input processor. In some processors, there may be no support for such inclusion (e.g., in a socket transmission). In the C API defined in the appendices, the following types of URLs are allowable:

Form         Meaning
http://a/b   Absolute reference, fetched by HTTP
ftp://a/b    Absolute reference, fetched by anonymous FTP
file:/a/b    Absolute reference to a local file
It is also legal to append a URL fragment specifier of the form "#p..q" at the end of the attribute value. Here, "p" and "q" indicate the first and last bytes of the fetched data to include in the data stream. p and q may be in one of these forms: It is an error if the value of p is after the value of q in the fetched data. If no fragment is given, then the data stream is taken from all the fetched data (as if the fragment were #0..$).

Use of ni_url may not be wise, especially if it involves fetching data files from another computer system. Using ni_url makes reading the NIML data file dependent on the existence of another file.

An external data stream will be processed as described in the next section. How much of it will be stored into the data structure transmitted to the application will depend on the ni_type and ni_dimen attributes.

End token:
If the data element has an internal data stream, then the end of the data stream is indicated by the bytes "</". (If binary data is being read, then "</" characters inside the specified length binary data will not indicate the end of the data stream.) NIML allows the end token to be the characters "</>" or "</elementname>", where elementname is the name of the element that is being closed.

If the internal data stream runs into the end of file or the transmission closes (e.g., a socket shuts down), this is also taken as a valid end token for any elements that have not yet "closed" (including the current data element and any group elements enclosing it). This rule makes it easy to have a final data element in a file without closing it with proper end tokens. In this way, an NIML file containing image data can conform to the informal convention that the image data is always the very last collection of bytes in the file, regardless of what header information comes before.

XML note: XML requires that elements that have content (i.e., an internal data stream) end with "</elementname>". Also, XML does not consider a document to be properly closed if the file just ends. This means that a well-formed XML version of an NIML file cannot conform to the "image data is last" convention.

Data Stream Interpretation

The following attributes determine how the data stream is interpreted by the input processor.

ni_type Attribute:
This attribute specifies the type or types of the individual data components in the data stream. The following 10 types are available:

Name      Initial   Size (bytes)
byte      b         1
short     s         2
int       i         4
float     f         4
double    d         8
complex   c         8 (2 floats)
rgb       r         3 (red grn blu)
RGBA      R         4 (r g b alpha)
String    S         arbitrary
Line      L         arbitrary

An individual type is specified by its name or by the single character of its initial (which is why "String" starts with an uppercase letter, to distinguish it from "short", and why "RGBA" is capitalized while "rgb" is not).

The ni_type attribute may specify a single type, as in the example at the very beginning of this document, or it may specify multiple types, separated by periods "." and with an optional decimal numeric count prepended:

  ni_type=float.int.int  OR  ni_type=f.i.i  OR  ni_type=f.2i  OR  ni_type=f2i
which specifies that the values to be read from the data stream come in triples: 1 float followed by 2 ints, then 1 more float, 2 more ints, etc. In this example, the data stream must come in these units of 3 numbers. The last illustration above shows that when single character abbreviations are used for type names, they do not need to be separated by periods ".".
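
To make the equivalence of these spellings concrete, here is a sketch that expands an ni_type value into a flat list of one-character type codes. The function name parse_ni_type is hypothetical. The sketch assumes that a full type name must stand alone in its period-separated (or comma-separated) part, while single-character initials may be run together.

```python
import re

TYPE_INITIALS = {"byte": "b", "short": "s", "int": "i", "float": "f", "double": "d",
                 "complex": "c", "rgb": "r", "RGBA": "R", "String": "S", "Line": "L"}
_FULL = re.compile(r"(\d*)(" + "|".join(TYPE_INITIALS) + r")")  # e.g. "5float"
_ABBR = re.compile(r"(\d*)([bsifdcrRSL])")                       # e.g. "2i"

def parse_ni_type(spec):
    """Expand an ni_type value into a list of single-character type codes."""
    codes = []
    for token in re.split(r"[.,]", spec):
        if not token:
            raise ValueError("empty type in %r" % spec)
        m = _FULL.fullmatch(token)
        if m:  # optional count followed by one full type name
            codes += [TYPE_INITIALS[m.group(2)]] * int(m.group(1) or 1)
            continue
        pos = 0
        while pos < len(token):  # run of (count, initial) pairs such as "f2i"
            m = _ABBR.match(token, pos)
            if m is None:
                raise ValueError("bad type %r in %r" % (token, spec))
            codes += [m.group(2)] * int(m.group(1) or 1)
            pos = m.end()
    return codes

for spec in ("float.int.int", "f.i.i", "f.2i", "f2i"):
    print(spec, parse_ni_type(spec))
```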

Aside: Maybe there are too many variations here. Instead of allowing "float" and "f", we should only allow the latter? That would make the NIML processor's job simpler. Since we might eventually write NIML processors in several languages, simplicity is an important goal.

If the ni_type attribute is not present, then the data stream will be interpreted as if ni_type=b.

XML note: The reason that the separator for multiple types is a period "." is that this is a legal Name character, and NIML allows the RHS of an attribute to be unquoted if it consists entirely of Name characters. However, XML requires the RHS of an attribute to be quoted. If the type definition String is quoted, you can also use commas "," as the type separator.

Line Data Values:
The Line type is a special form of String. A Line is the text between the current scanning point of the data stream and the next end-of-line; it does not include the end-of-line character. This input type is designed to make it easy for an application to read and write individual lines of text without using quotes to enclose possible whitespace. For example

  <junk ni_type=3L>
    I am the first Line
    This is Line #2
    And this is Line number 3  </>
The three strings that will be saved are "I am the first Line", "This is Line #2", and "And this is Line number 3", since whitespace at the beginning and end of a Line will be discarded. It is possible (not necessarily wise) to include Line data on a physical line with other values; an example illustrates the processing that results:
  <data ni_type=f.L ni_dimen=2>
     3.0   Hi Bob
     5.7
     This is cool
  </>
The first Line value read is the string "Hi Bob", since the blanks after "3.0" are discarded (being at the start of the Line data). The second Line value read is the string "This is cool", since the end-of-line after the "5.7" is also discarded.

ni_form Attribute (optional):
This attribute specifies the format of the data stream. The possible values are

  ni_form=text   OR   ni_form=binary   OR   ni_form=base64
The first means that the data stream is in text format, the second that it is binary, and the third that it is base64 encoded binary (which allows binary data to be encoded in a pure text format, at the cost of a 33% expansion in size). If the ni_form attribute is not present, then ni_form=text is assumed.

The binary and base64 attributes may optionally have one of the two strings ".msbfirst" or ".lsbfirst" appended, as in "ni_form=binary.msbfirst". This addition specifies the byte order of the binary data. If the byte order is not specified (here or otherwise), then the receiving program should assume that the binary data is stored in MSB first order ("network order"), as on Sun-Sparc, SGI-MIPS, PowerPC, and HP-PA CPUs (and the opposite of Intel CPUs). If the current CPU does not match the order of the data, then two byte data (i.e., short) ab will be swapped to ba before being passed to the application; four byte data (i.e., float, int) abcd will be swapped to dcba; eight byte data (i.e., double) abcdefgh will be swapped to hgfedcba.
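
The swapping rule can be sketched as follows. The helper names swap_bytes and decode_floats are hypothetical; the second function uses Python's struct module, which takes an explicit byte-order flag and so makes a manual swap unnecessary.

```python
import struct

def swap_bytes(raw, width):
    """Reverse the byte order of each width-byte value in raw."""
    if len(raw) % width != 0:
        raise ValueError("data length is not a multiple of the value width")
    return b"".join(raw[i:i + width][::-1] for i in range(0, len(raw), width))

def decode_floats(raw, ni_form="binary"):
    """Decode 4-byte floats; MSB first unless ni_form ends in '.lsbfirst'."""
    order = "<" if ni_form.endswith(".lsbfirst") else ">"
    return list(struct.unpack("%s%df" % (order, len(raw) // 4), raw))

print(swap_bytes(b"abcd", 4))
print(decode_floats(bytes.fromhex("3fc00000")))                     # MSB first
print(decode_floats(bytes.fromhex("0000c03f"), "binary.lsbfirst"))  # LSB first
```

Both decode_floats calls describe the same value, 1.5, stored in the two byte orders.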

XML note: In XML, there is no way around the fact that "</" closes an element, except by using a CDATA section. Since "]]>" ends the CDATA section, one is still left with the difficulty of including an arbitrary sequence of bytes into an XML document. (In fact, some bytes are not legal anyplace in an XML document, since only valid "characters" are allowed, and not all byte sequences are valid Unicode characters.) If one wants to write an NIML file that is also a well-formed XML document, one must avoid the use of binary data. In general, I would recommend that text encoding be used for most data, and that binary (or base64) be used only for very large data elements (e.g., images).

ni_dimen Attribute (optional, but probably needed):
This attribute specifies how many data elements are to be read from the data stream. One data element corresponds to a complete set of values as specified in the ni_type attribute. If ni_type=fii and ni_dimen=3, then the data stream should contain 3 floats and 6 ints (in order f i i f i i f i i).

If the ni_dimen attribute is not specified, it is equivalent to giving ni_dimen=1. The NIML input processor will not try to guess the number of input values from the data stream.

To read an arbitrary series of bytes from the data stream into a contiguous array, the combination of attributes needed is

  ni_type=b ni_form=binary ni_dimen=num_bytes 
where num_bytes should be replaced by the number of bytes to be read.

A useful way to think of the data specified by the ni_type and ni_dimen attributes is that the data stream defines a 2D array of values. The ni_type attribute specifies the contents of each row in this array, and the ni_dimen attribute specifies how many rows will be read. In the following example, the data element produces a data structure containing the array shown in the table:

  <data ni_type=f.i.S ni_dimen=4>
    3.72 55 "This is row 1" -0.70 444 "I'm row #2" 666.666
    -555 OK-3  0.003 777 "The last row!" </>
float     int    String
3.72      55     "This is row 1"
-0.7      444    "I'm row #2"
666.666   -555   "OK-3"
0.003     777    "The last row!"
In the C API (see appendices), the data would end up being stored in 3 arrays, one for each column of this array. The first array would be pointed to by a float *, the second by an int *, and the third by a char **. All of these would be gathered together into one NI_element struct.

N.B.: Although specifying "ni_type=3f ni_dimen=2" and "ni_type=f ni_dimen=6" mean the same thing as far as parsing the data stream goes (6 floats expected), these do not mean the same thing to the application. The first specification is for a 3x2 table of numbers, and the second is for a 1x6 table of numbers. In the C API (see appendices), the data structure returned to the application would be stored differently for these two cases. The first case would produce 3 vectors of length 2; the second case would produce 1 vector of length 6.
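
The column-wise storage described above can be sketched in Python. This is an illustration rather than the C API: read_columns is a hypothetical name, only a few numeric types and String are handled, and escape sequences and Line values are ignored.

```python
import re

# A token is a double-quoted string, a single-quoted string, or a run of non-whitespace.
_TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|(\S+)')
CONVERT = {"b": int, "s": int, "i": int, "f": float, "d": float, "S": str}

def read_columns(stream, type_codes, ni_dimen):
    """Read ni_dimen rows of text values and return one list per column."""
    tokens = [next(g for g in m.groups() if g is not None)
              for m in _TOKEN.finditer(stream)]
    width = len(type_codes)
    if len(tokens) < width * ni_dimen:
        raise ValueError("data stream is too short")
    columns = [[] for _ in type_codes]
    for row in range(ni_dimen):
        for col, code in enumerate(type_codes):
            columns[col].append(CONVERT[code](tokens[row * width + col]))
    return columns  # values beyond width * ni_dimen are discarded

six = "1 2 3 4 5 6"
print(read_columns(six, ["f", "f", "f"], 2))  # ni_type=3f ni_dimen=2: 3 vectors of length 2
print(read_columns(six, ["f"], 6))            # ni_type=f  ni_dimen=6: 1 vector of length 6
```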

Multi-Dimensional Arrays and Related Attributes (optional):
For ease in dealing with multidimensional arrays (e.g., images), it is also legal to specify the ni_dimen attribute's value as a string of more than one integer, separated by commas, as in ni_dimen="128,128,16" (i.e., the attribute value is a list of comma-separated substrings). This means that 128*128*16=262144 values (specified by ni_type) will be read from the data stream, possibly representing a 3D image or a time series of 2D images.

The following attributes can be used in conjunction with ni_dimen to specify information that lets the data be interpreted as lying on a regular grid in n-dimensions, where n is the number of values specified in the RHS of ni_dimen. Each of these attributes should have the same number of comma-separated substrings in its RHS value as ni_dimen does.

ni_delta: This should be a set of floating point numbers indicating the spacing between the locations of data values in the grid.

ni_origin: This should be a set of floating point numbers indicating the origin of the locations of data values in the grid.

ni_units: This should be a set of string values that specify the units used in ni_delta and ni_origin. These strings are also not interpreted by the processor in any way, but are simply passed through to the application.

ni_axes: This should be a set of string values that specify the direction/orientation of the coordinate axes. These strings are not interpreted by the processor in any way, but are simply passed through to the application.

Example of a header for an element to hold the data for a 4D image (say from an FMRI experiment):

  <fourD ni_type=short
         ni_dimen="64,64,16,80"
         ni_delta="3.75,3.75,5.0,2.5"
         ni_origin="-120.0,-120.0,-10.0,0.0"
         ni_axes="R-L,A-P,I-S,time"
         ni_units="mm,mm,mm,s">
This would correspond to an experiment with 64x64 images, 16 slices per volume, and 80 volumes gathered in time (5242880 values). The voxel dimensions are 3.75 mm in plane with a slice thickness of 5.0 mm, and TR is 2.5 seconds. The first data axis is Right-to-Left, the second is Anterior-to-Posterior, the third is Inferior-to-Superior, and the fourth is time. The (i,j,k,p) voxel in this 4D array is located at the (i+64*j+4096*k+65536*p)th short in the data stream, and is located at coordinates (x,y,z,t) = (-120+3.75*i, -120+3.75*j, -10+5*k, 2.5*p), for i=0..63, j=0..63, k=0..15, p=0..79.
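
The index and coordinate arithmetic for this example can be written out as a short sketch. The function names flat_index and grid_coordinates are hypothetical; the attribute strings are the ones from the fourD header above.

```python
def flat_index(index, dims):
    """Offset of a grid point in the data stream; the first axis varies fastest."""
    offset, stride = 0, 1
    for i, n in zip(index, dims):
        if not 0 <= i < n:
            raise IndexError("index %d is outside 0..%d" % (i, n - 1))
        offset += i * stride
        stride *= n
    return offset

def grid_coordinates(index, origin, delta):
    """Coordinate of a grid point along each axis: origin + delta * index."""
    return tuple(o + d * i for i, o, d in zip(index, origin, delta))

# Comma-separated attribute values are split into per-axis numbers.
dims = [int(s) for s in "64,64,16,80".split(",")]
delta = [float(s) for s in "3.75,3.75,5.0,2.5".split(",")]
origin = [float(s) for s in "-120.0,-120.0,-10.0,0.0".split(",")]

voxel = (1, 2, 3, 4)  # (i, j, k, p)
print(flat_index(voxel, dims))                # 1 + 64*2 + 4096*3 + 65536*4
print(grid_coordinates(voxel, origin, delta))
```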

Example of a header for an element to hold a single time series of 128 points, with sampling interval of 1.5 seconds:

 <oneD ni_type=float ni_dimen=128 ni_delta=1.5 ni_units=s> 

If ni_dimen is not used, then ni_delta, ni_origin, ni_units, and ni_axes are not broken down by the NIML processor. These attributes, if present, will still be passed to the application as strings.

Other Attributes (optional):
Other attributes may be included in the element header. All attributes will be processed and passed back to the application (as strings) in the order in which they are encountered.

Data stream:
The data stream starts at the next byte after the ">" that closes the element header, unless a "/" character immediately precedes the ">", as in "/>". In that case, there is no data stream present in the input, and this ">" is the end of the data element encoding.

Text data:
If the data stream is in text form, then the data values are read from the stream as follows:

Type      C format string
byte      %u (cast to unsigned char)
short     %d (cast to signed short)
int       %d
float     %f
double    %lf
complex   %f%f (real part, imaginary part)
rgb       %u%u%u (each cast to unsigned char)
RGBA      %u%u%u%u (each cast to unsigned char)
String    non-whitespace sequence (%s), or "quoted string"
Line      data up to the next end-of-line
Data values must be separated by at least one whitespace character. If a String contains whitespace, the String must be present in the text data stream in a quoted form.

Recall that Line data is defined as the text from the current scanning point up to the next end-of-line, with leading and trailing whitespace eliminated. If an entirely blank line occurs in the input, then the corresponding Line string would be empty (have zero length). For example:

  <linestuff ni_type=L ni_dimen=3>
     Line 1

     Line 3
  </>
The second line here is the empty string.

Binary or base64 data:
If the data stream is in binary or base64 format (as specified by ni_form), then the data values are read from the stream byte-by-byte (after base64 decoding, if needed), with each value taking the number of bytes specified earlier. String and Line data values are not allowed in these forms. This restriction is made so that the number of bytes in the data stream can be computed from the ni_type and ni_dimen attributes (e.g., ni_type=f.i.s and ni_dimen=3 would require a binary data stream to contain exactly (4+4+2)*3=30 bytes, and a base64 data stream to contain 30 bytes after the base64 characters are decoded).
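
The byte count in the example above can be computed directly from the type codes and dimensions. The following is a sketch; binary_stream_size is a hypothetical name, and the base64 figure assumes standard padded base64 with no line breaks.

```python
import base64
import math

# Sizes in bytes of the fixed-size types, keyed by initial.
TYPE_SIZES = {"b": 1, "s": 2, "i": 4, "f": 4, "d": 8, "c": 8, "r": 3, "R": 4}

def binary_stream_size(type_codes, dims):
    """Number of bytes a binary data stream must contain."""
    for code in type_codes:
        if code in ("S", "L"):
            raise ValueError("String and Line values are not allowed in binary or base64 form")
    bytes_per_row = sum(TYPE_SIZES[code] for code in type_codes)
    return bytes_per_row * math.prod(dims)

n = binary_stream_size(["f", "i", "s"], [3])  # ni_type=f.i.s ni_dimen=3
encoded = base64.b64encode(bytes(n))          # stand-in payload of n zero bytes
print(n, len(encoded))
```

For the 30-byte example, the base64 form occupies 40 characters, which is the one-third expansion mentioned in the description of ni_form.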

An internal data stream ends with the bytes "</"; an external data stream ends with the end of the URL that was fetched. If the data stream is internal, the data element transmission ends with the next following ">", which allows the closing sequence to be either "</>" or "</elementname>". After the proper ni_dimen number of data values have been read, any data bytes before the closing "</" will be discarded.


Defining Data Element Subtypes

If you just want to transmit/store 3 floats, say, the above format seems excessively complicated. Therefore, a syntax is available to let you declare subtypes of the generic data element that can be used more easily.

XML note: The idea that a ni_typedef element can influence the interpretation of future elements does not violate the XML specification (which is solely concerned with "processors"), but it does not fall within the XML specification either. XML uses the "<!ELEMENT ...>" and "<!ATTLIST ...>" constructs to constrain how elements and their attributes may be formed. Alternatively, an XML Schema can be used to provide control over the form/structure of XML data (http://www.w3.org/TR/xmlschema-1/). The XML-only methods are clumsy and don't suit the NIML needs well; the XML Schema method can specify what is allowed in great detail, but is quite complex and seems like too much to support for the purposes of the neuroimaging community.

An empty element (i.e., its header ends with "/>") with name "ni_typedef" is used to define a subtype. With a ni_typedef element, you specify the ni_type attribute and possibly the ni_dimen attribute that will be used when a subtype element is found. An example specifying both:

 <ni_typedef ni_name=fv3 ni_type=f ni_dimen=3/> 
This defines the new element type fv3 to contain exactly 3 floats in its data stream. An example of such an element:
 <fv3>2.71828 3.1416 666.0</> 
Note that it would still be legal to add the ni_form= attribute to the header of the fv3 element. You can't specify ni_form in the ni_typedef element; that is, you can't force a subtype to be encoded in a particular format.

If the ni_dimen attribute is missing from the subtype definition, then it can be supplied when the subtype is used; for example:

  <ni_typedef ni_name=xyzlist ni_type=3f/>
  <xyzlist ni_dimen=4>1 2 3 4 5 6 7 8 9 10 11 12</>
This subtype is intended to encode a list of 3-tuples of floats; the example produces a 3x4 table of floats. (Recall that if ni_dimen is not supplied, then ni_dimen=1 is assumed.)

Predefined Subtypes:
The following predefined subtypes can be used:

  <ni_typedef ni_name=ni_f1 ni_type=float/> (1 float)
  <ni_typedef ni_name=ni_f2 ni_type=2f   /> (2 floats)
  <ni_typedef ni_name=ni_f3 ni_type=3f   /> (3 floats)
  <ni_typedef ni_name=ni_f4 ni_type=4f   /> (4 floats)

  <ni_typedef ni_name=ni_i1 ni_type=int/>   (1 int)
  <ni_typedef ni_name=ni_i2 ni_type=2i />   (2 ints)
  <ni_typedef ni_name=ni_i3 ni_type=3i />   (3 ints)
  <ni_typedef ni_name=ni_i4 ni_type=4i />   (4 ints)

  <ni_typedef ni_name=ni_irgb  ni_type=i.r/> (int+color)
  <ni_typedef ni_name=ni_irgba ni_type=i.R/> (int+color)

  <ni_typedef ni_name=ni_S ni_type=S/>      (string)
  <ni_typedef ni_name=ni_L ni_type=L/>      (line string)
It is an error to redefine one of these subtypes, to define a new subtype that starts with "ni_", or to redefine a subtype that was previously defined through an explicit ni_typedef element. A user-defined subtype cannot be used in an element until it has been defined previously in the data transmission.
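The rules above for user-defined subtypes could be enforced with a small lookup table. What follows is only a minimal sketch under stated assumptions, not the behavior of the actual implementation: the names subtype_def, subtype_table, subtype_find, and subtype_define, and the fixed table size, are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SUBTYPES 64   /* arbitrary limit chosen for this sketch */

typedef struct {
  const char *name ;    /* subtype name, e.g. "fv3"            */
  const char *type ;    /* ni_type string, e.g. "f" or "3f"    */
  const char *dimen ;   /* ni_dimen string, or NULL if omitted */
} subtype_def ;

typedef struct {
  subtype_def def[MAX_SUBTYPES] ;
  int         num ;     /* number of entries in use */
} subtype_table ;

/* Find a previously defined subtype; returns NULL if it is not defined. */
const subtype_def * subtype_find( const subtype_table *tab , const char *name )
{
  int i ;
  for( i=0 ; i < tab->num ; i++ )
    if( strcmp(tab->def[i].name,name) == 0 ) return &tab->def[i] ;
  return NULL ;
}

/* Record a user-supplied ni_typedef.  Returns 0 on success, -1 on error. */
int subtype_define( subtype_table *tab , const char *name ,
                    const char *type , const char *dimen )
{
  if( name == NULL || type == NULL || name[0] == '\0' ) return -1 ;
  if( strncmp(name,"ni_",3) == 0 )     return -1 ;  /* reserved prefix  */
  if( subtype_find(tab,name) != NULL ) return -1 ;  /* redefinition     */
  if( tab->num >= MAX_SUBTYPES )       return -1 ;  /* table is full    */
  tab->def[tab->num].name  = name ;    /* strings are stored, not copied */
  tab->def[tab->num].type  = type ;
  tab->def[tab->num].dimen = dimen ;
  tab->num++ ;
  return 0 ;
}
```

A processor built this way would call subtype_find() whenever it meets an unknown element name, and treat a NULL result as "subtype used before it was defined".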


Including External Files to Define Elements

The ni_include data element can be used to specify that a given file should be included; for example:

 <ni_include ni_url="file:/home/elvis/defs.ni"/> 
which says to read the given file into the data transmission at this point. Since this is a data element (with no data stream), it cannot appear inside another data element. If desired (why?), the #p..q fragment specification can be appended to the end of the URL.

One use for the ni_include element would be to read in a set of ni_typedefs at the start of a file that used them heavily.


Contents of an Entire Data Collection

A data file or transmission stream will often contain more than one data element that must be kept together to make a coherent whole. Data elements can be grouped together using the construction

  <ni_group>
    ...elements...
  </ni_group>
where "...elements..." is replaced by one or more data elements, formatted as described earlier. The whitespace between elements will be ignored. Groups may be nested. Attributes may be included in the "<ni_group ...>" header, as with data elements.


Appendix A: Processor and Application Interaction

Most of this specification is concerned with how arbitrary data will be encoded in a (supposedly) self-describing format. However, these Appendices deal with one model of how the input and output processors can interact with the application.

The model presented herein is batch-oriented, in that an entire unit of information is processed at once. For an input processor, a free-standing (not in a group element) data element is turned into a data structure which is fully populated and then returned to the application; a group element is turned into a tree of data structures which are fully populated and the tree is returned to the application. For an output processor, the application must fully fill up a data structure, then call the output processor library to generate the resulting data/group elements.

An alternative model would be stream-oriented processing. For input processing, the application would register functions ("callbacks") to be called when certain structures (e.g., attributes, individual data values) in the input data were encountered. For example, the beginning of a data element would trigger one callback, and the decoding of each input value from the element's data stream would trigger another callback. This would allow the application to get a finer level of control over the handling of the input, without having to have it all decoded and stored before getting access to the decoded values. This specification does not address the development of a stream-oriented API for NIML data.

XML note:
"Batch-oriented" corresponds to "DOM" in XML (http://www.w3.org/TR/DOM-Level-3-Core/).
"Stream-oriented" corresponds to "SAX" in XML (http://www.megginson.com/SAX/index.html).

Nota Bene: The data structures and routines specified in the following appendices have not yet been fully implemented. Thus, they are especially subject to change as experience accumulates. See Appendix F for information on the current status of an implementation of this API.


Appendix B: Internal Representation of a Data Element in C

The information specified by a data element will be read into a C struct of type NI_element which has the following fields:

Field Name and Type        Meaning
int    type ;              First field is always NI_ELEMENT_TYPE
char  *name ;              Element name
int    attr_num ;          Number of attributes
char **attr_lhs ;          attr_lhs[i] points to the ith attribute name
char **attr_rhs ;          attr_rhs[i] points to the ith attribute String
int    vec_num ;           Number of vectors (from ni_type)
int    vec_len ;           Length of vectors (from ni_dimen)
int    vec_filled ;        How many vector rows were filled on input (<=vec_len)
int   *vec_typ ;           vec_typ[i] is the type of the ith vector
void **vec ;               vec[i] points to the start of the ith vector
int    vec_rank ;          Number of dimensions specified in ni_dimen
int   *vec_axis_len ;      vec_axis_len[i] is the ith dimension count (from ni_dimen)
float *vec_axis_delta ;    vec_axis_delta[i] is the ith dimension grid spacing (from ni_delta)
float *vec_axis_origin ;   vec_axis_origin[i] is the ith dimension grid offset (from ni_origin)
char **vec_axis_unit ;     vec_axis_unit[i] is the ith dimension grid unit string (from ni_units)
char **vec_axis_label ;    vec_axis_label[i] is the ith dimension axis label (from ni_axes)
Further details on these fields are given below.
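Written out as a C declaration, the field list above might look like the following. This is a sketch only: the numeric values given to NI_ELEMENT_TYPE and NI_GROUP_TYPE are hypothetical (this document names the constants but not their values), and the helper init_empty_element() is an invented name that simply sets the fields the way the "Empty elements" remarks below describe. Setting vec_len and vec_filled to zero for an empty element is an assumption.

```c
#include <stddef.h>

/* Hypothetical values: the specification names these constants
   but does not say what numbers they stand for.               */
#define NI_ELEMENT_TYPE 17
#define NI_GROUP_TYPE   18

typedef struct {
  int     type ;             /* always NI_ELEMENT_TYPE               */
  char   *name ;             /* element name                         */
  int     attr_num ;         /* number of attributes                 */
  char  **attr_lhs ;         /* attribute names                      */
  char  **attr_rhs ;         /* attribute strings                    */
  int     vec_num ;          /* number of vectors (from ni_type)     */
  int     vec_len ;          /* length of vectors (from ni_dimen)    */
  int     vec_filled ;       /* rows actually filled on input        */
  int    *vec_typ ;          /* type code of each vector             */
  void  **vec ;              /* start of each vector                 */
  int     vec_rank ;         /* number of dimensions in ni_dimen     */
  int    *vec_axis_len ;     /* dimension counts (from ni_dimen)     */
  float  *vec_axis_delta ;   /* grid spacings (from ni_delta)        */
  float  *vec_axis_origin ;  /* grid offsets (from ni_origin)        */
  char  **vec_axis_unit ;    /* unit strings (from ni_units)         */
  char  **vec_axis_label ;   /* axis labels (from ni_axes)           */
} NI_element ;

/* Put nel into the state described for an element with no data stream. */
void init_empty_element( NI_element *nel , char *name )
{
  nel->type            = NI_ELEMENT_TYPE ;
  nel->name            = name ;   /* pointer is stored, not copied */
  nel->attr_num        = 0 ;
  nel->attr_lhs        = NULL ;
  nel->attr_rhs        = NULL ;
  nel->vec_num         = 0 ;
  nel->vec_len         = 0 ;      /* assumption, not stated in the text */
  nel->vec_filled      = 0 ;      /* assumption, not stated in the text */
  nel->vec_typ         = NULL ;
  nel->vec             = NULL ;
  nel->vec_rank        = 0 ;
  nel->vec_axis_len    = NULL ;
  nel->vec_axis_delta  = NULL ;
  nel->vec_axis_origin = NULL ;
  nel->vec_axis_unit   = NULL ;
  nel->vec_axis_label  = NULL ;
}
```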

type:
The first field is an int which can be used to distinguish the type of this element structure; the value NI_ELEMENT_TYPE here indicates that this is a data element. (For group elements, the corresponding value would be NI_GROUP_TYPE.)

name:
This is a standard NUL-terminated C string. Since an element name must contain at least one character, this will not have zero length.

attr_num:
This is the number of attributes read, including all the ni_* attributes. This may be zero (e.g., for the elements "<ni_f1>3.2</>" and "<quit/>", there are no attributes).

attr_lhs and attr_rhs:
If attr_num is zero, then these pointers will be set to the NULL pointer. Otherwise, attr_lhs[i] will be a pointer to a standard NUL-terminated C string that is the LHS of the ith "attname=string" attribute, and attr_rhs[i] will be a pointer to the ith RHS string, for i from 0 to attr_num-1. Attributes will be stored in the order encountered in the data element header, including the attributes that start with "ni_".

vec_num:
This is the number of types declared in the ni_type attribute; for example, "ni_type=f.2i" would give vec_num=3.
Empty elements: vec_num is zero if there is no data stream.
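As an illustration of how vec_num might be computed, the sketch below counts the vectors implied by an ni_type string. It assumes only the syntax visible in the examples of this document ("f.2i", "3f", "i.r", "float"): type names separated by "." with an optional decimal repeat count in front of each. The function name count_ni_type_vectors is hypothetical, and the function does not check that each type name is a legal one.

```c
#include <ctype.h>
#include <stdlib.h>

/* Count the vectors implied by an ni_type string such as "f.2i".
   Returns 0 for a NULL or empty string (no data stream) and
   -1 for a string that does not fit the assumed syntax.          */
int count_ni_type_vectors( const char *ni_type )
{
  const char *p = ni_type ;
  int total = 0 ;

  if( p == NULL || *p == '\0' ) return 0 ;

  while( *p != '\0' ){
    int count = 1 ;
    if( isdigit((unsigned char)*p) ){            /* optional repeat count */
      char *end ;
      count = (int) strtol( p , &end , 10 ) ;
      p = end ;
    }
    if( count < 1 || !isalpha((unsigned char)*p) ) return -1 ;
    while( isalpha((unsigned char)*p) ) p++ ;    /* skip the type name    */
    total += count ;
    if( *p == '.' ){
      p++ ;
      if( *p == '\0' ) return -1 ;               /* trailing separator    */
    } else if( *p != '\0' ){
      return -1 ;                                /* unexpected character  */
    }
  }
  return total ;
}
```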

vec_len:
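This is the total number of entries from ni_dimen, i.e., the product of the dimension counts it lists; for example, ni_dimen="64,64" would give vec_len=4096.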

vec_typ:
This array specifies the types of each vector of data read from the data stream, as specified from the ni_type attribute. If vec_num=0, then vec_typ will be the NULL pointer. Otherwise, vec_typ[i] is a code indicating the data type, for i from 0 to vec_num-1:

Name      Code   Macro        vec[i]
byte       0     NI_BYTE      byte *
short      1     NI_SHORT     short *
int        2     NI_INT       int *
float      3     NI_FLOAT     float *
double     4     NI_DOUBLE    double *
complex    5     NI_COMPLEX   complex *
rgb        6     NI_RGB       rgb *
RGBA       7     NI_RGBA      rgba *
String     8     NI_STRING    char **
Line       9     NI_LINE      char **

The byte, rgb, rgba, and complex types are defined by

  typedef unsigned char             byte ;
  typedef struct { byte r,g,b ; }   rgb ;
  typedef struct { byte r,g,b,a ; } rgba ;
  typedef struct { float r,i ; }    complex ;
Empty elements: vec_typ is NULL.

vec:
This array of arrays actually contains the data interpreted from the data stream, if vec_num is greater than zero. vec[i] is a pointer to an array of the type encoded by vec_typ[i] and of length vec_len, for i from 0 to vec_num-1. For example, if vec_typ[2]==NI_FLOAT, then the proper use of the pointer vec[2] is something like

  int j ;
  float *fv = (float *) vec[2] ;
  for( j=0 ; j < vec_len ; j++ ) do_something( fv[j] ) ;
If vec_typ[4]==NI_STRING, then printing out the jth string would be done like so:
  char **sv = (char **) vec[4] ;
  printf("%s\n",sv[j]) ;
Empty elements: vec is NULL.
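When the type of a vector is not known in advance, the application has to switch on vec_typ[i] before casting vec[i]. The following sketch shows one way to do that for the plain numeric types; the function name vec_value_as_double is hypothetical, the type codes are those of the table above, and the index j is not checked against vec_len (that is left to the caller).

```c
#include <stddef.h>

typedef unsigned char byte ;

/* Type codes as listed in the vec_typ table */
enum { NI_BYTE=0 , NI_SHORT=1 , NI_INT=2    , NI_FLOAT=3  , NI_DOUBLE=4 ,
       NI_COMPLEX=5 , NI_RGB=6 , NI_RGBA=7  , NI_STRING=8 , NI_LINE=9   } ;

/* Fetch entry j of one vector as a double.
     typ   = vec_typ[i] for the vector
     vecp  = vec[i]     for the vector
   Returns 0 on success, or -1 if the vector is missing or its
   type is not one of the scalar numeric types handled here.    */
int vec_value_as_double( int typ , const void *vecp , int j , double *out )
{
  if( vecp == NULL || out == NULL || j < 0 ) return -1 ;
  switch( typ ){
    case NI_BYTE:   *out = ((const byte   *)vecp)[j] ; return 0 ;
    case NI_SHORT:  *out = ((const short  *)vecp)[j] ; return 0 ;
    case NI_INT:    *out = ((const int    *)vecp)[j] ; return 0 ;
    case NI_FLOAT:  *out = ((const float  *)vecp)[j] ; return 0 ;
    case NI_DOUBLE: *out = ((const double *)vecp)[j] ; return 0 ;
    default:        return -1 ;  /* complex, colors, and strings are skipped */
  }
}
```

With a data element nel in hand, a call would look like vec_value_as_double( nel->vec_typ[i] , nel->vec[i] , j , &val ).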

vec_rank:
This value is the number of dimensions specified in ni_dimen; some examples:

  ni_dimen=7              implies   vec_rank=1
  ni_dimen="64,64"        implies   vec_rank=2
  ni_dimen="64,64,16,80"  implies   vec_rank=4
Empty elements: vec_rank is set to 0.

vec_axis_len:
This array holds the substring values decoded from ni_dimen. Continuing the examples above:

  vec_axis_len[0] = 7
  vec_axis_len[0] = 64; vec_axis_len[1] = 64;
  vec_axis_len[0] = 64; vec_axis_len[1] = 64; vec_axis_len[2] = 16; vec_axis_len[3] = 80;
Empty elements: vec_axis_len is NULL.
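The relationship between ni_dimen, vec_rank, vec_axis_len, and vec_len can be sketched in code as below. This is an illustration rather than the library's parser: the name parse_ni_dimen and the MAX_RANK limit are hypothetical, the surrounding quotes of the attribute are assumed to have been stripped already, vec_len is assumed to be the product of the axis lengths, and overflow of that product is not checked.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_RANK 8   /* arbitrary limit chosen for this sketch */

/* Decode an ni_dimen string such as "64,64,16,80".
   On success, stores the axis lengths in axis_len[0..rank-1],
   stores their product in *vec_len, and returns the rank.
   Returns -1 if the string is malformed.                       */
int parse_ni_dimen( const char *ni_dimen , int axis_len[MAX_RANK] , int *vec_len )
{
  const char *p = ni_dimen ;
  int  rank = 0 ;
  long prod = 1 ;

  if( p == NULL || axis_len == NULL || vec_len == NULL ) return -1 ;

  for(;;){
    char *end ;
    long val = strtol( p , &end , 10 ) ;
    if( end == p || val < 1 || rank >= MAX_RANK ) return -1 ;
    axis_len[rank++] = (int) val ;
    prod *= val ;
    p = end ;
    if( *p == '\0' ) break ;       /* end of the attribute string */
    if( *p != ',' ) return -1 ;    /* only commas may separate    */
    p++ ;
  }
  *vec_len = (int) prod ;
  return rank ;
}
```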

vec_axis_delta:
This array holds the values decoded from the ni_delta attribute, if it was present.
Empty elements and elements without ni_delta: vec_axis_delta is NULL.

vec_axis_origin:
This array holds the values decoded from the ni_origin attribute, if it was present.
Empty elements and elements without ni_origin: vec_axis_origin is NULL.

vec_axis_unit:
This array of pointers to C strings holds the values decoded from ni_units (i.e., the substrings that were separated by commas).
Empty elements and elements without ni_units: vec_axis_unit is NULL.

vec_axis_label:
This array of pointers to C strings holds the values decoded from ni_axes (i.e., the substrings that were separated by commas).
Empty elements and elements without ni_axes: vec_axis_label is NULL.


Appendix C: Internal Representation of a Data Group in C

The information specified by a ni_group will be read into a C struct of type NI_group which has the following fields:

Field Name and Type   Meaning
int    type ;         First field is always NI_GROUP_TYPE
int    attr_num ;     Number of attributes
char **attr_lhs ;     attr_lhs[i] points to the ith attribute name
char **attr_rhs ;     attr_rhs[i] points to the ith attribute String
int    part_num ;     Number of parts (elements or sub-groups)
int   *part_typ ;     part_typ[i] is the type of the ith part
void **part ;         part[i] points to the data describing the ith part

part_num:
This is the number of elements or sub-groups encountered between the opening "<ni_group>" and the closing "</ni_group>".

part_typ:
part_typ[i] specifies whether the ith part is a data element (constant NI_ELEMENT_TYPE) or a group itself (constant NI_GROUP_TYPE), for i=0..part_num-1.

part:
If part_typ[i]==NI_ELEMENT_TYPE, then (NI_element *)part[i] is a pointer to a NI_element struct, defined above. If part_typ[i]==NI_GROUP_TYPE, then (NI_group *)part[i] is a pointer to a NI_group struct.
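Because groups may be nested, code that walks a NI_group is naturally recursive. The sketch below counts the data elements anywhere inside a group tree. The struct follows the field list above; the numeric values of NI_ELEMENT_TYPE and NI_GROUP_TYPE and the function name count_data_elements are hypothetical.

```c
#include <stddef.h>

/* Hypothetical values: the specification does not fix them. */
#define NI_ELEMENT_TYPE 17
#define NI_GROUP_TYPE   18

typedef struct {
  int     type ;       /* always NI_GROUP_TYPE                  */
  int     attr_num ;   /* number of attributes                  */
  char  **attr_lhs ;   /* attribute names                       */
  char  **attr_rhs ;   /* attribute strings                     */
  int     part_num ;   /* number of parts                       */
  int    *part_typ ;   /* type of each part                     */
  void  **part ;       /* pointer to each part                  */
} NI_group ;

/* Count the data elements contained in a group, including those
   inside nested sub-groups.  Data elements are never dereferenced,
   only counted.                                                    */
int count_data_elements( const NI_group *ngr )
{
  int i , count = 0 ;
  if( ngr == NULL ) return 0 ;
  for( i=0 ; i < ngr->part_num ; i++ ){
    if( ngr->part_typ[i] == NI_ELEMENT_TYPE )
      count++ ;
    else if( ngr->part_typ[i] == NI_GROUP_TYPE )
      count += count_data_elements( (const NI_group *) ngr->part[i] ) ;
  }
  return count ;
}
```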


Appendix D: The C API for Input from NIML

Input to the NIML Functions: NI_stream:
Data is provided to the NIML processor through an opaque handle of type NI_stream. ("Opaque" means that the internal components of this type are not visible to the application). A NI_stream for input is a source of bytes that will be scanned to construct data and/or group elements.

Opening a NI_stream for Input:
An application opens an input stream with a function call like so:

  NI_stream ns ;
  ns = NI_stream_open( sname , "r" ) ;
Here, sname is a C string (NUL-terminated) that specifies whence the stream is to derive its data. The following formats for sname are supported: If an error occurs when opening the stream (e.g., filename can't be opened, hostname cannot be found, port number illegal), NI_stream_open() returns (NI_stream)NULL.

Closing a NI_stream:
An application closes an input (or output) stream with a call like

 NI_stream_close( ns ) ; 
where ns is a valid NI_stream value that was previously returned from NI_stream_open(). NI_stream_close() has no return value. After this function has been called, the memory associated with ns has been deallocated, and it is illegal for the application to refer to ns again, unless it is reassigned by another call to NI_stream_open().

Reading Data from an NIML Input Stream:
The next block of data can be read from an opened stream using a call like so:

  void *nini ;
  int msec = 1 ;
  nini = NI_read_element( ns , msec ) ;
where ns is a valid NI_stream value that was previously returned from NI_stream_open(), and msec is the number of milliseconds the process should wait for more data to appear in the input stream. Use msec=0 for an immediate return if no data is available.

NULL is returned if a complete element could not be extracted from the input stream. To check if this has failed because the connection was closed, use

  int nn = NI_stream_readcheck(ns,0) ;
  switch( nn ){
     case -1:  /* stream has gone bad */
       break ;
     case  0:  /* stream is OK, just waiting for data (sockets only) */
       break ;
     case  1:  /* stream has data waiting to be read */
       break ;
  }
If NI_read_element() returns NULL and NI_stream_readcheck() then returns -1, then the stream will deliver no more data.

If not NULL, the value returned by NI_read_element() points to a NI_element or to a NI_group data structure. The program can determine which by

  int tt = NI_element_type( nini ) ;
  if( tt == NI_ELEMENT_TYPE ){  /* data element */
    NI_element *nel = (NI_element *) nini ;
    /* do something here */
  } else if( tt == NI_GROUP_TYPE ){  /* group element */
    NI_group *ngr = (NI_group *) nini ;
    /* do something else here, I suppose */
  } else {
    /* this should never occur, unless nini==NULL (tt==-1) */
  }
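The reason a single void * can be classified this way is that both structs begin with an int type field. A function like NI_element_type() could therefore be written roughly as follows. This is a sketch of the idea, not the library's code: the name element_type_of, the numeric constant values, and the two cut-down header structs are hypothetical, and returning -1 for an unrecognized code (as well as for NULL) is an assumption.

```c
#include <stddef.h>

/* Hypothetical values: the specification does not fix them. */
#define NI_ELEMENT_TYPE 17
#define NI_GROUP_TYPE   18

/* Cut-down stand-ins for NI_element and NI_group: only the
   leading fields matter for this illustration.             */
typedef struct { int type ; char *name ;   } elem_header ;
typedef struct { int type ; int attr_num ; } group_header ;

/* Report which kind of structure nini points to, or -1 if it is
   NULL or does not start with one of the two known type codes.  */
int element_type_of( const void *nini )
{
  int tt ;
  if( nini == NULL ) return -1 ;
  tt = *(const int *) nini ;    /* first field is always the type code */
  if( tt == NI_ELEMENT_TYPE || tt == NI_GROUP_TYPE ) return tt ;
  return -1 ;
}
```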

Checking for Available Input:
It can be useful to check if data is available to be read, in order to avoid calling NI_read_element() and waiting for input when there is no input. This function call does just that:

  int cod ;
  cod = NI_stream_readcheck( ns , 1 ) ;
The return value is positive if data can be read from the NI_stream, zero if no data can be read (but the stream is still good), and negative if the stream has failed in some way (e.g., the socket was closed at the other end). This function only checks if at least 1 data byte can be read from the I/O stream represented by ns; it does not check if valid NIML data is present. For socket streams, the second argument is the number of milliseconds the function should wait to check if data is present. For the other stream types, the function will return immediately, since there is no need to synchronize with another process.

Freeing Data from an NI_group or NI_element:
Function call NI_free_element( nini ) can be used to free all the data from a NI_element or NI_group constructed by the NIML functions.

Some application software may wish to move some or all of the data out of an NI_element prior to freeing the NI_element data structure itself. Instead of having to copy arrays, the application can simply copy any pointer from the NI_element to its own storage, and then set the pointer in the NI_element to NULL. For example:

  NI_element *nel ;
  float *fp ;
  int   nfp ;
  nel = NI_read_element(ns,1) ;    /* read a data element         */
  nfp = nel->vec_len ;             /* save length of data array   */
  fp  = (float *) nel->vec[0] ;    /* copy data array pointer     */
  nel->vec[0] = NULL ;             /* clear pointer in nel        */
  NI_free_element( nel ) ;         /* free everything else in nel */
This example skips all checking (e.g., if nel==NULL), and assumes that the data structure returned is a data element that contains a float vector as its first entry. In a real application, many more cases would need to be allowed for.
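A version of the pointer-stealing step with some of the missing checks filled in might look like the sketch below. It operates on a cut-down stand-in struct (mini_element, holding only the fields used here) rather than the full NI_element, and the function name take_float_column and the value of NI_ELEMENT_TYPE are hypothetical. The value NI_FLOAT=3 is taken from the vec_typ table in Appendix B.

```c
#include <stddef.h>

#define NI_ELEMENT_TYPE 17   /* hypothetical value                 */
#define NI_FLOAT         3   /* type code from the vec_typ table   */

/* Cut-down stand-in for NI_element: only the fields used below. */
typedef struct {
  int     type ;
  int     vec_num ;
  int     vec_len ;
  int    *vec_typ ;
  void  **vec ;
} mini_element ;

/* Detach float vector number col from nel and hand it to the caller.
   On success, returns the array, stores its length in *len, and sets
   nel->vec[col] to NULL so that freeing nel later leaves it alone.
   Returns NULL (and leaves nel untouched) if any check fails.        */
float * take_float_column( mini_element *nel , int col , int *len )
{
  float *fp ;
  if( nel == NULL || len == NULL )               return NULL ;
  if( nel->type != NI_ELEMENT_TYPE )             return NULL ; /* a group?   */
  if( col < 0 || col >= nel->vec_num )           return NULL ; /* no column  */
  if( nel->vec == NULL || nel->vec_typ == NULL ) return NULL ; /* empty      */
  if( nel->vec_typ[col] != NI_FLOAT )            return NULL ; /* wrong type */
  fp = (float *) nel->vec[col] ;
  nel->vec[col] = NULL ;     /* ownership passes to the caller */
  *len = nel->vec_len ;
  return fp ;
}
```

After a successful call, the application is responsible for releasing the returned array when it is no longer needed.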

Attributes within an Element:
The application can certainly search for a given attribute name in an element returned by NI_read_element(); however, there is a utility function to do this. For example:

  char *rhs = NI_get_attribute( nel , "idcode" ) ;
will return NULL if there is no attribute with left hand side "idcode"; otherwise it returns a pointer to the right hand side value of the attribute. This pointer points into the element's data structure, so it should not be modified or free()-ed. If you need to make a copy of it, use the strdup() library function. NI_get_attribute() will work with both data and group elements.
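For comparison, the search that an application would otherwise write by hand is a simple loop over attr_lhs. The sketch below shows that loop as a free-standing function taking the three attribute fields directly; the name find_attribute is hypothetical, and this is not claimed to be how NI_get_attribute() is implemented. Exact, case-sensitive matching of attribute names is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Search the attribute arrays of a data or group element for the
   attribute whose left hand side equals name.  Returns a pointer
   to the matching right hand side string (not a copy), or NULL
   if there is no such attribute.                                 */
char * find_attribute( int attr_num , char **attr_lhs , char **attr_rhs ,
                       const char *name )
{
  int i ;
  if( attr_num <= 0 || attr_lhs == NULL || attr_rhs == NULL || name == NULL )
    return NULL ;
  for( i=0 ; i < attr_num ; i++ )
    if( attr_lhs[i] != NULL && strcmp(attr_lhs[i],name) == 0 )
      return attr_rhs[i] ;
  return NULL ;
}
```

For a data element nel, the call would be find_attribute( nel->attr_num , nel->attr_lhs , nel->attr_rhs , "idcode" ).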

Error Conditions:
The following is a discussion of how an implementation of the C API should handle various error conditions.


Appendix E: The C API for Output to NIML

Opening a NI_stream for Output:
The string "w" is supplied as the second argument to NI_stream_open() when a program wants to write to the stream.

Writing Elements to an NIML Output Stream:
The application must first assemble a data element, or a group element containing one or more data elements. Then the element is written to the output stream with function NI_write_element():

  NI_group   *ngr ;
  NI_element *nel ;
  int nbe, nbg ;
  nbe = NI_write_element( ns , nel , NI_TEXT_MODE ) ;
  nbg = NI_write_element( ns , ngr , NI_BINARY_MODE ) ;
  (void) NI_write_element( ns , nel , NI_BASE64_MODE ) ;
The return value is the number of bytes written to the output stream. If 0 is returned, then nothing was written (this will be the case if the output socket isn't yet connected at the reading end). If -1 is returned, then nothing was written and the NI_stream suffered an unrecoverable error (this will be the case if the output socket was connected but the connection was broken: e.g., if the reading application crashed).

For debugging purposes, it is often useful to write an element to standard output (in text form, of course). This can be done with the following code snippet:

  NI_stream nstdout ;
  NI_element *nel ;  /* get this from somewhere */
  nstdout = NI_stream_open( "fd:1" , "w" ) ;
  NI_write_element( nstdout , nel , NI_TEXT_MODE ) ;
  NI_stream_close( nstdout ) ;

Output "str:" streams are always written in text mode, regardless of the third parameter to NI_write_element(). Also, data elements that contain String or Line components will always be written in text mode.

The data elements written by this API will

Assembling Data Elements:
It is perfectly possible for the application to assemble its own data elements prior to calling NI_write_element() to write them. However, the following routines are intended to make it simpler to assemble a data element from data structures and arrays already present in the application.

A data element created this way can be freed by using NI_free_element().

Assembling Group Elements:
A similar set of functions can be used to assemble a group element.

A group element created this way can be freed by using NI_free_element().


Appendix F: Implementation Status

[21 Feb 2002] The first implementation of most of the C API exists. It is not very well tested yet.


Appendix G: Documentation of API Functions and Structures

Alas, this important section has yet to be written.


Appendix H: Complexity of the NIML Standard

As mentioned earlier in an aside, simplicity is an important consideration, since NIML may end up being re-implemented in a number of languages. For this reason, it may be desirable to define a basic NIML specification which has some features removed. Some candidates for simplification/elimination (NIML Lite, also known as the Shrubbery):


Appendix I: Linguistic Issues

"NI" or "ni" is to be pronounced as the word "knee", but with a high pitch and shortened. For the defining example of this, please see the film Monty Python and the Holy Grail.

"Niml" means "ants" in Arabic. It is also an acronym for

These results (from 1 minute of Googling) clearly illustrate that all semi-pronounceable acronyms have already been used, over-used, re-used, abused, and used-up.