The purpose of this specification is to define a flexible, extensible, and self-describing format for encoding structured data for neuroimaging applications. The largest component of such information is the image data itself, but the images themselves are of limited use unless some auxiliary data (e.g., voxel dimensions, image orientation, timing information) are attached.
Another motivation for this specification is to work towards defining a standard and protocol for neuroimaging applications to exchange smallish pieces of data. If the community moves towards the development of interoperating software tools, it will be important for these applications not only to share image files, but also to "talk" to each other interactively and to exchange small chunks of information or commands (e.g., "jump to coordinates (32,47,-13)").
This base level specification details how collections of disparate information can be packaged together. The body of this document describes the format for the data.
A C API for reading, writing, and storing information using this standard and protocol is described in the appendices. At this writing, a mostly-complete (but weakly-tested) implementation is available.
Individual data elements (1D or 2D tables of numbers and/or strings) are encoded in an XML-inspired format. An entire data collection consists of a number of data elements grouped together. One or more higher level documents will specify the structure and contents of prototypical neuroimaging data sets, and describe a communications standard for interoperating neuroimaging applications.
[** The higher level documents are only a gleam in my mind's eye.**]
XML note: The software that parses data formatted in the way specified herein is partly an XML processor and partly an application, in the jargon of the XML specification. For details about XML, the best place to start is the annotated XML specification: http://www.xml.com/axml/axml.html . The XML notes herein are intended to provide asides useful to someone who already knows something about XML.
XML note: Except for binary data, it will be possible to encode data in this format in a well-formed XML document (but not DTD-validated, thanks in part to the ni_typedef elements, which allow new NIML element types to be defined in the NIML document itself). Places in this specification where care must be taken to ensure XML well-formedness will be pointed out.
XML note: Documents formed according to this specification will not be fully general XML, since many features of XML (e.g., arbitrary nesting, CDATA, general DTDs, Unicode, entities) will not be supported. This is one reason why software that reads the type of data specified herein is only partly an XML processor.
XML note: Why not use a general XML processor as a front-end to this software (e.g., expat, available at http://www.jclark.com/xml/expat.html)? Mainly because I see a need for binary data to be included, since a typical MRI data set is 10-100 Mbytes. Expansion to a pure text form seems excessive just to conform to the XML specification, especially in standalone neuroimaging applications that otherwise don't care about XML at all. Nor do I think that the XML solution to binary data (reference to an external unparsed entity) is adequate, since that will make it impossible to package up all the data for a neuroimaging data set into one file or one data transmission stream.
The definitions of some terms used in this specification are given here. For some terms, the equivalent XML construct is given in parentheses.
    <vector ni_type=float ni_form=text ni_dimen=3> 1.3 2.2 -3.7 </>
where the components of the above element are:
    <               opens the element header
    vector          gives the type of the data element (almost any string)
    ni_type=float   says that the data stream for this element should be read into 4-byte floats
    ni_form=text    says that the data stream is stored in text format (the default)
    ni_dimen=3      says that there are 3 floats that follow (default is 1)
    >               is the end of the header; data stream starts at the next byte
    1.3 2.2 -3.7    is the data stream to be decoded into numerical values
    </>             signifies the end of the data stream for this element
Bytes before the opening "<" are skipped. The "<" marks the start of the element header, which describes the contents of the data element and the data stream that will populate the data element.
XML note: An XML processor should pass through whitespace that appears between elements. This NIML specification says to ignore such whitespace (and anything else between elements, for that matter), which is one reason that NIML processing software is an "XML application" (interpreting the input) as well as an "XML processor" (making the input available to the application).
Element name:
Immediately after the opening "<" is the element
name (e.g., this could be used to mark a data structure's type
or class name).
Later, a mechanism for specifying data element subtypes is given,
and a number of predefined subtypes is listed.
Names:
The allowable characters in an element (or attribute) Name
are "A-Z", "a-z",
"0-9", and the special characters underscore, period,
and hyphen ("_", ".", and "-").
The first character in a Name must be alphabetic.
The first whitespace or other non-Name character found
ends the Name. (Whitespace is defined by XML to be the characters
blank, newline, carriage-return, and horizontal tab.)
The maximum legal length of a Name is 255 characters. Some examples:
    Z_zzza-...          legal
    _Ethel_             illegal (can't start with "_")
    In:the:beginning    illegal (can't use ":" in a Name)
XML note: The characters that are allowed in a Name are taken from the XML specification, with the exception of the colon. The XML namespace specification (http://www.w3.org/TR/REC-xml-names/) reserves the use of the colon in Names for namespace identification. XML allows Names to start with underscore "_", but NIML does not. XML does not put a maximum length on a Name. NIML documents will be encoded strictly in 8-bit characters, with the first 128 characters being US-ASCII (no Unicode or UTF-8 for NIML). This restriction means that it would be legal to use one of the ISO-8859-* character sets for non-English languages in an NIML file, but this would raise serious portability issues (since the values from 128..255 are interpreted differently in these different character sets).
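The Name rules above fit in a single regular expression. The following is a minimal Python sketch, not part of the C API described in the appendices; the helper name `is_legal_name` is hypothetical and chosen only for illustration.

```python
import re

# Name rule from the text: the first character is alphabetic, the rest are
# letters, digits, "_", "." or "-", and the total length is at most 255.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.\-]{0,254}")


def is_legal_name(candidate: str) -> bool:
    """Return True if candidate satisfies the NIML Name rules (hypothetical helper)."""
    return NAME_RE.fullmatch(candidate) is not None


for example in ("Z_zzza-...", "_Ethel_", "In:the:beginning"):
    print(example, is_legal_name(example))
```

The three strings tested are the examples listed above; only the first one is accepted.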
Reserved Names:
Element and Attribute Names
that start with the characters "ni_" are reserved for
expansion of this specification. The following are the reserved
Names currently in use:
    Name          Purpose
    ni_type       Attribute: specifies type of data to read
    ni_form       Attribute: specifies format of data stream
    ni_dimen      Attribute: specifies number of values to read
    ni_delta      Attribute: specifies coordinate spacing between data values on a uniform grid
    ni_units      Attribute: specifies units used in ni_delta
    ni_origin     Attribute: specifies coordinate offsets for data values on a uniform grid
    ni_axes       Attribute: specifies axis orientations for data values on a uniform grid
    ni_url        Attribute: specifies external location of data stream for a data element
    ni_typedef    Element: defines a new data element subtype
    ni_name       Attribute: provides a name for an ni_typedef-ed element
    ni_group      Element: provides a way to group multiple elements together
    ni_include    Element: provides a way to read an external file into an input stream
Elements with no data (Empty elements):
The minimal element is an element name with no attributes or
data stream. Such a construct could be used as a flag or command to the
receiving application.
For example,
    <close/>
could be used in a transmission as a command to indicate that the transmission's I/O channel should be closed. The fact that the element header closes with "/>" indicates that there is no internal data stream (i.e., this is an "empty element", in the XML jargon). Note that the "/" character is not a legal Name character, so that the element name ends with the "e" in "close".
XML note: The XML specification allows empty elements to also be of the form "<name></name>", and implies that this form is to be indistinguishable from the form "<name/>". An NIML empty element must follow the latter form, closing the element header with "/>".
Attributes:
Following the element name
is a sequence of attributes in the
general form "attname=string".
For data elements, some of
these attributes give information about how to interpret
the data stream (internal or external) into data structures.
The order of the attributes is not important for the parsing operations.
Attributes are separated by whitespace. As mentioned earlier,
attnames that start with the characters "ni_"
are reserved for expansion of this specification.
In addition to the predefined
attributes described below, the element header may include other
attributes.
These will not be interpreted by the input processor,
but will be passed through to the application, in the order in which
they are encountered in the element header.
XML note: XML requires that no two attributes in the same element have the same attname. NIML does not enforce this requirement, but if you wish to produce a well-formed XML document, then you need to be aware of this restriction. XML allows whitespace to occur around the "=" that separates the attname from the string. NIML does not allow this whitespace; the next character after attname must be "=", and the next character after that must be a Name character or a quote character.
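To make the header layout concrete, here is a rough Python sketch that splits the text between "<" and ">" into an element name and an ordered list of attributes. It assumes the quoting rules described in the Strings section of this document, does no escape translation, and keeps duplicate attnames, since NIML does not forbid them. The function name `parse_header` is hypothetical and is not taken from the C API.

```python
import re

_NAME = r"[A-Za-z][A-Za-z0-9_.\-]*"
# A value is a double-quoted string, a single-quoted string, or a run of
# Name characters (which need not start with a letter).
_VALUE = r'"[^"]*"' + "|" + r"'[^']*'" + "|" + r"[A-Za-z0-9_.\-]+"
# Whitespace separates attributes; no whitespace is allowed around "=".
_ATTR = re.compile(r"[ \t\r\n]+(" + _NAME + r")=(" + _VALUE + r")")


def parse_header(header: str):
    """Return (element_name, [(attname, value), ...]) in header order."""
    m = re.match(_NAME, header)
    if m is None:
        raise ValueError("header does not start with a legal Name")
    name, pos, attrs = m.group(0), m.end(), []
    while True:
        a = _ATTR.match(header, pos)
        if a is None:
            break
        value = a.group(2)
        if value[0] in "\"'":
            value = value[1:-1]  # the quoting characters are not part of the String
        attrs.append((a.group(1), value))
        pos = a.end()
    # Only whitespace or the "/" of an empty element may remain.
    if header[pos:].strip() not in ("", "/"):
        raise ValueError("unparseable text in header: " + header[pos:])
    return name, attrs


print(parse_header("vector ni_type=float ni_form=text ni_dimen=3"))
```

Returning a list of pairs rather than a dictionary preserves the order in which attributes are encountered, which is the order in which they are passed through to the application.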
Strings:
Strings that are sequences of Name characters, not necessarily
starting with a letter, can be present
on the right hand side (RHS) of
an attribute (or in a data stream) without being
enclosed in quotes.
Strings with other characters
must be present in a quoted form, using "double quote" or 'single quote'
(apostrophe) characters.
If the non-whitespace character that starts a String is "
(or '), then
the string is assumed to be in quoted form, and everything up to
the next " (or ') character is included in the string.
Whitespace characters, including newlines, are included in the String value,
but the quoting characters are not. In keeping with the XML specification,
the following end-of-line character sequences will be "normalized" to the
Unix-standard single 0x0A byte (LF character):
    Hexadecimal    Character Names    Systems
    0x0D 0x0A      CR LF              Microsoft standard
    0x0D           CR                 Macintosh standard
N.B.: This definition of how Strings are to be formatted also applies to String input in the data stream section of a data element.
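The end-of-line normalization can be sketched in a couple of lines of Python. The helper name `normalize_eol` is hypothetical; the only point being illustrated is that the two-byte CR LF sequence has to be handled before a lone CR.

```python
def normalize_eol(text: str) -> str:
    """Map CR LF and lone CR to a single LF (hypothetical helper)."""
    # Replace the two-character sequence first; converting CR on its own
    # first would turn CR LF into two LF characters.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_eol("one\r\ntwo\rthree\nfour")))
```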
XML note: In XML, attributes must be in a quoted string format. Thus, if an application wishes to write an NIML file to be a well-formed XML file, it should use attributes in the form attname="string", even if the string contains no whitespace. Also, XML specifies that the RHS of most attributes should be normalized by replacing all sequences of contiguous whitespace characters by a single blank. NIML does not require this step; however, all predefined NIML attribute values contain no whitespace.
In keeping with the XML roots of this specification, the following escape sequences representing single characters will be recognized in Strings:
    Escape    Translation         Note
    &lt;      < (less than)       Required
    &gt;      > (greater than)    Required
    &quot;    " (quote)           Required
    &amp;     & (ampersand)       Required
    &apos;    ' (apostrophe)      Optional
Characters marked as Required can only be represented in a String by the escape sequence. Characters marked as Optional can be represented by the escape or by themselves. (Since none of these are Name characters, they can only be present in quoted Strings.)
Some example attributes:
    ni_type=5f.i.S
    ni_type='5float,int,String'
    ni_url="http://zork.bork.gork/fork/spoon/pork.ork#1024-$"
    command="cat fred &gt; 'ethel'"
All but the first have their RHS in quoted form, since these String values contain non-Name characters. In the last one, the RHS String value will be passed to the application as "cat fred > 'ethel'".
XML note: Other XML-defined escapes (such as "&#x3A;" for insertion of a single character specified in hexadecimal) should not be used, since they are not required to be recognized by NIML processor software.
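A possible way to translate the escape sequences is sketched below in Python, assuming they are the five predefined XML entities (&lt;, &gt;, &quot;, &amp;, &apos;). The function name `unescape` is hypothetical. The translation is done in one pass, so that the output of one substitution is never rescanned as the start of another escape.

```python
import re

# Assumed escape table: the five predefined XML entities.
ESCAPES = {
    "&lt;": "<",
    "&gt;": ">",
    "&quot;": '"',
    "&amp;": "&",
    "&apos;": "'",
}
_ESCAPE_RE = re.compile("|".join(re.escape(key) for key in ESCAPES))


def unescape(value: str) -> str:
    """Translate the recognized escapes in a String value (hypothetical helper)."""
    return _ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], value)


print(unescape("cat fred &gt; &apos;ethel&apos;"))
```

Any other "&...;" sequence, including hexadecimal character references, is left untouched, in line with the XML note above.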
Comma-Separated Substrings as Attribute Values:
In some specific cases, the RHS value of a predefined attribute
is described as being a list of comma-separated substrings.
An example of such a string (which must be quoted) is
"float,int,short". This String can be broken into 3
substrings "float", "int", and "short".
This construction is used to specify multiple parameters to attributes
that are designed to process them (e.g., ni_dimen).
However, when the attribute String value is actually passed to the
application, it will not be broken into substrings.
External data streams and the ni_url attribute:
Input bytes that occur between the closing ">" of the element
header and the opening "<" of the end token are called the
internal data stream. It is also possible for a data element
to specify that its data
stream shall be read from an external source rather than from the input bytes
immediately following the ">" that closes the element header.
The external source is specified with the ni_url attribute, as in
    <TheKing ni_url="http://www.elvis.com/" />
This specifies that the contents of the data at the given URL be taken as the data stream for this data element. If ni_url is used, then the data element header must end with "/>", since there can be no internal data stream present after the header if there is an external data stream. An external data stream does not end at its first "</", but continues until the end of the data read from the URL.
The types of URLs that can be specified in a ni_url attribute depend on the input processor. In some processors, there may be no support for such inclusion (e.g., in a socket transmission). In the C API defined in the appendices, the following types of URLs are allowable:
    Form          Meaning
    http://a/b    Absolute reference, fetched by HTTP
    ftp://a/b     Absolute reference, fetched by anonymous FTP
    file:/a/b     Absolute reference to a local file
It is also legal to append a URL fragment specifier of the form "#p..q" at the end of the attribute value. Here, "p" and "q" indicate the first and last bytes of the fetched data to include in the data stream. p and q may be in one of these forms:
Use of ni_url may not be wise, especially if it involves fetching data files from another computer system. Using ni_url makes reading the NIML data file dependent on the existence of another file.
An external data stream will be processed as described in the next section. How much of it will be stored into the data structure transmitted to the application will depend on the ni_type and ni_dimen attributes.
End token:
If the data element has an internal data stream, then
the end of the data stream is indicated by the bytes
"</". (If binary data is being read, then "</"
characters inside the specified length binary data will not indicate the end of the
data stream.) NIML allows the end token to be the characters
"</>" or "</elementname>", where elementname
is the name of the element that is being closed.
If the internal data stream runs into the end of file or the transmission closes (e.g., a socket shuts down), this is also taken as a valid end token for any elements that have not yet "closed" (including the current data element and any group elements enclosing it). This rule makes it easy to have a final data element in a file without closing it with proper end tokens. In this way, an NIML file containing image data can conform to the informal convention that the image data is always the very last collection of bytes in the file, regardless of what header information comes before.
XML note: XML requires that elements that have content (i.e., an internal data stream) end with "</elementname>". Also, XML does not consider a document to be properly closed if the file just ends. This means that a well-formed XML version of an NIML file cannot conform to the "image data is last" convention.
The following attributes determine how the data stream is interpreted by the input processor.
ni_type Attribute:
This attribute
specifies the type or types of the individual data components in the data stream.
The following 10 types are available:
    Name       Initial    Size (bytes)
    byte       b          1
    short      s          2
    int        i          4
    float      f          4
    double     d          8
    complex    c          8 (2 floats)
    rgb        r          3 (red grn blu)
    RGBA       R          4 (r g b alpha)
    String     S          arbitrary
    Line       L          arbitrary
An individual type is specified by its name or by the single character of its initial (which is why "String" starts with an uppercase letter, to distinguish it from "short", and why "RGBA" is capitalized while "rgb" is not).
The ni_type attribute may specify a single type, as in the example at the very beginning of this document, or it may specify multiple types, separated by periods "." and with an optional decimal numeric count prepended:
    ni_type=float.int.int   OR   ni_type=f.i.i   OR   ni_type=f.2i   OR   ni_type=f2i
which specifies that the values to be read from the data stream come in triples: 1 float followed by 2 ints, then 1 more float, 2 more ints, etc. In this example, the data stream must come in these units of 3 numbers. The last illustration above shows that when single character abbreviations are used for type names, they do not need to be separated by periods ".".
Aside: Maybe there are too many variations here. Instead of allowing "float" and "f", we should only allow the latter? That would make the NIML processor's job simpler. Since we might eventually write NIML processors in several languages, simplicity is an important goal.
If the ni_type attribute is not present, then the data stream will be interpreted as if ni_type=b.
XML note: The reason that the separator for multiple types is a period "." is that this is a legal Name character, and NIML allows the RHS of an attribute to be unquoted if it consists entirely of Name characters. However, XML requires the RHS of an attribute to be quoted. If the type definition String is quoted, you can also use commas "," as the type separator.
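The different spellings of ni_type can all be reduced to one flat list of single-character initials. The Python sketch below is one way to do that under the rules stated in this section; the function name `expand_ni_type` is hypothetical, and the sketch is more lenient than a real processor might be (for example, it accepts a trailing separator and accepts commas whether or not the value was quoted).

```python
import re

# Full type names mapped to their single-character initials.
FULL_NAMES = {
    "byte": "b", "short": "s", "int": "i", "float": "f", "double": "d",
    "complex": "c", "rgb": "r", "RGBA": "R", "String": "S", "Line": "L",
}
# Optional decimal count, then a full name (tried first) or an initial,
# then an optional "." or "," separator.
_TOKEN = re.compile(r"(\d*)(" + "|".join(FULL_NAMES) + r"|[bsifdcrRSL])[.,]?")


def expand_ni_type(spec: str) -> list:
    """Expand an ni_type value into a list of type initials, one per column."""
    initials = []
    pos = 0
    while pos < len(spec):
        m = _TOKEN.match(spec, pos)
        if m is None:
            raise ValueError("bad ni_type at offset %d: %r" % (pos, spec))
        count = int(m.group(1)) if m.group(1) else 1
        name = m.group(2)
        initials.extend([FULL_NAMES.get(name, name)] * count)
        pos = m.end()
    return initials


print(expand_ni_type("f.2i"))
print(expand_ni_type("5float,int,String"))
```

Trying the full names before the single initials is what lets "float" be read as one type rather than as "f" followed by unrecognized characters.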
Line Data Values:
The Line type is a special form of String.
A Line is the text between the current scanning point of
the data stream and the next end-of-line; it does not include the
end-of-line character. This input type is designed to make it easy for
an application to read and write individual lines of text without using
quotes to enclose possible whitespace. For example
    <junk ni_type=3L> I am the first Line
       This is Line #2
     And this is Line number 3   </>
The three strings that will be saved are "I am the first Line", "This is Line #2", and "And this is Line number 3", since whitespace at the beginning and end of a Line will be discarded. It is possible (not necessarily wise) to include Line data on a physical line with other values; an example illustrates the processing that results:
    <data ni_type=f.L ni_dimen=2> 3.0   Hi Bob
      5.7
      This is cool </>
The first Line value read is the string "Hi Bob", since the blanks after "3.0" are discarded (being at the start of the Line data). The second Line value read is the string "This is cool", since the end-of-line after the "5.7" is also discarded.
ni_form Attribute (optional):
This attribute specifies the format of the data stream. The
possible values are
    ni_form=text   OR   ni_form=binary   OR   ni_form=base64
The first means that the data stream is in text format, the second that it is binary, and the third that it is base64 encoded binary (which allows binary data to be encoded in a pure text format, at the cost of a 33% expansion in size). If the ni_form attribute is not present, then ni_form=text is assumed.
The binary and base64 attributes may optionally have one of the two strings ".msbfirst" or ".lsbfirst" appended, as in "ni_form=binary.msbfirst". This addition specifies the byte order of the binary data. If the byte order is not specified (here or otherwise), then the receiving program should assume that the binary data is stored in MSB first order ("network order"), as on Sun-Sparc, SGI-MIPS, PowerPC, and HP-PA CPUs (and the opposite of Intel CPUs). If the current CPU does not match the order of the data, then two-byte data (i.e., short) ab will be swapped to ba before being passed to the application; four-byte data (i.e., float, int) abcd will be swapped to dcba; eight-byte data (i.e., double) abcdefgh will be swapped to hgfedcba.
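The byte swapping rule can be illustrated with a short Python sketch. Both helper names (`swap_bytes`, `decode_floats`) are hypothetical. The first reverses each fixed-width group of bytes as described above; the second uses the standard struct module, which lets the byte order of the data be stated explicitly, so no manual swap is needed.

```python
import struct


def swap_bytes(raw: bytes, width: int) -> bytes:
    """Reverse every width-byte group: ab -> ba, abcd -> dcba, and so on."""
    if len(raw) % width != 0:
        raise ValueError("data length is not a multiple of the value width")
    return b"".join(raw[i:i + width][::-1] for i in range(0, len(raw), width))


def decode_floats(raw: bytes, msb_first: bool = True) -> list:
    """Decode 4-byte floats; MSB first is the default when no order is given."""
    count = len(raw) // 4
    order = ">" if msb_first else "<"
    return list(struct.unpack("%s%df" % (order, count), raw[:count * 4]))


print(swap_bytes(b"abcd", 4))
print(decode_floats(bytes.fromhex("3f800000")))
```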
XML note: In XML, there is no way around the fact that "</" closes an element, except by using a CDATA section. Since "]]>" ends the CDATA section, one is still left with the difficulty of including an arbitrary sequence of bytes into an XML document. (In fact, some bytes are not legal anyplace in an XML document, since only valid "characters" are allowed, and not all byte sequences are valid Unicode characters.) If one wants to write an NIML file that is also a well-formed XML document, one must avoid the use of binary data. In general, I would recommend that text encoding be used for most data, and that binary (or base64) be used only for very large data elements (e.g., images).
ni_dimen Attribute (optional, but probably needed):
This attribute specifies how many data elements are to be read from
the data stream. One data element corresponds to a complete set
of values as specified in the ni_type attribute.
If ni_type=fii and ni_dimen=3, then the data stream
should contain 3 floats and 6 ints
(in order f i i f i i f i i).
If the ni_dimen attribute is not specified, it is equivalent to giving ni_dimen=1. The NIML input processor will not try to guess the number of input values from the data stream.
To read an arbitrary series of bytes from the data stream into a contiguous array, the combination of attributes needed is
    ni_type=b ni_form=binary ni_dimen=num_bytes
where num_bytes should be replaced by the number of bytes to be read.
A useful way to think of the data specified by the ni_type and ni_dimen attributes is that the data stream defines a 2D array of values. The ni_type attribute specifies the contents of each row in this array, and the ni_dimen attribute specifies how many rows will be read. In the following example, the data element produces a data structure containing the array shown in the table:
    <data ni_type=f.i.S ni_dimen=4>
      3.72     55   "This is row 1"
      -0.70    444  'I&apos;m row #2'
      666.666  -555 OK-3
      0.003    777  "The last row!"
    </>
    float      int     String
    3.72       55      "This is row 1"
    -0.7       444     "I'm row #2"
    666.666    -555    "OK-3"
    0.003      777     "The last row!"
In the C API (see appendices), the data would end up being stored in 3 arrays, one for each column of this array. The first array would be pointed to by a float *, the second by an int *, and the third by a char **. All of these would be gathered together into one NI_element struct.
N.B.: Although specifying "ni_type=3f ni_dimen=2" and "ni_type=f ni_dimen=6" mean the same thing as far as parsing the data stream goes (6 floats expected), these do not mean the same thing to the application. The first specification is for a 3x2 table of numbers, and the second is for a 1x6 table of numbers. In the C API (see appendices), the data structure returned to the application would be stored differently for these two cases. The first case would produce 3 vectors of length 2; the second case would produce 1 vector of length 6.
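The difference between the two layouts is easier to see in code. The Python sketch below reads a whitespace-separated text stream into one list per column; it handles only a few numeric types, and the function name `read_columns` and its signature are hypothetical rather than a rendering of the C API.

```python
def read_columns(initials: str, dimen: int, text: str) -> list:
    """Read dimen rows of values into one list per column (numeric types only)."""
    converters = {"b": int, "s": int, "i": int, "f": float, "d": float}
    tokens = text.split()
    needed = len(initials) * dimen
    if len(tokens) < needed:
        raise ValueError("expected %d values, found %d" % (needed, len(tokens)))
    columns = [[] for _ in initials]
    values = iter(tokens)
    for _ in range(dimen):                      # one pass per row
        for column, initial in zip(columns, initials):
            column.append(converters[initial](next(values)))
    return columns


stream = "1 2 3 4 5 6"
print(read_columns("fff", 2, stream))  # ni_type=3f ni_dimen=2
print(read_columns("f", 6, stream))    # ni_type=f  ni_dimen=6
```

The same six numbers come back as three columns of two values in the first call and as one column of six values in the second.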
Multi-Dimensional Arrays and Related Attributes (optional):
For ease in dealing with multidimensional arrays (e.g., images),
it is also legal to specify the ni_dimen
attribute's value as a string of more than one integer, separated by commas,
as in ni_dimen="128,128,16" (i.e., the attribute value is
a list of comma-separated substrings).
This means that 128*128*16=262144 values
(specified by ni_type) will be read from the data stream, possibly
representing a 3D image or a time series of 2D images.
The following attributes can be used in conjunction with ni_dimen to specify information that lets the data be interpreted as lying on a regular grid in n-dimensions, where n is the number of values specified in the RHS of ni_dimen. Each of these attributes should have the same number of comma-separated substrings in its RHS value as ni_dimen does.
ni_delta: This should be a set of floating point numbers indicating the spacing between the locations of data values in the grid.
ni_origin: This should be a set of floating point numbers indicating the origin of the locations of data values in the grid.
ni_units: This should be a set of string values that specify the units used in ni_delta and ni_origin. These strings are also not interpreted by the processor in any way, but are simply passed through to the application.
ni_axes: This should be a set of string values that specify the direction/orientation of the coordinates axes. These strings are not interpreted by the processor in any way, but are simply passed through to the application.
Example of a header for an element to hold the data for a 4D image (say from an FMRI experiment):
    <fourD ni_type=short ni_dimen="64,64,16,80" ni_delta="3.75,3.75,5.0,2.5" ni_origin="-120.0,-120.0,-10.0,0.0" ni_axes="R-L,A-P,I-S,time" ni_units="mm,mm,mm,s">
This would correspond to an experiment with 64x64 images, 16 slices per volume, and 80 volumes gathered in time (5242880 values). The voxel dimensions are 3.75 mm in plane, slice thickness of 5.0 mm, and TR is 2.5 seconds. The first data axis is Right-to-Left, the second is Anterior-to-Posterior, the third is Inferior-to-Superior, and the fourth is time. The (i,j,k,p) voxel in this 4D array is located at the (i+64*j+4096*k+65536*p)th short in the data stream, and is located at coordinates (x,y,z,t) = (-120+3.75*i, -120+3.75*j, -10+5*k, 2.5*p), for i=0..63, j=0..63, k=0..15, p=0..79.
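The index and coordinate arithmetic for this 4D example can be checked with a few lines of Python. The constants repeat the ni_dimen, ni_delta, and ni_origin values from the header; the function names are hypothetical. The sketch assumes that the first index varies fastest in the data stream and that each coordinate is the origin plus the grid spacing times the index along that axis.

```python
DIMEN = (64, 64, 16, 80)              # ni_dimen
DELTA = (3.75, 3.75, 5.0, 2.5)        # ni_delta
ORIGIN = (-120.0, -120.0, -10.0, 0.0)  # ni_origin


def stream_offset(index, dimen=DIMEN) -> int:
    """Position of voxel (i,j,k,p) in the data stream; first index varies fastest."""
    offset, stride = 0, 1
    for idx, size in zip(index, dimen):
        if not 0 <= idx < size:
            raise IndexError("index %d out of range 0..%d" % (idx, size - 1))
        offset += idx * stride
        stride *= size
    return offset


def coordinates(index, delta=DELTA, origin=ORIGIN) -> tuple:
    """Grid coordinates: origin plus spacing times index, axis by axis."""
    return tuple(o + d * i for i, d, o in zip(index, delta, origin))


print(stream_offset((1, 2, 3, 4)))
print(coordinates((1, 2, 3, 4)))
```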
Example of a header for an element to hold a single time series of 128 points, with sampling interval of 1.5 seconds:
<oneD ni_type=float ni_dimen=128 ni_delta=1.5 ni_units=s>
If ni_dimen is not used, then ni_delta, ni_origin, ni_units, and ni_axes are not broken down by the NIML processor. These attributes, if present, will still be passed to the application as strings.
Other Attributes (optional):
Other attributes may be included in the element header.
All attributes
will be processed and passed back to the application (as strings)
in the order in which they are encountered.
Data stream:
The data stream starts at the next byte after the ">" that closes
the element header,
unless a "/" character immediately precedes the ">",
as in "/>".
In that case, there is no data stream present in the input, and this ">"
is the end of the data element encoding.
Text data:
If the data stream is in text form, then the data
values are read from the stream as follows:
    Type       C format string
    byte       %u (cast to unsigned char)
    short      %d (cast to signed short)
    int        %d
    float      %f
    double     %lf
    complex    %f%f (real part, imaginary part)
    rgb        %u%u%u (each cast to unsigned char)
    RGBA       %u%u%u%u (each cast to unsigned char)
    String     non-whitespace sequence (%s), or "quoted string"
    Line       data up to the next end-of-line
Data values must be separated by at least one whitespace character. If a String contains whitespace, the String must be present in the text data stream in a quoted form.
Recall that Line data is defined as the text from the current scanning point up to the next end-of-line, with leading and trailing whitespace eliminated. If an entirely blank line occurs in the input, then the corresponding Line string would be empty (have zero length). For example:
    <linestuff ni_type=L ni_dimen=3> Line 1

      Line 3 </>
The second Line here is the empty string.
Binary or base64 data:
If the data stream is in binary or base64 format
(as specified by ni_form),
then the data
values are read from the stream byte-by-byte (after base64 decoding, if needed),
with each value
taking the number of bytes specified earlier. String and Line data values
are not allowed in these forms. This restriction is made so
that the number
of bytes in the data stream can be computed from the ni_type
and ni_dimen attributes (e.g., ni_type=f.i.s and ni_dimen=3
would require a binary data stream to contain exactly (4+4+2)*3=30 bytes, and
a base64 data stream to contain 30 bytes after the base64 characters are decoded).
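The byte count of a binary data stream follows directly from the type sizes and the dimensions. A minimal Python sketch of that computation is shown here; the function name `binary_stream_bytes` is hypothetical, and the type list is assumed to have been reduced to single-character initials already.

```python
import math

# Fixed sizes in bytes; String (S) and Line (L) have no fixed size and are
# therefore not allowed in binary or base64 data streams.
SIZES = {"b": 1, "s": 2, "i": 4, "f": 4, "d": 8, "c": 8, "r": 3, "R": 4}


def binary_stream_bytes(initials, dimen) -> int:
    """Bytes needed for dimen rows, each row holding one value per initial."""
    for initial in initials:
        if initial not in SIZES:
            raise ValueError("type %r has no fixed binary size" % initial)
    row_bytes = sum(SIZES[initial] for initial in initials)
    return row_bytes * math.prod(dimen)


print(binary_stream_bytes("fis", (3,)))   # ni_type=f.i.s ni_dimen=3
```

Passing the dimensions as a tuple lets the same function cover multi-dimensional ni_dimen values such as "64,64,16,80".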
An internal data stream ends with the bytes "</"; an external data stream ends with the end of the URL that was fetched. If the data stream is internal, the data element transmission ends with the next following ">", which allows the closing sequence to be either "</>" or "</elementname>". After the proper ni_dimen number of data values have been read, any data bytes before the closing "</" will be discarded.
If you just want to transmit/store 3 floats, say, the above format seems excessively complicated. Therefore, a syntax is available to let you declare subtypes of the generic data element that can be used more easily.
XML note: The idea that a ni_typedef element can influence the interpretation of future elements does not violate the XML specification (which is solely concerned with "processors"), but it does not fall within the XML specification either. XML uses the "<!ELEMENT ...>" and "<!ATTLIST ...>" constructs to constrain how elements and their attributes may be formed. Alternatively, an XML Schema can be used to provide control over the form/structure of XML data (http://www.w3.org/TR/xmlschema-1/). The XML-only methods are clumsy and don't suit the NIML needs well; the XML Schema method can specify what is allowed in great detail, but is quite complex and seems like too much to support for the purposes of the neuroimaging community.
An empty element (i.e., its header ends with "/>") with name "ni_typedef" is used to define a subtype. With a ni_typedef element, you specify the ni_type attribute and possibly the ni_dimen attribute that will be used when a subtype element is found. An example specifying both:
    <ni_typedef ni_name=fv3 ni_type=f ni_dimen=3/>
This defines the new element type fv3 to contain exactly 3 floats in its data stream. An example of such an element:
    <fv3>2.71828 3.1416 666.0</>
Note that it would still be legal to add the ni_form= attribute to the header of the fv3 element. You can't specify ni_form in the ni_typedef element; that is, you can't force a subtype to be encoded in a particular format.
If the ni_dimen attribute is missing from the subtype definition, then it can be supplied when the subtype is used; for example:
    <ni_typedef ni_name=xyzlist ni_type=3f/>
    <xyzlist ni_dimen=4>1 2 3 4 5 6 7 8 9 10 11 12</>
This subtype is intended to encode a list of 3-tuples of floats; the example produces a 3x4 table of floats. (Recall that if ni_dimen is not supplied, then ni_dimen=1 is assumed.)
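One way to picture how a processor might apply subtype definitions is as a lookup table of default attributes, merged with whatever attributes appear where the subtype is used. The Python sketch below is only an illustration: the function name `resolve_attributes` and the table layout are hypothetical, and letting an attribute given at the point of use override the typedef is an assumption, since the text above does not say what happens when both supply ni_dimen.

```python
def resolve_attributes(name: str, attrs: dict, typedefs: dict) -> dict:
    """Merge typedef defaults with the attributes given in the element header."""
    resolved = dict(typedefs.get(name, {}))  # defaults from ni_typedef, if any
    resolved.update(attrs)                   # assumption: point of use wins
    resolved.setdefault("ni_type", "b")      # documented default type
    resolved.setdefault("ni_dimen", "1")     # documented default count
    return resolved


# Table built from the two ni_typedef examples above.
typedefs = {
    "fv3": {"ni_type": "f", "ni_dimen": "3"},
    "xyzlist": {"ni_type": "3f"},
}
print(resolve_attributes("fv3", {}, typedefs))
print(resolve_attributes("xyzlist", {"ni_dimen": "4"}, typedefs))
```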
Predefined Subtypes:
The following predefined subtypes can be used:
    <ni_typedef ni_name=ni_f1 ni_type=float/>    (1 float)
    <ni_typedef ni_name=ni_f2 ni_type=2f />      (2 floats)
    <ni_typedef ni_name=ni_f3 ni_type=3f />      (3 floats)
    <ni_typedef ni_name=ni_f4 ni_type=4f />      (4 floats)
    <ni_typedef ni_name=ni_i1 ni_type=int/>      (1 int)
    <ni_typedef ni_name=ni_i2 ni_type=2i />      (2 ints)
    <ni_typedef ni_name=ni_i3 ni_type=3i />      (3 ints)
    <ni_typedef ni_name=ni_i4 ni_type=4i />      (4 ints)
    <ni_typedef ni_name=ni_irgb ni_type=i.r/>    (int+color)
    <ni_typedef ni_name=ni_irgba ni_type=i.R/>   (int+color)
    <ni_typedef ni_name=ni_S ni_type=S/>         (string)
    <ni_typedef ni_name=ni_L ni_type=L/>         (line string)
It is an error to redefine one of these subtypes, to define a new subtype that starts with "ni_", or to redefine a subtype that was previously defined through an explicit ni_typedef element. A user-defined subtype cannot be used in an element until it has been defined earlier in the data transmission.
The ni_include data element can be used to specify that a given file should be included; for example:
    <ni_include ni_url="file:/home/elvis/defs.ni"/>
which says to read the given file into the data transmission at this point. Since this is a data element (with no data stream), it cannot appear inside another data element. If desired (why?), the #p..q fragment specification can be appended to the end of the URL.
One use for the ni_include element would be to read in a set of ni_typedefs at the start of a file that used them heavily.
A data file or transmission stream will often contain more than one data element that must be kept together to make a coherent whole. Data elements can be grouped together using the construction
<ni_group> ...elements... </ni_group>
where "...elements..." is replaced by one or more data elements, formatted as described earlier. The whitespace between elements will be ignored. Groups may be nested. Attributes may be included in the "<ni_group ...>" header, as with data elements.
Most of this specification is concerned with how arbitrary data will be encoded in a (supposedly) self-describing format. However, these Appendices deal with one model of how the input and output processors can interact with the application.
The model presented herein is batch-oriented, in that an entire unit of information is processed at once. For an input processor, a free-standing (not in a group element) data element is turned into a data structure which is fully populated and then returned to the application; a group element is turned into a tree of data structures which are fully populated and the tree is returned to the application. For an output processor, the application must fully fill up a data structure, then call the output processor library to generate the resulting data/group elements.
An alternative model would be stream-oriented processing. For input processing, the application would register functions ("callbacks") to be called when certain structures (e.g., attributes, individual data values) in the input data were encountered. For example, the beginning of a data element would trigger one callback, and the decoding of each input value from the element's data stream would trigger another callback. This would allow the application to get a finer level of control over the handling of the input, without having to have it all decoded and stored before getting access to the decoded values. This specification does not address the development of a stream-oriented API for NIML data.
XML note:
"Batch-oriented" corresponds to "DOM" in XML (http://www.w3.org/TR/DOM-Level-3-Core/).
"Stream-oriented" corresponds to "SAX" in XML (http://www.megginson.com/SAX/index.html).
Nota Bene: The data structures and routines specified in the following appendices have not yet been fully implemented. Thus, they are especially subject to change as experience accumulates. See Appendix F for information on the current status of an implementation of this API.
The information specified by a data element will be read into a C struct of type NI_element which has the following fields:
Field Name and Type | Meaning
int type ; | First field is always NI_ELEMENT_TYPE
char *name ; | Element name
int attr_num; | Number of attributes
char **attr_lhs; | attr_lhs[i] points to the ith attribute name
char **attr_rhs; | attr_rhs[i] points to the ith attribute String
int vec_num; | Number of vectors (from ni_type)
int vec_len; | Length of vectors (from ni_dimen)
int vec_filled; | How many vector rows were filled on input (<=vec_len)
int *vec_typ; | vec_typ[i] is the type of the ith vector
void **vec; | vec[i] points to the start of the ith vector
int vec_rank; | Number of dimensions specified in ni_dimen
int *vec_axis_len; | vec_axis_len[i] is the ith dimension count (from ni_dimen)
float *vec_axis_delta; | vec_axis_delta[i] is the ith dimension grid spacing (from ni_delta)
float *vec_axis_origin; | vec_axis_origin[i] is the ith dimension grid offset (from ni_origin)
char **vec_axis_unit; | vec_axis_unit[i] is the ith dimension grid unit string (from ni_units)
char **vec_axis_label; | vec_axis_label[i] is the ith dimension axis label (from ni_axes)
Further details on these fields are given below.
type:
The first field is an int which can be used to distinguish
the type of this element structure; the value NI_ELEMENT_TYPE
here indicates that this is a data element. (For group elements, the
corresponding value would be NI_GROUP_TYPE.)
name:
This is a standard NUL-terminated C string. Since an element name
must contain at least one character, this will not have zero length.
attr_num:
This is the number of attributes read, including all the
ni_* attributes. This may be zero
(e.g., for the elements "<ni_f1>3.2</>" and
"<quit/>", there are
no attributes).
attr_lhs and attr_rhs:
If attr_num is zero, then these pointers will be set to
the NULL pointer.
Otherwise, attr_lhs[i] will be a pointer to a standard NUL-terminated
C string that is the LHS of the ith
"attname=string" attribute, and
attr_rhs[i] will be a pointer to the ith
RHS string, for i from 0 to attr_num-1.
Attributes will be stored in the order encountered in the data element header,
including the attributes that start with "ni_".
vec_num:
This is the number of types declared in the ni_type attribute;
for example, "ni_type=f.2i" would give vec_num=3.
Empty elements:
vec_num is zero if there is no data stream.
vec_len:
This is the total number of entries in each vector, that is, the product of the dimension counts given in ni_dimen.
vec_typ:
This array specifies the types of each vector of data read from the
data stream, as specified from the ni_type attribute.
If vec_num=0, then vec_typ will be
the NULL pointer. Otherwise,
vec_typ[i] is a code indicating the data type, for
i from 0 to vec_num-1:
Name | byte | short | int | float | double | complex | rgb | RGBA | String | Line |
Code | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Macro | NI_BYTE | NI_SHORT | NI_INT | NI_FLOAT | NI_DOUBLE | NI_COMPLEX | NI_RGB | NI_RGBA | NI_STRING | NI_LINE |
vec[i] | byte * | short * | int * | float * | double * | complex * | rgb * | rgba * | char ** | char ** |
typedef unsigned char byte ;
typedef struct { byte r,g,b ; } rgb ;
typedef struct { byte r,g,b,a ; } rgba ;
typedef struct { float r,i ; } complex ;
Empty elements: vec_typ is NULL.
vec:
This array of arrays actually
contains the data interpreted from the data stream, if
vec_num is greater than zero.
vec[i] is a pointer to an array of the type encoded
by vec_typ[i] and of length vec_len, for
i from 0 to vec_num-1. For example, if
vec_typ[2]==NI_FLOAT, then the proper use of the pointer
vec[2] is something like
int j ; float *fv = (float *) vec[2] ;
for( j=0 ; j < vec_len ; j++ ) do_something( fv[j] ) ;
If vec_typ[4]==NI_STRING, then printing out the jth string would be done like so:
char **sv = (char **) vec[4] ;
printf("%s\n",sv[j]) ;
Empty elements: vec is NULL.
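The following sketch shows one way an application might dispatch on vec_typ[i] before using vec[i]. It is self-contained for illustration only: sk_element is a reduced stand-in for NI_element, the SK_* constants copy the codes from the table above, and sk_column_sum is a hypothetical helper, not a function of the API.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the type codes in the table above; a real program
   would use the NI_* macros from the implementation's header instead. */
enum { SK_BYTE=0 , SK_SHORT=1 , SK_INT=2 , SK_FLOAT=3 , SK_DOUBLE=4 } ;

/* Reduced stand-in for NI_element: only the fields used here. */
typedef struct {
   int    vec_num ;
   int    vec_len ;
   int   *vec_typ ;
   void **vec ;
} sk_element ;

/* Sum column iv as a double; returns 0.0 for columns that are
   out of range or not of a numeric type handled here. */
static double sk_column_sum( const sk_element *el , int iv )
{
   double sum = 0.0 ; int j ;
   assert( el != NULL ) ;
   if( iv < 0 || iv >= el->vec_num || el->vec == NULL ) return 0.0 ;
   switch( el->vec_typ[iv] ){
      case SK_INT:{
         const int *p = (const int *) el->vec[iv] ;
         for( j=0 ; j < el->vec_len ; j++ ) sum += p[j] ;
      } break ;
      case SK_FLOAT:{
         const float *p = (const float *) el->vec[iv] ;
         for( j=0 ; j < el->vec_len ; j++ ) sum += p[j] ;
      } break ;
      case SK_DOUBLE:{
         const double *p = (const double *) el->vec[iv] ;
         for( j=0 ; j < el->vec_len ; j++ ) sum += p[j] ;
      } break ;
      default: break ;   /* byte, short, and non-numeric types not handled */
   }
   return sum ;
}
```

The cast applied to vec[iv] is chosen by the type code, never assumed; this is the pattern that the vec[i] row of the table above implies.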
vec_rank:
This value is the number of dimensions specified in ni_dimen;
some examples:
ni_dimen=7 implies vec_rank=1
ni_dimen="64,64" implies vec_rank=2
ni_dimen="64,64,16,80" implies vec_rank=4
Empty elements: vec_rank is set to 0.
vec_axis_len:
This array holds the substring values decoded from ni_dimen.
Continuing the examples above:
vec_axis_len[0] = 7
vec_axis_len[0] = 64; vec_axis_len[1] = 64;
vec_axis_len[0] = 64; vec_axis_len[1] = 64; vec_axis_len[2] = 16; vec_axis_len[3] = 80;
Empty elements: vec_axis_len is NULL.
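To make the relation between ni_dimen, vec_rank, and vec_axis_len concrete, here is a minimal sketch of decoding a comma-separated dimension string. The function sk_parse_dimen and the limit SK_MAX_RANK are assumptions made for this example; the sketch does not guard against overflow of the product.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_MAX_RANK 32   /* arbitrary limit for this sketch */

/* Hypothetical helper: decode a string such as "64,64,16,80" into
   axis lengths (axis_len must have room for SK_MAX_RANK ints).
   Returns the rank (number of dimensions), or 0 if the string is
   NULL, empty, malformed, or has too many dimensions.
   *total receives the product of the axis lengths (0 on failure). */
static int sk_parse_dimen( const char *dimen , int *axis_len , long *total )
{
   int rank = 0 ; long prod = 1 , v ; const char *p = dimen ; char *end ;
   assert( axis_len != NULL && total != NULL ) ;
   *total = 0 ;
   if( p == NULL || *p == '\0' ) return 0 ;
   while( 1 ){
      v = strtol( p , &end , 10 ) ;
      if( end == p || v <= 0 || rank >= SK_MAX_RANK ) return 0 ;
      axis_len[rank++] = (int) v ; prod *= v ;
      if( *end == '\0' ) break ;       /* end of string: done       */
      if( *end != ',' ) return 0 ;     /* only commas may separate  */
      p = end + 1 ;
   }
   *total = prod ;
   return rank ;
}
```

The returned rank plays the role of vec_rank, and the product stored in *total is the length that the dimension counts multiply out to.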
vec_axis_delta:
This array holds the values decoded from the ni_delta, if it
was present.
Empty elements and elements without
ni_delta: vec_axis_delta is NULL.
vec_axis_origin:
This array holds the values decoded from the ni_origin, if it
was present.
Empty elements and elements without
ni_origin: vec_axis_origin is NULL.
vec_axis_unit:
This array of pointers to C strings holds the values decoded from
ni_units (i.e., the substrings
that were separated by commas).
Empty elements and elements without
ni_units: vec_axis_unit is NULL.
vec_axis_label:
This array of pointers to C strings holds the values decoded from
ni_axes (i.e., the substrings
that were separated by commas).
Empty elements and elements without
ni_axes: vec_axis_label is NULL.
The information specified by a ni_group will be read into a C struct of type NI_group which has the following fields:
Field Name and Type | Meaning
int type ; | First field is always NI_GROUP_TYPE
int attr_num; | Number of attributes
char **attr_lhs; | attr_lhs[i] points to the ith attribute name
char **attr_rhs; | attr_rhs[i] points to the ith attribute String
int part_num; | Number of parts (elements or sub-groups)
int *part_typ; | part_typ[i] is the type of the ith part
void **part; | part[i] points to the data describing the ith part
part_num:
This is the number of elements or sub-groups encountered between
the opening "<ni_group>"
and the closing "</ni_group>".
part_typ:
part_typ[i] specifies whether the ith part
is a data element (constant NI_ELEMENT_TYPE) or a group itself
(constant NI_GROUP_TYPE), for i=0..part_num-1.
part:
If part_typ[i]==NI_ELEMENT_TYPE, then
(NI_element *)part[i] is a pointer to a NI_element
struct, defined above.
If part_typ[i]==NI_GROUP_TYPE, then
(NI_group *)part[i] is a pointer to a NI_group struct.
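Because groups may be nested, code that walks a group is naturally recursive. The sketch below counts the data elements in a tree. The sk_* types are reduced stand-ins defined only for this example, and the numeric values of the type constants are arbitrary; a real program would use NI_element, NI_group, NI_ELEMENT_TYPE, and NI_GROUP_TYPE from the implementation's header.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in type codes and structs; only the fields used here are kept. */
enum { SK_ELEMENT_TYPE = 17 , SK_GROUP_TYPE = 18 } ;  /* values arbitrary */

typedef struct { int type ; const char *name ; } sk_elem ;

typedef struct {
   int    type ;
   int    part_num ;
   int   *part_typ ;
   void **part ;
} sk_group ;

/* Count the data elements in a group, descending into sub-groups. */
static int sk_count_elements( const sk_group *gr )
{
   int i , n = 0 ;
   assert( gr != NULL && gr->type == SK_GROUP_TYPE ) ;
   for( i=0 ; i < gr->part_num ; i++ ){
      if( gr->part_typ[i] == SK_ELEMENT_TYPE )
         n++ ;
      else if( gr->part_typ[i] == SK_GROUP_TYPE )
         n += sk_count_elements( (const sk_group *) gr->part[i] ) ;
   }
   return n ;
}
```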
Input to the NIML Functions: NI_stream:
Data is provided to the NIML processor
through an opaque handle of type NI_stream.
("Opaque" means that the internal components of this type are
not visible to the application).
A NI_stream for input is a source of bytes that
will be scanned to construct data and/or group elements.
Opening a NI_stream for Input:
An application opens an input stream with a function call like so:
NI_stream ns ;
ns = NI_stream_open( sname , "r" ) ;
Here, sname is a C string (NUL-terminated) that specifies whence the stream is to derive its data. The following formats for sname are supported:
"file:filename"
This form opens the file "filename" for input,
using the C library function fopen().
"fd:integer"
This form does I/O to the pre-opened (by the application)
file descriptor given by integer. For example,
"fd:0" can be used for input from stdin,
and "fd:1" can be used for output to stdout.
When NI_stream_close() is called, this file
descriptor will not be closed -- the application opened
it, so the application can close it.
"http://hostname/filename" OR
"ftp://hostname/filename"
These forms fetch the given URL and then read data from it.
Effectively, these forms are somewhat like "str:", where the
input string of bytes comes from an external resource. The
entire contents of the URL will
be fetched during the NI_stream_open() call and
stored in a memory buffer inside the NI_stream structure.
"str:string"
This form uses a copy of the characters that follow "str:"
as the source of input bytes. For example:
"str:<fred ni_type=f ni_dimen=3>1.1 1.2 1.3</>"
It is also possible to provide the string to be decoded with a pair of calls like
ns = NI_stream_open( "str:" , "r" ) ;
NI_stream_setbuf( ns , string ) ;
The call to NI_stream_setbuf() will do nothing if ns was not opened as a "str:" in input ("r") mode. Otherwise, any existing contents of the internal string buffer will be discarded and replaced by a copy of the contents of string.
"tcp:hostname:port"
This form opens a TCP/IP socket to the computer hostname
(which can be specified by Internet name or by IP address in
the standard dotted form 123.456.789.123), on the port given
by port. For example,
"tcp:127.0.0.1:9999"
opens a socket to the local computer on port #9999.
int msec=5, nn=NI_stream_goodcheck(ns,msec) ;
The input msec is the number of milliseconds to wait for the stream to become good. The return value nn is 1 if the stream is potentially capable of reading data (socket is open; or file/string hasn't been used up yet). The return value is 0 if the stream isn't yet ready, but is waiting for connection (socket isn't connected yet). The return value is -1 if an unrecoverable fatal error has happened to the stream (socket connection failed or broke, input file/string was exhausted) such that no more data will be readable. This function can be used in a loop to check for establishment of the connection:
ns = NI_stream_open( "tcp:anybody:6666" , "r" ) ;
if( ns == NULL ){
   fprintf(stderr,"Can't open socket 6666\n"); exit(1);
}
while(1){
   nn = NI_stream_goodcheck(ns,1) ;
   if( nn == 1 ) break ;                  /* good! */
   if( nn < 0 ){
      fprintf(stderr,"Can't accept on socket 6666\n"); exit(1);
   }
   /** could do something else here before trying again **/
}
fprintf(stderr,"Socket 6666 connected from address %s\n",NI_stream_name(ns)) ;
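For readers implementing their own stream opener, the "tcp:hostname:port" form described above can be taken apart with standard C string functions. This is only a sketch under stated assumptions: sk_parse_tcp is a hypothetical helper (how the actual implementation parses sname is not specified here), and the port is checked against the usual TCP range 1..65535.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "tcp:hostname:port" into its parts.
   host must have room for hostlen bytes.  Returns 1 on success,
   0 if the string is not of this form. */
static int sk_parse_tcp( const char *sname , char *host , size_t hostlen , int *port )
{
   const char *h , *colon ; char *end ; long pp ; size_t n ;
   assert( host != NULL && port != NULL && hostlen > 0 ) ;
   if( sname == NULL || strncmp(sname,"tcp:",4) != 0 ) return 0 ;
   h     = sname + 4 ;
   colon = strrchr( h , ':' ) ;             /* last colon precedes the port */
   if( colon == NULL || colon == h ) return 0 ;
   n = (size_t)( colon - h ) ;
   if( n >= hostlen ) return 0 ;            /* hostname too long for buffer */
   pp = strtol( colon+1 , &end , 10 ) ;
   if( end == colon+1 || *end != '\0' || pp < 1 || pp > 65535 ) return 0 ;
   memcpy( host , h , n ) ; host[n] = '\0' ;
   *port = (int) pp ;
   return 1 ;
}
```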
Closing a NI_stream:
An application closes an input (or output) stream with a call like
NI_stream_close( ns ) ;
where ns is a valid NI_stream value that was previously returned from NI_stream_open(). NI_stream_close() has no return value. After this function has been called, the memory associated with ns has been deallocated, and it is illegal for the application to refer to ns again, unless it is reassigned by another call to NI_stream_open().
Reading Data from an NIML Input Stream:
The next block of data can be read from an opened stream using
a call like so:
void *nini ; int msec = 1 ;
nini = NI_read_element( ns , msec ) ;
where ns is a valid NI_stream value that was previously returned from NI_stream_open(), and msec is the number of milliseconds the process should wait for more data to appear in the input stream. Use msec=0 for an immediate return if no data is available.
NULL is returned if a complete element could not be extracted from the input stream. To check if this has failed because the connection was closed, use
int nn = NI_stream_readcheck(ns,0) ;
switch( nn ){
   case -1: /* stream has gone bad */ break ;
   case  0: /* stream is OK, just waiting for data (sockets only) */ break ;
   case  1: /* stream has data waiting to be read */ break ;
}
If NI_read_element() returns NULL and NI_stream_readcheck() then returns -1, then the stream will deliver no more data.
If not NULL, the value returned by NI_read_element() points to a NI_element or to a NI_group data structure. The program can determine which by
int tt = NI_element_type( nini ) ;
if( tt == NI_ELEMENT_TYPE ){        /* data element */
   NI_element *nel = (NI_element *) nini ;
   /* do something here */
} else if( tt == NI_GROUP_TYPE ){   /* group element */
   NI_group *ngr = (NI_group *) nini ;
   /* do something else here, I suppose */
} else {
   /* this should never occur, unless nini==NULL (tt==-1) */
}
Checking for Available Input:
It can be useful to check if data is available to be read, in order to
avoid calling NI_read_element() and waiting for input when
there is no input. This function call does just that:
int cod ;
cod = NI_stream_readcheck( ns , 1 ) ;
The return value is positive if data can be read from the NI_stream, zero if no data can be read (but the stream is still good), and negative if the stream has failed in some way (e.g., the socket was closed at the other end). This function only checks if at least 1 data byte can be read from the I/O stream represented by ns; it does not check if valid NIML data is present. For socket streams, the second argument is the number of milliseconds the function should wait to check if data is present. For the other stream types, the function will return immediately, since there is no need to synchronize with another process.
Freeing Data from an NI_group or NI_element:
Function call NI_free_element( nini ) can be used to free all the
data from a NI_element or NI_group constructed by the
NIML functions.
Some application software may wish to move some or all of the data out of an NI_element prior to freeing the NI_element data structure itself. Instead of having to copy arrays, the application can simply copy any pointer from the NI_element to its own storage, and then set the pointer in the NI_element to NULL. For example:
NI_element *nel ; float *fp ; int nfp ;
nel = NI_read_element(ns,1) ;    /* read a data element */
nfp = nel->vec_len ;             /* save length of data array */
fp  = (float *) nel->vec[0] ;    /* copy data array pointer */
nel->vec[0] = NULL ;             /* clear pointer in nel */
NI_free_element( nel ) ;         /* free everything else in nel */
This example skips all checking (e.g., if nel==NULL), and assumes that the data structure returned is a data element that contains a float vector as its first entry. In a real application, many more cases would need to be allowed for.
Attributes within an Element:
The application can certainly search for a given attribute name in an
element returned by NI_read_element(); however, there is
a utility function to do this. For example:
char *rhs = NI_get_attribute( nel , "idcode" ) ;
will return NULL if there is no attribute with left hand side "idcode"; otherwise it returns a pointer to the right hand side value of the attribute. This pointer points into the element's data structure, so it should not be modified or free()-ed. If you need to make a copy of it, use the strdup() library function. NI_get_attribute() will work with both data and group elements.
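For an application that prefers to search the attribute arrays itself, the lookup amounts to a linear scan over attr_lhs. The sketch below is self-contained for illustration: sk_attrs is a reduced stand-in for the attribute fields of NI_element, and sk_get_attribute is a hypothetical name, not the library routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-in for the attribute fields of NI_element. */
typedef struct {
   int    attr_num ;
   char **attr_lhs ;
   char **attr_rhs ;
} sk_attrs ;

/* Return the RHS string for the first attribute whose LHS equals
   attname, or NULL if there is none.  The result points into the
   structure and must not be modified or free()-ed by the caller. */
static const char * sk_get_attribute( const sk_attrs *at , const char *attname )
{
   int i ;
   assert( at != NULL && attname != NULL ) ;
   for( i=0 ; i < at->attr_num ; i++ )
      if( strcmp( at->attr_lhs[i] , attname ) == 0 ) return at->attr_rhs[i] ;
   return NULL ;
}
```

When attr_num is zero the loop body never runs, so the NULL attr_lhs and attr_rhs pointers of an attribute-free element are never dereferenced.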
Error Conditions:
The following is a discussion of how an implementation of the C API should
handle various error conditions.
<elvis ni_dimen=3 ni_type=fi> 3.2 1 4.7 2 3.1 </>
would result in vec_len=3 but vec_filled=2, since the last row would only be half filled.
<vector ni_type=3f> 3.2 z66 7.1 </>
decodes to the 3 numbers "3.2", "0.0", and "7.1". No indication of this error is made in the NI_element structure.
<junkola ni_type=f.S ni_dimen=3> 3.2 "This is 4.7 Bob 9.3 Dole </>
The first String value starts with the "T" in "This" and ends with the blank after "Dole". The second and third float and String values will never be read.
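The first two error conditions above can be imitated with a small text decoder. This is a sketch of one way such behavior could arise, not a description of the actual implementation: sk_decode_floats and sk_rows_filled are hypothetical helpers, every value is read as a float regardless of the declared type, and the unterminated-string case is not covered.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical decoder: read whitespace-separated tokens from text into
   out[], up to maxval values.  A token that strtod() cannot convert in
   full is stored as 0.0, with no error flag.  Returns the number of
   values stored. */
static int sk_decode_floats( const char *text , float *out , int maxval )
{
   int n = 0 ; const char *p = text ; char *end ; double v ;
   assert( text != NULL && out != NULL ) ;
   while( n < maxval ){
      while( isspace((unsigned char)*p) ) p++ ;          /* skip blanks   */
      if( *p == '\0' ) break ;                           /* end of data   */
      v = strtod( p , &end ) ;
      if( end == p || ( *end != '\0' && !isspace((unsigned char)*end) ) ){
         v = 0.0 ;                                       /* bad token     */
         end = (char *) p ;                              /* skip past it  */
         while( *end != '\0' && !isspace((unsigned char)*end) ) end++ ;
      }
      out[n++] = (float) v ;
      p = end ;
   }
   return n ;
}

/* Number of completely filled rows when each row holds ncol values. */
static int sk_rows_filled( int nval , int ncol )
{
   return ( ncol > 0 ) ? nval / ncol : 0 ;
}
```

With the data from the elvis example, five values are decoded and sk_rows_filled(5,2) gives 2, matching the vec_filled=2 described above.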
Opening a NI_stream for Output:
The string "w" is supplied as the second argument to
NI_stream_open() when a program wants to write to the stream.
NI_stream ns = NI_stream_open( "str:" , "w" ) ;
NI_element *nel ; /* get this from somewhere */
NI_write_element( ns , nel , NI_TEXT_MODE ) ;
printf("%s\n",NI_stream_getbuf(ns)) ;
The function NI_stream_clearbuf(ns) can be used to erase the contents of the "str:" output buffer, so that new elements can be written into that space.
Writing Elements to an NIML Output Stream:
The application
must first assemble a data element, or a group element containing
one or more data elements. Then the element is written to the
output stream with function NI_write_element():
NI_group *ngr ; NI_element *nel ; int nbe, nbg ;
nbe = NI_write_element( ns , nel , NI_TEXT_MODE ) ;
nbg = NI_write_element( ns , ngr , NI_BINARY_MODE ) ;
(void) NI_write_element( ns , nel , NI_BASE64_MODE ) ;
The return value is the number of bytes written to the output stream. If 0 is returned, then nothing was written (this will be the case if the output socket isn't yet connected at the reading end). If -1 is returned, then nothing was written and the NI_stream suffered an unrecoverable error (this will be the case if the output socket was connected but the connection was broken: e.g., if the reading application crashed).
For debugging purposes, it is often useful to write an element to standard output (in text form, of course). This can be done with the following code snippet:
NI_stream nstdout ;
NI_element *nel ; /* get this from somewhere */
nstdout = NI_stream_open( "fd:1" , "w" ) ;
NI_write_element( nstdout , nel , NI_TEXT_MODE ) ;
NI_stream_close( nstdout ) ;
Output "str:" streams are always written in text mode, regardless of the third parameter to NI_write_element(). Also, data elements that contain String or Line components will always be written in text mode.
The data elements written by this API will
Assembling Data Elements:
It is perfectly possible for the application to assemble its own
data elements prior to calling NI_write_element()
to write them. However, the following routines are intended to make
it simpler to assemble a data element from data structures and arrays
already present in the application.
Create a data element:
NI_element *nel ;
nel = NI_new_data_element( "elementname" , 6 ) ;
Creates a data element with the given element name and with ni_dimen=6; the second argument specifies the length of the arrays added to the element, using function NI_add_column().
Add an array (column) to a data element, for data elements whose column length was specified as positive in NI_new_data_element().
float *fff ;
NI_add_column( nel , NI_FLOAT , fff ) ;
This adds a float column to the data element. The number of values pointed to by fff must match the number of values specified in NI_new_data_element(). This data is copied into the data structure pointed to by nel, and so can be over-written or deleted by the application after this function call. For the data types NI_STRING and NI_LINE, the third argument should be char **. Each NUL-terminated string from this array will be copied into the data element's internal storage.
Add a row to a data element, for data elements whose column length was specified as negative in NI_new_data_element().
typedef struct { int m,n; float f; char *s; } somestruct ;
somestruct sss = { 3,2,1.7,"Fourier Transform" } ;
nel = NI_new_data_element( "something" , -1 ) ;
NI_define_rowmap_VA( nel ,
                     NI_INT   , offsetof(somestruct,m) ,
                     NI_INT   , offsetof(somestruct,n) ,
                     NI_FLOAT , offsetof(somestruct,f) ,
                     NI_STRING, offsetof(somestruct,s) ,
                     -1 ) ;
NI_add_row( nel , &sss ) ; /* add 1st row of data */
In this example, the struct type somestruct has four fields. Each field is defined to the element with a pair of int arguments to NI_define_rowmap_VA(). The first member of the pair is a type code, such as NI_INT. The second member of the pair is an offset into the struct type where the data lives. This offset is most conveniently computed using the C standard macro offsetof(). The final argument to NI_define_rowmap_VA() should be -1 (not a legal type code).
somestruct *ttt = malloc(sizeof(somestruct)*100) ;
/** fill ttt[i].stuff for i=0..99 **/
NI_add_row( nel , 100 , sizeof(somestruct) , ttt ) ;
This is more efficient than adding one row at a time, but simpler than converting each field (e.g., m in somestruct) in an array of structs to a column vector and then using NI_add_column().
NI_get_row( nel , rr , &sss ) ;
where the int input rr is the row index from which the data should be extracted, and &sss is the pointer to the struct into which the data should be placed (at the offsets previously established by a call to NI_define_rowmap_VA()). Of course, in this case, you must call NI_define_rowmap_VA() after you acquire the element from NI_read_element() and before you call NI_get_row(). The number and type of row components must agree with those defined in the data element header. How your program ensures that is beyond the scope of this API (e.g., you could have a convention for various element names to be mapped to corresponding struct types).
int typ[4] , off[4] ;
typ[0] = NI_INT    ; off[0] = offsetof(somestruct,m) ;
typ[1] = NI_INT    ; off[1] = offsetof(somestruct,n) ;
typ[2] = NI_FLOAT  ; off[2] = offsetof(somestruct,f) ;
typ[3] = NI_STRING ; off[3] = offsetof(somestruct,s) ;
NI_define_rowmap_AR( nel , 4 , typ , off ) ;
In fact, NI_define_rowmap_VA() just assembles type and offset arrays from its inputs, then calls NI_define_rowmap_AR() to do the actual rowmap setup inside the data element struct.
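The idea behind a rowmap (a type code paired with a byte offset) can be illustrated without the library. The sketch below copies one field out of an array of structs into a contiguous column, which is the conversion the rowmap interface spares the application from writing by hand. The names sk_rec and sk_field_to_column are hypothetical, only int and float fields are handled, and the string field of somestruct is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SK_INT = 2 , SK_FLOAT = 3 } ;   /* codes from the type table */

typedef struct { int m,n; float f; } sk_rec ;

/* Copy one field of each of nrow structs (stride bytes apart) into a
   contiguous column.  typ selects the field size; off is the byte
   offset of the field inside the struct, e.g. from offsetof(). */
static void sk_field_to_column( const void *rows , int nrow , size_t stride ,
                                int typ , size_t off , void *column )
{
   size_t sz = ( typ == SK_INT ) ? sizeof(int) : sizeof(float) ;
   int r ;
   assert( rows != NULL && column != NULL ) ;
   assert( typ == SK_INT || typ == SK_FLOAT ) ;
   for( r=0 ; r < nrow ; r++ )
      memcpy( (char *)column + r*sz ,
              (const char *)rows + r*stride + off , sz ) ;
}
```

Using memcpy() with a byte offset avoids any assumption about the alignment or layout of the struct beyond what offsetof() reports.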
Specify dimensionality of element data (optional):
int nd[2] = { 2 , 3 } ;
float del[2] = { 1.5 , 2.5 } ;
float org[2] = { -1.3 , 3.3 } ;
char *uni[2] = { "mm" , "parsec" } ;
char *axi[2] = { "x" , "y" } ;
NI_set_dimen ( nel , 2,nd ) ; /* ni_dimen="2,3" */
NI_set_delta ( nel , del ) ;  /* ni_delta="1.5,2.5" */
NI_set_origin( nel , org ) ;  /* ni_origin="-1.3,3.3" */
NI_set_units ( nel , uni ) ;  /* ni_units="mm,parsec" */
NI_set_axes  ( nel , axi ) ;  /* ni_axes="x,y" */
These functions set the indicated attributes. The first one that must be used is NI_set_dimen(), if the number of dimensions is more than 1. In the example, this function sets the size of each dimension; these values must multiply out to the same length as given in NI_new_data_element(). If NI_set_dimen() is not used, then the number of dimensions is taken as 1. The number of dimensions is needed for the other functions (NI_set_delta(), etc.) so that they can extract the correct number of values from their input arrays (del, etc.).
If you add data to the element using the NI_add_row() interface, then you cannot specify the dimension attributes as above until the last row is added. This restriction is so that the call to NI_set_dimen() can check if the supplied dimensions multiply out to the column length of the element.
Add other attributes (optional):
NI_set_attribute( nel , "attname" , "attvalue" ) ;
Assembling Group Elements:
A similar set of functions can be used to assemble a group element.
Create a new group element:
NI_group *ngr ; ngr = NI_new_group_element() ;
Add an element (data or group) to a group element:
NI_element *neladd ; NI_add_to_group( ngr , neladd ) ;
[21 Feb 2002] The first implementation of most of the C API exists. It is not very well tested yet.
Some limitations:
Miscellaneous utility functions not mentioned previously:
<ni_typedef ni_type=type ni_dimen=dimen/>
If the input string dimen is NULL, then the ni_dimen attribute isn't set.
Base64 utility functions:
MD5 utility functions:
Function char * UNIQ_idcode(void) returns a globally unique identifier C string in newly malloc()-ed space. This string will fit in a 32 byte array (at most, including the NUL byte at the end). No two invocations of this function should return the same string. The characters in the string are alphanumeric 'a-z', 'A-Z', '0-9', with '-' and '_' possible as well. For example: "XYZ_qXxxypkMTmm_wSMh0-dEZA".
Internet host name functions:
The implementation is in two files: niml.h and niml.c. The test programs nimltest.c and nisurf.c can be used as samples. It has only been tried on Linux as yet (and not so much, either).
Many errors just fail silently. For example, characters that aren't understood on the RHS of a ni_type attribute are simply ignored.
Alas, this important section has yet to be written.
As mentioned earlier in an aside, simplicity is an important consideration, since NIML may end up being re-implemented in a number of languages. For this reason, it may be desirable to define a basic NIML specification which has some features removed. Some candidates for simplification/elimination (NIML Lite, also known as the Shrubbery):
"NI" or "ni" is to be pronounced as the word "knee", but with a high pitch and shortened. For the defining example of this, please see the film Monty Python and the Holy Grail.
"Niml" means "ants" in Arabic. It is also an acronym for