VAH: Variable Assignment Hierarchy

VAH is a data representation format intended to replace XML for marshalling application data.

Advantages of VAH over XML

Simplicity - VAH can be fully understood in five minutes and can be implemented in 70 or so lines of C code. For an application, a VAH library can present a simpler interface than XML, since there are no redundant concepts such as attributes vs. cdata, processing instructions, etc.

Readability - In most cases, a VAH document is easier to read than an XML document.

Conciseness - VAH is more concise than XML, since element names are not repeated in closing tags.

VAH Example

This example illustrates essentially the full complexity of VAH:

person = {
  name = "Gregory B. Hudson" {
    nickname = "Greg"
  }
  email = "ghudson@mit.edu"
}
place = "MIT" {
  name = "Massachusetts Institute of Technology"
}
nothing =

That's it. You can see that a VAH document is a sequence of variable definitions; a variable definition can include a string value, a subtree of variable definitions, both, or neither.
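For an application that wants the whole document in memory, the shape described above maps naturally onto a small tree type. The following is only a sketch: the struct vah_var type, its field names, and vah_count() are hypothetical and are not part of VAH or of any library mentioned on this page. It builds the example document by hand and counts its variable definitions. Note that this sketch does not distinguish an empty subtree from an absent one.

```c
#include <stddef.h>

/* Hypothetical in-memory form of one variable definition. */
struct vah_var {
  const char *name;               /* variable name, never NULL */
  const char *value;              /* string value, or NULL if absent */
  const struct vah_var *children; /* first definition in the subtree, or NULL */
  const struct vah_var *next;     /* next definition at the same level, or NULL */
};

/* Count every definition reachable from v, including nested ones. */
static size_t vah_count(const struct vah_var *v)
{
  size_t n = 0;

  for (; v != NULL; v = v->next)
    n += 1 + vah_count(v->children);
  return n;
}

/* The example document from above, written out as static data. */
static const struct vah_var nickname = { "nickname", "Greg", NULL, NULL };
static const struct vah_var email = { "email", "ghudson@mit.edu", NULL, NULL };
static const struct vah_var person_name =
  { "name", "Gregory B. Hudson", &nickname, &email };
static const struct vah_var place_name =
  { "name", "Massachusetts Institute of Technology", NULL, NULL };
static const struct vah_var nothing = { "nothing", NULL, NULL, NULL };
static const struct vah_var place = { "place", "MIT", &place_name, &nothing };
static const struct vah_var person = { "person", NULL, &person_name, &place };
```

Walking from person through the next pointers visits the three top-level definitions in document order.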

VAH Grammar

Here is a definition of the VAH grammar in ABNF, which is defined in RFC 2234. VAH documents are represented in UTF-8, which is defined in RFC 2279; the grammar below is written in terms of 32-bit scalar Unicode values.

vah-doc = *LWSP *var-def
var-def = varname *LWSP "=" *LWSP [value *LWSP] [subtree *LWSP]
subtree = "{" *LWSP *var-def "}"

varname = ALPHA *(ALPHA / DIGIT / "-" / ":")
value   = DQUOTE *(regchar / "\" DQUOTE / "\\" / CRLF) DQUOTE

; Whitespace and all printable Unicode characters other than double
; quote or backslash.
regchar = WSP / %x21 / %x23-5B / %x5D-7E
        / %x80-D7FF / %xE000-FFFD / %x10000-10FFFF
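The two character-level rules can be checked with a few comparisons. The sketch below is one reading of the grammar, not a reference implementation: the function names are made up for illustration, vah_is_regchar() follows the comment on regchar (whitespace plus printable characters other than double quote and backslash), and it assumes the upper bound is U+10FFFF, the largest Unicode scalar value.

```c
#include <stdint.h>

/* Hypothetical helper: ALPHA from RFC 2234, independent of the locale. */
static int vah_is_alpha(char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Nonzero if s matches varname: a letter, then letters, digits, '-' or ':'. */
static int vah_is_varname(const char *s)
{
  if (!vah_is_alpha(*s))
    return 0;
  for (s++; *s != '\0'; s++)
    {
      if (!vah_is_alpha(*s) && !(*s >= '0' && *s <= '9')
          && *s != '-' && *s != ':')
        return 0;
    }
  return 1;
}

/* Nonzero if the Unicode scalar value cp may appear unescaped in a value. */
static int vah_is_regchar(uint32_t cp)
{
  if (cp == 0x09 || cp == 0x20)   /* WSP: tab or space */
    return 1;
  if (cp == 0x22 || cp == 0x5C)   /* double quote and backslash need escaping */
    return 0;
  if (cp >= 0x21 && cp <= 0x7E)   /* remaining printable ASCII */
    return 1;
  return (cp >= 0x80 && cp <= 0xD7FF)
    || (cp >= 0xE000 && cp <= 0xFFFD)
    || (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

A parser working on raw UTF-8 bytes would need to decode each sequence to a scalar value before calling vah_is_regchar().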

For the less formally inclined, here are English descriptions of each of the grammar elements. Whitespace may be included at any point in a VAH document without changing the meaning of a document, except inside a variable name or string value.

vah-doc: A VAH document is a sequence of zero or more variable definitions.

var-def: A variable definition consists of a variable name, an equals sign, an optional string value, and an optional subtree.

subtree: A subtree consists of a left brace, zero or more variable definitions, and a right brace.

varname: A variable name is a sequence of ASCII letters, digits, dashes, and colons, starting with a letter.

value: A string value is a double quote, zero or more regular Unicode characters, escaped characters, or CRLF line breaks, and another double quote. To represent a double quote or backslash in a string value, precede it with a backslash; for example, the string value a"b\c must be written as "a\"b\\c".

regchar: A regular character is any printable Unicode character or whitespace, except for a backslash or double quote (since those characters must be quoted within a value).
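Going the other way, a program that writes VAH has to apply the quoting rule for values. Here is a minimal sketch of that step; vah_quote() and vah_put() are hypothetical names, and the snprintf-like convention (always NUL-terminate, return the length that was needed) is a design choice made for this example. It only handles the two escapes and does not check that the remaining characters are valid regchars.

```c
#include <stddef.h>

/* Store c at out[*n] if it fits (leaving room for a NUL), and count it. */
static void vah_put(char *out, size_t size, size_t *n, char c)
{
  if (*n + 1 < size)
    out[*n] = c;
  (*n)++;
}

/* Write s as a quoted VAH value into out, which holds size bytes.
 * Returns the number of characters the full result needs, not counting
 * the terminating NUL; the output was truncated if that is >= size. */
static size_t vah_quote(const char *s, char *out, size_t size)
{
  size_t n = 0;

  vah_put(out, size, &n, '"');
  for (; *s != '\0'; s++)
    {
      /* Double quote and backslash are the only escaped characters. */
      if (*s == '"' || *s == '\\')
        vah_put(out, size, &n, '\\');
      vah_put(out, size, &n, *s);
    }
  vah_put(out, size, &n, '"');
  if (size > 0)
    out[n < size ? n : size - 1] = '\0';
  return n;
}
```

Given the five-character string a"b\c, this produces the nine-character value "a\"b\\c" from the description above.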

VAH Implementations in C

Library Implementation

A complete library implementation, suitable for pretty much any application, can be found here, in 335 lines of C code. This implementation is in the public domain. The library uses an interface similar to expat's.

Sample Implementation

However, because VAH is so simple, it isn't really necessary to go to the trouble of using a library. Depending on an application's requirements, a smaller implementation may be appropriate. The following is included as an example of perhaps the simplest possible implementation: it relies on no infrastructure functions, allocates no memory, and defines no data structures. The result of a parse() call is a sequence of calls to handle_var(); the handle_var calls for the example near the start of this page would be:

handle_var("person", NULL, 0);
handle_var("name", "Gregory B. Hudson", 1);
handle_var("nickname", "Greg", 2);
handle_var("email", "ghudson@mit.edu", 1);
handle_var("place", "MIT", 0);
handle_var("name", "Massachusetts Institute of Technology", 1);
handle_var("nothing", NULL, 0);
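The application supplies handle_var() itself. As an illustration, here is one possible definition that prints each variable indented by its depth. The signature is inferred from the calls listed above, and format_var() is a hypothetical helper split out so the formatting can be checked separately. Values are printed as received, without re-escaping quotes or backslashes.

```c
#include <stdio.h>

/* Hypothetical helper: render one callback as an indented line in buf.
 * Returns what snprintf() returns. */
static int format_var(char *buf, size_t size,
                      const char *name, const char *value, int depth)
{
  /* "%*s" with an empty string emits depth * 2 spaces of indentation. */
  if (value != NULL)
    return snprintf(buf, size, "%*s%s = \"%s\"", depth * 2, "", name, value);
  return snprintf(buf, size, "%*s%s =", depth * 2, "", name);
}

/* One possible handle_var(): print each variable on its own line. */
static void handle_var(const char *name, const char *value, int depth)
{
  char line[256];

  format_var(line, sizeof(line), name, value, depth);
  puts(line);
}
```

A real application would more likely compare name against the variables it expects and store value somewhere, using depth to tell a top-level name from a nested one.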

Here is the implementation (which is also in the public domain):

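#include <ctype.h>

/* Not shown on this page; the application must define both.  The
 * prototypes are assumptions inferred from how parse() calls them. */
static char *skip_ws(char *p);
static void handle_var(const char *name, const char *value, int depth);

/* Note: stomps on the contents of data. */
static int parse(char *data)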
{
  char *p, *q, *name, *name_end, *value;
  int depth = 0;

  p = skip_ws(data);
  while (*p != '\0')
    {
      if (*p == '}')
	{
	  /* End of a subtree. */
	  if (depth == 0)
	    return -1;
	  depth--;
	  p = skip_ws(p + 1);
	  continue;
	}
      else if (!isalpha((unsigned char) *p))
	return -1;

      /* Read the name part of the next variable (digits are allowed
       * after the first letter). */
      name = p;
      while (isalnum((unsigned char) *p) || *p == '-' || *p == ':')
	p++;
      name_end = p;
      p = skip_ws(p);
      if (*p != '=')
	return -1;
      *name_end = '\0';
      p = skip_ws(p + 1);

      if (*p == '\"')
	{
	  /* There's a value.  Read it. */
	  value = ++p;
	  q = p;
	  while (*p != '"')
	    {
	      if (*p == '\\' && (*(p + 1) == '\\' || *(p + 1) == '\"'))
		{
		  *q++ = *(p + 1);
		  p += 2;
		}
	      else if (*p == '\r' && *(p + 1) == '\n')
		{
		  *q++ = *p++;
		  *q++ = *p++;
		}
	      else if (*p == '\\' || *p == '\177'
		       || ((unsigned char) *p < ' ' && *p != '\t'))
		/* Stray backslash, control character, or end of data. */
		return -1;
	      else
		*q++ = *p++;
	    }
	  *q = '\0';
	  p = skip_ws(p + 1);
	}
      else
	value = NULL;

      handle_var(name, value, depth);

      if (*p == '{')
	{
	  /* There's a subtree; note the increase in depth. */
	  depth++;
	  p = skip_ws(p + 1);
	}
    }

  return (depth != 0) ? -1 : 0;
}
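parse() calls skip_ws(), which is not shown on this page. A minimal sketch of what it might look like follows, assuming that whitespace means space, tab, CR, and LF; this is slightly more permissive than LWSP in the grammar, which requires a CRLF to be followed by a space or tab.

```c
/* Assumed definition of skip_ws(): return a pointer to the first
 * character at or after p that is not a space, tab, CR, or LF.
 * Stops at the terminating NUL. */
static char *skip_ws(char *p)
{
  while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
    p++;
  return p;
}
```

With this helper and a handle_var() placed above parse() in the same file, the parser can be run on any NUL-terminated buffer. Because parse() writes terminators into its input, the buffer must be writable: a char array or allocated memory, not a string literal.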