VAH: Variable Assignment Hierarchy

VAH is a data representation format intended to replace XML for marshalling application data.

Advantages of VAH over XML

VAH Example

This example illustrates essentially the full complexity of VAH:

person = {
  name = "Gregory B. Hudson" {
    nickname = "Greg"
  }
  email = "ghudson@mit.edu"
}
place = "MIT" {
  name = "Massachusetts Institute of Technology"
}
nothing =

That's it. You can see that a VAH document is a sequence of variable definitions; a variable definition can include a string value, a subtree of variable definitions, or both or neither of those things.
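
As an illustration of that shape, here is a minimal sketch of how an application might hold a parsed document in memory. The struct vah_node type and the vah_count() function are hypothetical names invented for this example; they are not part of VAH or of the library described below. The sketch also assumes that an empty subtree need not be distinguished from an absent one.

```c
#include <stddef.h>

/* Hypothetical in-memory form of one variable definition.  Both value
   and children are optional, matching "both or neither" above. */
struct vah_node {
  const char *name;           /* variable name, never NULL */
  const char *value;          /* string value, or NULL if absent */
  struct vah_node *children;  /* first definition in the subtree, or NULL */
  struct vah_node *next;      /* next sibling definition, or NULL */
};

/* Count the definitions in a sibling list, descending into subtrees. */
static size_t vah_count(const struct vah_node *n)
{
  size_t total = 0;

  for (; n != NULL; n = n->next)
    total += 1 + vah_count(n->children);
  return total;
}
```

In this representation the "person" variable from the example has a NULL value and a two-element child list, while "nothing" has neither a value nor children.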

VAH Grammar

Here is a definition of the VAH grammar in ABNF, which is defined in RFC 2234. VAH documents are represented in UTF-8, which is defined in RFC 2279; the grammar below is written in terms of 32-bit scalar Unicode values.

vah-doc = *LWSP *var-def
var-def = varname *LWSP "=" *LWSP [value *LWSP] [subtree *LWSP]
subtree = "{" *LWSP *var-def "}"

varname = ALPHA *(ALPHA / DIGIT / "-" / ":")
value   = DQUOTE *(regchar / "\" DQUOTE / "\\" / CRLF) DQUOTE

; Whitespace and all printable Unicode characters other than double
; quote or backslash.
regchar = WSP / %x21 / %x23-5B / %x5D-7E
        / %x80-D7FF / %xE000-FFFD / %x10000-10FFFF

For the less formally inclined, here are English descriptions of each of the grammar elements. Whitespace may be included at any point in a VAH document without changing the meaning of a document, except inside a variable name or string value.
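
The varname and value rules can also be read from the writer's side. The following is a minimal sketch, not taken from any existing VAH implementation: vah_valid_name() checks a name against the varname rule, and vah_quote() wraps a string in double quotes, escaping double quotes and backslashes as the value rule requires. Both function names are invented for this example, and vah_quote() assumes the caller has already made sure the input contains no disallowed control characters.

```c
#include <stddef.h>

/* Return 1 if name matches varname = ALPHA *(ALPHA / DIGIT / "-" / ":").
   ABNF ALPHA and DIGIT are ASCII-only, so ranges are tested directly. */
static int vah_valid_name(const char *name)
{
  const unsigned char *p = (const unsigned char *) name;

  if (!((*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z')))
    return 0;
  for (p++; *p != '\0'; p++)
    {
      if (!((*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z')
            || (*p >= '0' && *p <= '9') || *p == '-' || *p == ':'))
        return 0;
    }
  return 1;
}

/* Write value into out as a quoted VAH string, putting a backslash
   before each double quote and backslash.  Returns the number of
   characters written (not counting the terminator), or -1 if out,
   which holds outsize bytes, is too small. */
static long vah_quote(const char *value, char *out, size_t outsize)
{
  size_t n = 0;

  if (outsize < 3)
    return -1;
  out[n++] = '"';
  for (; *value != '\0'; value++)
    {
      size_t esc = (*value == '"' || *value == '\\') ? 1 : 0;

      /* Need room for this character (plus its escape), the closing
         quote, and the terminator. */
      if (n + esc + 3 > outsize)
        return -1;
      if (esc)
        out[n++] = '\\';
      out[n++] = *value;
    }
  out[n++] = '"';
  out[n] = '\0';
  return (long) n;
}
```

A writer could call vah_valid_name() on each name, then emit the name, " = ", and the result of vah_quote() to produce one variable definition.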

VAH Implementations in C

Library Implementation

A complete library implementation, suitable for pretty much any application, can be found here, in 335 lines of C code. This implementation is in the public domain. The library uses an interface similar to expat's.

Sample Implementation

However, because VAH is so simple, it isn't really necessary to go to the trouble of using a library. Depending on an application's requirements, a smaller implementation may be appropriate. The following is included as an example of perhaps the simplest possible implementation: it relies on no infrastructure beyond a small whitespace-skipping helper (skip_ws(), not shown), allocates no memory, and defines no data structures. The result of a parse() call is a sequence of calls to handle_var(), each passing a variable's name, its value (or NULL if it has none), and its nesting depth. For the example near the start of this page, the calls would be:

handle_var("person", NULL, 0);
handle_var("name", "Gregory B. Hudson", 1);
handle_var("nickname", "Greg", 2);
handle_var("email", "ghudson@mit.edu", 1);
handle_var("place", "MIT", 0);
handle_var("name", "Massachusetts Institute of Technology", 1);
handle_var("nothing", NULL, 0);
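
Here is the implementation (which is also in the public domain):

#include <ctype.h>   /* character classification */
#include <stddef.h>  /* NULL */

/* Note: stomps on the contents of data.  skip_ws() and handle_var()
   are not shown here and must be defined earlier in the file. */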
static int parse(char *data)
{
  char *p, *q, *name, *name_end, *value;
  int depth = 0;

  p = skip_ws(data);
  while (*p != '\0')
    {
      if (*p == '}')
	{
	  /* End of a subtree. */
	  if (depth == 0)
	    return -1;
	  depth--;
	  p = skip_ws(p + 1);
	  continue;
	}
      else if (!isalpha((unsigned char) *p))
	return -1;

      /* Read the name part of the next variable. */
      name = p;
      /* The grammar allows digits after the first character. */
      while (isalnum((unsigned char) *p) || *p == '-' || *p == ':')
	p++;
      name_end = p;
      p = skip_ws(p);
      if (*p != '=')
	return -1;
      *name_end = '\0';
      p = skip_ws(p + 1);

      if (*p == '\"')
	{
	  /* There's a value.  Read it. */
	  value = ++p;
	  q = p;
	  while (*p != '"')
	    {
	      if (*p == '\\' && (*(p + 1) == '\\' || *(p + 1) == '\"'))
		{
		  *q++ = *(p + 1);
		  p += 2;
		}
	      else if (*p == '\r' && *(p + 1) == '\n')
		{
		  *q++ = *p++;
		  *q++ = *p++;
		}
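	      /* Per the grammar, reject a stray backslash, DEL, and any
		 control character other than tab. */
	      else if (*p == '\\' || *p == '\177'
		       || ((unsigned char) *p < ' ' && *p != '\t'))
		return -1;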
	      else
		*q++ = *p++;
	    }
	  *q = '\0';
	  p = skip_ws(p + 1);
	}
      else
	value = NULL;

      handle_var(name, value, depth);

      if (*p == '{')
	{
	  /* There's a subtree; note the increase in depth. */
	  depth++;
	  p = skip_ws(p + 1);
	}
    }

  return (depth != 0) ? -1 : 0;
}
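
The listing above calls skip_ws() and handle_var() without defining them. The following is one possible pair of definitions, offered as a sketch rather than as the original author's code. The signatures are inferred from how parse() uses them. This skip_ws() is more lenient than the grammar's LWSP rule, which only permits a line break when more whitespace follows it, and this handle_var() simply prints each variable indented by its depth without re-escaping the value.

```c
#include <stdio.h>

/* Assumed helper: advance past spaces, tabs, carriage returns and line
   feeds, and return a pointer to the first character that is none of
   those (possibly the terminating '\0'). */
static char *skip_ws(char *p)
{
  while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
    p++;
  return p;
}

/* Assumed callback: print the variable name, indented two spaces per
   level of depth, followed by its value if it has one. */
static void handle_var(const char *name, const char *value, int depth)
{
  printf("%*s%s", depth * 2, "", name);
  if (value != NULL)
    printf(" = \"%s\"", value);
  putchar('\n');
}
```

To build a complete program, these definitions would need to appear before parse() in the same source file, since parse() refers to both of them by name.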