Issues to deal with
-------------------

1. State should contain instr and info pointer for faster access.
   Store and retrieve when we push and pop the intern stack.


Fume internal documentation

Strings
-------

As often as possible, we use one of the following string structures to
handle strings:

        String
        Shared_string
        Stored_string
        Substring

A String is just a counted string.  A Shared_string is a String with a
reference count that tracks how many copies of it are stored.  A
Stored_string is a pointer to a Shared_string, stored in a hash table.
A Substring is a pointer to a Shared_string together with start and end
indices.

All four types of strings are handled by string.c.  Shared_string and
Stored_string are reference-counted objects.  Substrings are only used
during interpretation.

Note that Strings are *not* null-terminated, and we should never read
or write beyond the end of a string.

We fall back on the standard null-terminated C string, usually referred
to as a 'buf', in three cases:

  (1) functions that accept string literal arguments, generally
      communications functions like error handlers,
  (2) the format_buf() function, which is usually used as an
      intermediary in calling functions that accept string literal
      arguments, and
  (3) the compiler, where we use a pile malloc which makes it
      inconvenient to use Strings.  The compiler also makes use of a
      lot of short strings, making the reduced overhead of a
      null-terminated buffer a more noticeable win, while the speed
      loss in taking the length of the string is unimportant.

Memory management
-----------------

Many objects in the Fume code use reference counts for memory
management.  Each reference-counted object has two functions associated
with it, one to indicate that a pointer to the object has been copied,
and one to indicate that a pointer to the object has been destroyed.
These functions are called register_<type>_copy() and discard_<type>(),
where <type> is the type of the object without the initial caps.
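
A minimal sketch of such a register/discard pair, written for a
hypothetical type Widget that is not part of Fume (so the two functions
become register_widget_copy() and discard_widget()).  The field names,
the constructor make_widget(), and the widgets_alive counter are
assumptions made for illustration only; the real objects may be laid
out quite differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted type, invented for this example;
 * it does not appear in the Fume sources. */
typedef struct widget {
    int refs;      /* number of registered copies of the pointer */
    int payload;   /* stand-in for the object's real contents */
} Widget;

static int widgets_alive = 0;   /* bookkeeping for the example only */

/* Hypothetical constructor: the returned pointer counts as one copy. */
Widget *make_widget(int payload)
{
    Widget *w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    w->refs = 1;
    w->payload = payload;
    widgets_alive++;
    return w;
}

/* register_<type>_copy(): a pointer to the object has been copied. */
Widget *register_widget_copy(Widget *w)
{
    w->refs++;
    return w;
}

/* discard_<type>(): a pointer to the object has been destroyed.
 * The object is freed once the last registered copy is discarded. */
void discard_widget(Widget *w)
{
    assert(w->refs > 0);
    if (--w->refs == 0) {
        widgets_alive--;
        free(w);
    }
}
```
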
For instance, Shared_string reference counts are managed by
register_shared_string_copy() and discard_shared_string().

Functions that return pointers to reference-counted objects should
register a copy before returning them.  Functions that accept pointers
to reference-counted objects as arguments should register a copy of
them if they are stored, and should never discard them.

If we can prove that, during the time in which we use an object or part
of an object, there will always be at least one registered copy of it
stored somewhere else, we do not need to register and discard it.  We
say the object is "anchored".  For instance, see the note in data.h on
the .u.prop.name element of the Lval structure.

Compiler
--------

An FC program starts out as text in a program object.  Compilation
originates in editor.c, as compiling is an editor command.  The editor
calls compile() in fc.y.  compile() calls yyparse(), generated from
fc.y, to do the parsing.  yyparse() calls yylex(), generated from
fc.lex, to obtain a stream of tokens.  yylex() uses copy_text() in
editor.c to obtain text from the current editor.

yyparse() parses the tokens obtained from yylex() and translates them
into a parse tree using the constructors in ftree.c.  Only ftree.c
knows the internal structure of parse nodes, and only fc.y makes any
use of these constructor functions.  yyparse() returns with the parse
tree in the_tree.  If there has been a syntax error at this point,
compile() aborts; otherwise, it passes the parse tree to
generate_fs_prog() in ftree.c to create a stack machine program.

ftree.c walks the parse tree, calling the coding functions in fstack.c
to create the stack machine program.  If there is a semantic error in
this stage of the compilation, generate_fs_prog() returns NULL;
otherwise, it calls scan_code() in fstack.c to finish up the coding
process and return the finished program.
generate_fs_prog() returns this program to compile(), which returns it
to editor.c, which installs the compiled program into the program
object.  Only fstack.c knows the structure of the program object.

The following is a schematic of the flow of information and control
through the compiler:

                         ___________
                        |           |        copy_text()
                        | editor.c  |<---------------------|
                        |___________|-------------------|  |
                           |     ^      Characters      |  |
                           |     |                      |  |
                 compile() |     | Program object       |  |
                           |     |                      |  |
                           V     |                      V  |
  Constructor functions  ___________     yylex()      ___________
|-----------------------|           |--------------->|           |
|  |------------------->|   fc.y    |<---------------|  fc.lex   |
|  |   Parse nodes      |___________|     Tokens     |___________|
|  |                       |     ^
|  |                       |     |
|  |    generate_fs_prog() |     | Program object
|  |     (with parse tree) |     |
|  |                       V     |
|  |                     ___________
|  |--------------------|           |
|---------------------->|  ftree.c  |-------------------|
                        |___________|                   |
                           |     ^                      |
                           |     |                      |
               scan_code() |     | Program              | Coding functions
                           |     | object               |
                           V     |                      |
                         ___________                    |
                        |           |                   |
                        | fstack.c  |<------------------|
                        |___________|

FS Programs
-----------

The internals of FS programs are handled entirely by fstack.c.  An FS
program consists of an instruction sequence, an information sequence,
and some information about the first function to be run by the program.

The instruction sequence is used to determine which of the instruction
functions in fstack.c will be run at each stage of program
interpretation.  An instruction is a character, which is treated as a
bit field eight bits long.  The six least significant bits determine
the instruction type; the two most significant bits are flags.  Most
instructions that produce data values treat the first flag bit as an
'lval' flag and the second as a 'stack' flag.  The lval flag indicates
that the data value should be placed in the top lval on the lvals
stack, which should then be popped.  The stack flag indicates that the
data value should be pushed onto the data stack.

Some instructions take arguments from the information sequence.
This is a sequence of Info unions.  Jump instructions must set the
information pointer as well as the instruction pointer.

Interpreter
-----------

The interpreter is contained in fstack.c.  All information necessary to
determine the next action to perform in an FS program is contained in a
State structure.  A State consists of six stacks, an instruction
pointer and an information pointer, and some information pertaining to
suspended states.  The six stacks hold data, lvals, variables,
generating call frames, internal call frames, and program call frames.
The top of the internal call stack determines the function that is
currently being executed.
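
As a rough illustration of the instruction encoding described under
"FS Programs" and of the paired instruction and information pointers
kept in a State, the following sketch shows one way the pieces could
fit together.  Only the six-bit type field and the existence of two
flag bits come from the text above.  Which of the two top bits is the
lval flag, the members of the Info union, the Cursor structure, and
every function name here are assumptions invented for the example and
are not taken from fstack.c.

```c
#include <assert.h>

/* Six least significant bits hold the instruction type (from the text). */
#define INSTR_TYPE_MASK   0x3F
/* ASSUMPTION: the assignment of the two top bits to the flags is a guess. */
#define INSTR_LVAL_FLAG   0x80
#define INSTR_STACK_FLAG  0x40

unsigned instr_type(unsigned char instr)
{
    return instr & INSTR_TYPE_MASK;
}

int instr_wants_lval(unsigned char instr)
{
    return (instr & INSTR_LVAL_FLAG) != 0;
}

int instr_wants_stack(unsigned char instr)
{
    return (instr & INSTR_STACK_FLAG) != 0;
}

/* Hypothetical helper that packs a type and the two flags into a byte. */
unsigned char make_instr(unsigned type, int lval, int stack)
{
    assert(type <= INSTR_TYPE_MASK);
    return (unsigned char)(type
                           | (lval ? INSTR_LVAL_FLAG : 0)
                           | (stack ? INSTR_STACK_FLAG : 0));
}

/* Hypothetical Info union; the real members are not described above. */
typedef union info {
    long number;
    struct {
        int instr_offset;   /* where to resume in the instruction sequence */
        int info_offset;    /* where to resume in the information sequence */
    } jump;
} Info;

/* Hypothetical pair of pointers, standing in for the two held in a State. */
typedef struct cursor {
    const unsigned char *instr;   /* instruction pointer */
    const Info *info;             /* information pointer */
} Cursor;

/* A jump has to move both pointers, or later instructions would read
 * their arguments from the wrong place in the information sequence. */
void do_jump(Cursor *c, const unsigned char *code, const Info *infos,
             Info arg)
{
    c->instr = code + arg.jump.instr_offset;
    c->info = infos + arg.jump.info_offset;
}
```

With this layout, make_instr(5, 1, 0) yields an instruction of type 5
whose result goes to the top lval and is not pushed on the data stack.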