Deep Magic with Lex and Yacc, part 2

*****
Some non-regular cases
Variant lexing rules in special contexts
Expensive cases

*****

redeclaring yylex with YY_DECL
#define YY_DECL float yylex (int a, int b)
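
A minimal sketch of YY_DECL in use, assuming flex.  The name next_token,
its value parameter, and the NUMBER code are made up for illustration.
The macro has to be defined in the definitions section so that it is
seen before flex emits the scanner function.

```lex
%{
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token code, chosen only for this example. */
enum { NUMBER = 258 };

/* Rename the scanner and give it an out-parameter (both hypothetical). */
#define YY_DECL int next_token(long *value)
%}
%option noyywrap
%%
[0-9]+    { *value = strtol(yytext, NULL, 10); return NUMBER; }
.|\n      { /* skip everything else */ }
%%
int main(void)
{
    long v;

    /* next_token returns 0 at end of input, which ends the loop. */
    while (next_token(&v) == NUMBER)
        printf("%ld\n", v);
    return 0;
}
```

Inside the actions, the parameters named in YY_DECL are ordinary local
variables of the scanner function.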

*****

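yyrestart (FILE *): point yyin at this file and set the scanner up to
read from it

The lexer calls
  YY_INPUT(buf, result, max_size)
to get input.  This is a macro: it reads up to max_size bytes into the
char array buf and sets result to the number of bytes read, or to
YY_NULL (0) at end of file.

Upon EOF, the scanner calls `yywrap'.  If it returns 0, then scanning
continues (yywrap is expected to have pointed yyin at more input first);
if it returns 1, then scanning halts.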
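
A sketch of overriding YY_INPUT, assuming flex.  This version reads one
character at a time with getchar() instead of doing block reads, which
can be handy for interactive input.  The word rule is only there to give
the scanner something to do; %option noyywrap makes the scanner behave
as if yywrap always returned 1, so none has to be supplied.

```lex
%{
#include <stdio.h>

/* Replace the default block read with a single getchar() call.
   result receives the byte count, or YY_NULL at end of file. */
#define YY_INPUT(buf, result, max_size) \
    { \
        int c = getchar(); \
        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
    }
%}
%option noyywrap
%%
[a-z]+    { printf("word: %s\n", yytext); }
.|\n      { /* ignore everything else */ }
%%
int main(void)
{
    yylex();
    return 0;
}
```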

******

Multiple input buffers:
  YY_BUFFER_STATE yy_create_buffer (FILE * foo, int size)
size should be YY_BUF_SIZE.

yy_switch_to_buffer
yy_delete_buffer
yy_flush_buffer

YY_CURRENT_BUFFER macro

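scanning in-memory strings:
 yy_scan_string (const char *)      NUL-terminated string
 yy_scan_bytes (const char *, int)  explicit length, may contain NULs
Both set up a new buffer and switch to it; they return a YY_BUFFER_STATE.
These copy the string, so the original is left untouched.
 yy_scan_buffer (char *, yy_size_t) does not copy the buffer; it scans
   in place.
   The last two bytes must be YY_END_OF_BUFFER_CHAR (NUL); they are not
   scanned.
Delete the buffer with yy_delete_buffer when done.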
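
A small sketch of scanning an in-memory string, assuming flex.  The
token classes and the input string are invented for the example.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
%%
[0-9]+      { printf("NUMBER %s\n", yytext); }
[-+*/()]    { printf("OP %s\n", yytext); }
[ \t\n]+    { /* skip whitespace */ }
.           { printf("UNKNOWN %s\n", yytext); }
%%
int main(void)
{
    /* yy_scan_string copies the string and makes the new buffer current. */
    YY_BUFFER_STATE buf = yy_scan_string("12 + (34 * 5)");

    yylex();                  /* runs until the end of the string */
    yy_delete_buffer(buf);    /* release the copy */
    return 0;
}
```

The call to yy_scan_string has to live in the user code section of the
.l file (or in a file that sees the generated declarations), since
YY_BUFFER_STATE is defined by the generated scanner.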

*****

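Declare start conditions with %s (inclusive) or %x (exclusive) in the
definitions section.  In an inclusive condition, rules with no start
condition are active too; in an exclusive condition, only rules tagged
with that condition are active.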

Start conditions are activated with BEGIN(NAME). 

<CONDITION-NAME>regexp
<*> matches every condition

Series of start conditions can be indicated with
<SC,...>{
foo
bar
baz
}

special condition INITIAL (also 0)

YY_START gives the current condition (as an integer)
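
A sketch of an exclusive start condition, assuming flex: it prints the
contents of double-quoted strings, one per line, and ignores everything
else.  The condition name STR and the output format are made up for the
example.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
%x STR
%%
\"                { BEGIN(STR); }
<STR>\"           { BEGIN(INITIAL); putchar('\n'); }
<STR>\\(.|\n)     { ECHO;  /* backslash escape, kept verbatim */ }
<STR>[^\\\"\n]+   { ECHO;  /* ordinary string characters */ }
<STR>\n           { fprintf(stderr, "unterminated string\n");
                    BEGIN(INITIAL); }
.|\n              { /* outside strings: ignore */ }
%%
int main(void)
{
    yylex();
    return 0;
}
```

Because STR is exclusive, the untagged rules (the opening quote and the
final catch-all) are switched off while the scanner is inside a string.
With %s instead, they would stay active and compete with the <STR>
rules.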

*****


REJECT -> proceed to next-best match (this is a branch) VERY expensive
yymore() -> the next matched token is appended to yytext instead of
            replacing it

YY_FLUSH_BUFFER (shorthand for yy_flush_buffer (YY_CURRENT_BUFFER))

Special actions:

yyless(n) -> push back all but the first N characters of the current token
unput(c) -> push back character C
input() -> read one character from stream
yyterminate() -> stop scanning; equivalent to return 0.
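
A sketch of yyless and yymore in action, assuming flex.  The words are
arbitrary.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
%%
foobar    { ECHO; yyless(3);  /* keep "foo", push "bar" back */ }
mega-     { ECHO; yymore();   /* next match is appended to "mega-" */ }
kludge    { ECHO; }
[a-z]+    { ECHO; }
%%
int main(void)
{
    yylex();
    return 0;
}
```

On input "foobar" this writes "foobarbar": the first rule echoes the
whole token, then the pushed-back "bar" is rescanned and echoed by the
last rule.  On input "mega-kludge" it writes "mega-mega-kludge": the
second ECHO sees yytext with "mega-" still in front.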


*****

How to scan Lisp.

How to scan C.

*****

Here is a scanner which recognizes (and discards) C comments while
maintaining a count of the current input line.

%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);

******

HW: 
1: extend wc to read multiple files
2: tokenize calc
Extra: Scan C.
