Deep Magic with Lex and Yacc, part 1.

Course introduction.

Program-generating programs.

  foo.l produces lex.yy.c
  lex.yy.c contains a function yylex

  you must link with -lfl (-ll with traditional lex) in order to define
  some functions that yylex uses, such as yywrap.
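
  A minimal sketch of that pipeline, assuming flex and a C compiler
  invoked as cc. The single rule is illustrative only; text that
  matches no rule is copied to the output by the default rule.

```lex
/* foo.l: a tiny scanner specification.
 *
 * Build (assuming flex and cc are installed):
 *   flex foo.l           writes lex.yy.c, which defines yylex()
 *   cc lex.yy.c -lfl     libfl supplies a default main() and yywrap()
 */
%%
[0-9]+    printf("<number %s>", yytext);
```

  On systems that ship traditional lex, the library is usually -ll
  instead of -lfl.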

Regular expressions.

  C.S. definition: character, disjunction, concatenation, closure (Kleene star)

  x
  .		(not newline)
  [xyz]
  [abj-oZ]
  [^A-Z]
  [^A-Z\n]
  F*
  F+
  F?
  F{2,5}
  F{2,}
  F{4}
* {name}
  "[xyz]\"foo"
  \x for a, b, f, n, r, t, v  (the ANSI C escape)
  \x for anything else  (a literal x)
  \0
  \123
  \x2a
  (F)
  FB
  F|B
  F/B
  ^F
  F$	(F/\n)
* <B>F	
* <A,B,C>F  (disjunction)
* <*>F
  <<EOF>>
* <A,B><<EOF>>

  precedence
 
  [:alnum:] [:alpha:] [:print:], etc.  (inside a character class, e.g. [[:alnum:]_])

  only one trailing context
  dangerous trailing context (zx*/xy*)
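
  A few of the pattern forms above, put together as a sketch. The names
  DIGIT and ID and the printed labels are invented for this example; it
  assumes flex and linking with -lfl for main and yywrap.

```lex
/* Illustrative patterns only; DIGIT and ID are made-up names. */
%{
#include <stdio.h>
%}
DIGIT   [0-9]
ID      [A-Za-z_][A-Za-z0-9_]*
%%
{DIGIT}+"."{DIGIT}*    printf("float: %s\n", yytext);
{DIGIT}+               printf("int: %s\n", yytext);
{ID}/"("               printf("call: %s\n", yytext);  /* "(" is trailing context, not consumed */
{ID}                   printf("id: %s\n", yytext);
^"#".*$                /* discard a line that starts with # */
[[:space:]]+           /* skip whitespace, including newlines */
.                      printf("other: %s\n", yytext);
```

  The scanner takes the longest match, counting any trailing context,
  and breaks ties in favor of the earlier rule. So `foo(' is reported
  as a call and a bare `foo' as an id.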
  
Input file syntax

  definitions
  %%
  rules
  %%
  user code

  1. Definitions are `name  definition'
  2. Rules are `pattern action'
  3. User code can be included with %{ and %} in definitions and rules.
     (Header files, etc.)

  Actions:
    a missing action means do nothing (the matched text is discarded)
    if no pattern is matched, default rule is used: "copy to output"

  Syntax:
    action is text through \n, unless it contains a {, then it's
      text to matching brace.
    action of | means "same as next rule"
    special action ECHO

  yytext is the string which matched
  yyleng is the length that matched

  %array  (yytext is a char array rather than a char pointer; %pointer is the default)

  call yylex to invoke the scanner
  scanner reads from yyin (a stdio text stream; defaults to stdin)
  scanner writes to yyout (defaults to stdout)

  <<EOF>> rules are special: they cannot be combined with other
  patterns, only qualified by start conditions
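
  The pieces of this section can be sketched in one file. The name WORD
  and the output format are invented for the example. Because it
  defines its own main and yywrap, it should not need -lfl.

```lex
%{
/* Definitions section: this block is copied verbatim into lex.yy.c. */
#include <stdio.h>
%}

WORD    [A-Za-z]+

%%

{WORD}    {
              /* a braced action may span several lines */
              fprintf(yyout, "[%s:%d]", yytext, (int) yyleng);
          }
" "       |
\t        putc('_', yyout);    /* the | above shares this action */
\n        ECHO;                /* write yytext to yyout unchanged */

%%

/* User code section. */
int yywrap(void) { return 1; }    /* no further input files */

int main(int argc, char **argv)
{
    /* read from the named file if one is given, else from stdin */
    if (argc > 1 && (yyin = fopen(argv[1], "r")) == NULL) {
        perror(argv[1]);
        return 1;
    }
    yylex();    /* runs until end of input */
    return 0;
}
```

  Characters that match none of the rules (digits, punctuation) fall
  through to the default rule and are copied to yyout.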

Use of scanners in filters.
  0. %%
  1. replace `username' with login name.
     %%
     username  printf ("%s", getlogin());
  2. jive and valspeak
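
  A fuller version of the `username' filter, as a sketch. The "unknown"
  fallback is an assumption added here, since getlogin() can return
  NULL; link with -lfl for main and yywrap.

```lex
%{
#include <stdio.h>
#include <unistd.h>    /* getlogin() */
%}
%%
username    {
                const char *name = getlogin();
                /* getlogin() may return NULL, e.g. with no controlling terminal */
                fputs(name ? name : "unknown", yyout);
            }
```

  Everything other than the literal text `username' passes through
  unchanged via the default rule.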

Use of scanners in text processors.
  1. Count occurrences of a particular word
  2. anything you use awk for
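
  Counting one word might look like the sketch below. The target word
  `lex' and the variable hits are chosen for illustration, and %option
  noyywrap is a flex feature (traditional lex would need a yywrap
  function instead).

```lex
%{
#include <stdio.h>
static long hits = 0;    /* occurrences of the word seen so far */
%}
%option noyywrap
%%
lex            hits++;
[A-Za-z]+      /* any other word: discard, so "flex" is not counted */
.|\n           /* discard everything else */
%%
int main(void)
{
    yylex();
    printf("%ld\n", hits);
    return 0;
}
```

  The second rule matters: without it the scanner would find `lex'
  inside longer words. With it, the longest-match rule makes `flex' or
  `lexer' match as a whole word and be thrown away.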

Some regexps of interest:
  1. Pascal strings
  2. C++ comments
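
  One possible pair of patterns, as a sketch. It assumes Pascal strings
  are single-quoted, stay on one line, and write an embedded quote as
  two quotes, and it ignores backslash-newline continuation in C++
  comments. %option noyywrap is a flex feature.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
%%
'([^'\n]|'')*'    printf("pascal string: %s\n", yytext);
"//".*            printf("c++ comment: %s\n", yytext);
.|\n              /* ignore everything else */
%%
int main(void)
{
    yylex();
    return 0;
}
```

  Since `.' never matches a newline, the comment pattern stops at the
  end of the line without needing an explicit [^\n].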
       

Homework:

1. Write a program that works like wc.  Read from standard input, and
produce a count of lines, characters, and words.  Numbers should be
exactly those that wc produces.  To test your program, run it on
text files, computer programs, binary files, etc., and make sure your
output is exactly that of wc.  After this works, add options -c, -l,
and -w, and accept a filename on the command line.

2. Write a scanner for calculator input.  It's not particularly
important what the output is (at this stage).  But you should
recognize patterns at least for integer literals, identifiers, some
operators (like +, -, *, /, etc.; match ** for exponentiation too),
parentheses, and square brackets.

