[Commit] mint/doc mintsec.tex,1.1,1.2

James LaMar commit at keithp.com
Tue Nov 30 22:44:33 PST 2004

Committed by: jlamar

Update of /local/src/CVS/mint/doc
In directory home.keithp.com:/tmp/cvs-serv12187

Modified Files:
Log Message:

Index: mintsec.tex
RCS file: /local/src/CVS/mint/doc/mintsec.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -d -r1.1 -r1.2
--- mintsec.tex	29 Nov 2004 19:59:06 -0000	1.1
+++ mintsec.tex	1 Dec 2004 06:44:30 -0000	1.2
@@ -101,4 +101,85 @@
 MINT was able to generate parsers for the regular expression and BNF
 syntaxes fairly easily. Attempting to generate a parse table for a
-small Nickle subset proved to be more difficult, however.
+small Nickle subset proved to be more difficult, however.  The
+implementation of the parser generator was fairly na\"ive -- as 
+this particular version was intended to be a proof of concept more
+than an efficient implementation of the canoncial LR(1) parsing
+algorithm.  However, the generated parser and lexer were more than
+fast enough for general use.
+\section{Translators in MINT}
+A few simple examples should be adequate to demonstrate the essentials
+of the MINT parser generator language.
+Each grammar file is divided into three sections -- tokens,
+precedence, and rules.  The tokens section specifies an identifier for
+each token, along with a regular expression specifying the pattern
+of input associated with that token.  In short, the token section
+compactly represents a lexer. 
+The precedence section allows ambiguous grammars to be specified and
+disambiguated by the user by specifying the precedence and assocativity
+of tokens in the grammar, similar to the YACC ``\%left'' ``\%right''
+and ``\%nonassoc'' directives.
+The rules section represents the productions of the context free grammar.
+The syntax of these rules may seem a little confusing at first.   A rule
+consists of a nonterminal which is tagged with some kind of descriptor, and
+zero or more symbols which represent a possible sentence the nonterminal
+can derive.  Each symbol may be ``tagged'' meaning such that its value 
+will appear in the syntax tree resulting from a successful parse tagged by a identifier.
+Untagged values are not included in the syntax tree.  This
+labelling allows the user to write syntax tree walkers which are
+readable and since the tags are not positional, allows more flexibility
+in modifying the grammar.  The omission of the untagged values reduces
+redundant syntactic clutter.
+A few examples may clarify things:
+# The canonical expression grammar (with precedence)
+# Token section
+token plus       /\+/
+token mult       /\*/
+token lparen     /\(/
+token rparen     /\)/
+token id         /[A-Za-z0-9]/
+skip whitespace  /[ ]/                # Skip whitespace
+# Operator precedence
+left plus
+left mult
+# Rules
+rule (E : plusexpr) -> (E : op1) plus (E : op2)
+rule (E : multexpr) -> (E : op1) mult (E : op2)
+rule (E : id) -> (id : name)
+rule (E : parenexpr) -> lparen (E : sub) rparen
+In this example, there is one nonterminal (E) with four productions.
+Addition and multiplication are binary operators in which the left
+operand is tagged ``op1'' and the right is tagged ``op2''.  Both the
+additon symbol and the multiplication symbol are untagged -- so they
+will not be included in the resulting syntax tree.

More information about the Commit mailing list