[Commit] mint-2004b tool.tex,1.2,1.3

Wed Dec 1 07:05:29 PST 2004

Committed by: jlamar

Update of /local/src/CVS/mint-2004b
In directory home.keithp.com:/tmp/cvs-serv19254

Modified Files:
	tool.tex 
Log Message:


Index: tool.tex
===================================================================
RCS file: /local/src/CVS/mint-2004b/tool.tex,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3

--- tool.tex	1 Dec 2004 08:59:17 -0000	1.2
+++ tool.tex	1 Dec 2004 15:05:26 -0000	1.3
@@ -145,4 +145,90 @@
 
 MINT was able to generate parsers for the regular expression and BNF
 syntaxes fairly easily. Attempting to generate a parse table for a
-small Nickle subset proved to be more difficult, however.
+small Nickle subset proved to be more difficult, however. The
+implementation of the parser generator was fairly na\"{\i}ve, as
+this particular version was intended to be a proof of concept
+rather than an efficient implementation of the canonical LR(1)
+parsing algorithm.  Even so, the generated parser and lexer were
+more than fast enough for general use.
+
+Performance problems aside, MINT is robust enough to meet its
+original goal of being a cleaner, language-independent
+alternative to the more traditional code-generating parser
+generator architecture.
+
+\section{Translators in MINT}
+
+A simple example should be adequate to demonstrate the essentials
+of the MINT parser generator language.  More example grammars, including
+the grammar for the parser generator itself, appear in the appendix.
+
+Each grammar file is divided into three sections: tokens,
+precedence, and rules.  The tokens section specifies an identifier and
+a regular expression for each token, describing the input associated
+with that token; entries declared with ``skip'' match input that the
+lexer discards, such as whitespace.  In short, the tokens section is a compact lexer.
+
+The precedence section lets the user write an ambiguous grammar and
+disambiguate it by specifying the precedence and associativity
+of tokens in the grammar, much like the YACC ``\%left'', ``\%right'',
+and ``\%nonassoc'' directives.
+
+The rules section represents the productions of the context-free grammar.
+The syntax of these rules may seem a little confusing at first.  A rule
+consists of a nonterminal tagged with a label naming the production
+(such as ``plusexpr''), followed by a sequence of zero or more symbols
+that the nonterminal can derive.  Each of these symbols may be ``tagged''
+with an identifier, meaning that its value will appear under that
+identifier in the syntax tree resulting from a successful parse.
+Untagged values are not included in the syntax tree.  This labelling
+allows the user to write readable syntax tree walkers and, since the
+tags are not positional, leaves more freedom to add or reorder symbols
+when modifying the grammar.  Omitting untagged values reduces syntactic clutter.
+
+An example should make this clearer:
+
+\begin{verbatim}
+
+#
+# The canonical expression grammar (with precedence)
+#
+
+#
+# Token section
+#
+tokens:
+
+token plus       /\+/
+token mult       /\*/
+token lparen     /\(/
+token rparen     /\)/
+token id         /[A-Za-z0-9]+/
+skip whitespace  /[ ]/                # Skip whitespace
+
+#
+# Operator precedence
+#
+precedence:
+
+left plus
+left mult
+
+#
+# Rules
+#
+rules:
+
+rule (E : plusexpr) -> (E : op1) plus (E : op2)
+rule (E : multexpr) -> (E : op1) mult (E : op2)
+rule (E : id) -> (id : name)
+rule (E : parenexpr) -> lparen (E : sub) rparen
+
+\end{verbatim}
+
+In this example, there is one nonterminal (E) with four productions.
+Addition and multiplication are binary operators in which the left
+operand is tagged ``op1'' and the right is tagged ``op2''.  Both the
+addition symbol and the multiplication symbol are untagged, so they
+will not be included in the resulting syntax tree.
+
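
The tokens section described in tool.tex amounts to a compact lexer specification. Below is a minimal Python sketch of such a lexer for the example grammar; it is an illustration, not MINT's implementation. TOKEN_SPECS, Token and lex are hypothetical names. The longest-match rule, with ties going to the earlier declaration, is an assumed (though common) convention. The id pattern is written as [A-Za-z0-9]+ so that multi-character identifiers form a single token.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical in-memory form of the example's tokens section:
# (kind, name, pattern) where kind is "token" or "skip".
# The id pattern is assumed to be [A-Za-z0-9]+ (one or more characters).
TOKEN_SPECS: List[Tuple[str, str, str]] = [
    ("token", "plus", r"\+"),
    ("token", "mult", r"\*"),
    ("token", "lparen", r"\("),
    ("token", "rparen", r"\)"),
    ("token", "id", r"[A-Za-z0-9]+"),
    ("skip", "whitespace", r"[ ]"),
]


class Token(NamedTuple):
    name: str
    text: str


def lex(source: str) -> List[Token]:
    """Split source into tokens, discarding input matched by skip entries.

    Assumed matching rule: the longest match wins, and ties go to the
    entry declared first.
    """
    compiled = [(kind, name, re.compile(pat)) for kind, name, pat in TOKEN_SPECS]
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        best: Optional[Tuple[int, str, str]] = None  # (length, kind, name)
        for kind, name, regex in compiled:
            m = regex.match(source, pos)  # anchored at pos
            if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, kind, name)
        if best is None:
            raise SyntaxError(f"no token matches {source[pos]!r} at offset {pos}")
        length, kind, name = best
        if kind == "token":
            tokens.append(Token(name, source[pos:pos + length]))
        pos += length  # skip entries advance without emitting a token
    return tokens
```

For instance, lex("a1 * (b + 2)") yields id, mult, lparen, id, plus, id and rparen tokens. The spaces are consumed by the skip entry and never reach the parser.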
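
The precedence section works like YACC's %left, %right and %nonassoc. In YACC, such declarations settle shift/reduce conflicts in the parse table. The Python sketch below shows YACC-style resolution. Two points are assumptions rather than anything the text states for MINT: that levels are numbered lowest-first, and that a rule takes its precedence from its rightmost terminal that has one. Lowest-first is, however, what makes the example's "left plus" followed by "left mult" give multiplication the tighter binding. PRECEDENCE, precedence_table, rule_precedence and resolve are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Prec = Tuple[int, str]  # (level, associativity)

# Hypothetical in-memory form of the example's precedence section,
# listed lowest precedence first (assumed, as in YACC).
PRECEDENCE: List[Tuple[str, List[str]]] = [
    ("left", ["plus"]),
    ("left", ["mult"]),
]


def precedence_table(decls: Iterable[Tuple[str, List[str]]]) -> Dict[str, Prec]:
    """Give every token on the k-th declaration line level k (1-based)."""
    table: Dict[str, Prec] = {}
    for level, (assoc, names) in enumerate(decls, start=1):
        for name in names:
            table[name] = (level, assoc)
    return table


def rule_precedence(rhs: List[str], terminals: Set[str],
                    table: Dict[str, Prec]) -> Optional[Prec]:
    """Assumed YACC-style convention: a rule takes the precedence of its
    rightmost terminal that has a declared precedence."""
    for symbol in reversed(rhs):
        if symbol in terminals and symbol in table:
            return table[symbol]
    return None


def resolve(rule_prec: Optional[Prec], lookahead: str,
            table: Dict[str, Prec]) -> str:
    """Decide a shift/reduce conflict between reducing a rule and
    shifting the lookahead token: 'shift', 'reduce', or 'error'."""
    token_prec = table.get(lookahead)
    if rule_prec is None or token_prec is None:
        return "shift"  # YACC's default for an unresolved conflict
    rule_level, token_level = rule_prec[0], token_prec[0]
    if rule_level > token_level:
        return "reduce"
    if rule_level < token_level:
        return "shift"
    # Equal precedence: associativity breaks the tie.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[token_prec[1]]
```

Consider the example declarations with a parser holding E plus E. If the lookahead is mult, it shifts, so a + b * c groups as a + (b * c). If the lookahead is plus, it reduces, so a + b + c groups as (a + b) + c.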
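
The rules section and the expression example describe two things: syntax trees that keep only tagged symbols, and tree walkers that look children up by tag rather than by position. The following Python sketch illustrates both for the four example productions. It is hypothetical throughout. Node, tokenize, Parser, parse and evaluate are invented names. The node layout (the left-hand label plus a dictionary of tagged children) is an assumption about what MINT produces. A hand-written precedence-climbing parser stands in for MINT's generated LR(1) parser.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Node:
    """Hypothetical tree node: the production label from the rule's
    left-hand tag plus only the tagged right-hand symbols, keyed by tag."""
    label: str
    children: Dict[str, Union["Node", str]] = field(default_factory=dict)


# Operator text -> (production label, precedence level); both are "left".
BINARY = {"+": ("plusexpr", 1), "*": ("multexpr", 2)}


def tokenize(src: str) -> List[str]:
    tokens = []
    for m in re.finditer(r"\s*(?:([A-Za-z0-9]+|[+*()])|(\S))", src):
        if m.group(2):
            raise SyntaxError(f"unexpected character {m.group(2)!r}")
        tokens.append(m.group(1))
    return tokens


class Parser:
    """Precedence climbing over the four productions (not MINT's LR(1))."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self, min_prec: int = 1) -> Node:
        left = self.primary()
        while self.peek() in BINARY and BINARY[self.peek()][1] >= min_prec:
            label, prec = BINARY[self.take()]
            # Left associativity: the right operand may only contain
            # operators that bind more tightly.
            right = self.expr(prec + 1)
            # The operator itself is untagged, so only op1/op2 are kept.
            left = Node(label, {"op1": left, "op2": right})
        return left

    def primary(self) -> Node:
        tok = self.take()
        if tok == "(":
            sub = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            # lparen and rparen are untagged: only "sub" survives.
            return Node("parenexpr", {"sub": sub})
        if tok in BINARY or tok == ")":
            raise SyntaxError(f"unexpected {tok!r}")
        return Node("id", {"name": tok})


def parse(src: str) -> Node:
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return tree


def evaluate(node: Node, env: Dict[str, int]) -> int:
    """A tree walker that finds operands by tag, not by position."""
    kids = node.children
    if node.label == "plusexpr":
        return evaluate(kids["op1"], env) + evaluate(kids["op2"], env)
    if node.label == "multexpr":
        return evaluate(kids["op1"], env) * evaluate(kids["op2"], env)
    if node.label == "parenexpr":
        return evaluate(kids["sub"], env)
    name = kids["name"]
    return int(name) if name.isdigit() else env[name]
```

Because evaluate reads children["op1"] and children["op2"] rather than positional slots, adding another untagged symbol to a rule would leave this walker unchanged.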