[Commit] mint/doc mintsec.tex,1.1,1.2
James LaMar
commit at keithp.com
Tue Nov 30 22:44:33 PST 2004
Committed by: jlamar
Update of /local/src/CVS/mint/doc
In directory home.keithp.com:/tmp/cvs-serv12187
Modified Files:
mintsec.tex
Log Message:
Index: mintsec.tex
===================================================================
RCS file: /local/src/CVS/mint/doc/mintsec.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -d -r1.1 -r1.2
--- mintsec.tex 29 Nov 2004 19:59:06 -0000 1.1
+++ mintsec.tex 1 Dec 2004 06:44:30 -0000 1.2
@@ -101,4 +101,85 @@
MINT was able to generate parsers for the regular expression and BNF
syntaxes fairly easily. Attempting to generate a parse table for a
-small Nickle subset proved to be more difficult, however.
+small Nickle subset proved to be more difficult, however. The
+implementation of the parser generator was fairly na\"ive -- as
+this particular version was intended to be a proof of concept more
+than an efficient implementation of the canoncial LR(1) parsing
+algorithm. However, the generated parser and lexer were more than
+fast enough for general use.
+
+\section{Translators in MINT}
+
+A few simple examples should be adequate to demonstrate the essentials
+of the MINT parser generator language.
+
+Each grammar file is divided into three sections -- tokens,
+precedence, and rules. The tokens section specifies an identifier for
+each token, along with a regular expression specifying the pattern
+of input associated with that token. In short, the token section
+compactly represents a lexer.
+
+The precedence section allows ambiguous grammars to be specified and
+disambiguated by the user by specifying the precedence and assocativity
+of tokens in the grammar, similar to the YACC ``\%left'' ``\%right''
+and ``\%nonassoc'' directives.
+
+The rules section represents the productions of the context free grammar.
+The syntax of these rules may seem a little confusing at first. A rule
+consists of a nonterminal which is tagged with some kind of descriptor, and
+zero or more symbols which represent a possible sentence the nonterminal
+can derive. Each symbol may be ``tagged'' meaning such that its value
+will appear in the syntax tree resulting from a successful parse tagged by a identifier.
+Untagged values are not included in the syntax tree. This
+labelling allows the user to write syntax tree walkers which are
+readable and since the tags are not positional, allows more flexibility
+in modifying the grammar. The omission of the untagged values reduces
+redundant syntactic clutter.
+
+A few examples may clarify things:
+
+\begin{verbatim}
+
+#
+# The canonical expression grammar (with precedence)
+#
+
+#
+# Token section
+#
+tokens:
+
+token plus /\+/
+token mult /\*/
+token lparen /\(/
+token rparen /\)/
+token id /[A-Za-z0-9]/
+skip whitespace /[ ]/ # Skip whitespace
+
+#
+# Operator precedence
+#
+precedence:
+
+left plus
+left mult
+
+#
+# Rules
+#
+rules:
+
+rule (E : plusexpr) -> (E : op1) plus (E : op2)
+rule (E : multexpr) -> (E : op1) mult (E : op2)
+rule (E : id) -> (id : name)
+rule (E : parenexpr) -> lparen (E : sub) rparen
+
+\end{verbatim}
+
+In this example, there is one nonterminal (E) with four productions.
+Addition and multiplication are binary operators in which the left
+operand is tagged ``op1'' and the right is tagged ``op2''. Both the
+additon symbol and the multiplication symbol are untagged -- so they
+will not be included in the resulting syntax tree.
+
+
More information about the Commit
mailing list