[Commit] mint-2004b conclusions.tex, NONE, 1.1 intro.tex, 1.1, 1.2 mint.tex, 1.2, 1.3 nickle.tex, 1.2, 1.3 tool.tex, 1.1.1.1, 1.2

Wed Dec 1 00:59:20 PST 2004

Committed by: bart

Update of /local/src/CVS/mint-2004b
In directory home.keithp.com:/tmp/cvs-serv14297

Modified Files:
	intro.tex mint.tex nickle.tex tool.tex 
Added Files:
	conclusions.tex 
Log Message:
Added and edited.

--- NEW FILE: conclusions.tex ---

Index: intro.tex
===================================================================
RCS file: /local/src/CVS/mint-2004b/intro.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -d -r1.1 -r1.2
--- intro.tex	1 Dec 2004 08:08:17 -0000	1.1
+++ intro.tex	1 Dec 2004 08:59:17 -0000	1.2
@@ -17,7 +17,7 @@
 consideration.  The development of the MINT parser
 generation tool is motivated by difficulties
 presented by the implementation of the Nickle programming
-language.  It is also motivated by a variety of perceived
+language~\cite{nickle}.  It is also motivated by a variety of perceived
 opportunities for improvements (of varying degrees of
 importance) in the state-of-the-art.  In particular, MINT is
 intended to be a friendly platform for future languages to
@@ -41,7 +41,13 @@
 appropriately is not available with YACC.
 
 Fortunately, Nickle is also a nice language for implementing
-automated language recognizers.
+automated language recognizers.  Nickle is more fully
+discussed in section~\ref{sec-nickle}.  Briefly, Nickle
+layers C-like surface syntax and semantics atop the core of
+a modern mixed functional/imperative language.  The language
+has been carefully defined over a 20 year period to make
+algorithmic prototyping easier.  As such, Nickle should
+represent a good opportunity to achieve the state of the art.
 
 To achieve this goal, the MINT
 architecture has been factored in a fashion developed some

Index: mint.tex
===================================================================
RCS file: /local/src/CVS/mint-2004b/mint.tex,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3
--- mint.tex	1 Dec 2004 08:08:17 -0000	1.2
+++ mint.tex	1 Dec 2004 08:59:17 -0000	1.3
@@ -37,6 +37,7 @@
 \input{tool}
 \input{lola}
 \input{nickle}
+\input{conclusions}
 
 \bibliographystyle{entcs}
 \bibliography{mint}

Index: nickle.tex
===================================================================
RCS file: /local/src/CVS/mint-2004b/nickle.tex,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3
--- nickle.tex	1 Dec 2004 08:08:17 -0000	1.2
+++ nickle.tex	1 Dec 2004 08:59:17 -0000	1.3
@@ -1,6 +1,6 @@
-\section{Nickle}
+\section{Nickle}\label{sec-nickle}
 
-The Nickle Programming Language\cite{nickle} was developed as a C-like
+The Nickle Programming Language was developed as a C-like
 procedural imperative language with strong types and several useful notions
 adopted from the functional programming community.  Originally targetted
 twenty years ago as an algorithmic prototyping language, it has become more

Index: tool.tex
===================================================================
RCS file: /local/src/CVS/mint-2004b/tool.tex,v
retrieving revision 1.1.1.1
retrieving revision 1.2
diff -u -d -r1.1.1.1 -r1.2
--- tool.tex	1 Dec 2004 06:47:02 -0000	1.1.1.1
+++ tool.tex	1 Dec 2004 08:59:17 -0000	1.2
@@ -15,24 +15,43 @@
 There are three elements in the design of MINT that differentiate it
 from traditional parser and lexer generators such as YACC and LEX.
 
-First, MINT implements a canonical LR(1) parser rather than SLR or
-LALR.  Processor speeds and memory capacities have grown enough in the
-twenty years since parsing techniques were first developed that it was
-deemed unecessary to implement a more complex algorithm in order to
-achieve a (possibly modest) table size reduction.
+First, MINT implements a canonical LR(1) parser rather than
+SLR or LALR.  When tools such as LEX and YACC were first
+developed, insufficient resources were available to
+construct or utilize canonical LR(1) parsers in most cases.
+Processor speeds and memory capacities have
+grown dramatically in the twenty years since.
+It seems unecessary to implement a
+more complex algorithm just to achieve a reduction in table
+size and parser generation time.  In addition, as noted
+previously, the additional discrimination power of LR(1)
+parsing is expected to permit simplification of some parts
+of the Nickle grammar.
 
-Second, rather than targetting a specific output language such as C,
-MINT generates language independent lexing and parsing tables, which
-can then be loaded by lexer/parser libraries (the MINT runtime)
-implemented in the desired target language.
+Second, rather than targetting a specific output language
+such as C, MINT generates language independent lexing and
+parsing tables.  These tables can then be loaded by
+lexer/parser libraries (the MINT runtime) implemented in the
+desired target language.  The lexer and parser simple to
+write in an arbitrary language, but need to be written in a
+specific target language.  The table generators are
+difficult to write, and would like written only infrequently
+in a high-level language.  The MINT architecture decouples
+table generation from processing, enabling simultaneous
+accomplishment of these goals.
 
 Third, MINT does not allow one to execute arbitary actions during the
 parsing of input. Instead, MINT constructs an abstract syntax
-tree. Grammar rules may additionally allow the special action of
-changing the lexer state (also known as start condition in flex-like
-parser generators.) This restriction on actions is due to the
-difficulty of specifying arbitrary actions when the target language is
-unknown, as it is in the MINT architecture.
+tree, consisting in essence of a specified subset of the full
+parse tree. The one action grammar rules may take is to
+change the lexer state (exclusive start condition, in FLEX terminology).
+This restriction is due to the
+impossiblity of specifying arbitrary actions in an unknown
+target language: the MINT architecture implies that the
+target language is in general unknown.  Changing the lexer
+state (so-called ``lexer feedback'') is necessary for some
+recognizers: it also seems to be sufficient for most
+languages.
 
 \begin{figure*}
 \begin{center}
@@ -42,60 +61,85 @@
 \end{center}
 \end{figure*}
 
-The design of MINT is split into two main sections: the MINT parser
-and lexer generator, implemented in Nickle, and the MINT runtime,
-which is currently implemented in Nickle but may be extended to other
-languages. The output parsing and lexing tables are generated in a
+As noted previously, MINT consists of two main components.  The MINT parser
+and lexer generator is implemented in Nickle.  The MINT runtime
+is currently implemented in Nickle as well.  However,
+we expect to provide this runtime in  other
+languages as well. In particular, a C MINT runtime will be
+constructed for Nickle development.
+
+The output parsing and lexing tables are generated in a
 simple line-oriented ASCII format that is easily parsed. Other formats
 such as XML were considered, but dismissed as overly complicated for
-this initial proof-of-concept application. Additionally, since the
-lexer passes tokens to the parser, they need to agree on a standard
+this initial proof-of-concept application. Since the
+lexer passes tokens to the parser, they also need to agree on a standard
 set of token specifications.
 
-As stated above, MINT generates an abstract syntax tree after
-successfully parsing a string. The advantage of this is that MINT can
-generate parsers targetting arbitrary back-ends written in multiple
-languages.
+As stated above, MINT generates an abstract syntax tree
+after successfully parsing a string.  This syntax tree is
+not particularly abstract, however: it consists of a subtree
+of the parse tree, labeled according to a labeling given in
+the grammar.  One advantage of this approach is that it is
+easy to specify in the MINT descriptions what sort of tree
+will be output.  A serious disadvantage is the lack of
+sufficient abstraction, which leads to tight coupling
+between the grammar and the backend code which walks the
+resulting trees.  Changes to the tree specification to help
+deal with this problem are currently under active consideration.
 
-\subsection{Implementation}
+\subsection{Lexing}
 
-The lexer generator uses Thompson's construction and an NFA to DFA
-conversion to create the lexing table. Minimum-state DFA and table
-compaction methods were not used.
+The lexer generator is fairly straightforward.  It uses
+Thompson's construction and straight NFA to DFA conversion
+to create the lexer table. Minimum-state DFA and table
+compaction methods are not currently used, but could be
+easily added if needed.
 
-The lexer is a standard DFA based algorithm with three enhancements:
-lexer states (aka ``start conditions''), lookahead patterns, and
+The lexer is a standard DFA-based algorithm with three enhancements:
+lexer states (aka ``exclusive start conditions''), lookahead patterns, and
 arbitrary size character sets. Because no code generation is performed
 at at lexer generation time, the lexer cannot perform arbitrary
 actions upon a rule match. The only allowed actions are to return a
 token or ``none.'' Additionally the lexer may change its own state
 upon matching a lexeme.
 
-MINT implements lexer states that are similar to the ``start
-condition'' present in FLEX. At any point in time, the lexer is in one
+MINT implements lexer states that are similar to the
+``exclusive start
+conditions'' present in FLEX. At any point in time, the lexer is in one
 and only one state. Lexer matching rules only match if the lexer is in
-the correct state. Rules may match in more than one state. This is
-typically used to implement different modes of lexing-- for example,
-string mode, or comment mode. The lexer's state may be changed
-internally, by the lexer's own rule actions, or externally by a call
-from the parser.
+the correct state. Rules may match in more than one
+state. Lexer states are
+typically used to implement different modes of lexing---for example,
+inside strings, or inside comments. The lexer's state may be changed
+internally by the lexer's own rule actions, or externally by
+the parser.
 
 Lookahead patterns are implemented subject to the same restrictions as
-for FLEX. That is, lookahead patterns such as ``ab*/b'' are not
-allowed. Finally, the lexer can handle character sets of arbitrary
-size, and this should be particularly useful with Unicode. Because a
-lookup table would grow impractically large with such a large
-character set, MINT's lexer implements the lexing table as an
-adjacency list and transitions are matched based on character ranges.
+for FLEX. That is, lookahead patterns such as ``ab*/b'' are
+not currently
+allowed. (However, handling for these sorts of patterns is
+expected to be added shortly.)  The lexer can handle character sets of arbitrary
+size. This is intended primarily for Unicode support. To
+control the size of the resulting tables,
+MINT's lexer implements the lexing table as an
+adjacency list, with transitions matched based on character ranges.
+
+\subsection{Parsing}
 
 The LR(1) parser generator algorithm is taken straight from the
-classic text by Aho, Sethi, and Ullman. We decided to implement LR(1)
-as opposed to SLR or LALR due to the simplicity of this algorithm and
-the fact that memory sizes are large enough now that the compactness
-of the tables is not so much of an issue. [cite table size statistics
-for different algorithms] The parser generator is additionally
-augmented with precedence rules to resolve shift/reduce conflicts.
+classic text by Aho, Sethi, and Ullman. Recall that LR(1)
+(as opposed to SLR or LALR) was chosen  due to the simplicity of this algorithm and
+the fact that memory sizes are large enough for tables now.
+The table sizes do indeed seem to be reasonable.
+[cite table size statistics
+for different algorithms]
 
+The parser generator is additionally augmented with
+precedence rules to resolve shift/reduce conflicts.  While
+these rules are currently operator-precedence rules in the
+style of YACC, one immediate plan for MINT is to allow
+a more complete resolution of shift/reduce and reduce/reduce
+conflicts via a more detailed specification of priorities.
 
 \subsection{Achieving goals}