[Commit] papers/mint_2004 Makefile, NONE, 1.1 abstract.tex, NONE, 1.1 lola.tex, NONE, 1.1 mint.tex, NONE, 1.1 nickle.tex, NONE, 1.1 outline, NONE, 1.1

Sun Nov 28 22:40:11 PST 2004

Committed by: keithp

Update of /local/src/CVS/papers/mint_2004
In directory home.keithp.com:/tmp/cvs-serv20762

Added Files:
	Makefile abstract.tex lola.tex mint.tex nickle.tex outline 
Log Message:
Add mint paper

--- NEW FILE: Makefile ---

.SUFFIXES: .tex .dvi .aux .eps .fig .dia .ps .pdf .bib .bbl

TOP=mint
TEXFILES=$(TOP).tex \
abstract.tex \
lola.tex \
nickle.tex

all: $(TOP).ps $(TOP).pdf $(TOP).www/$(TOP).html

.PHONY: subdirs

subdirs:
#subdirs:
#	@${MAKE} -C figures ${MAKECMDGOALS}
#	@${MAKE} -C examples ${MAKECMDGOALS}

FIGFILES:=$(wildcard *.fig)
EPSFILES:=$(wildcard *.eps)
EPSFILES+=$(FIGFILES:.fig=.eps)
PDFFILES=$(EPSFILES:.eps=.pdf)

.fig.eps:
	fig2dev -L eps $< >$@

.fig.pdf:
	fig2dev -L pdf $< >$@

.eps.pdf:
	epstopdf $<

$(TOP).ps: $(TOP).dvi
	dvips -t letter -o $(TOP).ps $(TOP)

$(TOP).dvi: $(TEXFILES) $(EPSFILES) subdirs
	latex $(TOP) || true
	bibtex $(TOP) || true
	latex $(TOP) || true
	latex $(TOP)

$(TOP).pdf: $(TEXFILES) $(PDFFILES)
	pdflatex $(TOP) || true
	bibtex $(TOP) || true
	pdflatex $(TOP) || true
	pdflatex $(TOP)

$(TOP).www/$(TOP).html: $(TOP).www $(TEXFILES)
	latex2html -white -dir $(TOP).www -split 0 -no_navigation $(TOP).tex

$(TOP).www:
	mkdir -p $@

clean: subdirs
	rm -f *.aux *.dvi *.log 
	rm -f $(TOP).ps $(TOP).pdf $(TOP).bbl $(TOP).blg
	rm -rf $(TOP).www

--- NEW FILE: abstract.tex ---
\abstract
The Nickle programming language extends C-like syntax and
semantics to accomodate modern programming techniques and
styles.  As such, it is a nice language for implementing
automated language recognizers.  In addition, its syntax
presents some unique challenges to automatic language
recognition.  The MINT translator is a tool for producing
automatic translators from a wide variety of target
languages to parse trees in the form of abstract syntax
trees.  These translators have broad application beyond the
Nickle environment.  To achieve this goal, the MINT
architecture has been factored in a fashion developed some
time ago by the authors. This split is used to allow the
generated translator to operate in a wide range of
languages.  MINT has already proven to be a useful tool, and
work is underway to extend it to a production-quality
application.

--- NEW FILE: lola.tex ---
\section{Lola and Flo: Language independent LL parser generation}

Lola was an LL parser generator written in 1987 in the Kalypso dialect of
lisp (a research language focusing on functional lisp programming).  As with
MINT, the output of the parser generator was a language independent data
structure which could drive parser frameworks in multiple languages.
Parsers for the Lola output were written in C, Pascal and Kalypso.

\begin{figure}
\begin{verbatim}
;
; this grammar recognises integer
; expressions involving +,-,*,/ and unary -
;
(
(lines          (expr "PRINT" \n lines)
                (\n lines)
                ()
                )
(expr           (term + term "ADD")
                (term - term "SUBTRACT")
                (term)
                )
(term           (fact * fact "MULTIPLY")
                (fact / fact "DIVIDE")
                (fact)
                )
(fact           (|(| expr |)|)
                (int "PUSH")
                (- fact "NEGATE")
                )
(int            ("CLEAR" "ADD-DIGIT" digit digits)
                )
(digits         ("ADD-DIGIT" digit digits)
                ()
                )
(digit          (|0|) (|1|) (|2|) (|3|) (|4|) (|5|) (|6|) (|7|) (|8|) (|9|)
                )
)
\end{verbatim}
\caption{A sample Lola grammar}\label{fig-lola-grammar}
\end{figure}

The Flo tool was a grammar-to-grammar transformation tool to factor left
common sub-expressions out of a grammar to permit more compact and easier to
understand representations of LL(1) languages.  It was also written in
Kalypso.  Grammars are represented in lisp s-expressions as see in
Figure~\ref{fig-lola-grammar}, eliminating the need for a custom grammar
parser and permitting easier grammar-to-grammar transformation tool
development.  

Semantic attributes were attached to the grammar in the form of special
tokens in the right side of each grammar production.  The parser would then
perform the semantic operation when the action appeared on the top of the
stack.

The ability to analyse grammars and systematically catch language design and
grammar errors long before sample language sentences were applied to the
parser provided a higher degree of confidence in the correctness of the
resulting parsing application.  The ability to produce parsers in one of
many languages encouraged the use of this higher level tool in place of
ad-hoc parsers on several occasions.

\subsection{Configuration Languages with Lola}

Lola and Flo were first used to produce an ANSI X3.64 character terminal
implementation in Pascal.  Using a parser generator allows the development
of an implementation artifact much closer in syntax and style to the ANSI
specification than the traditional maze of twisty low level language code.
Formal testing of this implementation demonstrated that it would correctly
parse the standard ASCII control sequences.  The lack of parsing correctness
in many existing adhoc ANSI X3.64 implementations reinforces the use of
higher level parsing tools in such designs.  As the parser itself was
written in a fairly low-level language (Pascal), performance of the
resulting implementation easily met the performance requirements.

>From that point forward, Lola and Flo were used together to produce parsers
for a variety of domain-specific languages supporting application
configuration and customization.  These implementations were characterized
by:
\begin{itemize}
\item Incomplete and changing specifications. Not only were the elements of
the configuration constantly changing as the specification for the
application was adjusted with reference to the ongoing implementation, but
the syntax of the language changed as whole new classes of configuration
information were incorporated into the system.
\item Unknown parsability.  The configuration requirements for some pieces
of the system were provided by developers ignorant of the limitations of a
context free grammar.
\item Target platform under development.  With the final execution location
of the parser still in the hardware design phase, having many existing
parser implementations provide for unit testing during initial development.
\item Lexer also under constant development.  Without an automatic lexer
generator, the hand-crafted lexer turned out to be a significant development
effort.  Using an existing simplistic tokenizing lexer allowed grammar
changes to be tested in advance of lexer reimplementation.
\end{itemize}

With the grammar validated by the automatic parser generator, the possiblity
for parser errors due to changes in the input language was greatly reduced.
This allowed changes to be rolled into the system late in the development
process.

\subsection{Parser Implementation Languages}

With parsers available in several languages, LL language grammars could be
rapidly tested in the command-line environment provided by Kalypso and only
when working properly would they be transferred to a parser written in the
target execution language.  Not only did this reduce development time of the
parser by speeding the turn-around from change to test, it also provide a
built-in lexer in the form of the lisp s-expression recognition code.

In the case of the X3.64 parser, a Kalypso lexer was built enabling
end-to-end regression testing against the final Pascal implementation.

\subsection{Limitations of Lola and Flo}

Lola and Flo were designed for LL languages.  This domain restriction was
acceptable for the tasks it was originally designed for in 1987, but as a
general purpose parser generator, the restriction is onerous today.  An LL
parser requires significantly less table space and processing than the
typical LR or even LALR parser, something occasionally relevant on the
machines common in that era.

As can be seen in the example grammar above, even the Flo tool which could
factor common left sub-expressions wasn't sufficient to present grammars in
their most natural form (cf the awkward int/digits productions).  Also, the
semantic action mechanism is as context-free as the grammar itself,
requiring the parser track significant language state on its own, in
particular it does not provide a mechanism for producing a parse tree.

Probably the most significant limitation for these tools today is that the
implementation language, Kalypso, has been effectively retired from general
use, making further development and improvement problematic.

--- NEW FILE: mint.tex ---
\documentclass{article}
\title{MINT Is A Nickle Translator\\
Generating Cross-Lingual Recognizers
}
\author{Emma Kuo \and James LaMar \and Bart Massey\\
Portland State University
\and Keith Packard\\
Hewlett-Packard Company}
\begin{document}
\maketitle
\input{abstract}
\input{lola}
\input{nickle}
\end{document}

--- NEW FILE: nickle.tex ---
\section{The Nickle Programming Language}

--- NEW FILE: outline ---
Bart's draft of 2004/11/20 18:05 PDT

MINT Is a Nickle Translator:
Generating Cross-Lingual Recognizers

(alphabetical?)
Emma Kuo, James LaMar, Bart Massey, Keith Packard
Portland State University

Abstract

The Nickle programming language extends C-like syntax and
semantics to accomodate modern programming techniques and
styles.  As such, it is a nice language for implementing
automated language recognizers.  In addition, its syntax
presents some unique challenges to automatic language
recognition.  The MINT translator is a tool for producing
automatic translators from a wide variety of target
languages to parse trees in the form of abstract syntax
trees.  These translators have broad application beyond the
Nickle environment.  To achieve this goal, the MINT
architecture has been factored in a fashion developed some
time ago by the authors. This split is used to allow the
generated translator to operate in a wide range of
languages.  MINT has already proven to be a useful tool, and
work is underway to extend it to a production-quality
application.

+ Recognizer Generators [Bart]
  + Parser generators
    + Why
    + RD, LL(1), LL(K), LALR(1), LR(1), etc: brief
      descriptions and example systems
    + Limitations
      + want lexing separated
      + validation/expressivity tradeoffs
  + Lexer generators
    + Lex, FLEX, etc
    + Interfacing to Parser Generators
  + Parse Tree and AST construction
    + Tree construction
    + Semantic actions
  + Problem summary
+ An early solution: LOLA/FLO [Keith]
  + Generate LL(1) parse tables in kalypso
    + Kalypso
    + FLO
    + Applications
  + Build parser in any target language
  + Problems with LOLA/FLO
    + Kalypso is dead
    + No lexer generator (??)
    + Even with FLO, restrictive language
    + Connecting ASTs to grammar across FLO
+ Nickle [Keith]
  + Introduction to Nickle
  + Nickle's translator is hard
    + Places where modified C grammar is painful
    + Places where connecting syntax and semantics is painful
  + Nickle is good for translator generation
    + Types, GC, nice initializer syntax, interactivity, etc.
  + Limitations of Nickle-based translators
    + Nickle is not a popular language
    + Nickle lexers/parsers would be slooow...
    + Hard to use Nickle lexers/parsers from other langs
+ MINT [James, Emma]
  + Design
    + LOLA/FLO-style split between generator + lexer and
      between generator + parser [fig]
    + Lexer and parser isolated in libraries
    + Generates Canonical LR(1) parsers
    + Produces ASTs
  + Implementation
    + Code base
    + Bootstrapping
  + Achieving goals
+ Translators in MINT [James, Emma]
  + Target language examples
    + regexp
    + BNF
    + Nickle subset
  + Backend language examples
    + Nickle
    + C (?)
+ Future Work [Bart]
  + Expand generator functionality
  + Bootstrap Nickle compiler
+ Conclusions [Bart]