[Nickle] More FreeBSD and Mint

Sun Sep 19 16:38:01 PDT 2004

> > > I can't reproduce this on FreeBSD 5-2-CURRENT.  If I can get at your 
> > > machine, I should be able to find and fix the problem.  How can we make 
> > > this work?
> > > 
> > > -keith
> > > 
> > 
> > Since 5.2-CURRENT is actually more recent than 5.3, I wouldn't worry
> > about these bus errors too much.  I hosed my FreeBSD install,
> > unfortunately, and haven't been able to recreate the problem since.
> 
> No, it isn't.  5.2-CURRENT is older than 5.3-BETA (all that could be
> run, since -RC/-RELEASE hasn't happened), since once the 5.3 branch is
> created head goes to either 5.3-CURRENT or 6.0-CURRENT.  In this case it
> was 6.0-CURRENT.
> 
> The particular system Keith used was 5.2-CURRENT as of June 18th.
> 

Ooops, for some reason I thought it was.  My apologies.

In that case, I will try to recreate the problem on an old computer I
have lying around.  If I can recreate it and there is sufficient
interest, I'll figure out some way to get the interested parties access
to the box.  I'm sorry to be such a flake!

On another note, I'd like a little input on how the parser generator
ought to work since I haven't done altogether that much of this sort of
thing in the past.

I'm currently thinking about a way to express precedence.  However
here's a little background, Mint currently is a parser generator that
does not generate code nor does it read an external description of a
language to generate a parser for -- it is more properly a perser
generation library then.  When Bart and I were discussing its design
last summer, we decided that would be the simplest and best approach to
the problem and avoid some of the pitfalls that make YACC so...well...
yaccy.

In essence, it takes a grammar (in this case -- this is the ubiquitous
expression grammar) described like this:

string ADD = "+";
string T = "Term";
string F = "Factor";
string id = "Identifier";
string MULT = "*";
string RPAREN = ")";
string LPAREN = "(";
string E = "Expression";

grammar expr_grammar = {
    start = E,
    rules = {
        E => {
            {E, ADD, T},
            {T}
        },
        T => {
            {T, MULT, F},
            {F}
        },
        F => {
            {LPAREN, E, RPAREN},
            {id}
        }
    }
};

And generates a table suitable for parsing using the table driven LR
parsing algorithm.  I decided to use strings as the symbol type instead
of integers since it provides more information to the user, especially
when tracking down grammar ambiguities.  That all works very nicely.

Algorithmically, precedence and assocativity are simple -- but I think
expressing them is complicated.  I originially envisioned a seperate
precedence table, sort of analogous to YACC's %left %right and
%nonassoc:

pred_table pred;
pred_level(&pred, left(MULT, DIV));  /* MULT and DIV have higher
priority */
pred_level(&pred, left(ADD));
...

Where left, right, and nonassoc would be varadic functions.

However, the idea is not that tokens have priority at all, it's that
rules have priority.  Bison just uses the priority of the last terminal
symbol in the rule to determine the priority of the rule in most cases
and it even has special syntax to confer seperate priorities on
individual rules (the canonical example being unary minus)

All the ways I can think of to add priority to individual rules within
the syntatic limits of Mint being a nickle library are terrible.  There
might be something like --

rule_pred_table pred;
rule_pred_level(&pred, left(E, (string[*]) {E, ADD, E}));
...

Which would be a table contaning rules and priority -- forcing the user
to write several rules more than once -- which is asking for trouble.

The grammar type could also be augmented to support this information in
the individual rules, which might clutter the rules as a rule would need
to be a struct of some kind.

The symbol type could also be implemented as a disjoint union in which
some symbols are interpreted as "options" -- I believe this is something
like YACC does rule based precedence.  This is also somewhat messy.

The possibility also exists of reading in external representations of
languages.  I'm starting to lean toward this option, if Mint was a
program that merely read a description of a language and wrote out a
description of a parse table, it could theoretically be used for parsing
using any host language (supposing the parse table output file was
simple enough to read.)  However, some flexibility is lost; it becomes
much more difficult to do something like lexer feedback (unless the
parse table inclued some generalized information about modifying lexer
state)

I'd really like some input on this, please let me know about any random
ideas?  I was under the impression that at some point some kind of tool
like this could be used to bootstrap nickle - what sort of solution to
the problem of expressing precedence and other per-rule options is the
one that will help achieve this goal?
-- 
  James
  giantrobot at f-m.fm