[Snek] Using 'black' to format python code in snek

Tue Feb 18 12:40:04 PST 2020

I've pushed a fairly long series of patches that reformats all of the
python code in snek using 'black'. That includes all of the snek test
cases as well, and it ended up being a rather larger project than I
expected.

When black wraps a list, tuple, dictionary, or function parameter list to
keep it within the maximum line length, it formats them with one value
per line, each ending with a comma:

        check(2*3*5*17*19*23,many_args(2,3,5,a=17,b=19,c=23), "many_args (a=17,b=19,c=23)")

becomes

        check(
            2 * 3 * 5 * 17 * 19 * 23,
            many_args(2, 3, 5, a=17, b=19, c=23),
            "many_args (a=17,b=19,c=23)",
        )

That's lovely, except that Snek had a syntactic inconsistency compared
with python; it didn't allow a trailing comma in lists, tuples,
dictionaries and function parameter lists, except for the special case
of single-element tuples.
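
For comparison, here is a quick check that python itself accepts a
trailing comma in each of these places. It only uses the standard 'ast'
module; the sample strings are made up for illustration and are not
taken from the snek test suite:

```python
import ast

# Each snippet ends its element list with a trailing comma.
samples = {
    "list": "[1, 2, 3,]",
    "tuple": "(1, 2, 3,)",
    "dict": "{'a': 1, 'b': 2,}",
    "single-element tuple": "(1,)",
}

# literal_eval parses and evaluates literal expressions only.
values = {name: ast.literal_eval(src) for name, src in samples.items()}

# A call with a trailing comma after the last argument also parses;
# 'f' is never called, we only look at the syntax tree.
call = ast.parse("f(1, 2, c=3,)", mode="eval").body
```

The trailing comma doesn't change any of the values; only in the
single-element tuple is it actually required.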

So, to allow code to be reformatted with black, I needed to fix snek to
support this syntax.

To allow a trailing comma, we need to change the grammar from

        actuals-p       : COMMA expr actual-p actuals-p
                        ;

to

        actuals-p       : COMMA expr actual-p actuals-p
                        | COMMA
                        ;

Now our list of 'actuals' can end with a COMMA.

Snek uses 'lola' to convert its LL(1) grammar into parse tables; LL(1)
grammars are much more restricted than the more usual LALR grammars
supported by tools like YACC or Bison. In particular, they require that
the parser be able to select which production to use given only a
non-terminal and the current input token.

If you look at the grammar change proposed above, you'll note that given
the non-terminal 'actuals-p' and the input token 'COMMA', there are two
possible matching productions. That's fine in an LALR grammar, but we
have to fix that for our LL parser generator to work. So we create an
intermediate non-terminal and refactor:

        actuals-p       : COMMA actuals-end
                        ;
        actuals-end     : expr actual-p actuals-p
                        |
                        ;

Now we can recognize COMMA and go try to match actuals-end.
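
To make the conflict concrete, here's a rough sketch in python of the
check an LL(1) generator has to make. This is not how lola does it; it
only looks at the first symbol of each production, and it assumes a
made-up EXPR token that stands in for whatever can start an 'expr'. A
real generator computes full FIRST and FOLLOW sets, and selects the
empty production from the FOLLOW set rather than from a placeholder:

```python
from collections import defaultdict

# Terminals are upper-case names; everything else is a non-terminal.
def is_terminal(symbol):
    return symbol.isupper()

def first_token(production, start_tokens):
    """Token that selects this production, or None for the empty one."""
    if not production:
        return None
    head = production[0]
    return head if is_terminal(head) else start_tokens[head]

def conflicts(grammar, start_tokens):
    """Return {(non_terminal, token): [productions]} for ambiguous choices."""
    table = defaultdict(list)
    for non_terminal, productions in grammar.items():
        for production in productions:
            key = (non_terminal, first_token(production, start_tokens))
            table[key].append(production)
    return {key: prods for key, prods in table.items() if len(prods) > 1}

# Assumption: a single hypothetical EXPR token starts every 'expr'.
start_tokens = {"expr": "EXPR"}

# The first attempt: two productions both start with COMMA.
proposed = {
    "actuals-p": [["COMMA", "expr", "actual-p", "actuals-p"], ["COMMA"]],
}

# The refactored version with the intermediate non-terminal.
refactored = {
    "actuals-p": [["COMMA", "actuals-end"]],
    "actuals-end": [["expr", "actual-p", "actuals-p"], []],
}
```

Calling conflicts(proposed, start_tokens) reports the pair
('actuals-p', 'COMMA') with both productions attached, while
conflicts(refactored, start_tokens) comes back empty.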

And here we hit a bug in our parser generator -- it can't handle the
mutual-recursion between actuals-p and actuals-end. Lola has a special
case for handling recursion within the productions of a single
non-terminal. Extending that to catch this mutual-recursion case was a
matter of having a stack of non-terminals and checking the production
elements against the whole stack instead of just the current
non-terminal.
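
Here's a small python sketch of the stack idea. It is not lola's code,
and I'm assuming for illustration that the recursion shows up while
working out which tokens may follow a non-terminal. The 'actuals' and
'call' rules and the NUMBER, OPEN and CLOSE tokens are hypothetical, and
'actual-p' is left out to keep the grammar tiny:

```python
# Each production is a list of symbols; upper-case names are terminals.
GRAMMAR = {
    "call": [["OPEN", "actuals", "CLOSE"]],       # hypothetical rule
    "actuals": [["NUMBER", "actuals-p"]],         # hypothetical rule
    "actuals-p": [["COMMA", "actuals-end"]],
    "actuals-end": [["NUMBER", "actuals-p"], []],
}

def follow(target, grammar, stack=()):
    """Terminals that may appear right after `target`.

    `stack` holds every non-terminal whose follow set is currently being
    computed. Checking only the current non-terminal would miss the
    actuals-p -> actuals-end -> actuals-p cycle and recurse forever.
    """
    if target in stack:
        return set()
    stack = stack + (target,)
    result = set()
    for owner, productions in grammar.items():
        for production in productions:
            for index, symbol in enumerate(production):
                if symbol != target:
                    continue
                rest = production[index + 1:]
                if rest:
                    # Sketch limitation: the next symbol must be a terminal.
                    if not rest[0].isupper():
                        raise NotImplementedError("non-terminal successor")
                    result.add(rest[0])
                else:
                    # target ends the production, so whatever follows
                    # the owning non-terminal can follow target too.
                    result |= follow(owner, grammar, stack)
    return result
```

With this grammar, follow("actuals-end", GRAMMAR) and
follow("actuals-p", GRAMMAR) both terminate and give {"CLOSE"}.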

So, I've released Lola version 1.3 today, which includes this fix,
added the grammar changes to snek, reformatted the code with black, and
added a step in 'make check' which runs black against most of the
python code in the tree.

'most', not 'all', because there are a small handful of tests which
have intentional syntax errors, and black doesn't like those.

Sometimes simple ideas uncover a wealth of "opportunity".

-- 
-keith