[Nickle] Type declaration syntax

Fri, 04 Oct 2002 23:33:56 -0700

Keith and I had a good argument about the Nickle type
declaration syntax today.  (I'm afraid I did some yelling
and screaming as I figured out how broken it all is: my
apologies.)  My conclusion: it's pretty much a disaster.

Recall that C uses "type by example".  This means that
a declaration looks like a use.  If I say
  int x[3][2];
I can then say
  x[2][1] = 4;
This syntax gets ugly when pointers and functions are
involved, and it's hard to build complicated declarations,
but it's easy to parse and users have a lot of experience
with it.
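
For concreteness, here is a small C sketch of the
"declaration looks like a use" property.  The helper names
(x_outer_len, x_inner_len, x_store_and_load) are made up for
illustration and are not from any real code.

```c
#include <assert.h>
#include <stddef.h>

/* Declaration mirrors use: x[i][j] is an int for i < 3 and j < 2. */
static int x[3][2];

/* Element counts along the first and second subscripts of x. */
size_t x_outer_len(void) { return sizeof x / sizeof x[0]; }
size_t x_inner_len(void) { return sizeof x[0] / sizeof x[0][0]; }

/* Store through the same shape the declaration shows, then read back. */
int x_store_and_load(void)
{
    assert(x_outer_len() == 3 && x_inner_len() == 2);
    x[2][1] = 4;            /* legal: 2 < 3 and 1 < 2 */
    return x[2][1];
}
```

The subscripts appear in the same order in the declaration
and in the use, which is the whole point of the C scheme.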

Java, on the other hand, permits a different syntax: the
language allows cast-like types entirely on the left
of the name.  For example
  int[][] x = new int[3][2];
declares an array of int arrays that can be indexed via
  x[2][1] = 4;
Note that the array size descriptors in Java are
not part of the type (also true in Nickle).
Also note that the type passed to new is a bit confusing to
interpret: int[3][2] means an array of 3 arrays of 2
integers, and thus cannot be parenthesized in the sensible
fashion (int[3])[2].

My (sigh) idea for Nickle was borrowed from Java: have the
type name sit where the variable name would sit, and thus do
"type-by-example" with the type name a stand-alone entity.
This removes the C syntactic difference between casts and
type definitions, and generalizes the Java syntax.

This is an attractive idea, but it has problems.  Consider
  int[3][2]  x;
As noted above with Java, this could mean
  array of 2 arrays of 3 integers
or
  array of 3 arrays of 2 integers
Nickle originally chose the former meaning, which naturally
reflects the implicit parenthesization (int[3])[2] of the operators.
Java and common sense both dictate that this is really
confusing: under this meaning one cannot legally say
  x[2][1] = 4
because the indices are the other way around.
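
The reading that follows the implicit parenthesization,
(int[3])[2], can be reproduced in C by building the type up
with typedefs.  This is only a sketch: the typedef names
int3 and int3_2 and the helper functions are hypothetical,
and C is standing in for Nickle here.

```c
#include <assert.h>
#include <stddef.h>

typedef int int3[3];       /* stands in for the type "int[3]"      */
typedef int3 int3_2[2];    /* stands in for the type "(int[3])[2]" */

static int3_2 nested;      /* same object type as: int nested[2][3]; */

/* Element counts along the first and second subscripts of nested. */
size_t nested_outer_len(void) { return sizeof nested / sizeof nested[0]; }
size_t nested_inner_len(void) { return sizeof nested[0] / sizeof nested[0][0]; }

/* Write and read back the element with the largest legal indices. */
int nested_last_roundtrip(void)
{
    assert(nested_outer_len() == 2 && nested_inner_len() == 3);
    nested[1][2] = 4;      /* nested[2][1] would be out of bounds */
    return nested[1][2];
}
```

Composing the type left to right gives 2 arrays of 3 ints,
so the subscripts at the use site come out in the opposite
order from the way they read in the type.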

We "fixed" this problem in the obvious fashion: turn all the
subscripts around in the type definition.  Note that our
case is worse than Java's right off: we also have function
types. Is 
  int()[*] y;
a synonym for an array of functions returning int, or for a
function returning an array of ints?  When we turned the
subscripts around, we treated function and array subscripts
symmetrically; thus y is currently treated as a function
returning an array of ints, which is counterintuitive, to
say the least.
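
For comparison, here is roughly how C's type-by-example
syntax spells the two shapes.  C allows neither arrays of
functions nor functions returning arrays, so this sketch
goes through pointers in both cases; all of the names in it
are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

static int one(void) { return 1; }
static int two(void) { return 2; }

/* Array of 2 pointers to functions returning int. */
static int (*fn_table[2])(void) = { one, two };

static int triple[3] = { 10, 20, 30 };

/* Function returning a pointer to an array of 3 ints. */
static int (*get_triple(void))[3] { return &triple; }

/* Use each object the way its declaration reads. */
int call_second(void) { return fn_table[1](); }
int third_of_triple(void) { return (*get_triple())[2]; }

/* Check both shapes via sizeof, which does not call get_triple. */
int shapes_ok(void)
{
    assert(sizeof fn_table / sizeof fn_table[0] == 2);
    assert(sizeof *get_triple() / sizeof(int) == 3);
    return 1;
}
```

Both declarations are hard to read, but each one can be
decoded by looking at how the name is used.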

But wait. It gets worse, because we have pointers now.
Consider
  *int[3] z;
Is this an array of 3 pointers to integers, or a pointer to
an array of 3 integers?  C suggests the former: we have
chosen the latter, for reasons similar to those behind some
of our choices above (and because the other way around is
very hard to implement).  Under this interpretation we can
no longer say
  *z[2] = 4;
because z is of the wrong type: we must say
  (*z)[2] = 4;
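
The same two readings exist in C, where they are spelled
differently.  A minimal sketch, with hypothetical function
and variable names:

```c
#include <assert.h>
#include <stddef.h>

/* C's reading of "int *z[3]": an array of 3 pointers to int. */
int deref_element(void)
{
    int a = 0, b = 0, c = 0;
    int *ptrs[3] = { &a, &b, &c };
    assert(sizeof ptrs / sizeof ptrs[0] == 3);
    *ptrs[2] = 4;           /* parsed as *(ptrs[2]): writes c */
    return c;
}

/* The other reading: a pointer to an array of 3 ints. */
int index_through_pointer(void)
{
    int arr[3] = { 0, 0, 0 };
    int (*parr)[3] = &arr;
    assert(sizeof *parr / sizeof (*parr)[0] == 3);
    (*parr)[2] = 4;         /* parentheses required: writes arr[2] */
    return arr[2];
}
```

In C the parentheses show up in both the declaration and
the use of the pointer-to-array form, so the two still match.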

On the other hand, this interpretation respects
the parentheses in type definition: the above declaration
is the same as
  *(int[3]) z;
which is rather nice from an operator precedence point of
view.

I think a case can be made at this point for giving up
hackery and doing arrays of arrays and functions "wrong way
round".  In this scheme
  int[3][2] x;
is an array of 2 arrays of 3 integers, and can be indexed
like
  x[1][2]
Similarly,
  int[3]() y;
is a function returning an array of 3 ints, and
  *int[3] z;
is a pointer to an array of 3 ints (because [] binds more
tightly than *).  This is confusing for the user, and
incompatible with C: it is at least consistent and regular,
however.

Other alternatives include going back to C-style
type-by-example (perhaps with just the cast-based type
syntax), various kinds of precedence hackery, and perhaps
other things we haven't thought of.

Ideas and suggestions are gratefully accepted at this
point.  Those that don't break large chunks of working
Nickle code are even better!

	Bart Massey
	bart@cs.pdx.edu