[Nickle]Parametric polymorphism
Keith Packard
nickle@keithp.com
Thu, 21 Mar 2002 23:58:19 -0800
I'm starting to research how we might add parametric polymorphism to
nickle.
Barbara Liskov's group at LCS has designed an OO language, Theta, (yeah,
just bear with me for a minute) called Theta which provides both subtype
and parametric polymorphism using methodology similar to their earlier CLU
language. There's a paper from OOPSLA '95 on just these issues that
can be found here:
http://www.pmg.lcs.mit.edu/papers/where-clauses.pdf
I've also reviewed the Modula 3 generic system, having not looked at it in
several years. It has not aged well, serving now as more of an example of
how not to do parametric polymorphism. What a disaster.
The Theta language defines parametric types by specifying a set of
interfaces that the parameterized type must provide and the type
relationships among them:
Define a new type 'set' with parametric type 'T':
set = type[T] where T has equal(T) returns (bool)
insert(x:T)
remove(x:T) signals (not_found)
elements() yields(T)
equal(s:set[T]) returns (bool)
end set
Declare a sort function:
sort[T](a: array[T]) where T has gt(T) returns (bool)
The key here is the 'where' clause -- this constrains the types which are
acceptable for use.
As Theta is an OO language, the constraints can reference methods of the
class of 'T', making these easier to use. Without objects, the sort
function could have been written:
sort[T](a: array[T], gt: function(T, T) returns (bool));
(no, I don't know the syntax of Theta, I just made the function parameter
up).
One interesting piece of Theta syntax is that the binding of the type
variables is explicit:
Declare 'si' to be a set of integers:
si: set[int]
Call the 'sort' function on an array of chars:
sort[char](ca)
This has some appeal for me -- having the system automatically determine
the types for the type parameters smacks of type inference, something we
have talked about but not yet considered the details of. I suggest that
we require explicit type bindings until the whole language supports
inference.
The paper contrasts these explicit constraints with C++ templates and
Modula 3 generics noting both of those mechanisms are typechecked by
substituting the actual type into the implementation and typechecking the
result. That essentially *requires* that the template be compiled (or at
least typechecked) for every different actual type.
That's crazy; the user can't be assured that the template will work if the
underlying implementation changes as the only typechecking that is done
uses the implementation code.
It also compares Theta to Rapide and Emerald; Rapide uses subtyping to
specify the constraints:
type HasEql(type T) is
interface
equal: function(x: T) return Boolean;
end interface
type HasElts(type T) is
interface
elements: iterator() yield T;
end interface
member: function(type E <: HasEql(E),
type C <: HasElts(E),
a: C, x: E) return Boolean)
the '<:' syntax indicates a subset relation -- 'E' is a subset of all
types which have an 'equal' interface, 'C' is a subset of all types which
have an 'elements' interface. The separate type specifications are
unnecessary in Theta or Rapide, and they serve to limit the expressive
power of the language as subtle interactions among many types can't be
described in this way.
I believe this function would be called using type parameters:
member (int, set_of_int, si, i)
These parametric types mix with subtyping in a very simple way. Given
P1 and P2, parametric in a single type, with P2 intending to be a subtype
of P1, then we must have:
P2[T] <: P1[T] for all T
I believe this also means that the set of types permissible as parameters
to P2 must be a subset of the types permissible for P1. No other
subtyping relationship can be determined, in particular:
S <: T does not imply P[S] <: P[T]
That's because of argument contravarience in functions implementing P.
With some (limited) understanding of how these features hang together,
let's examine how these might be provided in Nickle.
The first question is how we can encapsulate a parametric type along with
related methods. Theta uses objects, which Nickle doesn't (and won't)
have. Modula 3 uses interfaces, which are similar in nature to Nickle's
namespaces. Let's start with namespaces and see where we get to.
namespace stack[T] {
typedef stackelt;
typedef struct { *stackelt prev; T value; } stackelt;
public typedef * *stackelt stack;
public exception empty (stack s);
public stack function new() {
return reference(0);
}
public void function push (stack s, T value) {
*s = reference ((stackelt) { prev = *s, value = value });
}
public T function pop (stack s) {
if (!*s)
raise empty (s);
T i = (*s)->value;
(*s) = (*s)->prev;
return i;
}
}
A generic stack type -- it would be used either as:
stack[int]::stack s = stack[int]::new();
or
namespace int_stack = stack[int];
int_stack::stack s = int_stack::new();
Note that 'stack' needs no constraints on the parameterized type; it
doesn't perform any operations on the underlying type. Let's explore a
sort function:
namespace sort[T] {
public void function sort (T[*] data, int(T, T) gt) {
for (int i = 0; i < dim(data); i++)
for (int j = dim(data)-1; j > i; j--)
if (gt (data[j-1], data[j]) )
{
int t = data[j-1]; data[j-1] = data[j]; data[j] = t;
}
}
}
That seems pretty straightforward, but has only one function and no
datatypes, so perhaps we could permit:
public void function sort[T] (T[*] data, int(T, T) gt) {
for (int i = 0; i < dim(data); i++)
for (int j = dim(data)-1; j > i; j--)
if (gt (data[j-1], data[j]) )
{
int t = data[j-1]; data[j-1] = data[j]; data[j] = t;
}
}
instead?
As a final example, let's look at a set datatype:
namespace set[T] {
typedef setelt;
typedef struct { *setelt next; T value; } setelt;
public typedef * *setelt set;
public set function new () { return reference (0); }
set function find (set s, T value, int(T, T) equals)
{
if (!*s) return 0;
if (equals ((*s)->value, value)) return s;
return find (&(*s)->next, value, equals);
}
public void function insert (set s, T value, int(T,T) equals) {
if (!find (s, value, equals))
*s = reference ((setelt) { next = *s, value = value });
}
public void function delete (set s, T value, int(T,T) equals) {
s = find (s, value, equals);
if (!*s) return;
*s = (*s)->next;
}
public int function has (set s, T value, int(T,T) equals) {
return find (s, value, equals) != 0;
}
public void function elements (set s, void(T) iterate) {
if (!*s) return;
iterate ((*s)->value);
elements (&(*s)->next, iterate);
}
public int function equal (set s1, set s2, int(T, T) equals) {
continuation c;
if (setjmp (&c, 1)) {
void function check(set loop, set compare) {
elements (loop, (void func (T value) {
if (!has(compare, value, equals)) longjmp (c, 0);
}));
}
check (s1, s2);
check (s2, s1);
return 1;
} else
return 0;
}
}
This has two obvious problems:
1) the 'equals' operator is defined as the argument to many of the
functions.
2) There's no obvious way to encapsulate the 'equals' function with
any particular derivation of this type; each user would have to
"know" the magic comparison operator to pass to the various
functions.
What I'd rather see is perhaps something like:
namespace set[T] where
int equals (T,T);
...
namespace int_set set[int] where {
int equals (int a, b) {
return a == b;
}
}
This looks a lot more like the Theta examples. I have only briefly
considered how this might work, and it looks tricky, but I'm much more
interested in seeing what the semantics and syntax should look like before
diving into the implementation.
Trivially, the compiler could generate stubs for each of the public
functions in the parametric namespace to pass a hidden structure full of
function pointers. I'd rather avoid stubs, but at least I know it's
possible to implement what seems like the right idea.
Keith Packard XFree86 Core Team Compaq Cambridge Research Lab