[Nickle] Union value semantics

Keith Packard nickle@nickle.org
Thu, 29 Aug 2002 08:52:40 -0700


Nickle currently treats a union as a mutable object with an associated tag.
This follows the C model, in which a union is a struct whose members all
sit at offset zero.
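
The C layout mentioned above can be checked from Python's ctypes module,
used here only as a convenient stand-in for a C compiler. The type name
CUnion and its two fields are illustrative assumptions, not anything from
Nickle; note also that a plain C union carries no tag, which is the part
Nickle adds.

```python
import ctypes

class CUnion(ctypes.Union):
    # Illustrative members; a C union overlays all of them on one storage area.
    _fields_ = [("X", ctypes.c_int), ("Y", ctypes.c_double)]

# Collect the byte offset of each member within the union.
offsets = {name: getattr(CUnion, name).offset for name, _ in CUnion._fields_}
print(offsets)
```

Every member reports offset 0, which is what "a struct whose members all
sit at offset zero" means in practice.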

I'm sure I'll surprise many when I note that unions are currently 
implemented as mutable objects.  Consider:

	> typedef union { int X; string Y; } U;
	> U x;
	> x.X = 12;
	> x
	X = 12
	> x.Y = "hello";
	> x
	Y = "hello"

Executing 'x.X = 12' performs the following gymnastics:

	1)	Note that x is unset
	2)	Create a union value with tag set to 0 (!), set x to this value
	3)	Fetch the value stored in x
	4)	Note that the value is a union, not a struct
	5)	Set the current union tag to 'X'
	6)	Create a reference to the union storage location
	7)	Store 12 through the reference
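
The seven steps above can be modelled roughly as follows. This is a Python
sketch of the behaviour described, not the interpreter's code: the names
MutableUnion, StorageRef and assign_member are hypothetical, and a dict
stands in for the variable environment.

```python
class MutableUnion:
    """Hypothetical model of the current representation: a tag plus one slot."""
    def __init__(self):
        self.tag = 0          # placeholder tag, as in step 2
        self.value = None

class StorageRef:
    """Hypothetical reference to a union's single storage slot."""
    def __init__(self, union):
        self.union = union

    def store(self, value):
        self.union.value = value

def assign_member(env, name, tag, value):
    if env.get(name) is None:                  # 1) note that x is unset
        env[name] = MutableUnion()             # 2) create a union, tag 0, bind it
    target = env[name]                         # 3) fetch the value stored in x
    if not isinstance(target, MutableUnion):   # 4) check it is a union, not a struct
        raise NotImplementedError("struct member store is not modelled here")
    target.tag = tag                           # 5) set the current union tag
    ref = StorageRef(target)                   # 6) reference the storage location
    ref.store(value)                           # 7) store through the reference

env = {"x": None}
assign_member(env, "x", "X", 12)
first = env["x"]
assign_member(env, "x", "Y", "hello")
print(env["x"] is first, env["x"].tag, env["x"].value)  # True Y hello
```

The second assignment changes the same object in place, which is the
mutability the message is pointing at.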

I know this looks bizarre; we'd obviously prefer:

	1)	Create a reference to x
	2)	Note that x is of type union
	3)	Create a union value of type 'U' with tag 'X' and value '12'
	4)	Store the union value through the reference.
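
For contrast, the preferred four-step sequence might look like this in the
same Python modelling style. UnionValue, VariableRef, assign_member and the
declared_types table are assumptions made for the sketch; a frozen dataclass
is used only to make the immutability explicit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UnionValue:
    """Hypothetical immutable union value: type name, tag and payload."""
    type_name: str
    tag: str
    value: object

class VariableRef:
    """Hypothetical reference to a variable slot in the environment."""
    def __init__(self, env, name):
        self.env = env
        self.name = name

    def store(self, value):
        self.env[self.name] = value

def assign_member(env, declared_types, name, tag, value):
    ref = VariableRef(env, name)                    # 1) create a reference to x
    type_name = declared_types[name]                # 2) look up x's union type
    new_value = UnionValue(type_name, tag, value)   # 3) build the union value
    ref.store(new_value)                            # 4) store it through the reference

env = {}
declared_types = {"x": "U"}
assign_member(env, declared_types, "x", "X", 12)
old = env["x"]
assign_member(env, declared_types, "x", "Y", "hello")
print(old)
print(env["x"])
```

Here the second assignment rebinds x to a fresh value; the value that x
held earlier is never modified.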

This way, unions would be immutable objects, and you wouldn't be able to 
reference union fields.  Today you can, and the consequence is visible in 
the following (with x set back to X = 12):

	> poly q = &x.Y;
	> q
	&12
	> x
	Y = 12		(!)

When the reference to 'x.Y' is created, the tag for 'x' is set to 'Y'; the 
assumption is that the reference will shortly be used to store a value.  
The stored 12 is left untouched, so 'x' now reads as a 'Y' holding an int.
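
A small Python model reproduces the effect. MutableUnion and MemberRef are
hypothetical names, and the tag update in the constructor is an assumption
drawn from the description above rather than from the interpreter source.

```python
class MutableUnion:
    """Hypothetical model: one storage slot plus the current tag."""
    def __init__(self, tag, value):
        self.tag = tag
        self.value = value

class MemberRef:
    """Hypothetical member reference that retags the union when created."""
    def __init__(self, union, tag):
        union.tag = tag       # side effect: happens before any store
        self.union = union

    def fetch(self):
        return self.union.value

x = MutableUnion("X", 12)
q = MemberRef(x, "Y")         # models: poly q = &x.Y;
print(q.fetch())              # 12
print(x.tag, "=", x.value)    # Y = 12
```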

One way out of this mire is to create a new representation for a reference
to a union; this new representation would include the tag, so that when the
referenced value is fetched the tag would be checked, and when the
referenced value is stored the tag would be set.
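
Such a tag-carrying reference might behave as in the following Python
sketch. TaggedRef and MutableUnion are hypothetical names, and raising
ValueError on a tag mismatch is one possible choice; the message does not
say what a failed check should do.

```python
class MutableUnion:
    """Hypothetical model: one storage slot plus the current tag."""
    def __init__(self, tag, value):
        self.tag = tag
        self.value = value

class TaggedRef:
    """Hypothetical union-member reference that remembers its own tag."""
    def __init__(self, union, tag):
        self.union = union    # creating the reference does not touch the union
        self.tag = tag

    def fetch(self):
        # Check the tag on fetch.
        if self.union.tag != self.tag:
            raise ValueError("union tag is %r, not %r" % (self.union.tag, self.tag))
        return self.union.value

    def store(self, value):
        # Set the tag on store.
        self.union.tag = self.tag
        self.union.value = value

x = MutableUnion("X", 12)
q = TaggedRef(x, "Y")
tag_after_create = x.tag      # still "X"

try:
    q.fetch()
    fetch_failed = False
except ValueError:
    fetch_failed = True       # x is tagged "X", so reading it as "Y" is refused

q.store("hello")
print(tag_after_create, fetch_failed, x.tag, q.fetch())  # X True Y hello
```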

Alternatively, we could disallow references to union members.  This would
eliminate the semantic problem discovered above. Without union member
references, the sequence:

	> poly v = x;
	> v.Y = "hello";

is a bit more difficult to compile; 'v' might contain a struct value, which 
requires a different code sequence than when 'v' contains a union -- 
the struct case assigns a member of the structure while the union case 
casts the value "hello" to the appropriate union value.  There are two 
choices:

	1)	Detect when 'v' happens to contain a union value and
		perform a union cast operation, else a struct member
		assignment.

	2)	Assume that 'v' is a struct type and generate a run-time
		exception when this is not the case.
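
The two choices can be compared side by side in a Python sketch. The names
StructValue, UnionValue, store_member_option1 and store_member_option2 are
all hypothetical, and TypeError stands in for whatever run-time exception
Nickle would actually raise.

```python
class StructValue:
    """Hypothetical struct value with named, assignable members."""
    def __init__(self, **members):
        self.members = dict(members)

class UnionValue:
    """Hypothetical immutable-style union value: a tag and a payload."""
    def __init__(self, tag, value):
        self.tag = tag
        self.value = value

def store_member_option1(env, name, member, value):
    # Option 1: inspect the run-time value and pick the code sequence.
    target = env[name]
    if isinstance(target, UnionValue):
        env[name] = UnionValue(member, value)   # union cast, rebinds the variable
    else:
        target.members[member] = value          # struct member assignment

def store_member_option2(env, name, member, value):
    # Option 2: assume a struct and raise when that is not the case.
    target = env[name]
    if not isinstance(target, StructValue):
        raise TypeError("'.' applied to a non-struct value")
    target.members[member] = value

env = {"v": UnionValue("X", 12), "s": StructValue(Y="")}
store_member_option1(env, "v", "Y", "hello")    # union path
store_member_option2(env, "s", "Y", "hello")    # struct path

try:
    store_member_option2(env, "v", "Y", "again")
    rejected = False
except TypeError:
    rejected = True                             # a union behind 'v' is refused

print(env["v"].tag, env["v"].value, env["s"].members, rejected)
```

Option 1 costs a type test on every such store; option 2 keeps the store
simple and pushes union construction into contexts where the type is known.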

I'm leaning towards option 2); the '.' operator would be a struct member 
reference unless it appears in a context where the type is known to be a 
union.

-keith