[Nickle] Bitsets considered harmful?
Bart Massey
nickle at po8.org
Tue Nov 30 10:37:21 PST 2004
Note that Nickle contains an efficient implementation of
Math::ilog() which would help a lot with your troubles.
Hashes really aren't the right way to find the first 1 bit.
Actually, ilog() does a lot of unnecessary math: I just
checked in a version of Nickle with Math::lsb() in it. I'll
be shocked if this doesn't solve your lookup speed problem.
Somebody should probably rewrite ilog() to work like lsb()
in the case where base == 2 someday.
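The underlying bit trick is easy to demonstrate outside Nickle; here is a minimal Python sketch of finding the lowest set bit (the function name and the counting loop are mine, not Nickle's implementation):

```python
def lowest_set_bit(x):
    """Return the index of the lowest 1 bit of x, or -1 if x == 0."""
    if x == 0:
        return -1
    # x & -x isolates the lowest set bit (two's complement trick),
    # so only its position remains to be counted.
    isolated = x & -x
    index = 0
    while isolated > 1:
        isolated >>= 1
        index += 1
    return index

print(lowest_set_bit(0b101000))  # -> 3
```

No hashing involved: the cost is proportional to the bit's position, not to the number of items ever seen.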
Bart
In message <1101832604.14475.209749476 at webmail.messagingengine.com> you wrote:
> The audacious subject of this post is only half-serious -- I'm starting
> to think reimplementing the Mint parser generator using bitsets as a set
> representation may have been a bad idea.
>
> The reason behind this has to do with the nature of the canonical
> collection/closure/first set/goto system of algorithms. A bit of a
> review: this system manipulates sets of LR(1) items. Closure closes a
> set of LR(1) items: whenever the parsing location (dot) in some item
> sits in front of a nonterminal, the corresponding items for that
> nonterminal's productions are added to the set. Goto takes a set of
> LR(1) items and a symbol and returns a set which represents a new
> state of the parser. The canonical collection set of algorithms seems
> to correspond roughly to a breadth-first search of the set of states
> in the DFA-like entity which represents the parser.
>
> The current implementation maps LR(1) items to integers (powers of 2 in
> CVS, but I experimented with consecutive low integers on my own) and one
> of the major things that makes mint slow seemed to be going to and from
> this mapping. Since the set of LR(1) items is so incredibly vast, it
> takes a while for the lookup to find the corresponding integer in the
> hash table mapping LR(1) items to integers. This is done (mostly) for
> the benefit of the bitsets.
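For what it's worth, the item-to-integer mapping described above amounts to interning; a rough Python sketch of the pattern (all names here are mine, not Mint's):

```python
class Interner:
    """Map hashable LR(1) items to consecutive small integers."""
    def __init__(self):
        self.ids = {}      # item -> integer
        self.items = []    # integer -> item

    def intern(self, item):
        # One hash lookup per item; the cost grows with the size of
        # the item being hashed, which is the expense noted above.
        if item not in self.ids:
            self.ids[item] = len(self.items)
            self.items.append(item)
        return self.ids[item]

interner = Interner()
# An LR(1) item as (production index, dot position, lookahead):
a = interner.intern((0, 1, 'EOF'))
b = interner.intern((0, 1, 'EOF'))   # same item, same integer
assert a == b == 0
```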
>
> I'm thinking (and tell me if my thinking is wrong) that the other
> interesting problem with bitsets is that most set operations are (in
> general) linear in the size of the domain instead of in the size of
> the sets involved. Even membership testing suffers here, since an
> item must first be hashed to its bit index. The constants, however,
> are so incredibly low that in practice it's a very efficient set
> representation -- it's also a good representation when the sets being
> operated on are dense with respect to the domain. However, the set of
> all LR(1) items associated with a
> grammar is pretty big. It grows linearly in the number of symbols,
> roughly linearly in the average length of the productions, and linearly
> again in the number of productions. Furthermore, the sets of LR(1)
> items representing the canonical collection are mostly disjoint,
> meaning that for any appreciable number of sets the bitset
> representation becomes very poor.
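Concretely, the cost asymmetry above can be sketched in Python, with arbitrary-precision integers standing in for bitsets (the names and the domain size are illustrative):

```python
DOMAIN = 1_000_000        # number of possible LR(1) items

# A two-element set, represented as a bitset over the whole domain:
two_members = (1 << 3) | (1 << (DOMAIN - 1))
# Even with only two members, the representation spans DOMAIN bits,
# so any union/intersection pass scales with DOMAIN, not with 2.
assert two_members.bit_length() == DOMAIN

# The same set, represented sparsely; operations cost O(set size):
sparse = {3, DOMAIN - 1}
assert len(sparse) == 2
```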
>
> I'm starting to think the most efficient representation of sets of
> LR(1) items might be a red-black tree -- this would obviate the need
> for a mapping from LR(1) items to the integers, and for bitsets
> altogether. What do you folks think?
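That proposal amounts to storing the items themselves in an ordered container, so costs track set size rather than domain size. A rough Python sketch, using a hashed set as a stand-in for the balanced tree (item tuples and names are illustrative):

```python
# Store LR(1) items directly, so no item->integer mapping is needed
# and set operations cost O(size of the sets involved), not
# O(size of the domain).  Python's set plays the role of the
# red-black tree here; a real tree would add ordered iteration.

def closure_step(items, added):
    """Union in newly derived items; cost tracks the sets involved."""
    return items | added

state = {(0, 0, 'EOF')}                 # (production, dot, lookahead)
state = closure_step(state, {(1, 0, 'EOF')})
assert (1, 0, 'EOF') in state           # membership: no bit index needed
assert len(state) == 2
```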
> --
> James
> giantrobot at f-m.fm
>
>
> _______________________________________________
> Nickle mailing list
> Nickle at nickle.org
> http://nickle.org/mailman/listinfo/nickle