[Nickle]Nickle memory profiling

Wed, 05 Jun 2002 01:03:58 -0700

Around 17 o'clock on Jun 5, Michael Cahill wrote:

> So, does anyone have any recommendations for working out where in nickle
> code the memory is being allocated?

Well, assuming that the code will spend CPU time where it's allocating
memory, you can use profiling to find the hotspots in the code

(note - I edited miller-rabin.5c to make all of the functions public so 
they could be printed out).

> profile(1);
> load "numbers.5c"
> load "miller-rabin.5c"
> Miller_Rabin::primebits(100000, 20)
^C
<abort>
> print Miller_Rabin::primebits
public int primebits (int n, int d)
{
         0,0         :      while (1)
         0,0         :      {
     68018,0         :          int q = PRNG::randbits (n - 1) + 2 ** (n - 1);
    139827,0         :          int why = composite (q, d);
         0,0         :          if (!why)
         0,0         :              return q;
         0,0         :      }
}
> print Miller_Rabin::composite
public int composite (int n, int d)
{
         0,0         :      int i, j;
         0,0         :      for (i = 0; i < nprimes && primes[i] < n; i++)
       617,617       :          if (n % primes[i] == 0)
         0,0         :              return 1;
         0,0         :      for (j = 0; j < d; j++)
         0,0         :      {
     52151,0         :          int a = 1 + PRNG::randint (n - 1);
     87059,0         :          if (witness (a, n))
         0,0         :              return a;
         0,0         :      }
         0,0         :      return 0;
}
> print Miller_Rabin::witness
public int witness (int a, int n)
{
     87059,0         :      witness_result we = witnessexp (a, n - 1, n);
         0,0         :      if (we.wit)
         0,0         :          return 1;
         0,0         :      if (we.pow != 1)
         0,0         :          return 1;
         0,0         :      return 0;
}
> 
print Miller_Rabin::witnessexp
public witness_result witnessexp (int b, int e, int m)
{
      6179,6179      :      if (e == 0)
         0,0         :          return (witness_result) { pow = 0, wit = 1 };
      2016,2016      :      if (e == 1)
         0,0         :          return (witness_result) { pow = b % m, wit = 0 !
1455031948,78864     :      witness_result tmp = witnessexp (b, e // 2, m);
         0,0         :      if (tmp.wit)
         0,0         :          return tmp;
         0,0         :      int t = tmp.pow * tmp.pow % m;
         0,0         :      if (t == 1 && tmp.pow != 1 && tmp.pow != m - 1)
         0,0         :      {
         0,0         :          tmp.wit = tmp.pow;
         0,0         :          tmp.pow = t;
         0,0         :          return tmp;
         0,0         :      }
         0,0         :      if (e % 2 == 0)
         0,0         :          tmp.pow = t;
         0,0         :      else
         0,0         :          tmp.pow = t * b % m;
         0,0         :      return tmp;
}
> 

The left number next to each statement is the sum of times with that 
statement on the stack, the right number is time executing in that 
statement.

(note -- the profiling code had a little trouble with the 10 digit number 
above, current CVS has that bug fixed).

The goal was to make it the cumulative time for that statement and any
callees, however recursive functions break it pretty badly, as you can see 
the recursive call to witnessexp generates an exponential number of 
"ticks" in that statement.  Just ignore that for now until I figure out a 
cheap way of detecting recursion in this function.

Ticks are 10s of CPU milliseconds; it should be clear that the recursive
call to witnessexp is expensive here, I'm sure in this case the recursion
never bottomed out -- you'd end up with a stack 100000 frames deep, each
with some rather large numbers involved (around 3000 32-bit words), for
a total allocation of some 1.2 GiB.

Ask for a smaller number (say 50000 bits) and things are much more 
reasonable -- nickle chugs along happily at around 256M until you get 
bored waiting for the result.  It's still running here after 5 minutes.

You just needed a bit more memory and you never would have noticed.

This little example does point up how nice it would be to get a more 
sophisticated debugger running; the ability to suspend and restart 
execution would make this kind of issue much easier to diagnose.

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab