[Ecls-list] Name of init function.

Sat May 31 00:37:25 UTC 2008

Juan Jose Garcia-Ripoll wrote:
> On Sat, May 31, 2008 at 12:42 AM, Waldek Hebisch
> <hebisch at math.uni.wroc.pl> wrote:
> > You mean standard Lisp hash function?  This does not look like
> > good choice.  Namely, such hash function is used to implement
> > hash tables and hash tables still work even if there is a lot
> > of collisions.  But for init function we need unique names...
> 
> The fact that the hash function we use is used for table lookups does
> not mean it is bad. The implementation is here:
> http://burtleburtle.net/bob/hash/doobs.html It has both a 32 bits and
> 64 bits variant. Now, that means one collision in every 2^32 cases or
> 2^64. If you add the build time, that should decrease the probability.
> 

Well, this function looks good.  But note:

1) Probablity of collision (assuming uniform distribution) is of order
   n^2/(2*m) where n is number of names we need and m is range of hash
   function.  If n=10000 and m=2^32, then probability is bigger than
   1/100, so in prolonged use we will almost surely get a collision.
   IMHO this shows that 32 bit hash is definitly too short.
   Now, 10000 object files is a lot, but reasonable for large programs,
   I certainly would feel better using system that has no problem
   handling 10^5 or 10^6 files.
2) Even with lower probability of failure if system has many users
   _some_ users may hit the problem.  I feel that we should make
   probability of failure significantly smaller than probability
   of hardware failure.  Which means something like 10^(-15) or
   10^(-18).  Given that square of number of names enters probability
   I would say that 64 bits is still to small a number.  128 may
   be enough.
3) If you have non-uniform distribution than probability of collision
   is significantly higher.  So I am affraid that 108 bits with
   current (improved) hash function is still too little.

-- 
                              Waldek Hebisch
hebisch at math.uni.wroc.pl