[Ecls-list] Name of init function.
Waldek Hebisch
hebisch at math.uni.wroc.pl
Sat May 31 00:37:25 UTC 2008
Juan Jose Garcia-Ripoll wrote:
> On Sat, May 31, 2008 at 12:42 AM, Waldek Hebisch
> <hebisch at math.uni.wroc.pl> wrote:
> > You mean standard Lisp hash function? This does not look like
> > good choice. Namely, such hash function is used to implement
> > hash tables and hash tables still work even if there is a lot
> > of collisions. But for init function we need unique names...
>
> The fact that the hash function we use is used for table lookups does
> not mean it is bad. The implementation is here:
> http://burtleburtle.net/bob/hash/doobs.html It has both a 32 bits and
> 64 bits variant. Now, that means one collision in every 2^32 cases or
> 2^64. If you add the build time, that should decrease the probability.
>
Well, this function looks good. But note:
1) Probablity of collision (assuming uniform distribution) is of order
n^2/(2*m) where n is number of names we need and m is range of hash
function. If n=10000 and m=2^32, then probability is bigger than
1/100, so in prolonged use we will almost surely get a collision.
IMHO this shows that 32 bit hash is definitly too short.
Now, 10000 object files is a lot, but reasonable for large programs,
I certainly would feel better using system that has no problem
handling 10^5 or 10^6 files.
2) Even with lower probability of failure if system has many users
_some_ users may hit the problem. I feel that we should make
probability of failure significantly smaller than probability
of hardware failure. Which means something like 10^(-15) or
10^(-18). Given that square of number of names enters probability
I would say that 64 bits is still to small a number. 128 may
be enough.
3) If you have non-uniform distribution than probability of collision
is significantly higher. So I am affraid that 108 bits with
current (improved) hash function is still too little.
--
Waldek Hebisch
hebisch at math.uni.wroc.pl
More information about the ecl-devel
mailing list