Hashing "object identities"

Antoniotti Marco antoniotti.marco at disco.unimib.it
Sun Dec 3 16:11:00 UTC 2017


> On Dec 3, 2017, at 16:31 , Pascal Bourguignon <pjb at informatimago.com> wrote:
> 
> 
> 
>> On 3 Dec 2017, at 12:02, Antoniotti Marco <antoniotti.marco at disco.unimib.it> wrote:
>> 
>> Hi
>> 
>> I am fooling around with a problem that eventually will have to use a hash table on “triples” of “integers” and “structures".  Triples I can portably pass to the EQUAL hash table.  I cannot use an EQUALP hash table, because I would end up wasting to much time.
>> 
>> Here is the rub:  my triples <N, O1, O2> need to use the “object identity” of O1 and O2 (two structs).  I could switch to an EQL hash table keyed on (HASH-TRIPLE-KEY N O1 O2).  How would you proceed (or write HASH-TRIPLE), while staying as much portable as possible?  As you well know SXHASH is practically useless.
>>> 
>>> On Dec 3, 2017, at 12:27 , Jason Cornez <jcornez at alum.mit.edu> wrote:
>>> 
>>> Add an integer slot on the structs; hash on that?  -Jason 
>> 
>> I forgot to mention it.  THAT is another thing I want to avoid because I need to share subgraphs (DAGs).  I need to hash on the “object identity”.
> 
> 
> The thing is that there’s no conforming way to obtain the “object identity” other than the object itself, and no other conforming way to use it than to use EQ/EQL or other operators using EQL, such as EQL hash-tables.

Yes.  Hence the question.  Is there any way to get them in a “semi-portable” way?  I remember reading about “locatives”, would they help?

> If you want to rely on implementation specific behavior, you can extract (a representation of) the object identity using print-unreadable-object.
>  If identity is true, the output from forms is followed by a space character and a representation of the object's identity, typically a storage address. 
> cf. eg. https://gitlab.com/com-informatimago/com-informatimago/blob/master/common-lisp/cesarum/utility.lisp#L833

I wouldn’t really want to use that.

> So a good conforming way to do it, is to add an integer slot to the structure, with a unique integer in it.
> Another way to do it, is to use the structure as a key in a EQL hash-table mapping to the unique integer (with amortized O(1) time to get the object identity).
> If you’re short on memory, you may store the structures in a vector, and use its index as object identity (but if you want to collect structures, it’ll leave holes, and it’s now O(n) to find the object identity).
> 
> So the cheapest way is indeed to add an integer slot to the structures.

After reading through the email of the day and a number of sites, it looks like I may have to resort to that, using a second EQL hash table as an indirection.


> Note that using sxhash doesn’t give the object identity. An implementation could very well return 42 for all the structures. So you could use it only to index buckets of structures, but you would still have to build the identifying integers in one of the above ways. 

As a matter of fact that is what actually happens (modulo 42) in many implementations.

Thanks

MA




--
Marco Antoniotti, Associate Professor		tel.	+39 - 02 64 48 79 01
DISCo, Università Milano Bicocca U14 2043		http://bimib.disco.unimib.it
Viale Sarca 336
I-20126 Milan (MI) ITALY

Please check: http://cdac2018.lakecomoschool.org
Please check: http://troncopackage.org

Please note that I am not checking my Spam-box anymore.
Please do not forward this email without asking me first (cum grano salis).







More information about the pro mailing list