[elephant-devel] Garbage collection problem

Chris Laux chris at terraminds.com
Thu Sep 27 15:46:14 UTC 2007


Thanks Ian, that is some good analysis.

> - If you are doing a significant amount of deserialization with lots of
> threads, then you should know that each deserialization requires a call
> to (with-lock ...) to ensure that the shared pool of buffer streams is
> thread safe (a problem with elephant < 0.9).  This could conceivably
> cause a lockup if there are lots of small deserializations happening
> concurrently across threads mapping over the same Btrees.

I had a vague suspicion of something like that, but only looked at
transactions. I guess I would have to modify elephant to allow me to do
the locking to solve such a problem.
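
As a rough illustration of the contention described above, here is a
minimal sketch of taking a shared-pool lock once per batch rather than
once per item. It is SBCL-specific (sb-thread mutexes), and every name in
it (*pool*, *pool-lock*, grab-buffer, return-buffer, map-with-one-buffer)
is hypothetical. None of this is Elephant's actual internals; it only
shows the general shape such a change might take.

```lisp
;; SBCL-specific sketch. All names below are hypothetical and are not
;; part of Elephant; a byte vector stands in for a buffer stream.
(defvar *pool-lock* (sb-thread:make-mutex :name "buffer pool")
  "Guards *POOL*, which is shared by all threads.")

(defvar *pool* '()
  "Free buffers available for reuse.")

(defun grab-buffer ()
  "Take a buffer from the shared pool, or allocate one if it is empty."
  (or (sb-thread:with-mutex (*pool-lock*)
        (pop *pool*))
      (make-array 64 :element-type '(unsigned-byte 8))))

(defun return-buffer (buffer)
  "Give BUFFER back to the shared pool."
  (sb-thread:with-mutex (*pool-lock*)
    (push buffer *pool*)))

(defun map-with-one-buffer (function items)
  "Call FUNCTION with one buffer and each element of ITEMS in turn.
The pool lock is taken twice per batch instead of twice per item."
  (let ((buffer (grab-buffer)))
    (unwind-protect
         (mapcar (lambda (item) (funcall function buffer item)) items)
      (return-buffer buffer))))
```

The point of the design is only that a thread mapping over many small
items holds on to one buffer for the whole traversal, so the lock stops
being a per-item cost.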

> Are you sure
> it's GC that's eating all the time, or non-lisp CPU time in general?

Well, the 99% CPU is reported for the sbcl process. I only know that
manually invoking a gc will trigger the problem.

> Although it breaks the abstraction barrier, using IDs will be a definite
> gain.  You'd just make that second BTree pairs of word-freq / obj-oid. 
> Then you use the OID and object type to grab the object directly from
> elephant: (elephant::get-cached-instance oid classname)

I have also been considering doing away with the second layer of BTrees,
and using my own, more "linear" structures. Not sure what that could
look like exactly though.
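
A minimal sketch of the word-freq / obj-oid idea quoted above, assuming
plain hash tables as stand-ins for the persistent BTrees and for the
object store. The names *objects*, *word-index*, index-word, fetch-object
and top-matches are hypothetical; no Elephant functions are called, and
the one place where the quoted lookup would go is marked in a comment.

```lisp
;; Hypothetical stand-ins: in-memory hash tables play the role of the
;; persistent BTrees and of the object store.
(defvar *objects* (make-hash-table)
  "Stand-in object store: oid -> object.")

(defvar *word-index* (make-hash-table :test #'equal)
  "Stand-in index: word -> list of (freq . oid) pairs.")

(defun index-word (word freq oid)
  "Record that the object identified by OID contains WORD FREQ times."
  (push (cons freq oid) (gethash word *word-index*)))

(defun fetch-object (oid)
  "Resolve OID to an object."
  ;; With Elephant, this is where the quoted call
  ;; (elephant::get-cached-instance oid classname) would go.
  (gethash oid *objects*))

(defun top-matches (word n)
  "Return up to N objects for WORD, highest frequency first.
Only the N winning OIDs are resolved to objects."
  (let* ((pairs (sort (copy-list (gethash word *word-index*))
                      #'> :key #'car))
         (best (subseq pairs 0 (min n (length pairs)))))
    (mapcar (lambda (pair) (fetch-object (cdr pair))) best)))

;; Usage with two toy "objects":
(setf (gethash 1 *objects*) "doc-a"
      (gethash 2 *objects*) "doc-b")
(index-word "lisp" 3 1)
(index-word "lisp" 7 2)
(top-matches "lisp" 1)
```

Because the index holds only small (freq . oid) pairs, ranking touches no
objects at all; deserialization is deferred until the final lookup of the
few OIDs that are actually needed.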

> You might be better off, performance
> wise, doing this in a C full-text indexing system and wrapping an
> interface to it.

I hadn't thought of that yet. Can you recommend any?

Anyway, I guess I was asking for trouble a bit with my setup. I'm not
sure how I'll proceed yet, but if I stick to the two-level BTree setup
and use IDs, I know what to look out for.

Thanks again,

Chris



