[elephant-devel] Garbage collection problem
Chris Laux
chris at terraminds.com
Thu Sep 27 15:46:14 UTC 2007
Thanks Ian, that is some good analysis.
> - If you are doing a significant amount of deserialization with lots of
> threads then you should know that each deserialization requires a call
> to (with-lock ...) to ensure that the shared pool of buffer streams is
> thread safe (a problem with elephant < 0.9). This could conceivably
> cause a lockup if there are lots of small deserializations happening
> concurrently across threads mapping over the same Btrees.
I had a vague suspicion of something like that, but only looked at
transactions. I guess I would have to modify elephant to allow me to do
the locking to solve such a problem.
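
For illustration only, a minimal sketch of what caller-side locking could look like on SBCL, assuming one coarse mutex taken per batch of reads instead of one lock per deserialization. `*store-lock*`, `call-with-store-lock` and `with-store-lock` are hypothetical names, not part of elephant, and elephant's own internal locking would still have to be made optional before this changed anything.

```lisp
;; Hypothetical names throughout; nothing here is elephant API.
(defvar *store-lock* (sb-thread:make-mutex :name "store-lock"))

(defun call-with-store-lock (thunk)
  "Run THUNK while holding *STORE-LOCK*, so a whole batch pays for one lock."
  (sb-thread:with-mutex (*store-lock*)
    (funcall thunk)))

(defmacro with-store-lock (&body body)
  "Convenience wrapper around CALL-WITH-STORE-LOCK."
  `(call-with-store-lock (lambda () ,@body)))

;; Usage: the loop stands in for many small reads done under one lock.
(with-store-lock
  (loop for i below 1000 sum i))
```

The trade-off is the usual one for coarse locks: fewer lock acquisitions, but other threads wait for the whole batch.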
> Are you sure
> it's GC that's eating all the time, or non-lisp CPU time in general?
Well, the 99% CPU is reported for the sbcl process. I only know that
manually invoking a gc will trigger the problem.
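
One way to separate GC time from other CPU time inside the sbcl process is to compare SBCL's GC run-time counter against the total run time around a piece of work. The sketch below assumes SBCL; `gc-vs-run-time` is a hypothetical helper name.

```lisp
(defun gc-vs-run-time (thunk)
  "Call THUNK; return GC time and total run time, both in seconds."
  (let ((gc-start sb-ext:*gc-run-time*)
        (run-start (get-internal-run-time)))
    (funcall thunk)
    (values (float (/ (- sb-ext:*gc-run-time* gc-start)
                      internal-time-units-per-second))
            (float (/ (- (get-internal-run-time) run-start)
                      internal-time-units-per-second)))))

;; Usage: time a forced full collection, the step reported to trigger
;; the problem.
(multiple-value-bind (gc-seconds run-seconds)
    (gc-vs-run-time (lambda () (sb-ext:gc :full t)))
  (format t "GC: ~,3F s of ~,3F s run time~%" gc-seconds run-seconds))
```

If the GC figure stays small while the process sits at 99% CPU, the time is going somewhere other than the collector.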
> Although it breaks the abstraction barrier, using IDs will be a definite
> gain. You'd just make that second BTree pairs of word-freq / obj-oid.
> Then you use the OID and object type to grab the object directly from
> elephant: (elephant::get-cached-instance oid classname)
I have also been considering doing away with the second layer of BTrees,
and using my own, more "linear" structures. Not sure what that could
look like exactly though.
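
As a rough sketch of one possible "linear" layout, assuming each word maps to a plain vector of (freq . oid) pairs sorted by descending frequency: `make-postings` and `top-objects` are hypothetical names, and the `resolve` argument stands in for whatever OID lookup is used against the store.

```lisp
(defun make-postings (pairs)
  "Build a vector of (freq . oid) conses from the list PAIRS,
sorted by descending frequency."
  (sort (coerce pairs 'vector) #'> :key #'car))

(defun top-objects (postings n resolve)
  "Resolve the OIDs of the N most frequent entries via RESOLVE."
  (loop for (nil . oid) across (subseq postings 0 (min n (length postings)))
        collect (funcall resolve oid)))

;; Usage: the lambda is a placeholder for the real object lookup.
(top-objects (make-postings (list (cons 3 101) (cons 9 102) (cons 5 103)))
             2
             (lambda (oid) (list :object oid)))
```

Only small pairs are stored and scanned; full objects are fetched just for the entries that are actually needed.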
> You might be better off, performance-wise,
> doing this in a C full-text indexing system and wrapping an
> interface to it.
I hadn't thought of that yet. Can you recommend any?
Anyway, I guess I was asking for trouble a bit with my setup. I'm not
sure how I'll proceed yet, but if I stick to the two-level BTree setup
and use IDs, I know what to look out for.
Thanks again,
Chris