[elephant-devel] Garbage collection problem
Ian Eslick
eslick at csail.mit.edu
Thu Sep 27 15:57:38 UTC 2007
On Sep 27, 2007, at 11:46 AM, Chris Laux wrote:
> Thanks Ian, that is some good analysis.
>
>> - If you are doing a significant amount of deserialization with lots of
>> threads, then you should know that each deserialization requires a call
>> to (with-lock ...) to ensure that the shared pool of buffer streams is
>> thread safe (a problem with elephant < 0.9). This could conceivably
>> cause a lockup if there are lots of small deserializations happening
>> concurrently across threads mapping over the same BTrees.
>
> I had a vague suspicion of something like that, but only looked at
> transactions. I guess I would have to modify elephant to allow me to do
> the locking to solve such a problem.
By lockup I really meant a bottleneck rather than a deadlock. Elephant
should really be thread-safe now, but it's always possible there is
some weird case we haven't seen yet.
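To make the bottleneck concrete, here is a minimal sketch, assuming SBCL threads (`sb-thread:make-mutex`, `sb-thread:with-mutex`). It is not Elephant's actual code; `*buffer-pool*`, `grab-buffer`, `release-buffer`, `with-pooled-buffer`, and `deserialize-bytes` are hypothetical names. Every call takes the single pool lock twice, so many small deserializations running concurrently end up queuing on that one mutex instead of deadlocking.

```lisp
;;; Hypothetical sketch, not Elephant's implementation: one shared pool of
;;; reusable byte buffers guarded by a single SBCL mutex.
(defvar *buffer-pool* '()
  "Free list of reusable buffers shared by all threads.")

(defvar *buffer-pool-lock* (sb-thread:make-mutex :name "buffer pool")
  "The one lock every deserialization must pass through.")

(defun grab-buffer ()
  "Pop a buffer from the shared pool, or allocate a fresh one."
  (sb-thread:with-mutex (*buffer-pool-lock*)
    (or (pop *buffer-pool*)
        (make-array 4096 :element-type '(unsigned-byte 8)
                         :fill-pointer 0 :adjustable t))))

(defun release-buffer (buffer)
  "Reset BUFFER and return it to the shared pool (takes the lock again)."
  (setf (fill-pointer buffer) 0)
  (sb-thread:with-mutex (*buffer-pool-lock*)
    (push buffer *buffer-pool*)))

(defmacro with-pooled-buffer ((var) &body body)
  "Run BODY with VAR bound to a pooled buffer, releasing it afterwards."
  `(let ((,var (grab-buffer)))
     (unwind-protect (progn ,@body)
       (release-buffer ,var))))

(defun deserialize-bytes (bytes)
  "Toy stand-in for deserialization: copy BYTES through a pooled buffer
and return their sum."
  (with-pooled-buffer (buf)
    (loop for b across bytes do (vector-push-extend b buf))
    (reduce #'+ buf)))
```

The work inside each call is tiny compared with the two lock acquisitions, which is exactly the situation in which many threads mapping over the same BTrees would contend rather than run in parallel.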
>> Are you sure
>> it's GC that's eating all the time, or non-lisp CPU time in general?
>
> Well, the 99% CPU is reported for the sbcl process. I only know that
> manually invoking a gc will trigger the problem.
>
>> Although it breaks the abstraction barrier, using IDs will be a definite
>> gain. You'd just make that second BTree hold word-freq / obj-oid pairs.
>> Then you use the OID and object type to grab the object directly from
>> elephant: (elephant::get-cached-instance oid classname)
>
> I have also been considering doing away with the second layer of BTrees,
> and using my own, more "linear" structures. Not sure what that could
> look like exactly though.
Updates are the real problem: you'd have to load the entire second-level
data structure to do any processing on it.
>> You might be better off, performance-wise, doing this in a C full-text
>> indexing system and wrapping an interface to it.
>
> I hadn't thought of that yet. Can you recommend any?
>
> Anyway, I guess I was asking for trouble a bit with my setup. I'm not
> sure how I'll proceed yet, but if I stick to the two-level BTree setup
> and use IDs, I know what to look out for.
I'd suggest trying this to see whether it helps, provided the overhead
isn't too insane.
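The two-level layout with IDs can be sketched roughly as follows. This is a minimal sketch under loud assumptions: plain hash tables stand in for Elephant BTrees, and `resolve-oid` is a hypothetical stand-in for `(elephant::get-cached-instance oid classname)`; `*objects*`, `*word-index*`, `index-word`, and `lookup-word` are also hypothetical. The second level stores only small `(freq . oid)` conses, so scanning it never deserializes whole objects; only the top hits are resolved.

```lisp
;;; Hypothetical sketch: hash tables stand in for Elephant BTrees.
(defvar *objects* (make-hash-table)
  "Stand-in for the persistent store: OID -> object.")

(defvar *word-index* (make-hash-table :test #'equal)
  "Word -> list of (freq . oid) entries, most frequent first.")

(defun index-word (word freq oid)
  "Record that the object with OID contains WORD FREQ times."
  (setf (gethash word *word-index*)
        ;; MERGE keeps the entry list sorted by descending frequency.
        (merge 'list (list (cons freq oid))
               (gethash word *word-index*)
               #'> :key #'car)))

(defun resolve-oid (oid)
  "Hypothetical stand-in for (elephant::get-cached-instance oid classname)."
  (gethash oid *objects*))

(defun lookup-word (word &key (limit 10))
  "Return up to LIMIT objects for WORD, resolving OIDs only for those hits."
  (let ((entries (gethash word *word-index*)))
    (mapcar (lambda (entry) (resolve-oid (cdr entry)))
            (subseq entries 0 (min limit (length entries))))))
```

As noted above, this breaks the abstraction barrier, and updating an entry still means rewriting the per-word list, so it trades update cost for cheaper reads.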
Ian
> Thanks again,
>
> Chris