Parsing big XML files with klacks and sbcl

Russ Tyndall russ at acceleration.net
Wed May 30 14:45:23 UTC 2018


I ran into this problem on some unit tests, because they were
churning through memory as fast as they could. I created a function
that checked the current heap size and, if it was above a certain
limit, ran the GC (up to 3 times) until usage dropped below that limit.

You may find this useful:
https://gist.github.com/bobbysmith007/8dd2da4483d32ab0d02d334f8b81f1bc

It contains some details that may not be relevant to you, but were
helpful in my circumstances (such as clearing caches along the way and
logging).
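
In outline, the idea was something like this (a minimal sketch, not
the gist's exact code; the 512MB threshold is arbitrary, and
SB-KERNEL:DYNAMIC-USAGE is an internal SBCL symbol that reports
current dynamic-space usage in bytes):

  (defun gc-if-heap-too-big (&optional (limit (* 512 1024 1024)))
    ;; If the heap is over LIMIT bytes, run a full GC (up to 3 times),
    ;; stopping as soon as usage drops back below the limit.
    (loop repeat 3
          while (> (sb-kernel:dynamic-usage) limit)
          do (sb-ext:gc :full t)))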

I sort of feel like the GC should be slightly more aggressive about
what it's doing. You might also find that simply adding a small sleep
between rows is sufficient to get the GC to run on its own.
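
Something along these lines, assuming *SRC* is the klacks source from
the loop quoted below (the sleep length here is arbitrary):

  (loop for key = (klacks:consume *src*)
        until (eq key :end-document)
        do (sleep 0.001))  ; brief pause so the GC gets a chance to run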

Cheers,
Russ Tyndall
Acceleration.net
Programmer


On 05/29/2018 04:33 PM, Mark Janssen wrote:
> On Tue, May 29, 2018 at 9:20 PM, Attila Lendvai <attila at lendvai.name> wrote:
>>> (loop while t do
>>>        (klacks:consume *src*))
>> try to add (sb-ext:gc :full t) inside the loop. if that helps, then
>> you're overwhelming SBCL's gc algorithm by allocating too much garbage
>> between two gc's (or something along that line, maybe someone else
>> with more knowledge of the details can elaborate).
>>
> This indeed keeps the memory usage in check.
> However a forced gc on every loop sounds less than ideal.
> I am a bit surprised that a streaming parser generates so much
> garbage, considering one of the main use cases is handling large files.
> Also I am wondering if the GC can be configured to run more
> aggressively without further explicit calls in the rest of the code.
>
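
For what it's worth, SBCL does expose one knob along these lines:
SB-EXT:BYTES-CONSED-BETWEEN-GCS controls how much allocation happens
before the next collection is triggered, so lowering it makes the GC
run more often without any explicit GC calls in your code (the 4MB
figure below is arbitrary, for illustration only):

  ;; Trigger a collection after roughly every 4MB of allocation.
  (setf (sb-ext:bytes-consed-between-gcs) (* 4 1024 1024))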



