[armedbear-devel] Optimizing loading times: different strategy for externalizing

Alan Ruttenberg alanruttenberg at gmail.com
Fri May 21 09:35:20 UTC 2010


On Fri, May 21, 2010 at 5:22 AM, Alessio Stalla <alessiostalla at gmail.com> wrote:
> On Fri, May 21, 2010 at 9:39 AM, Erik Huelsmann <ehuels at gmail.com> wrote:
>> A follow-up on my progress this week:
>>
>>> As described by Alessio, it looks like our loading process profiles
>>> are dominated by reader functions. So, I've taken a look at what it
>>> actually is that we serialize. I found that many things we serialize
>>> today - which need to be restored by the reader - can be serialized
>>> without requiring the reader to restore them: lists of symbols and
>>> nested lists.
>>
>> Except for DECLARE-* functions related to function references, I have
>> changed the externalization code to go through a single function:
>> EMIT-LOAD-EXTERNALIZED-OBJECT. This function externalizes the object
>> (if that hasn't already happened) and emits code to load a reference to
>> the restored object. The actual serialization doesn't differ much from
>> the original; the difference is that the boilerplate which used to be
>> repeated in each of the DECLARE-* functions is no longer part of the
>> serialization functions. I use a dispatch table to find the
>> serialization function belonging to the type of object being externalized.
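The dispatch-table idea can be sketched roughly as follows. This is a hypothetical Java illustration, not ABCL's actual classes or names: the real code lives in compiler-pass2.lisp and dispatches on Lisp types, but the shape is the same, a single entry point that looks up the serializer by the object's type instead of each DECLARE-* function carrying its own boilerplate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: one entry point that finds the serializer for an
// object via a dispatch table keyed on the object's runtime class.
public class DispatchSketch {
    static final Map<Class<?>, Function<Object, String>> SERIALIZERS = new HashMap<>();
    static {
        // Illustrative serializers; the real ones emit JVM bytecode.
        SERIALIZERS.put(Integer.class, o -> "fixnum:" + o);
        SERIALIZERS.put(String.class,  o -> "string:" + o);
    }

    // Single externalization entry point, analogous in spirit to
    // EMIT-LOAD-EXTERNALIZED-OBJECT.
    static String emitLoadExternalizedObject(Object o) {
        Function<Object, String> serializer = SERIALIZERS.get(o.getClass());
        if (serializer == null)
            throw new IllegalArgumentException("no serializer for " + o.getClass());
        return serializer.apply(o);
    }

    public static void main(String[] args) {
        System.out.println(emitLoadExternalizedObject(42));
        System.out.println(emitLoadExternalizedObject("foo"));
    }
}
```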
>>
>>> That's where I decided to take a look at today's serialization
>>> mechanism. Roughly speaking, those are the functions in
>>> compiler-pass2.lisp with a function name starting with DECLARE-*; the
>>> namespace seems to contain functions for externalizing objects as well
>>> as for caching constant values.
>>
>> The caching / pre-evaluation is still in the DECLARE-* namespace;
>> nothing has changed there, not even the boilerplate :-)
>>
>>> On trunk, I'm working to:
>>>  * separate the caching from the externalizing name-spaces
>>>  * separate serialization and restoring functionalities in different functions
>>>   (they were conflated in a single function for each type of object)
>>>  * define serialization functions which allow recursive calling patterns for
>>>   nested serialization of objects (to be restored without requiring the reader)
>>
>> These actions are mostly completed. Enough for me to try the effect of
>> serializing lists differently. We have lots of lists with symbols in
>> them. These lists don't need to be read, but instead can be directly
>> constructed using "new Cons(new Fixnum(1), new Cons(..., NIL));"
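Spelled out, the direct-construction idea looks like the sketch below. The Cons and Fixnum classes here are simplified stand-ins for ABCL's real org.armedbear.lisp classes, so this is only an illustration of the emitted shape, not the actual runtime:

```java
// Minimal stand-ins for ABCL's Cons/Fixnum/NIL, to show how a literal
// list such as (1 2 3) can be built directly in emitted code with no
// reader involvement.
public class ConsSketch {
    interface LispObject {}

    static final LispObject NIL = new LispObject() {
        @Override public String toString() { return "NIL"; }
    };

    static final class Fixnum implements LispObject {
        final int value;
        Fixnum(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class Cons implements LispObject {
        final LispObject car, cdr;
        Cons(LispObject car, LispObject cdr) { this.car = car; this.cdr = cdr; }
        // Print proper lists only; dotted lists are out of scope here.
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            LispObject o = this;
            while (o instanceof Cons) {
                Cons c = (Cons) o;
                sb.append(c.car);
                o = c.cdr;
                if (o instanceof Cons) sb.append(' ');
            }
            return sb.append(')').toString();
        }
    }

    public static void main(String[] args) {
        // What the compiler could emit for the constant list (1 2 3):
        LispObject list = new Cons(new Fixnum(1),
                              new Cons(new Fixnum(2),
                                  new Cons(new Fixnum(3), NIL)));
        System.out.println(list);
    }
}
```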
>>
>> I created code yesterday which does exactly that. Unfortunately, there
>> was no measurable impact on our boot time.
>>
>> So, the conclusion must be that our fasl reader is great to the
>> extent that it gives us human-readable fasls, but it has the negative
>> side effect that we start up too slowly to be usable on - for
>> example - Google App Engine.
>>
>>
>> Any ideas on improving our FASL format?
>>
>> Ideas I've had myself:
>>
>>  * Reduce the length of the names of the functions ABCL uses to create fasls
>>  * Embed documentation strings in CLS files instead of having them in the FASL
>>  * <Other things which reduce the size of a fasl>
>
> Can we assume that the textual part of a FASL is ASCII text and thus
> avoid UTF-8 conversions? In my profiling, that conversion seemed to
> take a lot of time.

Would that preclude having unicode string constants?
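The trade-off can be made concrete with a sketch of an ASCII fast path. This is illustrative only (not ABCL's actual decoder): since UTF-8 is a strict superset of ASCII, any byte below 0x80 can be cast straight to a char, and only content containing a Unicode string constant would force the full UTF-8 decode.

```java
import java.nio.charset.StandardCharsets;

// Sketch: decode bytes with a cheap ASCII fast path, falling back to a
// full UTF-8 decode only when a non-ASCII byte is present.
public class AsciiFastPath {
    static String decode(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) != 0) {
                // Non-ASCII byte found (e.g. a Unicode string constant):
                // take the general UTF-8 path.
                return new String(bytes, StandardCharsets.UTF_8);
            }
        }
        // Pure ASCII: every byte maps 1:1 to a char, no decoder needed.
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++)
            chars[i] = (char) bytes[i];
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(decode("(defun foo ())".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(decode("λ".getBytes(StandardCharsets.UTF_8)));
    }
}
```

So the fast path answers the common case, but the question stands: an ASCII-only assumption would indeed preclude Unicode string constants unless they are escaped or the decoder falls back as above.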

>
> Alessio
>
> _______________________________________________
> armedbear-devel mailing list
> armedbear-devel at common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/armedbear-devel
>



