[armedbear-devel] Optimizing loading times: different strategy for externalizing

Fri May 21 09:22:40 UTC 2010

On Fri, May 21, 2010 at 9:39 AM, Erik Huelsmann <ehuels at gmail.com> wrote:
> A follow-up on my progress this week:
>
>> As described by Alessio, it looks like our loading process profiles
>> are dominated by reader functions. So, I've taken a look at what it
>> actually is that we serialize. I found that many things we serialize
>> today - which need to be restored by the reader - can be serialized
>> without requiring the reader to restore them: lists of symbols and
>> lists.
>
> Except for DECLARE-* functions related to function references, I have
> changed the externalization code to go through a single function:
> EMIT-LOAD-EXTERNALIZED-OBJECT. This function externalizes the object
> (if that hasn't already happened) and emits code to load a reference to
> the restored object. The actual serialization doesn't differ much from
> the original. The difference is in the boilerplate that was in each
> of the DECLARE-* functions, which is no longer part of the
> serialization functions. I use a dispatch table to find the
> serialization function belonging to the object to be externalized.
>
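
For readers following along, a minimal sketch of the dispatch-table idea
described above. The real code lives in the Lisp compiler
(compiler-pass2.lisp); this is only a Java illustration, and every name in
it (SerializerDispatch, TABLE, serialize) is hypothetical. The emitted
strings merely imitate constructor calls and do no string escaping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class SerializerDispatch {
    // Hypothetical dispatch table: object class -> function that emits
    // a textual stand-in for the code restoring that object.
    private static final Map<Class<?>, Function<Object, String>> TABLE =
        new HashMap<>();

    static {
        TABLE.put(Integer.class, o -> "new Fixnum(" + o + ")");
        // Naive: no escaping of quotes or backslashes in the string.
        TABLE.put(String.class, o -> "new SimpleString(\"" + o + "\")");
    }

    // Single entry point: look up the serializer by the object's class.
    static String serialize(Object obj) {
        Function<Object, String> f = TABLE.get(obj.getClass());
        if (f == null) {
            throw new IllegalArgumentException(
                "no serializer for " + obj.getClass().getName());
        }
        return f.apply(obj);
    }

    public static void main(String[] args) {
        System.out.println(serialize(42));    // new Fixnum(42)
        System.out.println(serialize("abc")); // new SimpleString("abc")
    }
}
```

The point of the single entry point is that the shared boilerplate sits in
one place, and each table entry only has to know how to serialize its own
type.
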
>> That's where I decided to take a look at today's serialization
>> mechanism. Roughly speaking, those are the functions in
>> compiler-pass2.lisp with a function name starting with DECLARE-*; the
>> namespace seems to contain functions for externalizing objects as well
>> as for caching constant values.
>
> The caching / pre-evaluation is still in the DECLARE-* namespace;
> nothing has changed there, not even the boilerplate :-)
>
>> On trunk, I'm working to:
>>  * separate the caching from the externalizing name-spaces
>>  * separate serialization and restoring functionalities in different functions
>>   (they were conflated in a single function for each type of object)
>>  * define serialization functions which allow recursive calling patterns for
>>   nested serialization of objects (to be restored without requiring the reader)
>
> These actions are mostly completed. Enough for me to try the effect of
> serializing lists differently. We have lots of lists with symbols in
> them. These lists don't need to be read, but instead can be directly
> constructed using "new Cons(new Fixnum(1), new Cons(..., NIL));"
>
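
As a rough illustration of "constructed directly instead of read", here is
a self-contained Java sketch. The Cons class and NIL object below are
stand-ins written for this example, not ABCL's own classes, and plain
Integer objects take the place of Fixnum.

```java
final class DirectConsDemo {
    // Stand-in cons cell for this sketch only; not ABCL's Cons class.
    static final class Cons {
        final Object car;
        final Object cdr;
        Cons(Object car, Object cdr) { this.car = car; this.cdr = cdr; }
    }

    // Stand-in for the empty list.
    static final Object NIL = new Object();

    // The list (1 2 3) built by nested constructor calls: no text is
    // parsed and no reader is involved.
    static Object listOneTwoThree() {
        return new Cons(1, new Cons(2, new Cons(3, NIL)));
    }

    // Prints a list in Lisp notation so the result can be inspected.
    static String print(Object o) {
        if (o == NIL) return "NIL";
        if (!(o instanceof Cons)) return String.valueOf(o);
        StringBuilder sb = new StringBuilder("(");
        Object cur = o;
        while (cur instanceof Cons) {
            Cons c = (Cons) cur;
            sb.append(print(c.car));
            cur = c.cdr;
            if (cur instanceof Cons) sb.append(' ');
        }
        if (cur != NIL) sb.append(" . ").append(print(cur)); // dotted tail
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(print(listOneTwoThree())); // (1 2 3)
    }
}
```
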
> I created code yesterday which does exactly that. Unfortunately, there
> was no measurable impact on our boot time.
>
> So, the conclusion must be that our fasl reader is great, to the
> extent that it allows human-readable fasls, but it brings us the
> negative side effect that we start up too slowly to be usable on - for
> example - Google App Engine.
>
>
> Any ideas on improving our FASL format?
>
> Ideas I've had myself:
>
>  * Reduce the length of the names of the functions ABCL uses to create fasls
>  * Embed documentation strings in CLS files instead of having them in the FASL
>  * <Other things which reduce the size of a fasl>

Can we assume that the textual part of a FASL is ASCII text and thus
avoid UTF-8 conversions? In my profiling, those conversions seemed to
take a lot of time.
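
A sketch of what I have in mind, assuming the text really is mostly ASCII:
widen each byte straight to a char, and fall back to a full UTF-8 decode as
soon as a byte with the high bit set shows up. The class and method names
are made up for this example and it is not taken from the ABCL sources;
whether it actually beats the JDK's decoder would have to be measured.

```java
import java.nio.charset.StandardCharsets;

final class AsciiFastPath {
    // Decodes bytes as ASCII when possible; otherwise falls back to UTF-8.
    static String decode(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            byte b = bytes[i];
            if (b < 0) {
                // High bit set, so this is not 7-bit ASCII:
                // give up on the fast path and decode everything as UTF-8.
                return new String(bytes, StandardCharsets.UTF_8);
            }
            chars[i] = (char) b; // ASCII code points equal their byte values
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        byte[] ascii = "(setq x 1)".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(ascii)); // (setq x 1)
    }
}
```

The fallback keeps the result identical to a UTF-8 decode for any
well-formed input, so a stray non-ASCII character costs speed but not
correctness.
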

Alessio