[armedbear-devel] Fwd: [j-devel] Improving startup time: sanity check

Thu Oct 29 09:22:30 UTC 2009

forwarding

---------- Forwarded message ----------
From: Alessio Stalla <alessiostalla at gmail.com>
Date: Wed, Oct 28, 2009 at 11:20 PM
Subject: Re: [j-devel] Improving startup time: sanity check
To: Erik Huelsmann <ehuels at gmail.com>
Cc: armedbear-j-devel at lists.sourceforge.net, Alex Muscar <muscar at gmail.com>

On Wed, Oct 28, 2009 at 9:15 PM, Erik Huelsmann <ehuels at gmail.com> wrote:
> Last weekend, we experimented with better autoloading. It turned out
> to strip roughly 0.4 seconds from a cold startup time of 1.7s, which
> is an improvement of roughly 25%.
>
> However, the reason we started out with the startup time improvements
> in the first place was the ABCL startup time on Google App Engine. It
> turns out that our CPU usage during startup hasn't really decreased
> much (as per their benchmark indicator - they can't really give an
> actual figure).
>
> So, I asked for advice on #appengine (on freenode). Their reaction was
> "we can't imagine the startup time being related to the size of the
> JAR" even though Peter Graves calculated a 34% ratio between ABCL and
> Clojure jar sizes and a 35% ratio between startup times - that looks
> like a linear match. Their reaction continued "you're probably just
> doing too much work during the init() phase."
>
> The init() phase is where the ABCL environment gets loaded and all
> function objects get created.
>
> Let's assume for a second they're right. In that case we must assume
> it's not I/O holding us up: it's the work the CPU must do to get us up
> and running. If that's true, profiling the application should tell us
> something about the bottlenecks we're running into. I happen to have
> done quite a number of such profiles in the course of last week. The
> conclusion which stands out is that ABCL - during the startup process
> - spends ~ 40% of its time finding class constructors: the main
> component of creating function objects.
>
> This brought me to the conclusion that our startup process could be
> much faster if we delayed function object creation until the function
> is actually used, instead of creating function objects whenever their
> siblings are requested to be loaded.
>
> The idea is to create another Autoload derivative which will be
> "installed" in the appropriate places which, when invoked, loads the
> actual class from the byte array. I'm hoping this will cause a more
> equally spread "initialization load". The performance hit will only be
> the first call to the function: after it has been converted from the
> byte array, the autoload object will remove itself from the function
> call chain.
>
> So, how about it? Comments most welcome!
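
A minimal sketch of the lazy scheme proposed above, for illustration only.
All names here (LispFn, Symbol, LazyStub, makeLazySymbol) are hypothetical
and are not ABCL's actual classes; the Supplier stands in for "load the
class from the byte array and instantiate it".

```java
import java.util.function.Supplier;

public class LazyAutoloadSketch {
    /** Hypothetical stand-in for a Lisp function object. */
    interface LispFn {
        Object call(Object arg);
    }

    /** Hypothetical stand-in for a symbol and its function slot. */
    static final class Symbol {
        volatile LispFn function;
    }

    /** Counts how often the expensive load step ran (for the demo only). */
    static int loads = 0;

    /** Autoload-style stub: loads the real function on the first call only. */
    static final class LazyStub implements LispFn {
        private final Symbol symbol;
        private final Supplier<LispFn> loader;

        LazyStub(Symbol symbol, Supplier<LispFn> loader) {
            this.symbol = symbol;
            this.loader = loader;
        }

        @Override
        public Object call(Object arg) {
            // Assumption: the loader stands in for "define the class from its
            // byte array and instantiate it".
            LispFn real = loader.get();
            // The stub removes itself from the call chain by overwriting the slot.
            // Two threads racing here would both load; a real version needs a policy.
            symbol.function = real;
            return real.call(arg);
        }
    }

    /** Installs a stub whose loader builds a trivial greeting function. */
    static Symbol makeLazySymbol() {
        Symbol sym = new Symbol();
        sym.function = new LazyStub(sym, () -> {
            loads++;
            return arg -> "hello " + arg;
        });
        return sym;
    }

    public static void main(String[] args) {
        Symbol sym = makeLazySymbol();
        System.out.println("loads before first call: " + loads);
        System.out.println(sym.function.call("world"));
        System.out.println(sym.function.call("again"));
        System.out.println("loads after two calls: " + loads);
    }
}
```

Only the first call pays for the load; later calls go straight to the real
function because the stub has replaced itself in the slot.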

I have mixed feelings about the idea. I think it's clever, but I also
think we (I, at least) need more data to know whether it will actually
be beneficial.

If the goal is speeding up startup time in a context like AppEngine -
where not only Lisp, but the whole user application will be loaded
from scratch from time to time - then it is critical to know how many
Lisp functions a generic application uses on average (both directly
and indirectly). If it turns out that, say, 50% of Lisp is commonly
used, then no matter how clever an autoloading scheme you implement,
you'll cut loading times only by roughly 50% at best.

If getting constructors through reflection is really the bottleneck,
and if we determine that using new instead of reflection is
significantly faster (from a quick test of mine, it seems it *really*
is [1]), then it might be sensible to avoid reflection altogether and
devise another scheme. For example, the compiler-generated class X
could contain in its static initialization block the equivalent of
something like

Lisp.someThreadLocal.set(new X())

and loadCompiledFunction (or whatever it is called) could just fetch the
instance from the threadlocal; not very elegant, but if it speeds things up...
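
A rough sketch of that handoff, under stated assumptions: HANDOFF stands in
for Lisp.someThreadLocal, CompiledFoo and CompiledBar stand in for
compiler-generated classes, and this loadCompiledFunction is a simplified,
hypothetical version that takes a class name instead of bytes. It is not
ABCL's actual code.

```java
public class StaticInitHandoff {
    /** Stand-in for the thread-local the message calls Lisp.someThreadLocal. */
    static final ThreadLocal<Object> HANDOFF = new ThreadLocal<>();

    /** Stand-in for a compiler-generated function class. */
    static class CompiledFoo {
        static {
            // Plain "new" inside the static initializer: no constructor lookup.
            HANDOFF.set(new CompiledFoo());
        }
    }

    /** A second stand-in, so the handoff can be exercised more than once. */
    static class CompiledBar {
        static {
            HANDOFF.set(new CompiledBar());
        }
    }

    /** Hypothetical loader: initializes the class, then fetches the instance. */
    static Object loadCompiledFunction(String className) {
        HANDOFF.remove();
        try {
            // initialize=true runs the static initializer if it has not run yet.
            Class.forName(className, true, StaticInitHandoff.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        Object instance = HANDOFF.get();
        HANDOFF.remove(); // do not keep the instance alive through the thread-local
        return instance;
    }

    public static void main(String[] args) {
        Object fn = loadCompiledFunction(CompiledFoo.class.getName());
        System.out.println(fn instanceof CompiledFoo);
        // A static initializer runs once per class per loader, so a second
        // load of the same class in the same loader hands over nothing.
        System.out.println(loadCompiledFunction(CompiledFoo.class.getName()));
    }
}
```

The once-only nature of static initialization is the main design constraint:
the scheme works as long as each class is initialized exactly once by the
code that wants the instance.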

Alessio

[1] this is the astounding result from a few runs of 50000
iterations (test files attached):
REFLECTION: 16262373155
NEW: 84267527
% SLOWER: 19298

REFLECTION: 15917190176
NEW: 103681915
% SLOWER: 15351

REFLECTION: 15838714133
NEW: 77235481
% SLOWER: 20507

(times in ns; "% SLOWER" is the reflection time as a percentage of the
new time) i.e. reflection as we use it is roughly 150-200 times slower
than new, and that's on a very simple class with no superclasses and a
single constructor! The test might be wrong, as I wrote it quickly and
it's quite tricky. It does use the very same classloader as abcl, though
(copy-pasted).
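
For readers without the attachments, here is a simplified sketch of this
kind of comparison. It is not the attached Test.java: it does not go through
ABCL's classloader, the class and method names are made up, and it is a naive
timing loop without warm-up, so the absolute numbers and the ratio should be
expected to differ from those above.

```java
import java.lang.reflect.Constructor;

public class ReflectionVsNew {
    /** Simple class with no superclass beyond Object and a single constructor. */
    public static class Target {
        public Target() {}
    }

    /** Accumulates a value from each instance so the loops are not optimized away. */
    static int sink;

    /** Times direct instantiation with "new"; returns elapsed nanoseconds. */
    static long timeNew(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += new Target().hashCode() & 1;
        }
        return System.nanoTime() - start;
    }

    /** Times constructor lookup plus newInstance; returns elapsed nanoseconds. */
    static long timeReflection(int iterations) {
        long start = System.nanoTime();
        try {
            for (int i = 0; i < iterations; i++) {
                // Look the constructor up on every iteration, as a loader
                // that instantiates freshly loaded classes would have to.
                Constructor<Target> c = Target.class.getConstructor();
                sink += c.newInstance().hashCode() & 1;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 50000;
        long reflection = timeReflection(iterations);
        long direct = timeNew(iterations);
        System.out.println("REFLECTION: " + reflection);
        System.out.println("NEW: " + direct);
        System.out.println("RATIO: " + (double) reflection / Math.max(direct, 1));
    }
}
```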
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test.java
Type: text/x-java
Size: 1659 bytes
Desc: not available
URL: <https://mailman.common-lisp.net/pipermail/armedbear-devel/attachments/20091029/e724512b/attachment.java>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: JavaClassLoader.java
Type: text/x-java
Size: 2228 bytes
Desc: not available
URL: <https://mailman.common-lisp.net/pipermail/armedbear-devel/attachments/20091029/e724512b/attachment-0001.java>