[armedbear-devel] Idea for new FASL format: smaller, faster

Sat Jan 21 21:34:12 UTC 2012

On Sat, Jan 21, 2012 at 9:55 PM, Erik Huelsmann <ehuels at gmail.com> wrote:
> The spec allows functions in a fasl to hard code references to each
> other, except when declared not-inline. Currently, we take advantage
> of that fact only a little bit: the only bit we eliminate is the
> function lookup from the symbol. Anything else works just like
> function calls which are not in the same file. The mail below comes
> from my long-standing desire to take better advantage of the room
> offered by the spec.

Thanks a lot for thinking thoroughly about this. I have only a few
minor comments/questions.

> What things would I like to improve on?
>
> 1. We only inline backward referenced functions in the same fasl
> 2. We parse the argument list of each function call with respect to
> keywords and optional parameters
> 3. We store the names of symbols referenced in many .cls files each of
> which gets separately zipped
> 4. We store (cache) function references to inlined functions as objects
>
>
> What's the basic idea?
>
> The idea is to shift paradigms: instead of compiling each Lisp
> function into a class, we will start compiling each lisp function to a
> Java function. A FASL would become one or more classes -- depending
> on whether everything fits into a single class or not. The idea is by the
> way not to change the way we model functions in symbol function slots,
> but instead to change the way everything is stored in a fasl.
>
>
> Does it address the items mentioned above?
>
> 1. Since class files are written once the class is complete, we have
> to delay serializing the lisp functions to disk until they're all
> known. This means that we will know all functions defined in a fasl
> before it's serialized. With an appropriate linker phase, we could
> easily resolve all forward referenced function calls.
>
> 2. Because lisp functions have become Java functions, we can't
> directly expose them to the Lisp world anymore. This basically creates
> the option to have an 'internal function signature' and an 'external
> function signature'. SBCL has this: XEPs -- eXternal Entry Points.
> These entry points into a function sort the arguments into the right
> order before calling the internal entry point. Code which is compiled
> into the same fasl pre-sorts the arguments at compile time (when
> possible) and calls the internal entry point -- eliminating the need
> to sort keyword parameters.

That's awesome. To clarify, do you intend these XEPs to be method
overloads in the same FASL-class?
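
To make the question concrete, here is a rough Java sketch of how I read
the internal/external split. Everything in it is hypothetical: the class
name, the method names and the use of strings for keywords are only for
illustration and are not ABCL's actual representation. I gave the two
entry points different names on purpose, so the sketch does not assume
an answer to the overload question.

```java
// Hypothetical fasl class; none of these names exist in ABCL.
final class FaslSketch {
    // Internal entry point for something like
    //   (defun scale (x &key (factor 2) (offset 0)) ...)
    // Arguments arrive already sorted into positional order.
    static int scale(int x, int factor, int offset) {
        return x * factor + offset;
    }

    // External entry point (XEP): takes the raw argument list, sorts the
    // keyword arguments at run time, then calls the internal entry point.
    static int scaleXep(Object... args) {
        if (args.length < 1 || (args.length - 1) % 2 != 0) {
            throw new IllegalArgumentException("wrong number of arguments");
        }
        int x = (Integer) args[0];
        int factor = 2;            // default for :factor
        int offset = 0;            // default for :offset
        boolean sawFactor = false; // the leftmost occurrence of a keyword wins
        boolean sawOffset = false;
        for (int i = 1; i < args.length; i += 2) {
            String key = (String) args[i];
            int value = (Integer) args[i + 1];
            switch (key) {
                case ":factor":
                    if (!sawFactor) { factor = value; sawFactor = true; }
                    break;
                case ":offset":
                    if (!sawOffset) { offset = value; sawOffset = true; }
                    break;
                default:
                    throw new IllegalArgumentException("unknown keyword " + key);
            }
        }
        return scale(x, factor, offset);
    }

    // A caller compiled into the same fasl: (scale 5 :offset 1 :factor 3)
    // was sorted at compile time, so no keyword parsing happens here.
    static int sameFaslCaller() {
        return scale(5, 3, 1);
    }
}
```

Code outside the fasl would go through scaleXep, while sameFaslCaller is
a plain call to a sibling method.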

> 3. By having a single (or fewer) class files, we can achieve higher
> re-use ratios of the same constants which would otherwise be included
> in many, many .cls files.
>
> 4. Because function calls to 'inline' functions will become calls to
> sibling methods, they will become clear candidates for inlining to the
> JIT. With the object references, it is not clear this is an option.

That's a very good point. Currently, by always passing through
getSymbolFunction(), we basically disable JIT inlining across Lisp
functions. Invokedynamic would help here, but it's not around the
corner. Your proposal is very effective.

>
> What I think needs to happen to get this designed:
>
>
> a. The file compiler and function compiler should be based on a shared
> 'method compiler' which takes its context from the caller instead of
> the existing way of basing the file compiler on the function compiler.
>
> b. We need to implement a linker phase separate from the pass2 phase,
> which can re-order arguments and call sibling methods instead of Lisp
> Function objects.
>
> c. We need to find a way to correctly handle the interaction between
> the successive IN-PACKAGE, DEFPACKAGE, EVAL-WHEN, etc, forms appearing
> in the input file and the initialization of fields in the resulting
> class file.

Off the top of my head (but the standard might imply otherwise) the
only problem is IN-PACKAGE (and DEFPACKAGE) and that can be solved by
1) adding a static initializer to the fasl-class that pre-installs all
the packages and 2) serializing all symbols explicitly as
package::symbol. Everything else (mostly EVAL-WHEN combined with stuff
that affects the reader) is handled by isolating the reader used to
parse serialized stuff in the class from the reader used to read forms
in the fasl, and I believe it's already like that - isn't it?
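
A minimal sketch of points 1) and 2), assuming a toy package registry.
PackageSketch, FaslWithPackages and the package names are made up for
the example; the real thing would of course go through the existing
package and symbol classes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the Lisp package registry (hypothetical, not ABCL code):
// package name -> (symbol name -> symbol object).
final class PackageSketch {
    static final Map<String, Map<String, Object>> PACKAGES = new HashMap<>();

    // Interns a symbol given in fully qualified "PACKAGE::NAME" form, so the
    // result never depends on whatever the current package happens to be.
    static Object intern(String qualified) {
        int sep = qualified.indexOf("::");
        if (sep < 0) {
            throw new IllegalArgumentException("not package-qualified: " + qualified);
        }
        String pkg = qualified.substring(0, sep);
        String name = qualified.substring(sep + 2);
        Map<String, Object> table = PACKAGES.get(pkg);
        if (table == null) {
            throw new IllegalStateException("package not installed: " + pkg);
        }
        return table.computeIfAbsent(name, n -> new Object());
    }
}

// Hypothetical fasl class.
final class FaslWithPackages {
    static final Object SYM_FOO;
    static final Object SYM_BAR;

    static {
        // 1) pre-install every package the file refers to ...
        PackageSketch.PACKAGES.putIfAbsent("MY-APP", new HashMap<>());
        PackageSketch.PACKAGES.putIfAbsent("MY-UTILS", new HashMap<>());
        // 2) ... then resolve symbols that were serialized as package::symbol.
        SYM_FOO = PackageSketch.intern("MY-APP::FOO");
        SYM_BAR = PackageSketch.intern("MY-UTILS::BAR");
    }
}
```

The static initializer runs once, when the fasl class is first used, and
every symbol field is filled in without consulting the current package.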

> d. We need a way to expose the external entry points to the lisp world.

We could lazily construct (by generating bytecode at runtime) a
LispFunction instance that calls the right method. It would be
generated the first time someone calls, or otherwise references, a
function in the fasl, and then associated with the symbol as its
symbol-function. Hopefully there will be far fewer of these
runtime-generated classes than if we compiled each and every function to a
separate class. Additionally, with invokedynamic the LispFunction will
only need to be generated when the function is reified (basically the
first time someone directly reads a symbol's function slot), while
regular function calls could be directed to the target method.
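
Roughly what I have in mind, as a sketch only. To keep it short it does
not generate bytecode at runtime; it substitutes a lambda wrapped around
a java.lang.invoke.MethodHandle, which is a different technique with the
same lazy shape. SymbolSlot and LispFunctionLike are hypothetical names,
not existing classes.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class LazyFunctionSketch {
    // Stand-in for a Lisp function compiled to a method of the fasl class.
    static Object twice(Object x) {
        return ((Integer) x) * 2;
    }

    // Hypothetical stand-in for the function object stored in a symbol.
    interface LispFunctionLike {
        Object execute(Object arg);
    }

    // Hypothetical stand-in for a symbol's function slot.
    static final class SymbolSlot {
        private final Class<?> owner;
        private final String methodName;
        private LispFunctionLike function; // stays null until first reified
        int wrappersBuilt = 0;             // only here to show the laziness

        SymbolSlot(Class<?> owner, String methodName) {
            this.owner = owner;
            this.methodName = methodName;
        }

        // Builds the wrapper the first time the slot is read, then caches it.
        synchronized LispFunctionLike symbolFunction() {
            if (function == null) {
                final MethodHandle target;
                try {
                    target = MethodHandles.lookup().findStatic(
                            owner, methodName,
                            MethodType.methodType(Object.class, Object.class));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
                function = arg -> {
                    try {
                        return (Object) target.invoke(arg);
                    } catch (Throwable t) {
                        throw new RuntimeException(t);
                    }
                };
                wrappersBuilt++;
            }
            return function;
        }
    }
}
```

A slot created with new SymbolSlot(LazyFunctionSketch.class, "twice")
builds no wrapper until symbolFunction() is called, and reuses the same
wrapper on every later call.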

> e. We need to find a way to split fasls over multiple class files.
>
>
>
> Comments?
>
>
> Bye,
>
>
> Erik.

Cheers,
Alessio