[armedbear-devel] Some thoughts on classloaders, inner classes and in-memory compilation

Mon Sep 21 21:37:10 UTC 2009

Hello folks,

I'd like to share some thoughts on the "usual" problem of in-memory
compilation in abcl. I haven't written any code to support my thoughts
- I just hope to make things clearer for me and other people on the
list, and I'm asking you to share your ideas and comment on mine.
Warning - this is going to be a long post.

Let's start by describing how abcl works now. Every piece of code in
the JVM needs to be contained in a method of some class, so when abcl
compiles a function it produces - surprise - a class, or better, a
stream of bytes that the JVM knows how to interpret to create a class.
Classes produced by the compiler extend a common abcl class providing
the methods that will be used to actually invoke the function; the
compiler, among other things, will override some of those methods.
When using the runtime compiler, the class is immediately loaded; when
using the file compiler, it is stored on the filesystem for later use.
So far, nothing requires temporary files to work.
However, a Lisp function can contain nested functions introduced with
FLET or LABELS, and those have to be compiled to classes as well.
Also, they need to be loaded together with the main function. How
does abcl solve this? By adding instructions in the main class to load
the local functions when the class itself is loaded, relative to
where it is loaded from. And here lies the problem. That location is
always assumed to be a file in some filesystem subtree. So even when
using the runtime compiler all the classes must be, at least
temporarily, stored in files for the load machinery to work. Changing
this is hard; classes are resolved from strings, so how do we know when
a given string represents a file and when it represents some object in
memory instead? There are workarounds, but I think it's the approach
itself that's brittle.

So let's step back a bit and take a look at how the JVM and Java the
language work with respect to loading classes. The JVM uses dedicated
objects called classloaders. They are responsible for translating from
a class's symbolic name (a string) to a class metaobject, much like the
CLOS find-class function does. Classloaders are organized
hierarchically: every classloader has a parent which is consulted
first (there is of course a built-in bootstrap classloader to break
the circularity); if the parent can supply the class, that class is
returned, otherwise the class is loaded in a manner dependent on the
particular classloader (e.g. from a file, from http, from memory,
...). The process of turning a byte array into a class is native in
the JVM, so classloaders only get to decide where the byte array comes
from and what it contains.
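
A minimal sketch of the parent-first lookup described above. The
TracingLoader class and its tryLoad helper are hypothetical names made
up for illustration; only ClassLoader, loadClass and findClass are the
standard JVM API. The loader defines nothing itself, it just records
which requests the parent chain could not satisfy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical loader that records the names its parent chain could not resolve. */
class TracingLoader extends ClassLoader {
    final List<String> asked = new ArrayList<>();

    TracingLoader(ClassLoader parent) {
        super(parent);
    }

    // ClassLoader.loadClass() consults the parent first and calls findClass()
    // only when the parent chain has failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        asked.add(name);
        // A real loader would obtain a byte[] here and pass it to defineClass().
        throw new ClassNotFoundException(name);
    }

    /** Convenience wrapper: returns null instead of throwing. */
    Class<?> tryLoad(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        TracingLoader loader = new TracingLoader(TracingLoader.class.getClassLoader());
        // Supplied by the parent chain: findClass() is never reached.
        System.out.println(loader.tryLoad("java.lang.String") == String.class);
        System.out.println(loader.asked);
        // Unknown to every parent: the request finally reaches findClass().
        System.out.println(loader.tryLoad("no.such.Clazz"));
        System.out.println(loader.asked);
    }
}
```

The point of the sketch is that a custom classloader only has to
override findClass; the delegation to the parent comes for free.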
Now to the more interesting things:
1. A class never exists in isolation; to do its work it will need to
refer to other classes (at a bare minimum, its superclass and any
interface it implements). The JVM - automatically! - asks the
classloader that defined a class to also load, at link time, all of
that class's dependencies.
2. If I had to manually redo in Java what the abcl compiler does with
functions, I'd use static inner classes to represent local functions.
Inner classes are classes which are textually defined inside another
class and share some data with it. Inner classes do not exist at the
bytecode level, only at the Java language level: the compiler (javac)
translates them to regular classes, with their name mangled. For
example, a class Inner defined inside a class Outer will be referred
to as Outer.Inner in Java, but compiled to Outer$Inner.class by javac.
3. Inner classes then are treated exactly like the others: referred
to by name (a string) inside code, resolved by a classloader (generally -
always? - by the classloader of the containing class).
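
Points 2 and 3 can be checked with a few lines of plain Java. This is
only a sketch using the Outer/Inner names from the example above; it
shows the mangled runtime name and that, in the ordinary case, the
nested class comes from the same classloader as its containing class.

```java
class Outer {
    /** A static nested class, standing in for a local function. */
    static class Inner {
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Source code says Outer.Inner; the runtime (binary) name uses '$'.
        String binaryName = Outer.Inner.class.getName();
        System.out.println(binaryName);

        // Both classes were defined by the same classloader.
        ClassLoader outerLoader = Outer.class.getClassLoader();
        System.out.println(Outer.Inner.class.getClassLoader() == outerLoader);

        // The nested class is resolved from a string, like any other class.
        Class<?> byName = outerLoader.loadClass(binaryName);
        System.out.println(byName == Outer.Inner.class);
    }
}
```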

Return to abcl. As you may have guessed, I propose that we no longer
make classes autoload their dependencies, but properly use
classloaders instead, in a fashion similar to how inner classes work.
We will have an InMemoryClassLoader which will load classes from a
Map<String, byte[]>, and a slightly extended URLClassLoader to load
classes from the filesystem. Both, in addition to loading classes, will
be used by the compiler to write classes as well, so it will continue
to use the class-file abstraction, changing only the code that
actually writes the bytes. Every time the compiler would have written
a call to loadCompiledFunction(classname) it will now use something
like functionFoo.class.getClassLoader().loadClass(classname) where
functionFoo is the compiler-generated name of the class representing
the compiled Lisp function. Everything else should stay the same.
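
To make the proposal a bit more concrete, here is a rough sketch of
what the in-memory half could look like. InMemoryClassLoader and the
Map<String, byte[]> come from the description above; everything else
is an assumption made for illustration: the store method, the
InMemoryDemo class, the emptyClass helper (a stand-in for the compiler
that emits a member-less class file) and the name functionFoo$local.
Real compiled functions would extend the abcl base class and contain
actual bytecode.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of the proposed loader: class files live in a map instead of on disk. */
class InMemoryClassLoader extends ClassLoader {
    private final Map<String, byte[]> classFiles = new ConcurrentHashMap<>();

    InMemoryClassLoader(ClassLoader parent) {
        super(parent);
    }

    /** "Write" side (hypothetical): the compiler stores a freshly compiled class. */
    void store(String className, byte[] classFile) {
        classFiles.put(className, classFile);
    }

    /** "Load" side: reached only when the parent chain does not know the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] b = classFiles.get(name);
        if (b == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, b, 0, b.length);
    }
}

class InMemoryDemo {
    /** Stand-in for the compiler: a minimal class file extending Object, no members. */
    static byte[] emptyClass(String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);                 // magic number
            out.writeShort(0);                        // minor version
            out.writeShort(49);                       // major version (Java 5)
            out.writeShort(5);                        // constant pool count (4 entries + 1)
            out.writeByte(7); out.writeShort(2);      // #1 Class -> #2
            out.writeByte(1); out.writeUTF(name.replace('.', '/'));  // #2 Utf8
            out.writeByte(7); out.writeShort(4);      // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object");      // #4 Utf8
            out.writeShort(0x0021);                   // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                        // this_class  = #1
            out.writeShort(3);                        // super_class = #3
            out.writeShort(0);                        // no interfaces
            out.writeShort(0);                        // no fields
            out.writeShort(0);                        // no methods
            out.writeShort(0);                        // no attributes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);       // cannot happen for in-memory streams
        }
    }

    /** Loads a "function" class, then its local function through the same loader. */
    static Class<?>[] demo() {
        InMemoryClassLoader loader = new InMemoryClassLoader(InMemoryDemo.class.getClassLoader());
        loader.store("functionFoo", emptyClass("functionFoo"));
        loader.store("functionFoo$local", emptyClass("functionFoo$local"));
        try {
            Class<?> functionFoo = loader.loadClass("functionFoo");
            // What the generated code would do instead of loadCompiledFunction(classname):
            Class<?> local = functionFoo.getClassLoader().loadClass("functionFoo$local");
            return new Class<?>[] { functionFoo, local };
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Class<?> c : demo()) {
            System.out.println(c.getName() + " loaded by " + c.getClassLoader().getClass().getName());
        }
    }
}
```

Note that no file is touched anywhere: the string passed to loadClass
is always a class name, and whether the bytes behind it come from
memory or from disk is purely the classloader's business.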

Does this sound convincing? I admit I have left many things to
elaborate on, and I haven't re-read the code in the compiler, going
mainly from memory instead. But I believe this approach has not been
proposed before and looks doable. Over the next few days I'll try
writing some sketch code to back up my ideas, if no-one finds any
serious problem with them.

Peace,
Alessio