Call for Interest: Clojure (or Lisp?) Code Camp with BLM focus

Tue Dec 8 16:39:10 UTC 2020

Pascal Costanza wrote:

>> 
>> 
>> > Parallel GC is no problem and implemented.
>> 
>> 
>> Which CL implementations have a parallel GC?

I'm a little late to the conversation, but we have been experimenting
with a multi-threaded garbage collector that runs mostly in parallel
with the lisp application code. The collector uses a mark-sweep
algorithm in which the application code runs in parallel with the
collector except for a few brief pauses.

Application threads must pause briefly during the mark phase to
allow references in their active frames to be recorded. This pause is
relatively short. Our current test system, heavily instrumented for
validation and performance-measurement purposes, and using just one
thread for the marking, is showing typical pauses for this initial
stack scan under 1 msec. After these initial references are
collected, the application threads run while the marker follows all
references to produce a live-object map that is almost correct. With
that initial map complete, all threads are stopped briefly to collect
any new references and add them to the map. In our test system these
reconciliation pauses are typically under 15 msecs and almost always
under 25 msec. 

After the live map has been completed, the application threads are
released to continue processing while the collector scans the live
map, collecting the storage occupied by dead objects into free
pools.

Once this is done, the collector can cycle back to do a new mark phase.

While we are currently testing the marker running on just one thread,
the code is essentially the same as the multi-threaded global gc
Allegro has had for several years now. The number of threads used by
the marker is a configurable option; experience suggests we can cut
the pause times in half by using three or four threads.

Mark-sweep collectors can be subject to storage fragmentation. Memory
is divided into areas each of which holds objects of one size. As
objects leave the live set, their storage is collected into pools
according to size. When several areas holding objects of the same size
become partially free, it can be useful to relocate live objects from
some of those areas into the others, reducing the memory footprint of
the live set and making the abandoned area available for objects of
some other size. Once the mark phase is complete, we have enough
information to recognize areas that could benefit from this sort of
reorganization. In order to move an object, however, we must adjust
all pointers to that object. This can only be done while all the
application threads are paused, and in order to keep application
pauses short, it's essential that we find all the in-heap references
to those objects without having to scan the entire heap.

Fortunately, the mark phase can do the work for us, providing us with
a map of those areas holding references to one of the objects
we wish to relocate. At the end of the second
pause-the-application-threads mark phase, we can see how much time
we've kept the application inactive, and estimate how much time it
will take to fix the pointers to the objects we want to move. If the
pase so far plus the estimated additional time is less than a settable
amount, we can perform the relocation and reap the benefits. If not,
we can check again next time.

Current efforts revolve around global optimization and the proper set
of user-visible controls and reports to help optimize performance. The
goal of the concurrent gc is to let us apply some of the extra memory
and cpu power available in today's machines toward elimination of the
lengthy and unpredictable gc pauses that can make lisp unattractive to
human users. Real-time response isn't achieved, and isn't the goal.

I know people will want to know when it's going to be released, but we
don't currently have a release date set.

Kevin Layer