[elephant-devel] Online garbage collector

Ian Eslick eslick at media.mit.edu
Mon Jan 5 23:43:25 UTC 2009


I've just checked in a long-awaited online garbage collection scheme.   
It's not heavily tested or optimized, but you can experiment with it  
right away.  Simply start the following function in a thread from your  
application, or from the repl, and see what happens.

(ele::mark-and-sweep-gc *store-controller* &optional test)

WARNING: this is a prototype and cannot be guaranteed not to lose  
important data.  Use only on a database you don't care about!

When test is true, the collector only prints out all objects that it  
will reclaim, leaving a btree in ele::*mark-table* with the list of  
marked objects.

LIVENESS

The collector is not really very useful right now for objects other  
than btrees and psets as all persistent classes are written to a class  
index by default.  If you pass :index nil as a class initarg you will  
lose the ability to (get-instances-by-class), but any non-live objects  
in the store will be reclaimed.  You can either redefine the class to  
inhibit indexing or do it explicitly via: (setf (ele::%class-indexing  
class) nil).

Liveness is identical to the notion of reachability used by migration  
with one exception, the store's instance cache is also treated as a  
root.  Thus all persistent objects reachable from the memory cache,  
root btree, and class indexes are considered live.  The mark pass  
recurses on both persistent aggregates (e.g. btree, pset) as well as  
standard aggregates (e.g. structures, standard classes, arrays, etc).

You may want to run in-memory GC prior to running persistent GC to  
improve reclamation.  One potential issue: I have noticed that btrees  
that are stored in in-memory variables are dropped from the cache (a  
weak hash table) under SBCL so it's possible that persistent objects  
that are live in memory are dropped from the cache and can be  
reclaimed, leaving dangling references.  This may be a bug in weak  
pointers in the version of SBCL I'm using.

PERFORMANCE / CONTENTION

The mark pass runs all at once, but is predominantly doing reads which  
shouldn't result in much contention.  A fresh btree is used to keep  
track of marks.  For large DBs, this strategy involves long-lived  
transactions and may exhaust locks or cache memory and so a future  
revision should do the mark pass in a discrete series of steps (this  
means doing bookkeeping and checkpointing the recursive descent of the  
tree from the root which is somewhat odious).

The sweep pass is done with one transaction per class to minimize  
contention; reclaiming storage can touch quite a few database pages.

TURNKEY

Later, when I have some time, I'll add an optional bordeaux-threads  
model so you can simply start the collector as part of open-store and  
put frequency and other parameters to control behavior into my- 
config.sexp.  For now you can simply manually start a thread that runs  
the mark-and-sweep-gc function to completion on a given store.

Cheers,
Ian

PS - As a bonus for BDB users, you can call (optimize-layout store)  
after the gc which will attempt to defragment and optionally return  
free pages to the disk.






More information about the elephant-devel mailing list