[elephant-devel] Online garbage collector
Ian Eslick
eslick at media.mit.edu
Mon Jan 5 23:43:25 UTC 2009
I've just checked in a long-awaited online garbage collection scheme.
It's not heavily tested or optimized, but you can experiment with it
right away. Simply start the following function in a thread from your
application, or from the repl, and see what happens.
(ele::mark-and-sweep-gc *store-controller* &optional test)
WARNING: this is a prototype and cannot be guaranteed not to lose
important data. Use only on a database you don't care about!
When test is true, the collector only prints out all objects that it
will reclaim, leaving a btree in ele::*mark-table* with the list of
marked objects.
LIVENESS
The collector is not really very useful right now for objects other
than btrees and psets as all persistent classes are written to a class
index by default. If you pass :index nil as a class initarg you will
lose the ability to (get-instances-by-class), but any non-live objects
in the store will be reclaimed. You can either redefine the class to
inhibit indexing or do it explicitly via: (setf (ele::%class-indexing
class) nil).
Liveness is identical to the notion of reachability used by migration
with one exception, the store's instance cache is also treated as a
root. Thus all persistent objects reachable from the memory cache,
root btree, and class indexes are considered live. The mark pass
recurses on both persistent aggregates (e.g. btree, pset) as well as
standard aggregates (e.g. structures, standard classes, arrays, etc).
You may want to run in-memory GC prior to running persistent GC to
improve reclamation. One potential issue: I have noticed that btrees
that are stored in in-memory variables are dropped from the cache (a
weak hash table) under SBCL so it's possible that persistent objects
that are live in memory are dropped from the cache and can be
reclaimed, leaving dangling references. This may be a bug in weak
pointers in the version of SBCL I'm using.
PERFORMANCE / CONTENTION
The mark pass runs all at once, but is predominantly doing reads which
shouldn't result in much contention. A fresh btree is used to keep
track of marks. For large DBs, this strategy involves long-lived
transactions and may exhaust locks or cache memory and so a future
revision should do the mark pass in a discrete series of steps (this
means doing bookkeeping and checkpointing the recursive descent of the
tree from the root which is somewhat odious).
The sweep pass is done with one transaction per class to minimize
contention; reclaiming storage can touch quite a few database pages.
TURNKEY
Later, when I have some time, I'll add an optional bordeaux-threads
model so you can simply start the collector as part of open-store and
put frequency and other parameters to control behavior into my-
config.sexp. For now you can simply manually start a thread that runs
the mark-and-sweep-gc function to completion on a given store.
Cheers,
Ian
PS - As a bonus for BDB users, you can call (optimize-layout store)
after the gc which will attempt to defragment and optionally return
free pages to the disk.
More information about the elephant-devel
mailing list