[elephant-devel] Online garbage collector

Ian Eslick eslick at media.mit.edu
Tue Jan 6 13:15:36 UTC 2009


As expected there are a few issues with the current collector in  
guaranteeing that all live objects are marked without having stopping  
the world.  Any new object are automatically and conservatively  
marked, but any operations that move a reference from in front of the  
mark process to behind it in parallel will cause a live variable to be  
unmarked.  The controller can keep track of any OIDs that were written  
and do a quick re-mark phase on those objects that does stop the world.

For very large heaps keeping track of written objects becomes too  
expensive and we'll need to look into generational approaches (which  
adds quite a bit of complexity).  Still, this is a quicker and simpler  
way to clean up your heap than migration.

So two additions needed for a full online solution:
- A way to stop the world
- A way to track written oids

Ian

On Jan 5, 2009, at 6:43 PM, Ian Eslick wrote:

> I've just checked in a long-awaited online garbage collection scheme.
> It's not heavily tested or optimized, but you can experiment with it
> right away.  Simply start the following function in a thread from your
> application, or from the repl, and see what happens.
>
> (ele::mark-and-sweep-gc *store-controller* &optional test)
>
> WARNING: this is a prototype and cannot be guaranteed not to lose
> important data.  Use only on a database you don't care about!
>
> When test is true, the collector only prints out all objects that it
> will reclaim, leaving a btree in ele::*mark-table* with the list of
> marked objects.
>
> LIVENESS
>
> The collector is not really very useful right now for objects other
> than btrees and psets as all persistent classes are written to a class
> index by default.  If you pass :index nil as a class initarg you will
> lose the ability to (get-instances-by-class), but any non-live objects
> in the store will be reclaimed.  You can either redefine the class to
> inhibit indexing or do it explicitly via: (setf (ele::%class-indexing
> class) nil).
>
> Liveness is identical to the notion of reachability used by migration
> with one exception, the store's instance cache is also treated as a
> root.  Thus all persistent objects reachable from the memory cache,
> root btree, and class indexes are considered live.  The mark pass
> recurses on both persistent aggregates (e.g. btree, pset) as well as
> standard aggregates (e.g. structures, standard classes, arrays, etc).
>
> You may want to run in-memory GC prior to running persistent GC to
> improve reclamation.  One potential issue: I have noticed that btrees
> that are stored in in-memory variables are dropped from the cache (a
> weak hash table) under SBCL so it's possible that persistent objects
> that are live in memory are dropped from the cache and can be
> reclaimed, leaving dangling references.  This may be a bug in weak
> pointers in the version of SBCL I'm using.
>
> PERFORMANCE / CONTENTION
>
> The mark pass runs all at once, but is predominantly doing reads which
> shouldn't result in much contention.  A fresh btree is used to keep
> track of marks.  For large DBs, this strategy involves long-lived
> transactions and may exhaust locks or cache memory and so a future
> revision should do the mark pass in a discrete series of steps (this
> means doing bookkeeping and checkpointing the recursive descent of the
> tree from the root which is somewhat odious).
>
> The sweep pass is done with one transaction per class to minimize
> contention; reclaiming storage can touch quite a few database pages.
>
> TURNKEY
>
> Later, when I have some time, I'll add an optional bordeaux-threads
> model so you can simply start the collector as part of open-store and
> put frequency and other parameters to control behavior into my-
> config.sexp.  For now you can simply manually start a thread that runs
> the mark-and-sweep-gc function to completion on a given store.
>
> Cheers,
> Ian
>
> PS - As a bonus for BDB users, you can call (optimize-layout store)
> after the gc which will attempt to defragment and optionally return
> free pages to the disk.
>
>
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel





More information about the elephant-devel mailing list