[elephant-devel] Postmodern backend; are values ever removed from the blob table

Alex Mizrahi killerstorm at newmail.ru
Fri Dec 12 16:15:19 UTC 2008


 RS> I have two other minor worries about this technique. First, it would
 RS> seem possible that if, say, we have slot1 in object1 which was set to
 RS> :some-opt at some time in the past, but is now totally unused by any
 RS> object, the blob row containing that would be considered dead; if
 RS> someone comes along while the cleanup is running and sets some slot to
 RS> :some-opt, then it will be in use again, but potentially get deleted
 RS> anyway, because it will have already been deemed dead.
 RS> This isn't really a problem for me; due to the nature of my dataset the
 RS> chances of this happening are remote, but I can imagine it'd be an
 RS> issue for some people.

yep, this might be a problem. unfortunately i do not see a general solution
to this problem without modifying the blob table and likely introducing
considerable overhead.

the first possible solution would be to mark blobs with a serial number on
each sp_ensure_bid call, even if it finds the blob in the table. then the
garbage blob collector could just filter out the ones that were touched after
it started working. unfortunately this adds the overhead of writing data into
the database.
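
a rough sketch of the serial number idea, in python with sqlite3 standing in
for postgresql. the table layout, the touched column, and the names
ensure_bid, find_dead and delete_dead are made up for illustration; they are
not elephant's actual schema or stored procedures, and only slots (not trees)
are checked for references.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blob (bid INTEGER PRIMARY KEY, bob TEXT UNIQUE, touched INTEGER);
    CREATE TABLE slot (oid INTEGER, name TEXT, bid INTEGER);
""")
serial = itertools.count(1)  # stand-in for a database sequence


def ensure_bid(bob):
    """Return the bid for bob, stamping the row with a fresh serial even on a hit."""
    stamp = next(serial)
    row = conn.execute("SELECT bid FROM blob WHERE bob = ?", (bob,)).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO blob (bob, touched) VALUES (?, ?)", (bob, stamp))
        return cur.lastrowid
    # This write on every hit is the overhead mentioned above.
    conn.execute("UPDATE blob SET touched = ? WHERE bid = ?", (stamp, row[0]))
    return row[0]


def find_dead():
    """Collector, phase 1: note the starting serial and list unreferenced blobs."""
    start = next(serial)
    rows = conn.execute(
        "SELECT bid FROM blob WHERE bid NOT IN (SELECT bid FROM slot) ORDER BY bid")
    return start, [bid for (bid,) in rows]


def delete_dead(start, dead):
    """Collector, phase 2: skip any blob touched after the collector started."""
    conn.executemany(
        "DELETE FROM blob WHERE bid = ? AND touched < ?",
        [(bid, start) for bid in dead])


# Two blobs exist but no slot refers to them.
ensure_bid(":some-opt")
ensure_bid(":never-used")

start, dead = find_dead()            # both look dead at this point

# A writer reuses :some-opt while the collector is still running.
bid = ensure_bid(":some-opt")
conn.execute("INSERT INTO slot VALUES (1, 'slot1', ?)", (bid,))

delete_dead(start, dead)
remaining = [bob for (bob,) in conn.execute("SELECT bob FROM blob ORDER BY bid")]
```

in this run :some-opt survives because its stamp is newer than the collector's
starting serial, while :never-used is deleted.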

another solution is to do garbage collection in mark & sweep style -- first
mark all objects white, then go through the table and mark the reachable ones
black. at the same time, if sp_ensure_bid sees a white blob, it marks it black
too. then, in the final stage, delete all white objects. a race condition
during the sweeping stage will be handled by postgresql's transactions -- if a
conflict happens, one of the transactions will fail and be restarted. the
disadvantage of this solution is that it requires a large amount of writing
during collection. however, it also does not require any temporary storage and
works entirely on the database side, and sp_ensure_bid performance is not
impacted when no collection is going on.
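
the mark & sweep stages could look roughly like this. again this is python
with sqlite3 as a stand-in; the color column and the gc_begin, gc_mark and
gc_sweep names are assumptions for the sake of the example, and a single
connection cannot show the transaction conflict that postgresql would resolve
during the sweep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blob (bid INTEGER PRIMARY KEY, bob TEXT UNIQUE,
                       color TEXT NOT NULL DEFAULT 'black');
    CREATE TABLE slot (oid INTEGER, name TEXT, bid INTEGER);
""")


def ensure_bid(bob):
    """Return the bid for bob; re-blacken it if a collection has whitened it."""
    row = conn.execute(
        "SELECT bid, color FROM blob WHERE bob = ?", (bob,)).fetchone()
    if row is None:
        # New blobs start out black, so a running collection will not sweep them.
        return conn.execute("INSERT INTO blob (bob) VALUES (?)", (bob,)).lastrowid
    bid, color = row
    if color == "white":
        # This write only happens while a collection is in progress.
        conn.execute("UPDATE blob SET color = 'black' WHERE bid = ?", (bid,))
    return bid


def gc_begin():
    """Stage 1: mark everything white."""
    conn.execute("UPDATE blob SET color = 'white'")


def gc_mark():
    """Stage 2: mark every blob reachable from a slot black."""
    conn.execute(
        "UPDATE blob SET color = 'black' WHERE bid IN (SELECT bid FROM slot)")


def gc_sweep():
    """Stage 3: delete whatever is still white; return how many rows went away."""
    return conn.execute("DELETE FROM blob WHERE color = 'white'").rowcount


live = ensure_bid(":live")
conn.execute("INSERT INTO slot VALUES (1, 'slot1', ?)", (live,))
ensure_bid(":some-opt")   # unreferenced for now
ensure_bid(":garbage")    # unreferenced and never reused

gc_begin()
gc_mark()

# A writer picks up :some-opt between the mark and sweep stages.
bid = ensure_bid(":some-opt")
conn.execute("INSERT INTO slot VALUES (2, 'slot1', ?)", (bid,))

deleted = gc_sweep()
remaining = [bob for (bob,) in conn.execute("SELECT bob FROM blob ORDER BY bid")]
```

only :garbage is swept here; :some-opt was turned black again by ensure_bid
before the sweep. outside of a collection every row is black, so ensure_bid
does no extra writes.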

a totally different approach would be to delete blobs as soon as they become
unused, via triggers or something similar.
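
a small sketch of what the trigger variant might look like, using sqlite
triggers from python instead of postgresql ones. the schema and trigger names
are hypothetical, and only the slot table is watched; a real version would
also have to cover references from trees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blob (bid INTEGER PRIMARY KEY, bob TEXT UNIQUE);
    CREATE TABLE slot (oid INTEGER, name TEXT, bid INTEGER);

    -- When a slot row goes away, drop its blob if nothing else refers to it.
    CREATE TRIGGER slot_deleted AFTER DELETE ON slot
    BEGIN
        DELETE FROM blob
        WHERE bid = OLD.bid
          AND NOT EXISTS (SELECT 1 FROM slot WHERE bid = OLD.bid);
    END;

    -- Same check when a slot is pointed at a different blob.
    CREATE TRIGGER slot_changed AFTER UPDATE OF bid ON slot
    BEGIN
        DELETE FROM blob
        WHERE bid = OLD.bid
          AND NOT EXISTS (SELECT 1 FROM slot WHERE bid = OLD.bid);
    END;
""")


def blobs():
    """Return the blob values currently stored, ordered by bid."""
    return [bob for (bob,) in conn.execute("SELECT bob FROM blob ORDER BY bid")]


conn.execute("INSERT INTO blob VALUES (1, ':some-opt')")
conn.execute("INSERT INTO blob VALUES (2, ':other')")
conn.executemany("INSERT INTO slot VALUES (?, 'slot1', ?)",
                 [(1, 1), (2, 1), (3, 2)])

# Object 1 stops using :some-opt, but object 2 still refers to it.
conn.execute("UPDATE slot SET bid = 2 WHERE oid = 1")
after_first = blobs()

# Object 2 stops using it as well, so the trigger removes the blob.
conn.execute("UPDATE slot SET bid = 2 WHERE oid = 2")
after_second = blobs()

# Deleting one of three references to :other leaves the blob in place.
conn.execute("DELETE FROM slot WHERE oid = 3")
after_delete = blobs()
```

the cost moves from a periodic collection to every slot update or delete,
since each one now runs an extra existence check.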

 RS> Secondly, there's the possibility that somebody could put a bid in,
 RS> say, a serialized list.

i think the postmodern backend itself never does this, and as blobs are
internal to the backend, users should not mess with them either.

 RS> Beyond those, though, in the normal case I am correct to assume that a
 RS> blob entry can be considered totally unused if not referred to in
 RS> either slots or any tree?

i guess so. blob ids might also be kept in memory temporarily, but only within
a single transaction, so that shouldn't be an issue. blob ids are not cached
either.






