[elephant-devel] Postmodern backend; are values ever removed from the blob table
Alex Mizrahi
killerstorm at newmail.ru
Fri Dec 12 16:15:19 UTC 2008
RS> I have two other minor worries about this technique. First, it would
RS> seem possible that if, say, we have slot1 in object1 which was set to
RS> :some-opt at some time in the past, but is now totally unused by any
RS> object, the blob row containing that would be considered dead; if
RS> someone comes along while the cleanup is running and sets some slot to
RS> :some-opt, then it will be in use again, but potentially get deleted
RS> anyway, because it will have already been deemed dead.
RS> This isn't really a problem for me; due to the nature of my dataset the
RS> chances of this happening are remote, but I can imagine it'd be an
RS> issue for some people.
yep, this might be a problem. unfortunately i do not see a general solution
to this problem without modifying the blob table and likely introducing
considerable overhead.
the first possible solution would be to mark blobs with a serial number on
each sp_ensure_bid call, even if it finds the blob already in the table.
then the garbage blob collector could just filter out the ones that were
touched after it started working.
unfortunately this adds the overhead of writing data into the database on
every call.
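
A minimal sketch of the serial-number idea, using an in-memory SQLite database in place of PostgreSQL. The schema (`blob`, `slot`, `counter`, the `touched` column) and the Python `ensure_bid` function are assumptions made for illustration; they are not the actual Postmodern backend tables or the real `sp_ensure_bid` stored procedure.

```python
import sqlite3

# Hypothetical schema: not the real Elephant/Postmodern tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE blob (bid INTEGER PRIMARY KEY, bob TEXT UNIQUE,
                       touched INTEGER NOT NULL);
    CREATE TABLE slot (oid INTEGER, name TEXT, bid INTEGER);
    CREATE TABLE counter (serial INTEGER NOT NULL);
    INSERT INTO counter VALUES (0);
""")

def next_serial(db):
    db.execute("UPDATE counter SET serial = serial + 1")
    return db.execute("SELECT serial FROM counter").fetchone()[0]

def ensure_bid(db, value):
    """Stand-in for sp_ensure_bid: stamps the blob even when it already exists."""
    serial = next_serial(db)
    row = db.execute("SELECT bid FROM blob WHERE bob = ?", (value,)).fetchone()
    if row is None:
        cur = db.execute("INSERT INTO blob (bob, touched) VALUES (?, ?)",
                         (value, serial))
        return cur.lastrowid
    db.execute("UPDATE blob SET touched = ? WHERE bid = ?", (serial, row[0]))
    return row[0]

def start_collection(db):
    """Remember the current serial and list blobs no slot refers to."""
    start = db.execute("SELECT serial FROM counter").fetchone()[0]
    dead = [r[0] for r in db.execute(
        "SELECT bid FROM blob WHERE NOT EXISTS "
        "(SELECT 1 FROM slot WHERE slot.bid = blob.bid) ORDER BY bid")]
    return start, dead

def finish_collection(db, start, dead):
    """Delete candidates, skipping any blob touched after the collector started."""
    deleted = 0
    for bid in dead:
        cur = db.execute("DELETE FROM blob WHERE bid = ? AND touched <= ?",
                         (bid, start))
        deleted += cur.rowcount
    return deleted

# Two unused blobs exist when the collector starts.
ensure_bid(db, ":some-opt")
ensure_bid(db, ":other")
start, dead = start_collection(db)

# While the collector is working, someone sets a slot to :some-opt again.
bid = ensure_bid(db, ":some-opt")
db.execute("INSERT INTO slot VALUES (1, 'slot1', ?)", (bid,))

deleted = finish_collection(db, start, dead)
remaining = [r[0] for r in db.execute("SELECT bob FROM blob")]
```

Here only `:other` is deleted; `:some-opt` survives because its stamp is newer than the collector's starting serial. The cost is visible in `ensure_bid`: every call writes to the database, even on a hit.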
another solution is to do the garbage collection in mark & sweep style --
first mark all objects white, then go through the table and mark the
reachable ones black. at the same time, if sp_ensure_bid sees a white blob,
it marks it black too. then in the final stage delete all white objects.
a race condition during the sweeping stage will be handled by postgresql's
transactions -- if a conflict happens, one of the transactions will fail and
will be restarted. the disadvantage of this solution is that it requires a
large amount of writing during collection. however, it also does not require
any temporary storage and works entirely on the database side, and
sp_ensure_bid performance is not impacted when no collection is going on.
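
The mark & sweep variant could look roughly like this. Again this is a sketch under assumptions: SQLite replaces PostgreSQL, the `color` column and table layout are hypothetical, only slot references are modelled (tree references are left out), and `ensure_bid` is a Python stand-in for `sp_ensure_bid`. New blobs default to black here so that they survive a collection that is already running; that default is a choice of this sketch. Transaction conflicts during the sweep are not modelled, since everything runs on one connection.

```python
import sqlite3

# Hypothetical schema: not the real Elephant/Postmodern tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE blob (bid INTEGER PRIMARY KEY, bob TEXT UNIQUE,
                       color TEXT NOT NULL DEFAULT 'black');
    CREATE TABLE slot (oid INTEGER, name TEXT, bid INTEGER);
""")

def ensure_bid(db, value):
    """Stand-in for sp_ensure_bid: a white blob that gets looked up turns black."""
    row = db.execute("SELECT bid, color FROM blob WHERE bob = ?",
                     (value,)).fetchone()
    if row is None:
        return db.execute("INSERT INTO blob (bob) VALUES (?)",
                          (value,)).lastrowid
    bid, color = row
    if color == "white":  # only writes while a collection is in progress
        db.execute("UPDATE blob SET color = 'black' WHERE bid = ?", (bid,))
    return bid

def mark_all_white(db):
    db.execute("UPDATE blob SET color = 'white'")

def mark_reachable(db):
    db.execute("UPDATE blob SET color = 'black' "
               "WHERE bid IN (SELECT bid FROM slot WHERE bid IS NOT NULL)")

def sweep(db):
    return db.execute("DELETE FROM blob WHERE color = 'white'").rowcount

# One blob in use, two currently unused.
live = ensure_bid(db, ":in-use")
db.execute("INSERT INTO slot VALUES (1, 'slot1', ?)", (live,))
ensure_bid(db, ":some-opt")
ensure_bid(db, ":garbage")

mark_all_white(db)
mark_reachable(db)

# During collection, someone starts using :some-opt again.
bid = ensure_bid(db, ":some-opt")
db.execute("INSERT INTO slot VALUES (2, 'slot1', ?)", (bid,))

swept = sweep(db)
remaining = sorted(r[0] for r in db.execute("SELECT bob FROM blob"))
```

Only `:garbage` is swept. Outside of a collection every blob is black, so `ensure_bid` performs no extra write on a hit.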
a totally different approach would be to delete blobs as soon as they become
unused, via triggers or something.
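
One way the trigger idea might be sketched, shown with SQLite triggers rather than PostgreSQL ones. The tables and trigger names are hypothetical, and only slot references are modelled; a real version would also have to cover references held in trees.

```python
import sqlite3

# Hypothetical schema and triggers: not the real Elephant/Postmodern tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE blob (bid INTEGER PRIMARY KEY, bob TEXT UNIQUE);
    CREATE TABLE slot (oid INTEGER, name TEXT, bid INTEGER);

    -- When a reference goes away, drop the blob if nothing else points to it.
    CREATE TRIGGER slot_deleted AFTER DELETE ON slot
    BEGIN
        DELETE FROM blob
         WHERE blob.bid = OLD.bid
           AND NOT EXISTS (SELECT 1 FROM slot WHERE slot.bid = OLD.bid);
    END;

    CREATE TRIGGER slot_changed AFTER UPDATE OF bid ON slot
    BEGIN
        DELETE FROM blob
         WHERE blob.bid = OLD.bid
           AND NOT EXISTS (SELECT 1 FROM slot WHERE slot.bid = OLD.bid);
    END;

    INSERT INTO blob VALUES (1, ':some-opt'), (2, ':other');
    INSERT INTO slot VALUES (1, 'slot1', 1), (2, 'slot1', 1), (3, 'slot1', 2);
""")

# Object 1 switches to :other; object 2 still refers to :some-opt, so it stays.
db.execute("UPDATE slot SET bid = 2 WHERE oid = 1")
after_update = sorted(r[0] for r in db.execute("SELECT bob FROM blob"))

# The last reference to :some-opt disappears, so the trigger deletes the blob.
db.execute("DELETE FROM slot WHERE oid = 2")
remaining = [r[0] for r in db.execute("SELECT bob FROM blob")]
```

This moves the cleanup cost onto every slot write instead of a separate collection pass.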
RS> Secondly, there's the possibility that somebody could put a bid in,
RS> say, a serialized list.
i think the postmodern backend itself never does this, and as blobs are
internal to the backend, users should not mess with blobs either.
RS> Beyond those, though, in the normal case I am correct to assume that a
RS> blob entry can be considered totally unused if not referred to in
RS> either slots or any tree?
i guess so. blob ids might also be kept temporarily in memory, but only
within a single transaction, so that shouldn't be an issue. blob ids are
not cached either.