[rucksack-devel] Re: Rucksack, ECLM

Thu May 18 11:46:49 UTC 2006

"Arthur Lemmens" <alemmens at xs4all.nl> writes:

> I'm not totally sure why you put it like that.  Let me try to explain
> what I expect from transactions:
>
> - Once a transaction has committed, you should have a guarantee that
>   your changes are made persistent: when you reboot Rucksack after
>   a commit, it should find all chose changes.  That's the durability
>   part.
>
> - Transactions should be all-or-nothing.  If an error occurs during a
>   transaction and a transaction is aborted (rolled back), all of the
>   changes made by that transaction should be undone.  That's the
>   atomicity part.
>
> - Parallel transactions should behave as if they happened sequentially.
>   They should have the same effect as if the oldest transaction has
>   started and completed its job before all younger transactions.  That's
>   the isolation part I think.  (Maybe it's also the consistency part;
>   but it always seemed to me that consistency must be guaranteed by
>   the application, not just by the persistence library.)

this is the isolation part. the consistency part is generally used
when talking about foreign key references, which i don't think really
apply to rucksack (except for the constraint that if object A has a
reference to object B then B must exist).

however i disagree with "They should have the same effect as if the
oldest transaction has started and completed its job before all
younger transactions" i believe it should read:

"They should have the same effect as if all the operations happend in
the same instant as the commits, and they should have no affect at all
in the case of an abort or a rollback."

the difference is that an older transaction has no implicit priority
over a younger one, the only thing that matters is the order of the
commits (or aborts). the problem of a long running transaction getting
starved by a constant flow of short transactions remains, as does the
problem of a long running transaction blocking all other
transactions. this is a well known problem and has various solutions
(atm i can't remeber all the details), the one solution i believe is
best is to randomly choose the victim (but giving lower kill
probabilities to old transactions).

>> Scenario 1.
>>
>>  A enters.
>>  A increments C -> 1.
>>  B enters.
>>  B increments C -> 2.

this is wrong.

>>  A aborts (rollback).
>>  B commits.
>
> Good point.  I think B shouldn't be allowed to commit as is, because it
> depends on a value that was modified by A.  The simplest solution seems
> to be to abort B when A is aborted.  In general, we could abort a
> transaction if it depends on a value that was created/modified by a
> transaction that's being aborted.  I think this is doable in practice;
> Rucksack already keeps track of all the information that's necessary
> to do this.

At the time of B's commit we have a history of:

read(C,A); write(C,A); read(C,B); write(C,B); abort(A); commit(B)

This is a serializable history, and it must be equivalent to:

read(C,B); write(C,B); commit(B)

due to the isolation constraint. 

so we need to end up with C -> 1. since write(C,A) creates a new C,
thanks to multiple versioning, I don't see a problem with implementing
this.

> This seems to be consistent with my transaction requirements above.

except for the fact that C should be 1 (in both B and A's
transactions) I agree.

>> Scenerio 2.
>>
>>  A enters.
>>  A increments C -> 1.
>>  B enters.
>>  B increments C -> 2.
>>  B commits.
>>  A aborts (rollback).
>
> This one is more problematic.  After B commits, the application should
> have the guarantee that B's values won't change anymore.  But when A
> aborts, the correct solution seems to be to abort B too (because B has
> a value that depends on A).  The only solution I can come up with is
> to prevent B from committing until A has committed.  That means it would
> have to block until A has committed, but that may destroy part of the
> performance advantage that we could get from multiple versioning.

nb: as above I believe B should have set C to 1, not 2.

since transactions occur in complete isolation we don't look at
conflicts until the transactions commit and only in that instant
should you decide who wins and who loses. therefore this history isn't
any more problematic than the first and A needs to abort. Since A and
B would conflict it may be neccessary to block B's commit for a
certain amount of time so that we keep the option of aborting B
(assuming A is in an important, but long runnig, transaction) open.

I hope this makes sense and i hope i'm not making too great of a fool
of myself.

-- 
-Marco
Ring the bells that still can ring.
Forget the perfect offering.
There is a crack in everything.
That's how the light gets in.
	-Leonard Cohen