[elephant-devel] How should I handle a controller-lost-error?

Ian Eslick eslick at media.mit.edu
Mon Oct 5 01:45:44 UTC 2009


There was a bug in get-con: it returned nil after the restart.  I
just checked in a fix for that and went ahead and implemented your
suggestion for a saner error + restart model.  An interactive error is
now signalled that provides a 'reopen-controller restart.  Keep in
mind that re-opening the store does not bind *store-controller*, so
you have to be careful with your uses of with-transaction in a
multi-store environment.
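Here is a minimal sketch of the REOPENING-STORE macro suggested in the original message, written against a restart named REOPEN-CONTROLLER as described. It is self-contained plain Common Lisp, not Elephant code. CONTROLLER-LOST-ERROR, *CONTROLLER-OPEN* and FETCH-SLOT are hypothetical stand-ins for the real condition, controller state and slot read. In a real image, the restart symbol would have to be the one from the ELEPHANT package.

```lisp
;;; Hypothetical stand-ins for Elephant's condition and controller state.
(define-condition controller-lost-error (error) ()
  (:report "Store controller has been closed."))

(defvar *controller-open* nil
  "Hypothetical flag modelling whether the store controller is open.")

(defun fetch-slot ()
  "Model of a persistent slot read.  While the controller is closed it
signals CONTROLLER-LOST-ERROR with a REOPEN-CONTROLLER restart; once
the restart reopens the controller, the read is retried."
  (loop
    (if *controller-open*
        (return :slot-value)
        (restart-case (error 'controller-lost-error)
          (reopen-controller ()
            :report "Reopen the store controller and retry."
            (setf *controller-open* t))))))

(defmacro reopening-store (&body body)
  "Run BODY, invoking the REOPEN-CONTROLLER restart (when one is
available) for every CONTROLLER-LOST-ERROR signalled inside it."
  `(handler-bind ((controller-lost-error
                    (lambda (c)
                      (let ((r (find-restart 'reopen-controller c)))
                        (when r
                          (invoke-restart r))))))
     ,@body))

;; Usage: (reopening-store (fetch-slot)) reopens, then returns :SLOT-VALUE.
```

The handler invokes the restart instead of unwinding, so the interrupted read resumes and retries in place. Because reopening does not rebind *store-controller*, this pattern does not decide which store later with-transaction calls will use.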

A few notes.  New objects are created in the store provided in the :sc
argument to make-instance.  The default is to use the current dynamic
value of *store-controller*.  Once created, objects know what their
home store is, and get-con is used to retrieve it.  Within the body of
a transaction, *store-controller* is bound to the store on which that
transaction is run, so any new objects that don't have an explicit :sc
argument will be created in that store.  You'll get a clear error if
you try to create an object, provide no :sc designation, and
*store-controller* is nil.
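The store-selection rules above can be modelled in a few lines of plain Common Lisp. This is a toy model for illustration, not Elephant's implementation. TOY-OBJECT, MAKE-TOY-OBJECT and WITH-TOY-TRANSACTION are hypothetical, and *STORE-CONTROLLER* here is a local stand-in for Elephant's variable. With Elephant itself you would pass :sc to make-instance as described.

```lisp
;;; Toy model (not Elephant code) of how a new object picks its home store.
(defvar *store-controller* nil
  "Stand-in for Elephant's default store; NIL means no store is open.")

(defstruct (toy-object (:constructor %make-toy-object (home-store)))
  home-store)

(defun make-toy-object (&key (sc *store-controller*))
  "Create an object whose home store is SC, which defaults to the
current dynamic value of *STORE-CONTROLLER*."
  (unless sc
    (error "No :sc given and *store-controller* is nil."))
  (%make-toy-object sc))

(defmacro with-toy-transaction ((store) &body body)
  "Model of a transaction: *STORE-CONTROLLER* is dynamically bound to
STORE for the extent of BODY."
  `(let ((*store-controller* ,store))
     ,@body))
```

Inside WITH-TOY-TRANSACTION, objects created without :sc land in the transaction's store. An explicit :sc still wins, and creating an object with no store at all signals an error.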

For the web responding function you describe, with-transaction will
handle most error cases: it aborts the transaction when an error is
signalled, and the web page should then respond with a 500 error.  If
the store has closed, that is a global and probably catastrophic
server-level problem rather than a request-level one.  There is no
reason you couldn't reopen the store as part of a recovery procedure,
however.
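Here is a sketch of a request wrapper that turns an escaping error into a 500 response, with a hook for the store-level recovery mentioned above. It assumes the transaction is aborted when the error unwinds out of with-transaction, as described. STORE-LOST, CALL-WITH-ERROR-PAGE and the :REOPEN-STORE hook are hypothetical names. The (values status body) convention is not tied to any particular web server.

```lisp
;;; Hypothetical request wrapper; nothing here is Elephant API.
(define-condition store-lost (error) ()
  (:report "The store controller is closed."))

(defun call-with-error-page (thunk &key (reopen-store (constantly nil)))
  "Call THUNK (e.g. a closure around WITH-TRANSACTION).  Return
(VALUES 200 result) on success, or (VALUES 500 message) if an error
escapes.  A STORE-LOST error is a server-level problem, so REOPEN-STORE
is called as a recovery step before answering 500."
  (handler-case (values 200 (funcall thunk))
    ;; Listed first so it takes precedence over the generic ERROR clause.
    (store-lost (e)
      (funcall reopen-store)
      (values 500 (princ-to-string e)))
    (error (e)
      (values 500 (princ-to-string e)))))
```

Ordinary errors only fail the current request. The store-lost case additionally gives the server a chance to reopen the store for subsequent requests.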

Frankly the code paths that are exercised when *store-controller* is  
nil or a store is closed have not been heavily used or terribly well  
thought through.  All the usage models for Elephant that I'm aware of  
have been single-store settings where *store-controller* is bound once  
globally.  The only real reason I can see to use with-open-store is in  
a multi-store environment, and the issues involved with this are not
well documented, as you rightly point out.

We have been adding more signals (thanks mostly to Leslie) for
serialization and other errors since the last revision of the manual,
but I'm not sure when one of us will have the time to update the
manual.

Is this helpful?

Ian

On Oct 3, 2009, at 5:28 PM, Alain Picard wrote:

>
> Dear Elephants,
>
> I'm trying to understand how one is supposed to recover from
> store-controller errors, and what the anticipated usage pattern
> for when to open controllers is.  Consider this example:
>
>   (with-open-store (*elephant-connection-spec*)
>     (setq u (last-room-update :room nil)))
>
>   => #<WAITING-ROOM-UPDATE oid:86987>
>
> If you then attempt to access a slot on u, you get
> a CONTROLLER-LOST-ERROR signalled, with a continue
> restart saying "do you want to reopen":
>
>  (timestamp u)
>   ==>
>   Condition #<CONTROLLER-LOST-ERROR #x300045D5CD5D>
>      [Condition of type CONTROLLER-LOST-ERROR]
>
>   Restarts:
>    0: [CONTINUE] Open a new instance and continue?
>    1: [RETRY] Retry SLIME REPL evaluation request.
>    2: [ABORT] Return to SLIME's top level.
>    3: [ABORT-BREAK] Reset this thread
>    4: [ABORT] Kill this thread
>
> This occurs because WITH-OPEN-STORE has now closed the controller
> which is referred to in the instance U.  Fine.
> Now, I want to find out how to handle lost controllers,
> and automatically restart them.  I attempted something like this:
>
> (handler-bind ((controller-lost-error
>                  #'(lambda (c)
>                      (describe c)
>                      (let ((r (find-restart 'continue c)))
>                        (when r
>                          (print 'continuing)
>                          (invoke-restart r))))))
>   (timestamp u))
>
> However, that doesn't work because you then get
>
>    There is no applicable method for the generic function:
>      #<STANDARD-GENERIC-FUNCTION ELEPHANT::PERSISTENT-SLOT-READER #x30004284B1EF>
>    when called with arguments:
>      (NIL #<WAITING-ROOM-UPDATE oid:86987> TIMESTAMP)
>       [Condition of type SIMPLE-ERROR]
>
>    Restarts:
>     0: [CONTINUE] Try calling it again
>     1: [RETRY] Retry SLIME REPL evaluation request.
>     2: [ABORT] Return to SLIME's top level.
>     3: [ABORT-BREAK] Reset this thread
>
> Trying the RETRY restart at this point _does_ succeed, however.
>
> I think it's a bug to encounter this second condition; i.e. I think
> after the first CONTINUE, when the controller gets successfully
> re-opened, GET-CON should be able to return the new controller
> immediately, so that the retried SLOT-VALUE accessor should then succeed.
>
> Secondly, and less importantly, I think using CERROR/CONTINUE is
> a bit too generic for this class of error; I think it'd be much
> friendlier to be able to invoke a specific restart, like
> REOPEN-LOST-CONTROLLER, so I could write some sort of REOPENING-STORE
> macro.
>
>
> Maybe I'm missing something basic, here, but the reason I delved into
> this is that I noticed that opening a new store controller is a
> HUGELY expensive operation; so an idiom like
>
>    (defun some-web-responding-method (...)
>      (with-open-store (spec)
>        (with-transaction ()
>          (do-stuff))))
>
> ends up being really, really, really slow; almost 1000 times slower
> than just
>
>    (open-store spec)
>
>    (defun some-web-responding-method (...)
>      (with-transaction ()
>        (do-stuff)))
>
> That's fine; I understand why that is, but the above seems very unsafe.
> What am I supposed to do or catch if there are store-related errors?
> So I was trying to write something sort of like
>
>    (defun some-web-responding-method (...)
>      (reopening-store
>        (with-transaction ()
>          (do-stuff))))
>
> if you know what I mean.  Is this misguided?  There is a section in
> the manual about "Design Patterns" and "Multithreaded Web Applications",
> but it seems very incomplete, and I think it should discuss these
> sorts of issues at length.
>
> I guess I'm also unsure of what _other_ sorts of errors might get
> thrown and how I'm supposed to handle them during "normal" elephant work.
> A section documenting the responsibilities of the programmer in this
> area would be invaluable, IMHO.
>
> Thanks for any pointers!
>
>       	   --Alain Picard