[elephant-devel] Understanding real-world use of Elephant

Ian Eslick eslick at csail.mit.edu
Fri May 25 21:29:17 UTC 2007


On May 25, 2007, at 4:32 PM, lists at infoway.net wrote:

> Hello Ian, Robert, and Henrik
>
> I'll try to comment based on the responses received from the three  
> of you in this single thread so as to minimize the posts. Before  
> proceeding, let me just clarify that I am only interested in using  
> the BDB backend.
>
>> I would have to disagree about the documentation for Elephant not  
>> being
>> abundant---Ian has written a 118 page manual.
>>
>> Nonetheless, you are correct that the use of Elephant in a multi-
>> threaded webserver environment is not heavily documented.  Ian and I
>> have discussed the need for a killer "example app" and eagerly await
>> someone contributing one.
>
> First of all, I want to apologize if my comment came across the  
> wrong way. I know that Ian (and whoever else has been contributing)  
> has done a superb job at enhancing Elephant's documentation. It  
> definitely has come a long way. I first had a bit of difficulty  
> finding the latest documentation since I couldn't find it online. I  
> then learned that it came in the doc directory and that you had to  
> "make" it. Anyway, it is great!

Documentation has always been available online, so only developers
updating the web site or editing the documentation need to
'make' it.  The new site has documentation at:
http://common-lisp.net/project/elephant/documentation.html.  This page can be
reached by clicking the "documentation" link which you can find in  
the leftmost column of the home page.  You can jump directly to the  
latest online texinfo style HTML by clicking 'Online Docs' in the  
upper right hand corner of the home page.  Do you know what caused  
you to miss these links to documentation?  Was there anything  
confusing about our site that we could fix?

> As far as multi-threaded webserver environment, I know there was a  
> section about it in the doc (section 6.5) but, as you said, it's  
> not very elaborate.

Read sections 4.10, 4.11, and 4.13.  Section 6.5 needs more work to
serve as a proper example; section 6 mostly has placeholders at
present.  I'll see about expanding 4.10 and 6.5 as time allows.

>
>>> However, I don't have such a strong background on using ODBs and  
>>> mainly come from
> the SQL world. So, just for curiosity's sake, I read the
>>> tutorial for
>>> AllegroCache which tries to show the "proper" way to use  
>>> AllegroCache in
>>> real-world systems
>>> (http://www.franz.com/products/allegrocache/docs/ 
>>> acachetutorial.pdf).
>
> I'd like to clarify my comment above. Because I read several  
> AllegroCache documents, I misreferenced the document I really  
> wanted to reference.
>
> The document in question is titled "AllegroCache with Web Servers"
> and can be found here: http://www.franz.com/products/allegrocache/ 
> docs/acweb.pdf

As you comment below, reading the acache document created a great
deal of confusion!  Please ignore it.  While the object and class
interfaces are similar, the system implications and usage models can
be very different, so you really are comparing apples and oranges.

>
>> In the first place, one has to go back to basics a little bit.   
>> Whenever
>> you have concurrency, you have to have concurrency control.   
>> Personally,
>> I tend to think of this at the object level, but I know it is now
>> common to think of it at the "database level".  You are generally
>> correct that if you are using SQL (or BDB, for that matter) as a
>> database and you keep ALL of your state in the database, then you can
>> generally rely on the concurrency control of those systems to  
>> serialize
>> everything and not allow two threads to interfere with each other.
>> However, almost ANY application will have to think about  
>> concurrency; if
>> you are SQL-oriented, you will do this by defining
>> "transactions".  If
>> you define your transactions wrong, you will have concurrency errors,
>> even though the database serializes transactions perfectly.
>>
>> For example, generally, since the Web forces us into "page based" or
>> "form based" interaction (which javascript, CSS and Ajax promptly
>> complicate), one can generally think of a web application as "one-
>> page
>> turn == one transaction".  But even that might not be true---you  
>> could
>> take 10 page turns to submit an order, and the order must be  
>> atomic---
>> that is, you either order the thing with 10 pages of  
>> specifications, or
>> you don't order it.  A half-order is a corrupt order.
>
> I agree with you that, in general, when dealing with web  
> applications involving multiple clients and servers, you have to  
> have concurrency control. How much you have to handle in your own
> application versus how much the "database framework" offers is, in
> my opinion, a good question.
>
> Making reference to the Allegro document, it says "In AllegroCache  
> a program opens a connection to a database and that connection can  
> only be used by one thread at a time." Then, as you read the  
> document and focus on their client-server model, they present  
> sample code that uses "thread-safe" connection pools, with a macro  
> named with-client-connection. "This macro
> retrieves a connection from the pool.  If no connection is  
> available it will create a new connection but it
> will create no more than *connection count* connections. If all  
> connections are created and a new connection is needed the  
> requesting thread will be put to sleep until a connection is  
> returned to the pool."
>
> The macro is not the problem, since I could "think of" this macro  
> as something like Elephant's with-transaction. The problem, and the  
> overhead I was referring to in my original post is that, to perform  
> a basic operation such as to update a hash table value, they write  
> the function as this:
>
> (defun set-password-for-pool (user new-password)
>   (with-client-connection poolobj
>     (with-transaction-restart nil
>         (setf (map-value (or (poolobj-map poolobj)
>                              (setf (poolobj-map poolobj)
>                                (retrieve-from-index 'ac-map
>                                                     'ac-map-name
>                                                     "password")))
>                          user)
>           new-password)
>         (commit))))
>
> As you can see, there is some possibly unnecessary overhead in
> the fact that you are getting a connection from the pool and then  
> obtaining a handle to the "password" hash table before anything can  
> be set. The reason they do this, as I understand it, is that,
> since each connection handle works independently in each thread,
> each connection has to maintain a separate handle to each  
> persistent object class and their solution involves storing in the  
> poolobj structure a handle to the connection and a handle to the  
> hash table.
>
> So, if this was a more complex application, involving n persistent  
> classes with m persistent attributes per class, the overhead of  
> writing all this is significant. Assuming we follow the Elephant  
> recommendation in section 2.9.3 where actions should be reduced to  
> minimal operations and nest them with with-transaction/ensure- 
> transaction, I would have to write, potentially, 2*n*m defuns  
> (getter/setter) for all the attributes with all the code to fetch  
> and cache the handles to the connection and to the respective n  
> persistent class.
>
>> Elephant has the "with-transaction" macro.  This is really the best
>> way
>> to use Elephant in most circumstances --- but even then, if you are
>> keeping ANYTHING in memory, whether to cache it for speed or  
>> because you
>> don't want to put it in the database (session-based information  
>> would be
>> a typical example), you may still have to "serialize" access to that
>> state.  That unavoidably means that you have to understand some
>> sort of
>> mutual exclusion lock; unfortunately, these are not 100% standard  
>> across
>> different lisps.  However, most lisps do provide a basic "with-mutex"
>> facility.  I use this in the DCM code (in the contrib directory) to
>> serialize access to a "director", which is very similar to a  
>> version of
>> Ian's persistent classes, but based on a "keep it all in memory  
>> and push
>> writes to the DB" model (that is, Prevalence.)
>
> The idea I have is to rely on the persistent data instead of in- 
> memory data. Once I get this going, I may decide to improve  
> performance with in-memory caches, or anything else. Just want to  
> get the concept going in a stable and scalable format.
>
>> If you will forgive me over-simplifying things, if:
>> 1) You can legitimately think of every page turn as a transaction,  
>> and
>> 2) You keep all of the meaningful state in the Elephant DB, and
>> 3) You wrap your basic "process-a-page" function in "with- 
>> transaction",
>>
>> then you won't have a concurrency control problem.
>>
>> That is a completely appropriate style for certain relatively simple
>> web-applications; for other kinds of web-applications it would be  
>> very
>> constraining and slow --- but one should never optimize for speed  
>> before
>> it is necessary.
>
> I don't mind the over-simplification as long as I understand it :),  
> and I do. However, thinking back to the AllegroCache document, from  
> what I understood, they basically take a handle to the connection,  
> perform the operation, and then release the connection. If this was  
> a multi-page web operation, it seems that their recommendation  
> would not be the most appropriate, IMHO, but then again, I don't know.

Connections and handles work completely differently in Elephant, so
the acache docs are not helpful here.
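For contrast, the same password update in Elephant needs no connection
pool and no per-thread cached class handles; all threads share
*store-controller*.  A rough sketch, assuming a btree stored under a
root key of my own choosing (make-btree, add-to-root, get-from-root,
get-value, and with-transaction are the standard Elephant operators):

```lisp
(defvar *passwords* nil)

(defun init-passwords ()
  ;; Fetch, or create on first use, a persistent btree under a root key.
  ;; The key name "passwords" is an assumption for this sketch.
  (setf *passwords*
        (or (get-from-root "passwords")
            (let ((bt (make-btree)))
              (add-to-root "passwords" bt)
              bt))))

(defun set-password (user new-password)
  ;; Safe to call from any thread; the store controller is shared.
  (with-transaction ()
    (setf (get-value user *passwords*) new-password)))
```

Compare this with the 2*n*m getter/setter defuns estimated above: in
Elephant the pool and handle plumbing simply disappears.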

> In your recommendation, if I had an order entry system with multiple
> pages to be completed before committing the order, I could  
> understand wrapping the whole thing with with-transaction. However,  
> wouldn't that present a possible problem locking resources and  
> leaving it up to the human user to complete the process before  
> committing or rolling back the transaction?

There are lots of ways to think about this.

One is that you keep track of the ongoing session using in-memory  
objects unique to the session.  When you need to manipulate a  
database (to submit an order, a blog entry, etc) the handler for the  
'submit' action uses with-transaction to take the data from the in- 
memory session object and commit it to the database (an entry in a  
per-user btree, adding a new instance to a class, etc).

If you need session history or want to maintain ongoing state, make  
this session object a persistent object instead.  Then each post or  
get action in the session is logged so you can recover if the user  
goes away for a while, or there is a server error.  You will
eventually fill up your disk with sessions (in the absence of GC) so  
you need to either drop the session objects when you are done with  
them or use a separate store for session objects and periodically  
delete and recreate it.  We still need a clean model for online GC of  
persistent objects to avoid explicit reclamation.
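A minimal sketch of such a persistent session object, with slot names
of my own invention; (:metaclass persistent-metaclass) is the
documented way to declare a persistent class in Elephant:

```lisp
(defclass web-session ()
  ((user    :accessor session-user    :initarg :user)
   (actions :accessor session-actions :initform nil)  ; logged posts/gets
   (done-p  :accessor session-done-p  :initform nil))
  (:metaclass persistent-metaclass))

;; Each post or get handler can then log its action durably:
;; (with-transaction ()
;;   (push action (session-actions session)))
;; When the session completes, explicitly remove the object from
;; whatever index holds it (no online GC yet, as noted above).
```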

As for contention, with-transaction will retry the transaction code  
so if you have a POST handler you can do something like:

(defun handle-post-1 ()
   (with-post-data
     (send-response-page
       (with-transaction ()
         <copy post data to persistent objects>
         <return response persistent object>))))

This way the update can robustly handle contention while the user  
only sees the final page that results from the update to the  
persistent object for that user/session.  If there is a real problem  
and the process fails, you can wrap (send-response-page ...) with a  
'handler-case' form that sends a server error page with a link to  
restart the transaction (perhaps with the session object so the form  
entries are properly initialized on the retry) if the transaction  
cannot be committed.

Transactions that ultimately fail signal a
'transaction-retry-count-exceeded condition.
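Putting those pieces together, here is a sketch of the handler-case
wrapping described above.  send-response-page, send-error-page,
commit-session-data, and the session argument are all hypothetical;
the condition name is the one from the preceding paragraph:

```lisp
(defun handle-post (session)
  (handler-case
      (send-response-page                      ; hypothetical page fn
        (with-transaction ()
          (commit-session-data session)))      ; hypothetical helper
    (transaction-retry-count-exceeded ()
      ;; Send a server-error page with a link to restart the
      ;; transaction, pre-filled from the session object.
      (send-error-page session))))             ; hypothetical helper
```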

If you are using BDB, make sure that db_deadlock is running, either
via the :deadlock-detect keyword option or by running it as an
external process (required if using multiple lisp processes).
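For example (the path is hypothetical; I believe :deadlock-detect is
the relevant open-store option for the BDB data store, but check the
manual for your version):

```lisp
;; Single lisp process: let Elephant run deadlock detection itself.
(open-store '(:BDB "/var/db/elephant/") :deadlock-detect t)

;; Multiple lisp processes sharing one BDB environment: run BDB's
;; external detector instead, e.g. from a shell:
;;   db_deadlock -h /var/db/elephant/ -t 1
```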


>> As a user of Elephant, you really shouldn't have to worry too much
>> about threading so long as you follow the simple rules laid out in
>> the manual under "multi-threading".  I think you are trying to
>> understand how we make this possible since it seems harder from your
>> read of the acache interface.
>
> You may be right. However, thinking more about this whole thing and  
> from my understanding of Elephant and what I understood of  
> AllegroCache, I may be trying to compare apples and oranges. They  
> may be similar systems, but I don't know if it does justice
> to compare Elephant with AllegroCache's client-server model. If I
> understand it correctly (now), the current implementation of  
> Elephant is more similar to AllegroCache stand alone (non-client- 
> server) model. So, each web process that accesses Elephant can do  
> so seamlessly with the standard *store-controller* (assuming a  
> single store controller) and not have to deal with having to manage  
> connection pools and all that. Keeping this in mind, I would also  
> assume that in Elephant, I don't have to keep a handle to each  
> persistent class for each connection. Maybe this is what confused  
> me and maybe I shouldn't be reading AllegroCache's documentation :)

Correct and correct.  We implement the physical storage of persistent
classes _very_ differently than acache, and trying to compare the
system implications of using each is likely to be more confusing than
helpful.  Don't think of them as the same kind of system; they are
two different systems optimizing different aspects of the common
problem of implementing persistent classes.


>> A simple conceptual model is that each thread has its own
>> transaction.  If these transactions are executing concurrently, the
>> BDB library or SQL server logs the read dependencies and the
>> side-effecting writes until a commit or abort.  On abort, you throw
>> away the log.  On a commit, one transaction at a time writes their
>> side effects and cancels any transaction that has read or written one
>> of the entries written by the committing transaction.
>>
>> Thread isolation is guaranteed by a policy for using the BDB library
>> or SQL server.  Calls to these libraries are re-entrant.  The current
>> transaction ID (only used by one thread) determines where the reads
>> and writes are logged (this is a little more complex for SQL, but
>> handled by the data store and transparent to the user).
>
> I guess this goes back to what I just commented on the fact that  
> each web thread/request will use the connection in place in the  
> Lisp VM and not have to deal with establishing a new connection (I  
> could have checks to make sure that if the store controller is not  
> opened, I could open it, but once it's opened, I "shouldn't" have  
> to worry too much about it). Right?

Correct.  Elephant maintains a connection to a BDB session which
keeps the underlying log and database files open.
This is shared among threads because BDB is re-entrant and  
transaction ids are used to provide isolation in the presence of  
concurrency.
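In practice that means you open the store once at startup and every
request thread simply uses it.  A sketch, with a hypothetical path
and btree; ensure-transaction and with-transaction are the standard
Elephant macros:

```lisp
;; Once, at application startup (binds *store-controller*):
(open-store '(:BDB "/var/db/elephant/"))

;; In any request thread; ensure-transaction joins an enclosing
;; transaction if one is already open, or starts its own:
(defun record-visit (user timestamp)
  (ensure-transaction ()
    (setf (get-value user *visits*) timestamp)))  ; *visits*: assumed btree
```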

>
>> The only other thing we need to do is make sure that the parts of
>> elephant shared across threads are themselves protected by Lisp
>> locks.  Most of this is the store-controller and some data structures
>> used to initialize a new store controller.
>
> As an end-user application developer, do I need to worry about this  
> or should I expect the Elephant framework to handle it?

Elephant handles it.  Sorry if I confused the issue, but I thought
you were trying to understand how Elephant implements thread safety.

>
>> If you stick to the user contract in the documentation, you shouldn't
>> have to worry further about interactions of multiple threads (other
>> than the usual considerations if you have shared lisp variables
>> instead of shared persistent objects).
>
> I would assume you are referring to my own application shared  
> variables and not Elephant-related variables, right?

Yes

>> I think that SQL databases are a safer bet than Berkeley DB
>> for having several processes on different machines talking to the  
>> same
>> store, so I will have one instance of postgresql running on a server
>> with scsi raid 10 and lots of ram.
>
> Henrik, would you mind elaborating more on this? Why would SQL  
> databases be safer than the BDB stores? I know they are handled by  
> separate processes, potentially, on separate machines, so in  
> essence, they are independent of your application. However, isn't  
> BDB designed just to tackle that using an application library  
> instead of a separate process?

The problem he is trying to solve is scaling computational power by  
using multiple CPUs and multiple servers.  This is doable with  
Elephant so long as each independent lisp image is using the same  
data store spec.  However, if you have two machines, Berkeley DB in
its normal mode will not work correctly, as its locking facilities
require shared memory between all processes sharing a given disk
database.  So the multiple-CPU problem is solved by using N lisp
processes for N CPUs with shared memory.  The multiple-server
problem, however, requires a common server that all web servers can
talk to.  This is easier to set up with SQL than to write your own
server on top of BDB.
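Concretely, the difference is just the store spec each lisp image
opens.  The hosts, paths, and credentials below are hypothetical, and
the CLSQL spec syntax may vary by version:

```lisp
;; N lisp processes on ONE machine can share a BDB store directly
;; (they need shared memory for BDB's locking):
(open-store '(:BDB "/shared/elephant/db/"))

;; Multiple machines need a common server, e.g. PostgreSQL via CLSQL:
(open-store '(:CLSQL (:POSTGRESQL "db-host" "appdb" "user" "password")))
```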

> Overall, and being this my first experience with Lisp and ODBs, I  
> really like Elephant. After reading some of AllegroCache's  
> documentation, I would still prefer using Elephant. Maybe I'm  
> trying to see deeper than I need to. Maybe I just need to see more  
> samples of real-world applications. I would love to contribute  
> sample applications to the project so as to make it clearer and  
> easier for others to learn, but I guess, I have to learn it myself  
> first. Code-wise, I think I have grasped the whole thing. However,  
> since I currently have no ability to test anything to a larger  
> scale, I'm trying to understand what it would take for an  
> application that uses Elephant to work in a large scale system  
> (both hardware and software).

The biggest issue in scaling arises when your application needs to
grow beyond a single server.  Elephant is great for single-server
applications.  When you scale to multiple servers it is because you
are talking about high hundreds to thousands of concurrent sessions
instead of dozens.  That kind of traffic likely requires a highly
reliable substrate, and I'm not sure Elephant is sufficiently
hardened that I could recommend it for that kind of use.  Unless, of
course, you want to pave new ground with it, in which case I think
Elephant can get there.

> Thanks again for every one of your comments. They did in fact help
> me, and I am sure your follow-up comments will help me even
> more. Now, while you guys digest this and reply to my post, I will
> go back and read the updated Elephant manual :)
>
> Thanks,
> Daniel

Good luck!  When you figure all this out, a detailed summary of the
primary things that confused you would be helpful in improving the
documentation.


> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel



