[elephant-devel] Querying Advice [w/code example]

Mon Nov 13 18:43:03 UTC 2006

Ok.

I got Elephant to work again with SBCL on PPC. I guess I was still  
using BDB 4.3 when 4.4 seems to be required. I couldn't find that  
anywhere in the docs.

I see what you're saying and can start to envision where this can go.  
I will keep playing and provide more feedback later today.

FYI, for curiosity purposes, I just ran (random-users 5000) on an  
empty database and it took 5 min 56 secs to complete. At the end, the  
%ELEPHANT file was 9MB in size and both log files where 10MB. That's  
kind of big considering what it does.

I then made the random-users2 function to look like:

(defun random-users2 (n)
   (setq *auto-commit* nil)
   (with-transaction ()
     (dotimes (x n)
       (let ((u (make-instance
                 'User
                 :uname (format nil "user~A" x)
                 :pword (random-password)
                 :email (format nil "user~A at .nowheresville.org" x)
                 :fullname (format nil "~A~A ~A~A" (random-password)  
x (random-password) x)
                 :balance (random 100))))
         (add-to-root x u))))
   (setq *auto-commit* t))

When I ran (random-users2 5000) I got a Berkeley DB Error: Cannot  
allocate memory, so I changed random-users2 to:

(defun random-users2 (n)
   (setq *auto-commit* nil)
   (start-ele-transaction)
   (dotimes (x n)
     (if (eq (mod x 1000)
             0)
         (progn
           (commit-transaction)
           (start-ele-transaction)))
     (let ((u (make-instance
               'User
               :uname (format nil "user~A" x)
               :pword (random-password)
               :email (format nil "user~A at .nowheresville.org" x)
               :fullname (format nil "~A~A ~A~A" (random-password) x  
(random-password) x)
               :balance (random 100))))
       (add-to-root x u)))
   (commit-transaction)
   (setq *auto-commit* t))

When I ran (random-users2 5000) on an empty database, this time it  
only 25 secs to complete. The %ELEPHANT file was still 9MB in size  
and both log files where also 10MB.

I suppose your code was not designed for performance and only for  
illustration purposes. Needless to say, knowing how to use  
transactions certainly helps and can dramatically affect application  
performance.

My only concern at this moment (which I also mentioned in another  
email) is the size of the data files. Whether or not that only  
reflects the persistent storage and not necessarily the memory  
footprint, I don't know. Therefore, if I loaded my database with  
650,000 customer records, the data files will easily exceed 1GB of  
storage, and that's just one "table".

Thanks,
Daniel

On Nov 12, 2006, at 6:45 PM, Robert L. Read wrote:

> Dear Daniel and Team,
>
>     I think the code below, which I have tested on SBCL,  
> illustrated a typical problem that Daniel Salama introduces.  To  
> paraphrase, you have a datatype (perhaps compound) which has a lot  
> of slots; you have a GUI, perhaps web-based, that you use to both  
> select or filter the large database, and to decide how to present  
> sort the results.  I've written the below example as if you  
> operating directly on the slots.  The fact that there are often  
> intervening functions does not fundamentally change the problem.   
> (An example of this is storing a timestamp as an integer, but  
> presenting it in a human-readable format.)
>     SQL supports a powerful querying ability based on both  
> selection and sorting.  One might think that this is an advantage  
> of SQL; it is conventional reason that this is actually an  
> advantage of using a relational database.  However, since LISP  
> treats functions as first-class citizens that can be constructed  
> dynamically, you actually have a full Turing-complete capabilities  
> in doing queries that SQL cannot match.  This same ability applies  
> to sorting; you can sort on any lexical order that you can program.
>     In practice, however, one doesn't always need this power.  More  
> typically, a user will select fields that they want to use to  
> filter the results (that is, construct a query from), and perhaps  
> how they would like the results to be sorted.  I assume that you  
> know how to interpret an HTTP query or a McClim user interface or  
> something to associate the GUI with underlying functions.  (My  
> personal framework has a way to do this, and UCW is probably the  
> most common or famous way to do it now.)
>     The code below generated 100 random "users".  The bare act of  
> defining this class defines accessor-functions that we can use in  
> dynamically constructed lists as below: (list 'username-of 'balance- 
> of).  I have written very small functions that use such lists  
> either to define define "lexicographic" sort orders based on the  
> order of the functions within the list.  That is, the primary sort  
> criteria is the first function in the list, but of that function is  
> equal for two values, the next is used and so on.  If you load the  
> below code and execute (show-off) several times I think you will  
> see what I mean.  You can then see how easily you can change the  
> list of functions that are either in the selector or the sort  
> criteria. If this is a web-based app, these list will be generated  
> from the http-query, which is generated by the user's clicks.
>     That is a "columnar" based approach; but it one can do  
> something similar but more powerful based on computed functions  
> that aren't based on individual columns, but on the entire data  
> element.  For example "find users whose username is equal to their  
> password" cannot be done in this way --- but can be done by just  
> using a function #(lambda (x) (equal (username-of x) (password-of  
> x)).  SQL can do this --- but LISP can could use any function  
> there, such as "find users who have both short usernames and  
> passwords that can be cracked by routine-x".
>     Instead of adding things to the root or the store-controller  
> directly, one would generally prefer
> to use consistent classes:
> http://common-lisp.net/project/elephant/doc/Persistent- 
> Classes.html#Persistent-Classes
>
> This doesn't change the nature of the problem.  If you like, you  
> can create an index on any slot in a very convenient way: http:// 
> common-lisp.net/project/elephant/doc/Class-Indices.html#Class-Indices
> This holds out the possibility of NOT having to iterate over the  
> entire data-set, but rather honing in directly upon
> matching values. (You can in fact create functional indexes based  
> on any function at all in Elephant, which is something that SQL  
> can't do conveniently, but the times you will need to do this are  
> rare.)
>
> It would take maybe 20 minutes more coding to uses Ian's "get- 
> instances-by-range" directly, and in a very efficient manner for  
> performing the query (if the GUI elements correspond to the class's  
> slots.)   This would be very efficient; but of course you should  
> not do this until you know that this is really the bottleneck in  
> your system.  By using cursors, you can avoid reading the entire  
> data into memory, and thus process huge datasets.
>     However, one should note that Ian's code makes creating indices  
> on slots zero-effort; but indexes always have overhead.  The real  
> question is: when will your queries actually utilize the index?   
> (That is, if you always select on one column/slot, then that one  
> should be indexed....but if your query pattern is more complicated,  
> it becomes fuzzy.)
>     Let me know if you find this useful;  after I get feedback from  
> you and Ian has made his post, perhaps we will put
> this in the documentation.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.common-lisp.net/pipermail/elephant-devel/attachments/20061113/b6d6de65/attachment.html>