[elephant-devel] Querying Advice [w/code example]

Ian Eslick eslick at csail.mit.edu
Mon Nov 13 18:48:32 UTC 2006


Are running from HEAD or 0.6.0?  I'll answer the size question tonight (on travel today).  -Ian
Sent via BlackBerry from T-Mobile  

-----Original Message-----
From: Daniel Salama <lists at infoway.net>
Date: Mon, 13 Nov 2006 13:43:03 
To:Elephant bugs and development <elephant-devel at common-lisp.net>
Subject: Re: [elephant-devel] Querying Advice [w/code example]

Ok.


I got Elephant to work again with SBCL on PPC. I guess I was still using BDB 4.3 when 4.4 seems to be required. I couldn't find that anywhere in the docs.


I see what you're saying and can start to envision where this can go. I will keep playing and provide more feedback later today.


FYI, for curiosity purposes, I just ran (random-users 5000) on an empty database and it took 5 min 56 secs to complete. At the end, the %ELEPHANT file was 9MB in size and both log files where 10MB. That's kind of big considering what it does.


I then made the random-users2 function to look like:


(defun random-users2 (n)
  (setq *auto-commit* nil)
  (with-transaction ()
    (dotimes (x n)
      (let ((u (make-instance 
                'User
                :uname (format nil "user~A" x)
                :pword (random-password)
                :email (format nil "user~A at .nowheresville.org" x)
                :fullname (format nil "~A~A ~A~A" (random-password) x (random-password) x)
                :balance (random 100))))
        (add-to-root x u))))
  (setq *auto-commit* t))


When I ran (random-users2 5000) I got a Berkeley DB Error: Cannot allocate memory, so I changed random-users2 to:


(defun random-users2 (n)
  (setq *auto-commit* nil)
  (start-ele-transaction)
  (dotimes (x n)
    (if (eq (mod x 1000)
            0)
        (progn
          (commit-transaction)
          (start-ele-transaction)))
    (let ((u (make-instance 
              'User
              :uname (format nil "user~A" x)
              :pword (random-password)
              :email (format nil "user~A at .nowheresville.org" x)
              :fullname (format nil "~A~A ~A~A" (random-password) x (random-password) x)
              :balance (random 100))))
      (add-to-root x u)))
  (commit-transaction)
  (setq *auto-commit* t))


When I ran (random-users2 5000) on an empty database, this time it only 25 secs to complete. The %ELEPHANT file was still 9MB in size and both log files where also 10MB.


I suppose your code was not designed for performance and only for illustration purposes. Needless to say, knowing how to use transactions certainly helps and can dramatically affect application performance.


My only concern at this moment (which I also mentioned in another email) is the size of the data files. Whether or not that only reflects the persistent storage and not necessarily the memory footprint, I don't know. Therefore, if I loaded my database with 650,000 customer records, the data files will easily exceed 1GB of storage, and that's just one "table".


Thanks,
Daniel



On Nov 12, 2006, at 6:45 PM, Robert L. Read wrote:
 Dear Daniel and Team,
 
     I think the code below, which I have tested on SBCL, illustrated a typical problem that Daniel Salama introduces.  To paraphrase, you have a datatype (perhaps compound) which has a lot of slots; you have a GUI, perhaps web-based, that you use to both select or filter the large database, and to decide how to present sort the results.  I've written the below example as if you operating directly on the slots.  The fact that there are often intervening functions does not fundamentally change the problem.  (An example of this is storing a timestamp as an integer, but presenting it in a human-readable format.)
     SQL supports a powerful querying ability based on both selection and sorting.  One might think that this is an advantage of SQL; it is conventional reason that this is actually an advantage of using a relational database.  However, since LISP treats functions as first-class citizens that can be constructed dynamically, you actually have a full Turing-complete capabilities in doing queries that SQL cannot match.  This same ability applies to sorting; you can sort on any lexical order that you can program.
     In practice, however, one doesn't always need this power.  More typically, a user will select fields that they want to use to filter the results (that is, construct a query from), and perhaps how they would like the results to be sorted.  I assume that you know how to interpret an HTTP query or a McClim user interface or something to associate the GUI with underlying functions.  (My personal framework has a way to do this, and UCW is probably the most common or famous way to do it now.)
     The code below generated 100 random "users".  The bare act of defining this class defines accessor-functions that we can use in dynamically constructed lists as below: (list 'username-of 'balance-of).  I have written very small functions that use such lists either to define define "lexicographic" sort orders based on the order of the functions within the list.  That is, the primary sort criteria is the first function in the list, but of that function is equal for two values, the next is used and so on.  If you load the below code and execute (show-off) several times I think you will see what I mean.  You can then see how easily you can change the list of functions that are either in the selector or the sort criteria. If this is a web-based app, these list will be generated from the http-query, which is generated by the user's clicks.
     That is a "columnar" based approach; but it one can do something similar but more powerful based on computed functions that aren't based on individual columns, but on the entire data element.  For example "find users whose username is equal to their password" cannot be done in this way --- but can be done by just using a function #(lambda (x) (equal (username-of x) (password-of x)).  SQL can do this --- but LISP can could use any function there, such as "find users who have both short usernames and passwords that can be cracked by routine-x".
     Instead of adding things to the root or the store-controller directly, one would generally prefer
 to use consistent classes:
 http://common-lisp.net/project/elephant/doc/Persistent-Classes.html#Persistent-Classes: <http://common-lisp.net/project/elephant/doc/Persistent-Classes.html#Persistent-Classes> 
 
 This doesn't change the nature of the problem.  If you like, you can create an index on any slot in a very convenient way: http://common-lisp.net/project/elephant/doc/Class-Indices.html#Class-Indices: <http://common-lisp.net/project/elephant/doc/Class-Indices.html#Class-Indices>  
 This holds out the possibility of NOT having to iterate over the entire data-set, but rather honing in directly upon 
 matching values. (You can in fact create functional indexes based on any function at all in Elephant, which is something that SQL can't do conveniently, but the times you will need to do this are rare.)
 
 It would take maybe 20 minutes more coding to uses Ian's "get-instances-by-range" directly, and in a very efficient manner for performing the query (if the GUI elements correspond to the class's slots.)   This would be very efficient; but of course you should not do this until you know that this is really the bottleneck in your system.  By using cursors, you can avoid reading the entire data into memory, and thus process huge datasets.
     However, one should note that Ian's code makes creating indices on slots zero-effort; but indexes always have overhead.  The real question is: when will your queries actually utilize the index?  (That is, if you always select on one column/slot, then that one should be indexed....but if your query pattern is more complicated, it becomes fuzzy.)
     Let me know if you find this useful;  after I get feedback from you and Ian has made his post, perhaps we will put
 this in the documentation.
 
_______________________________________________
elephant-devel site list
elephant-devel at common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel


More information about the elephant-devel mailing list