[elephant-devel] Querying Advice [w/code example]

Robert L. Read read at robertlread.net
Mon Nov 13 19:17:17 UTC 2006


I wrote that code in about an hour and a half; it is a throw-away spike
solution.
Furthermore I developed it using the SQL backend rather than the BDB.
I would never claim it is an ideal solution.  Morever, coding style
isn't really my strong suit.

As Ian mentioned, our serialization could be improved.

However, (/  9M 5000) = 180 bytes/User.  180 bytes of storage total for
a data object as below that 
has to contain 45 or so separate characters is not crazy; there are
lot's of SQL systems that 
would be worse under normal circumstances (at least that was true 8
years ago...)

The size of the log files are probably irrelevant.


On Mon, 2006-11-13 at 13:43 -0500, Daniel Salama wrote:

> Ok.
> 
> 
> I got Elephant to work again with SBCL on PPC. I guess I was still
> using BDB 4.3 when 4.4 seems to be required. I couldn't find that
> anywhere in the docs.
> 
> 
> I see what you're saying and can start to envision where this can go.
> I will keep playing and provide more feedback later today.
> 
> 
> FYI, for curiosity purposes, I just ran (random-users 5000) on an
> empty database and it took 5 min 56 secs to complete. At the end, the
> %ELEPHANT file was 9MB in size and both log files where 10MB. That's
> kind of big considering what it does.
> 
> 
> I then made the random-users2 function to look like:
> 
> 
> (defun random-users2 (n)
>   (setq *auto-commit* nil)
>   (with-transaction ()
>     (dotimes (x n)
>       (let ((u (make-instance 
>                 'User
>                 :uname (format nil "user~A" x)
>                 :pword (random-password)
>                 :email (format nil "user~A at .nowheresville.org" x)
>                 :fullname (format nil "~A~A ~A~A" (random-password) x
> (random-password) x)
>                 :balance (random 100))))
>         (add-to-root x u))))
>   (setq *auto-commit* t))
> 
> 
> When I ran (random-users2 5000) I got a Berkeley DB Error: Cannot
> allocate memory, so I changed random-users2 to:
> 
> 
> (defun random-users2 (n)
>   (setq *auto-commit* nil)
>   (start-ele-transaction)
>   (dotimes (x n)
>     (if (eq (mod x 1000)
>             0)
>         (progn
>           (commit-transaction)
>           (start-ele-transaction)))
>     (let ((u (make-instance 
>               'User
>               :uname (format nil "user~A" x)
>               :pword (random-password)
>               :email (format nil "user~A at .nowheresville.org" x)
>               :fullname (format nil "~A~A ~A~A" (random-password) x
> (random-password) x)
>               :balance (random 100))))
>       (add-to-root x u)))
>   (commit-transaction)
>   (setq *auto-commit* t))
> 
> 
> When I ran (random-users2 5000) on an empty database, this time it
> only 25 secs to complete. The %ELEPHANT file was still 9MB in size and
> both log files where also 10MB.
> 
> 
> I suppose your code was not designed for performance and only for
> illustration purposes. Needless to say, knowing how to use
> transactions certainly helps and can dramatically affect application
> performance.
> 
> 
> My only concern at this moment (which I also mentioned in another
> email) is the size of the data files. Whether or not that only
> reflects the persistent storage and not necessarily the memory
> footprint, I don't know. Therefore, if I loaded my database with
> 650,000 customer records, the data files will easily exceed 1GB of
> storage, and that's just one "table".
> 
> 
> Thanks,
> Daniel
> 
> On Nov 12, 2006, at 6:45 PM, Robert L. Read wrote:
> 
> 
> > Dear Daniel and Team,
> > 
> >     I think the code below, which I have tested on SBCL, illustrated
> > a typical problem that Daniel Salama introduces.  To paraphrase, you
> > have a datatype (perhaps compound) which has a lot of slots; you
> > have a GUI, perhaps web-based, that you use to both select or filter
> > the large database, and to decide how to present sort the results.
> > I've written the below example as if you operating directly on the
> > slots.  The fact that there are often intervening functions does not
> > fundamentally change the problem.  (An example of this is storing a
> > timestamp as an integer, but presenting it in a human-readable
> > format.)
> >     SQL supports a powerful querying ability based on both selection
> > and sorting.  One might think that this is an advantage of SQL; it
> > is conventional reason that this is actually an advantage of using a
> > relational database.  However, since LISP treats functions as first-
> > class citizens that can be constructed dynamically, you actually
> > have a full Turing-complete capabilities in doing queries that SQL
> > cannot match.  This same ability applies to sorting; you can sort on
> > any lexical order that you can program.
> >     In practice, however, one doesn't always need this power.  More
> > typically, a user will select fields that they want to use to filter
> > the results (that is, construct a query from), and perhaps how they
> > would like the results to be sorted.  I assume that you know how to
> > interpret an HTTP query or a McClim user interface or something to
> > associate the GUI with underlying functions.  (My personal framework
> > has a way to do this, and UCW is probably the most common or famous
> > way to do it now.)
> >     The code below generated 100 random "users".  The bare act of
> > defining this class defines accessor-functions that we can use in
> > dynamically constructed lists as below: (list 'username-of 'balance-
> > of).  I have written very small functions that use such lists either
> > to define define "lexicographic" sort orders based on the order of
> > the functions within the list.  That is, the primary sort criteria
> > is the first function in the list, but of that function is equal for
> > two values, the next is used and so on.  If you load the below code
> > and execute (show-off) several times I think you will see what I
> > mean.  You can then see how easily you can change the list of
> > functions that are either in the selector or the sort criteria. If
> > this is a web-based app, these list will be generated from the http-
> > query, which is generated by the user's clicks.
> >     That is a "columnar" based approach; but it one can do something
> > similar but more powerful based on computed functions that aren't
> > based on individual columns, but on the entire data element.  For
> > example "find users whose username is equal to their password"
> > cannot be done in this way --- but can be done by just using a
> > function #(lambda (x) (equal (username-of x) (password-of x)).  SQL
> > can do this --- but LISP can could use any function there, such as
> > "find users who have both short usernames and passwords that can be
> > cracked by routine-x".
> >     Instead of adding things to the root or the store-controller
> > directly, one would generally prefer
> > to use consistent classes:
> > http://common-lisp.net/project/elephant/doc/Persistent-
> > Classes.html#Persistent-Classes
> > 
> > This doesn't change the nature of the problem.  If you like, you can
> > create an index on any slot in a very convenient way: http://common-
> > lisp.net/project/elephant/doc/Class-Indices.html#Class-Indices 
> > This holds out the possibility of NOT having to iterate over the
> > entire data-set, but rather honing in directly upon 
> > matching values. (You can in fact create functional indexes based on
> > any function at all in Elephant, which is something that SQL can't
> > do conveniently, but the times you will need to do this are rare.)
> > 
> > It would take maybe 20 minutes more coding to uses Ian's "get-
> > instances-by-range" directly, and in a very efficient manner for
> > performing the query (if the GUI elements correspond to the class's
> > slots.)   This would be very efficient; but of course you should not
> > do this until you know that this is really the bottleneck in your
> > system.  By using cursors, you can avoid reading the entire data
> > into memory, and thus process huge datasets.
> >     However, one should note that Ian's code makes creating indices
> > on slots zero-effort; but indexes always have overhead.  The real
> > question is: when will your queries actually utilize the index?
> > (That is, if you always select on one column/slot, then that one
> > should be indexed....but if your query pattern is more complicated,
> > it becomes fuzzy.)
> >     Let me know if you find this useful;  after I get feedback from
> > you and Ian has made his post, perhaps we will put
> > this in the documentation.
> > 
> 
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.common-lisp.net/pipermail/elephant-devel/attachments/20061113/c885989e/attachment.html>


More information about the elephant-devel mailing list