[elephant-devel] Querying Advice [w/code example]
Robert L. Read
read at robertlread.net
Sun Nov 12 23:45:20 UTC 2006
Dear Daniel and Team,
I think the code below, which I have tested on SBCL, illustrated a
typical problem that Daniel Salama introduces. To paraphrase, you have
a datatype (perhaps compound) which has a lot of slots; you have a GUI,
perhaps web-based, that you use to both select or filter the large
database, and to decide how to present sort the results. I've written
the below example as if you operating directly on the slots. The fact
that there are often intervening functions does not fundamentally change
the problem. (An example of this is storing a timestamp as an integer,
but presenting it in a human-readable format.)
SQL supports a powerful querying ability based on both selection and
sorting. One might think that this is an advantage of SQL; it is
conventional reason that this is actually an advantage of using a
relational database. However, since LISP treats functions as first-
class citizens that can be constructed dynamically, you actually have a
full Turing-complete capabilities in doing queries that SQL cannot
match. This same ability applies to sorting; you can sort on any
lexical order that you can program.
In practice, however, one doesn't always need this power. More
typically, a user will select fields that they want to use to filter the
results (that is, construct a query from), and perhaps how they would
like the results to be sorted. I assume that you know how to interpret
an HTTP query or a McClim user interface or something to associate the
GUI with underlying functions. (My personal framework has a way to do
this, and UCW is probably the most common or famous way to do it now.)
The code below generated 100 random "users". The bare act of
defining this class defines accessor-functions that we can use in
dynamically constructed lists as below: (list 'username-of 'balance-of).
I have written very small functions that use such lists either to define
define "lexicographic" sort orders based on the order of the functions
within the list. That is, the primary sort criteria is the first
function in the list, but of that function is equal for two values, the
next is used and so on. If you load the below code and execute (show-
off) several times I think you will see what I mean. You can then see
how easily you can change the list of functions that are either in the
selector or the sort criteria. If this is a web-based app, these list
will be generated from the http-query, which is generated by the user's
clicks.
That is a "columnar" based approach; but it one can do something
similar but more powerful based on computed functions that aren't based
on individual columns, but on the entire data element. For example
"find users whose username is equal to their password" cannot be done in
this way --- but can be done by just using a function #(lambda (x)
(equal (username-of x) (password-of x)). SQL can do this --- but LISP
can could use any function there, such as "find users who have both
short usernames and passwords that can be cracked by routine-x".
Instead of adding things to the root or the store-controller
directly, one would generally prefer
to use consistent classes:
http://common-lisp.net/project/elephant/doc/Persistent-
Classes.html#Persistent-Classes
This doesn't change the nature of the problem. If you like, you can
create an index on any slot in a very convenient way: http://common-
lisp.net/project/elephant/doc/Class-Indices.html#Class-Indices
This holds out the possibility of NOT having to iterate over the entire
data-set, but rather honing in directly upon
matching values. (You can in fact create functional indexes based on any
function at all in Elephant, which is something that SQL can't do
conveniently, but the times you will need to do this are rare.)
It would take maybe 20 minutes more coding to uses Ian's "get-instances-
by-range" directly, and in a very efficient manner for performing the
query (if the GUI elements correspond to the class's slots.) This
would be very efficient; but of course you should not do this until you
know that this is really the bottleneck in your system. By using
cursors, you can avoid reading the entire data into memory, and thus
process huge datasets.
However, one should note that Ian's code makes creating indices on
slots zero-effort; but indexes always have overhead. The real question
is: when will your queries actually utilize the index? (That is, if you
always select on one column/slot, then that one should be indexed....but
if your query pattern is more complicated, it becomes fuzzy.)
Let me know if you find this useful; after I get feedback from you
and Ian has made his post, perhaps we will put
this in the documentation.
(asdf:operate 'asdf:load-op :elephant-tests)
(in-package "ELEPHANT-TESTS")
(setf *default-spec* *testbdb-spec*)
(defclass User ()
((username :type 'string :initform "" :initarg :uname :accessor
username-of)
(password :type 'string :initform "" :initarg :pword :accessor
password-of)
(email :type 'string :initform "" :initarg :email :accessor email-
of)
(fullname :type 'string :initform "" :initarg :fullname :accessor
fullname-of)
(balance :type 'integer :initform 0 :initarg :balance :accessor
balance-of)
))
(defun random-letter ()
(if (= 0 (random 2))
(code-char (+ 65 (random 26)))
(code-char (+ 97 (random 26)))))
(defun random-password ()
(let ((pw ""))
(dotimes (i 8)
(setf pw (format nil "~A~A" pw (random-letter))))
pw))
(defun random-users (n)
(dotimes (x n)
(let ((u (make-instance
'User
:uname (format nil "user~A" x)
:pword (random-password)
:email (format nil "user~A at .nowheresville.org" x)
:fullname (format nil "~A~A ~A~A" (random-password) x (random-
password) x)
:balance (random 100))))
(add-to-root x u)
)
))
(defun my-lexco (a b)
(cond
((typep a 'string)
(string< a b))
((typep a 'number)
(< a b))
(t
(string<
(format nil "~A" a)
(format nil "~A" b)))
))
(defun sort-by-columns (objs lst)
(sort
objs
#'(lambda (a b)
(let ((res t))
(dolist (fun lst)
(progn
(format t "fun = ~A~%" fun)
(let ((av (funcall fun a))
(bv (funcall fun b )))
(if (equal av bv)
nil
(return (my-lexco av bv))))
res))))
))
(defun select-by-many-selectors (objs lst)
(let ((ret nil))
(dolist (o objs)
(let ((selected t))
(dolist (fun lst)
(setf selected (and selected (funcall fun o)))
)
(if selected
(push o ret))
))
ret))
(defun get-all-objects (n)
(let ((res nil))
(dotimes (x n)
(push (get-from-root x) res))
res))
(defmethod print-object ((obj User) stream)
(format stream
"(username, password, email, fullname, balance) = (~A, ~A, ~A, ~A,
~A)"
(username-of obj)
(password-of obj)
(email-of obj)
(fullname-of obj)
(balance-of obj)
))
(defun show-off ()
(let ((n 100))
(open-store *default-spec*)
;; Let's create 100 users....
(random-users n)
;; Now we'll sort based on a list of functions..
(sort-by-columns
;; But first we'll filter by a list of functions that must be true
of each object...
(select-by-many-selectors
(get-all-objects n)
;; This list of functions (which has only
(list #'(lambda (x) (< (balance-of x) 50))
#'(lambda (x) (equal #\3
(aref (username-of x)
(- (length (username-of x)) 1))))))
;; This is the list of functions that we will
(list #'(lambda (x) (floor (/ (balance-of x) 10))) #'username-
of))))
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.common-lisp.net/pipermail/elephant-devel/attachments/20061112/ad92e5b5/attachment.html>
More information about the elephant-devel
mailing list