[elephant-devel] Querying Advice [w/code example]

Mon Nov 13 15:18:58 UTC 2006

Some notes; forgive me if these are out of order:

*) (format t "blah~A" x) is not as efficient as (concatenate 'string
"blah" x), but I typically use it for debugging and its fine unless the
strings are long.
*) If you have many tables/persistent classes, then you are quite
correct,  the code which I posted won't handle this in a direct way.
However, there is no reason that one of the small functions in question
can't access the  data store, via "get-from-root", a btree-access, or a
persistent class access.  A SQL advocate would say that once these
queries become complicate enough, a query optimizer can do a better job
organizing the access.  There is some truth in this; I get around this
by using a prevalence like-model in anyway.  To make SQL work really
well in this way, you have to design your indicespretty carefully; the
same is basically true of using Elephant (except that its easier, due to
Ian's coding) and that if you put everything in memory it almost doesn't
matter. (Of course, you may end up having to build indices in memory as
well, but I wouldn't do that until careful profiling showed it to be
needed.)

So in my code example, a reasonable strategy is to "loop over" the basic
data set, and go out to other data sets only when they are needed.  In a
complex data situation, it can be hard to distinguish which class should
be considered basic; frankly in my experience things that complicated
rarely arise.  The ideal situation is to run the loop over the dataset
that contributes the most objects, in terms of cardinality, to the
result, and look for chances not to reach into other data sets.
Of course, the more smarts you put into working on this, the closer you
come to duplicated the work that a standard "query optimizer" in a RDBMS
system does.  There might be times when an RDBMS is actually better; but
for my money the prevalence model matches normal applications more
often.  The more dynamic your set of queries is, the less well a query
optimizer will perform, whether inside an RDBMS or an expression of your
own code; so use the K.I.S.S. principle if you want maximum query
flexibility.

*) Pierre pointed out a good use of macros ---- You could used them to
make a nice little query syntax.  Note that CL-SQL does this quite
nicely, and in the SQL-backend of elephant we use this a little bit.
The issue of whether you need a macro system for this or not somewhat
depends on how the GUI actions will be translated into the LISP
functions.  For example, if you wanted to let someone actually type in:
"field1>4 && field3 ~= "\d+*fizmo"", then a macro language is a good way
to do that.  I uses a macro package called LML for producing (X)HTML,
and it is a compelling reason to use lisp.  (BTW, that query is a bit of
a joke---I doubt anyone would want to expose a Perl-like query-syntax to
an end-user; but my point is, you could---and with the CL-PCCRE package
(which rocks) and a prevalence like model, it would be fast.)
[snipey personality on: Try doing that in Java.]

BTW, with just a little thought, you should be able to adopt the code
that I posted to OpenMCL, I think.
That might be faster than installing SBCL.

On Mon, 2006-11-13 at 00:19 -0500, Daniel Salama wrote:

> Wow! I wasn't expecting to hear back from you until Monday :)
> 
> 
> I haven't been able to run your code yet. I downloaded SBCL for PPC
> but have not been able to successfully compile Elephant with it. I
> will continue trying. I used to have it working before, but stopped
> using SBCL a while ago for OpenMCL.
> 
> 
> Anyway, some comments before I can run your code and provide more
> feedback:
> 
> 
> 100% agree with all this. However, one thing that might have been
> oversimplified is when it comes down to complex relational queries.
> There are two main issues to address here:
> 
> 
> 1) The nature of dynamic queries as presented in my original email:
> your explanation and code seems to address this in a subtle way and,
> although I believe you when you say it works, I'd say that it could be
> enhanced in a way to make it generic enough. I remember the how Peter
> Siebel presents such a nice and simple interface to querying a CD/MP3
> database in Chapter 3 of his Practical Common Lisp book. In a way, he
> kind of developed a small domain language for querying and I think
> something like that can be implemented with the sample code you
> provided below.
> 
> 
> 2) What happens when you have multiple relational tables... or more
> appropriate in our case, when you want to query an object model based
> on different object and object relational hierarchies. Say, for
> example, you have class person which has references to an object
> called demographics and an object postal-code, which, in turns, has a
> reference to an object state. How would you apply your proposal to a
> query such as "give me all 'males' under the age of 30 who live in
> 'NY' sorted by age in descending order and then by last, first names".
> It obviously is very simple to do this in SQL (if the proper relations
> exist between the tables), but may not be (or at least doesn't seem to
> be initially) as trivial in Elephant or applying your code below. Your
> dynamically constructed list of query-able columns approach may be a
> bit limited. Also, class indices may not necessarily be appropriate
> either in these situation.
> 
> 
> Maybe I haven't delved deeply enough into your proposal and/or haven't
> given it sufficient thought (I apologize for that if I belittled your
> proposal).
> 
> 
> I definitely understand the potential overhead of using secondary
> indices and agree that their use should be determined by how often you
> really need to search by those slots.
> 
> 
> I will continue trying to make SBCL work again in my PPC and then run
> your code. Maybe play with it and try to simulate different scenarios
> such as I mentioned above. Will then send more comments.
> 
> 
> Thanks again,
> Daniel
> 
> 
> P.S. I don't mean to distract anyone from the main thread of this
> email. But I noticed how you make use of format to append values to
> strings (e.g. random-password). Is that like the only way to do so, or
> maybe the more efficient way of doing so? Thanks again
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.common-lisp.net/pipermail/elephant-devel/attachments/20061113/01a657a5/attachment.html>