[elephant-devel] Query System

Fri May 9 22:02:15 UTC 2008

Hello everyone,

I apologize for being disconnected for so long. I had volunteered to  
help in the query system and should have done more progress by now.  
Unfortunately, the same as some (most or all) of you, putting food on  
the table for my family has a higher priority and my current job has  
demanded 110% of my time lately.

Enough excuses! I have been passively reading several of your email  
threads. I am convinced that a query system will bring a lot of value  
to Elephant. The question that still arises is whether or not people  
want a SQL-like syntax or a Lisp-like syntax.

As Ian has suggested, publicly and/or privately, we should start  
designing the query system in a very basic form. The most critical  
part would be query optimization, which I'd rather work on after we  
have the basic query system in place. But there are a lot of decisions  
to make before we get there and coming to a consensus of how it should  
look and how it should work is of critical importance.

 From a simplistic point of view, a SQL-like syntax should allow for  
the execution of the basic relational algebraic operations (union,  
difference, cartesian product, projection, and selection). For the  
most part, these would not be difficult to implement. However, IMHO,  
there is an intrinsic "contradiction" in applying a SQL-like syntax on  
top of Elephant.

Assume you have the following Tables (relations) in a SQL world:

Books (
   book_id,
   title,
   author
)

Publishers (
   publisher_id,
   name
)

BooksPublishers (
   book_id,
   publisher_id,
   year
)

Suppose you wanted to get the cartesian product of all the books  
published in 2008, you could run a SQL query like:

SELECT Books.*, Publishers.* FROM Books, Publishers, BooksPublishers  
WHERE Books.book_id = BooksPublishers.book_id AND  
Publishers.publisher_id = BooksPublishers.publisher_id AND  
BooksPublishers.year = 2008

The result will be a concatenation of all the columns from the Books  
and Publishers tables. In a SQL-world, you would access these results  
in a key-value pair type mode (e.g. Books.book_id = 1, Books.title =  
"1984", etc). However, when you think in terms of Elephant (at least  
my understanding of it), you're dealing with objects and not key-value  
pairs from multiple tables. So, instead of getting a concatenation of  
all the columns, you "should" be getting just a list of Book objects  
(or Publisher objects) that met your query criteria, such that when  
you iterate thru them, you could "query" their Publishers (or the  
Books). So, if we had something like (please keep in mind this is no  
suggestion to syntax or correctness but just for illustrative purposes):

(defpclass book ()
   ((title :accessor book-title :index t)
    (author :accessor book-author :index t)
    (published_copies :accessor book-copies :initform (make-pset))))

(defpclass publisher ()
   ((name :accessor publisher-name :index t)))

(defmethod add-published-copy ((bk book) (pb publisher) year)
   (insert-item '(pb year) (book-copies bk)))

(defmethod map-published-copies (fn (bk book))
   (map-pset fn (book-copies bk)))

(setq objs (select book :where ((map-published-copies (lambda (item  
year) (= (second item) year)) $bk 2008)))))

 From then on, you could just iterate through the book objects in the  
result set for their respective published copies. The problem with  
this is that, ok, you get all the books that met your criteria but if  
you then wanted to get a list of all the published copies, you would  
need to apply the filter criteria again. The reason I think it "should  
behave" this way is because Elephant deals with sets of objects, and  
you use Lisp to navigate through the object space, whereas in a SQL- 
world you are not dealing with objects but with a result set that  
contains all the columns you asked for. If we were to emulate the same  
behavior in the query system, that would sort of defeat the purpose of  
Elephant. For that matter, you might as well use some of the other  
libraries (e.g. CL-SQL, cl-perec, cl-rdbms, etc).

The above example is a very simple example. We haven't looked at  
SORTING, LIMIT, OFFSET, etc. Things which will simply make this whole  
dilemma more difficult.

I haven't looked into Ian's association mechanism yet. Maybe the query  
system could/should be an extension to that with some specialized  
features to apply filter criteria instead (and possibly evolve into  
something similar to Ruby's ActiveRecord). I know the association  
mechanism is still being developed and I haven't really seen anyone  
comment much on it other than what Ian has mentioned. In one of Ian's  
comments, he said:

      "A more general query language is probably the right solution  
for this interface.  The query language would know about associations,  
derived indices, etc and perform query planning via introspection over  
the class objects."

At the same time, Robert said on another thread:

      "One might philosophically prefer SQL.  I personally vaster  
prefer to work in a powerful programming language to accomplish these  
things. Obviously, whether two classes that refer to each other stand  
in a "parent-child" relationship or not depends entirely on the  
circumstances.  I prefer to write simple functions such as "delete- 
order" below, which both utilize and (in a sense) expand the power of  
LISP applied to persistent objects."

Leslie said on yet another thread:

      "While I'm at it: OFFSET and LIMIT (a real limit which lets you  
specify an arbitrary Lisp expression) are things we definitely want to  
aim for in 1.0. They are not difficult to implement at all, but they  
don't work with GET-INSTANCES-BY-* and, worse, MAP-BTREE. This means  
everyone has to write their own version of these functions that take  
appropriate arguments and move the cursor around themselves instead of  
relying on a simple high-level API.

I'd have implemented these extensions myself, but I thought it better  
to wait for the integration of the query language to add it."

And Alex said:

      "I think main problem is not how it looks, but that query  
language actually makes programming a lot easier."

All those comments make sense. There seems to be a group agreement  
that something is needed, but everyone has their own ideas of how it  
should work. Both the query language and the associations are still  
being developed, so if we get consensus no how these should work, it  
may give a better direction to both feature sets. If anyone has any  
comments or suggestion as to whether a query system be of real  
interest/necessity and if so, which would be the preferred query  
syntax and expected behavior, that would really help.

I'm willing to work on this in as much as possible with my limited  
knowledge of Lisp and Elephant. However, given a clear direction of  
where this should go, I will be able to focus better and learn faster  
what I haven't learned so far.

Again, your feedback is much appreciated. I'm hopeful to be able to  
work more on this over the weekend, assuming I get some feedback from  
you guys.

Thanks
Daniel