[elephant-devel] Query performance

Tue Feb 10 23:33:38 UTC 2009

For that particular query there is a more efficient way, but it's not  
natively supported in any of the high level APIs.  We're not (yet)  
trying to reproduce all the queries you are accustomed to in SQL  
databases.  The goal of the query system is to move in that direction  
when it makes sense.  For example we don't have an efficient COUNT.

If you need to do lots of table ops, better to use a SQL database!   
The goal of an OODB is really to support queries based on the link- 
structure of the objects and not based on treating their slots like  
ORM columns.

That said you can use the underlying dup-btrees directly when you need  
to...but it turns out I wrote one for my own use that you can look at  
as an example:

(ele::get-unique-values (find-inverted-index 'provider 'state))

I can probably export some of the macros I use to make using cursors  
more convenient, but I'd rather put the limited time I have into the  
query system which we can extend to do stuff like this...

I'm curious what performance you get on the get-unique-values call...

Ian

On Feb 10, 2009, at 5:39 PM, John wrote:

> Hi,
>
> I created a database of 85958 persistent objects containing 31  
> string slots with items such as first, last, middle, etc names,  
> company name, address, etc... Total disk space is about 1.3GB.
>
> Now, I wanted to write a small routine to generate CSV files  
> containing the information in this database. However, I wanted to  
> generate one CSV file per state, so I wrote this code:
>
> ;;; utility functions
> (defun open-db ()
>   "Open the Elephant DB, if not already opened."
>   (unless *store-controller*
>     (open-store *dentemax-spec* :max-locks 10000 :max-objects 10000))
>   *store-controller*)
>
> (defun close-db ()
>   "Close the Elephant DB, if opened."
>   (if *store-controller*
>       (close-store)))
>
> ;;; export function
> ;;; basically, given a filename and a state (string), export all  
> objects whose 'state index matches the given state
> (defun write-csv-file (filename state)
>   "Export CSV file of providers in a given state."
>   (if (open-db)
>    (with-open-file (stream
>                     filename
>                     :direction :output
>                     :if-exists :supersede
>                     :if-does-not-exist :create)
>      (princ-csv (csv-header) stream) ; write the header
>      (map-inverted-index #'(lambda (k o)
>                             (declare (ignore k))
>                             (princ-csv (record-list o) stream))
>                          'provider
>                          'state
>                          :value state))))
>
> So far, this seems to be working fine. Some notes:
> (time (open-db))
> Evaluation took:
>   0.692 seconds of real time
>   0.232830 seconds of total run time (0.124051 user, 0.108779 system)
>   33.67% CPU
>   21 forms interpreted
>   315 lambdas converted
>   1,776,202,259 processor cycles
>   5,661,240 bytes consed
>
> Not bad to open the database
>
> I was able to generate a CSV file just fine:
>
> (time (write-csv-file "sample.csv" "CA"))
> Evaluation took:
>   14.196 seconds of real time
>   13.495860 seconds of total run time (12.846913 user, 0.648947  
> system)
>   [ Run times consist of 1.064 seconds GC time, and 12.432 seconds  
> non-GC time. ]
>   95.07% CPU
>   89 lambdas converted
>   36,814,250,083 processor cycles
>   1,419,496,056 bytes consed
>
> I then wanted to create an automated way of generating a file for  
> each state in the database. So, I wrote this:
>
> (defun unique-states ()
>   "Generates a list of unique states from the DB."
>   (open-db)
>   (let ((states nil))
>     (map-inverted-index #'(lambda (k o)
>                             (declare (ignore o))
>                             (pushnew k states :test #'string-equal))
>                         'provider
>                         'state)
>     (nreverse states)))
>
> and here is where my questions are:
>
> (time (unique-states))
> Evaluation took:
>   6.970 seconds of real time
>   6.807557 seconds of total run time (6.672462 user, 0.135095 system)
>   [ Run times consist of 0.290 seconds GC time, and 6.518 seconds  
> non-GC time. ]
>   97.68% CPU
>   18,073,334,071 processor cycles
>   140,438,296 bytes consed
>
> ("AL" "AR" "AZ" "CA" "CO" "CT" "DC" "DE" "GA" "HI" "IA" "ID" "IL"  
> "IN" "KS"
>  "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MS" "MT" "NC" "ND" "NE"  
> "NH" "NJ"
>  "NM" "NV" "NY" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"  
> "VA" "VT"
>  "WA" "WI" "WV" "WY")
>
> 1) Is it normal for it to take "so long" to "simulate" the  
> equivalent of SELECT DISTINCT state FROM source? When I run the same  
> query on a freshly launched MySQL instance it returned the results  
> in: 62 rows in set (0.00 sec)
>
> 2) Is my approach of selecting the unique states inefficient?
>
> 3) Is there a better way to perform this "query" in Elephant?
>
> BTW, this is all with the latest elephant from http://www.common-lisp.net/project/elephant/darcs/elephant-1.0 
>  in SBCL 1.0.24 on OS X and BDB 4.7.25.
>
> Thanks in advance,
> JD
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel