[elephant-devel] Query performance

John develgenius at gmail.com
Tue Feb 10 22:39:57 UTC 2009


Hi,

I created a database of 85958 persistent objects containing 31 string slots
with items such as first, last, middle, etc names, company name, address,
etc... Total disk space is about 1.3GB.

Now, I wanted to write a small routine to generate CSV files containing the
information in this database. However, I wanted to generate one CSV file per
state, so I wrote this code:

;;; utility functions
(defun open-db ()
  "Open the Elephant DB, if not already opened."
  (unless *store-controller*
    (open-store *dentemax-spec* :max-locks 10000 :max-objects 10000))
  *store-controller*)

(defun close-db ()
  "Close the Elephant DB, if opened."
  (if *store-controller*
      (close-store)))

;;; export function
;;; basically, given a filename and a state (string), export all objects
whose 'state index matches the given state
(defun write-csv-file (filename state)
  "Export CSV file of providers in a given state."
  (if (open-db)
   (with-open-file (stream
                    filename
                    :direction :output
                    :if-exists :supersede
                    :if-does-not-exist :create)
     (princ-csv (csv-header) stream) ; write the header
     (map-inverted-index #'(lambda (k o)
                            (declare (ignore k))
                            (princ-csv (record-list o) stream))
                         'provider
                         'state
                         :value state))))

So far, this seems to be working fine. Some notes:
(time (open-db))
Evaluation took:
  0.692 seconds of real time
  0.232830 seconds of total run time (0.124051 user, 0.108779 system)
  33.67% CPU
  21 forms interpreted
  315 lambdas converted
  1,776,202,259 processor cycles
  5,661,240 bytes consed

Not bad to open the database

I was able to generate a CSV file just fine:

(time (write-csv-file "sample.csv" "CA"))
Evaluation took:
  14.196 seconds of real time
  13.495860 seconds of total run time (12.846913 user, 0.648947 system)
  [ Run times consist of 1.064 seconds GC time, and 12.432 seconds non-GC
time. ]
  95.07% CPU
  89 lambdas converted
  36,814,250,083 processor cycles
  1,419,496,056 bytes consed

I then wanted to create an automated way of generating a file for each state
in the database. So, I wrote this:

(defun unique-states ()
  "Generates a list of unique states from the DB."
  (open-db)
  (let ((states nil))
    (map-inverted-index #'(lambda (k o)
                            (declare (ignore o))
                            (pushnew k states :test #'string-equal))
                        'provider
                        'state)
    (nreverse states)))

and here is where my questions are:

(time (unique-states))
Evaluation took:
  6.970 seconds of real time
  6.807557 seconds of total run time (6.672462 user, 0.135095 system)
  [ Run times consist of 0.290 seconds GC time, and 6.518 seconds non-GC
time. ]
  97.68% CPU
  18,073,334,071 processor cycles
  140,438,296 bytes consed

("AL" "AR" "AZ" "CA" "CO" "CT" "DC" "DE" "GA" "HI" "IA" "ID" "IL" "IN" "KS"
 "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MS" "MT" "NC" "ND" "NE" "NH" "NJ"
 "NM" "NV" "NY" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VA" "VT"
 "WA" "WI" "WV" "WY")

1) Is it normal for it to take "so long" to "simulate" the equivalent of
SELECT DISTINCT state FROM source? When I run the same query on a freshly
launched MySQL instance it returned the results in: 62 rows in set (0.00
sec)

2) Is my approach of selecting the unique states inefficient?

3) Is there a better way to perform this "query" in Elephant?

BTW, this is all with the latest elephant from
http://www.common-lisp.net/project/elephant/darcs/elephant-1.0 in SBCL
1.0.24 on OS X and BDB 4.7.25.

Thanks in advance,
JD
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.common-lisp.net/pipermail/elephant-devel/attachments/20090210/33d63443/attachment.html>


More information about the elephant-devel mailing list