Hi,<br><br>I created a database of 85958 persistent objects containing 31 string slots with items such as first, last, middle, etc names, company name, address, etc... Total disk space is about 1.3GB.<br><br>Now, I wanted to write a small routine to generate CSV files containing the information in this database. However, I wanted to generate one CSV file per state, so I wrote this code:<br>
<br>;;; utility functions<br>(defun open-db ()<br> "Open the Elephant DB, if not already opened."<br> (unless *store-controller*<br> (open-store *dentemax-spec* :max-locks 10000 :max-objects 10000))<br> *store-controller*)<br>
<br>(defun close-db ()<br> "Close the Elephant DB, if opened."<br> (if *store-controller*<br> (close-store)))<br><br>;;; export function<br>;;; basically, given a filename and a state (string), export all objects whose 'state index matches the given state<br>
(defun write-csv-file (filename state)<br> "Export CSV file of providers in a given state."<br> (if (open-db)<br> (with-open-file (stream<br> filename<br> :direction :output<br>
:if-exists :supersede<br> :if-does-not-exist :create)<br> (princ-csv (csv-header) stream) ; write the header<br> (map-inverted-index #'(lambda (k o)<br> (declare (ignore k))<br>
(princ-csv (record-list o) stream))<br> 'provider<br> 'state<br> :value state))))<br><br>So far, this seems to be working fine. Some notes:<br>
(time (open-db))<br>Evaluation took:<br> 0.692 seconds of real time<br> 0.232830 seconds of total run time (0.124051 user, 0.108779 system)<br> 33.67% CPU<br> 21 forms interpreted<br> 315 lambdas converted<br> 1,776,202,259 processor cycles<br>
5,661,240 bytes consed<br><br>Not bad to open the database<br><br>I was able to generate a CSV file just fine:<br><br>(time (write-csv-file "sample.csv" "CA"))<br>Evaluation took:<br> 14.196 seconds of real time<br>
13.495860 seconds of total run time (12.846913 user, 0.648947 system)<br> [ Run times consist of 1.064 seconds GC time, and 12.432 seconds non-GC time. ]<br> 95.07% CPU<br> 89 lambdas converted<br> 36,814,250,083 processor cycles<br>
1,419,496,056 bytes consed<br><br>I then wanted to create an automated way of generating a file for each state in the database. So, I wrote this:<br><br>(defun unique-states ()<br> "Generates a list of unique states from the DB."<br>
(open-db)<br> (let ((states nil))<br> (map-inverted-index #'(lambda (k o)<br> (declare (ignore o))<br> (pushnew k states :test #'string-equal))<br> 'provider<br>
'state)<br> (nreverse states)))<br><br>and here is where my questions are:<br><br>(time (unique-states))<br>Evaluation took:<br> 6.970 seconds of real time<br> 6.807557 seconds of total run time (6.672462 user, 0.135095 system)<br>
[ Run times consist of 0.290 seconds GC time, and 6.518 seconds non-GC time. ]<br> 97.68% CPU<br> 18,073,334,071 processor cycles<br> 140,438,296 bytes consed<br> <br>("AL" "AR" "AZ" "CA" "CO" "CT" "DC" "DE" "GA" "HI" "IA" "ID" "IL" "IN" "KS"<br>
"KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MS" "MT" "NC" "ND" "NE" "NH" "NJ"<br>
"NM" "NV" "NY" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VA" "VT"<br>
"WA" "WI" "WV" "WY")<br><br>1) Is it normal for it to take "so long" to "simulate" the equivalent of SELECT DISTINCT state FROM source? When I run the same query on a freshly launched MySQL instance it returned the results in: 62 rows in set (0.00 sec)<br>
<br>2) Is my approach of selecting the unique states inefficient?<br><br>3) Is there a better way to perform this "query" in Elephant?<br><br>BTW, this is all with the latest elephant from <a href="http://www.common-lisp.net/project/elephant/darcs/elephant-1.0">http://www.common-lisp.net/project/elephant/darcs/elephant-1.0</a> in SBCL 1.0.24 on OS X and BDB 4.7.25.<br>
<br>Thanks in advance,<br>JD<br>