[elephant-devel] Severely poor performance in some obvious cases

Alain Picard Alain.Picard at memetrics.com
Wed Nov 28 08:53:18 UTC 2007


Dear Elephant developers,

I've been considering using Elephant for a project of mine,
and have been doing some basic performance tests, using the
new postmodern back end (which seems way cool, btw).

The scenario I'm testing is something like this; you
have a base class:

(defclass person-mixin ()
  ((name :accessor person-name :initarg :name :index t))
  (:metaclass persistent-metaclass))

and a derived one:

(defclass employee (person-mixin)
  ((job       :accessor  job
              :initarg  :job))
  (:metaclass persistent-metaclass))

And you go off and make a million instances of employees.
[Let's say we're a very big corporation.  :-)]
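
For concreteness, a loop along these lines would populate the store.
This is only a sketch: MAKE-EMPLOYEES, the generated names, the "clerk"
job and the batch size are all made up for illustration, and it assumes
a store controller is already open and the two classes above are defined.
Batching the inserts in WITH-TRANSACTION is a guess at what keeps the
load time tolerable, not something I've measured.

```lisp
;;; Hypothetical helper, not part of Elephant.
;;; Assumes an open store and the EMPLOYEE class defined above.
(defun make-employees (count &key (batch-size 1000))
  "Create COUNT persistent EMPLOYEE instances, BATCH-SIZE per transaction."
  (loop for start from 0 below count by batch-size
        do (with-transaction ()
             (loop for i from start below (min count (+ start batch-size))
                   do (make-instance 'employee
                                     ;; Illustrative data only.
                                     :name (format nil "employee-~D" i)
                                     :job "clerk")))))

;; (make-employees 1000000)
```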

Then when I did the following:

(time (get-instance-by-value 'employee 'name name))

I was surprised to find that not only is it slow, but it conses
like a madman.  This led me to inspect what this function actually
does, and it turns out that it calls get-instances-by-value, which
does a map-index (via with-btree-cursor) to collect every match,
and then throws away all but the first.

;;; Current definition, in 0.9.1

(defmethod get-instance-by-value ((class symbol) slot-name value)
  (let ((list (get-instances-by-value (find-class class) slot-name value)))
    (when (consp list)
      (car list))))

(defmethod get-instance-by-value ((class persistent-metaclass) slot-name value)
  (let ((list (get-instances-by-value class slot-name value)))
    (when (consp list)
      (car list))))

It seems odd to create a cursor and build a list to find something
when you have an index on that slot.  Also, it seems to me users of
GET-INSTANCE-BY-VALUE probably expect there to be only one instance
to return.  Would there be a huge problem in using something like
the following instead?

;;; Proposed definitions:

(defmethod get-instance-by-value ((class persistent-metaclass) slot-name value)
  (let ((bt (find-inverted-index class slot-name)))
    (if bt
        (get-value value bt) ; Do it the "simple" way
        (first (get-instances-by-value class slot-name value)))))

(defmethod get-instance-by-value ((class symbol) slot-name value)
  (get-instance-by-value (find-class class) slot-name value))

This is more than a factor of 10 faster under elephant/postmodern
for a class with 30,000 instances.
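
A comparison like this can be reproduced with a small timing helper.
The sketch below is plain Common Lisp; TIME-LOOKUPS is a name made up
for illustration and is not part of Elephant.  It only measures
wall-clock time, so consing still has to be read off TIME's output.

```lisp
;;; Hypothetical helper, not part of Elephant.
(defun time-lookups (lookup-fn keys)
  "Call LOOKUP-FN once per element of KEYS; return elapsed wall-clock seconds."
  (let ((start (get-internal-real-time)))
    (dolist (key keys)
      (funcall lookup-fn key))
    (/ (- (get-internal-real-time) start)
       (float internal-time-units-per-second))))

;; Example use, assuming an open store and a list NAMES of known names:
;; (time-lookups (lambda (n) (get-instance-by-value 'employee 'name n))
;;               names)
```

Running it once with each definition loaded, over the same list of
names, should give numbers that can be compared directly.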

Am I missing something really basic here?  Is there a simpler
way to do what I want without this performance penalty?
Will this simply not work for some other back ends I'm not
aware of?  I feel a certain tension in the code trying to
be "all things to all back-ends", and certain decisions are
clearly inspired by the Berkeley DB back end, which sadly I 
could not use for the venture I have in mind (for licensing reasons).


Lastly, is there a way to trace all the SQL commands going
back and forth to postgresql in postmodern?  So far I've resorted
to Postgres statement logging, which is painful to match up
with what the application does.  I'm looking for the postmodern
equivalent of CLSQL's START-SQL-RECORDING.


Thanks in advance!

                                Alain Picard


