From gabor.vitez at gmail.com  Wed Apr  4 10:40:26 2007
From: gabor.vitez at gmail.com (Gabor Vitez)
Date: Wed, 4 Apr 2007 12:40:26 +0200
Subject: [cl-prevalence-devel] cl-prevalence speed issues
Message-ID: <19f5edc40704040340j2a7f256bt93990906bfad5fc5@mail.gmail.com>

Hi,

I just started to toy around with cl-prevalence; however I found strange
speed issues:

loading a database from transaction log is way faster than loading it from
snapshot.

I modified one of the test scripts from the cl-prevalence distribution:

(require 'asdf)
(require 'cl-prevalence)
(in-package :cl-prevalence)
(defclass numbers ()
  ((numbers-list :accessor get-numbers-list :initform nil))
  (:documentation "Object to hold our list of numbers"))
(defun tx-create-numbers-root (system)
  "Transaction function to create a numbers instance as a root object"
  (setf (get-root-object system :numbers) (make-instance 'numbers)))
(defun tx-add-number (system number)
  "Transaction function to add a number to the numbers list"
  (let ((numbers (get-root-object system :numbers)))
    (push number (get-numbers-list numbers))))
(defparameter *system-location* (pathname "/tmp/demo1-prevalence-system/")
  "Filesystem location of the prevalence system")
(defvar *system* (time (make-prevalence-system *system-location*)))
(execute *system* (make-transaction 'tx-create-numbers-root))
(time (dotimes (i 100000)
        (execute *system* (make-transaction 'tx-add-number i))))
;(time (snapshot *system*))
(close-open-streams *system*)


I use this script to create a database; later to load it; then to snapshot
it and load it again (uncommenting the appropriate parts between the runs).

Creating and snapshotting is fast; however loading from snapshot is slow.
Times:
creating: 19.89 seconds
loading from transaction log: 53.942 seconds   < this is good
snapshotting: 5.297 seconds
loading from snapshot: 182.713 seconds          < this is strange
snapshotting again: 1.165 seconds

Any ideas what this strangeness can be?


    Gabor

From scaekenberghe at common-lisp.net  Wed Apr  4 11:15:34 2007
From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe)
Date: Wed, 4 Apr 2007 13:15:34 +0200
Subject: [cl-prevalence-devel] cl-prevalence speed issues
In-Reply-To: <19f5edc40704040340j2a7f256bt93990906bfad5fc5@mail.gmail.com>
References: <19f5edc40704040340j2a7f256bt93990906bfad5fc5@mail.gmail.com>
Message-ID: <C03CDFA3-0D4F-46E3-ADDC-608723D8181D@common-lisp.net>

Gabor,

On 04 Apr 2007, at 12:40, Gabor Vitez wrote:

> Hi,
>
> I just started to toy around with cl-prevalence; however I found  
> strange speed issues:
>
> loading a database from transaction log is way faster than loading  
> it from snapshot.
>
> I modified one of the test scripts from the cl-prevalence  
> distribution:
>
> (require 'asdf)
> (require 'cl-prevalence)
> (in-package :cl-prevalence)
> (defclass numbers ()
>   ((numbers-list :accessor get-numbers-list :initform nil))
>   (:documentation "Object to hold our list of numbers"))
> (defun tx-create-numbers-root (system)
>   "Transaction function to create a numbers instance as a root object"
>   (setf (get-root-object system :numbers) (make-instance 'numbers)))
> (defun tx-add-number (system number)
>   "Transaction function to add a number to the numbers list"
>   (let ((numbers (get-root-object system :numbers)))
>     (push number (get-numbers-list numbers))))
> (defparameter *system-location* (pathname "/tmp/demo1-prevalence-system/")
>   "Filesystem location of the prevalence system")
> (defvar *system* (time (make-prevalence-system *system-location*)))
> (execute *system* (make-transaction 'tx-create-numbers-root))
> (time (dotimes (i 100000)
>         (execute *system* (make-transaction 'tx-add-number i))))
> ;(time (snapshot *system*))
> (close-open-streams *system*)
>
>
> I use this script to create a database; later to load it; then to  
> snapshot it and load it again (uncommenting the appropriate parts  
> between the runs).
>
> Creating and snapshotting is fast; however loading from snapshot is  
> slow.
> Times:
> creating: 19.89 seconds
> loading from transaction log: 53.942 seconds   < this is good
> snapshotting: 5.297 seconds
> loading from snapshot: 182.713 seconds          < this is strange
> snapshotting again: 1.165 seconds
>
> Any ideas what this strangeness can be?
>
>
>     Gabor

I haven't run your code or experimented with it, but from looking at it,
one possible explanation might be the following:

When serializing Lisp data structures, using either the XML or the
S-EXPRESSION format, the serializer must constantly watch out for
shared and circular data structures. This is done using a hashtable
holding all Lisp objects seen during a serialization session. Reading
a small serialized transaction is much less work than reading a 100K
list. Lists are serialized and deserialized as individual cons
cells, which is costly. While doing this serialization or
deserialization, a hashtable of the same size is built and each
element is checked against it. This might in effect be slower to do
all at once than applying 100K transactions.

A possibility to speed this up might be to use a properly sized  
sequence instead of a list. Be sure to look at the resulting  
serialization text file itself too.
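
To make the difference concrete, here is a minimal sketch in plain
Common Lisp. It is a toy model of the bookkeeping described above, not
the actual cl-prevalence serializer: WALK-AND-REGISTER is a
hypothetical name, and the real code also has to emit the serialized
text. It only counts how many objects end up in the "seen" hashtable.

```lisp
;; Toy model only: WALK-AND-REGISTER is a hypothetical stand-in for the
;; bookkeeping a serializer does, not a cl-prevalence function.
(defun walk-and-register (object seen)
  "Record every cons cell and non-string vector reachable from OBJECT in
the EQ hashtable SEEN, and return the number of entries in SEEN."
  (labels ((walk (x)
             (cond ((consp x)
                    ;; Iterate down the CDR chain to avoid deep recursion;
                    ;; stop at a cell we have already seen (shared/circular).
                    (loop for cell = x then (cdr cell)
                          while (and (consp cell)
                                     (not (gethash cell seen)))
                          do (setf (gethash cell seen) t)
                             (walk (car cell))))
                   ((and (vectorp x) (not (stringp x)))
                    (unless (gethash x seen)
                      (setf (gethash x seen) t)
                      (map nil #'walk x))))))
    (walk object)
    (hash-table-count seen)))

(defparameter *n* 100000)

;; A list of N numbers is N cons cells, so N hashtable entries.
(defparameter *list-entries*
  (walk-and-register (loop for i below *n* collect i)
                     (make-hash-table :test 'eq)))

;; A vector of N numbers is a single object, so one hashtable entry.
(defparameter *vector-entries*
  (walk-and-register (make-array *n* :initial-element 0)
                     (make-hash-table :test 'eq)))
```

Under this model the list costs one hashtable entry per cons cell,
while the vector costs a single entry regardless of its length. Whether
that accounts for the timings reported above is still a guess until it
is measured.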

HTH,

Sven


From mike at sharedlogic.ca  Thu Apr  5 15:19:35 2007
From: mike at sharedlogic.ca (Michael J. Forster)
Date: Thu, 5 Apr 2007 10:19:35 -0500
Subject: [cl-prevalence-devel] simple-array serialization patch
Message-ID: <77DC7F26-D196-45DE-A390-824FF87E41CD@sharedlogic.ca>

Hi Sven,

I don't know if you or anyone else is interested, but I have implemented
xml and sexp serialization/deserialization of simple arrays -- I  
needed it
for an app that uses cl-prevalence.  I've attached the patch.

BTW, I would like to say that cl-prevalence is fantastic.  We've been  
using
it for five non-trivial (>25 classes, avg. 3000 instances per class)  
webapps
without a hitch for almost a year now.

Regards,

Mike

--
Michael J. Forster <mike at sharedlogic.ca>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cl-prevalence-serialization.patch
Type: application/octet-stream
Size: 6163 bytes
Desc: not available
URL: <https://mailman.common-lisp.net/pipermail/cl-prevalence-devel/attachments/20070405/ce1931eb/attachment.obj>

From scaekenberghe at common-lisp.net  Fri Apr  6 08:34:51 2007
From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe)
Date: Fri, 6 Apr 2007 10:34:51 +0200
Subject: [cl-prevalence-devel] simple-array serialization patch
In-Reply-To: <77DC7F26-D196-45DE-A390-824FF87E41CD@sharedlogic.ca>
References: <77DC7F26-D196-45DE-A390-824FF87E41CD@sharedlogic.ca>
Message-ID: <0A9F2735-B147-4EFB-BCC8-853D57C5370B@common-lisp.net>

Mike,

On 05 Apr 2007, at 17:19, Michael J. Forster wrote:

> I don't know if you or anyone else is interested, but I have  
> implemented
> xml and sexp serialization/deserialization of simple arrays -- I  
> needed it
> for an app that uses cl-prevalence.  I've attached the patch.

The patch is OK in terms of code (I guess it is working fine in your  
situation), but I am not sure that it is conceptually correct (but  
maybe I am wrong).

According to my reading of CLHS the type simple-array by itself does
not guarantee a (what I would call) homogeneous array (an array with
the same type of element everywhere). The typespecs '(simple-array *)
and '(simple-array <element-type>) would refer to this, but I don't
know whether you can use them in method signatures.

Even so, the array-element-type could very well be too general, like
T or cons or array. In that case, your serialization code fails to
take shared and circular references into account (you are effectively
assuming more primitive, non-shared, non-circular element-types -
which probably works in the way you are using CL-PREVALENCE).

So, as I see and understand it now, your code would be OK, if we  
further qualify it with a test that the array-element-type is  
somewhat 'primitive'. But I am not sure how to express that in the  
method signature or how to test/enforce it in code, maybe we need a  
custom type predicate ?
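
One way such a predicate could look, as a rough sketch only: the names
PRIMITIVE-ELEMENT-ARRAY-P and PRIMITIVE-ELEMENT-ARRAY are hypothetical
and not part of CL-PREVALENCE, and treating "number or character" as
the definition of 'primitive' is an assumption.

```lisp
;; Hypothetical helper, not part of cl-prevalence.
(defun primitive-element-array-p (object)
  "True if OBJECT is an array whose (upgraded) element type can only hold
numbers or characters, so its elements cannot be shared or circular
structures."
  (and (arrayp object)
       (let ((element-type (array-element-type object)))
         (if (or (subtypep element-type 'number)
                 (subtypep element-type 'character))
             t
             nil))))

;; Methods cannot specialize on a compound typespec, but a SATISFIES
;; type can be tested with TYPEP or TYPECASE inside a method body.
(deftype primitive-element-array ()
  '(satisfies primitive-element-array-p))

(defun describe-array-kind (object)
  "Return :PRIMITIVE, :GENERAL or :NOT-AN-ARRAY for OBJECT."
  (typecase object
    (primitive-element-array :primitive)
    (array :general)
    (t :not-an-array)))
```

Note that ARRAY-ELEMENT-TYPE reports the upgraded element type, which
depends on the implementation. An array created with :element-type
'rational, for instance, may well be upgraded to element type T and
would then be classified as :general by this test.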

Also, it would be very helpful if we had unit tests covering your  
extended serialization special cases.

Anyway, your patch would be an important optimization for
better/faster serialization in some important cases!

> BTW, I would like to say that cl-prevalence is fantastic.  We've  
> been using
> it for five non-trivial (>25 classes, avg. 3000 instances per  
> class) webapps
> without a hitch for almost a year now.

That is very nice to hear: could you give some more details, like:

- what CL implementation you are using ?
- what serialization you are using ?
- the typical sizes of your transaction and snapshot files ?
- total number of objects under prevalence, 75000 ?
- rate of change (transaction log growth per day or so) ?
- size of the image ?
- machine details ?
- do you have any GC problems ?
- anything else you want to share

Regards,

Sven


From mike at sharedlogic.ca  Fri Apr  6 17:24:23 2007
From: mike at sharedlogic.ca (Michael J. Forster)
Date: Fri, 6 Apr 2007 12:24:23 -0500
Subject: [cl-prevalence-devel] simple-array serialization patch
In-Reply-To: <0A9F2735-B147-4EFB-BCC8-853D57C5370B@common-lisp.net>
References: <77DC7F26-D196-45DE-A390-824FF87E41CD@sharedlogic.ca>
	<0A9F2735-B147-4EFB-BCC8-853D57C5370B@common-lisp.net>
Message-ID: <E98C8924-6434-4C17-A278-E446279D5286@sharedlogic.ca>


On 2007-04-06, at 03:34, Sven Van Caekenberghe wrote:

> Mike,
>
> On 05 Apr 2007, at 17:19, Michael J. Forster wrote:
>
>> I don't know if you or anyone else is interested, but I have  
>> implemented
>> xml and sexp serialization/deserialization of simple arrays -- I  
>> needed it
>> for an app that uses cl-prevalence.  I've attached the patch.
>
> The patch is OK in terms of code (I guess it is working fine in  
> your situation), but I am not sure that it is conceptually correct  
> (but maybe I am wrong).
>

No, you are correct, and, in my haste, I posted the patch without  
fully describing
my scenario or intentions.  My apologies.


> According to my reading of CLHS the type simple-array on itself  
> does not guarantee a (what I would call) homogeneous array (an  
> array with the same type of element everywhere). The typespecs  
> '(simple-array *) and '(simple-array <element-type>) would refer to  
> this, but I don't know whether you can use them in method signatures.
>
> Even so, the array-element-type could very well be too general,  
> like T or cons or array. In that case, your serialization code  
> fails to take shared and circular references into account (you are  
> effectively assuming more primitive, non-shared, non-circular  
> element-types - which probably works in the way you are using  
> CL-PREVALENCE).
>
> So, as I see and understand it now, your code would be OK, if we  
> further qualify it with a test that the array-element-type is  
> somewhat 'primitive'. But I am not sure how to express that in the  
> method signature or how to test/enforce it in code, maybe we need a  
> custom type predicate ?
>

Yes, method signatures, one of my bigger CL gripes, though I do  
appreciate the
reasons that the CLOS designers allowed dispatch on class rather than  
type,
including compound typespecs.  (It's like complaining that Feanor's
Silmarils didn't come in orange. ;-)

I think you nailed the issue in your second last sentence above.  To  
my thinking,
non-vector arrays are concrete types as opposed to the more abstract  
vectors and
lists and even more abstract sequences.  One has to qualify the
non-vector array element type on a case-by-case basis, which is
perfectly acceptable -- and expected -- at the application level, but
not for reusable libraries.  Hence, my patch is not viable as a
general solution.

Really, what I wanted to do was extend the cl-prevalence
serialization/deserialization for
my-application-specific-2D-array-of-rationals by writing methods in my
application sources.  However, while serialize-xml-internal and
serialize-sexp-internal are generic functions, the corresponding
deserialization functions are not.  So, with barely an hour to deliver
a feature, I hacked the ugly hack ;-)

Perhaps the deserialization functions could be reworked as GFs,  
allowing complete
application-specific extension?  I would be happy to help out if  
you're interested.


>> BTW, I would like to say that cl-prevalence is fantastic.  We've  
>> been using
>> it for five non-trivial (>25 classes, avg. 3000 instances per  
>> class) webapps
>> without a hitch for almost a year now.
>
> That is very nice to hear: could you give some more details, like:
>
> - what CL implementation you are using ?

We develop with LW 4.4 and 5.0 on Mac and Windows; we deploy to CMUCL
19b on FreeBSD and LW 5.0 on Mac.

> - what serialization you are using ?

We've tried both and would prefer to use the sexp format for its greater
readability.  However, we started with xml and haven't had an  
opportunity to
change it.


> - machine details ?

Dell 2U
Intel P4 3.2GHz
4GB RAM
160GB usable disk, RAID1

Apple Xserve
G5 Dual 2.3GHz
2GB RAM
140GB usable disk, RAID 5


> - the typical sizes of your transaction and snapshot files ?
> - total number of objects under prevalence, 75000 ?
> - rate of change (transaction log growth per day or so) ?
> - size of the image ?

I will collect some stats over the next few weeks and post them.


> - do you have any GC problems ?

None that we've detected, though, without any outward signs of memory  
exhaustion,
dying processes, or poor overall application performance, we haven't  
gone looking
for trouble.  I will start recording the GC stats as well.


> - anything else you want to share

Probably, yes, though I need to find some time to organize my thoughts.

Suffice it to say, we've built a substantial database management  
layer atop of
cl-prevalence, and, often, when I try to explain it to customers or  
business partners,
most can't understand why we didn't just use SQL, some
object-relational mapping package, and so forth.

It's hard to explain, given my rather unique experience in the  
database application
market.  My first employer and mentor, Dave Voorhis, is the author of  
one of only a
handful of true relational database management systems:

	http://dbappbuilder.sourceforge.net/Rel.html

If I can't convince someone that a SQL DBMS is not an RDBMS, then I  
can't begin to
explain why we don't use SQL, why we went to the trouble of building  
our own DBMS,
and why we can, legitimately, call it a RDBMS in spite of the word  
"prevalence" and
the associated flame-fest.

Anyway, sorry, the rant wasn't meant for you. :-)  Simply covering my  
corporate butt in
case a customer or competitor ever reads this and attempts to  
misrepresent our position.
In the end, cl-prevalence is a real boon to our work.  If you have a  
PayPal button for the
project, I would happily click it!


Regards,

Mike


--
Michael J. Forster <mike at sharedlogic.ca>


From ndj at hivsa.com  Wed Apr 18 09:33:51 2007
From: ndj at hivsa.com (Nico de Jager)
Date: Wed, 18 Apr 2007 11:33:51 +0200
Subject: [cl-prevalence-devel] Removal of managed prevalence object does not
	delete corresponding non-id indexes' entries and duplicate
	index values allowed
Message-ID: <200704181133.52037.ndj@hivsa.com>

Hi

When one deletes an object with tx-delete-object after creating an index on a slot (other than id) of a managed prevalence class, the corresponding index entry is not deleted. Because adding a new object with tx-create-object appears to update all the indexes on the class, one would expect tx-delete-object to also update the indexes accordingly. Is this a bug or the intended behaviour?

Also, shouldn't tx-create-object (or tx-change-object-slots) signal a condition when the addition of an object or the change of a slot results in duplicate slot values for an indexed slot?

Thanks.
Nico


From scaekenberghe at common-lisp.net  Wed Apr 18 11:59:57 2007
From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe)
Date: Wed, 18 Apr 2007 13:59:57 +0200
Subject: [cl-prevalence-devel] Removal of managed prevalence object does
	not delete corresponding non-id indexes' entries and duplicate
	index values allowed
In-Reply-To: <200704181133.52037.ndj@hivsa.com>
References: <200704181133.52037.ndj@hivsa.com>
Message-ID: <B1095611-99B5-4868-88F3-49BBFF2CD6F3@common-lisp.net>

Nico,

On 18 Apr 2007, at 11:33, Nico de Jager wrote:


> When one deletes an object with tx-delete-object after creating an  
> index on a slot (other than id) of a managed prevalence class, the  
> corresponding index entry is not deleted. Because adding a new  
> object with tx-create-object appears to update all the indexes on  
> the class, one would expect tx-delete-object to also update the  
> indexes accordingly. Is this a bug or the intended behaviour?
>

The 'indexes on arbitrary slots' was contributed at one point. I do  
think the implementation was quite clear and simple.
But you are right, this is a bug (or oversight): when deleting an  
object, the index entries should be cleaned up.


> Also, shouldn't tx-create-object (or tx-change-object-slots) signal  
> a condition when the addition of an object or the change of a slot  
> results in duplicate slot values for an indexed slot?
>

Apart from indexes on keys (like ID), which are necessarily unique,  
indexes on arbitrary slots could theoretically be unique or not (as  
in SQL). The current implementation is using hashtables and is  
overwriting entries: so there too, there is a bug (or oversight):  
either we should enforce unique indexes by signalling errors, or we  
should allow multiple objects with the same indexed slot value, but  
then there should be a list as value there (and we have to manage all  
that correctly, esp. wrt. deleting objects).
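
To illustrate the two options, here is a minimal sketch on plain
hashtables. INDEX-ADD and INDEX-REMOVE are hypothetical helpers written
for this message, not the functions currently in CL-PREVALENCE, and the
:unique keyword is an assumption about how the choice could be exposed.

```lisp
;; Hypothetical helpers for illustration; these are not the
;; cl-prevalence index functions.
(defun index-add (index key object &key unique)
  "Add OBJECT under KEY. With UNIQUE, signal an error on a duplicate key;
otherwise keep a list of all objects sharing KEY."
  (if unique
      (if (nth-value 1 (gethash key index))
          (error "Duplicate value ~S in unique index" key)
          (setf (gethash key index) object))
      (push object (gethash key index)))
  object)

(defun index-remove (index key object &key unique)
  "Remove OBJECT from under KEY, dropping the entry once it is empty."
  (if unique
      ;; Only drop the entry if it really points at OBJECT.
      (when (eq (gethash key index) object)
        (remhash key index))
      (let ((remaining (remove object (gethash key index))))
        (if remaining
            (setf (gethash key index) remaining)
            (remhash key index))))
  object)

;; Non-unique usage: two objects share the indexed value "red".
(defparameter *by-colour* (make-hash-table :test 'equalp))
(index-add *by-colour* "red" :alice)
(index-add *by-colour* "red" :bob)
(index-remove *by-colour* "red" :alice)
```

After the last call the entry for "red" holds only :bob, and removing
:bob as well would delete the key altogether, which is the clean-up
that is missing when an object is deleted today.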

What do you think ? How are you using this feature ?

So, thanks for reporting this problem. Do you feel like trying to fix  
this with a patch ?

Are there any other people on the list who would like to comment ?

Sven


From ndj at hivsa.com  Wed Apr 18 13:19:30 2007
From: ndj at hivsa.com (Nico de Jager)
Date: Wed, 18 Apr 2007 15:19:30 +0200
Subject: [cl-prevalence-devel] Removal of managed prevalence object does
	not delete corresponding non-id indexes' entries and duplicate
	index values allowed
In-Reply-To: <344DC255-1047-4D27-A82F-751C74631730@beta9.be>
References: <200704181133.52037.ndj@hivsa.com>
	<344DC255-1047-4D27-A82F-751C74631730@beta9.be>
Message-ID: <200704181519.30642.ndj@hivsa.com>

On Wednesday 18 April 2007 13:45, Sven Van Caekenberghe wrote:
> Nico,
> 
> On 18 Apr 2007, at 11:33, Nico de Jager wrote:
> 
> > When one deletes an object with tx-delete-object after creating an  
> > index on a slot (other than id) of a managed prevalence class, the  
> > corresponding index entry is not deleted. Because adding a new  
> > object with tx-create-object appears to update all the indexes on  
> > the class, one would expect tx-delete-object to also update the  
> > indexes accordingly. Is this a bug or the intended behaviour?
> 
> The 'indexes on arbitrary slots' was contributed at one point. I do  
> think the implementation was quite clear and simple.
> But you are right, this is a bug (or oversight): when deleting an  
> object, the index entries should be cleaned up.
> 
> > Also, shouldn't tx-create-object (or tx-change-object-slots) signal  
> > a condition when the addition of an object or the change of a slot  
> > results in duplicate slot values for an indexed slot?
> 
> Apart from indexes on keys (like ID), which are necessarily unique,  
> indexes on arbitrary slots could theoretically be unique or not (as  
> in SQL). The current implementation is using hashtables and is  
> overwriting entries: so there too, there is a bug (or oversight):  
> either we should enforce unique indexes by signalling errors, or we  
> should allow multiple objects with the same indexed slot value, but  
> then there should be a list as value there (and we have to manage all  
> that correctly, esp. wrt. deleting objects).
> 
> What do you think ? How are you using this feature ?

It would be really nice if INDEX-ON had an optional UNIQUE parameter that allows one to choose between either a unique or non-unique index. I would use both types.

B.t.w. why is SLOTS an optional parameter in the definition of INDEX-ON? As it is the function would do nothing if SLOTS is not supplied:

(defun index-on (system class &optional slots (test 'equalp))
  "Create indexes on each of the slots provided."
  (dolist (slot slots)
    (execute-transaction (tx-create-objects-slot-index system class slot test))))

> 
> So, thanks for reporting this problem. Do you feel like trying to fix  
> this with a patch ?

I will give it a try as soon as I can find the time. I am not an expert Lisper yet, so it will probably take me longer. But it will definitely be worth it: I have replaced all my CLSQL databases with cl-prevalence.

> 
> Are there any other people on the list who would like to comment ?
> 
> Sven
> 

Thanks for your quick feedback.

Nico


From scaekenberghe at common-lisp.net  Thu Apr 19 15:03:45 2007
From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe)
Date: Thu, 19 Apr 2007 17:03:45 +0200
Subject: [cl-prevalence-devel] simple-array serialization patch
In-Reply-To: <E98C8924-6434-4C17-A278-E446279D5286@sharedlogic.ca>
References: <77DC7F26-D196-45DE-A390-824FF87E41CD@sharedlogic.ca>
	<0A9F2735-B147-4EFB-BCC8-853D57C5370B@common-lisp.net>
	<E98C8924-6434-4C17-A278-E446279D5286@sharedlogic.ca>
Message-ID: <F9CE5A2A-1CB3-4367-8539-416AF6E57024@common-lisp.net>


On 06 Apr 2007, at 19:24, Michael J. Forster wrote:

>>> BTW, I would like to say that cl-prevalence is fantastic.  We've  
>>> been using
>>> it for five non-trivial (>25 classes, avg. 3000 instances per  
>>> class) webapps
>>> without a hitch for almost a year now.
>>
>> That is very nice to hear: could you give some more details, like:
>>
>> - what CL implementation you are using ?
>
> We develop with LW 4.4 and 5.0 on Mac and Windows; we deploy to CMUCL
> 19b on FreeBSD and LW 5.0 on Mac.
>
>> - what serialization you are using ?
>
> We've tried both and would prefer to use the sexp format for its  
> greater
> readability.  However, we started with xml and haven't had an  
> opportunity to
> change it.
>
>> - machine details ?
>
> Dell 2U
> Intel P4 3.2GHz
> 4GB RAM
> 160GB usable disk, RAID1
>
> Apple Xserve
> G5 Dual 2.3GHz
> 2GB RAM
> 140GB usable disk, RAID 5
>
>> - the typical sizes of your transaction and snapshot files ?
>> - total number of objects under prevalence, 75000 ?
>> - rate of change (transaction log growth per day or so) ?
>> - size of the image ?
>
> I will collect some stats over the next few weeks and post them.
>
>> - do you have any GC problems ?
>
> None that we've detected, though, without any outward signs of  
> memory exhaustion,
> dying processes, or poor overall application performance, we  
> haven't gone looking
> for trouble.  I will start recording the GC stats as well.

This sounds like an awesome application!
I am glad CL-PREVALENCE helped you in achieving your goals.

Regards,

Sven