From gabor.vitez at gmail.com Wed Apr 4 10:40:26 2007
From: gabor.vitez at gmail.com (Gabor Vitez)
Date: Wed, 4 Apr 2007 12:40:26 +0200
Subject: [cl-prevalence-devel] cl-prevalence speed issues
Message-ID: <19f5edc40704040340j2a7f256bt93990906bfad5fc5@mail.gmail.com>

Hi,

I just started to toy around with cl-prevalence; however I found strange speed issues:

loading a database from transaction log is way faster than loading it from snapshot.

I modified one of the test scripts from the cl-prevalence distribution:

(require 'asdf)
(require 'cl-prevalence)
(in-package :cl-prevalence)
(defclass numbers ()
  ((numbers-list :accessor get-numbers-list :initform nil))
  (:documentation "Object to hold our list of numbers"))
(defun tx-create-numbers-root (system)
  "Transaction function to create a numbers instance as a root object"
  (setf (get-root-object system :numbers) (make-instance 'numbers)))
(defun tx-add-number (system number)
  "Transaction function to add a number to the numbers list"
  (let ((numbers (get-root-object system :numbers)))
    (push number (get-numbers-list numbers))))
(defparameter *system-location* (pathname "/tmp/demo1-prevalence-system/")
  "Filesystem location of the prevalence system")
(defvar *system* (time (make-prevalence-system *system-location*)) )
(execute *system* (make-transaction 'tx-create-numbers-root))
(time (dotimes (i 100000) (execute *system* (make-transaction 'tx-add-number i))) )
;(time (snapshot *system*))
(close-open-streams *system*)

I use this script to create a database; later to load it; then to snapshot it and load it again (uncommenting the appropriate parts between the runs).

Creating and snapshotting is fast; however loading from snapshot is slow.
Times:
creating: 19.89 seconds
loading from transaction log: 53.942 seconds < this is good
snapshotting: 5.297 seconds
loading from snapshot: 182.713 seconds < this is strange
snapshotting again: 1.165 seconds

Any ideas what this strangeness can be?

Gabor

From scaekenberghe at common-lisp.net Wed Apr 4 11:15:34 2007
From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe)
Date: Wed, 4 Apr 2007 13:15:34 +0200
Subject: [cl-prevalence-devel] cl-prevalence speed issues
In-Reply-To: <19f5edc40704040340j2a7f256bt93990906bfad5fc5@mail.gmail.com>
References: <19f5edc40704040340j2a7f256bt93990906bfad5fc5@mail.gmail.com>
Message-ID:

Gabor,

On 04 Apr 2007, at 12:40, Gabor Vitez wrote:

> Hi,
>
> I just started to toy around with cl-prevalence; however I found
> strange speed issues:
>
> loading a database from transaction log is way faster than loading
> it from snapshot.
>
> I modified one of the test scripts from the cl-prevalence
> distribution:
>
> (require 'asdf)
> (require 'cl-prevalence)
> (in-package :cl-prevalence)
> (defclass numbers ()
>   ((numbers-list :accessor get-numbers-list :initform nil))
>   (:documentation "Object to hold our list of numbers"))
> (defun tx-create-numbers-root (system)
>   "Transaction function to create a numbers instance as a root object"
>   (setf (get-root-object system :numbers) (make-instance 'numbers)))
> (defun tx-add-number (system number)
>   "Transaction function to add a number to the numbers list"
>   (let ((numbers (get-root-object system :numbers)))
>     (push number (get-numbers-list numbers))))
> (defparameter *system-location* (pathname "/tmp/demo1-prevalence-system/")
>   "Filesystem location of the prevalence system")
> (defvar *system* (time (make-prevalence-system *system-location*)) )
> (execute *system* (make-transaction 'tx-create-numbers-root))
> (time (dotimes (i 100000) (execute *system* (make-transaction 'tx-add-number i))) )
> ;(time (snapshot *system*))
> (close-open-streams *system*)
>
>
> I use this script to create a database; later to load it; then to
> snapshot it and load it again (uncommenting the appropriate parts
> between the runs).
>
> Creating and snapshotting is fast; however loading from snapshot is
> slow.
> Times:
> creating: 19.89 seconds
> loading from transaction log: 53.942 seconds < this is good
> snapshotting: 5.297 seconds
> loading from snapshot: 182.713 seconds < this is strange
> snapshotting again: 1.165 seconds
>
> Any ideas what this strangeness can be?
>
>
> Gabor

I haven't run your code or experimented with it, but by looking at it, one possible explanation might be the following:

When serializing Lisp data structures, using either the XML or the S-EXPRESSION format, the serializer must constantly watch out for shared and circular data structures. This is done using a hashtable holding all Lisp objects seen during a serialization session.

Reading a small serialized transaction is much less work than reading a 100K list. Lists are serialized and deserialized using individual cons cells, which is costly. While doing this serialization or deserialization, a hashtable of the same size is built and each element checked against it. This might in effect be slower to do at once than applying 100K transactions.

A possibility to speed this up might be to use a properly sized sequence instead of a list.

Be sure to look at the resulting serialization text file itself too.

HTH,

Sven

From mike at sharedlogic.ca Thu Apr 5 15:19:35 2007
From: mike at sharedlogic.ca (Michael J. Forster)
Date: Thu, 5 Apr 2007 10:19:35 -0500
Subject: [cl-prevalence-devel] simple-array serialization patch
Message-ID: <77DC7F26-D196-45DE-A390-824FF87E41CD@sharedlogic.ca>

Hi Sven,

I don't know if you or anyone else is interested, but I have implemented xml and sexp serialization/deserialization of simple arrays -- I needed it for an app that uses cl-prevalence. I've attached the patch.

BTW, I would like to say that cl-prevalence is fantastic. We've been using it for five non-trivial (>25 classes, avg. 3000 instances per class) webapps without a hitch for almost a year now.

Regards,

Mike

--
Michael J. Forster
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cl-prevalence-serialization.patch
Type: application/octet-stream
Size: 6163 bytes
Desc: not available
URL:

From scaekenberghe at common-lisp.net Fri Apr 6 08:34:51 2007
From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe)
Date: Fri, 6 Apr 2007 10:34:51 +0200
Subject: [cl-prevalence-devel] simple-array serialization patch
In-Reply-To: <77DC7F26-D196-45DE-A390-824FF87E41CD@sharedlogic.ca>
References: <77DC7F26-D196-45DE-A390-824FF87E41CD@sharedlogic.ca>
Message-ID: <0A9F2735-B147-4EFB-BCC8-853D57C5370B@common-lisp.net>

Mike,

On 05 Apr 2007, at 17:19, Michael J. Forster wrote:

> I don't know if you or anyone else is interested, but I have implemented
> xml and sexp serialization/deserialization of simple arrays -- I needed it
> for an app that uses cl-prevalence. I've attached the patch.

The patch is OK in terms of code (I guess it is working fine in your situation), but I am not sure that it is conceptually correct (but maybe I am wrong).

According to my reading of CLHS the type simple-array by itself does not guarantee a (what I would call) homogeneous array (an array with the same type of element everywhere). The typespecs '(simple-array *) and '(simple-array ) would refer to this, but I don't know whether you can use them in method signatures.

Even so, the array-element-type could very well be too general, like T or cons or array. In that case, your serialization code fails to take shared and circular references into account (you are effectively assuming more primitive, non-shared, non-circular element-types - which probably works in the way you are using CL-PREVALENCE).

So, as I see and understand it now, your code would be OK, if we further qualify it with a test that the array-element-type is somewhat 'primitive'. But I am not sure how to express that in the method signature or how to test/enforce it in code, maybe we need a custom type predicate ?
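
As a rough illustration of such a runtime check, here is a minimal sketch. The name primitive-element-type-p is hypothetical and is not part of cl-prevalence, and treating only NUMBER and CHARACTER subtypes as 'primitive' is an assumption. Note that array-element-type reports the upgraded element type, so an array requested with an element type such as RATIONAL will typically report T and fail this test.

```lisp
;; Hypothetical helper, not part of cl-prevalence: decide at run time
;; whether an array can only hold elements that never need the
;; shared/circular-reference bookkeeping.
(defun primitive-element-type-p (array)
  "Return T when ARRAY's (upgraded) element type is a subtype of
NUMBER or CHARACTER, and NIL otherwise."
  (let ((element-type (array-element-type array)))
    ;; SUBTYPEP returns two values; AND ... T collapses the result
    ;; to a single boolean.
    (and (or (subtypep element-type 'number)
             (subtypep element-type 'character))
         t)))
```

A serializer method specialized on the class ARRAY could call this predicate first and fall back to the general element-by-element path when it returns NIL.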

Also, it would be very helpful if we had unit tests covering your extended serialization special cases.

Anyway, your patch would be an important optimization for better/faster serialization in some important cases!

> BTW, I would like to say that cl-prevalence is fantastic. We've been using
> it for five non-trivial (>25 classes, avg. 3000 instances per class) webapps
> without a hitch for almost a year now.

That is very nice to hear: could you give some more details, like:

- what CL implementation you are using ?
- what serialization you are using ?
- the typical sizes of your transaction and snapshot files ?
- total number of objects under prevalence, 75000 ?
- rate of change (transaction log growth per day or so) ?
- size of the image ?
- machine details ?
- do you have any GC problems ?
- anything else you want to share

Regards,

Sven

From mike at sharedlogic.ca Fri Apr 6 17:24:23 2007
From: mike at sharedlogic.ca (Michael J. Forster)
Date: Fri, 6 Apr 2007 12:24:23 -0500
Subject: [cl-prevalence-devel] simple-array serialization patch
In-Reply-To: <0A9F2735-B147-4EFB-BCC8-853D57C5370B@common-lisp.net>
References: <77DC7F26-D196-45DE-A390-824FF87E41CD@sharedlogic.ca> <0A9F2735-B147-4EFB-BCC8-853D57C5370B@common-lisp.net>
Message-ID:

On 2007-04-06, at 03:34, Sven Van Caekenberghe wrote:

> Mike,
>
> On 05 Apr 2007, at 17:19, Michael J. Forster wrote:
>
>> I don't know if you or anyone else is interested, but I have implemented
>> xml and sexp serialization/deserialization of simple arrays -- I needed it
>> for an app that uses cl-prevalence. I've attached the patch.
>
> The patch is OK in terms of code (I guess it is working fine in
> your situation), but I am not sure that it is conceptually correct
> (but maybe I am wrong).
>

No, you are correct, and, in my haste, I posted the patch without fully describing my scenario or intentions. My apologies.

> According to my reading of CLHS the type simple-array by itself
> does not guarantee a (what I would call) homogeneous array (an
> array with the same type of element everywhere). The typespecs
> '(simple-array *) and '(simple-array ) would refer to
> this, but I don't know whether you can use them in method signatures.
>
> Even so, the array-element-type could very well be too general,
> like T or cons or array. In that case, your serialization code
> fails to take shared and circular references into account (you are
> effectively assuming more primitive, non-shared, non-circular
> element-types - which probably works in the way you are using
> CL-PREVALENCE).
>
> So, as I see and understand it now, your code would be OK, if we
> further qualify it with a test that the array-element-type is
> somewhat 'primitive'. But I am not sure how to express that in the
> method signature or how to test/enforce it in code, maybe we need a
> custom type predicate ?
>

Yes, method signatures, one of my bigger CL gripes, though I do appreciate the reasons that the CLOS designers allowed dispatch on class rather than type, including compound typespecs. (It's like complaining that Feanor's Silmarils didn't come in orange. ;-)

I think you nailed the issue in your second-to-last sentence above. To my thinking, non-vector arrays are concrete types as opposed to the more abstract vectors and lists and even more abstract sequences. One has to qualify non-vector array element type on a case-by-case basis, which is perfectly acceptable -- and expected -- at the application level, but not for reusable libraries. Hence, the inviability of my patch.

Really, what I wanted to do was extend the cl-prevalence serialization/deserialization for my-application-specific-2D-array-of-rationals by writing methods in my application sources. However, while serialize-xml-internal and serialize-sexp-internal are generic functions, the corresponding deserialization functions are not.
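
To picture what application-level extension on both sides could look like, here is a toy, self-contained sketch. Every name in it (deserialize-tagged, serialize-rational-grid, deserialize, the :rational-grid tag) is hypothetical; it does not use or reproduce the cl-prevalence serialization format, and only illustrates the idea of a deserializer that is a generic function dispatching on a tag with EQL specializers.

```lisp
;; Hypothetical sketch; none of these names exist in cl-prevalence.
(defgeneric deserialize-tagged (tag body)
  (:documentation "Rebuild an object from BODY, dispatching on the symbol TAG."))

;; An application can add its own method for its own tag.
(defmethod deserialize-tagged ((tag (eql :rational-grid)) body)
  ;; BODY is (rows columns . elements), elements in row-major order.
  (destructuring-bind (rows columns &rest elements) body
    (let ((grid (make-array (list rows columns))))
      (loop for index from 0
            for element in elements
            do (setf (row-major-aref grid index) element))
      grid)))

(defun serialize-rational-grid (grid)
  "Return a tagged list describing the 2D array GRID."
  (list* :rational-grid
         (array-dimension grid 0)
         (array-dimension grid 1)
         (loop for index below (array-total-size grid)
               collect (row-major-aref grid index))))

(defun deserialize (sexp)
  "Entry point: the first element of SEXP selects the method."
  (deserialize-tagged (first sexp) (rest sexp)))
```

With this shape, supporting a new application-specific type means adding one serializer and one deserialize-tagged method in the application sources, without touching the library.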
So, with barely an hour to deliver a feature, I hacked the ugly hack ;-)

Perhaps the deserialization functions could be reworked as GFs, allowing complete application-specific extension? I would be happy to help out if you're interested.

>> BTW, I would like to say that cl-prevalence is fantastic. We've been using
>> it for five non-trivial (>25 classes, avg. 3000 instances per class) webapps
>> without a hitch for almost a year now.
>
> That is very nice to hear: could you give some more details, like:
>
> - what CL implementation you are using ?

We develop with LW 4.4 and 5.0 on Mac and Windows; we deploy to CMUCL 19b on FreeBSD and LW 5.0 on Mac.

> - what serialization you are using ?

We've tried both and would prefer to use the sexp format for its greater readability. However, we started with xml and haven't had an opportunity to change it.

> - machine details ?

Dell 2U
Intel P4 3.2GHz
4GB RAM
160GB usable disk, RAID1

Apple Xserve
G5 Dual 2.3GHz
2GB RAM
140GB usable disk, RAID 5

> - the typical sizes of your transaction and snapshot files ?
> - total number of objects under prevalence, 75000 ?
> - rate of change (transaction log growth per day or so) ?
> - size of the image ?

I will collect some stats over the next few weeks and post them.

> - do you have any GC problems ?

None that we've detected, though, without any outward signs of memory exhaustion, dying processes, or poor overall application performance, we haven't gone looking for trouble. I will start recording the GC stats as well.

> - anything else you want to share

Probably, yes, though I need to find some time to organize my thoughts. Suffice it to say, we've built a substantial database management layer atop cl-prevalence, and, often, when I try to explain it to customers or business partners, most can't understand why we didn't just use SQL, some object-relational mapping package, and so forth. It's hard to explain, given my rather unique experience in the database application market.
My first employer and mentor, Dave Voorhis, is the author of one of only a handful of true relational database management systems:

http://dbappbuilder.sourceforge.net/Rel.html

If I can't convince someone that a SQL DBMS is not an RDBMS, then I can't begin to explain why we don't use SQL, why we went to the trouble of building our own DBMS, and why we can, legitimately, call it a RDBMS in spite of the word "prevalence" and the associated flame-fest.

Anyway, sorry, the rant wasn't meant for you. :-) Simply covering my corporate butt in case a customer or competitor ever reads this and attempts to misrepresent our position.

In the end, cl-prevalence is a real boon to our work. If you have a PayPal button for the project, I would happily click it!

Regards,

Mike

--
Michael J. Forster

From ndj at hivsa.com Wed Apr 18 09:33:51 2007
From: ndj at hivsa.com (Nico de Jager)
Date: Wed, 18 Apr 2007 11:33:51 +0200
Subject: [cl-prevalence-devel] Removal of managed prevalence object does not delete corresponding non-id indexes' entries and duplicate index values allowed
Message-ID: <200704181133.52037.ndj@hivsa.com>

Hi

When one deletes an object with tx-delete-object after creating an index on a slot (other than id) of a managed prevalence class, the corresponding index entry is not deleted. Because adding a new object with tx-create-object appears to update all the indexes on the class, one would expect tx-delete-object to also update the indexes accordingly. Is this a bug or the intended behaviour?

Also, shouldn't tx-create-object (or tx-change-object-slots) signal a condition when the addition of an object or the change of a slot results in duplicate slot values for an indexed slot?

Thanks.

Nico

From scaekenberghe at common-lisp.net Wed Apr 18 11:59:57 2007
From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe)
Date: Wed, 18 Apr 2007 13:59:57 +0200
Subject: [cl-prevalence-devel] Removal of managed prevalence object does not delete corresponding non-id indexes' entries and duplicate index values allowed
In-Reply-To: <200704181133.52037.ndj@hivsa.com>
References: <200704181133.52037.ndj@hivsa.com>
Message-ID:

Nico,

On 18 Apr 2007, at 11:33, Nico de Jager wrote:

> When one deletes an object with tx-delete-object after creating an
> index on a slot (other than id) of a managed prevalence class, the
> corresponding index entry is not deleted. Because adding a new
> object with tx-create-object appears to update all the indexes on
> the class, one would expect tx-delete-object to also update the
> indexes accordingly. Is this a bug or the intended behaviour?
>

The 'indexes on arbitrary slots' was contributed at one point. I do think the implementation was quite clear and simple. But you are right, this is a bug (or oversight): when deleting an object, the index entries should be cleaned up.

> Also, shouldn't tx-create-object (or tx-change-object-slots) signal
> a condition when the addition of an object or the change of a slot
> results in duplicate slot values for an indexed slot?
>

Apart from indexes on keys (like ID), which are necessarily unique, indexes on arbitrary slots could theoretically be unique or not (as in SQL). The current implementation is using hashtables and is overwriting entries: so there too, there is a bug (or oversight): either we should enforce unique indexes by signalling errors, or we should allow multiple objects with the same indexed slot value, but then there should be a list as value there (and we have to manage all that correctly, esp. wrt. deleting objects).

What do you think ? How are you using this feature ?

So, thanks for reporting this problem. Do you feel like trying to fix this with a patch ?
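
The two options above (signal an error for a unique index, or keep a list of objects per slot value and clean it up on deletion) can be sketched on a plain hash table. This is only an illustration under stated assumptions: index-add and index-remove are hypothetical names, not cl-prevalence functions, and the real fix would live inside the tx-create-object and tx-delete-object code paths.

```lisp
;; Hypothetical sketch; these functions are not part of cl-prevalence.
(defun index-add (index key object &key unique)
  "Record OBJECT under KEY in the hash table INDEX.
With UNIQUE true, signal an error if KEY already has an entry."
  (when (and unique (gethash key index))
    (error "Duplicate value ~S for a unique index" key))
  ;; Each entry holds a list, so several objects may share one key.
  (push object (gethash key index))
  object)

(defun index-remove (index key object)
  "Forget OBJECT under KEY and drop the entry once no objects remain."
  (let ((remaining (remove object (gethash key index))))
    (if remaining
        (setf (gethash key index) remaining)
        (remhash key index))
    object))
```

A lookup on a non-unique index then returns the whole list, while a unique index always holds a one-element list.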

Are there any other people on the list who would like to comment ?

Sven

From ndj at hivsa.com Wed Apr 18 13:19:30 2007
From: ndj at hivsa.com (Nico de Jager)
Date: Wed, 18 Apr 2007 15:19:30 +0200
Subject: [cl-prevalence-devel] Removal of managed prevalence object does not delete corresponding non-id indexes' entries and duplicate index values allowed
In-Reply-To: <344DC255-1047-4D27-A82F-751C74631730@beta9.be>
References: <200704181133.52037.ndj@hivsa.com> <344DC255-1047-4D27-A82F-751C74631730@beta9.be>
Message-ID: <200704181519.30642.ndj@hivsa.com>

On Wednesday 18 April 2007 13:45, Sven Van Caekenberghe wrote:
> Nico,
>
> On 18 Apr 2007, at 11:33, Nico de Jager wrote:
>
> > When one deletes an object with tx-delete-object after creating an
> > index on a slot (other than id) of a managed prevalence class, the
> > corresponding index entry is not deleted. Because adding a new
> > object with tx-create-object appears to update all the indexes on
> > the class, one would expect tx-delete-object to also update the
> > indexes accordingly. Is this a bug or the intended behaviour?
>
> The 'indexes on arbitrary slots' was contributed at one point. I do
> think the implementation was quite clear and simple.
> But you are right, this is a bug (or oversight): when deleting an
> object, the index entries should be cleaned up.
>
> > Also, shouldn't tx-create-object (or tx-change-object-slots) signal
> > a condition when the addition of an object or the change of a slot
> > results in duplicate slot values for an indexed slot?
>
> Apart from indexes on keys (like ID), which are necessarily unique,
> indexes on arbitrary slots could theoretically be unique or not (as
> in SQL).
> The current implementation is using hashtables and is
> overwriting entries: so there too, there is a bug (or oversight):
> either we should enforce unique indexes by signalling errors, or we
> should allow multiple objects with the same indexed slot value, but
> then there should be a list as value there (and we have to manage all
> that correctly, esp. wrt. deleting objects).
>
> What do you think ? How are you using this feature ?

It would be really nice if INDEX-ON had an optional UNIQUE parameter that allows one to choose between either a unique or non-unique index. I would use both types.

B.t.w. why is SLOTS an optional parameter in the definition of INDEX-ON? As it is the function would do nothing if SLOTS is not supplied:

(defun index-on (system class &optional slots (test 'equalp))
  "Create indexes on each of the slots provided."
  (dolist (slot slots)
    (execute-transaction
     (tx-create-objects-slot-index system class slot test))))

>
> So, thanks for reporting this problem. Do you feel like trying to fix
> this with a patch ?

I would give it a try as soon as I can find the time. I am not an expert Lisper yet, so it will probably take me longer. But it will definitely be worth it, I have replaced all my CLSQL databases with cl-prevalence.

>
> Are there any other people on the list who would like to comment ?
>
> Sven
>

Thanks for your quick feedback.

Nico

From scaekenberghe at common-lisp.net Thu Apr 19 15:03:45 2007
From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe)
Date: Thu, 19 Apr 2007 17:03:45 +0200
Subject: [cl-prevalence-devel] simple-array serialization patch
In-Reply-To:
References: <77DC7F26-D196-45DE-A390-824FF87E41CD@sharedlogic.ca> <0A9F2735-B147-4EFB-BCC8-853D57C5370B@common-lisp.net>
Message-ID:

On 06 Apr 2007, at 19:24, Michael J. Forster wrote:

>>> BTW, I would like to say that cl-prevalence is fantastic. We've
>>> been using
>>> it for five non-trivial (>25 classes, avg.
>>> 3000 instances per class) webapps
>>> without a hitch for almost a year now.
>>
>> That is very nice to hear: could you give some more details, like:
>>
>> - what CL implementation you are using ?
>
> We develop with LW 4.4 and 5.0 on Mac and Windows; we deploy to CMUCL
> 19b on FreeBSD and LW 5.0 on Mac.
>
>> - what serialization you are using ?
>
> We've tried both and would prefer to use the sexp format for its
> greater readability. However, we started with xml and haven't had an
> opportunity to change it.
>
>> - machine details ?
>
> Dell 2U
> Intel P4 3.2GHz
> 4GB RAM
> 160GB usable disk, RAID1
>
> Apple Xserve
> G5 Dual 2.3GHz
> 2GB RAM
> 140GB usable disk, RAID 5
>
>> - the typical sizes of your transaction and snapshot files ?
>> - total number of objects under prevalence, 75000 ?
>> - rate of change (transaction log growth per day or so) ?
>> - size of the image ?
>
> I will collect some stats over the next few weeks and post them.
>
>> - do you have any GC problems ?
>
> None that we've detected, though, without any outward signs of
> memory exhaustion, dying processes, or poor overall application
> performance, we haven't gone looking for trouble. I will start
> recording the GC stats as well.

This sounds like an awesome application! I am glad CL-PREVALENCE helped you in achieving your goals.

Regards,

Sven