From ch-rucksack at bobobeach.com Mon Jan 8 23:28:25 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Mon, 8 Jan 2007 15:28:25 -0800
Subject: [rucksack-devel] sbcl compile warning
Message-ID:

I haven't had a chance to look at this in detail yet, but I thought
I'd pass it along first:

; file: /Users/sly/src/lisp/rucksack/rucksack/heap.lisp
; in: DEFGENERIC INITIALIZE-BLOCK
;     (DEFGENERIC RUCKSACK::INITIALIZE-BLOCK
;         (BLOCK RUCKSACK::BLOCK-SIZE RUCKSACK:HEAP)
;       (:METHOD
;        (BLOCK RUCKSACK::BLOCK-SIZE
;         (RUCKSACK:HEAP RUCKSACK:FREE-LIST-HEAP))
;        (DECLARE (IGNORE BLOCK RUCKSACK::BLOCK-SIZE)) BLOCK))
; --> PROGN PUSH LET* LET* DEFMETHOD PROGN SB-PCL::LOAD-DEFMETHOD LIST*
; --> LET* SB-INT:NAMED-LAMBDA FUNCTION CATCH BLOCK SB-C::%WITHIN-CLEANUP
; --> SYMBOL-MACROLET SB-PCL::FAST-LEXICAL-METHOD-FUNCTIONS
; --> SB-PCL::BIND-FAST-LEXICAL-METHOD-FUNCTIONS LOCALLY SB-PCL::BIND-ARGS
; --> LET* LOCALLY SYMBOL-MACROLET BLOCK
; ==>
;   BLOCK
;
; caught STYLE-WARNING:
;   reading an ignored variable: BLOCK

; compiling (DEFMETHOD HEAP-INFO ...)
; file: /Users/sly/src/lisp/rucksack/rucksack/heap.lisp
; in: DEFMETHOD HEAP-INFO (FREE-LIST-HEAP)
;     (GETF RUCKSACK::PLIST :NR-FREE-OCTETS)
; --> BLOCK DO BLOCK LET TAGBODY RETURN-FROM PROGN
; ==>
;   SB-IMPL::DEFAULT
;
; caught WARNING:
;   This is not a NUMBER:
;    NIL
;   See also:
;     The SBCL Manual, Node "Handling of Types"

Cyrus

From ch-rucksack at bobobeach.com Wed Jan 10 07:43:30 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Tue, 9 Jan 2007 23:43:30 -0800
Subject: [rucksack-devel] indexing issue?
In-Reply-To:
References: <42002F55-6253-465C-BC6F-33CE9A84DA80@bobobeach.com> <15FC081E-4C24-4CE6-8C5D-D6C3AE30AFA6@bobobeach.com>
Message-ID: <48E9DB9B-ED5E-4B49-9193-7CF327DED1CC@bobobeach.com>

On Nov 30, 2006, at 10:43 AM, Arthur Lemmens wrote:

> Cyrus Harmon wrote:
>
>> What is the intended behavior of a unique slot?
>
> If a class has a unique slot, you (the programmer) promise that there
> won't be two instances of that class which have a 'similar' value
> for that slot. The definition of 'similar' depends on the kind of
> index you create (see the predefined index specs in index.lisp for
> some examples).
>
>> I'm able to make-instance new instances of a persistent class with
>> a duplicate slot value
>
> Yes, I think that Rucksack doesn't check for this at the moment.
> But it should have signalled an error, I think.
>
>> which seems fine
>
> I don't think so ;-)

[some months pass...] so, yes, it seems like this should signal an
error. and it looks like there's code in there to do that, but yet it
doesn't seem to signal an error. Any idea why not?

Thanks,

Cyrus

From ch-rucksack at bobobeach.com Wed Jan 10 21:13:25 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Wed, 10 Jan 2007 13:13:25 -0800
Subject: PATCH Re: [rucksack-devel] indexing issue?
In-Reply-To: <48E9DB9B-ED5E-4B49-9193-7CF327DED1CC@bobobeach.com>
References: <42002F55-6253-465C-BC6F-33CE9A84DA80@bobobeach.com> <15FC081E-4C24-4CE6-8C5D-D6C3AE30AFA6@bobobeach.com> <48E9DB9B-ED5E-4B49-9193-7CF327DED1CC@bobobeach.com>
Message-ID:

Ok, I think I've figured out what's wrong with unique slots:

--- mop.lisp 28 Nov 2006 15:25:57 -0800 1.11
+++ mop.lisp 10 Jan 2007 13:12:19 -0800
@@ -250,6 +250,17 @@
           (setf (slot-value effective-slotdef 'index)
                 (slot-index (car index-slotdefs))))))

+    ;; If exactly one direct slot is unique, then the effective one is
+    ;; too. If more then one is unique, signal an error.
+    (let ((unique-slotdefs (remove-if-not #'slot-unique persistent-slotdefs)))
+      (cond ((cdr unique-slotdefs)
+             (error "Multiple uniques for slot ~S in ~S:~% ~{~S~^, ~}."
+                    slot-name class
+                    (mapcar #'slot-unique unique-slotdefs)))
+            (unique-slotdefs
+             (setf (slot-value effective-slotdef 'unique)
+                   (slot-unique (car unique-slotdefs))))))
+
     ;; Return the effective slot definition.
     effective-slotdef))

I think c-e-s-d needs the previous patch in order to set the
effective-slot-definition's unique slot properly. I'm not sure it's
appropriate to signal an error, but it seems better than allowing
both :unique t and unique :no-error to be specified.

Cyrus

On Jan 9, 2007, at 11:43 PM, Cyrus Harmon wrote:

>
> On Nov 30, 2006, at 10:43 AM, Arthur Lemmens wrote:
>
>> Cyrus Harmon wrote:
>>
>>> What is the intended behavior of a unique slot?
>>
>> If a class has a unique slot, you (the programmer) promise that there
>> won't be two instances of that class which have a 'similar' value
>> for that slot. The definition of 'similar' depends on the kind of
>> index you create (see the predefined index specs in index.lisp for
>> some examples).
>>
>>> I'm able to make-instance new instances of a persistent class with
>>> a duplicate slot value
>>
>> Yes, I think that Rucksack doesn't check for this at the moment.
>> But it should have signalled an error, I think.
>>
>>> which seems fine
>>
>> I don't think so ;-)
>
> [some months pass...] so, yes, it seems like this should signal an
> error. and it looks like there's code in there to do that, but yet
> it doesn't seem to signal an error. Any idea why not?
>
> Thanks,
>
> Cyrus
>
> _______________________________________________
> rucksack-devel mailing list
> rucksack-devel at common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/rucksack-devel

From ch-rucksack at bobobeach.com Wed Jan 10 21:40:02 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Wed, 10 Jan 2007 13:40:02 -0800
Subject: PATCH Re: [rucksack-devel] indexing issue?
In-Reply-To:
References: <42002F55-6253-465C-BC6F-33CE9A84DA80@bobobeach.com> <15FC081E-4C24-4CE6-8C5D-D6C3AE30AFA6@bobobeach.com> <48E9DB9B-ED5E-4B49-9193-7CF327DED1CC@bobobeach.com>
Message-ID: <75945995-3018-494A-BE4F-4162C0B40E41@bobobeach.com>

Well, this is part of the story anyway.
It still doesn't give the behavior I want on trying to make an
instance with a duplicate value, but I think this is a step in the
right direction.

Also, instead of calling these slot-unique, slot-index, etc...
shouldn't the convention be slot-definition-unique, etc...?

Cyrus

From ch-rucksack at bobobeach.com Wed Jan 10 21:44:17 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Wed, 10 Jan 2007 13:44:17 -0800
Subject: PATCH Re: [rucksack-devel] indexing issue?
In-Reply-To: <75945995-3018-494A-BE4F-4162C0B40E41@bobobeach.com>
References: <42002F55-6253-465C-BC6F-33CE9A84DA80@bobobeach.com> <15FC081E-4C24-4CE6-8C5D-D6C3AE30AFA6@bobobeach.com> <48E9DB9B-ED5E-4B49-9193-7CF327DED1CC@bobobeach.com> <75945995-3018-494A-BE4F-4162C0B40E41@bobobeach.com>
Message-ID: <3DDC8C36-D079-4F76-ACD5-B06315896EE2@bobobeach.com>

Furthermore, while the :no-error option is mentioned in mop.lisp, I
can't find any reference to this in the code, even though this seems
to be the current behavior.

From ch-rucksack at bobobeach.com Thu Jan 11 04:01:49 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Wed, 10 Jan 2007 20:01:49 -0800
Subject: [rucksack-devel] many small xacts vs. one large xact
Message-ID: <688524FC-5371-4BC6-B9A4-3914DD4523F4@bobobeach.com>

Ok, so I'm trying to create about .5M persistent objects. If I use a
single transaction, performance starts to degrade after 50k objects or
so. If I use a transaction per object, things are ok in the beginning,
but after 1400 objects or so, I get the following:

There is no applicable method for the generic function # when called with arguments (#>).
   [Condition of type SIMPLE-ERROR]

Restarts:
  0: [ABORT] Abort #
  1: [RETRY] Retry #
  2: [ABORT-REQUEST] Abort handling SLIME request.
  3: [ABORT] Exit debugger, returning to top level.

Backtrace:
  0: ((SB-PCL::FAST-METHOD NO-APPLICABLE-METHOD (T)) # # # #>)
  1: ((SB-PCL::FAST-METHOD NO-APPLICABLE-METHOD (T)) # # #)
  2: (RUCKSACK::FIND-BINDING-IN-NODE 1413 # #>)
  3: (RUCKSACK::LEAF-INSERT #> # (# NIL) 1413 2646 :OVERWRITE)
  4: ((SB-PCL::FAST-METHOD RUCKSACK:BTREE-INSERT (RUCKSACK:BTREE #1="#<...>" . #1#)) (#(NIL) .
#()) # #> 1413 2646 :IF-EXISTS :OVERWRITE)
  5: ((SB-PCL::FAST-METHOD INITIALIZE-INSTANCE :AROUND (#1="#<...>" . #1#)) (#(NIL NIL) . #()) #S(SB-PCL::FAST-METHOD-CALL :FUNCTION # :PV-CELL NIL :NEXT-METHOD-CALL NIL :ARG-INFO (1 . T)) #> :TAX-ID 1413 :PARENT-ID 1386 :RANK "species" :EMBL-CODE "BC" :DIVISION-ID 0 :DIVISION-INHERITED T :GENETIC-CODE-ID 11 :GENETIC-CODE-INHERITED T :MITOCHONDRIAL-GENETIC-CODE-ID 0 :MITOCHONDRIAL-GENETIC-CODE-INHERITED T :GENBANK-HIDDEN T :HIDDEN-SUBTREE NIL :COMMENTS "" :RUCKSACK #)
  6: ((SB-PCL::FAST-METHOD MAKE-INSTANCE (CLASS)) # # # #)
  7: (PARSE-TAX-NODES :FILE NIL)
  8: (SB-INT:SIMPLE-EVAL-IN-LEXENV (PARSE-TAX-NODES) #)
  9: (SWANK::EVAL-REGION "(parse-tax-nodes) " T)
 10: ((LAMBDA ()))
 11: ((LAMBDA (SWANK-BACKEND::FN)) #)
 12: (SWANK::CALL-WITH-BUFFER-SYNTAX #)
 13: (SWANK:LISTENER-EVAL "(parse-tax-nodes) ")
 14: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK:LISTENER-EVAL "(parse-tax-nodes) ") #)

has anyone done any stress testing to see how rucksack works with
lots of objects and/or have any advice on storing large numbers of
objects?

Thanks,

Cyrus

From ch-rucksack at bobobeach.com Thu Jan 11 18:27:19 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Thu, 11 Jan 2007 10:27:19 -0800
Subject: [rucksack-devel] rucksack performance
Message-ID: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com>

Following up on my post from yesterday, even when I can coerce the
"lots of transactions" into working, at some point performance breaks
down rather severely.

Initially, creating a persistent object looks something like this:

("1" "1" "no rank" "" "8" "0" "1" "0" "0" "0" "0" "0" " |")
Evaluation took:
  0.04 seconds of real time
  0.029151 seconds of user run time
  0.010584 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  417,512 bytes consed.

after creating a couple objects, they start to look like this:

("1931" "1883" "species" "SS" "0" "1" "11" "1" "0" "1" "1" "0" " |")
Evaluation took:
  0.466 seconds of real time
  0.313506 seconds of user run time
  0.138609 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  3,126,640 bytes consed.

yes, along the way there is some fluctuation, as, I imagine, the
indices and caches grow, etc... but we reach a threshold where it
takes roughly .5 sec and 3M of consing for every object. Needless to
say, this just kills performance and makes rucksack rather unusable
for large numbers of persistent objects.

I'm sure there's probably a better way to do this, but I'm not sure
what it is. And it would be nice if this approach worked as well.

Thanks,

Cyrus

From alemmens at xs4all.nl Thu Jan 11 20:00:55 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Thu, 11 Jan 2007 21:00:55 +0100
Subject: [rucksack-devel] rucksack performance
In-Reply-To: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com>
References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com>
Message-ID:

Cyrus Harmon wrote:

> Following up on my post from yesterday, even when I can coerce the
> "lots of transactions" into working, at some point performance breaks
> down rather severely.

I haven't looked at this in detail, but my first guess would be that
the garbage collector settings and/or performance is critical here.

> yes, along the way there is some fluctuation, as, I imagine, the
> indices and caches grow, etc... but we reach a threshold where it
> takes roughly .5 sec and 3M of consing for every object.

3M of consing per object is ridiculously much of course. I'm pretty
sure that it should be possible to reduce this a lot by tracing some
of the garbage collector routines and looking at how much work they
do.

One thing you could consider is to turn the garbage collector off
during the phase where you're creating very many objects (initializing
your database maybe?).
In fact, you could just turn it off, period.
As long as your disk is big enough, of course...

Let me know if turning the GC off doesn't help.

> And it would be nice if this approach worked as well.

Yes. Having a separate transaction for each created object is not the
most efficient way and should not be necessary, but obviously it should
work and it shouldn't be ridiculously slow.

Arthur

From alemmens at xs4all.nl Thu Jan 11 20:49:27 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Thu, 11 Jan 2007 21:49:27 +0100
Subject: PATCH Re: [rucksack-devel] indexing issue?
In-Reply-To:
References: <42002F55-6253-465C-BC6F-33CE9A84DA80@bobobeach.com> <15FC081E-4C24-4CE6-8C5D-D6C3AE30AFA6@bobobeach.com> <48E9DB9B-ED5E-4B49-9193-7CF327DED1CC@bobobeach.com>
Message-ID:

Cyrus Harmon,

> Ok, I think I've figured out what's wrong with unique slots:
>
> --- mop.lisp 28 Nov 2006 15:25:57 -0800 1.11
> +++ mop.lisp 10 Jan 2007 13:12:19 -0800
> @@ -250,6 +250,17 @@
>            (setf (slot-value effective-slotdef 'index)
>                  (slot-index (car index-slotdefs))))))
>
> +    ;; If exactly one direct slot is unique, then the effective one is
> +    ;; too. If more then one is unique, signal an error.

Right, I forgot to copy the UNIQUE slot to the effective slot
definition, so it never treated slots as unique.

Thanks for the patch,

Arthur

From ch-rucksack at bobobeach.com Thu Jan 11 22:15:11 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Thu, 11 Jan 2007 14:15:11 -0800
Subject: [rucksack-devel] rucksack performance
In-Reply-To:
References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com>
Message-ID:

On Jan 11, 2007, at 12:00 PM, Arthur Lemmens wrote:

> Cyrus Harmon wrote:
>
>> Following up on my post from yesterday, even when I can coerce the
>> "lots of transactions" into working, at some point performance breaks
>> down rather severely.
>
> I haven't looked at this in detail, but my first guess would be that
> the garbage collector settings and/or performance is critical here.

Hmm...
back to the

The value
  #>
is not of type
  (OR NULL RUCKSACK:PERSISTENT-CONS).
   [Condition of type TYPE-ERROR]

error, which is interesting as it looks like we're trying to do a
p-car of an array, and it's getting the array be accessing the last
element in the other (legitimate) array, but I'm getting distracted...

>> yes, along the way there is some fluctuation, as, I imagine, the
>> indices and caches grow, etc... but we reach a threshold where it
>> takes roughly .5 sec and 3M of consing for every object.
>
> 3M of consing per object is ridiculously much of course. I'm pretty
> sure that it should be possible to reduce this a lot by tracing some
> of the garbage collector routines and looking at how much work they
> do.
>
> One thing you could consider is to turn the garbage collector off
> during the phase where you're creating very many objects (initializing
> your database maybe?). In fact, you could just turn it off, period.
> As long as your disk is big enough, of course...
>
> Let me know if turning the GC off doesn't help.

How do I do this? I commented out the collect-some-garbage in
transaction, but that didn't seem to fix the problem.

>> And it would be nice if this approach worked as well.
>
> Yes. Having a separate transaction for each created object is not the
> most efficient way and should not be necessary, but obviously it
> should
> work and it shouldn't be ridiculously slow.

Agreed. Of course I only went down this route as performance became
unacceptable with a really big transaction too, so perhaps the
transaction/gc thing is a red herring.

Cyrus

From ch-rucksack at bobobeach.com Thu Jan 11 23:13:08 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Thu, 11 Jan 2007 15:13:08 -0800
Subject: [rucksack-devel] rucksack performance
In-Reply-To:
References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com>
Message-ID: <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com>

you probably already knew this, but it seems as though most of the
allocation is happening down in btree-insert.

Cyrus

On Jan 11, 2007, at 2:15 PM, Cyrus Harmon wrote:

>
> On Jan 11, 2007, at 12:00 PM, Arthur Lemmens wrote:
>
>> Cyrus Harmon wrote:
>>
>>> Following up on my post from yesterday, even when I can coerce the
>>> "lots of transactions" into working, at some point performance
>>> breaks
>>> down rather severely.
>>
>> I haven't looked at this in detail, but my first guess would be that
>> the garbage collector settings and/or performance is critical here.
>
> Hmm... back to the
>
> The value
> # 10000, heap #P"/Users/sly/projects/cyrusharmon.org/cl-bio/rucksack/
> heap" and 7481 objects in memory.>>
> is not of type
> (OR NULL RUCKSACK:PERSISTENT-CONS).
> [Condition of type TYPE-ERROR]
>
> error, which is interesting as it looks like we're trying to do a p-
> car of an array, and it's getting the array be accessing the last
> element in the other (legitimate) array, but I'm getting distracted...
>
>>> yes, along the way there is some fluctuation, as, I imagine, the
>>> indices and caches grow, etc... but we reach a threshold where it
>>> takes roughly .5 sec and 3M of consing for every object.
>>
>> 3M of consing per object is ridiculously much of course. I'm pretty
>> sure that it should be possible to reduce this a lot by tracing some
>> of the garbage collector routines and looking at how much work
>> they do.
>>
>> One thing you could consider is to turn the garbage collector off
>> during the phase where you're creating very many objects
>> (initializing
>> your database maybe?). In fact, you could just turn it off, period.
>> As long as your disk is big enough, of course...
>>
>> Let me know if turning the GC off doesn't help.
>
> How do I do this? I commented out the collect-some-garbage in
> transaction, but that didn't seem to fix the problem.
>
>
>>> And it would be nice if this approach worked as well.
>>
>> Yes. Having a separate transaction for each created object is not
>> the
>> most efficient way and should not be necessary, but obviously it
>> should
>> work and it shouldn't be ridiculously slow.
>
> Agreed. Of course I only went down this route as performance became
> unacceptable with a really big transaction too, so perhaps the
> transaction/gc thing is a red herring.
>
> Cyrus

From ch-rucksack at bobobeach.com Thu Jan 11 23:40:18 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Thu, 11 Jan 2007 15:40:18 -0800
Subject: [rucksack-devel] rucksack performance
In-Reply-To: <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com>
References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com>
Message-ID:

so it looks like the p-find call down inside leaf-insert is slow and
conses a lot. what's the point of doing a sequential scan every time
we add something to the index? sounds like something is screwy here.

Cyrus

From ch-rucksack at bobobeach.com Fri Jan 12 00:03:39 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Thu, 11 Jan 2007 16:03:39 -0800
Subject: [rucksack-devel] rucksack performance
In-Reply-To: <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com>
References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com>
Message-ID: <7A8E16F3-9FD1-488A-8B98-9B3A35EF4E7C@bobobeach.com>

Ok, well there are probably still some issues here, but removing the
string-index on the slot that had about 12 distinct values (over .5M
objects) really seems to help performance! careful with that axe,
Eugene!

crossing my fingers,

Cyrus

From ch-rucksack at bobobeach.com Fri Jan 12 02:50:02 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Thu, 11 Jan 2007 18:50:02 -0800
Subject: [rucksack-devel] continuing the rucksack saga...
In-Reply-To: <7A8E16F3-9FD1-488A-8B98-9B3A35EF4E7C@bobobeach.com>
References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <7A8E16F3-9FD1-488A-8B98-9B3A35EF4E7C@bobobeach.com>
Message-ID: <52986FF8-CEB0-4A52-8211-F2FA1346BE12@bobobeach.com>

I'm not sure whether to lay the blame at rucksacks' feet or at SBCLs,
but now that i've resolved my performance problems to the point where
I can make-instance over 150k objects, I run out of heap space:

Heap exhausted during garbage collection: 0 bytes available, 8 requested.
 Gen  StaPg  UbSta LaSta LUbSt  Boxed Unboxed   LB  LUB !move     Alloc  Waste      Trig   WP GCs Mem-age
  0:      0      0     0     0      0       0    0    0     0         0      0   2000000    0   0  0.0000
  1:      0      0     0     0      0       0    0    0     0         0      0   2000000    0   0  0.0000
  2:      0      0     0     0      0       0    0    0     0         0      0   2000000    0   0  0.0000
  3:      0      0     0     0      0       0    0    0     0         0      0   2000000    0   0  0.0000
  4: 251638 251641     0     0 192577    3687 1134 1603     0 814849768 258328 488442912    0   1  1.4186
  5: 393214 388673  7009     0 182649     441 1372 4015   266 771849320 152472   2000000    0   0  0.0000
  6:      0      0     0     0   5734       0    0    0     0  23486464      0   2000000 5340   0  0.0000
  Total bytes allocated=1610185552
fatal error encountered in SBCL pid 289:
Heap exhausted, game over.
LDB monitor
ldb> backtrace
Backtrace:
0: Foreign function ldb_monitor, fp = 0x2205698, ra = 0x70d7
1: Foreign function lose, fp = 0x22056c8, ra = 0x586e
2: Foreign function gc_heap_exhausted_error_or_lose, fp = 0x2205708, ra = 0xf74e
3: Foreign function gc_find_freeish_pages, fp = 0x2205778, ra = 0xf815
4: Foreign function gc_alloc_large, fp = 0x22057e8, ra = 0xfebe
5: Foreign function gc_alloc_with_region, fp = 0x2205828, ra = 0x101ef
6: Foreign function gc_general_alloc, fp = 0x2205848, ra = 0x102a9
7: Foreign function scan_weak_pointers, fp = 0x22058c8, ra = 0x408e
8: Foreign function scavenge, fp = 0x2205938, ra = 0x3bc5
9: Foreign function zero_pages_with_mmap, fp = 0x2205998, ra = 0xec9a
10: Foreign function collect_garbage, fp = 0x2205ad8, ra = 0x11ebc
11: (SB-C::TL-XEP SB-KERNEL::COLLECT-GARBAGE)
12: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-56)
13: (SB-C::TL-XEP SB-KERNEL::SUB-GC)
14: Foreign function call_into_lisp, fp = 0x2205c88, ra = 0x13ba1
15: Foreign function funcall0, fp = 0x2205ca8, ra = 0xd83f
16: Foreign function interrupt_maybe_gc_int, fp = 0x2205cc8, ra = 0x63f0
17: Foreign function interrupt_handle_pending, fp = 0x2205cf8, ra = 0x64bc
18: Foreign function _sigtramp, fp = 0x2205d18, ra = 0x9011110c
19: Foreign fp = 0x2206028, ra = 0xffffffff
20: (SB-C::TL-XEP SB-KERNEL::MAKE-RESTART)
Heap exhausted during garbage collection: 0 bytes available, 16 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age 0: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000 1: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000 2: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000 3: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000 4: 251638 251641 0 0 192577 3687 1134 1603 0 814849768 258328 488442912 0 1 1.4186 5: 393214 388673 7009 0 182649 441 1372 4015 266 771849320 152472 2000000 0 0 0.0000 6: 0 0 0 0 5734 0 0 0 0 23486464 0 2000000 5340 0 0.0000 Total bytes allocated=1610185552 fatal error encountered in SBCL pid 289: and this happens when trying to allocate all .5M objects in a single transaction, also, perhaps, a bad idea. But, still, this seems like a rather catastrophic failure. I think I got basically the same error when trying 1 obj/transaction, and figured that using only one transaction for the whole lot would make things better, but that doesn't seem to be the case. Cyrus On Jan 11, 2007, at 4:03 PM, Cyrus Harmon wrote: > Ok, well there are probably still some issues here, but removing > the string-index on the slot that had about 12 distinct values > (over .5M objects) really seems to help performance! careful with > that axe, Eugene! > > crossing my fingers, > > Cyrus > > On Jan 11, 2007, at 3:13 PM, Cyrus Harmon wrote: > >> you probably already knew this, but it seems as though most of the >> allocation is happening down in btree-insert. >> >> Cyrus >> >> On Jan 11, 2007, at 2:15 PM, Cyrus Harmon wrote: >> >>> >>> On Jan 11, 2007, at 12:00 PM, Arthur Lemmens wrote: >>> >>>> Cyrus Harmon wrote: >>>> >>>>> Following up on my post from yesterday, even when I can coerce the >>>>> "lots of transactions" into working, at some point performance >>>>> breaks >>>>> down rather severely. >>>> >>>> I haven't looked at this in detail, but my first guess would be >>>> that >>>> the garbage collector settings and/or performance is critical here. >>> >>> Hmm... 
back to the >>> >>> The value >>> #>> 10000, heap #P"/Users/sly/projects/cyrusharmon.org/cl-bio/ >>> rucksack/heap" and 7481 objects in memory.>> >>> is not of type >>> (OR NULL RUCKSACK:PERSISTENT-CONS). >>> [Condition of type TYPE-ERROR] >>> >>> error, which is interesting as it looks like we're trying to do a >>> p-car of an array, and it's getting the array be accessing the >>> last element in the other (legitimate) array, but I'm getting >>> distracted... >>> >>>>> yes, along the way there is some fluctuation, as, I imagine, the >>>>> indices and caches grow, etc... but we reach a threshold where it >>>>> takes roughly .5 sec and 3M of consing for every object. >>>> >>>> 3M of consing per object is ridiculously much of course. I'm >>>> pretty >>>> sure that it should be possible to reduce this a lot by tracing >>>> some >>>> of the garbage collector routines and looking at how much work >>>> they do. >>>> >>>> One thing you could consider is to turn the garbage collector off >>>> during the phase where you're creating very many objects >>>> (initializing >>>> your database maybe?). In fact, you could just turn it off, >>>> period. >>>> As long as your disk is big enough, of course... >>>> >>>> Let me know if turning the GC off doesn't help. >>> >>> How do I do this? I commented out the collect-some-garbage in >>> transaction, but that didn't seem to fix the problem. >>> >>> >>>>> And it would be nice if this approach worked as well. >>>> >>>> Yes. Having a separate transaction for each created object is >>>> not the >>>> most efficient way and should not be necessary, but obviously it >>>> should >>>> work and it shouldn't be ridiculously slow. >>> >>> Agreed. Of course I only went down this route as performance >>> became unacceptable with a really big transaction too, so perhaps >>> the transaction/gc thing is a red herring. 
>>> >>> Cyrus From alemmens at xs4all.nl Fri Jan 12 06:41:21 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 07:41:21 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> Message-ID: Cyrus Harmon wrote: > so it looks like the p-find call down inside leaf-insert is slow and > conses a lot. Yes, that sounds plausible. > what's the point of doing a sequential scan every time we add something > to the index? That's a sequential scan through the elements of a btree node, not through the entire index, right? Yes, that's on the top of my list of things to speed up once I have everything working. It could easily be replaced by a binary search through the elements of the node, for example. And if that doesn't help enough, my idea was to create some specialized btree data structures for some special cases that occur often in practice (like string indexes).
Arthur From ch-rucksack at bobobeach.com Fri Jan 12 06:46:16 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Thu, 11 Jan 2007 22:46:16 -0800 Subject: [rucksack-devel] rucksack performance In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> Message-ID: <72CD18DD-4D2F-4880-BA5F-68BE46826FE6@bobobeach.com> On Jan 11, 2007, at 10:41 PM, Arthur Lemmens wrote: > Cyrus Harmon wrote: > >> so it looks like the p-find call down inside leaf-insert is slow and >> conses a lot. > > Yes, that sounds plausible. Yes, and see my later email about my bone-headed use of an index here :) but, yeah, it's still one of the areas of concern. >> what's the point of doing a sequential scan every time we add >> something >> to the index? > > That's a sequential scan through the elements of a btree node, not > through > the entire index, right? Yes, that's on the top of my list of > things to > speed up once I have everything working. It could easily be > replaced by > a binary search through the elements of the node, for example. Hmm... you could be right, but, yeah, the binary search should help here. > And if that doesn't help enough, my idea was to create some > specialized btree > data structures for some special cases that occur often in practice > (like > string indexes). yes, these would probably be good, but binary search would be a good place to start. 
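For illustration, the binary search through the elements of a node that is discussed above could look roughly like the following. This is only a sketch on a plain sorted vector: the names binary-search-position, node-keys and key< are made up for this example and are not part of Rucksack's btree code, which works on persistent data structures.

```lisp
;; Hypothetical sketch: NODE-KEYS stands in for the sorted keys of one
;; btree node. It is a plain vector here, not a Rucksack persistent array.
(defun binary-search-position (key node-keys &key (key< #'<))
  "Return the index of the first element of the sorted vector NODE-KEYS
that is not less than KEY according to KEY<. The result equals
(length NODE-KEYS) when every element is less than KEY."
  (let ((low 0)
        (high (length node-keys)))
    ;; Invariant: every element below LOW is less than KEY, and every
    ;; element at or above HIGH is not less than KEY.
    (loop while (< low high)
          do (let ((mid (floor (+ low high) 2)))
               (if (funcall key< (aref node-keys mid) key)
                   (setf low (1+ mid))
                   (setf high mid))))
    low))
```

With a node of n elements this needs about log2(n) comparisons per lookup instead of up to n for a sequential scan. For a string index one would pass :key< #'string< in this sketch.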
Cyrus From alemmens at xs4all.nl Fri Jan 12 06:47:12 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 07:47:12 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: <7A8E16F3-9FD1-488A-8B98-9B3A35EF4E7C@bobobeach.com> References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <7A8E16F3-9FD1-488A-8B98-9B3A35EF4E7C@bobobeach.com> Message-ID: Cyrus Harmon wrote: > Ok, well there are probably still some issues here, but removing the > string-index on the slot that had about 12 distinct values (over .5M > objects) really seems to help performance! Ah, interesting. The 12 distinct values will map to 12 distinct entries in a btree node, and each entry will be a plain persistent list with on average 500,000/12 elements. Rucksack will probably need to walk across one such list each time you create a new object with that slot. Yes, that could explain a lot ;-) Thanks for the report. Very interesting to hear about some real-life experience with Rucksack. Arthur From alemmens at xs4all.nl Fri Jan 12 07:13:13 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 08:13:13 +0100 Subject: [rucksack-devel] continuing the rucksack saga... In-Reply-To: <52986FF8-CEB0-4A52-8211-F2FA1346BE12@bobobeach.com> References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <7A8E16F3-9FD1-488A-8B98-9B3A35EF4E7C@bobobeach.com> <52986FF8-CEB0-4A52-8211-F2FA1346BE12@bobobeach.com> Message-ID: Cyrus Harmon wrote: > I'm not sure whether to lay the blame at rucksacks' feet or at SBCLs, > but now that i've resolved my performance problems to the point where > I can make-instance over 150k objects, I run out of heap space: OK, it sounds like you're now reaching the interesting part, where all your data doesn't fit in memory anymore and you really need Rucksack's cache. 
> and this happens when trying to allocate all .5M objects in a single > transaction, also, perhaps, a bad idea. Yes, that's a bad idea because in the current design Rucksack has to keep all those objects in memory until the transaction has committed. > I think I got basically the same error when trying 1 obj/transaction If you do, I'd say that's a bug (or at least a misunderstanding) somewhere. But it could be anywhere: in Rucksack's cache or transactions, in SBCL's garbage collector, or in your own objects. If it's possible to somehow reach all your other objects from something that's 'active' in memory, then SBCL's garbage collector can't collect any garbage. Here's what I would do: first I'd have a serious look at your objects (and data structures), then I'd switch to something like 1000 objects per transaction, and if that doesn't solve anything I'd want to see if Rucksack's cache or transactions don't keep in-memory object references longer than it should. Arthur From alemmens at xs4all.nl Fri Jan 12 07:23:23 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 08:23:23 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> Message-ID: [Replying to the list.] > Oh, while I'm at it, one thing i was thinking about is the nature of > indices. One of the nice things in RDBMSes is that you can, say, > create a whole bunch of rows and then index them after the fact, > rather than having the index in place from the get go (and, more > importantly, while doing the insertion). Have you thought about the > implications of such a strategy for rucksack? 
would it make sense to > have unindexed slots and then redefine the class to have indices > after the instances have been created? This should already work now. Have a look at the function REPLACE-SLOT-INDEX, for example. Just add an :INDEX foo option to a slot, and existing instances will be re-indexed automatically. At least that's the idea. If it doesn't work, I'd like to hear about it. Arthur From ch-rucksack at bobobeach.com Fri Jan 12 07:28:10 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Thu, 11 Jan 2007 23:28:10 -0800 Subject: [rucksack-devel] continuing the rucksack saga... In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <7A8E16F3-9FD1-488A-8B98-9B3A35EF4E7C@bobobeach.com> <52986FF8-CEB0-4A52-8211-F2FA1346BE12@bobobeach.com> Message-ID: On Jan 11, 2007, at 11:13 PM, Arthur Lemmens wrote: > Cyrus Harmon wrote: > >> I'm not sure whether to lay the blame at rucksacks' feet or at SBCLs, >> but now that i've resolved my performance problems to the point where >> I can make-instance over 150k objects, I run out of heap space: > > OK, it sounds like you're now reaching the interesting part, where all > your data doesn't fit in memory anymore and you really need Rucksack's > cache. Well, I'm not convinced I'm really at that point as these objects are fairly small, but the combination of the memory for the objects and whatever other junks gets consed along the way doesn't seem to fit in memory. >> and this happens when trying to allocate all .5M objects in a single >> transaction, also, perhaps, a bad idea. > > Yes, that's a bad idea because in the current design Rucksack has > to keep > all those objects in memory until the transaction has committed. yes, i figured as much. again, this was a workaround for the 1 obj/ transaction problem. 
>> I think I got basically the same error when trying 1 obj/transaction > > If you do, I'd say that's a bug (or at least a misunderstanding) > somewhere. > But it could be anywhere: in Rucksack's cache or transactions, in > SBCL's > garbage collector, or in your own objects. If it's possible to > somehow > reach all your other objects from something that's 'active' in memory, > then SBCL's garbage collector can't collect any garbage. > > Here's what I would do: first I'd have a serious look at your objects > (and data structures), then I'd switch to something like 1000 objects > per transaction, and if that doesn't solve anything I'd want to see if > Rucksack's cache or transactions don't keep in-memory object > references > longer than it should. what's the best way to do this? can I just drop a (rucksack:transaction-commit rucksack:*transaction*) into my loop and have the right thing happen or do I need to break up the loop to explicitly do a (with-transaction ... for batches of 1000 or so? Thanks, Cyrus From alemmens at xs4all.nl Fri Jan 12 07:28:05 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 08:28:05 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> Message-ID: [replying to list] Cyrus Harmon wrote: > ok, well, this heap exhaustion thing seems to be the biggest problem > ATM. the fact that it happens both in the "one giant transaction" and > "one transaction per object instantiation" scenarios is troubling. > For some reason, we're holding on to references to objects in memory > that aren't getting gc'ed. 
I don't know if SBCL or rucksack is to > blame here, to be honest, and you could probably make a decent > argument either way, but I have a feeling that it's going to be > easier to fix rucksack's memory usage patterns than it will be to fix > SBCL gc strategy to deal with the offending scenario (lots of small > objects that take up over half of the available heap space). Yes, I agree. That's why Rucksack has a cache instead of just keeping everything in memory. You could try playing with some cache settings to see if that helps. Arthur From ch-rucksack at bobobeach.com Fri Jan 12 07:55:35 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Thu, 11 Jan 2007 23:55:35 -0800 Subject: [rucksack-devel] rucksack performance In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> Message-ID: On Jan 11, 2007, at 11:23 PM, Arthur Lemmens wrote: > [Replying to the list.] > >> Oh, while I'm at it, one thing i was thinking about is the nature of >> indices. One of the nice things in RDBMSes is that you can, say, >> create a whole bunch of rows and then index them after the fact, >> rather than having the index in place from the get go (and, more >> importantly, while doing the insertion). Have you thought about the >> implications of such a strategy for rucksack? would it make sense to >> have unindexed slots and then redefine the class to have indices >> after the instances have been created? > > This should already work now. Have a look at the function REPLACE- > SLOT-INDEX, > for example. Just add an :INDEX foo option to a slot, and existing > instances will be re-indexed automatically. Ok, now we're getting somewhere. removing the indices seems to help greatly. We'll see how it goes adding them back once all is said and done, but this looks promising. > At least that's the idea. 
If it doesn't work, I'd like to hear about > it. I'll certainly post the results when this is done. Since the index code seems to be the bottleneck here, it would be good to spend some time optimizing the index code as well. Also, some API support for disabling indexing might be nice. Redefining the defclass forms seems to be a bit "extreme". Cyrus From ch-rucksack at bobobeach.com Fri Jan 12 08:17:33 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 00:17:33 -0800 Subject: [rucksack-devel] rucksack performance In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> Message-ID: Ok, so if I remove the (:index t) line from the defclass form, it's fast, and works, but I can't seem to get the indices to appear by redefining the class. if, OTOH, I delete the :index forms from the slot definitions, but leave the (:index t) in the class definition, things start of ok, but then I get this familiar error message (oh, and this is when batching 500 make-instances at a time into a transaction; or at least I'm doing a transaction-commit every 500 items, hoping that that's doing what I think it is): There is no applicable method for the generic function # when called with arguments (# 1: [RETRY] Retry # 2: [ABORT-REQUEST] Abort handling SLIME request. 3: [ABORT] Exit debugger, returning to top level. Backtrace: 0: ((SB-PCL::FAST-METHOD NO-APPLICABLE-METHOD (T)) # # # # # # # # # #<$ 7: (PARSE-TAX-NODES :FILE NIL) 8: (SB-INT:SIMPLE-EVAL-IN-LEXENV (PARSE-TAX-NODES) #) 9: (SWANK::EVAL-REGION "(parse-tax-nodes) " T) which is reminiscent of the error I was seeing before when we were getting an array back as the last element in another p-array, where it really expected a p-cons. Not sure where the underlying bug is. 
Thanks, Cyrus On Jan 11, 2007, at 11:55 PM, Cyrus Harmon wrote: > > On Jan 11, 2007, at 11:23 PM, Arthur Lemmens wrote: > >> [Replying to the list.] >> >>> Oh, while I'm at it, one thing i was thinking about is the nature of >>> indices. One of the nice things in RDBMSes is that you can, say, >>> create a whole bunch of rows and then index them after the fact, >>> rather than having the index in place from the get go (and, more >>> importantly, while doing the insertion). Have you thought about the >>> implications of such a strategy for rucksack? would it make sense to >>> have unindexed slots and then redefine the class to have indices >>> after the instances have been created? >> >> This should already work now. Have a look at the function REPLACE- >> SLOT-INDEX, >> for example. Just add an :INDEX foo option to a slot, and existing >> instances will be re-indexed automatically. > > Ok, now we're getting somewhere. removing the indices seems to help > greatly. We'll see how it goes adding them back once all is said > and done, but this looks promising. > >> At least that's the idea. If it doesn't work, I'd like to hear about >> it. > > I'll certainly post the results when this is done. > > Since the index code seems to be the bottleneck here, it would be > good to spend some time optimizing the index code as well. Also, > some API support for disabling indexing might be nice. Redefining > the defclass forms seems to be a bit "extreme". 
> > Cyrus From edi at agharta.de Fri Jan 12 08:17:49 2007 From: edi at agharta.de (Edi Weitz) Date: Fri, 12 Jan 2007 09:17:49 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: (Cyrus Harmon's message of "Thu, 11 Jan 2007 23:55:35 -0800") References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> Message-ID: On Thu, 11 Jan 2007 23:55:35 -0800, Cyrus Harmon wrote: > Also, some API support for disabling indexing might be > nice. Redefining the defclass forms seems to be a bit "extreme". Hmm, so you disable indexing, insert or delete a couple of objects, and then enable it again? And then you expect the index to still be correct? How would you do that with an SQL database? Or am I misunderstanding and you just want disabling but not re-enabling? From ch-rucksack at bobobeach.com Fri Jan 12 08:19:26 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 00:19:26 -0800 Subject: [rucksack-devel] rucksack performance In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> Message-ID: Yes, that's basically right. In sql-speak (roughly), you would insert your rows and then create the index. or if doing a big bulk update, drop the index, do your inserts and then rebuild the index. I think that's what Arthur implied should happen here when you redefine the class to have the indexed slots, or not.
Cyrus On Jan 12, 2007, at 12:17 AM, Edi Weitz wrote: > On Thu, 11 Jan 2007 23:55:35 -0800, Cyrus Harmon rucksack at bobobeach.com> wrote: > >> Also, some API support for disabling indexing might be >> nice. Redefining the defclass forms seems to be a bit "extreme". > > Hmm, so you disable indexing, insert or delete a couple of objects, > and then enable it again? And then you expect the index to still be > correct? How would you do that with an SQL database? Or am I > misunderstanding and you just want disabling but not re-enabling? From edi at agharta.de Fri Jan 12 08:23:10 2007 From: edi at agharta.de (Edi Weitz) Date: Fri, 12 Jan 2007 09:23:10 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: (Cyrus Harmon's message of "Fri, 12 Jan 2007 00:19:26 -0800") References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> Message-ID: On Fri, 12 Jan 2007 00:19:26 -0800, Cyrus Harmon wrote: > Yes, that's basically right. In sql-speak (roughly), you would > insert your rows and then create the index. or if doing a big bulk > update, drop the index, do your inserts and then rebuild the > index. I think that's what Arthur implied should happen here when > you redefine the class to have the indexed slots, or not. Ah, OK, you want it to be rebuilt. I thought you just wanted to keep it around in some kind of passive state. Sorry for the noise. 
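As a rough illustration of the "insert first, build the index afterwards" pattern from the messages above, here is a small sketch in plain Common Lisp. It does not use Rucksack at all: the "index" is just a sorted list of (key . id) pairs, and the function names are invented for this example. Whether the same saving would show up with Rucksack's btree indexes is not something this sketch can settle.

```lisp
;; Hypothetical stand-in for an index: a list of (key . id) pairs
;; kept sorted by key. None of these names come from Rucksack.
(defun insert-sorted (entry index)
  "Return INDEX with ENTRY merged in at its sorted position (by key)."
  ;; MERGE is destructive, but both lists it receives here are fresh.
  (merge 'list (list entry) index #'< :key #'car))

(defun index-incrementally (entries)
  "Keep the index up to date during loading: one sorted insert per entry."
  (let ((index '()))
    (dolist (entry entries index)
      (setf index (insert-sorted entry index)))))

(defun index-after-bulk-load (entries)
  "Load everything first, then build the index with a single sort."
  (sort (copy-list entries) #'< :key #'car))
```

Both functions return the same sorted index for entries with distinct keys. The incremental version walks the existing index once per entry, while the bulk version pays for one sort at the end.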
From alemmens at xs4all.nl Fri Jan 12 08:51:04 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 09:51:04 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> Message-ID: Cyrus Harmon wrote: > Ok, so if I remove the (:index t) line from the defclass form, it's > fast, and works, but I can't seem to get the indices to appear by > redefining the class. That's right. If you remove the (:INDEX T) class option, it becomes impossible to find all instances of the class. So that makes it impossible to do a re-indexing step later. > if, OTOH, I delete the :index forms from the slot definitions, but > leave the (:index t) in the class definition, things start of ok, but > then I get this familiar error message (oh, and this is when batching > 500 make-instances at a time into a transaction; or at least I'm > doing a transaction-commit every 500 items, hoping that that's doing > what I think it is): > > There is no applicable method for the generic function > # > when called with arguments > (# 10000, heap #P"/Users/sly/projects/cyrusharmon.org/cl-bio/ruc$ > [Condition of type SIMPLE-ERROR] I'd say this is either a bug in the btree code or a bug in the garbage collector. Could you turn off garbage collection, and see if you still get this bug? Arthur From alemmens at xs4all.nl Fri Jan 12 10:22:06 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 11:22:06 +0100 Subject: [rucksack-devel] continuing the rucksack saga... 
In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <7A8E16F3-9FD1-488A-8B98-9B3A35EF4E7C@bobobeach.com> <52986FF8-CEB0-4A52-8211-F2FA1346BE12@bobobeach.com> Message-ID: Cyrus Harmon wrote: > what's the best way to do this? can I just drop a > (rucksack:transaction-commit rucksack:*transaction*) into my loop and > have the right thing happen or do I need to break up the loop to > explicitly do a (with-transaction ... for batches of 1000 or so? I would break up the loop. Arthur From alemmens at xs4all.nl Fri Jan 12 11:32:01 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 12:32:01 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> Message-ID: Cyrus Harmon wrote: > Since the index code seems to be the bottleneck here, it would be > good to spend some time optimizing the index code as well. Also, some > API support for disabling indexing might be nice. It's not clear to me what disabling indexing would buy (assuming you want to re-enable it later on). Maybe it helps to debug performance problems, but I don't think that's enough reason to add API support. Or are there other reasons to disable/re-enable indexing? Arthur From alemmens at xs4all.nl Fri Jan 12 11:33:08 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 12:33:08 +0100 Subject: [rucksack-devel] sbcl compile warning In-Reply-To: References: Message-ID: Cyrus Harmon wrote: > ; caught STYLE-WARNING: > ; reading an ignored variable: BLOCK Thanks, I'll fix this. 
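For reference, the STYLE-WARNING arises because the method declares BLOCK ignored and then returns it. One possible way to avoid it is sketched below. This is not necessarily the fix that went into heap.lisp, and FREE-LIST-HEAP is defined here as an empty placeholder class only so that the example stands on its own.

```lisp
;; FREE-LIST-HEAP is an empty placeholder class for this sketch,
;; not Rucksack's actual class definition.
(defclass free-list-heap () ())

(defgeneric initialize-block (block block-size heap)
  (:method (block block-size (heap free-list-heap))
    ;; BLOCK is returned below, so only BLOCK-SIZE is declared ignored.
    ;; HEAP needs no declaration: DEFMETHOD treats a specialized
    ;; parameter as referenced.
    (declare (ignore block-size))
    block))
```

If a variable may or may not be read, (declare (ignorable ...)) is the alternative that suppresses the warning in both cases.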
Arthur From alemmens at xs4all.nl Fri Jan 12 13:38:10 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 14:38:10 +0100 Subject: [rucksack-devel] Fwd: Re: [Sbcl-devel] Possibly memory leak In-Reply-To: <87mz4o5qdz.fsf@vasara.proghammer.com> References: <200701121159.36348.mika.pihlajamaki@selanpohja.fi> <87mz4o5qdz.fsf@vasara.proghammer.com> Message-ID: Cyrus, Maybe using a two-generation GC would be a good solution in your case too? I'd think that you want to have as little obsolete data in memory as possible. Arthur ------- Forwarded message ------- From: "Juho Snellman" To: "Mika Pihlajamäki" Subject: Re: [Sbcl-devel] Possibly memory leak Date: Fri, 12 Jan 2007 14:24:56 +0100 Mika Pihlajamäki writes: > This seems to reserve memory without releasing it to gc. > (Using sbcl 1.0 in Linux x86 with 2GB ram) > > $ dd if=/dev/zero of=/tmp/10MB.txt bs=1M count=10 > $ sbcl > * (defun file->list (filename) > (with-open-file (in filename :direction :input) > (loop for line = (read-line in nil nil) > while line ; not eof > collect line))) > > FILE->LIST > * (loop for i from 1 do > (format t "~A0 MB~%" i) > (file->list "/tmp/10MB.txt")) > > Then after about 250 MB -> Heap exhausted. This terminates my program > frequently as it uses strings and files a lot. This particular example looks like a bad interaction between the generational and conservative aspects of the GC. If it really is representative of your program, doing the following would fix it: (define-alien-variable gencgc-oldest-gen-to-gc int) (setf gencgc-oldest-gen-to-gc 1) This makes the GC function as a two-generation GC, rather than a 6-generation one, meaning that there will be fewer higher generations holding onto obsolete data.
From ch-rucksack at bobobeach.com Fri Jan 12 15:49:38 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 07:49:38 -0800 Subject: [rucksack-devel] rucksack performance In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> Message-ID: <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> I guess performance is the only issue I can think of. Yes, you have to pay the cost of indexing either way, but, at least in many systems, it can be faster to do a bunch of "inserts" and then index the table, using rdbms-speak. It's not so much an issue of debugging performance problems, as working around the performance bottleneck of inserting into an index. I guess in an ideal world we wouldn't need to disable indexing during a bulk creation phase. On Jan 12, 2007, at 3:32 AM, Arthur Lemmens wrote: > Cyrus Harmon wrote: > >> Since the index code seems to be the bottleneck here, it would be >> good to spend some time optimizing the index code as well. Also, some >> API support for disabling indexing might be nice. > > It's not clear to me what disabling indexing would buy (assuming you > want to re-enable it later on). Maybe it helps to debug performance > problems, but I don't think that's enough reason to add API support. > > Or are there other reasons to disable/re-enable indexing? 
> > Arthur > From alemmens at xs4all.nl Fri Jan 12 15:53:01 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 16:53:01 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: Cyrus Harmon wrote: > I guess performance is the only issue I can think of. Yes, you have > to pay the cost of indexing either way, but, at least in many > systems, it can be faster to do a bunch of "inserts" and then index > the table, using rdbms-speak. It's not so much an issue of debugging > performance problems, as working around the performance bottleneck of > inserting into an index. I guess in an ideal world we wouldn't need > to disable indexing during a bulk creation phase. I don't see why that would be any faster in Rucksack. As far as I can see you just move the indexing work to a later stage, but I don't think you optimize it. But maybe I'm missing something. Arthur From ch-rucksack at bobobeach.com Fri Jan 12 16:33:17 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 08:33:17 -0800 Subject: [rucksack-devel] rucksack performance In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: <7CAA80A9-9B6D-4BD4-B77E-841A37929FD1@bobobeach.com> Well, I guess I'm willing to buy this argument, in principle. In practice, with no indexing I can load my .5M objects into rucksack. 
With an (:index t) but no indices on any slot, performance is relatively good, but then I get heap exhaustion (or was it the p-car problem? I can't remember at the moment) after 150k or 200k objects. With two slot indices, performance degrades significantly around 40k objects into the load (this is with the restructured loop, batch transactions into groups of 500 objects), and the load process eventually falls over at some point, although it's still running on my latest attempt, with around 55K objects loaded so far. For my next attempt, I'll disable the GC. this is done by commenting the call in the with-transaction form, right? Cyrus On Jan 12, 2007, at 7:53 AM, Arthur Lemmens wrote: > Cyrus Harmon wrote: > >> I guess performance is the only issue I can think of. Yes, you have >> to pay the cost of indexing either way, but, at least in many >> systems, it can be faster to do a bunch of "inserts" and then index >> the table, using rdbms-speak. It's not so much an issue of debugging >> performance problems, as working around the performance bottleneck of >> inserting into an index. I guess in an ideal world we wouldn't need >> to disable indexing during a bulk creation phase. > > I don't see why that would be any faster in Rucksack. As far as I can > see you just move the indexing work to a later stage, but I don't > think > you optimize it. But maybe I'm missing something. 
> > Arthur > > From alemmens at xs4all.nl Fri Jan 12 16:49:51 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 17:49:51 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: <7CAA80A9-9B6D-4BD4-B77E-841A37929FD1@bobobeach.com> References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> <7CAA80A9-9B6D-4BD4-B77E-841A37929FD1@bobobeach.com> Message-ID: Cyrus Harmon wrote: > For my next attempt, I'll disable the GC. this is done by commenting > the call in the with-transaction form, right? Yes, I think that's the quick and dirty way. A better way might be to add a COLLECT-GARBAGE-P flag to WITH-TRANSACTION and look at that before calling COLLECT-SOME-GARBAGE. Then you can submit that as a patch ;-) Arthur From ch-rucksack at bobobeach.com Fri Jan 12 16:54:58 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 08:54:58 -0800 Subject: [rucksack-devel] rucksack performance In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: <3C9E3E2F-E66C-400A-BFAD-3D600B4443BA@bobobeach.com> Well, one other way that this could be faster (to do the indexing in bulk at the end) is that you would only need to lock the index once, rather than once per object. Since rucksack doesn't lock the btree yet (at least not according to the comment at the top of p-btrees.lisp), this probably isn't an issue right now, but it could become one as rucksack acquires btree locking facilities.
On Jan 12, 2007, at 7:53 AM, Arthur Lemmens wrote: > Cyrus Harmon wrote: > >> I guess performance is the only issue I can think of. Yes, you have >> to pay the cost of indexing either way, but, at least in many >> systems, it can be faster to do a bunch of "inserts" and then index >> the table, using rdbms-speak. It's not so much an issue of debugging >> performance problems, as working around the performance bottleneck of >> inserting into an index. I guess in an ideal world we wouldn't need >> to disable indexing during a bulk creation phase. > > I don't see why that would be any faster in Rucksack. As far as I can > see you just move the indexing work to a later stage, but I don't > think > you optimize it. But maybe I'm missing something. > > Arthur > > From alemmens at xs4all.nl Fri Jan 12 16:56:49 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 17:56:49 +0100 Subject: [rucksack-devel] rucksack performance In-Reply-To: <3C9E3E2F-E66C-400A-BFAD-3D600B4443BA@bobobeach.com> References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> <3C9E3E2F-E66C-400A-BFAD-3D600B4443BA@bobobeach.com> Message-ID: Cyrus Harmon wrote: > Well, one other way that this could be faster (to do the indexing in > bulk at the end) is that you would only need to lock the index once, > rather than once per object. Ah yes, good point. Arthur From attila.lendvai at gmail.com Fri Jan 12 17:03:13 2007 From: attila.lendvai at gmail.com (Attila Lendvai) Date: Fri, 12 Jan 2007 18:03:13 +0100 Subject: [rucksack-devel] tests Message-ID: dear list, i'm planning to implement some new features in the 5am testsuite and i'm also planning to play with rucksack. 
so it came as a natural idea to test-drive the new 5am features by converting/extending the rucksack test-suite. (optionally loadable rucksack-test system, asdf:test-op, etc) i wonder if you like this idea? or the idea of using bordeaux-threads (for the crossplatform locking primitives)? or depending on alexandria and anaphora for useful common utils? i can volunteer to do these if you are not against them. on a different note it seems like carriage return (0x0d) chars are checked in the cvs repo. also, i've created a darcs conversion of the cvs repo. if for any reason you prefer darcs by now then feel free to grab the converted darcs repo and use it as the official. i know vcs wars are annoying, so i won't come up with this again. if you prefer to stay with cvs i'll quietly resort to sending text diffs and keep the repo below in sync with the official while holding our (cl-wdim guys) own work. the darcs repo is available at: darcs get USER@common-lisp.net:/project/cl-wdim/darcs/rucksack and will be listed at: http://common-lisp.net/cgi-bin/darcsweb/darcsweb.cgi looking forward to a fruitful cooperation, -- - attila "- The truth is that I've been too considerate, and so became unintentionally cruel... - I understand. - No, you don't understand! We don't speak the same language!" (Ingmar Bergman - Smultronstället) From levente.meszaros at gmail.com Fri Jan 12 18:18:57 2007 From: levente.meszaros at gmail.com (Levente Mészáros) Date: Fri, 12 Jan 2007 19:18:57 +0100 Subject: [rucksack-devel] tests In-Reply-To: References: Message-ID: In case anybody is interested: Here is a simple export machinery for rucksack. It transforms the rucksack database into a lambda form which when called imports it into another rucksack. Some tests included. levy On 1/12/07, Attila Lendvai wrote: > dear list, > > i'm planning to implement some new features in the 5am testsuite and > i'm also planning to play with rucksack.
so it came as a natural idea > to test-drive the new 5am features by converting/extending the > rucksack test-suite. (optionally loadable rucksack-test system, > asdf:test-op, etc) > > i wonder if you like this idea? > > or the idea of using bordeaux-threads (for the crossplatform locking > primitives)? > > or depending on alexandria and anaphora for useful common utils? > > i can volunteer to do these if you are not against them. > > on a different note it seems like carriage return (0x0d) chars are > checked in the cvs repo. > > also, i've created a darcs conversion of the cvs repo. if for any > reason you prefer darcs by now then feel free to grab the converted > darcs repo and use it as the official. i know vcs wars are annoying, > so i won't come up with this again. if you prefer to stay with cvs > i'll quietly resort to sending text diffs and keep the repo below in > sync with the official while holding our (cl-wdim guys) own work. > > the darcs repo is available at: darcs get > USER@common-lisp.net:/project/cl-wdim/darcs/rucksack > > and will be listed at: http://common-lisp.net/cgi-bin/darcsweb/darcsweb.cgi > > looking forward to a fruitful cooperation, > > -- > - attila > > "- The truth is that I've been too considerate, and so became > unintentionally cruel... > - I understand. > - No, you don't understand! We don't speak the same language!" > (Ingmar Bergman - Smultronstället) > -- There's no perfectoin -------------- next part -------------- A non-text attachment was scrubbed...
Name: export.lisp Type: application/octet-stream Size: 6847 bytes Desc: not available URL: From ch-rucksack at bobobeach.com Fri Jan 12 19:45:58 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 11:45:58 -0800 Subject: [rucksack-devel] PATCH: without-rucksack-gcing In-Reply-To: <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: The following patch defines a macro without-rucksack-gcing that can be used to cause the enclosed form to be executed without calling collect-some-garbage upon commit. Using this patch allows me to load 400k objects into rucksack in a reasonable amount of time and seems to prevent (or reduce, at a minimum) the heap exhaustion errors on SBCL. Cyrus

--- transactions.lisp 24 Aug 2006 08:21:25 -0700 1.11
+++ transactions.lisp 12 Jan 2007 11:41:07 -0800
@@ -171,6 +171,16 @@
 ;; Committing a transaction
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
+(defparameter *collect-garbage-on-commit* t
+  "A flag to indicate whether or not transaction-commit collects garbage")
+
+;;; use without-rucksack-gcing to locally set
+;;; *collect-garbage-on-commit* to nil in order to supress rucksack
+;;; garbage collection on commit
+(defmacro without-rucksack-gcing (&body body)
+  `(let ((*collect-garbage-on-commit* nil))
+     ,@body))
+
 (defun transaction-commit (transaction &key (rucksack (current-rucksack)))
   "Call transaction-commit-1 to do the real work."
   (transaction-commit-1 transaction (rucksack-cache rucksack) rucksack))
@@ -216,8 +226,9 @@
       (delete-commit-file transaction cache)
       ;; 5. Let the garbage collector do an amount of work proportional
       ;; to the number of octets that were allocated during the commit.
-      (collect-some-garbage heap
-                            (gc-work-for-size heap nr-allocated-octets))
+      (when *collect-garbage-on-commit*
+        (collect-some-garbage heap
+                              (gc-work-for-size heap nr-allocated-octets)))
       ;; 6. Make sure that all changes are actually on disk before
       ;; we continue.
       (finish-all-output rucksack)))))

From alemmens at xs4all.nl Fri Jan 12 20:09:05 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 21:09:05 +0100 Subject: [rucksack-devel] PATCH: without-rucksack-gcing In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: Cyrus Harmon wrote: > The following patch defines a macro without-rucksack-gcing that can > be used to cause the enclosed form to be executed without calling > collect-some-garbage upon commit. Thanks. I try to keep special variables to a minimum, so I'll probably move this flag to a slot in TRANSACTION that gets initialized by WITH-TRANSACTION. > Using this patch allows me to load 400k objects into rucksack in a > reasonable amount of time That's good news. With or without indexes? And can you be more specific about 'reasonable amount of time'?
Arthur From ch-rucksack at bobobeach.com Fri Jan 12 20:25:40 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 12:25:40 -0800 Subject: [rucksack-devel] PATCH: without-rucksack-gcing In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: On Jan 12, 2007, at 12:09 PM, Arthur Lemmens wrote: > Cyrus Harmon wrote: > >> The following patch defines a macro without-rucksack-gcing that can >> be used to cause the enclosed form to be executed without calling >> collect-some-garbage upon commit. > > Thanks. I try to keep special variables to a minimum, so I'll > probably > move this flag to a slot in TRANSACTION that gets initialized by > WITH-TRANSACTION. Yes, that sounds like a reasonable approach. I thought about making this an argument to with-transaction like: (with-transaction (:gc nil) ...) >> Using this patch allows me to load 400k objects into rucksack in a >> reasonable amount of time > > That's good news. With or without indexes? And can you be more > specific > about 'reasonable amount of time'? with indices. ah, yes, reasonable. I suppose that's somewhat subjective. The "unreasonable" efforts before were characterized by progressive slowdowns, such that things would start fast (say ~50-100 obj/second) and after 50-100k objects or so, slow down to 1/obj second. I left for breakfast and didn't time the last big insert, but the one I've been working on now has been going for about 15 minutes and has done over 150k objects. 10k obj/min is fine for my purposes. 100 obj/min wouldn't be. One thing to note is that eventually you do have to pay the GC cost, and on the first transaction with GC after my big insert, there was a multiple minute delay while it GCed the rucksack. 
I can live with that and it's certainly better than not working at all! Cyrus From ch-rucksack at bobobeach.com Fri Jan 12 20:39:48 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 12:39:48 -0800 Subject: [rucksack-devel] Persistent LISP: Storing Interobject References in a Database Message-ID: <7CBDFF37-A03D-4461-A45D-D993601AA17C@bobobeach.com> (sorry about the old school casing, but that's the title) I came across this while looking for something else and I thought y'all might be interested in it: http://techreports.lib.berkeley.edu/accessPages/CSD-88-401 Cyrus From alemmens at xs4all.nl Fri Jan 12 20:49:03 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 21:49:03 +0100 Subject: [rucksack-devel] tests In-Reply-To: References: Message-ID: Attila Lendvai wrote: > i'm planning to implement some new features in the 5am testsuite and > i'm also planning to play with rucksack. so it came as a natural idea > to test-drive the new 5am features by converting/extending the > rucksack test-suite. (optionally loadable rucksack-test system, > asdf:test-op, etc) > > i wonder if you like this idea? > > or the idea of using bordeaux-threads (for the crossplatform locking > primitives)? > > or depending on alexandria and anaphora for useful common utils? > > i can voulenteer to do these if you are not against them. I'm against all of them, to tell you the truth. I try to avoid dependencies on other libraries for Rucksack, unless such libraries have very big and obvious advantages over the way that things are currently done. The most important thing for Rucksack at the moment is to make it usable for real-world projects. That means that it must become more reliable, that it must be made faster in some spots and that it must get more and better documentation. Any help with that is very welcome. Reports about actual experiences of using Rucksack (like the messages that Cyrus sent recently) are also very welcome. 
But introducing more complications without big improvements to compensate for them is not. > also, i've created a darcs conversion of the cvs repo. if for any > reason you prefer darcs by now then feel free to grab the converted > darcs repo and use it as the official. No thanks. Arthur From alemmens at xs4all.nl Fri Jan 12 20:52:59 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 21:52:59 +0100 Subject: [rucksack-devel] Persistent LISP: Storing Interobject References in a Database In-Reply-To: <7CBDFF37-A03D-4461-A45D-D993601AA17C@bobobeach.com> References: <7CBDFF37-A03D-4461-A45D-D993601AA17C@bobobeach.com> Message-ID: Cyrus Harmon wrote: > I came across this while looking for something else and I thought > y'all might be interested in it: > > http://techreports.lib.berkeley.edu/accessPages/CSD-88-401 Interesting, thanks! Arthur From ch-rucksack at bobobeach.com Fri Jan 12 21:08:58 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 13:08:58 -0800 Subject: [rucksack-devel] tests In-Reply-To: References: Message-ID: On Jan 12, 2007, at 12:49 PM, Arthur Lemmens wrote: > Attila Lendvai wrote: > >> i'm planning to implement some new features in the 5am testsuite and >> i'm also planning to play with rucksack. so it came as a natural idea >> to test-drive the new 5am features by converting/extending the >> rucksack test-suite. (optionally loadable rucksack-test system, >> asdf:test-op, etc) >> >> i wonder if you like this idea? >> >> or the idea of using bordeaux-threads (for the crossplatform locking >> primitives)? >> >> or depending on alexandria and anaphora for useful common utils? >> >> i can voulenteer to do these if you are not against them. > > I'm against all of them, to tell you the truth. I try to avoid > dependencies on other libraries for Rucksack, unless such libraries > have very big and obvious advantages over the way that things are > currently done. 
While I share Arthur's aversion to unnecessary package dependency creep, if using 5am enables someone like Attila to develop a nice test suite for rucksack, I'm all for it. It doesn't have to 1) be the canonical test suite or 2) live in the rucksack distribution, but if it exercises the system and points in the right direction to make improvements, that's a good thing in my book. (just as an aside I'm not a fan of anaphoric macros and would be nonplussed if the code were gratuitously changed to this style.) Locking is a big deal, but it's not clear to me that bx-threads buys us much. Thinking about a general strategy for concurrent access to rucksacks would seem to be a better place to start. One area in which other dependencies could be interesting to explore are not so much in the core rucksack, but rather as extensions to rucksack. In particular, I'm thinking of indexing. The current indexing strategy appears to be limited to b-trees. It would be interesting to consider how to do extensible a la GiST and to use things like spatial-trees for indexing of spatial data in rucksacks, for example. This isn't to say that i want spatial trees to be a requirement for rucksack, but rather that if there were a way for users/developers to make new index types, that could be interesting. Cyrus From ch-rucksack at bobobeach.com Fri Jan 12 21:23:16 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 13:23:16 -0800 Subject: [rucksack-devel] string-index range searching? Message-ID: <382B2413-E612-4CA7-BFF1-5178DFC5AE42@bobobeach.com> is it possible to do range searching on a string-index? full substring searching might be a bit much to ask, but it should be fairly trivial to support "starts with" queries over the existing indices.
thanks, Cyrus From alemmens at xs4all.nl Fri Jan 12 21:30:55 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 22:30:55 +0100 Subject: [rucksack-devel] tests In-Reply-To: References: Message-ID: Cyrus Harmon wrote: > While I share Arthur's aversion to unnecessary package dependency > creep, if using 5am enables someone like Attila to develop a nice > test suite for rucksack, I'm all for it. Me too ;-) > One area in which other dependencies could be interesting to explore > are not so much in the core rucksack, but rather as extensions to > rucksack. In particular, I'm thinking of indexing. The current > indexing strategy appears to be limited to b-trees. It would be > interesting to consider how to do extensible a la GiST and to use > things like spatial-trees for indexing of spatial data in rucksacks, > for example. Yes, I agree that would be interesting. And I tried to design Rucksack in such a way that such extensions can be integrated well with the rest of the system. Arthur From alemmens at xs4all.nl Fri Jan 12 21:38:53 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 22:38:53 +0100 Subject: [rucksack-devel] string-index range searching? In-Reply-To: <382B2413-E612-4CA7-BFF1-5178DFC5AE42@bobobeach.com> References: <382B2413-E612-4CA7-BFF1-5178DFC5AE42@bobobeach.com> Message-ID: Cyrus Harmon wrote: > is it possible to do range searching on a string-index? full > substring searching might be a bit much to ask, but it should be > fairly trivial to support "starts with" queries over the existing > indices. I suppose you could implement "starts with" queries with some creative use of the :MIN, :MAX, :INCLUDE-MIN and :INCLUDE-MAX arguments for RUCKSACK-MAP-SLOT. Efficient substring searching is a different matter. I suppose you could define your own substring-searchable-string-index data structures for that, but I haven't thought about it very much. 
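A minimal sketch of the "starts with" idea, assuming the string index orders its keys with STRING< (the thread does not confirm this). PREFIX-UPPER-BOUND is a hypothetical helper, not part of Rucksack; it computes an exclusive upper bound that could be passed as :MAX with :INCLUDE-MAX NIL, while the prefix itself serves as :MIN with :INCLUDE-MIN T.

```lisp
;; Sketch only: PREFIX-UPPER-BOUND is a hypothetical helper, not Rucksack API.
;; It assumes keys are compared with STRING< (plain char-code order) and that
;; CODE-CHAR returns a character for the incremented code, which is true in
;; SBCL but is not guaranteed by the standard for every code.
(defun prefix-upper-bound (prefix)
  "Return the smallest string that sorts after every string starting with
PREFIX, or NIL when no such bound exists (e.g. for the empty prefix)."
  (let ((pos (position-if (lambda (ch)
                            (< (char-code ch) (1- char-code-limit)))
                          prefix
                          :from-end t)))
    (when pos
      ;; Copy PREFIX up to and including POS, then bump that character by one.
      (let ((bound (make-string (1+ pos))))
        (replace bound prefix)
        (setf (char bound pos)
              (code-char (1+ (char-code (char prefix pos)))))
        bound))))

;; Possible use (RS, PERSON and NAME are made-up names for illustration):
;;
;;   (rucksack-map-slot rs 'person 'name #'print
;;                      :min "abc" :include-min t
;;                      :max (prefix-upper-bound "abc") :include-max nil)
```

For the prefix "abc" the helper yields "abd", so the range covers exactly the keys from "abc" up to, but not including, "abd".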
Arthur From ch-rucksack at bobobeach.com Fri Jan 12 21:44:13 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 13:44:13 -0800 Subject: [rucksack-devel] string-index range searching? In-Reply-To: References: <382B2413-E612-4CA7-BFF1-5178DFC5AE42@bobobeach.com> Message-ID: On Jan 12, 2007, at 1:38 PM, Arthur Lemmens wrote: > Cyrus Harmon wrote: > >> is it possible to do range searching on a string-index? full >> substring searching might be a bit much to ask, but it should be >> fairly trivial to support "starts with" queries over the existing >> indices. > > I suppose you could implement "starts with" queries with some creative > use of the :MIN, :MAX, :INCLUDE-MIN and :INCLUDE-MAX arguments for > RUCKSACK-MAP-SLOT. Yes, and I was supplying the wrong slot name in my rucksack-map-slot form, which is why this wasn't working. getting creative with :min and :max basically seems to work. > Efficient substring searching is a different matter. I suppose you > could define your own substring-searchable-string-index data > structures > for that, but I haven't thought about it very much. Certainly. I'm considering exploring montezuma's text indexing capabilities here. OTOH, mapping over the slots and using something like cl-ppcre, while not something you would want to do a lot of with a million objects, should work with the existing API. Cyrus From ch-rucksack at bobobeach.com Fri Jan 12 21:45:15 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 13:45:15 -0800 Subject: [rucksack-devel] string-index range searching? In-Reply-To: References: <382B2413-E612-4CA7-BFF1-5178DFC5AE42@bobobeach.com> Message-ID: Oh, one thing this does bring up is that we should do some error checking in rucksack-map-slot to make sure that the slot actually exists. 
Cyrus On Jan 12, 2007, at 1:44 PM, Cyrus Harmon wrote: > > On Jan 12, 2007, at 1:38 PM, Arthur Lemmens wrote: > >> Cyrus Harmon wrote: >> >>> is it possible to do range searching on a string-index? full >>> substring searching might be a bit much to ask, but it should be >>> fairly trivial to support "starts with" queries over the existing >>> indices. >> >> I suppose you could implement "starts with" queries with some >> creative >> use of the :MIN, :MAX, :INCLUDE-MIN and :INCLUDE-MAX arguments for >> RUCKSACK-MAP-SLOT. > > Yes, and I was supplying the wrong slot name in my rucksack-map- > slot form, which is why this wasn't working. getting creative > with :min and :max basically seems to work. > >> Efficient substring searching is a different matter. I suppose you >> could define your own substring-searchable-string-index data >> structures >> for that, but I haven't thought about it very much. > > Certainly. I'm considering exploring montezuma's text indexing > capabilities here. OTOH, mapping over the slots and using something > like cl-ppcre, while not something you would want to do a lot of > with a million objects, should work with the existing API. > > Cyrus > > _______________________________________________ > rucksack-devel mailing list > rucksack-devel at common-lisp.net > http://common-lisp.net/cgi-bin/mailman/listinfo/rucksack-devel From alemmens at xs4all.nl Fri Jan 12 21:47:44 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 22:47:44 +0100 Subject: [rucksack-devel] string-index range searching? In-Reply-To: References: <382B2413-E612-4CA7-BFF1-5178DFC5AE42@bobobeach.com> Message-ID: Cyrus Harmon wrote: > Oh, one thing this does bring up is that we should do some error > checking in rucksack-map-slot to make sure that the slot actually > exists. 
Yes, good idea ;-) Arthur From alemmens at xs4all.nl Fri Jan 12 21:52:04 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 22:52:04 +0100 Subject: [rucksack-devel] string-index range searching? In-Reply-To: References: <382B2413-E612-4CA7-BFF1-5178DFC5AE42@bobobeach.com> Message-ID: Cyrus Harmon wrote: > I'm considering exploring montezuma's text indexing capabilities here. Yes, a heavy-duty text indexer could be a very interesting application for Rucksack, I think. > Mapping over the slots and using something like cl-ppcre, while not > something you would want to do a lot of with a million objects, should > work with the existing API. Of course. Arthur From alemmens at xs4all.nl Fri Jan 12 21:56:43 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 22:56:43 +0100 Subject: [rucksack-devel] PATCH: without-rucksack-gcing In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: Cyrus Harmon wrote: > Yes, that sounds like a reasonable approach. I thought about making > this an argument to with-transaction like: > > (with-transaction (:gc nil) > ...) Yes, I'll probably do something like that. > One thing to note is that eventually you do have to pay the GC cost, > and on the first transaction with GC after my big insert, there was a > multiple minute delay while it GCed the rucksack. Just curious: how big is the resulting rucksack, approximately? 
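A toy sketch of the (with-transaction (:gc nil) ...) idea discussed above. Everything here is a stand-in: WITH-TOY-TRANSACTION, TOY-COMMIT and TOY-COLLECT-SOME-GARBAGE are hypothetical names that only mimic the shape of the proposal, and nothing in it touches Rucksack's real WITH-TRANSACTION or commit code.

```lisp
;; Sketch only: all names below are hypothetical stand-ins, not Rucksack API.
(defvar *gc-runs* 0
  "Counts how often the stand-in collector has been called.")

(defun toy-collect-some-garbage ()
  ;; Stand-in for the incremental collector that runs at commit time.
  (incf *gc-runs*))

(defun toy-commit (&key (collect-garbage-p t))
  ;; The flag is an ordinary argument instead of a special variable.
  (when collect-garbage-p
    (toy-collect-some-garbage))
  :committed)

(defmacro with-toy-transaction ((&key (gc t)) &body body)
  "Run BODY, then commit; garbage is collected on commit unless :GC is NIL."
  `(multiple-value-prog1 (progn ,@body)
     (toy-commit :collect-garbage-p ,gc)))

;; A bulk load would skip collection like this:
;;
;;   (with-toy-transaction (:gc nil)
;;     (load-many-objects))   ; LOAD-MANY-OBJECTS is made up
```

Passing the flag through the macro keeps the choice local to one transaction, which is the property Arthur is after when he prefers a slot or argument over a special variable.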
Arthur From ch-rucksack at bobobeach.com Fri Jan 12 22:02:00 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 14:02:00 -0800 Subject: [rucksack-devel] PATCH: without-rucksack-gcing In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: On Jan 12, 2007, at 1:56 PM, Arthur Lemmens wrote: > Just curious: how big is the resulting rucksack, approximately? about 500 MB, ATM. but it's in the middle of reloading it, which is actually a fairly expensive operation. I think re-evaluating the defclass form when I restart lisp and load my package is causing some rather expensive operations. It takes a few minutes to load. While I like the whole class re-definition thing, perhaps we need a way to distinguish an attempt to defclass a class that matches what's already on disk from a changed class (or perhaps there's something else going on here). Cyrus From alemmens at xs4all.nl Fri Jan 12 22:13:51 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 23:13:51 +0100 Subject: [rucksack-devel] PATCH: without-rucksack-gcing In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: > about 500 MB, ATM. but it's in the middle of reloading it, which is > actually a fairly expensive operation. I think re-evaluating the > defclass form when I restart lisp and load my package is causing some > rather expensive operations. It takes a few minutes to load. Mmm. How big is your roots file? 
> While I like the whole class re-definition thing, perhaps we need a > way to distinguish an attempt to defclass a class that matches what's > already on disk from a changed class I would think that the schema table already takes care of that. But I could be missing something. > (or perhaps there's something else going on here) I'm wondering about that. Can you interrupt it and have a look at the stack? From alemmens at xs4all.nl Fri Jan 12 22:23:33 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 23:23:33 +0100 Subject: [rucksack-devel] tests In-Reply-To: References: Message-ID: Levente Mészáros wrote: > Here is a simple export machinery for rucksack. It transforms the > rucksack database into a lambda form which when called imports it into > another rucksack. Some tests included. Thanks, interesting. The most important reason I see for an import/export facility is that this may make it easier to move between different versions of Rucksack's file format. To make that possible (and to make sure that import/export also works for big rucksacks), you'd need to add some features to your current implementation:
- In general, you can't assume that all persistent objects fit in memory. (If they do, probably more than 50% of Rucksack's code is not necessary.) This means that using a hash-table to map from object ids to objects will not work in general. You'll need something disk-based instead. The object-table file, for example...
- For big rucksacks, you'll create an enormously big lambda form. You can't expect that your Lisp implementation will be able to compile such a form (it may not even be able to read it).
- You'd need to export (and import) everything that Rucksack saves on disk. This includes the schema table, for example. And everything that Rucksack can serialize, including hash tables and arrays.
Arthur From ch-rucksack at bobobeach.com Fri Jan 12 22:25:55 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 14:25:55 -0800 Subject: [rucksack-devel] PATCH: without-rucksack-gcing In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: On Jan 12, 2007, at 2:13 PM, Arthur Lemmens wrote: >> about 500 MB, ATM. but it's in the middle of reloading it, which is >> actually a fairly expensive operation. I think re-evaluating the >> defclass form when I restart lisp and load my package is causing some >> rather expensive operations. It takes a few minutes to load. > > Mmm. How big is your roots file? (sly at cassis):~/projects/cyrusharmon.org/cl-bio/rucksack$ du -sk * 443492 heap 62148 objects 4 roots 4 schemas >> While I like the whole class re-definition thing, perhaps we need a >> way to distinguish an attempt to defclass a class that matches what's >> already on disk from a changed class > > I would think that the schema table already takes care of that. But > I could be missing something. > >> (or perhaps there's something else going on here) > > I'm wondering about that. Can you interrupt it and have a look at > the stack? 
looks like it's gc'ing, although this (or something else at least) has been going on for quite a while now: 0: (SB-UNIX::SIGINT-HANDLER # # #.(SB-SYS:INT-SAP #X0220613C)) 1: (SB-SYS:INVOKE-INTERRUPTION #) 2: ("foreign function: call_into_lisp") 3: ("foreign function: funcall3") 4: ("foreign function: interrupt_handle_now") 5: ("foreign function: _sigtramp") 6: ("bogus stack frame") 7: (SB-IMPL::REFILL-BUFFER/FD #) 8: (SB-IMPL::INPUT-UNSIGNED-8BIT-BYTE # NIL NIL) 9: ((SB-PCL::FAST-METHOD RUCKSACK::DESERIALIZE-BYTE (STREAM)) # # # NIL) 10: (RUCKSACK::READ-NEXT-MARKER #) 11: (RUCKSACK::DESERIALIZE # T NIL) 12: ((SB-PCL::FAST-METHOD RUCKSACK::LOAD-BLOCK (RUCKSACK:MARK-AND- SWEEP-HEAP #1="#<...>" . #1#)) # # # 147468424 :BUFFER NIL :SKIP-HEADER T) 13: ((SB-PCL::FAST-METHOD RUCKSACK::MARK-ROOT (RUCKSACK:MARK-AND- SWEEP-HEAP INTEGER)) # # # 960964) 14: ((SB-PCL::FAST-METHOD RUCKSACK::MARK-SOME-ROOTS (RUCKSACK:MARK- AND-SWEEP-HEAP T)) # # # 971898880) 15: ((SB-PCL::FAST-METHOD RUCKSACK::COLLECT-SOME-GARBAGE (RUCKSACK:MARK-AND-SWEEP-HEAP T)) # # # 971898880) 16: ((SB-PCL::FAST-METHOD RUCKSACK:TRANSACTION-COMMIT-1 (RUCKSACK:STANDARD-TRANSACTION RUCKSACK:STANDARD-CACHE RUCKSACK:STANDARD-RUCKSACK)) # # # # #) 17: ((LAMBDA (SB-PCL::.PV-CELL. SB-PCL::.NEXT-METHOD-CALL. SB- PCL::.ARG0. SB-PCL::.ARG1. 
SB-PCL::.ARG2.)) # # # # #) 18: (NIL) 19: (SB-INT:SIMPLE-EVAL-IN-LEXENV (LET* ((RUCKSACK:*RUCKSACK* RUCKSACK:*RUCKSACK*) (CL- BIO::RUCKSACK #)) (UNWIND-PROTECT (PROGN #) (RUCKSACK:CLOSE-RUCKSACK CL- BIO::RUCKSACK))) #) 20: ((FLET SB-C::DEFAULT-PROCESSOR) (LET* ((RUCKSACK:*RUCKSACK* RUCKSACK:*RUCKSACK*) (CL- BIO::RUCKSACK #)) (UNWIND-PROTECT (PROGN #) (RUCKSACK:CLOSE-RUCKSACK CL- BIO::RUCKSACK)))) 21: (SB-C::PROCESS-TOPLEVEL-FORM (LET* ((RUCKSACK:*RUCKSACK* RUCKSACK:*RUCKSACK*) (CL- BIO::RUCKSACK #)) (UNWIND-PROTECT (PROGN #) (RUCKSACK:CLOSE-RUCKSACK CL- BIO::RUCKSACK))) (SB-C::ORIGINAL-SOURCE-START 2 2 4) (:COMPILE-TOPLEVEL)) 22: ((FLET SB-C::DEFAULT-PROCESSOR) (RUCKSACK:WITH-RUCKSACK (CL- BIO::RUCKSACK CL-BIO::*BIO-RUCKSACK*) (RUCKSACK:WITH-TRANSACTION NIL (DEFCLASS CL-BIO::P-TAX-NAME # # # #)))) 23: (SB-C::PROCESS-TOPLEVEL-FORM (RUCKSACK:WITH-RUCKSACK (CL- BIO::RUCKSACK CL-BIO::*BIO-RUCKSACK*) (RUCKSACK:WITH-TRANSACTION NIL (DEFCLASS CL-BIO::P-TAX-NAME # # # #))) (SB-C::ORIGINAL-SOURCE-START 0 4) (:COMPILE-TOPLEVEL)) 24: (SB-C::PROCESS-TOPLEVEL-PROGN ((RUCKSACK:WITH-RUCKSACK (CL- BIO::RUCKSACK CL-BIO::*BIO-RUCKSACK*) (RUCKSACK:WITH-TRANSACTION NIL #))) (SB-C::ORIGINAL-SOURCE-START 0 4) (:COMPILE-TOPLEVEL)) 25: (SB-C::PROCESS-TOPLEVEL-FORM (EVAL-WHEN (:COMPILE-TOPLEVEL :LOAD- TOPLEVEL :EXECUTE) (RUCKSACK:WITH-RUCKSACK (CL-BIO::RUCKSACK CL- BIO::*BIO-RUCKSACK*) (RUCKSACK:WITH-TRANSACTION NIL #))) (SB- C::ORIGINAL-SOURCE-START 0 4) NIL) From alemmens at xs4all.nl Fri Jan 12 22:32:31 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Fri, 12 Jan 2007 23:32:31 +0100 Subject: [rucksack-devel] PATCH: without-rucksack-gcing In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: Cyrus Harmon wrote: >> I'm wondering about 
that. Can you interrupt it and have a look at >> the stack? > > looks like it's gc'ing That sounds plausible to me. > although this (or something else at least) has been going on for quite > a while now: Fine tuning the settings for when and how much to GC could probably help a lot here. No, I don't have specific suggestions at the moment. But I'm sure you can come up with some improvements if you play with this for a while. Arthur From ch-rucksack at bobobeach.com Fri Jan 12 22:55:18 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 14:55:18 -0800 Subject: [rucksack-devel] PATCH: without-rucksack-gcing In-Reply-To: References: <75B238A7-74DF-4979-B3D6-DF16D45D0350@bobobeach.com> <25B50112-D12D-4736-B0B9-6A0E90F9B2B6@bobobeach.com> <9BC7E70D-D688-44C8-9207-FDD8E8AC1F78@bobobeach.com> <9D070E23-604C-433D-909C-5445FD9D4FBE@bobobeach.com> <14D26DDE-6F2B-40A9-B3FB-D3246C236646@bobobeach.com> Message-ID: On Jan 12, 2007, at 2:32 PM, Arthur Lemmens wrote: > Fine tuning the settings for when and how much to GC could probably > help a lot here. No, I don't have specific suggestions at the moment. > But I'm sure you can come up with some improvements if you play with > this for a while. Hmm... my knowledge of the garbage collector and what it's doing is pretty limited. I know how to turn it off. Eventually, it seems, there is a GC bill we must pay. I don't see how tweaking the settings is going to change that, but perhaps I'm just not aware of the proper settings to tweak. Cyrus From ch-rucksack at bobobeach.com Fri Jan 12 23:39:42 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 15:39:42 -0800 Subject: [rucksack-devel] map-slot performance issues Message-ID: So it seems as if map-slot over big indices is rather slow. We seem to be spending a lot of time down inside map-btree-keys-for-node. I could, sort of, understand this for the first access to the rucksack, but even subsequent accesses can take a while. 
For reference the "query" is map-slot with :min "Canis" and :max "Canisz".

I'm guessing that the sequential scan through all the nodes of the
b-tree is killing performance, BICBW, and even if I'm right there may
be other things going on here...

Cyrus

From ch-rucksack at bobobeach.com Fri Jan 12 23:54:55 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Fri, 12 Jan 2007 15:54:55 -0800
Subject: [rucksack-devel] SBCL fake_foreign_call falling through
Message-ID: <25F4438C-BA27-4331-A3E5-4B8215D93741@bobobeach.com>

I'm seeing a fair number of these kinds of errors, which are not pleasant:

fatal error encountered in SBCL pid 29896:
fake_foreign_call fell through

LDB monitor
ldb> backtrace
Backtrace:
   0: Foreign function ldb_monitor, fp = 0x2206948, ra = 0x70d7
   1: Foreign function lose, fp = 0x2206978, ra = 0x586e
   2: Foreign function arch_install_interrupt_handlers, fp = 0x22069b8, ra = 0xd7e0
   3: Foreign function _sigtramp, fp = 0x22069d8, ra = 0x9011110c
   4: Foreign fp = 0x2206d2c, ra = 0xffffffff
   5: (SB-C::TL-XEP SB-IMPL::FD-SOUT)

(this is with ~1M objects in the rucksack). I think we're putting too
many objects in the cache when we scan the b-trees to find nodes, but
still, that shouldn't cause SBCL to blow up so horribly.

Cyrus

From ch-rucksack at bobobeach.com Sat Jan 13 03:19:17 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Fri, 12 Jan 2007 19:19:17 -0800
Subject: [rucksack-devel] map-slot performance issues
In-Reply-To:
References:
Message-ID:

Well, this is a bit of a hack and certainly a better approach would
be to do binary search, but changing the max b-tree node size to 32
instead of 100 greatly improves performance.

Cyrus

On Jan 12, 2007, at 3:39 PM, Cyrus Harmon wrote:

>
> So it seems as if map-slot over big indices is rather slow. We seem
> to be spending a lot of time down inside map-btree-keys-for-node. I
> could, sort of, understand this for the first access to the
> rucksack, but even subsequent accesses can take a while.
For > reference the "query" is map-slot with :min "Canis" and :max "Canisz". > > I'm guessing that the sequential scan through all the nodes of the > b-tree is killing performance, BICBW, and even if I'm right there > may be other things going on here... > > Cyrus > > _______________________________________________ > rucksack-devel mailing list > rucksack-devel at common-lisp.net > http://common-lisp.net/cgi-bin/mailman/listinfo/rucksack-devel From ch-rucksack at bobobeach.com Sat Jan 13 05:50:25 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 21:50:25 -0800 Subject: [rucksack-devel] losing data? Message-ID: <96B7FCBE-F6A4-4ADF-91B0-A9B239768916@bobobeach.com> so I don't have a reproducible test case for this yet, but I figured I'd mention it. It seems as though a very small number of objects are either 1) not getting created when I think they should be or 2) lost at some later point (gc?). I'll keep an eye on this and see if I can reproduce it, but if it is reproducible that would be a bit troubling. Cyrus From ch-rucksack at bobobeach.com Sat Jan 13 07:49:25 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Fri, 12 Jan 2007 23:49:25 -0800 Subject: [rucksack-devel] losing data? In-Reply-To: <96B7FCBE-F6A4-4ADF-91B0-A9B239768916@bobobeach.com> References: <96B7FCBE-F6A4-4ADF-91B0-A9B239768916@bobobeach.com> Message-ID: never mind. I think I've found the bug in my code. On Jan 12, 2007, at 9:50 PM, Cyrus Harmon wrote: > so I don't have a reproducible test case for this yet, but I > figured I'd mention it. It seems as though a very small number of > objects are either 1) not getting created when I think they should > be or 2) lost at some later point (gc?). I'll keep an eye on this > and see if I can reproduce it, but if it is reproducible that would > be a bit troubling. 
> > Cyrus > > _______________________________________________ > rucksack-devel mailing list > rucksack-devel at common-lisp.net > http://common-lisp.net/cgi-bin/mailman/listinfo/rucksack-devel From alemmens at xs4all.nl Sat Jan 13 10:50:03 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Sat, 13 Jan 2007 11:50:03 +0100 Subject: [rucksack-devel] map-slot performance issues In-Reply-To: References: Message-ID: Cyrus Harmon wrote: > Well, this is a bit of a hack and certainly a better approach would > be to do binary search, but changing the max b-tree node size to 32 > instead of 100 greatly improves performance. OK, good. Did you try even smaller sizes? Arthur From ch-rucksack at bobobeach.com Sat Jan 13 15:06:04 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Sat, 13 Jan 2007 07:06:04 -0800 Subject: [rucksack-devel] map-slot performance issues In-Reply-To: References: Message-ID: On Jan 13, 2007, at 2:50 AM, Arthur Lemmens wrote: > Cyrus Harmon wrote: > >> Well, this is a bit of a hack and certainly a better approach would >> be to do binary search, but changing the max b-tree node size to 32 >> instead of 100 greatly improves performance. > > OK, good. Did you try even smaller sizes? No, but I do have a patch that does binary search on the nodes, which seems marginally faster, but I haven't measured the difference yet. Cyrus From ch-rucksack at bobobeach.com Sun Jan 14 19:02:34 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Sun, 14 Jan 2007 11:02:34 -0800 Subject: [rucksack-devel] binary search in find-subnode Message-ID: <99C8257C-9E39-4169-8EB3-F6FB530DA5A0@bobobeach.com> Well, it took me longer than I'd care to admit to get this right, but here's what I came up with for the binary search routine. 
I'm sure this could be cleaned up a bit, but if anyone with a fresh set of eyes wants to take a look at this, I'd appreciate it: (defun find-subnode (btree node key) "Returns the subnode that contains more information for the given key." (let ((btree-key< (btree-key< btree)) (last (1- (btree-node-index-count node)))) (labels ((binary-search (start end) (let* ((mid (+ start (truncate (- end start) 2))) (mid-binding (node-binding node mid))) (if (= start mid) (if (not (funcall btree-key< (binding-key mid- binding) key)) (binding-value mid-binding) (binding-value (node-binding node (1+ mid)))) (if (not (funcall btree-key< (binding-key mid- binding) key)) (binary-search start mid) (binary-search mid end)))))) (if (funcall btree-key< (binding-key (node-binding node (1- last))) key) (binding-value (node-binding node last)) (binary-search 0 (1- last))))) ;;; this is the old (linear search) version kept here for reference ;;; for the moment #+nil (progn (loop with btree-key< = (btree-key< btree) with last-index = (1- (btree-node-index-count node)) for i to last-index for binding = (node-binding node i) when (or (= i last-index) (funcall btree-key< key (binding-key binding)) (not (funcall btree-key< (binding-key binding) key))) do (return-from find-subnode (binding-value binding))) (error "This shouldn't happen."))) Thanks, Cyrus From ch-rucksack at bobobeach.com Sun Jan 14 19:19:20 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Sun, 14 Jan 2007 11:19:20 -0800 Subject: [rucksack-devel] binary search in find-subnode In-Reply-To: <99C8257C-9E39-4169-8EB3-F6FB530DA5A0@bobobeach.com> References: <99C8257C-9E39-4169-8EB3-F6FB530DA5A0@bobobeach.com> Message-ID: <00C6355E-9537-4C6A-9D29-A1D0F0F699F2@bobobeach.com> And for find-binding-in-node: (defun find-subnode (btree node key) "Returns the subnode that contains more information for the given key." 
(let ((btree-key< (btree-key< btree)) (last (1- (btree-node-index-count node)))) (labels ((binary-search (start end) (let* ((mid (+ start (truncate (- end start) 2))) (mid-binding (node-binding node mid))) (if (= start mid) (if (not (funcall btree-key< (binding-key mid- binding) key)) (binding-value mid-binding) (binding-value (node-binding node (1+ mid)))) (if (not (funcall btree-key< (binding-key mid- binding) key)) (binary-search start mid) (binary-search mid end)))))) (if (funcall btree-key< (binding-key (node-binding node (1- last))) key) (binding-value (node-binding node last)) (binary-search 0 (1- last))))) ;;; this is the old (linear search) version kept here for reference ;;; for the moment #+nil (progn (loop with btree-key< = (btree-key< btree) with last-index = (1- (btree-node-index-count node)) for i to last-index for binding = (node-binding node i) when (or (= i last-index) (funcall btree-key< key (binding-key binding)) (not (funcall btree-key< (binding-key binding) key))) do (return-from find-subnode (binding-value binding))) (error "This shouldn't happen."))) Cyrus On Jan 14, 2007, at 11:02 AM, Cyrus Harmon wrote: > > Well, it took me longer than I'd care to admit to get this right, > but here's what I came up with for the binary search routine. I'm > sure this could be cleaned up a bit, but if anyone with a fresh set > of eyes wants to take a look at this, I'd appreciate it: > > (defun find-subnode (btree node key) > "Returns the subnode that contains more information for the given > key." 
> (let ((btree-key< (btree-key< btree)) > (last (1- (btree-node-index-count node)))) > (labels ((binary-search (start end) > (let* ((mid (+ start (truncate (- end start) 2))) > (mid-binding (node-binding node mid))) > (if (= start mid) > (if (not (funcall btree-key< (binding-key mid- > binding) key)) > (binding-value mid-binding) > (binding-value (node-binding node (1+ mid)))) > (if (not (funcall btree-key< (binding-key mid- > binding) key)) > (binary-search start mid) > (binary-search mid end)))))) > (if (funcall btree-key< (binding-key (node-binding node (1- > last))) key) > (binding-value (node-binding node last)) > (binary-search 0 (1- last))))) > ;;; this is the old (linear search) version kept here for reference > ;;; for the moment > #+nil > (progn > (loop with btree-key< = (btree-key< btree) > with last-index = (1- (btree-node-index-count node)) > for i to last-index > for binding = (node-binding node i) > when (or (= i last-index) > (funcall btree-key< key (binding-key binding)) > (not (funcall btree-key< (binding-key binding) key))) > do (return-from find-subnode (binding-value binding))) > (error "This shouldn't happen."))) > > > Thanks, > > Cyrus > > _______________________________________________ > rucksack-devel mailing list > rucksack-devel at common-lisp.net > http://common-lisp.net/cgi-bin/mailman/listinfo/rucksack-devel From ch-rucksack at bobobeach.com Sun Jan 14 19:35:25 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Sun, 14 Jan 2007 11:35:25 -0800 Subject: [rucksack-devel] binary search in find-subnode In-Reply-To: <00C6355E-9537-4C6A-9D29-A1D0F0F699F2@bobobeach.com> References: <99C8257C-9E39-4169-8EB3-F6FB530DA5A0@bobobeach.com> <00C6355E-9537-4C6A-9D29-A1D0F0F699F2@bobobeach.com> Message-ID: <55844ECE-4BD9-4C7C-9550-013D0262A44E@bobobeach.com> Ah, whoops. Pasted wrong function. But the one I have doesn't work either. Will resend in a moment after I (hopefully) fix it. 
Cyrus On Jan 14, 2007, at 11:19 AM, Cyrus Harmon wrote: > And for find-binding-in-node: > > (defun find-subnode (btree node key) > "Returns the subnode that contains more information for the given > key." > (let ((btree-key< (btree-key< btree)) > (last (1- (btree-node-index-count node)))) > (labels ((binary-search (start end) > (let* ((mid (+ start (truncate (- end start) 2))) > (mid-binding (node-binding node mid))) > (if (= start mid) > (if (not (funcall btree-key< (binding-key mid- > binding) key)) > (binding-value mid-binding) > (binding-value (node-binding node (1+ mid)))) > (if (not (funcall btree-key< (binding-key mid- > binding) key)) > (binary-search start mid) > (binary-search mid end)))))) > (if (funcall btree-key< (binding-key (node-binding node (1- > last))) key) > (binding-value (node-binding node last)) > (binary-search 0 (1- last))))) > ;;; this is the old (linear search) version kept here for reference > ;;; for the moment > #+nil > (progn > (loop with btree-key< = (btree-key< btree) > with last-index = (1- (btree-node-index-count node)) > for i to last-index > for binding = (node-binding node i) > when (or (= i last-index) > (funcall btree-key< key (binding-key binding)) > (not (funcall btree-key< (binding-key binding) key))) > do (return-from find-subnode (binding-value binding))) > (error "This shouldn't happen."))) > > Cyrus > > > > On Jan 14, 2007, at 11:02 AM, Cyrus Harmon wrote: > >> >> Well, it took me longer than I'd care to admit to get this right, >> but here's what I came up with for the binary search routine. I'm >> sure this could be cleaned up a bit, but if anyone with a fresh >> set of eyes wants to take a look at this, I'd appreciate it: >> >> (defun find-subnode (btree node key) >> "Returns the subnode that contains more information for the >> given key." 
>> (let ((btree-key< (btree-key< btree)) >> (last (1- (btree-node-index-count node)))) >> (labels ((binary-search (start end) >> (let* ((mid (+ start (truncate (- end start) 2))) >> (mid-binding (node-binding node mid))) >> (if (= start mid) >> (if (not (funcall btree-key< (binding-key mid- >> binding) key)) >> (binding-value mid-binding) >> (binding-value (node-binding node (1+ >> mid)))) >> (if (not (funcall btree-key< (binding-key mid- >> binding) key)) >> (binary-search start mid) >> (binary-search mid end)))))) >> (if (funcall btree-key< (binding-key (node-binding node (1- >> last))) key) >> (binding-value (node-binding node last)) >> (binary-search 0 (1- last))))) >> ;;; this is the old (linear search) version kept here for reference >> ;;; for the moment >> #+nil >> (progn >> (loop with btree-key< = (btree-key< btree) >> with last-index = (1- (btree-node-index-count node)) >> for i to last-index >> for binding = (node-binding node i) >> when (or (= i last-index) >> (funcall btree-key< key (binding-key binding)) >> (not (funcall btree-key< (binding-key binding) key))) >> do (return-from find-subnode (binding-value binding))) >> (error "This shouldn't happen."))) >> >> >> Thanks, >> >> Cyrus >> >> _______________________________________________ >> rucksack-devel mailing list >> rucksack-devel at common-lisp.net >> http://common-lisp.net/cgi-bin/mailman/listinfo/rucksack-devel > > _______________________________________________ > rucksack-devel mailing list > rucksack-devel at common-lisp.net > http://common-lisp.net/cgi-bin/mailman/listinfo/rucksack-devel From ch-rucksack at bobobeach.com Sun Jan 14 22:26:43 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Sun, 14 Jan 2007 14:26:43 -0800 Subject: [rucksack-devel] PATCH: b-tree binary search Message-ID: <2977A78A-01EA-45F5-960E-E6DAE8F65033@bobobeach.com> Ok, I think this is working now: --- p-btrees.lisp 26 Aug 2006 05:55:34 -0700 1.10 +++ p-btrees.lisp 14 Jan 2007 11:51:57 -0800 @@ -495,30 +495,69 
@@ (defun find-subnode (btree node key) "Returns the subnode that contains more information for the given key." - ;; Find the first binding with a key >= the given key and return - ;; the corresponding subnode. - ;; EFFICIENCY: We should probably use binary search for this. - (loop with btree-key< = (btree-key< btree) - with last-index = (1- (btree-node-index-count node)) - for i to last-index - for binding = (node-binding node i) - when (or (= i last-index) - (funcall btree-key< key (binding-key binding)) - (not (funcall btree-key< (binding-key binding) key))) - do (return-from find-subnode (binding-value binding))) - (error "This shouldn't happen.")) + (let ((btree-key< (btree-key< btree)) + (last (1- (btree-node-index-count node)))) + (labels ((binary-search (start end) + (let* ((mid (+ start (truncate (- end start) 2))) + (mid-binding (node-binding node mid))) + (if (= start mid) + (if (not (funcall btree-key< (binding-key mid- binding) key)) + (binding-value mid-binding) + (binding-value (node-binding node (1+ mid)))) + (if (not (funcall btree-key< (binding-key mid- binding) key)) + (binary-search start mid) + (binary-search mid end)))))) + (if (funcall btree-key< (binding-key (node-binding node (1- last))) key) + (binding-value (node-binding node last)) + (binary-search 0 (1- last))))) + ;;; this is the old (linear search) version kept here for reference + ;;; for the moment + #+nil + (progn + (loop with btree-key< = (btree-key< btree) + with last-index = (1- (btree-node-index-count node)) + for i to last-index + for binding = (node-binding node i) + when (or (= i last-index) + (funcall btree-key< key (binding-key binding)) + (not (funcall btree-key< (binding-key binding) key))) + do (return-from find-subnode (binding-value binding))) + (error "This shouldn't happen."))) (defun find-binding-in-node (key node btree) + (let ((btree-key< (btree-key< btree)) + (array (btree-node-index node)) + (index-count (btree-node-index-count node))) + (labels ((binary-search 
(start end) + (let* ((mid (+ start (truncate (- end start) 2))) + (mid-binding (p-aref array mid)) + (mid-key (binding-key mid-binding))) + (if (= start mid) + (if (not (funcall btree-key< (binding-key mid- binding) key)) + (when (funcall (btree-key= btree) key mid-key) + mid-binding) + (when (< mid end) + (let* ((next-binding (p-aref array (1+ mid))) + (next-key (binding-key next- binding))) + (when (funcall (btree-key= btree) key next-key) + next-binding)))) + (if (not (funcall btree-key< (binding-key mid- binding) key)) + (binary-search start mid) + (binary-search (1+ mid) end)))))) + (when (plusp index-count) + (binary-search 0 (1- index-count))))) + + #+nil (let ((index-count (btree-node-index-count node))) (and (plusp index-count) (loop with array = (btree-node-index node) - with btree-key< = (btree-key< btree) - for i from 0 below index-count - for candidate = (p-aref array i) - for candidate-key = (binding-key candidate) - while (funcall btree-key< candidate-key key) - finally (when (funcall (btree-key= btree) key candidate-key) - (return candidate)))))) + with btree-key< = (btree-key< btree) + for i from 0 below index-count + for candidate = (p-aref array i) + for candidate-key = (binding-key candidate) + while (funcall btree-key< candidate-key key) + finally (when (funcall (btree-key= btree) key candidate- key) + (return candidate)))))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;;;;;; ;;; Insert From alemmens at xs4all.nl Sun Jan 14 23:02:31 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Mon, 15 Jan 2007 00:02:31 +0100 Subject: [rucksack-devel] PATCH: b-tree binary search In-Reply-To: <2977A78A-01EA-45F5-960E-E6DAE8F65033@bobobeach.com> References: <2977A78A-01EA-45F5-960E-E6DAE8F65033@bobobeach.com> Message-ID: Cyrus Harmon wrote: > + (let ((btree-key< (btree-key< btree)) > + (last (1- (btree-node-index-count node)))) > + (labels ((binary-search (start end) > + (let* ((mid (+ start (truncate (- end start) 2))) > + 
(mid-binding (node-binding node mid)))
> +             (if (= start mid)
> +                 (if (not (funcall btree-key< (binding-key mid-binding) key))
> +                     (binding-value mid-binding)
> +                     (binding-value (node-binding node (1+ mid))))

I would be inclined to get rid of the IF NOT by reversing the order
of the branches:

   (if (funcall btree-key< (binding-key mid-binding) key)
       (binding-value (node-binding node (1+ mid)))
       (binding-value mid-binding))

But that's no big deal.

> +                 (if (not (funcall btree-key< (binding-key mid-binding) key))
> +                     (binary-search start mid)
> +                     (binary-search mid end))))))

Same here.

> +      (if (funcall btree-key< (binding-key (node-binding node (1- last))) key)
> +          (binding-value (node-binding node last))
> +          (binary-search 0 (1- last)))))

I find the (1- LAST) a bit suspicious here, because LAST is already
(1- (BTREE-NODE-INDEX-COUNT NODE)). So if a node has only one element,
this would fail, right? (It's been a while, so I don't even remember
if it's possible that a node has only 1 element. But I'd feel safer
if you wrote this part in such a way that it doesn't depend on a minimum
number of node elements.)
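
As an illustration of a search that does not depend on a minimum node
size, here is a minimal, self-contained sketch. It is not the Rucksack
code: FIND-SUBNODE-SKETCH is a hypothetical name, the node is assumed
to be a plain vector of (key . value) conses with at least one element,
and the key of the last binding is assumed never to be comparable, so
the search never looks at it.

```lisp
;; Sketch only: BINDINGS stands in for a btree node's index and is
;; assumed to be a non-empty vector of (key . value) conses sorted by
;; key, whose last key is a placeholder that must not be compared.

(defun find-subnode-sketch (bindings key key<)
  "Return the value of the first binding whose key is >= KEY.
The last binding is the fallback; its key is never compared."
  (let ((low 0)
        (high (1- (length bindings))))   ; index of the fallback binding
    ;; Invariant: every binding before LOW has a key < KEY; the binding
    ;; at HIGH has a key >= KEY or is the fallback binding.
    (loop while (< low high)
          do (let ((mid (+ low (floor (- high low) 2))))
               ;; MID < HIGH here, so the fallback key is never read.
               (if (funcall key< (car (aref bindings mid)) key)
                   (setf low (1+ mid))
                   (setf high mid))))
    (cdr (aref bindings low))))
```

With a single binding the loop body never runs and the fallback value
is returned, so the one-element case needs no special handling.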
Thanks, Arthur From ch-rucksack at bobobeach.com Sun Jan 14 23:10:32 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Sun, 14 Jan 2007 15:10:32 -0800 Subject: [rucksack-devel] PATCH: b-tree binary search In-Reply-To: References: <2977A78A-01EA-45F5-960E-E6DAE8F65033@bobobeach.com> Message-ID: <51756822-88B7-4FDE-A0A0-2CDAB3822931@bobobeach.com> On Jan 14, 2007, at 3:02 PM, Arthur Lemmens wrote: > Cyrus Harmon wrote: > >> + (let ((btree-key< (btree-key< btree)) >> + (last (1- (btree-node-index-count node)))) >> + (labels ((binary-search (start end) >> + (let* ((mid (+ start (truncate (- end start) 2))) >> + (mid-binding (node-binding node mid))) >> + (if (= start mid) >> + (if (not (funcall btree-key< (binding-key >> mid-binding) key)) >> + (binding-value mid-binding) >> + (binding-value (node-binding node (1+ >> mid)))) > > I would be inclined to get rid of the IF NOT by reversing the order > of the branches: > > (if (funcall btree-key< (binding-key mid-binding) key) > (binding-value (node-binding node (1+ mid)) > (binding-value mid-binding)) > > But that's no big deal. Yes, that seems reasonable. > >> + (if (not (funcall btree-key< (binding-key >> mid-binding) key)) >> + (binary-search start mid) >> + (binary-search mid end)))))) > > Same here. ditto > >> + (if (funcall btree-key< (binding-key (node-binding node (1- >> last))) key) >> + (binding-value (node-binding node last)) >> + (binary-search 0 (1- last))))) > > I find the (1- LAST) a bit suspicious here, because LAST is already > (1- (BTREE-NODE-INDEX-COUNT NODE)). So if a node has only one > element, > this would fail, right? (It's been a while, so I don't even remember > if it's possible that a node has only 1 element. But I'd feel safer > if you wrote this part in such a way that it doesn't depend on a > minimum > number of node elements.) 
My understanding from reading the code was that the last element was somehow special and has a weird key value that we can't do btree-key< on and that this was detected in the old version by the presence of the (= i last-index), but we still return the (binding-value ...) for it. It took me a while to figure this out and ICBW, but that's what it looked like to me. Note that this only applies to find-subnode, not find-binding in node, or so it seemed to me. Cyrus From ch-rucksack at bobobeach.com Sun Jan 14 23:15:55 2007 From: ch-rucksack at bobobeach.com (Cyrus Harmon) Date: Sun, 14 Jan 2007 15:15:55 -0800 Subject: [rucksack-devel] PATCH: b-tree binary search In-Reply-To: <51756822-88B7-4FDE-A0A0-2CDAB3822931@bobobeach.com> References: <2977A78A-01EA-45F5-960E-E6DAE8F65033@bobobeach.com> <51756822-88B7-4FDE-A0A0-2CDAB3822931@bobobeach.com> Message-ID: <7DFDD717-8FE3-4035-A209-9470C6423774@bobobeach.com> On Jan 14, 2007, at 3:10 PM, Cyrus Harmon wrote: > On Jan 14, 2007, at 3:02 PM, Arthur Lemmens wrote: >> Cyrus Harmon wrote: >> >> >>> + (if (not (funcall btree-key< (binding-key >>> mid-binding) key)) >>> + (binary-search start mid) >>> + (binary-search mid end)))))) >> >> Same here. > > ditto hmm... should that be (binary-search (1+ mid) end) ? From levente.meszaros at gmail.com Mon Jan 15 11:46:30 2007 From: levente.meszaros at gmail.com (=?ISO-8859-1?Q?Levente_M=E9sz=E1ros?=) Date: Mon, 15 Jan 2007 12:46:30 +0100 Subject: [rucksack-devel] Export Message-ID: Hi, First of all, rucksack is great in many ways, thanks for the efforts! While I was thinking of the export machinery I came to the following idea. The algorithm traverses the whole database (optionally only partial) and dumps it to several lisp source files. Loading and running those lisp files will import the database into another rucksack. Splitting the database into finite parts which can be compiled and evaluated is simple I think and this solves the huge lambda problem. 
Storing the object -> identity mapping during export and the identity ->
object mapping during import could be done by using a fresh and lazy
persistent hashtable which could be garbage collected after
export/import. I don't know if there's any support for persistent
hashtables which are not saved/loaded at once but lazily (since they do
not fit into memory)?

Am I right that btrees are sufficient to make a map like a hashtable?
Is it allowed to put persistent objects into a btree as keys?

Cheers,
levy

--
There's no perfectoin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: serialize.lisp
Type: text/x-lisp-source
Size: 1787 bytes
Desc: not available
URL:

From levente.meszaros at gmail.com Mon Jan 15 11:49:13 2007
From: levente.meszaros at gmail.com (Levente Mészáros)
Date: Mon, 15 Jan 2007 12:49:13 +0100
Subject: [rucksack-devel] Some other things
Message-ID:

Hi,

I have some other questions not directly related to export.

Is it planned to support weak references? Maybe both as a persistent
slot option and as a separate object?

I implemented the serialization of structures using SBCL; this was
fairly trivial, dispatching on structure-object. I did not look at other
implementations, but will you include such patches? Since I am not using
my Linux box now, you can find it in the attachment as a separate file,
not a patch this time.

I also noticed that some slots (rucksack/transaction/etc.) of
persistent classes are persistent-effective-slot-definitions even
though they should not be. I guess this can be fixed in
direct-slot-definition-class and effective-slot-definition-class by
checking on the persistence flag.

I tried to implement a persistent base class called audited-object
which stores the time of creation, the time of last modification, last
modifier, etc.
In principle an :after method on (setf slot-value-using-class) could
do it, but unfortunately I could not distinguish between code setting
the persistent slot outside of rucksack and code internal to rucksack
itself. Maybe rucksack could use a function
slot-value-using-class-internal which would bind a special variable so
that overriding svuc will be able to distinguish between loading the
object from the disk and setting a slot value by the user.

While I like the way with-transaction works, I think most of the time
user code will not pass in any parameters to it (see tests for
example), so maybe it would be better to call the current one
with-transaction* and introduce another one without any parameters
called with-transaction. What's your opinion about this?

If you have other tasks to do and would like, I could spend some time
on these things.

Cheers,
levy

--
There's no perfectoin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: serialize.lisp
Type: text/x-lisp-source
Size: 1787 bytes
Desc: not available
URL:

From kentilton at gmail.com Mon Jan 15 17:08:06 2007
From: kentilton at gmail.com (Ken Tilton)
Date: Mon, 15 Jan 2007 12:08:06 -0500
Subject: [rucksack-devel] Re: rucksack-devel Digest, Vol 9, Issue 13
In-Reply-To: <20070115170240.9F9172E1BD@common-lisp.net>
References: <20070115170240.9F9172E1BD@common-lisp.net>
Message-ID: <45ABB4F6.9090407@gmail.com>

rucksack-devel-request at common-lisp.net wrote:

>Send rucksack-devel mailing list submissions to
>	rucksack-devel at common-lisp.net
>
>To subscribe or unsubscribe via the World Wide Web, visit
>	http://common-lisp.net/cgi-bin/mailman/listinfo/rucksack-devel
>or, via email, send a message with subject or body 'help' to
>	rucksack-devel-request at common-lisp.net
>
>You can reach the person managing the list at
>	rucksack-devel-owner at common-lisp.net
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of rucksack-devel digest..."
> > >Today's Topics: > > 1. Re: PATCH: b-tree binary search (Cyrus Harmon) > 2. Export ( Levente M?sz?ros ) > 3. Some other things ( Levente M?sz?ros ) > > >---------------------------------------------------------------------- > >Message: 1 >Date: Sun, 14 Jan 2007 15:15:55 -0800 >From: Cyrus Harmon >Subject: Re: [rucksack-devel] PATCH: b-tree binary search >To: Arthur Lemmens >Cc: Rucksack CL persistence library >Message-ID: <7DFDD717-8FE3-4035-A209-9470C6423774 at bobobeach.com> >Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed > > >On Jan 14, 2007, at 3:10 PM, Cyrus Harmon wrote: > > >>On Jan 14, 2007, at 3:02 PM, Arthur Lemmens wrote: >> >> >>>Cyrus Harmon wrote: >>> >>> >>> >>> >>>>+ (if (not (funcall btree-key< (binding-key >>>>mid-binding) key)) >>>>+ (binary-search start mid) >>>>+ (binary-search mid end)))))) >>>> >>>> >>>Same here. >>> >>> >>ditto >> >> > >hmm... should that be (binary-search (1+ mid) end) ? > > Not if it is traditionally Lispy: (let ((s "abc123") (m 3)) (list (subseq s 0 m)(subseq s m))) -> "abc123" kt -------------- next part -------------- A non-text attachment was scrubbed... Name: kentilton.vcf Type: text/x-vcard Size: 171 bytes Desc: not available URL: From alemmens at xs4all.nl Mon Jan 15 20:58:23 2007 From: alemmens at xs4all.nl (Arthur Lemmens) Date: Mon, 15 Jan 2007 21:58:23 +0100 Subject: [rucksack-devel] PATCH: b-tree binary search In-Reply-To: <51756822-88B7-4FDE-A0A0-2CDAB3822931@bobobeach.com> References: <2977A78A-01EA-45F5-960E-E6DAE8F65033@bobobeach.com> <51756822-88B7-4FDE-A0A0-2CDAB3822931@bobobeach.com> Message-ID: Cyrus Harmon wrote: > My understanding from reading the code was that the last element was > somehow special and has a weird key value that we can't do btree-key< > on and that this was detected in the old version by the presence of > the (= i last-index), but we still return the (binding-value ...) for > it. Yes, I think you're right. 
The key for the last element is the symbol KEY-IRRELEVANT. It's irrelevant because the child values for a btree element are all BTREE-KEY<= the key, but the children of the last element are all BTREE-KEY> the key of the previous element. You could say they're less than infinity ;-)

> Note that this only applies to find-subnode, not find-binding-in-node, or so it seemed to me.

Yes, FIND-BINDING-IN-NODE only needs to work for leaf nodes. In leaf nodes, you're looking for an exact key match (according to BTREE-KEY=). This is different from FIND-SUBNODE, which looks for a child node that contains a certain range of keys.

Arthur

From ch-rucksack at bobobeach.com Mon Jan 15 21:34:26 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Mon, 15 Jan 2007 13:34:26 -0800
Subject: [rucksack-devel] Re: rucksack-devel Digest, Vol 9, Issue 13
In-Reply-To: <45ABB4F6.9090407@gmail.com>
References: <20070115170240.9F9172E1BD@common-lisp.net> <45ABB4F6.9090407@gmail.com>
Message-ID: <8E273BF4-B1D2-4E97-BEEC-1D37FF436899@bobobeach.com>

Ok, Kenny's email got me thinking. I think the code was working before, but for reasons I didn't quite understand. The issue has to do with the boundary conditions and how we do the splitting. In particular, what happens when the key we are looking for falls in between the last key of the left child and the first key of the right child. I think it was working properly before, but this version is perhaps a bit more explicit in ensuring that:

(defun find-subnode (btree node key)
  "Returns the subnode that contains more information for the given key."
  ;; Find the first binding with a key >= the given key and return
  ;; the corresponding subnode.
  (let ((btree-key< (btree-key< btree))
        (last (1- (btree-node-index-count node))))
    (labels ((binary-search (start end)
               (let* ((mid (+ start (ash (- end start) -1))))
                 (cond ((= start mid)
                        (let ((start-binding (node-binding node start)))
                          (if (funcall btree-key< (binding-key start-binding) key)
                              (binding-value (node-binding node end))
                              (binding-value start-binding))))
                       (t (let ((mid-binding (node-binding node mid)))
                            (if (funcall btree-key< (binding-key mid-binding) key)
                                (binary-search mid end)
                                (binary-search start mid))))))))
      (if (funcall btree-key< (binding-key (node-binding node (1- last))) key)
          (binding-value (node-binding node last))
          (binary-search 0 last))))
  ;;; this is the old (linear search) version kept here for reference
  ;;; for the moment
  #+nil
  (progn
    (loop with btree-key< = (btree-key< btree)
          with last-index = (1- (btree-node-index-count node))
          for i to last-index
          for binding = (node-binding node i)
          when (or (= i last-index)
                   (funcall btree-key< key (binding-key binding))
                   (not (funcall btree-key< (binding-key binding) key)))
          do (return-from find-subnode (binding-value binding)))
    (error "This shouldn't happen.")))

(defun find-binding-in-node (key node btree)
  (let ((btree-key< (btree-key< btree))
        (array (btree-node-index node))
        (index-count (btree-node-index-count node)))
    (labels ((binary-search (start end)
               (let* ((mid (+ start (ash (- end start) -1))))
                 (cond ((= start mid)
                        (let ((start-binding (p-aref array start)))
                          (if (funcall btree-key< (binding-key start-binding) key)
                              (when (< end index-count)
                                (p-aref array end))
                              start-binding)))
                       (t (let ((mid-binding (p-aref array mid)))
                            (if (funcall btree-key< (binding-key mid-binding) key)
                                (binary-search mid end)
                                (binary-search start mid))))))))
      (when (plusp index-count)
        (let ((candidate (binary-search 0 index-count)))
          (when (and candidate
                     (funcall (btree-key= btree) (binding-key candidate) key))
            candidate)))))
  ;;; this is the old (linear search) version kept here for reference
  ;;; for the moment
  #+nil
  (let ((index-count (btree-node-index-count node)))
    (and (plusp index-count)
         (loop with array = (btree-node-index node)
               with btree-key< = (btree-key< btree)
               for i from 0 below index-count
               for candidate = (p-aref array i)
               for candidate-key = (binding-key candidate)
               while (funcall btree-key< candidate-key key)
               finally (when (funcall (btree-key= btree) key candidate-key)
                         (return candidate))))))

Cyrus

On Jan 15, 2007, at 9:08 AM, Ken Tilton wrote:

>> On Jan 14, 2007, at 3:10 PM, Cyrus Harmon wrote:
>>
>>> On Jan 14, 2007, at 3:02 PM, Arthur Lemmens wrote:
>>>
>>>> Cyrus Harmon wrote:
>>>>
>>>>> +               (if (not (funcall btree-key< (binding-key mid-binding) key))
>>>>> +                   (binary-search start mid)
>>>>> +                   (binary-search mid end))))))
>>>>>
>>>> Same here.
>>>>
>>> ditto
>>>
>>
>> hmm... should that be (binary-search (1+ mid) end) ?
>>
> Not if it is traditionally Lispy:
>
> (let ((s "abc123")
>       (m 3))
>   (list (subseq s 0 m) (subseq s m))) -> ("abc" "123")
>
> kt

From alemmens at xs4all.nl Mon Jan 15 21:59:07 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Mon, 15 Jan 2007 22:59:07 +0100
Subject: [rucksack-devel] Some other things
In-Reply-To:
References:
Message-ID:

Levente Mészáros wrote:

> I have some other questions not directly related to export. Is it planned to support weak references?

I'm not sure what that would mean in the context of Rucksack. Could you explain what this would mean and why you would want them? Somehow a persistent weak pointer sounds like a contradiction in terms to me, but that's probably just my lack of imagination.

> I implemented the serialization of structures using SBCL, this was fairly trivial dispatching on structure-object. I did not look at other implementations but will you include such patches?

Hmm, difficult question.

I'd like Rucksack files to be portable between CL implementations, so you can create a rucksack with one implementation and load it with another implementation later.
I'd also like Rucksack files to be self-describing, so that it's possible to recreate all persistent data even if you don't have the source of the program that created the data. These goals are both difficult to achieve, but I try to keep them in mind when working on Rucksack. When we add support for serializing structures, I think we need a way to detect and record changes to structure definitions, similar to what Rucksack does for classes of metaclass PERSISTENT-CLASS. We'd have to detect the changes, add the diffs to the schema table, etcetera. Otherwise the data in a rucksack will become useless as soon as the programmer changes one DEFSTRUCT form in his program. The problem is: I suspect that this (handling changes to structure definitions) will be impossible to achieve in many (most? all?) CL implementations. And I'd rather not add it to Rucksack for one CL implementation if it's obvious that it will never be implemented for most of the others. But if you can implement this in SBCL and can show that it will be possible to implement in a few other implementations, I'll gladly add it to Rucksack. > I also noticed that some slots (rucksack/transaction/etc.) of > persistent classes are persistent-effective-slot-definitions even > though they should not be. I guess this can be fixed in > direct-slot-definition-class and effective-slot-definition-class by > checking on the persistence flag. I don't really understand what you mean. Could you be more specific? > I tried to implement a persistent base class called audited-object > which stores the time of creation, the time of last modification, last > modifier, etc. In principle an :after method on (setf > slot-value-using-class) could make it, but unfortunately I could not > distinguish between code setting the persistent slot outside of > rucksack and code internal to rucksack itself. 
> Maybe rucksack could use a function slot-value-using-class-internal which would bind a special variable so that overriding svuc will be able to distinguish between loading the object from the disk and setting a slot value by the user.

Yes, interesting idea. I'd be inclined to create a new metaclass, say AUDITED-PERSISTENT-CLASS, which inherits from PERSISTENT-CLASS, and define a new SETF SLOT-VALUE-USING-CLASS method for classes of that metaclass. Your new method could then do tricks with special variables (if that's really necessary).

> While I like the way with-transaction works I think most of the time user code will not pass in any parameters to it (see tests for example), so maybe it would be better to call the current one with-transaction* and introduce another one without any parameters called with-transaction. What's your opinion about this?

You mean you don't like the extra pair of parentheses after WITH-TRANSACTION? Like

  (with-transaction ()
    (foo)
    (bar))

I love that empty pair of parentheses. They give me a feeling of closure...

Arthur

From ch-rucksack at bobobeach.com Tue Jan 16 09:08:43 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Tue, 16 Jan 2007 01:08:43 -0800
Subject: [rucksack-devel] Re: first commit ready
In-Reply-To:
References:
Message-ID:

Ok, we're at 0.1.4 now, with 0.1.x corresponding to 1-4 below.

>> 1. the slot-definition unique stuff
>>
>> 2. change the max b-tree size to 32
>>
>> 3. use binary search in the btree lookup stuff
>>
>> 4. a keyword arg (:inhibit-gc nil) to with-transaction that can be used to inhibit GC (and corresponding support in transactions.lisp)

On Jan 16, 2007, at 12:13 AM, Arthur Lemmens wrote:

> Yes, very good. Thanks for doing this. It's very nice to have someone like you pick up the thread.

My pleasure! Thanks for getting the ball started and doing most of the hard work.

>> and, if so, do you want them as separate patches?
>
> Yes, I think that would be slightly better.
> But it's no big deal.

done

> BTW, did you run some of the btree tests in test.lisp to check that the binary search doesn't break something?

Ah, I had looked at them a bit, but haven't been running them. Following up on Attila's comment, whether it's with 5AM or RT or hand-coded, it would be nice to have a standard test suite. Who knows, perhaps it's there already, but if so, I'm not sure how to invoke it. I see a few functions that can be called manually, but it would be nice to, for instance, do (asdf:oos 'asdf:test-op 'rucksack) and have things just work.

> And are you aware that the btree-stress-test function will sometimes fail if you run it for a long time? (I think the failure indicates that there's a very rarely occurring bug in the btree deletion routines. Finding and fixing this bug is on the top of my list of things to do to make Rucksack more reliable.)

No, I wasn't aware of this. Do we have a TRAC or other bug tracker set up for tracking things like this?

Thanks,

Cyrus

From ch-rucksack at bobobeach.com Tue Jan 16 09:25:50 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Tue, 16 Jan 2007 01:25:50 -0800
Subject: [rucksack-devel] bug in rucsack-remove-class-index?
Message-ID: <989E770C-0D12-4AB2-9E75-7A56F3CA0039@bobobeach.com>

I think there's a bug in here:

(defmethod rucksack-remove-class-index ((rucksack standard-rucksack) class
                                        &key (errorp nil))
  (unless (symbolp class)
    (setq class (class-name class)))
  (handler-bind ((btree-deletion-error
                  ;; Translate a btree error to something that makes more sense
                  ;; in this context.
                  (lambda (error)
                    (declare (ignore error))
                    (simple-rucksack-error "Class index for ~S doesn't exist in ~A."
                                           class
                                           rucksack))))
    (btree-delete-key class
                      :if-does-not-exist (if errorp :error :ignore))))

I think the btree-delete-key at the end is wrong as that takes a btree and a key as required args, instead of class. Not sure what the fix is, but this looks bogus.

Cyrus

From alemmens at xs4all.nl Tue Jan 16 09:32:54 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Tue, 16 Jan 2007 10:32:54 +0100
Subject: [rucksack-devel] bug in rucsack-remove-class-index?
In-Reply-To: <989E770C-0D12-4AB2-9E75-7A56F3CA0039@bobobeach.com>
References: <989E770C-0D12-4AB2-9E75-7A56F3CA0039@bobobeach.com>
Message-ID:

Cyrus Harmon wrote:

>     (btree-delete-key class
>                       :if-does-not-exist (if errorp :error :ignore))))
>
> I think the btree-delete-key at the end is wrong as that takes a btree and a key as required args, instead of class. Not sure what the fix is, but this looks bogus.

Yes. I haven't looked at this for very long, but it should probably be

  (btree-delete-key (class-index-table rucksack) class
                    :if-does-not-exist (if errorp :error :ignore))

Arthur

From ch-rucksack at bobobeach.com Tue Jan 16 09:47:32 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Tue, 16 Jan 2007 01:47:32 -0800
Subject: [rucksack-devel] thoughts on deferring GC
Message-ID: <7BDA2663-8D00-4375-9A93-40CE63D77204@bobobeach.com>

So, in order to get my persistent object creation stuff working properly, I've had to disable rucksack's GC-ing. This brings up a couple of questions:

1. Do we want an interface for doing GC at some point? I see collect-some-garbage, which is called from with-transaction, but only to GC an amount proportional to the amount of space allocated in that transaction. How do we go about recovering the additional space from the transactions with :inhibit-gc t?

2. Do we ever reclaim the disk space, besides just freeing the blocks up for later use?

3. The GC code seems to give SBCL fits. I'm a bit concerned about the fake_foreign_call falling through stuff, which is basically a SIGILL or SIGBUS in SBCL. These occasionally happen under rare circumstances with SBCL, but GC, both during the transaction and when done after the fact, seems to greatly increase the likelihood of this happening.
Either we're doing something funky or we're tickling an SBCL bug that's causing it to go haywire.

Thanks,

Cyrus

From ch-rucksack at bobobeach.com Tue Jan 16 09:56:47 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Tue, 16 Jan 2007 01:56:47 -0800
Subject: [rucksack-devel] Re: first commit ready
In-Reply-To:
References:
Message-ID:

On Jan 16, 2007, at 12:13 AM, Arthur Lemmens wrote:

> And are you aware that the btree-stress-test function will sometimes fail if you run it for a long time? (I think the failure indicates that there's a very rarely occurring bug in the btree deletion routines. Finding and fixing this bug is on the top of my list of things to do to make Rucksack more reliable.)

do you mean in the code to delete btrees or the code to remove objects from btrees (surely the latter seems more error-prone)?

Thanks,

Cyrus

From alemmens at xs4all.nl Tue Jan 16 09:57:33 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Tue, 16 Jan 2007 10:57:33 +0100
Subject: [rucksack-devel] Re: first commit ready
In-Reply-To:
References:
Message-ID:

> do you mean in the code to delete btrees or the code to remove objects from btrees (surely the latter seems more error-prone)?

The code to remove objects from btrees.

From alemmens at xs4all.nl Tue Jan 16 10:06:15 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Tue, 16 Jan 2007 11:06:15 +0100
Subject: [rucksack-devel] thoughts on deferring GC
In-Reply-To: <7BDA2663-8D00-4375-9A93-40CE63D77204@bobobeach.com>
References: <7BDA2663-8D00-4375-9A93-40CE63D77204@bobobeach.com>
Message-ID:

Cyrus Harmon wrote:

> 1. Do we want an interface for doing GC at some point?

Yes, I think so. We'd probably want at least the following:

- turn it off completely
- run one complete round of GC
- set the ratio between the amount of GC work and the amount of space allocated
- (maybe) specify some kind of maximum heap size. If specified, the heap may never grow beyond this size.
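
A rough sketch of how such knobs might fit together, purely as an illustration: the names gc-settings, gc-work-for and heap-may-grow-p are hypothetical and do not exist in Rucksack, and the real collector works on the heap rather than on plain numbers.

```lisp
;; Hypothetical GC controls; not Rucksack's actual API.
(defstruct gc-settings
  (enabled-p t)          ; NIL turns collection off completely
  (work-ratio 1.0)       ; amount of GC work per octet allocated
  (max-heap-size nil))   ; NIL means no upper bound on the heap size

(defun gc-work-for (settings octets-allocated)
  "Amount of GC work owed after a transaction that allocated OCTETS-ALLOCATED."
  (if (gc-settings-enabled-p settings)
      (ceiling (* (gc-settings-work-ratio settings) octets-allocated))
      0))

(defun heap-may-grow-p (settings current-size extra)
  "True unless growing the heap by EXTRA octets would exceed the maximum size."
  (let ((limit (gc-settings-max-heap-size settings)))
    (or (null limit)
        (<= (+ current-size extra) limit))))
```

With settings like these, a transaction run with GC inhibited would simply owe no work, and the space it allocated would have to be recovered by later rounds.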

> How do we go about recovering the additional space from the transactions with :inhibit-gc t?

If you turn GC on afterwards, sooner or later it will recover the additional space. Probably ;-)

> 2. Do we ever reclaim the disk space, besides just freeing the blocks up for later use?

Not in this version of the GC: it's a mark and sweep collector that doesn't compact the heap. I've written most of a copying collector, but I haven't committed that to CVS yet because it isn't finished. But in the long run, I think Rucksack should move to a copying GC. It's safer, simpler and it compacts the heap as part of the copying process.

> 3. The GC code seems to give SBCL fits. I'm a bit concerned about the fake_foreign_call falling through stuff, which is basically a SIGILL or SIGBUS in SBCL. These occasionally happen under rare circumstances with SBCL, but GC, both during the transaction, and when done after the fact, seem to greatly increase the likelihood of this happening. Either we're doing something funky or we're tickling an SBCL bug that's causing it to go haywire.

Hmm. If you finish the copying GC (should be relatively simple, I think that all the basic work is already done), you could try that one instead. I can send you the code if you're interested.

Arthur

From alemmens at xs4all.nl Tue Jan 16 10:17:01 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Tue, 16 Jan 2007 11:17:01 +0100
Subject: [rucksack-devel] Export
In-Reply-To:
References:
Message-ID:

Levente Mészáros wrote:

> While I was thinking of the export machinery I came to the following idea. The algorithm traverses the whole database (optionally only partial) and dumps it to several lisp source files. Loading and running those lisp files will import the database into another rucksack.

Yes, I think that's possible. I have the feeling that actually creating Lisp source files (that must be compiled/interpreted) is more complicated than necessary, but it's definitely possible.

> Storing the object -> identity during export

Every persistent object already has a unique ID, so you could just use that during export.

> and identity -> object mapping during import

The object table is basically an ID -> object mapping. Maybe you could use that during import, I'm not sure.

> I don't know if there's any support for persistent hashtables which are not saved/loaded at once but lazily (since they do not fit into memory)?

No, not at the moment. Would be an interesting addition to Rucksack, though.

> Am I right that btrees are sufficient to make a map like a hashtable?

Yes, in the sense that btrees provide a mapping. No, in the sense that btrees do not have the typical hash table characteristics.

> Is it allowed to put persistent objects into a btree as keys?

Yes. If you do that, you may need to provide some or all of the :KEY<, :VALUE=, :KEY-KEY and :VALUE-KEY initargs.

Arthur

From levente.meszaros at gmail.com Tue Jan 16 11:23:25 2007
From: levente.meszaros at gmail.com (Levente Mészáros)
Date: Tue, 16 Jan 2007 12:23:25 +0100
Subject: [rucksack-devel] Some other things
In-Reply-To:
References:
Message-ID:

On 1/15/07, Arthur Lemmens wrote:
> > planned to support weak references?
>
> I'm not sure what that would mean in the context of Rucksack. Could you explain what this would mean and why you would want them?

I thought it could mean the same thing it means for in-memory objects. Objects referred to only by weak references will be garbage collected and the weak references will be set to nil. For example, one can have an instance list for a particular class. This allows fast iteration over instances, and weakness means the object will not be kept unless it is also referred to by some other objects. I think there are more complicated examples where weak references help in building efficient application-specific index structures without disturbing when objects are garbage collected.
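
For in-memory objects, the instance-list idea can be sketched roughly as follows. This is SBCL-only (it relies on sb-ext weak pointers), the names *instances*, track and live-instances are made up for the illustration, and nothing here is part of Rucksack or says how a persistent variant would behave.

```lisp
;; In-memory weak pointers, SBCL only; not a Rucksack API.
(defvar *instances* '()
  "Weak pointers to every tracked object; the list does not keep them alive.")

(defun track (object)
  "Remember OBJECT weakly and return it."
  (push (sb-ext:make-weak-pointer object) *instances*)
  object)

(defun live-instances ()
  "Return the tracked objects that have not been garbage collected."
  (loop for pointer in *instances*
        for object = (sb-ext:weak-pointer-value pointer)
        when object collect object))
```

An object that is referenced only from *instances* may disappear from the result of live-instances after some later garbage collection; an object that is still referenced elsewhere stays in it.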

> > I implemented the serialization of structures using SBCL, this was fairly trivial dispatching on structure-object. I did not look at other implementations but will you include such patches?

> I'd like Rucksack files to be portable between CL implementations, so you can create a rucksack with one implementation and load it with another implementation later. I'd also like Rucksack files to be self-describing, so that it's possible to recreate all persistent data even if you don't have the source of the program that created the data.

These are nice goals, but I still think being able to serialize structures is almost a must. There are libraries out there using structures, and one does not want to rewrite those to use classes just so that data can be serialized. So one either does not use rucksack or patches it. To mention one such library, there is local-time, which uses a structure to store the date/time, and I guess it will not change in the near future, so comparing the structure definitions is not that important in this case. I think if structures cannot fulfill the above goals, it's still better to document it and let the user decide what to do than not having structures at all.

> But if you can implement this in SBCL and can show that it will be possible to implement in a few other implementations, I'll gladly add it to Rucksack.

Probably it's doable in a platform-specific way, but making this work on several platforms is too much for me to do.

> > I also noticed that some slots (rucksack/transaction/etc.) of persistent classes are persistent-effective-slot-definitions even though they should not be. I guess this can be fixed in direct-slot-definition-class and effective-slot-definition-class by checking on the persistence flag.
>
> I don't really understand what you mean. Could you be more specific?

RS-TEST> (sb-pcl::class-slots (find-class 'p-test))
(# # #)

These should not be persistent slots but standard slots, and the standard svuc is sufficient for them. It might even be faster. I think I'm going to try to send a patch for this.

> > I tried to implement a persistent base class called audited-object which stores the time of creation, the time of last modification, last modifier, etc. In principle an :after method on (setf slot-value-using-class) could make it, but unfortunately I could not distinguish between code setting the persistent slot outside of rucksack and code internal to rucksack itself. Maybe rucksack could use a function slot-value-using-class-internal which would bind a special variable so that overriding svuc will be able to distinguish between loading the object from the disk and setting a slot value by the user.
>
> Yes, interesting idea. I'd be inclined to create a new metaclass, say AUDITED-PERSISTENT-CLASS which inherits from PERSISTENT-CLASS, and define a new SETF SLOT-VALUE-USING-CLASS method for classes of that metaclass. Your new method could then do tricks with special variables (if that's really necessary).

I think this is certainly not a metaclass feature; it can be done much more simply than that.

(defentity audited-object ()
  ((created-at (now) :type local-time)
   (last-modified-at :type local-time)))

(defmethod (setf slot-value-using-class) :after (new-value
                                                 (class persistent-class)
                                                 (object audited-object)
                                                 (slot persistent-effective-slot-definition))
  (let ((slot-name (slot-definition-name slot)))
    (unless (or (eq 'created-at slot-name)
                (eq 'last-modified-at slot-name))
      (setf (last-modified-at-of object) (now)))))

The only problem is that when setf svuc is called, I cannot decide whether it is called by user code or by rucksack itself. I think I'm going to try to make a patch for this one too.

> You mean you don't like the extra pair of parentheses after WITH-TRANSACTION?
> Like
>
>   (with-transaction ()
>     (foo)
>     (bar))
>
> I love that empty pair of parentheses. They give me a feeling of closure...

Fine, it's not that important and I can always have my own version. I just thought writing the () 99 percent of the time is really superfluous. Well, that's what LISP really is, you know the famous Lots of ... ;-)

Something similar happens when calling add-rucksack-root, rucksack-roots, etc.: those functions require you to pass in a rucksack. Which is fine for at least two reasons: because they need one and they want to dispatch on it. On the other hand, 99 percent of the time I guess this will be *rucksack*, which could be the default. I know that it is not doable in a generic method if you want to dispatch on it, so it's a naming issue again. What do we call those functions? I could only come up again with add-rucksack-root and add-rucksack-root*, which I doubt you are going to like too much.

Cheers,
levy

-- 
There's no perfectoin

From levente.meszaros at gmail.com Tue Jan 16 11:35:32 2007
From: levente.meszaros at gmail.com (Levente Mészáros)
Date: Tue, 16 Jan 2007 12:35:32 +0100
Subject: [rucksack-devel] Export
In-Reply-To:
References:
Message-ID:

> > running those lisp files will import the database into another rucksack.
>
> Yes, I think that's possible. I have the feeling that actually creating Lisp source files (that must be compiled/interpreted) is more complicated than necessary, but it's definitely possible.

I think it has many advantages over other proprietary formats (not even talking about XML):

- a lisper is already familiar with the syntax and semantics
- the format is very concise
- one can modify an exported database and use the full power of lisp
- one doesn't need to write an interpreter/loader which imports that data because it's already present in the vm

Of course there might be some disadvantages which I'm not aware of.
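
A minimal sketch of what exporting to Lisp forms could look like. Everything here is an assumption made for illustration: export-record, import-record and *imported* are hypothetical names that do not exist in Rucksack, a plain list stands in for the recreated object, and slot values are assumed to be self-evaluating (numbers, strings, keywords).

```lisp
;; Hypothetical sketch: none of these names exist in Rucksack.
(defvar *imported* (make-hash-table)
  "Maps an object ID found in the export file to the object recreated on import.")

(defun export-record (id class-name initargs stream)
  "Write one object to STREAM as a Lisp form that recreates it when evaluated."
  (with-standard-io-syntax
    (prin1 `(import-record ,id ',class-name ,@initargs) stream)
    (terpri stream)))

(defun import-record (old-id class-name &rest initargs)
  "Recreate an object and remember it under the ID it had in the export file."
  ;; A real importer would call MAKE-INSTANCE here; a list stands in for it.
  (setf (gethash old-id *imported*) (list* class-name initargs)))

;; Round trip through text; LOAD would do the same for a whole file.
(let ((text (with-output-to-string (out)
              (export-record 42 'person '(:name "Ann") out))))
  (eval (read-from-string text))
  (gethash 42 *imported*))
```

The table keyed by the exported ID is what lets later forms refer back to objects that were recreated earlier in the same import.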

> > Storing the object -> identity during export
>
> Every persistent object already has a unique ID, so you could just use that during export.

Well, you are right, I could write that to the file.

> > and identity -> object mapping during import
>
> The object table is basically an ID -> object mapping. Maybe you could use that during import, I'm not sure.

This will not work. When the objects are recreated I cannot determine the object ID in advance, so I need a mapping from the ID stored in the export file to the one present in the rucksack I am importing into.

> > Am I right that btrees are sufficient to make a map like a hashtable?
>
> Yes, in the sense that btrees provide a mapping. No, in the sense that btrees do not have the typical hash table characteristics.

Could you be more specific about which characteristics you are thinking of?

Cheers,
levy

-- 
There's no perfectoin

From levente.meszaros at gmail.com Tue Jan 16 11:50:54 2007
From: levente.meszaros at gmail.com (Levente Mészáros)
Date: Tue, 16 Jan 2007 12:50:54 +0100
Subject: [rucksack-devel] Tests
Message-ID:

Hi,

What's the proper way to run all tests on rucksack? Is there a test-all or something?

If you are considering using a test suite, maybe it's worth looking at http://common-lisp.net/project/stefil/ which is a very simple (nothing more than a deftest and an is macro) yet powerful test suite.

Cheers,
levy

-- 
There's no perfectoin

From levente.meszaros at gmail.com Tue Jan 16 12:22:13 2007
From: levente.meszaros at gmail.com (Levente Mészáros)
Date: Tue, 16 Jan 2007 13:22:13 +0100
Subject: [rucksack-devel] Line terminators are ^M
Message-ID:

Hi,

I think there is something strange with the files on the server, or I might be missing something, because when I do a cvs checkout I get ^M line terminators instead of the normal 0xA on linux. I use cvs for several other projects and this problem never showed up before. Did you try to do a checkout on a linux box?
IIRC the windows cvs client likes to check in files with 0xD 0xA which causes this on linux. I looked at the files on the server with hexdump and there are 0xD 0xA line terminators under the cvsroot.

Is there any client side setting I could turn on so that my linux cvs client converts those line terminators to 0xA? I could not find one.

Cheers,
levy

-- 
There's no perfectoin

From attila.lendvai at gmail.com Tue Jan 16 12:46:18 2007
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Tue, 16 Jan 2007 13:46:18 +0100
Subject: [rucksack-devel] Line terminators are ^M
In-Reply-To:
References:
Message-ID:

> IIRC the windows cvs client likes to check in files with 0xD 0xA which

especially cvsnt which was a real PITA as i remember...

-- 
- attila

"- The truth is that I've been too considerate, and so became unintentionally cruel...
- I understand.
- No, you don't understand! We don't speak the same language!"
(Ingmar Bergman - Smultronstället)

From levente.meszaros at gmail.com Tue Jan 16 15:41:42 2007
From: levente.meszaros at gmail.com (Levente Mészáros)
Date: Tue, 16 Jan 2007 16:41:42 +0100
Subject: [rucksack-devel] MOP
Message-ID:

Hi,

The following patch separates persistent and standard slots in a persistent-class instance during class finalization based on the :persistence slot parameter. It also further specializes svuc and friends on persistent-class and friends. The patch allows subclassing non persistent classes and even non standard classes by a persistent class in a more flexible way and mixing meta class behaviour with other meta classes. Basic tests were passed.

RS-TEST> (defclass foo () ((foo-slot)))
#
RS-TEST> (defclass bar (foo) ((bar-slot)) (:metaclass persistent-class))
#
RS-TEST> (class-slots (find-class 'foo))
(#)
RS-TEST> (class-slots (find-class 'bar))
(# # # # #)

The patch consists of the following:

- removed the persistence slot from the persistent-slot-mixin because a slot will be persistent iff it is of type persistent-slot-mixin
- modified copy-slot-definition and slot-definition-equal accordingly
- added preprocessing code to initialize- and reinitialize-instance around persistent-class which adds the default :persistence t parameter to direct slot specifications if not specified and removes it if :persistence nil is specified
- modified direct-slot-definition-class and effective-slot-definition-class to return persistent classes only when persistence is t
- modified compute-effective-slot-definition so that it does not set slot-persistence any more
- modified persistent-object so that it does not send :index nil to non persistent slots since it is not understood by standard-slot-definition
- modified slot-value-using-class and (setf slot-value-using-class) and slot-makunbound-using-class to be primary methods instead of around methods dispatching on persistent-class, persistent-object and persistent-effective-slot-definition because this is more flexible when merging the persistent mop with other mop classes such as computed-class, etc.
- refactored lispworks specific slot-value-using-class and friends to support when the non standard slot name is given instead of a real slot object. this has to be verified since I do not have a lispworks environment here

Cheers,
levy

-- 
There's no perfectoin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mop.patch
Type: text/x-patch
Size: 16135 bytes
Desc: not available
URL:

From attila.lendvai at gmail.com Tue Jan 16 16:53:53 2007
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Tue, 16 Jan 2007 17:53:53 +0100
Subject: [rucksack-devel] MOP
In-Reply-To:
References:
Message-ID:

> Basic tests were passed.

fyi, if someone tries it out it worked fine for me, but sbcl seems to
have some weird bug probably related to some pcl caching. when loading
rucksack with slime the slime debugger pops up in (setf svuc). but
after C-c C-c'ing it and rerunning test-basics it works fine.

--
- attila

"- The truth is that I've been too considerate, and so became
unintentionally cruel...
- I understand.
- No, you don't understand! We don't speak the same language!"
(Ingmar Bergman - Smultronstället)

From levente.meszaros at gmail.com Tue Jan 16 17:19:27 2007
From: levente.meszaros at gmail.com (Levente Mészáros)
Date: Tue, 16 Jan 2007 18:19:27 +0100
Subject: [rucksack-devel] MOP
In-Reply-To:
References:
Message-ID:

Same with me, I thought this is local because it worked without this
trick from direct SBCL.

levy

On 1/16/07, Attila Lendvai wrote:
> > Basic tests were passed.
>
> fyi, if someone tries it out it worked fine for me, but sbcl seems to
> have some weird bug probably related to some pcl caching. when loading
> rucksack with slime the slime debugger pops up in (setf svuc). but
> after C-c C-c'ing it and rerunning test-basics it works fine.
>
> --
> - attila
>
> "- The truth is that I've been too considerate, and so became
> unintentionally cruel...
> - I understand.
> - No, you don't understand! We don't speak the same language!"
> (Ingmar Bergman - Smultronstället)
>

--
There's no perfectoin

From levente.meszaros at gmail.com Tue Jan 16 19:44:36 2007
From: levente.meszaros at gmail.com (Levente Mészáros)
Date: Tue, 16 Jan 2007 20:44:36 +0100
Subject: [rucksack-devel] Tests
Message-ID:

Hi,

We have started to work on a test suite for rucksack. Please find the
first version attached. I know that it introduces some new dependency
(namely stefil and its dependencies) at least for the test system.
Maybe tests could go into a separate directory so dependencies could
cause less headache.

I had to modify the test package and the asd file too.

(defpackage :rucksack-test
  (:nicknames :rs-test)
  (:use :common-lisp :rucksack
        #+allegro :mop
        #+lispworks :clos
        #+sbcl :sb-mop
        #+openmcl :openmcl-mop))

You can use the test suite as follows:

CL-USER> (asdf:oos 'asdf:test-op :rucksack)
.............................................................
NIL

or

RS-TEST> (rucksack-test-suite)
.............................................................
#
RS-TEST>

or

RS-TEST> (serialize-deserialize 12)
.
T
#
RS-TEST> (with-transaction/rollback)
..
T
#

etc...

When tests fail you can use the debugger and the inspector.

Cheers,
levy

--
There's no perfectoin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unit-test.lisp
Type: application/octet-stream
Size: 6496 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rucksack.asd
Type: application/octet-stream
Size: 1330 bytes
Desc: not available
URL:

From ch-rucksack at bobobeach.com Tue Jan 16 22:34:31 2007
From: ch-rucksack at bobobeach.com (Cyrus Harmon)
Date: Tue, 16 Jan 2007 14:34:31 -0800
Subject: [rucksack-devel] thoughts on deferring GC
In-Reply-To:
References: <7BDA2663-8D00-4375-9A93-40CE63D77204@bobobeach.com>
Message-ID: <6C16D3EE-3905-4845-BB55-96DC733CDA81@bobobeach.com>

On Jan 16, 2007, at 2:06 AM, Arthur Lemmens wrote:

> Hmm. If you finish the copying GC (should be relatively simple, I
> think that all the basic work is already done), you could try that
> one instead. I can send you the code if you're interested.

I don't think I'll have time to work on that anytime soon, but I would
be interested in at least looking at the code. Perhaps this would be a
good use for a CVS branch?

Cyrus

From attila.lendvai at gmail.com Thu Jan 18 13:50:00 2007
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Thu, 18 Jan 2007 14:50:00 +0100
Subject: [rucksack-devel] Line terminators are ^M
In-Reply-To:
References:
Message-ID:

> > IIRC the windows cvs client likes to check in files with 0xD 0xA which

this issue is a bigger headache for us than you might think. so to
help all i can, i've created a script that someone with cvs commit
rights could easily run on common-lisp.net to remove the carriage
returns.

cvs -d :local:/project/rucksack/cvsroot co rucksack
cd rucksack
~alendvai/bin/delete-carriage-returns.sh *.lisp *.txt

hth,

--
- attila

"- The truth is that I've been too considerate, and so became
unintentionally cruel...
- I understand.
- No, you don't understand! We don't speak the same language!"
(Ingmar Bergman - Smultronstället)

From gwking at metabang.com Thu Jan 18 14:02:34 2007
From: gwking at metabang.com (Gary King)
Date: Thu, 18 Jan 2007 09:02:34 -0500
Subject: [rucksack-devel] Line terminators are ^M
In-Reply-To:
References:
Message-ID: <74F8ABE8-7638-47FB-BDB8-A2550E891E1C@metabang.com>

I thought that CVS checkout was supposed to bring the files in with
the line endings of the current architecture...

On Jan 18, 2007, at 8:50 AM, Attila Lendvai wrote:

>> > IIRC the windows cvs client likes to check in files with 0xD 0xA
>> which
>
> this issue is a bigger headache for us than you might think. so to
> help all i can, i've created a script that someone with cvs commit
> rights could easily run on common-lisp.net to remove the carriage
> returns.
>
> cvs -d :local:/project/rucksack/cvsroot co rucksack
> cd rucksack
> ~alendvai/bin/delete-carriage-returns.sh *.lisp *.txt
>
> hth,
>
> --
> - attila
>
> "- The truth is that I've been too considerate, and so became
> unintentionally cruel...
> - I understand.
> - No, you don't understand! We don't speak the same language!"
> (Ingmar Bergman - Smultronstället)

--
Gary Warren King, metabang.com
Cell: (413) 885 9127
Fax: (206) 338-4052
gwkkwg on Skype * garethsan on AIM

From attila.lendvai at gmail.com Thu Jan 18 14:15:22 2007
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Thu, 18 Jan 2007 15:15:22 +0100
Subject: [rucksack-devel] Line terminators are ^M
In-Reply-To: <74F8ABE8-7638-47FB-BDB8-A2550E891E1C@metabang.com>
References: <74F8ABE8-7638-47FB-BDB8-A2550E891E1C@metabang.com>
Message-ID:

> I thought that CVS checkout was supposed to bring the files in with
> the line endings of the current architecture...

i thought that, too. maybe that only happens when the line terminators
in the files on the server are only 0x0a?

all i know is that checking out rucksack both on my linux box and
common-lisp.net ends up with 0x0d 0x0a line terminators and converting
the repo with tailor also includes the 0x0d terminators.

looking at the files directly on the cvs server shows that there are
0x0d's checked in and a cvs commit after the conversion wants to change
all the files. and looking at the output of "cvs --help" does not
suggest anything useful here.

--
- attila

"- The truth is that I've been too considerate, and so became
unintentionally cruel...
- I understand.
- No, you don't understand! We don't speak the same language!"
(Ingmar Bergman - Smultronstället)

From levente.meszaros at gmail.com Thu Jan 18 15:06:06 2007
From: levente.meszaros at gmail.com (Levente Mészáros)
Date: Thu, 18 Jan 2007 16:06:06 +0100
Subject: [rucksack-devel] Line terminators are ^M
In-Reply-To:
References: <74F8ABE8-7638-47FB-BDB8-A2550E891E1C@metabang.com>
Message-ID:

I think since 0x0d 0x0a is present in the server repo on a linux
machine there is no reason for a linux cvs client to drop the 0x0d.

levy

On 1/18/07, Attila Lendvai wrote:
> > I thought that CVS checkout was supposed to bring the files in with
> > the line endings of the current architecture...
>
> i thought that, too. maybe that only happens when the line terminators
> in the files on the server are only 0x0a?
>
> all i know is that checking out rucksack both on my linux box and
> common-lisp.net ends up with 0x0d 0x0a line terminators and converting
> the repo with tailor also includes the 0x0d terminators.
>
> looking at the files directly on the cvs server shows that there are
> 0x0d's checked in and a cvs commit after the conversion wants to change
> all the files. and looking at the output of "cvs --help" does not
> suggest anything useful here.

--
There's no perfectoin

From alemmens at xs4all.nl Sat Jan 20 18:19:54 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Sat, 20 Jan 2007 19:19:54 +0100
Subject: [rucksack-devel] Line terminators are ^M
In-Reply-To:
References:
Message-ID:

Attila Lendvai wrote:

> this issue is a bigger headache for us than you might think. so to
> help all i can, i've created a script that someone with cvs commit
> rights could easily run on common-lisp.net to remove the carriage
> returns.
>
> cvs -d :local:/project/rucksack/cvsroot co rucksack
> cd rucksack
> ~alendvai/bin/delete-carriage-returns.sh *.lisp *.txt

Thanks, I've just run this script and committed the result.

I'm not sure how I can prevent this from happening in the future.
Maybe "cvs admin -k" can help.
I'm looking into this.

Arthur

From attila.lendvai at gmail.com Sun Jan 21 14:32:16 2007
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Sun, 21 Jan 2007 15:32:16 +0100
Subject: [rucksack-devel] Line terminators are ^M
In-Reply-To:
References:
Message-ID:

> Thanks, I've just run this script and committed the result.

thanks a lot! i've merged this with our local changes that you can
find at:

http://common-lisp.net/cgi-bin/darcsweb/darcsweb.cgi?r=cl-wdim-rucksack;a=summary

or

darcs get http://www.common-lisp.net/project/cl-wdim/darcs/rucksack/

we can send plain-text diffs of any/all of these if requested, and
you can darcs pull/unpull them individually to apply them.

please take a look at the darcsweb interface and feel free to request
the patches that you would like in the official. (bah, that's some
strange english, no pun intended or anything... :)

alternatively i can offer to merge them back individually in cvs upon
request, my user on cl.net is alendvai.

imho all the patches you can find there are good enough and useful
enough for inclusion in the official, but YMMV.

> I'm not sure how I can prevent this from happening in the future.
> Maybe "cvs admin -k" can help. I'm looking into this.

it's not a question of life or death, it only causes more merge
conflicts than necessary...

as for "cvs admin -k", i don't think you need anything like that. if
all's well, now a cvs co or cvs up on a windoze box should either
produce files with simple 0x0a line endings and then it's ignoring it
completely, or it should produce 0x0d 0x0a line endings in which case
it's most probably also converting it back at a commit.

either case, i'll report back if some of the commits contain 0x0d 0x0a
line endings, so we can further investigate this.

hth,

--
- attila

"- The truth is that I've been too considerate, and so became
unintentionally cruel...
- I understand.
- No, you don't understand! We don't speak the same language!"
(Ingmar Bergman - Smultronstället)

From alemmens at xs4all.nl Mon Jan 22 00:04:44 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Mon, 22 Jan 2007 01:04:44 +0100
Subject: [rucksack-devel] MOP
In-Reply-To:
References:
Message-ID:

Levente Mészáros wrote:

> The following patch separates persistent and standard slots in a
> persistent-class instance during class finalization based on the
> :persistence slot parameter. It also further specializes svuc and
> friends on persistent-class and friends.

Thanks, but unfortunately your patch breaks Rucksack on Lispworks
(both 4.6 and 5.0). I spent half an hour trying to debug this, but
couldn't find the bug in that time.

I think the idea behind your patch is OK (although it has a rather
low priority for me personally). So if you can come up with a version
that works on Lispworks too (bonus points for including Allegro), I'll
gladly merge your patch. But I don't have the time to debug it myself,
and I can't accept it as is.

By the way: the function you call REMOVE-KEYWORDS is called SANS in
Rucksack (thanks to Erik Naggum and Edi Weitz). And calling it as
you did, like:

  (remove-keywords args :direct-slots)

doesn't actually do anything, so it's probably not what you intended.

Arthur

From levente.meszaros at gmail.com Mon Jan 22 08:54:47 2007
From: levente.meszaros at gmail.com (Levente Mészáros)
Date: Mon, 22 Jan 2007 09:54:47 +0100
Subject: [rucksack-devel] MOP
In-Reply-To:
References:
Message-ID:

> Thanks, but unfortunately your patch breaks Rucksack on Lispworks
> (both 4.6 and 5.0). I spent half an hour trying to debug this, but
> couldn't find the bug in that time.

Ok, will take a look at that thing.

> By the way: the function you call REMOVE-KEYWORDS is called SANS in
> Rucksack (thanks to Erik Naggum and Edi Weitz). And calling it as
> you did, like:
>
>   (remove-keywords args :direct-slots)
>
> doesn't actually do anything, so it's probably not what you intended.

Sans will be ok.
I did fix that in the darcs repo in a separate patch, but forgot
to include it in the original cvs patch.

levy

--
There's no perfectoin

From attila.lendvai at gmail.com Mon Jan 22 09:10:05 2007
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Mon, 22 Jan 2007 10:10:05 +0100
Subject: [rucksack-devel] MOP
In-Reply-To:
References:
Message-ID:

> By the way: the function you call REMOVE-KEYWORDS is called SANS in
> Rucksack (thanks to Erik Naggum and Edi Weitz). And calling it as
> you did, like:

i suggest using a different name than SANS, maybe REMOVE-FROM-PLIST if
REMOVE-KEYWORDS is not ok, because SANS is not a too intentional
name... :)

i usually create a utils.lisp and a duplicates.lisp file in my
projects. utils is the place for functions like SANS, so people can
look around there first for general utilities. duplicates.lisp is for
functionality that is already available in another lib, but to avoid
the dependency, it's just copy-pasted with a comment of the origin.
(fyi, REMOVE-KEYWORDS is from arnesi)

just my 0.02

--
- attila

"- The truth is that I've been too considerate, and so became
unintentionally cruel...
- I understand.
- No, you don't understand! We don't speak the same language!"
(Ingmar Bergman - Smultronstället)

From alemmens at xs4all.nl Mon Jan 22 10:25:18 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Mon, 22 Jan 2007 11:25:18 +0100
Subject: [rucksack-devel] Some other things
In-Reply-To:
References:
Message-ID:

Levente Mészáros wrote:

> These are nice goals I think but I still think being able to serialize
> structures is almost a must. There are libraries out there using
> structures and one does not want to rewrite those to use classes so
> that data can be serialized. So one either does not use rucksack or
> patches it. To mention one such library there is local-time which uses
> a structure to store the date/time and I guess it will not change in
> the near future so comparing the structure definitions is not that
> important in this case.
>
> I think if structures cannot fulfill the above goals it's still
> better to document it and let the user decide what to do than not
> having structures at all.

Yes, I think you're right. Thanks, I've committed your patch.

Arthur

From alemmens at xs4all.nl Mon Jan 22 10:34:28 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Mon, 22 Jan 2007 11:34:28 +0100
Subject: [rucksack-devel] MOP
In-Reply-To:
References:
Message-ID:

Attila Lendvai wrote:

> i suggest using a different name than SANS, maybe REMOVE-FROM-PLIST if
> REMOVE-KEYWORDS is not ok, because SANS is not a too intentional
> name... :)

Having the SANS name is a small tribute to Erik Naggum that I like to
keep in the Rucksack code. And it means "without" in French, which is
a good enough mnemonic for me...

Arthur

From alemmens at xs4all.nl Mon Jan 22 10:39:28 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Mon, 22 Jan 2007 11:39:28 +0100
Subject: [rucksack-devel] Line terminators are ^M
In-Reply-To:
References:
Message-ID:

Attila Lendvai wrote:

> i'll report back if some of the commits contain 0x0d 0x0a line endings,
> so we can further investigate this.

I think the problem should be solved now. But if there are any
problems with the commit I made a few minutes ago, let me know.

Thanks,

Arthur

From alemmens at xs4all.nl Mon Jan 22 10:58:08 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Mon, 22 Jan 2007 11:58:08 +0100
Subject: [rucksack-devel] sbcl compile warning
In-Reply-To:
References:
Message-ID:

Cyrus Harmon wrote:

> ; caught STYLE-WARNING:
> ; reading an ignored variable: BLOCK
>
> ; compiling (DEFMETHOD HEAP-INFO ...)
>
> ; file: /Users/sly/src/lisp/rucksack/rucksack/heap.lisp
> ; in: DEFMETHOD HEAP-INFO (FREE-LIST-HEAP)
> ; (GETF RUCKSACK::PLIST :NR-FREE-OCTETS)
> ; --> BLOCK DO BLOCK LET TAGBODY RETURN-FROM PROGN
> ; ==>
> ; SB-IMPL::DEFAULT
> ;
> ; caught WARNING:
> ; This is not a NUMBER:
> ; NIL

Thanks. Both warnings should be fixed now (version 0.1.7), but I
haven't tested this with SBCL.

Arthur

From alemmens at xs4all.nl Mon Jan 22 11:04:01 2007
From: alemmens at xs4all.nl (Arthur Lemmens)
Date: Mon, 22 Jan 2007 12:04:01 +0100
Subject: [rucksack-devel] Some other things
In-Reply-To:
References:
Message-ID:

Levente Mészáros wrote:

> Something similar is when calling add-rucksack-root, rucksack-roots,
> etc. those functions require to pass in a rucksack. Which is fine for
> at least two reasons: because they need one and they want to dispatch
> on it. On the other hand 99 percent of the time I guess this will be
> *rucksack* which could be the default. I know that it is not doable in
> a generic method if you want to dispatch on it, so it's a naming issue
> again. How do we call those functions? I could only come up again with
> the add-rucksack-root and add-rucksack-root* which I doubt you are
> going to like too much.

I think that functions like ADD-RUCKSACK-ROOT and RUCKSACK-ROOTS should
not be called very often in 'normal' user code. So it's probably not
worth it to create separate versions that give a default for one of
the arguments.

If you do happen to call them very often in your program, you can
always create your own special versions. But I don't think it's a good
idea to add them to Rucksack.

Arthur

From uvl at htg1.de Sun Jan 28 15:53:01 2007
From: uvl at htg1.de (Uwe von Loh)
Date: Sun, 28 Jan 2007 16:53:01 +0100
Subject: [rucksack-devel] Practical ways of unlinking an instance from indexes
Message-ID: <45BCC6DD.3010001@htg1.de>

I have been trying to understand rucksack for several weeks now, but
could not figure out the mechanism for deleting an instance.
I tried several approaches. The first was already mentioned on the
list: remove the instance from all indexes by index-delete. That works
for me in the case of class indexes but not for the slot indexes.
Maybe I misunderstood the relationships between object-id and classes
or slots.

My second (unsuccessful) approach was to slot-makunbound the slots of
an instance. Can anybody help me with a short working example for that?

The third way to remove an instance was to change its class to an
empty class (defclass nothingness ()()) as shown by Erik Naggum on cll.
This way would be great for removing an instance from all indexes at
once, but it didn't work for me either.

Could you please give me some hints and examples on removing instances?

Thanks for your patience.

Uwe

From uvl at htg1.de Wed Jan 31 13:01:37 2007
From: uvl at htg1.de (Uwe von Loh)
Date: Wed, 31 Jan 2007 14:01:37 +0100
Subject: [rucksack-devel] Practical ways of unlinking an instance from indexes, more info
In-Reply-To: <45BCC6DD.3010001@htg1.de>
References: <45BCC6DD.3010001@htg1.de>
Message-ID: <45C09331.7030903@htg1.de>

OK, I abandoned the idea with change-class or slot-makunbound in
rucksack and came back to index-delete. This works for the class index
but not for the slot indexes. As I understand slot-indexes they are
btrees with the instance slot value as keys and the instance object-id
as values. So what is wrong with

(index-delete (rucksack-slot-index rs 'user 'pw)
              (pw usr)
              (object-id usr))

???

Uwe 8-)

Here is the code and the backtrace. (Sorry, I'm a novice).
======================================================================
This is the lisp file:
======================================================================

(in-package :rucksack)

(eval-when (:compile-toplevel :load-toplevel :execute)
  (defparameter *uwes-rs* #p"/home/uvl/cl/src/ht/rs/")
  (with-rucksack (rs *uwes-rs*)
    (with-transaction ()
      (defclass user ()
        ((name :initarg :name :accessor name
               :index :case-insensitive-string-index)
         (pw :initarg :pw :accessor pw
             :index :string-index))
        (:metaclass persistent-class)
        (:index t)))))

(defun show-class (class-name)
  (with-rucksack (rs *uwes-rs*)
    (with-transaction ()
      (rucksack-map-class rs class-name #'print))))

(defun show-user-with-password (value)
  (with-rucksack (rs *uwes-rs*)
    (with-transaction ()
      (rucksack-map-slot rs 'user 'pw
                         (lambda (usr)
                           (format t "~A has password: ~A.~%"
                                   (name usr) (pw usr)))
                         :equal value))))

(defun nuke-user-named (user-name)
  (with-rucksack (rs *uwes-rs*)
    (with-transaction ()
      (rucksack-map-slot rs 'user 'name
                         (lambda (x) (rm-usr-from-indexes x))
                         :equal user-name))))

(defun rm-usr-from-indexes (usr)
  (with-rucksack (rs *uwes-rs*)
    (with-transaction ()
      ;; this works quite well
      (index-delete (rucksack-class-index rs 'user)
                    (object-id usr)
                    (object-id usr))
      ;; removing from slot indexes doesn't work:
      (index-delete (rucksack-slot-index rs 'user 'pw)
                    (pw usr)
                    (object-id usr))
      (index-delete (rucksack-slot-index rs 'user 'name)
                    (name usr)
                    (object-id usr)))))

;; From the source code just to keep in mind: Slot indexes are...
;; ... "A btree mapping class names to slot index tables, where each
;; slot index table is a btree mapping slot names to slot indexes.
;; Each slot index maps slot values to object ids."

======================================================================
I create three instances of 'user and try to remove one of them from
all three indexes.
======================================================================
RS> (with-rucksack (rs *uwes-rs*)
      (with-transaction ()
        (make-instance 'user :name "bob" :pw "bobpw" :rucksack rs)))
#>
T
RS> (with-rucksack (rs *uwes-rs*)
      (with-transaction ()
        (make-instance 'user :name "uwe" :pw "uwepw" :rucksack rs)))
#>
T
RS> (with-rucksack (rs *uwes-rs*)
      (with-transaction ()
        (make-instance 'user :name "jim" :pw "jimpw" :rucksack rs)))
#>
T
RS> (show-class 'user)

#>
#>
#>
NIL
T
RS> (nuke-user-named "uwe")
======================================================================
Gives following error
======================================================================
Argument X is not a REAL: NIL
   [Condition of type SIMPLE-TYPE-ERROR]
Argument X is not a REAL: NIL
   [Condition of type SIMPLE-TYPE-ERROR]

Restarts:
 0: [ABORT] Abort #
 1: [RETRY] Retry #
 2: [ABORT] Abort #
 3: [RETRY] Retry #
 4: [ABORT-REQUEST] Abort handling SLIME request.
 5: [TERMINATE-THREAD] Terminate this thread (#)

Backtrace:
  0: (SB-KERNEL:TWO-ARG-< NIL 1)
  1: (REMOVE-KEY # "uwepw")
      Locals:
        SB-DEBUG::ARG-0 = #
        SB-DEBUG::ARG-1 = "uwepw"
  2: (LEAF-DELETE-KEY #> # (# NIL) "uwepw" :IGNORE)
      Locals:
        SB-DEBUG::ARG-0 = #>
        SB-DEBUG::ARG-1 = #
        SB-DEBUG::ARG-2 = (# NIL)
        SB-DEBUG::ARG-3 = "uwepw"
        SB-DEBUG::ARG-4 = :IGNORE
  3: ((SB-PCL::FAST-METHOD BTREE-DELETE (BTREE #1="#<...>" . #1#)) (#(NIL) . #()) # #> "uwepw" 67 :IF-DOES-NOT-EXIST :IGNORE)
      Locals:
        SB-PCL::.PV-CELL. = (#(NIL) . #())
        BTREE = #>
        #:IF-DOES-NOT-EXIST-DEFAULTING-TEMP = :IGNORE
        KEY = "uwepw"
        VALUE = 67
  4: (RM-USR-FROM-INDEXES #>)
      Locals:
        SB-DEBUG::ARG-0 = #>
  5: (P-MAPC # #>)
      Locals:
        SB-DEBUG::ARG-0 = #
        SB-DEBUG::ARG-1 = #>
  6: ((LABELS MAP-SLOT) #)
      Locals:
        SB-DEBUG::ARG-0 = #
  7: ((SB-PCL::FAST-METHOD RUCKSACK-MAP-SLOT (STANDARD-RUCKSACK #1="#<...>" . #1#)) # # # USER NAME #)
      Locals:
        SB-DEBUG::ARG-0 = 8
        SB-DEBUG::ARG-1 = :
        SB-DEBUG::ARG-2 = :
        SB-DEBUG::ARG-3 = #
        SB-DEBUG::ARG-4 = USER
        SB-DEBUG::ARG-5 = NAME
        SB-DEBUG::ARG-6 = #
  8: (NUKE-USER-NAMED "uwe")
      Locals:
        SB-DEBUG::ARG-0 = "uwe"
  9: (SB-INT:SIMPLE-EVAL-IN-LEXENV (NUKE-USER-NAMED "uwe") #)
      Locals:
        SB-DEBUG::ARG-0 = 2
        SB-DEBUG::ARG-1 = (NUKE-USER-NAMED "uwe")
        SB-DEBUG::ARG-2 = #
 10: (SWANK::EVAL-REGION "(nuke-user-named \"uwe\") " T)
      Locals:
        SWANK::PACKAGE-UPDATE-P = T
        STRING = "(nuke-user-named \"uwe\") "
 11: ((LAMBDA NIL))
 12: ((SB-PCL::FAST-METHOD SWANK-BACKEND:CALL-WITH-SYNTAX-HOOKS (T)) # # #)
 13: (SWANK::CALL-WITH-BUFFER-SYNTAX #)
 14: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK:LISTENER-EVAL "(nuke-user-named \"uwe\") ") #)
 15: ((LAMBDA NIL))
 16: ((SB-PCL::FAST-METHOD SWANK-BACKEND:CALL-WITH-DEBUGGER-HOOK (T T)) # # # #)
 17: ((LAMBDA NIL))
 18: ((SB-PCL::FAST-METHOD SWANK-BACKEND:CALL-WITH-DEBUGGER-HOOK (T T)) # # # #)
 19: (SWANK::CALL-WITH-REDIRECTED-IO # #)
 20: (SWANK::CALL-WITH-CONNECTION # #)
 21: (SWANK::HANDLE-REQUEST #)
 22: ((LAMBDA NIL))
 23: ((LAMBDA NIL))
 24: ((SB-PCL::FAST-METHOD SWANK-BACKEND:CALL-WITH-DEBUGGER-HOOK (T T)) # # # #)
 25: (SWANK::CALL-WITH-REDIRECTED-IO # #)
 26: (SWANK::CALL-WITH-CONNECTION # #)
 27: (SWANK::CALL-WITH-BINDINGS NIL #)
 28: ((LAMBDA NIL))
 29: ("foreign function: call_into_lisp")
 30: ("foreign function: funcall0")
 31: ("foreign function: new_thread_trampoline")
 32: ("foreign function: #xB7FCA504")
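
[Editorial sketch, not part of the original message.] In the listing
above, RM-USR-FROM-INDEXES is called from inside the function that
RUCKSACK-MAP-SLOT is applying (frames 4 to 7 of the backtrace), and it
opens a second WITH-RUCKSACK and WITH-TRANSACTION on the same directory
while the outer ones are still active. Whether either of these has
anything to do with the (< NIL 1) call in REMOVE-KEY is not established
by the backtrace; that is only an assumption. An untested variant that
avoids both, using only operators that already appear in the listing,
would collect the matching instances first and delete them after the
mapping has finished, inside the single outer transaction. The name
NUKE-USER-NAMED/COLLECT-FIRST is hypothetical and not part of Rucksack.

```lisp
(in-package :rucksack)

;; Hypothetical helper, NOT part of Rucksack. Assumes the USER class
;; and *UWES-RS* from the listing above are already defined.
(defun nuke-user-named/collect-first (user-name)
  (with-rucksack (rs *uwes-rs*)
    (with-transaction ()
      (let ((victims '()))
        ;; Pass 1: only collect matches; no index is modified while
        ;; RUCKSACK-MAP-SLOT is still walking the NAME index.
        (rucksack-map-slot rs 'user 'name
                           (lambda (usr) (push usr victims))
                           :equal user-name)
        ;; Pass 2: delete, reusing the same rucksack and transaction
        ;; instead of opening nested ones per instance.
        (dolist (usr victims)
          (index-delete (rucksack-class-index rs 'user)
                        (object-id usr)
                        (object-id usr))
          (index-delete (rucksack-slot-index rs 'user 'pw)
                        (pw usr)
                        (object-id usr))
          (index-delete (rucksack-slot-index rs 'user 'name)
                        (name usr)
                        (object-id usr)))))))
```

If the same error still appears with this variant, the nesting and the
delete-during-map ordering can be ruled out, which narrows the problem
to the INDEX-DELETE call on the slot index itself.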