[elephant-devel] elephant-devel Digest, Vol 5, Issue 12

Tue Oct 28 18:26:44 UTC 2008

i don't know if you guys have looked at cache-oblivious b-trees (and
other data structures) but they seem like the new hotness.

http://supertech.csail.mit.edu/cacheObliviousBTree.html

b

On Tue, Oct 28, 2008 at 11:21 AM,
<elephant-devel-request at common-lisp.net> wrote:
> Send elephant-devel mailing list submissions to
>        elephant-devel at common-lisp.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://common-lisp.net/cgi-bin/mailman/listinfo/elephant-devel
> or, via email, send a message with subject or body 'help' to
>        elephant-devel-request at common-lisp.net
>
> You can reach the person managing the list at
>        elephant-devel-owner at common-lisp.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of elephant-devel digest..."
>
>
> Today's Topics:
>
>   1. Re: Ditching Darcs (Robert Synnott)
>   2. Re: Ditching Darcs (Alex Mizrahi)
>   3. Lisp Backend Architecture (Red Daly)
>   4. Re: Lisp Backend Architecture (Red Daly)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 28 Oct 2008 15:44:51 +0000
> From: "Robert Synnott" <rsynnott at gmail.com>
> Subject: Re: [elephant-devel] Ditching Darcs
> To: "Elephant bugs and development" <elephant-devel at common-lisp.net>
> Message-ID:
>        <24f203480810280844q5c4a25b2nadb515775ef8500a at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On this subject, is there a particular branch that I should be using
> for postmodern? I've noticed occasional issues where it'll complain
> that a table already exists, generally after a non-elephant error has
> occurred within a transaction.
> Rob
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 28 Oct 2008 19:52:53 +0200
> From: "Alex Mizrahi" <killerstorm at newmail.ru>
> Subject: Re: [elephant-devel] Ditching Darcs
> To: elephant-devel at common-lisp.net
> Message-ID: <ge7jho$vk9$1 at ger.gmane.org>
>
>  LPP> I must've missed something or unintentionally used a db
>  LPP> with older stuff already in it.
>
> yep, old format database could be the case if you've got it broken
> at start.
>
> but it seems our current tests are broken, so you get always get some
> error on the first run regardless of backend used.
>
>  LPP> How about the NIL problem with secondary indices, you tried
>  LPP> to tackle that?
>
> not really, so far :(
>
> i thought some more about approach with NULLs and wrote down
> queries and cases needed to handle NULLs -- it appears to be
> more realistic now than it was before, but still pretty hairy, so i've
> abandoned this for a while..
>
> for example, cursor-next uses query
>
> WHERE (qi > $1) OR ((qi = $1) AND (value > $2))
>
> and cursor-prev uses query
>
> WHERE (qi < $1) OR ((qi = $1) AND (value < $2))
>
> being similar, they are made via template "WHERE (qi ~a $1) OR ((qi = $1)
> AND (value ~a $2))" now.
>
> but with nulls that would be four different queries:
>
> cursor-next:
>  key is null:        (qi IS NULL) and (value > $2)
>  key is not null:  (qi > $1) OR (qi IS NULL) OR ((qi = 1) AND (value > $2)
> cursor-prev
>  key is null:        (qi IS NOT NULL) OR ((qi IS NULL) AND (value < $2))
>  key is not null:  (qi < $1) OR ((qi = $1) AND (value < $2))
>
> and taking into account other cursor operations, instead of 3 query
> templates
> we now have something like 12 different queries now, and i see any pattern
> how they can be merged :(
> or maybe it makes sense to ditch templated query generations and just write
> these conditions manually
>
>
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 28 Oct 2008 10:30:22 -0700
> From: "Red Daly" <reddaly at gmail.com>
> Subject: [elephant-devel] Lisp Backend Architecture
> To: "Elephant bugs and development" <elephant-devel at common-lisp.net>
> Message-ID:
>        <f6700f0c0810281030r35a7b0dan3ecb4fb9f42d8970 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I am hoping to clarify some of the prior discussions[1] about the native
> Lisp backend for Elephant and propose a basic architecture.  Hopefully we
> can modularize development so if somebody wants to hack for a few days on
> the backend he can avoid being overwhelmed.
>
> Multiprocess support: What features do the major lisps have for locking and
> concurrency?  What features are we missing from C/linux--what does BDB do in
> this regard that is hard to do in Lisp?  I do not know too much about
> implementing efficient multi-threaded lisp programs where there is a lot of
> contention.
>
> Distribution: Are others interested in making the backend distributed?  I
> prefer the design to allow adding an efficient distributed architecture at a
> later date.
>
> Custom indexing: A lot of discussion has gone into the best implementation
> practice for BTrees.  But what about other indexing structures?
> Multidimensional indexing is relevant to my project now because we have
> spacial information (longitude latitude pairs) we would like to query.
> Right now I am using a kd-tree with nodes made persistent by elephant, but
> usually this sort of index would be implemented by the database itself.  I
> imagine multi-dimensional indexing could improve queries, too.
>
> There are other indexing needs as well.  Document-level search is another
> common case.  I imagine an efficient search library is beyond our capacity,
> but how could we make the database suited to implement search on top of the
> thing?  BTrees are not everything, unless I am missing something.
>
>
> I propose the following basic architecture for the backend:
>
> Logging package with generic undo/redo logging support.  It would be
> customizable but it would not depend on other code from the Lisp backend.  A
> database could plug into the log by implementing undo, redo, and
> serialization functions.  By modularizing logging, it should make it easier
> to hack on it without being familiar with the rest of the backend.  It will
> also be reusable, for what it's worth.
>
> A persistent heap.  Beneath indices and data storage would be a heap-on-disk
> layer with transaction support.  The heap would have methods for reading,
> writing, and locking sequences of bytes.  It would also handle   The
> persistent heap alone would qualify as a database and might be useful as a
> basis for other DB projects or experiments.  Cachine would probably not be
> implemented at this level, though that would be easiest.  It may be possible
> to implement the transactional heavy lifting for the persistent heap and
> make it extensible enough to be used throughout.
>
> B-Trees.  A MOP-based implementation of B-Trees may be customizable enough
> to plug into the the rest of the system with a few additional method
> declarations for the persistent, transactional version.  Specialized methods
> would access the persistent heap for reads and writes.  Locking of the sort
> done by BDB (level 2 etc.) could probably be accomplished in these methods
> specialized for our persistent B-Tree.
>
> Bkd-Trees and other indexing structures could be implemented similarly to
> B-Trees.
>
> Caching.  B-Tree nodes or other lispy objects should be cached as opposed to
> byte arrays.  Caching of B-Tree nodes requires additional multiprocess code,
> so not all the transactional magic happens in the persistent heap layer.
> I'm a little fuzzy on exactly how this will work, but it sounds reasonable
> enough.
>
> The DB user would interact mostly with the B-Tree  and other indexing
> structures, and with transactions.
>
>
> How does this sound for a basic architecture?  The multiprocess details are
> not fleshed out so your comments are appreciated.  I have attached my basic
> implementation of a logger and persistent heap to make this clearer.
>
>
> Best regards,
>
> Red
>
>
> [1]  The most lengthy discussion I can find is here:
> http://common-lisp.net/pipermail/elephant-devel/2008-May/004108.html
> The trac has a page on the Lisp backend:
> http://trac.common-lisp.net/elephant/wiki/LispDataStore
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: http://common-lisp.net/pipermail/elephant-devel/attachments/20081028/d566669c/attachment-0001.html
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: persistent-heap.tar.gz
> Type: application/x-gzip
> Size: 7163 bytes
> Desc: not available
> Url : http://common-lisp.net/pipermail/elephant-devel/attachments/20081028/d566669c/attachment-0001.bin
>
> ------------------------------
>
> Message: 4
> Date: Tue, 28 Oct 2008 11:21:52 -0700
> From: "Red Daly" <reddaly at gmail.com>
> Subject: Re: [elephant-devel] Lisp Backend Architecture
> To: "Elephant bugs and development" <elephant-devel at common-lisp.net>
> Message-ID:
>        <f6700f0c0810281121r24a0b2c2xd89292d27729c0b1 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I believe the code I attached was scrubbed. Here is a link instead:
> http://iodb.org/static/persistent-heap/
>
> Red
>
> On Tue, Oct 28, 2008 at 10:30 AM, Red Daly <reddaly at gmail.com> wrote:
>
>> I am hoping to clarify some of the prior discussions[1] about the native
>> Lisp backend for Elephant and propose a basic architecture.  Hopefully we
>> can modularize development so if somebody wants to hack for a few days on
>> the backend he can avoid being overwhelmed.
>>
>> Multiprocess support: What features do the major lisps have for locking and
>> concurrency?  What features are we missing from C/linux--what does BDB do in
>> this regard that is hard to do in Lisp?  I do not know too much about
>> implementing efficient multi-threaded lisp programs where there is a lot of
>> contention.
>>
>> Distribution: Are others interested in making the backend distributed?  I
>> prefer the design to allow adding an efficient distributed architecture at a
>> later date.
>>
>> Custom indexing: A lot of discussion has gone into the best implementation
>> practice for BTrees.  But what about other indexing structures?
>> Multidimensional indexing is relevant to my project now because we have
>> spacial information (longitude latitude pairs) we would like to query.
>> Right now I am using a kd-tree with nodes made persistent by elephant, but
>> usually this sort of index would be implemented by the database itself.  I
>> imagine multi-dimensional indexing could improve queries, too.
>>
>> There are other indexing needs as well.  Document-level search is another
>> common case.  I imagine an efficient search library is beyond our capacity,
>> but how could we make the database suited to implement search on top of the
>> thing?  BTrees are not everything, unless I am missing something.
>>
>>
>> I propose the following basic architecture for the backend:
>>
>> Logging package with generic undo/redo logging support.  It would be
>> customizable but it would not depend on other code from the Lisp backend.  A
>> database could plug into the log by implementing undo, redo, and
>> serialization functions.  By modularizing logging, it should make it easier
>> to hack on it without being familiar with the rest of the backend.  It will
>> also be reusable, for what it's worth.
>>
>> A persistent heap.  Beneath indices and data storage would be a
>> heap-on-disk layer with transaction support.  The heap would have methods
>> for reading, writing, and locking sequences of bytes.  It would also handle
>>   The persistent heap alone would qualify as a database and might be useful
>> as a basis for other DB projects or experiments.  Cachine would probably not
>> be implemented at this level, though that would be easiest.  It may be
>> possible to implement the transactional heavy lifting for the persistent
>> heap and make it extensible enough to be used throughout.
>>
>> B-Trees.  A MOP-based implementation of B-Trees may be customizable enough
>> to plug into the the rest of the system with a few additional method
>> declarations for the persistent, transactional version.  Specialized methods
>> would access the persistent heap for reads and writes.  Locking of the sort
>> done by BDB (level 2 etc.) could probably be accomplished in these methods
>> specialized for our persistent B-Tree.
>>
>> Bkd-Trees and other indexing structures could be implemented similarly to
>> B-Trees.
>>
>> Caching.  B-Tree nodes or other lispy objects should be cached as opposed
>> to byte arrays.  Caching of B-Tree nodes requires additional multiprocess
>> code, so not all the transactional magic happens in the persistent heap
>> layer.  I'm a little fuzzy on exactly how this will work, but it sounds
>> reasonable enough.
>>
>> The DB user would interact mostly with the B-Tree  and other indexing
>> structures, and with transactions.
>>
>>
>> How does this sound for a basic architecture?  The multiprocess details are
>> not fleshed out so your comments are appreciated.  I have attached my basic
>> implementation of a logger and persistent heap to make this clearer.
>>
>>
>> Best regards,
>>
>> Red
>>
>>
>> [1]  The most lengthy discussion I can find is here:
>> http://common-lisp.net/pipermail/elephant-devel/2008-May/004108.html
>> The trac has a page on the Lisp backend:
>> http://trac.common-lisp.net/elephant/wiki/LispDataStore
>>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: http://common-lisp.net/pipermail/elephant-devel/attachments/20081028/75c72940/attachment.html
>
> ------------------------------
>
> _______________________________________________
> elephant-devel mailing list
> elephant-devel at common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/elephant-devel
>
>
> End of elephant-devel Digest, Vol 5, Issue 12
> *********************************************
>