[vivace-graph-devel] elephant
Kevin Raison
raison at chatsubo.net
Wed Sep 14 05:15:56 UTC 2011
A good paper on linear hashing to disk is attached. I will respond to
Mon key's comments after some sleep...
On 09/13/2011 09:15 AM, Kevin Raison wrote:
> Dan, I actually already tried using elephant at a very early stage in
> the development of VG. While elephant is an excellent library (which I
> have used in many projects), it is simply too slow to be of use here.
> One of the goals of VG is to be fast and to be able to handle billions
> of triples. Elephant slows down very quickly because of a number of
> factors, including its use of BerkeleyDB rather than a native Lisp
> back-end store, as well as its complexity. VG does not need the level of
> complexity or abstraction that you get with elephant's indexes and class
> redefinition logic. In our case, we are dealing with one class, the
> triple, and as such, we can be very specific about how we store and
> index it as well as how we deal with it in memory. Standard b-trees
> simply won't efficiently handle the fanout of a large triple store; we
> need something specifically tuned to our purpose. I am fairly certain
> that linear hashing for triple storage combined with b-tries or fb-trees
> for indexing would do much better. There are other graph dbs out there
> that use this strategy. See
> http://blog.directededge.com/2009/02/27/on-building-a-stupidly-fast-graph-database/
> for some good discussion.
>
> Another goal of mine is to develop a native Lisp back-end that projects
> like elephant might be able to take advantage of; not relying on
> external, non-Lisp libraries is a good thing, especially BerkeleyDB,
> given its terrible licensing terms (thanks, Oracle).
>
> You mention that you had some further thoughts after reading the fractal
> pre-fetching b-trees paper; care to share?
>
> -Kevin
>
>
> On 09/13/2011 07:44 AM, Dan Lentz wrote:
>> I've been thinking about persistent index strategies, and have read
>> through the paper on fpb+trees, and have had a few thoughts.
>>
>> The first and simplest is to make use of elephant. Its not very exotic
>> or course but it would allow a model in which triples can be first class
>> objects, yet leverage a reasonably performant back end (bdb). In
>> addition, the set-valued slots and association slots are nice
>> abstractions on top of which to build the rdf semantics (properties,
>> extensions) on top of a real clos mop.
>>
>> I figured I'd shoot the idea onto the mailing list to get a feel for the
>> degree and nature of agreement/disagreement.
>>
>>
>> _______________________________________________
>> vivace-graph-devel mailing list
>> vivace-graph-devel at common-lisp.net
>> http://lists.common-lisp.net/cgi-bin/mailman/listinfo/vivace-graph-devel
>
> _______________________________________________
> vivace-graph-devel mailing list
> vivace-graph-devel at common-lisp.net
> http://lists.common-lisp.net/cgi-bin/mailman/listinfo/vivace-graph-devel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: e_ds_linearhashing.pdf
Type: application/pdf
Size: 108658 bytes
Desc: not available
URL: <https://mailman.common-lisp.net/pipermail/vivace-graph-devel/attachments/20110913/f708fe7a/attachment.pdf>
More information about the vivace-graph-devel
mailing list