[elephant-devel] Querying Advice
Daniel Salama
lists at infoway.net
Mon Nov 13 04:17:36 UTC 2006
On Nov 12, 2006, at 3:48 PM, Robert L. Read wrote:
> This requires a philosophical response. In general, I think it
> will be way easier than you imagine, once you have been pointed in
> the right direction. Take my advice with a grain of salt.
I certainly hope so. As I have been learning Lisp, I have become
fully confident that, with the proper knowledge and the right
guidance, this is a manageable task.
>
> First of all, ask yourself: what is the size of your dataset? Can
> you fit it all into memory? If so, you have the full power of Lisp
> at your command in dealing with the querying. You will not have to
> write any macros to do this. You might find the DCM package in the
> "contrib" directory useful, although it does not address querying;
> it is more of a cache-handling issue. (DCM has only been tested
> under SBCL, as far as I know.)
In general, I do think that the dataset fits in memory. However, we
have not yet loaded all of our data into Elephant. Judging by the
MySQL file storage, the database occupies 900MB of disk space,
including indices. However, when we loaded some of the tables into
Elephant, the Elephant data files were at least 5 times that size. I
don't know why. It may simply be the nature of BDB when it stores
arbitrary objects in key/value pairs, or we may have had some
circular references in our model that inflated the data files by
that much (although I doubt it, since in that case I would expect
BDB to store only references to the objects rather than duplicate
them). Regardless, our dev server currently has 4GB of RAM, and I
think that, once properly loaded, all the data should fit in memory.
Now, for the target application, I prefer not to rely on the data
fitting in memory, because the application requires the data to
remain available for several years. These 900MB are the result of
only 1 year for one company. As more companies use the application
and we keep the data online for several years, the assumption that
the data fits in memory will no longer hold.
I have been looking into the DCM package and it certainly looks
promising. We haven't used it yet, but I hope it will be made a
permanent part of Elephant sooner rather than later. I also hope that
it is not aimed mainly at in-memory database applications, but
rather serves as an efficient caching mechanism for persistent data
(regardless of where that data is permanently stored).
With regard to "...If so, you have the full power of lisp at your
command in dealing with the querying...", I agree with you. However,
what I'm trying to get at is how "easy" it would be to generate these
types of dynamic queries in a generic way. Of course, we could always
hard-code all the cases for each of our different search screens,
but the thought of that simply makes me want to vomit :)
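A minimal sketch of the generic approach, assuming the objects are already in memory. It is written in Python purely as language-neutral illustration of what would be closures in Lisp, and it does not use Elephant's API. Each field on a search screen becomes a (field, operator, value) criterion, and the criteria are compiled into a single predicate instead of hard-coding each screen. The names `OPS`, `make_predicate`, `run_query`, and the `Invoice` record are hypothetical.

```python
import operator
from dataclasses import dataclass

# Hypothetical operator table: maps an operator name coming from the GUI
# to a two-argument comparison.
OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "contains": lambda field_value, needle: needle in field_value,
}

def make_predicate(criteria):
    """Compile (field, op, value) triples into one predicate that ANDs them."""
    tests = []
    for field, op_name, value in criteria:
        op = OPS[op_name]
        # Bind the loop variables as defaults so each closure keeps its own values.
        tests.append(lambda obj, f=field, o=op, v=value: o(getattr(obj, f), v))
    return lambda obj: all(test(obj) for test in tests)

def run_query(objects, criteria):
    """Return the objects matching every criterion (no criteria matches all)."""
    pred = make_predicate(criteria)
    return [obj for obj in objects if pred(obj)]

@dataclass
class Invoice:  # hypothetical record standing in for a persistent class
    customer: str
    amount: float
    year: int

invoices = [
    Invoice("Acme", 120.0, 2006),
    Invoice("Acme", 40.0, 2005),
    Invoice("Globex", 300.0, 2006),
]

hits = run_query(invoices, [("year", "=", 2006), ("amount", ">", 100)])
print([(i.customer, i.amount) for i in hits])
```

Because each screen only has to produce a list of criteria, adding a new search screen means describing its fields rather than writing a new query function. OR-groups or nested conditions would need a slightly richer criteria format than this sketch assumes.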
>
> Under SBCL, when it comes to sorting you have "sort" and
> "stable-sort"; I think these are built in. I'll eat a candle if you
> don't find them to be blazingly fast (although the predicates that
> you pass them might take some time).
I think your comment addresses my sorting question exactly. I
suppose that once I have the resulting dataset, I could run it
through "sort". The key would be how to make it sortable on
arbitrary fields (in a similar way to the dynamic query).
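One way to make results sortable on arbitrary fields is to rely on a stable sort: sort by the least significant key first, then re-sort by each more significant key, so ties keep the order established by the previous pass. The sketch below is in Python for illustration only; `sort_by` and the `Invoice` record are hypothetical, and the sort specification is assumed to arrive from the GUI as a list of (field, descending) pairs. In Common Lisp the same idea maps onto repeated calls to `stable-sort` with a `:key` argument.

```python
from dataclasses import dataclass
from operator import attrgetter

def sort_by(objects, spec):
    """Sort by a list of (field, descending) pairs, most significant first.

    Relies on sort stability: sort by the least significant key first,
    then re-sort by each more significant key; ties keep the earlier order.
    """
    result = list(objects)
    for field, descending in reversed(spec):
        # sorted() is stable even with reverse=True.
        result = sorted(result, key=attrgetter(field), reverse=descending)
    return result

@dataclass
class Invoice:  # hypothetical record standing in for a persistent class
    customer: str
    amount: float
    year: int

invoices = [
    Invoice("Globex", 300.0, 2006),
    Invoice("Acme", 40.0, 2005),
    Invoice("Acme", 120.0, 2006),
]

# Customer ascending, then amount descending within each customer.
ordered = sort_by(invoices, [("customer", False), ("amount", True)])
print([(i.customer, i.amount) for i in ordered])
```

The multi-pass design keeps each pass trivial (a single key, a single direction), at the cost of one sort per key; for a handful of user-selected columns that cost is usually small next to loading the objects.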
>
> I think really the only good way to answer this question in a
> deeper way is to provide some example code. I do exactly what you
> are talking about in my application (http://konsenti.com) (although
> I use DCM), so I ought to be able to produce an example program
> relatively quickly. You'll have to figure out how to map the GUI
> into those requests yourself, however.
>
I believe (and hope) that we should have no problem mapping the GUI
to the requests. Just out of curiosity (and I don't mean to divert
from the topic of this thread): if you're using DCM for your konsenti
site (nice concept, by the way), how do you protect your in-memory
data? Do you just write an image to disk every once in a while for
backups? How resilient is this to hardware failure, and to losing
whatever data changed since the last image (if that's your approach)?
> I'll try to post an example by Monday.
>
Thanks,
Daniel