[elephant-devel] full text search solution?
Robert L. Read
read at robertlread.net
Wed May 14 03:16:58 UTC 2008
About two years ago I tried to use Montezuma and found it very buggy.
It may have gotten better. I had better luck using
"lucene-webservice":
http://lucene-ws.net/
which is (obviously) a webservice interface to a normal Lucene
installation. This worked pretty well for me. I used Closure XML
to construct and parse the web-service requests and responses.
http://common-lisp.net/project/cxml/
CXML was far and away the best XML manipulator that I tried. One
advantage of this solution is that any improvement to Lucene you get
automatically, since you are really using Lucene, and not a
re-implementation of it. Given the weight behind Lucene, that is going
to be hard to beat.
The webservice was not a problem for the application that I was working
on.
However, one could argue that you don't want the overhead of a
webservice call. I would still be tempted to use or construct a C/file
level interface to Lucene, for the reason that I don't think anyone can
outdo Lucene. Clearly, Lucene-ws could be hacked into Lucene-LISPAPI,
since it already knows how to deal with Lucene (although through
Catalina/Tomcat and so on).
I suspect that Postgres, with its extensible types, would allow a very
efficient use of full-text searching. I think this is a major plus of
Postgres (and has been since 7.1, if I'm not mistaken, though it may
be even better now).
I agree with you that a full-text searching system should NOT be built
into Elephant, on the basis of separation of concerns.
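To make the separation concrete, here is a minimal sketch of the kind of
interface an external index would need to offer: content is added
incrementally under an object's OID, and a query returns an ordered list
of OIDs. It is written in Python purely for illustration; the names
TextIndex, add, and search are hypothetical and are not part of
Elephant, Lucene, or Postgres. A real backend would replace the toy
in-memory index, and the ranking here (a simple word count) is an
assumption.

```python
from collections import defaultdict
import re


class TextIndex:
    """Toy in-memory inverted index: word -> {oid: occurrence count}.

    Hypothetical stand-in for an external search system.
    """

    def __init__(self):
        self._postings = defaultdict(dict)

    def add(self, oid, text):
        """Incrementally index the text of one object under its OID."""
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self._postings[word][oid] = self._postings[word].get(oid, 0) + 1

    def search(self, query):
        """Return OIDs that contain every query word, best match first."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        # Intersect posting lists so only objects matching all words remain.
        matches = set(self._postings.get(words[0], {}))
        for word in words[1:]:
            matches &= set(self._postings.get(word, {}))

        # Rank by total occurrences of the query words; break ties by OID.
        def score(oid):
            return sum(self._postings[w][oid] for w in words)

        return sorted(matches, key=lambda oid: (-score(oid), oid))


index = TextIndex()
index.add(101, "Full text search in Lisp")
index.add(102, "Lucene is a full text search engine; Lucene is written in Java")
index.add(103, "Elephant stores persistent Lisp objects")
print(index.search("lucene search"))  # [102]
print(index.search("lisp"))           # [101, 103]
```

The persistent store only ever sees OIDs coming back, so the index can
be swapped for Lucene or Postgres without touching the object layer.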
On Tue, 2008-05-13 at 22:02 -0400, Ian Eslick wrote:
> Hello,
>
> Has anyone found a good solution for full text search in lisp? I'm
> interested in indexing website objects such as posts and perhaps
> external documents as well. BDB doesn't, to the best of my
> knowledge, have the appropriate building blocks for an efficient
> indexing system and you certainly don't want to do it on top of the
> current btree interface.
>
> I have an old full text index code base that supported wildcard and
> NEAR queries, all built on top of Elephant btrees. It was convenient
> but had a query time that slowed down linearly with the avg # of
> documents per word.
>
> I've decided that the best approach for me is to connect to a
> separate, probably external, system to which I can incrementally add
> content that will return something I can easily turn into an ordered
> list of OIDs.
>
> Most solutions I've run across require other languages, servers that
> add up to needless complexity for my modest application. In the lisp
> world I've only seen Montezuma, which isn't being developed or
> seriously maintained (unless it's just really stable I'd rather not
> fight with stale code).
>
> I am considering hacking something simple on top of postmodern that
> uses the new text indexing functions of Postgresql 8.3 and wondered if
> anyone here has insight into this application of postmodern or into
> the full-text indexing from lisp problem in general.
>
> Thank you,
> Ian
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel