[elephant-devel] full text search solution?

Wed May 14 03:16:58 UTC 2008

About two years ago I tried to use Montezuma and found it very buggy.
It may have gotten better.  I had better luck using
"lucene-webservice":

http://lucene-ws.net/

Which is (obviously) a web-service interface to a normal Lucene
installation.  This worked pretty well for me.  I used Closure XML
(CXML) to construct the XML requests and to parse the XML responses.

http://common-lisp.net/project/cxml/

CXML was far and away the best XML manipulator that I tried.  One
advantage of this solution is that any improvement to Lucene you get 
automatically, since you are really using Lucene, and not a
re-implementation of it.  Given the weight behind Lucene, that is going
to be hard to beat.  

The webservice was not a problem for the application that I was working
on.

However, one could argue that you don't want the overhead of a
webservice call.  I would still be tempted to use or construct a C/file
level interface to Lucene, for the reason that I don't think anyone can
outdo Lucene.  Clearly, Lucene-ws could be hacked into Lucene-LISPAPI,
since it already knows how to deal with Lucene (although through
Catalina/Tomcat and so on).

I suspect that Postgres, with its extensible types, would allow a very
efficient use of full-text searching.  I think this is a major plus of
Postgres (and has been since 7.1, if I'm not mistaken, though it may
be even better now).
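
To make the Postgres idea a little more concrete, here is a minimal
sketch, assuming the built-in text search of PostgreSQL 8.3 or later
(to_tsvector, plainto_tsquery, ts_rank, the @@ operator and a GIN
index).  The table "documents" and its columns "elephant_oid" and
"body" are hypothetical names chosen for illustration, as is the
helper ordered_oids; the %s placeholders assume a psycopg2-style
driver.  Python is used here only to keep the example short; the SQL
is the part that matters.

```python
# Hypothetical schema: documents(elephant_oid integer, body text).
# An expression index lets Postgres answer @@ queries without a scan.
CREATE_INDEX_SQL = """
CREATE INDEX documents_body_fts
    ON documents USING gin (to_tsvector('english', body))
"""

# Rank matching rows for a plain-text query; %s marks driver parameters
# (the search text, then the maximum number of rows to return).
SEARCH_SQL = """
SELECT elephant_oid,
       ts_rank(to_tsvector('english', body), query) AS rank
  FROM documents, plainto_tsquery('english', %s) AS query
 WHERE to_tsvector('english', body) @@ query
 ORDER BY rank DESC
 LIMIT %s
"""


def ordered_oids(rows):
    """Turn (oid, rank) rows into a list of OIDs, best match first.

    The query above already sorts by rank; sorting again here keeps the
    helper correct for rows gathered from several queries.
    """
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return [oid for oid, _rank in ranked]
```

With a real connection the rows would come from executing SEARCH_SQL
with the search text and a limit; the resulting list of OIDs could
then be used to look the objects up in Elephant.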

I agree with you that a full-text searching system should NOT be built
into Elephant, on the basis of separation of concerns.

On Tue, 2008-05-13 at 22:02 -0400, Ian Eslick wrote:
> Hello,
> 
> Has anyone found a good solution for full text search in lisp?  I'm  
> interested in indexing website objects such as posts and perhaps  
> external documents as well.   BDB doesn't, to the best of my  
> knowledge, have the appropriate building blocks for an efficient  
> indexing system and you certainly don't want to do it on top of the  
> current btree interface.
> 
> I have an old full text index code base that supported wildcard and  
> NEAR queries, all built on top of Elephant btrees.   It was convenient  
> but had a query time that slowed down linearly with the avg # of  
> documents per word.
> 
> I've decided that the best approach for me is to connect to a  
> separate, probably external, system to which I can incrementally add  
> content that will return something I can easily turn into an ordered  
> list of OIDs.
> 
> Most solutions I've run across require other languages, servers that  
> add up to needless complexity for my modest application.  In the lisp  
> world I've only seen Montezuma, which isn't being developed or  
> seriously maintained (unless it's just really stable I'd rather not  
> fight with stale code).
> 
> I am considering hacking something simple on top of postmodern that  
> uses the new text indexing functions of Postgresql 8.3 and wondered if  
> anyone here has insight into this application of postmodern or into  
> the full-text indexing from lisp problem in general.
> 
> Thank you,
> Ian
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel