Mulling Over Our Ruby on Rails Full Text Search Options

There are quite a few choices when it comes to adding full text search to a Ruby on Rails application. We thought that we had considered all of our options when we settled on Sphinx / Ultrasphinx. We learned otherwise, though, after stumbling across Xapian / Acts_As_Xapian while trying to find a fix for our Sphinx implementation after a production build.

Here are the details of our thought process and how we ultimately ended up deploying with Xapian.

When Mindbites was released a year ago, we were looking for a full-text search engine that was easy and quick to set up. We ended up going with Douglas Shearer’s Acts_As_Indexed, which worked out great. It is written entirely in Ruby and is very easy to implement, with automatic indexing (i.e., no cron jobs needed to keep the index up to date). If you have a simple site and want to implement a basic search very quickly, definitely give Acts_As_Indexed a look.

However, over the past month or so we decided it was time to find something a little more robust. We wanted the features that the full-blown full-text search engines offer, such as spell correction, stemming (i.e., “connection / connecting / connected” would all search for words containing “connect”), finding “similar” results, and the ability to work across multiple servers. We had also run into a few index corruptions with Acts_As_Indexed that we hoped to leave behind by switching tools.
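To make the stemming idea concrete, here is a minimal Ruby sketch. It is only an illustration: the suffix list and the naive_stem and same_stem? helpers are hypothetical names made up for this post, and real engines use proper stemming algorithms rather than blind suffix stripping. The point is simply that the query and the indexed words are reduced to a common root before they are compared.

```ruby
# Illustrative only: a naive suffix stripper, NOT how any real search
# engine implements stemming. The suffix list is an assumption chosen
# to cover the "connect" example from the paragraph above.
SUFFIXES = %w[ion ing ed s].freeze

# Strip the first matching suffix, keeping at least three characters
# of the word so that very short words are left alone.
def naive_stem(word)
  w = word.downcase
  suffix = SUFFIXES.find { |s| w.end_with?(s) && w.length - s.length >= 3 }
  suffix ? w[0...-suffix.length] : w
end

# Two words match when they reduce to the same stem.
def same_stem?(query, indexed_word)
  naive_stem(query) == naive_stem(indexed_word)
end

stems = %w[connection connecting connected connect].map { |w| naive_stem(w) }
puts stems.uniq.inspect
```

With this sketch, a search for “connecting” would match a lesson that only contains the word “connected”, because both reduce to the same root.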

We went through the usual suspects that are often seen in the Rails Community:

Solr / Acts_As_Solr

Very robust search server based on the Lucene Java library with a mature acts_as_solr Rails plugin. I was blown away by Erik Hatcher’s “Solr On Rails” talk at RailsConf 2007 and thought this could be a good fit. However, we decided to check out other options because, all things being equal, we would rather not deal with installing Java on our servers.

Ferret / Acts_As_Ferret

We nixed Ferret fairly quickly after hearing a few horror stories about corrupted indexes among other issues with Ferret on production servers.

Sphinx / Ultrasphinx

Sphinx appears to be the new de facto standard for full text search among Rails developers. From what we read, it is very powerful, indexes very quickly, and is easy to use with Ultrasphinx. This was our choice to replace Acts_As_Indexed. However, after about a week’s worth of development and a deployment to our production server, there were still a lot of mysteries to the Sphinx.

We were not thrilled with all of the config files involved. With Ultrasphinx, you will need a xxx.base file which is accessed via a rake task to generate a config file that Sphinx can use. This works, but I was hoping for something a little bit simpler.

We were also not big fans of the daemons that run in the background. In our ITG environment, the daemons decided to stop running on a couple of occasions which caused a 500 error to be thrown when searching. (We later fixed this issue by reindexing and restarting the daemon with each Capistrano deployment.) Lastly, I read this quote from the Ultrasphinx deployment notes about recommended cron jobs for Sphinx on your production server:

“The first line reindexes the delta index every 10 minutes. The second line reindexes the main index once a day at 4am. The third line will try to restart the search daemon every three minutes. If it is already running, nothing happens.”

Kicking off a job every 3 minutes just to make sure another job is running did not seem right to me.

Xapian / Acts_As_Xapian

I had never heard of Xapian before this past Thursday. I happened to stumble across this article on Evan Weaver’s blog while I was researching a Sphinx issue with our production build. Always looking for new and better ways to do things, I took a closer look at Xapian. Within 15 minutes, I had Xapian installed on my local Ubuntu machine and was successfully searching Mindbites lessons using the acts_as_xapian plugin. I spent the rest of the weekend replacing our Sphinx code with Xapian.

Installation, configuration, and deployment all went so well over the weekend that we deployed our completely revamped search code to our Mindbites production server this past Monday with spelling corrections, stemming, and “You may also like” functionality intact.

As a side note, that same Evan Weaver blog post has a comment about a plugin called acts_as_searchable, which uses the Hyper Estraier full-text search system. I have not looked into this solution, but I would be curious to hear from readers who have tried it.

In part 2 of this blog post, I will go into detail about our Rails Xapian implementation.
