Tuesday 10 June 2008

Ultrasphinx cheatsheet: indexing habtm relationships

In the previous post I quickly summarized how I am doing the indexing of fields of a model. Quite simple I know.

I struggled a little bit with getting the other models I wanted indexed to come through.

Before, as I mentioned, I was using Ferret. Now, despite all of its memory-hog bugginess it is simple to understand because you can modify stuff at the post-sql stage. So before, for instance, I had a method on the film model that did this:

def participants_name
participants.collect{|a| a.name}.join(' ')
end


That gave me a big long string and I could pass that to acts_as_ferret and it would use that to find all the films with Johnny Depp in them.

The problem came with Sphinx not doing any post-sql processing. I wanted the same results, but how?
The model looks like this:

class Film < ActiveRecord::Base
has_and_belongs_to_many :participants
...
end



At first I thought I could index participants and then step through all of their films, but that seemed too much work.

So... here comes the cheatsheet part!

How to index HABTM relationships

Again, maybe this is in the documentation but I had a hard time piecing it together.

I do this:

is_index :concatenate => [
{:class_name => "Participant",
:field => "name",
:table_alias => "p1",
:as => "actor",
:association_sql => "LEFT OUTER JOIN films_participants ON (films.`id` = films_participants.`film_id`) LEFT OUTER JOIN participants AS p1 ON (p1.`id` = films_participants.`participant_id`)"}
]


So, where as the :fields array of is_indexed is used for the fields of a model, :concatenate is where you set up the habtm relationships.
To break it down
1 - :class_name => the name of the class that this model relates to. We need to set this because you can reference by a different name as we'll see in a moment

2 - :field => same as the fields table on this model, what field do we want to index on. Not it is called 'field' and not 'fields'! You can only do one.

3 - :table_alias => i.e. GODSEND! I have 2 Participant habtm in my film model - one for actors and one for directors. The directors :concatentate definition is almost exactly the same as the actors one. If we didn't have :table_alias Sphinx wouldn't be able to tell them apart. Remember, Sphinx uses all this cleverness to generate one (very clever!) bit of sql.

4 - :as => what you want to call this in your search. I can then write a query that says "actor:'Johnny Depp'" and it will search only on that field.

5 - :association_sql => yes, Virginia, I am sorry but you have to write your own sql. It will ALWAYS look like this, so just copy and past mine and replace the table and column names.


That's it.

You can now do a query like this:

@films = Ultrasphinx::Search.new(:query => "Johnny Depp", :class_names => ['Film']).run



And it will search for any film titles with Johnny Depp in them AND any actors in the film. Sweet!

Now, didn't I say something about directors?

I have another habtm relationship:
 has_and_belongs_to_many :directors, :class_name => "Participant", 
:join_table => "directors_films"


Same other model (Participant). So now my :concatenate will look like this:

:concatenate => [              
{:class_name => "Participant", :field => "name", :table_alias => "p1",
:as => "actor", :association_sql => "LEFT OUTER JOIN films_participants ON (films.`id` = films_participants.`film_id`) LEFT OUTER JOIN participants AS p1 ON (p1.`id` = films_participants.`participant_id`)"},
{:class_name => "Participant", :field => "name", :table_alias => "p2",
:as => "director", :association_sql => "LEFT OUTER JOIN directors_films ON (films.`id` = directors_films.`film_id`) LEFT OUTER JOIN participants AS p2 ON (p2.`id` = directors_films.`participant_id`)"}]


Two references to Participant, but :table_alias saves the day.

2 comments:

Laurel Fan said...

Thanks for posting this, it's exactly what I was looking for. I was hoping to not have to write the sql, but oh well.

(btw, you seem to have some double escaped gts hanging around)

Brian Armstrong said...

Thanks this saved my butt today! Really appreciate you posting it.

Also, just wanted to mention (in case it helps someone else), it wasn't immediately clear to me what 'p1' and 'p2' were. I thought these were maybe in the database somewhere, but these are just arbitrary names you used in the association_sql. So people can change them to anything. Great job!