Friday, 12 December 2008

MySQL and Leopard

I had such problems on my macbook (though not on the macpro) in getting mysql running.

This article seems to have helped.

Monday, 8 December 2008

checkboxes for multiple items

I am doing an interface where I want to list a bunch of things (index) and have a checkbox next to each of them. Then, using a button, edit or show all those selected items.
I am always running into this problem of checkboxes (maybe I am just stupid) so I spent some time and figured out a way that works for me. There are probably other solutions, but here is what I do:

First, in the index.html I have this javascript code:

<% content_for :javascript do %>
<script type="text/javascript" charset="utf-8">
/* <![CDATA[ */
function editDevices() {
var device_count = document.getElementsByName('device[]').length;
var total="";
for(var i=0; i < device_count; i++){
if(document.forms['group_edit'].elements['device[]'][i].checked) {
if (total.length > 0) {
total += ","
total +=document.forms['group_edit'].elements['device[]'][i].value;

new Ajax.Request('<%= group_edit_devices_path %>', {
parameters: {
authenticity_token: encodeURIComponent('<%=form_authenticity_token%>'),
device_ids: total

function showDevices() {
var device_count = document.getElementsByName('device[]').length;
var total="";
for(var i=0; i < device_count; i++){
if(document.forms['group_edit'].elements['device[]'][i].checked) {
if (total.length > 0) {
total += ","
total +=document.forms['group_edit'].elements['device[]'][i].value;

new Ajax.Request('<%= group_show_devices_path %>', {
parameters: {
authenticity_token: encodeURIComponent('<%=form_authenticity_token%>'),
device_ids: total
/* ]]> */
<% end %>

Then I have the table of devices:

<table border="0" cellspacing="5" cellpadding="5">
<th># Displays</th>
<td colspan="5">
<ul class="tabs">
<li><%= link_to_function "Show", "showDevices()" %></li>
<li><%= link_to_function "Edit", "editDevices()" %></li>
<li><%= link_to "Delete" %></li>
<% for device in @devices %>
<td><%= check_box_tag("device[]",, false, :id => "device_#{}") %></td>
<td><%=link_to_remote h(device.uid), :url => formatted_device_path(device, :js), :method => "get" %></td>
<td><%=h %></td>
<td><%=h device.ip %></td>
<td><%= device.displays.count %></td>
<% end %>

The result is that my ajax function will be called with a comma separated list of the ids of the selected devices.

Thursday, 27 November 2008

&@$%^£& IE 7 and insecure content!

What a morning!

A user kindly reported that he couldn't log in to FilmAmora using IE7. IE6 worked, FireFox worked. Safari worked.

IE7 (for me) was coming up first with the 'there is insecure content on this page do you wish to show it?' (the login in page is under https). Clicking yes or no had the same result: the page would flash but then IE7 reported that it couldn't show the page. (It should have, it turns out, said 'I am choosing not to show the page').

The issue was that insecure content. For me, specifically it was the statcounter call. But your issue may be other calls. Anyway, I removed all calls to anything outside of my domain and that seems to have fixed it!

Monday, 3 November 2008


I use unfuddle as a git host and bug tracking, etc service. But, because I have so many projects on the go I ran into this problem where you can only have an ssh key in ONE account on unfuddle. So, I wrote to the support guys and asked them how to handle multiple accounts from one machine.
Here is their response:

The easiest way to deal with need multiple keys for multiple accounts is to actually just create them using different filenames in your ~/.ssh/ directory. For instance, if you have 2 accounts, "account1" and "account2", you would want to run "ssh-keygen -t rsa" twice, each time specifying a different filename:


Now, by default, ssh will use the file ~/.ssh/id_rsa when connecting to a remote server. As neither of your keys is in this file, you must tell SSH (not git) which private keyfile to use when connecting to each account. This is done using the SSH config file.

For instance, create or edit a file named .ssh/config with the following contents:

User git
IdentityFile ~/.ssh/unfuddle_account1_id_rsa

User git
IdentityFile ~/.ssh/unfuddle_account2_id_rsa

I have followed this model for my own accounts (of which I have many) and it has worked perfectly. I hope this helps you as well.

Thursday, 16 October 2008

Creating a summary

I have an app that has rows of statistics. We wanted to create summaries of arbitrary rows.
This method seems to work:

  def self.create_summary_row(array)
unless array.blank?
klazz = array.first.class
sum_stat =
array.each do |line|
if line.is_a?(klazz)
line.attributes.each do |k,v|
unless v.nil? || v.is_a?(Time) || v.is_a?(Date)
if sum_stat[k].nil?
sum_stat[k] = v
sum_stat[k] += v

Monday, 29 September 2008


This is so great - goodbye other hosting companies that can't compete!

Amazon has hosting facilities known as ec2 (elastic compute cloud). You pay by the hour ($.10 for a small installation). That works out to around $70/month. But, the power you get is much better than your average hosting.

Paul Dowman has released a great ec2 instance called ec2onrails that can get your rails app up and running on ec2 in about 30 minutes. I am now using it for FilmAmora as well as for another client.

It is awesome!

One thing I discovered today was the sweet built-in support for cron jobs. Write a script and plop it in your app's script directory. Call it hourly, daily or weekly (perhaps monthly? I didn't check). This script will get run automatically by an already existing cron job! No need to figure out those silly cron settings. Very cool!

I highly recommend it. FilmAmora is running much much better on it than on the old hosting (where I had to restart it every day just to hopefully have enough memory to run 2 mongrels).

Friday, 15 August 2008

secure forms

One thing I struggled with in RoR was creating forms that submitted to https. So I ended up writing a couple of helpers:

  def secure_form_for(record_or_name_or_array, *args, &proc)
unless RAILS_ENV == 'production'
url_options = {}
url_options = {:protocol => 'https://', :only_path => false }

options = args.last.is_a?(Hash) ? args.pop : {}
if !options[:url].nil?
options[:url] = url_options.merge options[:url]
options[:url] = url_options

return form_for(record_or_name_or_array, options, &proc)


  def secure_form_tag(*args, &proc)
logger.debug("secure_form_tag args #{args.inspect}")
unless RAILS_ENV == 'production'
url_options = {}
url_options = {:protocol => 'https://', :only_path => false }

options = args.last.is_a?(Hash) ? args.pop : {}
if !options[:url].nil?
options[:url] = url_options.merge options[:url]
options[:url] = url_options

logger.debug("secure_form_tag options #{options.inspect}")

return form_tag(options, &proc)

The should be simple replacements for form_for and form_tag. Now, I have to admit I wrote these when I was using Rails 1.2.5, so there might be some things in 2.x that make these redundant, but, they work for me.
The only caveat is that if you are doing a form where you are not passing in the action, just the object e.g

<% form_for @object do %>

Then you are going to have to supply the actions. I am looking into this now, and I might rewrite this post if I sort it out!

Saturday, 26 July 2008

xml_hidden plugin

Recently I've been investigating the steps needed to open up FilmAmora's API to the outside world. That means supplying xml data so people can do what they want with it.
But I didn't want sensitive data in models to be exposed in the xml. And I also am too lazy to write custom to_xml methods for everything.

So I wrote a plugin call xml_hidden That lets you set something on the class to hide certain attributes from xml output.

class Film < ActiveRecord::Base
attr_xml_hidden :acquiring_url, :id, :created_at

Now whenever I output a film to xml those values can't be seen.

Sunday, 20 July 2008

Escaping for JavaScript

Recently I've started to use the most excellent Prototip2 for doing sexy tooltip stuff. It works a treat.
The only problem I have had is that some of the things I want to put into the tips have single quotes in them (e.g Bob's Team). I was surprised to find that Rails doesn't (as far as I could find) have a handy dandy way of making strings JS friendly. So I whipped up this extention to Erb::Util.
I hope some of you find it useful. It will escape single quotes and as an added bonus it also does the html escaping so you only have to make one call.

class ERB
module Util
def js_escape(str)
h(str.gsub(/[']/, '\\\\\''))

alias js js_escape
module_function :js
module_function :js_escape

I put this into an initializer.

You can then call it from your views like this:


Monday, 14 July 2008

Combating XSS

This info is available elsewhere, but as I always forget, this is a good place to repeat it.

Install the White List plugin. Get it here

Once installed you'll need to add the following line to the init.rb of the white_list plugin:
ActionView::Base.send :include, WhiteListHelper

In the Application.rb (Application Controller)
include HtmlFilterHelper
before_filter :sanitize_params

Then I have a class called HtmlFilterHelper
module HtmlFilterHelper
def sanitize_params(params = params)
params = walk_hash(params) if params

def walk_hash(hash)
hash.keys.each do |key|
if hash[key].is_a? String
hash[key] = white_list(hash[key])
elsif hash[key].is_a? Hash
hash[key] = walk_hash(hash[key])
elsif hash[key].is_a? Array
hash[key] = walk_array(hash[key])

def walk_array(array)
array.each_with_index do |el,i|
if el.is_a? String
array[i] = white_list(el)
elsif el.is_a? Hash
array[i] = walk_hash(el)
elsif el.is_a? Array
array[i] = walk_array(el)

For what it is worth, I also do stuff like this:

<%=white_list synopsis.synopsis[0,100].gsub(/<\/?[^>]*>/, "")%>

Which removes any html-style tags from the text and then white_list's it before outputting.

Wednesday, 9 July 2008

Becoming a Git

Recently the SVN repository service we were using for FilmAmora had a lengthy outage. So, being a real Rails-er I decided to switch over to GitHub. Getting the source up into it was pretty easy.

So, I went ahead and started making changes and doing git commit's and all that good stuff. Well, today I went to the projects homepage on GitHub and you can imagine my surprise and horror when I saw that it looked like nothing had been checked in since my original commit.

Wot de hell?!

I then scoured the net for Git for dummies without success. So here it is:

1) Do all the stuff they tell you do to to get your code in.
2) Do your commits... but remember that for some strange reason you are committing into your LOCAL Git repository.
3) here's the key - do a git push! that will chuck your code up onto the server.

Seems perhaps obviously, but coming from an svn or cvs background it confused the hell out of me.
At a later date (probably this weekend) I will write a little Git note about how this works from the perspective of someone used to using one of the more traditional source code systems.

Saturday, 5 July 2008

Euro-friendly permalinks

Permalink_fu is rather an awesome thing. The only drawback is that it doesn't really handle accented characters very well. I've looked around and seen a bunch of solutions that attempt to be 'clever' and do the substitution on one line, etc. But I decided that I'd rather go for brute force.
I created an extension to String with one method called 'no_accents':
class String
def no_accents

Then I do a whack of these:

    str = self
#Spanish letters
str = str.gsub("á", "a")
str = str.gsub("é","e")

For upper and lower case.
Obviously there is a big list. Right now I am just doing Spanish, French and German as they are the ones I am most interested in.

I had to make a change to Permalink_fu to call my method first:
    def escape(str)
str = str.no_accents
s = ((translation_to && translation_from) ? Iconv.iconv(translation_to, translation_from, str) : str).to_s

But that's all.

Here is the file. What I did is put it in my initializers directory.

If you add to it maybe you could post your additions. I am sure some clever-clogs out there will come up with a more slick solution. But... mine works!

Saturday, 28 June 2008

Rails 2.1 caching - nothing is ever easy!

Last night I watched the new Railscast episode that talked about the new caching features in Rails 2.1. I thought it looked cool and would add it to the new round of FilmAmora changes.

But... nothing is every easy!

I may look simple, but I encountered several problems

1. Conflicts with GetText

We use GetText for the translations on FilmAmora. We like it because there are free poEditor apps on every platform and we can easily send off the files to whomever to be translated.

The problem comes with this line:
require 'gettext/rails'

That you need to have to fire up some of the Rails-specific GetText stuff. This is all fine - it has been working for quite some time. But, when I tried to cache our Genres like this:
  def self.all_cached(language)
key = "genres_#{language}"
Rails.cache.fetch(key) {Genre.find(:all).sort_by {|genre| genre.get_description}.reject {|g| g.get_films_count == 0}}

Here is the result:
undefined method `cache' for GetText::Rails:Module

What?!?! Yes, it seems that GetText::Rails will hide Rails. This is, quite frankly, SHIT. So, after a long time poking around I have discovered that you need to do this:

  def self.all_cached(language)
key = "genres_#{language}"
::Rails.cache.fetch(key) {Genre.find(:all).sort_by {|genre| genre.get_description}.reject {|g| g.get_films_count == 0}}

Yippee! It will all work now, right?


2. Class != Class

I hit refresh the first time and wow was I excited! I saw lines like this in the log:
Cache write (will save 0.59562): genres_es

Woo hoo! Look at all the time I will save!

So I hit refresh.
undefined method `get_description' for #<Genre id: 1, description: "Action and Adventure", order_by: 2>


I am using the 'default' memory store. Something is going funny with retrieving objects from it. I never solved this problem.

If I do this in the console:
>> @genres = Genre.all_cached("es")
>> @genres = Genre.all_cached("es")

I see this in the log:
Cache write (will save 0.51974): genres_es
Cache hit: genres_es ({})

And I can do this:
>> @genres[0].get_description("es")

So what is going on in my web app? I have no idea. It seems that the class retrieved from the memory cache is incomplete in some way. get_description is not an accessor, it does go off and get the translation, but... so what? it is still a method.
This had me stumped!

3. File Store no worky
So I added this to development.rb:
config.cache_store = :file_store, '/cache_store'

I ended up getting this:
undefined method `get_description' for #<String:0x5459f0c>

Ok, I am tired of this now.

4. MemCached
Everyone is talking about using MemCached for this kind of thing. Now, I know developers and we are a lazy bunch. My guess is that this whole caching stuff has been written with MemCache in mind and screw anything else (see points 2 and 3 above).
So I installed MemCache and changed the line in the development.rb to be this:
config.cache_store = :mem_cache_store

I restarted the server and hit refresh. Trying to contain my excitement I saw a properly rendered page.
I hit refresh again.
undefined class/module Subgenre

After another web-scouring exercise I discovered a solution.

My method now looks like this:
  def self.all_cached(language)
key = "genres_#{language}"
::Rails.cache.fetch(key) {Genre.find(:all).sort_by {|genre| genre.get_description}.reject {|g| g.get_films_count == 0}}

And guess what? It works. Of course it now means running memcache on my local machine and installing it on the production box. But I've saved .52 seconds! It took 3 hours to get to the solution, so I figure in only 350 web page hits I will make it back.

Oh, of course I was already caching the html on the server for most things, but this is a little tidier.

Friday, 27 June 2008

Linking offsite

When you get into something like Ruby on Rails you want to use it for everything - the old adage about getting a hammer and everything becomes a nail rings true in programming!

I wanted to link to Wikipedia. I thought I could use link_to for something like that, but it turns out that really link_to (and it's variations) are meant to link to something inside your RoR app, not outside of it.

So I wrote this handy method that generates a link off to Wikipedia.

def wikipedia_link(text, language = current_language.to_s)
"<a href=\"{h(text)}&amp;language=#{language}\" target=\"new\">Wikipedia</a>"

It's fairly simple - pass in the text you want to find on Wikipedia and your language.

Thursday, 19 June 2008

IE and form spacing

One has to wonder how Microsoft got it so wrong, but anyway.

One thing I love about RoR is the ajax side of it and how easy it makes it. For instance adding something to a cart with ajax:

<% form_remote_tag :url => {:controller => :store, :action => :add_to_cart, :id => film} do %>
<%=submit_tag(_('Buy &raquo;'), "title" => _("add to cart @ %s") % number_to_currency(film.price, :unit => "€"), :class => "buy_button_big buy_button")%>
<% end %>

That is great. On FireFox it looks like this:

But on IE6 it looks like this:

Insane quatities of extra space. Seems that IE loves to add space after forms. You can't stop it!

Or can you!?

I have found that adding this in my css is the trick:

form {

But, since I don't need it in ff... why not use the very sweet CSS Browser Selector plugin.

Now you can do this:

.ie6 form {

Add all shall be well in the garden.

Tuesday, 17 June 2008

pull from another site

Recently a site I was working on needed to forward all unknown requests to another website (their old website). This turned out to be shockingly easy to do!

In the Application Controller:

  def default
logger.debug("request.env[\"REQUEST_URI\"] #{request.env["REQUEST_URI"]}")
render :text => Net::HTTP.get_response(

And at the very bottom (so it is only done if nothing else catches the route):

  map.connect '*path' , :controller => 'application' , :action => 'default'

Easy peasy!

Saturday, 14 June 2008

Format My Source Code for Blogging

Format My Source Code for Blogging

Absolving your views of responsibility

One thing I struggled with when I started out with RoR was how to create a site that was 'current looking'. In all the books they told you about views and layouts, but not how to use the layout to its most effective end.
What I wanted was a simple 3 column website. But instead of 'static' information in the left and right column I wanted it to change. In some cases it was ads, in some cases a list of items, etc. The only thing I could think of was to hold a lot of layout information in the view. So, I would have the header and footer in layouts/application.html.erb but in the middle there was a big <%= :yield %> that would handle the middle (vertically) of the page, including the columns.
(Hold on, I'm getting to the code!)
The problem with this is that if your view is wrong, or if you don't include it, then your site goes all to hell.

The solution, I found, was named :yields.

To illustrate I am going to build a funky 3 column site in 5 minutes.

Create the project

$ rails threecol

To set up a new project

Neat tip: open TextMate and drag the threecol folder onto the icon in the doc. That way it will open up the whole structure (i.e. the rails project).

Add a layout

Matthew James Taylor has a great website with some free to use CSS templates. So, we're going to use his 3 column one.

Go here. View source, copy the CSS.

Create a new file in your public/stylesheets directory called site.css and paste the CSS from Matt's page in.

Include the CSS in your Layout

Create a file in your app/views/layouts directory called application.html.erb. This is the name for the default layout, so you don't need to change anything in the controller we'll create in a moment.

Go back to Matt's page, view source and paste the whole thing into application.html.erb. Delete the styles definitions from here. (lines 12 - 170 as of this writing)

Knock up a scaffold

(Best to setup the database.yml here first)

$ script/generate scaffold product name:string description:string

note: in Rails 2 you don't have the same scaffolding as before. doing the above, where you define some fields, will give you a controller, model, views and migration

Get everything setup:

$ rake db:drop db:create db:migrate test

Get your CSS on

Delete products.html.erb from the layouts directory. For our example we don't need it.

Back in your application.html.erb paste this line in the header:

<%= stylesheet_link_tag 'site' %>

If you start the server (script/server) now and go to localhost:3000/products you should see Matthew's page all nice and purty because we haven't included any of our content.

Get your content in

Delete the center of Matthew's code (lines 40 - 59 here) and replace it with

<%= :yield %>

Refresh now and you will see your scaffold in the middle. Nice.


Ok, we could have done all that easy, so let's get to the fun stuff.
Delete Matthew's code for either sidebar.
The center of your application.html.erb should now look like this:

            <div class="col1">

<!-- Column 1 start -->
<%= yield %>
<!-- Column 1 end -->
<div class="col2">
<!-- Column 2 start -->
<!-- Column 2 end -->
<div class="col3">
<!-- Column 3 start -->
<!-- Column 3 end -->

inside column 2 (the left nav in Matthew's layout) put this:

<%= yield :left_nav_content %>

In views/products/index.html.erb do this:
<% content_for :left_nav_content do %>
<h1>Here it is!</h1>
<div>Exciting left-nav content</div>
<% end %>

What is going on?

We all know about <%= yield %>, but adding a name to it says to look in your view (or any views including, even through partial rendering) and see if there is something that wants to write out to this space. <% content_for :left_nav_content do %> says that the stuff in here, please write there.

The power is - you can do anything in there. Include partials, for instance.

The advantage is that now you can not screw up your layout. You can still put things whereever you want, but the layout is out of the view's hands.

I hope this is clear. It made a big difference to how I construct views!

Wednesday, 11 June 2008

Ultrasphinx: ' != "

Just ran into a wee ultrasphinx problem.

I had been doing this:

@query = "actor:'#{}' OR director:'#{}'"

but it turns out that did nothing! You need to surround query segments in double quotes for it to actually parse them as a single unit:

@query = "actor:\"#{}\" OR director:\"#{}\""

Tuesday, 10 June 2008

Legitimate job posting?

I recently responded to a job posting I saw on jobs.rubynow. The company seemed to like me:

"Thank you for your interest in ***. After our first round of resume reviews, we are interested in moving you forward to our next round of evaluations. This is a significant accomplishment considering the competitive nature of this position, and the small number of candidates for a position that go to on – so congratulations! "

Wow! Great! Not exactly sure of the English at the end of the sentence there, but cool.

They then proceeded to ask me a list of questions... all of which I had already answered on the resume they loved so much! But I humored them and answered them again.
A few days later I got this back:

"A mini-test is a round between resume reviews and interviews, that helps the company get a sense of your real interest in the job, as well as your skills and abilities. Receiving a mini-test is a pretty significant accomplishment considering the competitive nature of this position, and the small number of candidates for a position that will receive one – so congratulations! "

Hmmm surprisingly similar to the first email.

They wanted me to write a test. Not really what I like to do, but perhaps I'd be interested. But here is the test:

3. Code as much of the system as you can within a time-box, which is a maximum of 10 hours in this case.

Uh, they want me to spend 10 HOURS???? on their test!?!

The real kicker for me was that they sent a spec for the test. It was to write them a new system for their HR department to track resumes! They didn't want a test, they wanted a free application!

They've made it sound like competition is fierce, so you'd better really do a good job. Now, most developers are proud of what they write and it is a remote test, so why not spend 20 hours or more and create an awesome system!

Or am I just being paranoid?

Oh, the also never answered any of my emails. I only got the bulk emails going out to candidates.

BTW, the company was Rich Applications Consulting,

Ultrasphinx cheatsheet: indexing habtm relationships

In the previous post I quickly summarized how I am doing the indexing of fields of a model. Quite simple I know.

I struggled a little bit with getting the other models I wanted indexed to come through.

Before, as I mentioned, I was using Ferret. Now, despite all of its memory-hog bugginess it is simple to understand because you can modify stuff at the post-sql stage. So before, for instance, I had a method on the film model that did this:

def participants_name
participants.collect{|a|}.join(' ')

That gave me a big long string and I could pass that to acts_as_ferret and it would use that to find all the films with Johnny Depp in them.

The problem came with Sphinx not doing any post-sql processing. I wanted the same results, but how?
The model looks like this:

class Film < ActiveRecord::Base
has_and_belongs_to_many :participants

At first I thought I could index participants and then step through all of their films, but that seemed too much work.

So... here comes the cheatsheet part!

How to index HABTM relationships

Again, maybe this is in the documentation but I had a hard time piecing it together.

I do this:

is_index :concatenate => [
{:class_name => "Participant",
:field => "name",
:table_alias => "p1",
:as => "actor",
:association_sql => "LEFT OUTER JOIN films_participants ON (films.`id` = films_participants.`film_id`) LEFT OUTER JOIN participants AS p1 ON (p1.`id` = films_participants.`participant_id`)"}

So, where as the :fields array of is_indexed is used for the fields of a model, :concatenate is where you set up the habtm relationships.
To break it down
1 - :class_name => the name of the class that this model relates to. We need to set this because you can reference by a different name as we'll see in a moment

2 - :field => same as the fields table on this model, what field do we want to index on. Not it is called 'field' and not 'fields'! You can only do one.

3 - :table_alias => i.e. GODSEND! I have 2 Participant habtm in my film model - one for actors and one for directors. The directors :concatentate definition is almost exactly the same as the actors one. If we didn't have :table_alias Sphinx wouldn't be able to tell them apart. Remember, Sphinx uses all this cleverness to generate one (very clever!) bit of sql.

4 - :as => what you want to call this in your search. I can then write a query that says "actor:'Johnny Depp'" and it will search only on that field.

5 - :association_sql => yes, Virginia, I am sorry but you have to write your own sql. It will ALWAYS look like this, so just copy and past mine and replace the table and column names.

That's it.

You can now do a query like this:

@films = => "Johnny Depp", :class_names => ['Film']).run

And it will search for any film titles with Johnny Depp in them AND any actors in the film. Sweet!

Now, didn't I say something about directors?

I have another habtm relationship:
 has_and_belongs_to_many :directors, :class_name => "Participant", 
:join_table => "directors_films"

Same other model (Participant). So now my :concatenate will look like this:

:concatenate => [              
{:class_name => "Participant", :field => "name", :table_alias => "p1",
:as => "actor", :association_sql => "LEFT OUTER JOIN films_participants ON (films.`id` = films_participants.`film_id`) LEFT OUTER JOIN participants AS p1 ON (p1.`id` = films_participants.`participant_id`)"},
{:class_name => "Participant", :field => "name", :table_alias => "p2",
:as => "director", :association_sql => "LEFT OUTER JOIN directors_films ON (films.`id` = directors_films.`film_id`) LEFT OUTER JOIN participants AS p2 ON (p2.`id` = directors_films.`participant_id`)"}]

Two references to Participant, but :table_alias saves the day.

Monday, 9 June 2008

Ultrasphinx cheatsheet: indexing fields you need!

The Sphinx/Ultrasphinx combination is great for searching. Very fast and much less load on your database than other options. It also takes a lot less memory and is more reliable.

But, the documentation is maddening. Here is what I have learned about implementing it.

To index a field on a model:

is_indexed :fields => [{:field => :field_name, :sortable => true}]

The sortable lets you sort on it later in searches. The thing to remember is that sphinx knows nothing except what you explicitly tell it. So it won't know about fields you don't mention, or what you might want to do with those fields. Don't include the :sortable flag and you can't sort on it.

To that end, we have a field called 'available' that we set/unset if an item in the catalogue shouldn't be displayed online. Now, we never really want to search on that, but, we do want to include it in searches as a filter. So, the first line of our model is really this:

is_indexed  :fields => [{:field => :title, :facet => true, :sortable => true}, :available]

That means we can do this:

@films = => "title:Foo", :class_names => ['Film'], :filters => {"available" => 1}).run

And we'll just get a list of available films named Red (or with Red in the title).

Sunday, 8 June 2008

Ultrasphinx & UTF-8 data

On we obviously have a lot of data that is not your standard ascii - Spanish film titles and Actor's name especially.

After switching from Ferret to Sphinx as a search engine (more about this in another post) everything was going great using the ultrasphinx plugin. That is until I tested searching on a name like 'Belén'. No films were found with an actress with that name - though I knew there were loads of actresses with that name and therefore films.

The problem was the data settings in the database. Even though I was displaying the names correctly this was due to a transliteration between the characters in the database and the webpage. The data was not being stored correctly.

You MUST create tables as default charset utf8 to get the correct results:

Step 1) Create the table in a migration

create_table(:films, :options => 'DEFAULT CHARSET=utf8') do |t|
t.string :title, :null => false

Step 2) Tell rails that the database is in utf-8 by putting this in your database.yml:

adapter: mysql
database: filmamora_development
encoding: utf8

Step 3) Make sure the data going IN is in utf8. Filmamora gets a lot of data from scraping our wholesales site (using scrapi), so I do this:

film_info_es = dvdgo_movie.scrape(URI.parse("#{server}/#{page}"),
:parser_options => {"char-encoding" => "latin0",
"output-encoding" => "utf8", "quote-marks" => "true" },
:user_agent => "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv: Gecko/20080201 Firefox/")

Step 4) search!
@films = => @query, :class_names => ['Film'],
:page => params[:page]||1, :per_page => 12,
:filters => {"available" => 1}).run

Now, if you were (like us) unlucky enough to have NOT done this from the start and can't re-run your migrations (we have a catalogue of 35,000 items!), you can do this to your tables:


We then deleted all our character data (film titles, actor's name, synopsis) and reimported. We kept our id's (which we needed to make sure our existing orders and wishlists, etc didn't go wrong) but got the new data.

In theory you don't need to set the charset on each table. In MySQL you can create a database with a default charset:


And then all your tables should use it too.

But, I don't think that rake db:create does this... (maybe it does now that I have set the character set in the database.yml... hmmm). I also am not sure if the rake db tasks to create tables explicitly set the charset or not. Anyone?

I'm going to write a few posts on sphinx/ultrasphinx because it is so awesome and I had some issues finding clear info on how to use it, so I might as well write it up.