Get them here. I had a great time there - well worth going!

A colleague and I recently updated an app to rails 2.3.4 and found ourselves with a slightly sticky performance problem - slow app response times, with time seemingly disappearing into nowhere. New Relic also showed that requests were queueing inside mongrels, which shouldn't have been a problem since there were plenty of mongrels for the traffic being received at the time.

Our friendly Engineyard support engineer pointed us to a ticket on the Rails lighthouse describing a rather similar situation encountered during the RC phase of rails 2.3: Active Record based sessions causing trouble because they weren't returning their connection to the connection pool [1]. Connection pooling isn't new in 2.3 but the move to rack changed the order in which things like sessions, query caching and the connection pool are set up.

This ticket was marked as resolved, however looking at the fix it became clear why we were still experiencing the problem:

if ActionController::Base.session_store == ActiveRecord::SessionStore
  configuration.middleware.insert_before :"ActiveRecord::SessionStore", ActiveRecord::ConnectionAdapters::ConnectionManagement
  configuration.middleware.insert_before :"ActiveRecord::SessionStore", ActiveRecord::QueryCache
else
  configuration.middleware.use ActiveRecord::ConnectionAdapters::ConnectionManagement
  configuration.middleware.use ActiveRecord::QueryCache
end

(Source here)

Since our session store isn't ActiveRecord::SessionStore, the code to put the database related middleware in the right place in the chain isn't executed. I'm not entirely sure what to do about this - the rails initializer cannot be expected to know the details of what the session store is doing (unless part of the session store interface was a uses_active_record? flag) but as it is this is very sneaky and would have taken a while to find if we hadn't been pointed in the right direction.
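To make the ordering problem concrete, here is a toy sketch in plain Ruby. None of these classes are the real Rails or Rack ones: ToyPool, ReleaseConnection and ToySessionStore are made-up stand-ins, and I'm assuming a session store that touches the database on the way out of the request (to save the session). If the piece that releases connections sits inside the session store rather than around it, the connection used to save the session is never handed back.

```ruby
# Toy stand-ins only: these are NOT the real Rails/Rack classes.
class ToyPool
  attr_reader :checked_out

  def initialize
    @checked_out = 0
  end

  def checkout
    @checked_out += 1
  end

  def release_all
    @checked_out = 0
  end
end

# Plays the role of connection management: hands connections back
# to the pool once everything it wraps has finished.
class ReleaseConnection
  def initialize(app, pool)
    @app, @pool = app, pool
  end

  def call(env)
    @app.call(env)
  ensure
    @pool.release_all
  end
end

# Plays the role of a database-backed session store: it touches the
# database after the inner app returns, to save the session.
class ToySessionStore
  def initialize(app, pool)
    @app, @pool = app, pool
  end

  def call(env)
    response = @app.call(env)
    @pool.checkout # saving the session needs a connection
    response
  end
end

# order lists the middleware from outermost to innermost.
def leaked_connections(order)
  pool = ToyPool.new
  app = lambda { |env| pool.checkout; [200, {}, ['ok']] }
  stack = order.reverse.inject(app) { |inner, klass| klass.new(inner, pool) }
  stack.call({})
  pool.checked_out
end

leaked_connections([ReleaseConnection, ToySessionStore]) # => 0
leaked_connections([ToySessionStore, ReleaseConnection]) # => 1
```

The second ordering is the situation we were in: with the release step on the inside, one connection per request stays checked out until the pool runs dry.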


[1] As of Rails 2.2, Active Record has a concept of a connection pool: a given rails instance won't use more than a certain number of connections (5 by default) even if that rails instance was a jruby based instance with hundreds of threads. If you need a connection from the pool and they are all in use then you have to wait until one becomes available. Normally you are blissfully unaware of this - rails marks your connection as no longer used when your action has finished processing.

Fun with ruby http clients

April 13th, 2009

Quite a few people have written about the performance failings of Net::HTTP, but until recently, to be honest, I never really cared a lot. Most of my http request needs have been fairly meagre, often not much more than hitting a url and checking the result code.

I've been playing with couchdb recently, and so my app does a fair amount of http requests. I've been using RelaxDB which uses net/http, so Net::HTTP's performance has started to matter.

Net::HTTP is not the only game in town. I spent some time recently playing with rfuzz, eventmachine and taf2-curb and came to largely the same conclusion as Paul Dix.

Leaning on a mature library such as libcurl gives taf2-curb a huge advantage. While eventmachine was on par speed wise, neither of the 2 http clients it includes is a complete implementation of the HTTP protocol. For example HttpClient will tell the remote server that it speaks HTTP/1.1, yet it does not support chunked encoding (a mandatory part of the spec). HttpClient2 does understand chunked encoding, but doesn't let you set headers or add a body to the request. Fine for just pinging a url, but not up to the task of working with couchdb. Something to do with couchdb's chunked encoding also seemed to confuse rfuzz.
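For anyone who hasn't met chunked encoding, here is roughly what a client has to do to read such a response body. This is a minimal sketch in plain Ruby, not code from any of the libraries mentioned; decode_chunked is a name I've made up, and it ignores trailers and chunk extensions.

```ruby
require 'stringio'

# Decode an HTTP/1.1 chunked body. Each chunk is "<size in hex>\r\n<data>\r\n"
# and a chunk of size 0 marks the end. Chunk extensions (after a ';' on the
# size line) are dropped and trailers are not read in this sketch.
def decode_chunked(raw)
  io = StringIO.new(raw)
  body = ''.dup
  loop do
    size_line = io.gets("\r\n") or raise 'truncated chunked body'
    size = size_line.split(';').first.strip.to_i(16)
    break if size.zero?
    body << io.read(size).to_s
    io.read(2) # skip the CRLF that follows the chunk data
  end
  body
end

decode_chunked("5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n") # => "Hello, world"
```

A client that announces HTTP/1.1 but skips this step will hand you the raw sizes and data mixed together, which is the sort of confusion described above.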

taf2-curb does the job very nicely. On my dumb benchmark, 1000 requests for a static html page hosted on the same machine (ie we're pretty much only testing overhead) the numbers are:

require 'benchmark'
require 'net/http'
require 'curb' # provided by the taf2-curb gem

Benchmark.bmbm(5) do |x|
  x.report 'net/http' do
    u = URI.parse('http://docs.local/')
    1000.times {Net::HTTP.get u}
  end

  x.report 'curb' do
    1000.times do
      c = Curl::Easy.new 'http://docs.local/'
      c.perform
    end
  end
end
               user     system      total        real
net/http   0.560000   0.270000   0.830000 (  1.065960)
curb       0.310000   0.170000   0.480000 (  0.696188)

At the other extreme, these numbers correspond to ~1 meg of data pulled from couchdb (benchmark code the same apart from the urls, and I did 100 iterations rather than 1000).

               user     system      total        real
net/http  17.400000   8.900000  26.300000 (  32.067821)
curb       0.700000   1.300000   2.000000 (  29.586022)

curb comes up squarely on top. Another thing of note during this test is cpu usage (as you might expect from the difference in user time). With Net::HTTP the ruby process running this was taking up 60-70% (on a 2.4GHz core duo), with curb it used around 5% of cpu.

The commit to switch RelaxDB from net/http to taf2-curb is here for those interested - really very straightforward stuff. There may well be more to be had by fiddling with libcurl options, I haven't tried yet.

If you work with designers or are getting to grips with a new codebase you may find yourself frequently answering the question 'What template generates this bit of html?'. This smidgen of code adds comments to your html with the path to the templates used. For example the template

Hello
 <%= render :partial => 'some_partial' -%>

might generate

<!-- TEMPLATE: views/dummy/index.html.erb -->
Hello
<!-- TEMPLATE: views/dummy/_some_partial.erb -->
Here is a partial
<!-- ENDTEMPLATE: views/dummy/_some_partial.erb -->
<!-- ENDTEMPLATE: views/dummy/index.html.erb -->

You can grab the code from github. Original idea from this thread on rails-talk
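The wrapping itself is simple enough to sketch in plain Ruby. This is not the plugin's code (which hooks into template rendering); annotate_template is a made-up helper that just shows how nested templates end up with nested comments.

```ruby
# Hypothetical helper: wrap already-rendered html in comments naming the
# template that produced it.
def annotate_template(path, html)
  "<!-- TEMPLATE: #{path} -->\n#{html.chomp}\n<!-- ENDTEMPLATE: #{path} -->\n"
end

# The partial is rendered (and annotated) first, then embedded in the page.
partial = annotate_template('views/dummy/_some_partial.erb', "Here is a partial\n")
page = annotate_template('views/dummy/index.html.erb', "Hello\n#{partial}")
puts page
```

Running this prints the same six lines as the example output above.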

It's a good thing when tests run quickly. You get a shorter feedback cycle, you're unlikely to move on to something else while your tests are running. Another good thing is ruby-debug: a fast debugger for ruby, worth its weight in gold when you need it. As of Rails 2.1 ruby-debug is always loaded and active during tests.

Having the debugger there does have some overhead. Not much, but it's still there. You're incurring that overhead whether your tests need it or not. On your CI server, for example, it's pure wasted time. Even running locally it's frequently unnecessary - most of the time when I run the tests I'm just running the tests, not using the debugger to try to understand a test failure.

So, without further ado I present this:

unless ENV['DEBUGGER']
  Debugger.stop if defined?(Debugger)
end

Stick it in your test_helper.rb, just below where it says

require 'test_help'

Run your tests again, and hey presto, faster! It does vary from app to app: on some of the smaller apps I have the difference is pretty marginal, on the bigger apps I've got gains of up to 20-30%. I expect that on the smaller ones loading the Rails environment, setting up fixtures etc... dominates actual test runtime. And the day you do need to run the tests under the debugger it's as easy as

rake DEBUGGER=true

if you're using rake to run your tests or

DEBUGGER=true ruby -Itest test/unit/some_test.rb

if you're running an individual test (this might change according to what shell you use). In fact most of the time you won't even need to do this, as calling debugger calls Debugger.start for you.

When cache_classes gets you down

December 28th, 2008

As I've mentioned before, in Rails 2.2 if config.cache_classes is true (which it is in production) then all of your models, controllers etc are loaded up as part of application initialization. This is great if you're firing up a server that should actually be handling requests as it avoids all sorts of nasty race conditions to do with requiring files and defining classes in ruby. It's not so great if you're running a small script, cron job or something like a migration.

The reasons are two-fold. First off, it makes startup slower. This is irrelevant when starting up a web server because it is just getting out of the way stuff you would do anyway but it is time wasted if you've just got a single threaded script that doesn't care about dependency race conditions (and probably only touches one or two models anyway), especially if the script only takes a second or two to run once the environment is loaded. Secondly it can break stuff if models or controllers try and look at the database when the class is loaded. One example of this is ActiveScaffold. For those of you not familiar with it, ActiveScaffold helps you create adminy CRUD style interfaces. A very basic controller might look like this:

class ProductsController < ApplicationController
  active_scaffold
end

When the controller is loaded ActiveScaffold sees that it's working on the ProductsController, infers that the corresponding model is Product and goes off to find out what columns that model has. Imagine you've just added that model and controller and it's time to deploy on your production machine. At this point the products table doesn't exist (since the migration hasn't run yet). Since you're running in your production environment when the rake task causes rails to be loaded it will load your application's classes, including ProductsController which will try and introspect the products table. Fail.

There are two workarounds I can think of. One is a separate rails environment: define an environment with cache_classes set to false but that still uses the production database. Instead of running your migrations in the production environment run them in the production_without_cache_classes environment. This works but can be a bit annoying, especially for adhoc scripts and stuff like that and is even more of a pain if you have multiple productiony environments (eg a staging environment).

Another is to edit your production.rb so that it reads

if defined?(DONT_CACHE_CLASSES) && DONT_CACHE_CLASSES
  config.cache_classes = false
else
  config.cache_classes = true
end

Then for any script which you want to run without class caching turned on stick DONT_CACHE_CLASSES=true at the top (before you require config/environment). If you want to extend this to your rake files then edit the Rakefile at the top level of the app. This feels slightly neater to me as I don't have to remember to fiddle with the environment when running the script.

Another variation upon this is to use an environment variable instead of a constant which would allow you to do things such as rake db:migrate CACHE_CLASSES=false or similar (although of the default tasks I can't think of one you'd run in production where you would want class caching to be on).
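A minimal sketch of the environment variable version might look like this. The cache_classes? helper and the CACHE_CLASSES variable name are my own choices for illustration, nothing Rails defines.

```ruby
# Hypothetical helper: class caching stays on unless CACHE_CLASSES=false
# was passed on the command line or set in the shell.
def cache_classes?(env = ENV)
  env['CACHE_CLASSES'] != 'false'
end

# In production.rb this might then read:
#   config.cache_classes = cache_classes?
cache_classes?({})                         # => true
cache_classes?('CACHE_CLASSES' => 'false') # => false
```

Defaulting to true means forgetting the variable leaves production behaving as it always did.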

This is far from beautiful but appears to get the job done for now.

Dates, params and you

December 3rd, 2008

A not particularly nice area of Rails is the date and time helpers. 3 popups just isn't a very nice bit of user interface. It's a lot of clicks when you want to change dates and most people can't reason in their head about just the date. It's far easier to pick a date from a calendar type view. Still the helpers rails provides are fine for that quick and dirty date input.

Based on the questions on the mailing list about this, the thing that trips people up is that, unlike other attributes you might typically have, dates and times are not representable by a single input control. Instead you have several, one for each component (year, month, day etc...). So in particular, there is no single value in your params hash with your date or time. Exactly what is in your params hash depends on whether you're using select_date or date_select (if you're entering a datetime, select_datetime or datetime_select).

These are to each other as text_field_tag is to text_field: date_select is expecting to hook up to an attribute of an instance variable (or if you use form_for or fields_for an attribute of the corresponding object) whereas select_date isn't. However unlike the other pairs of functions like text_field_tag/text_field, select/select_tag these two send very different parameters through to your controller.

select_date is perhaps the easiest to understand. It will result in a hash (by default it is named "date", but you can override this with the :prefix option) with keys like year, month, day. You can then put those together to get an instance of Date or Time. For example the following in your view

<%= select_date Date::today, :prefix => 'start' %>

will result in a params hash like this:

{:start => {:year => '2008', :month => '11', :day => '22'}}

As I said, there is nothing in the params hash that is the actual value. You have to put it together yourself, for example

my_date = Date::civil(params[:start][:year].to_i, params[:start][:month].to_i, 
                      params[:start][:day].to_i)

A bit more work than your average parameter, but there's nothing mysterious going on here. Under the hood, select_date is also quite boring: it's just calling select_year, select_month, select_day with appropriate options and concatenating the result. A consequence of that is that if you want some odd combination (eg just months and seconds) you can just do that concatenation work yourself. One interesting thing about those subhelpers is that the first parameter you give them can be one of two things:

  • an integer in which case the corresponding day/month/year is displayed (eg 3 for March)
  • something like a Date or DateTime in which case the relevant date component is extracted from it

date_select is where the fun is. Here the expectation is that there is a model object we will want to update and we want to be able to do

my_object.update_attributes params[:my_object]

However update_attributes just wants to set attributes. If you pass it {'foo' => 'bar'} it will try and call the method foo= passing bar as a parameter. For a date input that is made up of these multiple parameters this is clearly a problem. What solves this is something called multiparameter assignment. If there are parameters whose name is in a certain format, then instead of just trying to call the appropriate accessor Rails will gather the related parameters, feed them through a transformation function (for example Time.mktime or Date::new[1]) and then set the appropriate attribute.

The format used is as follows: all the related parameters start with the name of the attribute which lets Rails know they are related. Next Rails needs to know in what order to pass them to the transformation function and whether a typecast is needed. If your view contained

<%= date_select 'product', 'release_date' %>

Then your parameters hash would look like

{:product => 
        {'release_date(1i)' => '2008', 'release_date(2i)' => '11', 'release_date(3i)' => '22'}}

Rails can look at this and see that this is to do with the release_date attribute. It's a date column, so rails knows to use Date::civil. The suffixes tell rails that 2008 is the first parameter to Date::civil and is an integer, that 11 is the second parameter and so on. Rails constructs the value using Date::civil(2008,11,22) and assigns that to release_date.
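Here is a toy version of that gathering step, to show there's no magic in it. It's plain Ruby and only handles dates; collect_dates is a made-up name and this is not Active Record's implementation, just the same idea under the assumption that every grouped attribute should go to Date::civil.

```ruby
require 'date'

# Toy version of multiparameter assignment for date attributes only.
# Keys look like "release_date(2i)": attribute name, argument position
# and an optional type code ("i" for integer, "f" for float).
def collect_dates(params)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  params.each do |key, value|
    next unless key =~ /\A(\w+)\((\d+)([if])?\)\z/
    attribute, position, type = $1, $2.to_i, $3
    value = value.to_i if type == 'i'
    value = value.to_f if type == 'f'
    grouped[attribute][position - 1] = value
  end
  grouped.each_with_object({}) do |(attribute, args), result|
    result[attribute] = Date.civil(*args)
  end
end

collect_dates('release_date(1i)' => '2008',
              'release_date(2i)' => '11',
              'release_date(3i)' => '22')
# => {"release_date" => Date.civil(2008, 11, 22)}
```

The position in the suffix decides the argument order, so the parameters can arrive in any order.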

If you don't intend to pass the parameters to update_attributes (or other functions with that syntax such as the new or create methods on an ActiveRecord class) there's not a lot of point in putting up with the scary parameter names, although you can of course construct the date yourself with

Date::civil(params[:product]['release_date(1i)'].to_i,
            params[:product]['release_date(2i)'].to_i,
            params[:product]['release_date(3i)'].to_i)

You might as well just use select_date and have readable parameter names though.

So, to sum up use date_select or datetime_select when creating/updating ActiveRecord objects but select_date or select_datetime for just a general purpose date input. As a closing tip, with select_datetime you can use the :use_hidden option in which case hidden form inputs are generated instead of select boxes.[2]


[1] There's a bit more to this. For one the range of times representable by a Time object is limited on most platforms (since it's commonly a 32 bit number of seconds since an epoch). Rails has some conversion code that will try and create an instance of Time but if necessary will fall back and create a DateTime object. Secondly there's some cleverness to do with interpreting the user's input with respect to the correct time zone.

[2] This is (I think) a slight misuse. The intent of :use_hidden is that it is the mechanism by which :discard_day and so on work.

with_options for fun and profit

November 25th, 2008

Active Support has a nifty little helper that can cut down on repetition. A lot of things in Rails like validations, associations, named_scopes, routes etc... take a hash of options as their final parameter. There are times where you use many of these with some common options, for example

class Customer < ActiveRecord::Base
  validates_presence_of :phone_number, :if => :extended_signup
  validates_presence_of :job_title, :if => :extended_signup
  validates_presence_of :job_industry, :if => :extended_signup
end

or maybe you have a bunch of associations which all share an association proxy extension module and some settings

class Customer < ActiveRecord::Base
  has_many :foos, :extend => MyModule, :order => "updated_at desc", :conditions => {:active => true}
  has_many :bars, :extend => MyModule, :order => "updated_at desc"
  has_many :things, :extend => MyModule, :order => "updated_at desc"
end

Enter with_options!

class Customer < ActiveRecord::Base
  with_options :extend => MyModule, :order => "updated_at desc" do |options|
    options.has_many :foos, :conditions => {:active => true}
    options.has_many :bars
    options.has_many :things
  end
end

Any time you've got a bunch of method calls taking some common options, with_options can help.

So how does it work on the inside? All the with_options method actually does is yield a special object to its block - all the craftiness is in that object. What we want that object to do is forward method calls to the object we're really interested in (in this case the Customer class) adding the options before it does so.

As with many such proxy objects we undefine just about every method and just implement method_missing. The implementation of method_missing inspects the arguments and merges any options present with the common set defined by the call to with_options (so individual methods can take extra options or override the common ones) before passing them onto the "real" object.
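A stripped down proxy along those lines might look like the following. It is a sketch of the idea rather than Active Support's actual code: ToyOptionMerger and Recorder are made-up names, and inheriting from BasicObject stands in for undefining most methods by hand.

```ruby
# Toy proxy in the spirit of with_options; not Active Support's own code.
# BasicObject has almost no methods, so nearly every call lands in
# method_missing.
class ToyOptionMerger < BasicObject
  def initialize(target, common)
    @target, @common = target, common
  end

  def method_missing(name, *args, &block)
    options = args.last.is_a?(::Hash) ? args.pop : {}
    # Per-call options win over the common ones.
    @target.__send__(name, *args, @common.merge(options), &block)
  end
end

# Hypothetical target that just records the calls it receives.
class Recorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  def has_many(name, options = {})
    @calls << [name, options]
  end
end

recorder = Recorder.new
proxy = ToyOptionMerger.new(recorder, :order => 'updated_at desc')
proxy.has_many :foos, :conditions => { :active => true }
proxy.has_many :bars
recorder.calls
# => [[:foos, {:order => "updated_at desc", :conditions => {:active => true}}],
#     [:bars, {:order => "updated_at desc"}]]
```

Because the merge puts the per-call hash last, an individual call can still override one of the common options.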

Originally a limitation was that the last argument just had to be a hash, so for example if you had a procedural named_scope then with_options couldn't work. Luckily a recent commit rectifies this: if the thing you're trying to merge with is a proc, then with_options will replace it with a new proc that merges the common options with the result of the call to the original proc. If you're not on edge you'll have to wait for 2.3 in order to get this.
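The proc handling can be sketched the same way. Again this is only an illustration of the idea described above, not the commit itself, and merge_common_options is a name I've invented.

```ruby
# Hypothetical helper: if given a proc, wrap it so the common options are
# merged into whatever hash it returns; otherwise merge the hashes directly.
def merge_common_options(common, options_or_proc)
  if options_or_proc.respond_to?(:call)
    lambda { |*args| common.merge(options_or_proc.call(*args)) }
  else
    common.merge(options_or_proc)
  end
end

by_author = lambda { |name| { :conditions => { :author => name } } }
merged = merge_common_options({ :order => 'updated_at desc' }, by_author)
merged.call('fred')
# => {:order => "updated_at desc", :conditions => {:author => "fred"}}
```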

While both the examples I gave showed using with_options on what is essentially model class configuration it is by no means limited to that. You could use it for that sort of configuration on your own classes or just inside a regular method - anytime you are making several method calls on the same object with a hash of options at the end.

In the before time, the bottom of most of my apps' environment.rb was an unholy mess. Inflector rules, requiring of various libraries or gems, various bits of app specific configuration etc... all jumbled together. Rails 2.0 introduced initializers: any file in config/initializers is run at an appropriate time during the initialisation process. You get to split that mess into a handful of well organised, single purposed little files (and rails 2.1 simplified the case of requiring gems with the config.gems mechanism).

You might still have a few stragglers though, that one require that you didn't bother moving into an initializer because it hardly seemed worth creating a whole file just for that one line. With the imminent release of Rails 2.2 it's high time you made that change.

Living Thread Dangerously

Unless you've been living under an internet-proof rock you've probably heard about Rails' new threadsafeness. There's a bunch of hard work across the framework that's gone into making this possible, but one particular area is to do with loading code. Ruby's require mechanism isn't threadsafe (or as it has been put to me, it's thread dangerous) nor is the automatic loading stuff Rails uses. For example say two threads both hit a constant called Foo that has yet to be loaded. Thread 1 starts loading foo.rb and gets as far as

class Foo < ActiveRecord::Base

At this point Thread 2 hits Foo. However at this point the constant Foo now exists and so Thread 2 doesn't load foo.rb. However since Thread 1 hasn't yet processed the rest of foo.rb the Foo class will be missing all its instance methods, validations etc... If both threads end up loading foo.rb at the same time then weird things can happen like validations being added twice and so on. It can also cause the dependencies system to spuriously claim it couldn't find a constant. It's a small world of pain you don't want to get involved in. Making require threadsafe is fundamentally hard (and is something the ruby-core and jruby folks have been worrying about).

What Rails 2.2 does in production mode is load all of your models, controllers and so on as part of the initialization process instead of loading them as they are needed. No more loads from different threads when your app is actually running, no more pain.

The Bad Thing

So, how does this connect with the statement I made above about moving things into initializers? Your average environment.rb file looks a little like this

#set some constants like RAILS_GEM_VERSION
require File.join(File.dirname(__FILE__), 'boot')
Rails::Initializer.run do |config|
  #set some config settings
end
#if you're old school, app configuration here
require 'some_dependency'

The bulk of initialization happens when you call run. This yields to the block to allow you to set the various settings (and also reads the appropriate environment file and so on) but the key thing is that by the time that function has returned, all of the initialization has happened.

In particular, Rails will try to load all of your application classes before the stuff at the bottom of environment.rb has been executed. If a model depends on some_dependency.rb being loaded (for example if that file added a validation that it uses) then your app will die before it has even finished initialising.

If however you're a good person and move these things into initializers (i.e. files in config/initializers) then they will run at an appropriate time in the boot process (i.e. before Rails loads up all your application classes) and you won't get an unpleasant surprise when you try and deploy your app.

First, foremost and [0]

November 15th, 2008

This doesn't work (the something field will not be updated):

post.comments.first.something = true
post.comments.first.save

but this does

post.comments[0].something = true
post.comments[0].save

as does

post.comments.each {|c| puts c.id}
post.comments.first.something = true
post.comments.first.save

All three of these would have worked in rails 2.0.x and previous versions. So what changed?

Quacks like a duck but breathes fire

As you may know, post.comments looks an awful lot like an array but isn't an array. It's an association proxy. It has methods defined on it for things like finding objects from the database, the count method that does an sql count and things like that. When you ask it to do something that can only really be done by having the ruby objects in memory it will load the objects from the database into an actual ruby array and pass methods onto that (this all happens via method_missing). So far business as usual, rails has been like this for a long long time. In particular were you to call the first or the last methods on an association then the array would be loaded and first would be called on that array.

This sort of depends on your problem domain, but a lot of the time loading the entire array just to look at the first or last element of it is wasteful. You've always been able to do some_association.find :first (and as of 2.1 some_association.find :last) but that flows a little less easily off the tongue and of course doesn't play nicely if you pass your association to some code that thinks it's just working with an array. So a few months ago changes were made to make first and last load just that one item from the database[1]. Of course if the target array is already loaded then it just returns the first item from that array.

At the end of the day what that ends up meaning is that in the first example I gave, each call to post.comments.first returns a different object, ie the one that we call save on is not the same as the one we made the change to. The second and third examples are ok purely because they force the array to be loaded which in turn means that calls to first no longer hit the database in that way[2].
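A toy association makes the difference easy to see. This is plain Ruby with made-up classes (ToyAssociation and a Struct standing in for the model); "hitting the database" is simulated by building a fresh object, which is the relevant effect of running a separate query.

```ruby
# Stand-in for a model object.
Comment = Struct.new(:id, :something)

# Toy association: loading builds fresh objects each time, the way separate
# database queries return separate ruby objects.
class ToyAssociation
  def initialize(ids)
    @ids = ids
    @target = nil
  end

  def loaded?
    !@target.nil?
  end

  def load_target
    @target ||= @ids.map { |id| Comment.new(id, false) }
  end

  # Not loaded: fetch just one row, producing a new object on every call.
  # Loaded: return the first element of the array we already have.
  def first
    loaded? ? @target.first : Comment.new(@ids.first, false)
  end

  def [](index)
    load_target[index]
  end
end

comments = ToyAssociation.new([1, 2, 3])
comments.first.equal?(comments.first) # => false, two different objects
comments[0]                           # forces the whole array to load
comments.first.equal?(comments.first) # => true, same object both times
```

In the first case a change made through one call to first is invisible to the object returned by the next call, which is exactly why the save appears to do nothing.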

Of course if you're doing things right your unit tests would catch this sort of thing, but it's still likely to leave you scratching your head a little (I certainly recall spending a few minutes looking at code very similar to the first example and wondering why it no longer worked). Slightly more subtle are performance problems, for example if you were iterating over various attributes then you'd be hitting the database each time to load somethings.first.

I'm not sure what to think about this sort of thing. There is a perfectly sound rationale for doing this but it introduces little ifs, buts and maybes into the illusion that association proxies behave like Arrays. As far as performance goes the implications vary. For big associations it can be a huge win, other times loading 1 object instead of 3 will make little to no difference. In other places I do genuinely want to load the whole array but I'd rather write first than [0] if I'm accessing the first element. Maybe the example I gave is a little artificial, maybe not, but at the end of the day, treating first as a synonym for [0] is a habit that is hard to break.


[1] I'm simplifying things quite considerably here - there are a number of edge cases which that code has to tread around quite carefully, including unsaved parent objects, unsaved children objects, custom finder sql etc...

[2] While I've concentrated on first, everything I've said here applies to last too. In a way it's slightly worse in that (at least for me) the difference in comfort between writing last and [-1] is greater than the difference between first and [0]

Few things are more head bashing inducing than code that passes all unit tests, runs perfectly on your development machine but fails on your staging/production servers. In that vein, both of these examples are wrong:

class Person < ActiveRecord::Base
  has_many :posts
  has_many :recent_posts, :class_name => 'Post', :conditions => ["created_at > ?", 1.week.ago]
  validates_inclusion_of :birth_date, :in => (20.years.ago..13.years.ago), 
                            :message => "You must be a teenager to signup", :on => :create
end

class Post < ActiveRecord::Base
  named_scope :recent, :conditions => ["created_at > ?", 1.week.ago]
end

In development mode this will work absolutely fine. When you deploy this code onto the production server it will work fine too, but after a while it won't behave quite right. For example some_person.recent_posts will start returning posts older than 1 week.

The key to this is understanding when the code runs. In particular when does "1.week.ago" get turned into an instance of Time with some fixed value such as 1st November 2008 at 20:32?

The statements has_many, validates_inclusion_of etc... are just method calls, so their arguments are evaluated when that function is called. You can look in the options hash for an association to see this (assuming you've just typed in the Person class given above):

Person.reflections[:recent_posts].options
=> {:conditions=>["created_at > ?", Sun Nov 02 14:27:26 +0000 2008]}

So when are these functions called? Quite simply when ruby loads person.rb. In development mode your source code is reloaded for each request[1], providing the illusion that the "1.week.ago" is re-evaluated whenever it is used. In production mode person.rb would only be read once per Rails instance and so once your mongrels had been running for a week some_person.recent_posts would return anything written in the last 2 weeks (ie since 1 week before the date at which your mongrels were launched). You would also notice this if you were running script/console and keeping an eye on the sql generated: you'd see that the date in the WHERE clause didn't change.
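You can see the same effect without Rails at all. In this sketch a counter stands in for the clock so you don't have to wait a week; ToyModel, its scope method and the now helper are all made up for the demonstration.

```ruby
# A counter stands in for the clock: every call to now returns a new value.
$tick = 0
def now
  $tick += 1
end

class ToyModel
  # Just stores whatever it is given, like an options hash on an association.
  def self.scope(name, options)
    (@scopes ||= {})[name] = options
  end

  def self.conditions_for(name)
    options = @scopes[name]
    options = options.call if options.respond_to?(:call)
    options[:conditions]
  end

  scope :stale, :conditions => now                # evaluated once, right here
  scope :fresh, lambda { { :conditions => now } } # evaluated on every use
end

ToyModel.conditions_for(:stale) # the same value every time
ToyModel.conditions_for(:fresh) # a new value on each call
```

The :stale scope captured its value when the class body ran, which is the production behaviour described above; the lambda version is what the fix looks like.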

Fixing it.

Fortunately it's not hard to fix this. In the case of the awesome named_scope you probably already know that you can supply a Proc for when you want your scope to take arguments. We can equally make one with no arguments, just to ensure that the time condition is evaluated whenever the scope is accessed.

class Post < ActiveRecord::Base
  named_scope :recent, lambda { {:conditions => ["created_at > ?", 1.week.ago]} }
end

For conditions on things like associations we can use a little trick called interpolation. As I'm sure you know when ruby encounters "#{ 'hello world' }" it evaluates the things inside the #{}, but if you use single quotes (or equivalently things like %q()) then it doesn't. What you may not know is that Active Record will perform that interpolation again at the point where sql is generated. For example we can write the recent posts association like this:

class Person < ActiveRecord::Base
  has_many :recent_posts, :class_name => 'Post', 
           :conditions => 'created_at > #{self.class.connection.quote 1.week.ago}'
end

When person.rb is loaded the stuff in the #{} will not be evaluated, however when Active Record generates the sql needed to load the association it will be[2].
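The mechanism is easier to believe once you've seen it in miniature. This is a plain Ruby sketch of deferred interpolation, not Active Record's code: ToyRecord and its interpolate method are invented, and it assumes the sql fragment contains no double quotes.

```ruby
class ToyRecord
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Sketch only: wraps the text in double quotes and evaluates it in the
  # context of this object, so the interpolation happens now, not when the
  # string was written. Assumes sql contains no double quotes.
  def interpolate(sql)
    instance_eval('"' + sql + '"')
  end
end

template = 'owner_id = #{id}'          # single quotes: nothing evaluated yet
ToyRecord.new(7).interpolate(template) # => "owner_id = 7"
ToyRecord.new(8).interpolate(template) # => "owner_id = 8"
```

The same template gives a different result for each object, and anything time dependent inside the #{} is worked out at the moment of the call.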

Validations can't play any of the clever little games that the other 2 examples can. You'll just have to do something like

class Person < ActiveRecord::Base
  validate_on_create :is_a_teenager

  def is_a_teenager
    unless birth_date < 13.years.ago && birth_date > 20.years.ago
      ...
    end
  end
end

[1] Assuming you've got config.cache_classes set to false in development mode which is the default

[2] You can do a lot more with interpolation. Normally the code is interpolated in the context of the instance of the model so you can use any model methods, instance variables etc... When an association is fetched with :include it will be interpolated in the context of the class (since the whole point is to bulk load instances, it does not make sense, nor would it work, to put per instance data in there).

Required or Not ?

September 28th, 2008

One of Rails' slightly gnarly areas is all the magic that goes into enabling the automatic reloading of source in development mode[1]. Reloading a class isn't just as simple as reading the source again: that would just reopen the class. While this would allow you to add or change existing methods, it wouldn't allow you to remove methods, change the class an object inherits from, stop including a module and things like that. In the particular context of Rails this would also cause validations, filters and callbacks to be added repeatedly. You also don't want to reload absolutely everything. For example reloading standard ruby libraries would be pointless (and slow) as would be reloading Rails itself and (usually) plugins.

A related service that Rails' dependencies system provides is autoloading of constants. Rails hooks const_missing: when an unknown constant is found Rails will try and determine the name of the file containing it (according to Rails' conventions) and search for it in the appropriate folders. After a request (or when you call reload!) Rails unsets the constant. This means that reading the corresponding file again will create a new class rather than reopening the old one. It also means that the next use of that constant will cause const_missing to be hit again and load the class.
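The whole cycle fits in a few lines of plain Ruby. This is a toy, not Rails' dependencies code: instead of searching folders for a file it evaluates source held in a hash, and ToyLoader, SOURCES and Widget are all names made up for the sketch.

```ruby
# Toy autoloader: evaluates source kept in a hash rather than reading files.
module ToyLoader
  SOURCES = { :Widget => 'class Widget; def hello; "hi"; end; end' }
  @autoloaded = []

  def self.load_missing(name)
    source = SOURCES[name] or raise NameError, "uninitialized constant #{name}"
    Object.class_eval(source)
    @autoloaded << name # remember it so it can be unset later
    Object.const_get(name)
  end

  # What happens at the end of a request: forget everything we loaded.
  def self.clear
    @autoloaded.each { |name| Object.send(:remove_const, name) }
    @autoloaded.clear
  end
end

class Object
  def self.const_missing(name)
    ToyLoader.load_missing(name)
  end
end

first = Widget       # const_missing fires and defines Widget
ToyLoader.clear      # the constant is removed
second = Widget      # const_missing fires again: a brand new class
first.equal?(second) # => false
```

Only constants that went through load_missing are tracked, so anything defined some other way is left alone by clear.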

require messes with reloading

The long and short of this is that Rails needs to track what needs to be reloaded (i.e. which constants it should remove). When a file is loaded via Rails' dependency system, all the constants it defines are stashed away in Dependencies.autoloaded_constants[2]. At the end of the request all of those constants are removed. But if you have bypassed the Rails dependency system, the constants you loaded won't get that treatment. Here's an example script/console session:

>> Customer.object_id
=> 19116470
>> reload!
Reloading...
=> true
>> Object.constants.include?('Customer')
=> false
>> Customer.object_id
=> 18966210

The reload! function does the reloading that Rails would do at the end of a request. Here everything is happening as normal: we've let Rails handle the loading and after the reload the Customer constant is removed, ensuring we then get a fresh copy of the Customer class. Now let's try something different and explicitly require customer.rb:

>> require 'customer'
=> ["Customer"]
>> Customer.object_id
=> 19121220
>> reload!
Reloading...
=> true
>> Customer.object_id
=> 19121220

Lo and behold: the Customer class isn't being reloaded. Had you done this in a real app you would find that changes to the customer file weren't being picked up until you restarted the server. Even more confusingly it would be fine until you loaded a file that did such a require but thereafter changes would have no effect, even on pages where previously it worked.

Fun with associations

A lot of problems happen when you have something hanging onto an old version of a class. One way that can happen in a Rails app is via associations. Suppose our Customer class has an orders association.

>> require 'customer'
=> ["Customer"]
>> Customer.find(1).orders
=> [#<Order id: 1, customer_id: 1>]
>> Order.object_id
=> 18291410
>> Customer.reflections[:orders].klass.object_id
=> 18291410
>> Customer.reflections[:orders].klass.instance_methods - ActiveRecord::Base.instance_methods
=> ["build_customer", "create_customer", "belongs_to_before_save_for_customer", "customer", 
"customer=", "my_instance_method", "set_customer_target"]

Everything is as we would expect it. Customer.reflections[:orders] returns an AssociationReflection object which is something that describes an association. It holds data like what kind of association it is, any options that were supplied (eg :foreign_key, :counter_cache) and so on. In particular its klass attribute is the ActiveRecord::Base subclass for the association. Here we can see that that class is the same as Order which we would expect.

The association's class has the methods you would expect: some methods to deal with the customer association that Order has and an instance method we added. So far so good. Let's reload the code:

>> reload!
Reloading...
=> true
>> Customer.find(1).orders
=> [#<Order id: 1, customer_id: 1>]

Superficially things look fine, but if we dig a little deeper, everything has gone horribly wrong. The first clue is this:

>> Order.object_id
=> 18680200
>> Customer.reflections[:orders].klass.object_id
=> 18291410

This tells us that the Order class is no longer the same class as the class referenced by the association. Because Order was loaded via Rails' dependencies system it was reloaded when we did reload!, but as we saw before, Customer wasn't. This causes quite a few problems, for example:

>> Customer.find(1).orders << Order.new
ActiveRecord::AssociationTypeMismatch: Order(#18291410) expected, got Order(#18680200)

Oh noes! When you add a record to a collection Active Record checks that it is of the correct type, but the Customer class is trying to check that the object is an instance of the old Order class, which it isn't. The fun thing about this sort of situation is that it will work fine the first time you view the page after restarting the server, but not the second or following times. Madness!
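You can reproduce the essence of this mismatch without Active Record at all. In the sketch below TinyReflection is a made-up, heavily simplified stand-in for an association reflection, and TypeError stands in for ActiveRecord::AssociationTypeMismatch; the only point is that something captured the class object before the constant was replaced.

```ruby
# Hypothetical stand-in for an association reflection: it captures the
# target class object once and keeps hold of it.
class TinyReflection
  attr_reader :klass

  def initialize(klass)
    @klass = klass
  end

  # Imitates the kind of type check performed when adding to a collection.
  def check_type!(record)
    unless record.is_a?(klass)
      raise TypeError, "#{klass.name}(##{klass.object_id}) expected, " \
                       "got #{record.class.name}(##{record.class.object_id})"
    end
    true
  end
end

class Order; end

reflection = TinyReflection.new(Order)
reflection.check_type!(Order.new)   # fine: same class object

# Simulate a reload: unset the constant, then define the class afresh.
Object.send(:remove_const, :Order)
class Order; end

begin
  reflection.check_type!(Order.new)
rescue TypeError => e
  puts e.message                    # both classes are called Order, with different ids
end
```

Both classes report the name Order, which is what makes the real error message so puzzling the first time you see it.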

There's more stuff too. If we repeat our earlier test to list the instance methods of the association's class we get this:

>> Customer.reflections[:orders].klass.instance_methods - ActiveRecord::Base.instance_methods
=> []
>> Customer.find(1).orders.customer
NoMethodError: undefined method `customer' for #<Class:0x23e34a4>

They've all gone. This can be more than a little baffling, when a page works fine but reloading it causes methods you know exist to just disappear into thin air. The culprit here is the reset_subclasses method in Active Record, which as its name implies, clears out classes. It only does this to autoloaded classes, which normally is fine because such classes are just thrown away and never used again, but we're hanging onto this gutted class and trying to use it[3]. Even if this gutting of classes didn't happen you'd still have a lot of confusion: instances of Order retrieved via the association would be the old class and so wouldn't reflect any changes you had made to the source, but instances created directly would.
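Here is a rough plain-Ruby imitation of that gutting. gut_class is a hypothetical helper of my own that approximates the effect described above (stripping a class of what it defined itself); it is not the Active Record source.

```ruby
# Hypothetical helper: strip a class of the instance variables and
# instance methods it defined itself. An approximation, not Rails' code.
def gut_class(klass)
  klass.instance_variables.each { |ivar| klass.send(:remove_instance_variable, ivar) }
  klass.instance_methods(false).each { |name| klass.send(:undef_method, name) }
  klass
end

class Order
  def customer
    "a customer"
  end
end

old_order_class = Order              # something keeps hold of this reference
gut_class(Order)                     # the class is gutted before being discarded
Object.send(:remove_const, :Order)

p old_order_class.instance_methods(false)

begin
  old_order_class.new.customer
rescue NoMethodError => e
  puts e.message
end
```

Anything still holding old_order_class now has a class that can be instantiated but has lost every method it used to define.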

Just don't do it

By now you've probably got the message that using require to load your models can cause some weird stuff to happen. Loading classes behind Rails' back just gets things confused. There are two ways to stop this happening:

  • Just don't require stuff. If you let Rails' automagic loading do its work, none of this will happen
  • If you do need to require stuff explicitly, use require_dependency. This means that Rails is kept in the loop

Of course require is fine for requiring gems, bits of standard libraries and so on, but using require to load bits of your own application should be viewed with suspicion. It only takes one require somewhere to mess things up, so be careful.


[1] Or to be quite precise, when config.cache_classes is set to false. If it is set to true (for example in production mode) nothing in this article applies

[2] In Rails 2.2 and higher, Dependencies was moved into the ActiveSupport namespace. If you're running that version mentally prepend ActiveSupport:: wherever you see Dependencies. There are a lot of other settings in there that control all of this, for example load_once_paths and explicitly_unloadable_constants allow you to control what is reloaded and what isn't.

[3] As far as I can tell and according to this thread the exact reason this is necessary is rather lost in the mists of time, possibly an artefact of previous implementations of Rails' dependencies.

Selenium and Firefox 3

September 27th, 2008

I recently spent a bit of time making our Selenium tests play nicely with Firefox 3 and spent quite a lot of time staring at

Preparing Firefox profile...

Selenium would launch Firefox, and then Firefox would just sit there doing nothing. Eventually some digging around found a ticket on the Selenium issue tracker. It turns out Selenium installs a tiny little extension into the Firefox profiles it generates that basically just lets Selenium kill Firefox by telling it to go to a magic chrome URL. Firefox extensions specify which versions they are compatible with, and the one embedded in Selenium had 2.0.0.* as its maximum version. This is still the case with the latest downloadable release, although you could of course download the nightly builds.

It seems that this was the only thing keeping Selenium and Firefox 3 from playing nicely together, as changing the maximum version to 3.0.* got all our tests passing again with our existing version of Selenium (0.9.2).

All I had to do was extract the relevant files from selenium-server.jar:

jar xf selenium-server.jar \
  'customProfileDirCUSTFFCHROME/extensions/readystate@openqa.org/install.rdf'
jar xf selenium-server.jar \
  'customProfileDirCUSTFFCHROME/extensions/{538F0036-F358-4f84-A764-89FB437166B4}/install.rdf'
jar xf selenium-server.jar \
  'customProfileDirCUSTFFCHROME/extensions/{503A0CD4-EDC8-489b-853B-19E0BAA8F0A4}/install.rdf'
jar xf selenium-server.jar \
  'customProfileDirCUSTFF/extensions/readystate@openqa.org/install.rdf'
jar xf selenium-server.jar \
  'customProfileDirCUSTFF/extensions/{538F0036-F358-4f84-A764-89FB437166B4}/install.rdf'

This extracts the files (and the directory structure containing them). To be honest I'm not entirely sure of the difference between all of these extensions - safest bet seems to be changing them all. Now edit all of the .rdf files (they're just text files) and change the maximum version from 2.0.0.* to whatever you want (for example 3.0.*) and put them back in the jar:

jar uf selenium-server.jar \
  'customProfileDirCUSTFFCHROME/extensions/readystate@openqa.org/install.rdf'
jar uf selenium-server.jar \
  'customProfileDirCUSTFFCHROME/extensions/{538F0036-F358-4f84-A764-89FB437166B4}/install.rdf'
jar uf selenium-server.jar \
  'customProfileDirCUSTFFCHROME/extensions/{503A0CD4-EDC8-489b-853B-19E0BAA8F0A4}/install.rdf'
jar uf selenium-server.jar \
  'customProfileDirCUSTFF/extensions/readystate@openqa.org/install.rdf'
jar uf selenium-server.jar \
  'customProfileDirCUSTFF/extensions/{538F0036-F358-4f84-A764-89FB437166B4}/install.rdf'

Voilà! All done.
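If editing five files by hand sounds tedious, the edit step (between extracting the files and putting them back in the jar) could be scripted along these lines. This is a sketch under an assumption: that each install.rdf stores the version literally as 2.0.0.* inside an em:maxVersion element. Check one of the files by hand first; bump_max_version is just a name I've picked.

```ruby
require "find"

# Rewrite the maximum version in every install.rdf found under `root`.
# Assumes the version appears as <em:maxVersion>2.0.0.*</em:maxVersion>;
# files that don't match are left untouched.
def bump_max_version(root, from = "2.0.0.*", to = "3.0.*")
  changed = []
  Find.find(root) do |path|
    next unless File.file?(path) && File.basename(path) == "install.rdf"
    original = File.read(path)
    # String patterns are matched literally, so "." and "*" need no escaping.
    updated = original.gsub("<em:maxVersion>#{from}</em:maxVersion>",
                            "<em:maxVersion>#{to}</em:maxVersion>")
    next if updated == original
    File.open(path, "w") { |file| file.write(updated) }
    changed << path
  end
  changed
end

# Run from the directory where the jar contents were extracted.
if __FILE__ == $0
  Dir.glob("customProfileDir*").each do |dir|
    bump_max_version(dir).each { |path| puts "updated #{path}" }
  end
end
```

The method returns the list of files it actually changed, so an empty result is a hint that the assumption about the file contents doesn't hold for your copy of the jar.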

RailsConf presentation code

September 8th, 2008

The code from our presentation is now available here. It's worth a look even if you weren't at our presentation.

Knock yourselves out!

Thoughts on Jeremy's Keynote

September 6th, 2008

I really enjoyed Jeremy Kemper's talk on Wednesday. It was the sort of talk that has you itching to run home and try out what you've seen. All good stuff. For those of you who weren't there, Jeremy was talking about performance.

The key point is that it's all about the user experience: how fast do our users think the app is? Part of it is your Ruby code (and Jeremy had plenty to say about that, with GC tips, profiling tips etc.) but a huge chunk is the network. A common trick is to bundle up your assets: given the limit on the number of concurrent loads from a domain and the part that latency plays, loading one medium or biggish javascript file (or stylesheet, or whatever) is almost always preferable to loading five small files. With 2.1 Rails makes this easy and will bundle up your js for you, but there's another trick you can play.

You could stick assets like your javascript files on some sort of content distribution network, close to your users wherever they are. This would be an inordinate amount of effort to go to, though, just to host a few javascript files. Luckily, Google has done that for you with their Ajax Libraries API. They are hosting copies of common javascript frameworks, including prototype, scriptaculous, dojo, jquery and mootools. You get to use their content distribution network and can assume they've got all the caching and compression stuff done right. The libraries are all versioned too (i.e. you say 'give me prototype 1.6.0.1'), so no worries about that.
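As a rough illustration of what "give me prototype 1.6.0.1" looks like in practice, here is a tiny helper that builds a script tag pointing at a hosted, versioned copy. The base URL and the library/version/file layout are my assumption about how the hosted files are addressed, and google_hosted_script_tag is a made-up name, so check the Ajax Libraries API documentation for the exact path of each library.

```ruby
# Assumed base URL for the hosted libraries; verify against Google's docs.
GOOGLE_AJAX_LIBS = "http://ajax.googleapis.com/ajax/libs"

# Build a script tag for a specific version of a hosted library.
# The library/version/file path layout is an assumption for this sketch.
def google_hosted_script_tag(library, version, file)
  src = [GOOGLE_AJAX_LIBS, library, version, file].join("/")
  %(<script type="text/javascript" src="#{src}"></script>)
end

puts google_hosted_script_tag("prototype", "1.6.0.1", "prototype.js")
```

Because the version is part of the URL, upgrading is an explicit change on your side rather than something that happens underneath you.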

There is of course another advantage: instead of your browser caching a copy of prototype.js from every site that uses it, your users will only be caching it once (per version). Chances are when they come to your site it will already be in the cache.

Jeremy also mentioned the importance of putting yourself in your users' shoes and seeing what your webapp is like when viewed through a slow or high(er) latency connection. I described one way of doing this a few months ago and it really can be quite the eye-opener.