Space Vatican

Ramblings of a curious coder

Spork and Parallel_tests

What’s more terrifying than being stabbed in the face with a spork? Being stabbed in the face with 2 sporks!

In these days where even laptops are dual or quad-core parallel_tests is a great tool for using all of those cores when running specs. Spork is another great tool for avoiding the overhead of loading rails, gems etc. when running tests.

Out of the box spork and parallel_tests don’t work together. When spork receives a request to run specs it aborts its current run, so only one of your parallel_spec instances actually gets to run. In addition parallel_spec’s trick of setting an environment variable to indicate which test environment should be used doesn’t work, because the runners forked by spork don’t inherit the environment parallel_spec sets up.

Fortunately there are at least two ways to get around this. I assume that you’ve already got parallel_tests up and running with the required extra databases, changes to database.yml etc.

Method 1: Multiple sporks

Grab the right version of parallel_tests

Make sure you’ve got parallel_tests 0.5 or later (Previous versions deal with the .opts file in a different way)

Gentlemen, start your sporks

Run as many sporks as you need. Run the first one on port 8989 (the default), the second on port 8991, the third on 8992 and so on. You should set TEST_ENV_NUMBER in each one, so your invocations should look like

1
2
3
TEST_ENV_NUMBER='' spork -p 8989
TEST_ENV_NUMBER='2' spork -p 8991
TEST_ENV_NUMBER='3' spork -p 8992

and so on

Setup parallel_spec.opts

Edit spec/parallel_spec.opts so that it looks like

--drb
--drb-port <%= 8989 + ENV['TEST_ENV_NUMBER'].to_i %>
--format progress
--format ParallelSpecs::SpecRuntimeLogger --out tmp/parallel_profile.log

Profit:

Run parallel_spec. When rspec is invoked it passes the .opts file through erb so the first instance (TEST_ENV_NUMBER=‘’) will have –drb-port set to 8989, the second to 8991 and so on. Each rspec will connect to a different spork instance that has the correct TEST_ENV_NUMBER setup.

Method 2: Single hacked spork

A disadvantage to method 1 is that you have multiple instances of spork sitting around, consuming memory. It requires extra keystrokes to ensure each spork instance is restarted properly and more cpu cycles are consumed everytime you restart spork.

Use my spork

My fork of spork doesn’t prohibit running multiple spec runs concurrently and allows environment data parallel_tests requires to be passed in. I’ve only modified the forking strategy.

Setup script/spec

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
#!/usr/bin/env ruby
require 'rspec'
require 'rspec/autorun'

module RSpec
  module Core
    class DRbCommandLine
      def run(err, out)
        begin
          DRb.start_service("druby://localhost:0")
        rescue SocketError, Errno::EADDRNOTAVAIL
          DRb.start_service("druby://:0")
        end
        spec_server = DRbObject.new_with_uri("druby://127.0.0.1:#{drb_port}")
        spec_server.run(@options.drb_argv, err, out, 'TEST_ENV_NUMBER' => ENV['TEST_ENV_NUMBER'])
      end
    end
  end
end

This ensures that when parallel_spec runs script/spec, the test environment number is passed through to the chikd

Make sure the right database is used

When spork forks a child with a specific TEST_ENV_NUMBER we need to ensure that the child connects to the correct database. This can be done with a Spork.each_run block in your spec_helper.rb. The app I’ve been working with uses mongomapper as well as Active Record, so my each_run block looks like this.

1
2
3
4
5
6
7
8
9
  if !ENV['TEST_ENV_NUMBER'].blank?
  #reconnect to your AR & mongo dbs
    db_config = YAML::load(ERB.new(File.read(Rails.root.join("config", "mongo.yml"))).result)
    if db_config[Rails.env]
      MongoMapper.setup db_config, Rails.env
    end
    ActiveRecord::Base.configurations = DressipiCom::Application.instance.config.database_configuration
    ActiveRecord::Base.establish_connection
  end

Profit

bundle exec spork and run parallel_spec as normal (things won’t work if the ‘normal’ version of the spork binary is run). Your spork will fork multiple instances which will pick up TEST_ENV_NUMBER and reconfigure their database settings as needed.

Sporking With Akephalos

Spork is a test server for saving on environment startup time when running rspec or cucumber. Akephalos is an awesome capybara driver that wraps the htmlunit library. Because htmlunit is a java library, akephalos spawns a jruby process and then uses DRb to allow non-jruby instances to get at the htmlunit goodness.

If you try and combine the two, you get a somewhat uninviting error message:

Fs-ElephantBook-2:dressipi fred$ cucumber --drb features/user_signs_up.feature 
Using the default profile...
Disabling profiles...
.F---------------------------

(::) failed steps (::)

0x00000083df8904 is not id value (RangeError)
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:375:in `_id2ref'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:375:in `to_obj'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1405:in `to_obj'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1713:in `to_obj'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:613:in `recv_request'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:908:in `recv_request'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1533:in `init_with_client'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1545:in `setup_message'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1497:in `perform'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1592:in `block (2 levels) in main_loop'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1588:in `loop'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1588:in `block in main_loop'
(druby://127.0.0.1:60779) -e:1
./features/step_definitions/authentication_steps.rb:31:in `block (2 levels) in '
./features/step_definitions/authentication_steps.rb:30:in `each_pair'
./features/step_definitions/authentication_steps.rb:30:in `/^I sign up with$/'
features/user_signs_up.feature:10:in `When I sign up with'

Which a shame as spork would be especially useful here because akephalos can take quite a few seconds to launch its jruby process.

DRb

First let’s step back and remind ourselves how DRb works. DRb (short for distributed ruby) is a distributed object system for ruby. In the simplest of cases, when you call a method on a remote object, your arguments are marshalled and transmitted over the wire. In the remote process, the arguments are unpacked, the method called and then the result is marshalled and sent back to you.

Obviously this doesn’t always work - some objects (procs, IO streams etc.) can’t be dumped. A very common example would be if you want to call a method and pass a block. So, in addition to the previously described scheme, DRb also allows making proxies for objects. Instead of getting or passing a marshalled dump of an object it uses a proxy that knows the process the object has come from (a uri like druby://127.0.0.1:8990) and something identifying the object in that process. By default it uses the object_id for the object, which DRb can then use to find the actual object. This can require some careful coding, as if in the remote process the object appears unused it might get garbage collected. If this happens then when you try to call a method on it DRb won’t be able to turn the object_id back into an actual object and you get an exception like the one above.

When you run code such as

1
some_drb_object.do_foo { }

Then there are actually DRb calls happening in both directions. The client asks the server process to run do_foo, but because there is a block passed to the method DRb wraps up that block in a proxy object and sends it to the server along with the arguments. When do_foo calls yield, a DRb call happens in the reverse direction in order to invoke the block.

Akephalos and Spork

Back to the original problem. When you run some specs or cucumber features with spork, spork forks off a new instance and then uses DRb to run your specs/features in that process. It’s the combination of that use of DRb with akephalos’ use of DRb that causes the problem. After digging around I traced the exception raised to this line in akephalos:

1
page.find(selector).each { |node| nodes << Node.new(self, node) }

page is a DRb object from akephalos’s jruby process that wraps the htmlunit representation of the page. A little more digging revealed that the invalid id was in fact referring to the block passed to each. You may have noticed that the error is being raised from druby://127.0.0.1:8990, ie from spork’s parent process.

There was already a DRb server running in spork parent process ( druby://127.0.0.1:8990 ), and because our test running process was forked from that process, when DRb packaged up that block into a proxy object it gave the uri for that server as the uri to use when making calls to the block. When each yielded to the block, instead of the method call going to the process that called each, it went to the spork parent process. Because the call went to the wrong process, the object id passed is entirely meaningless.

Fixing it

I’m not entirely sure of the best way to fix this - akephalos is blissfuly ignorant of this aspect of its environment. I was able to get my cucumber features running with both spork and akephalos by doing this in my env.rb

1
2
3
4
5
6
7
8
9
Spork.prefork do
  ... #my usual Spork.prefork stuff
  Akephalos::RemoteClient.manager #force akephalos to be started by the parent process
                                  #so that we don't keep starting it again and again
end

Spork.each_run do
  Thread.current['DRb'] = { 'server' => DRb::DRbServer.new }
end

This ensures that when DRb is constructing a proxy object it uses the uri for the freshly constructed DRbServer as the uri for any proxy objects it passes, and so block invocations and the like go to the correct process.

Adding to Rails’ Log File Performance Breakdown

You’ll all have seen lines like this in your log files:

Completed 200 OK in 14ms (Views: 0.2ms | ActiveRecord: 2.7ms)

While obviously a very high level view of your application’s performance, it’s still a good place to start when you’re wondering where the time has gone. If you’re using a different datastore or if external services (solr, sphinx, external API calls) are important contributors to your application’s performance you might want time spent in those services listed too.

Prior to Rails 3, it was hard to add information to this summary without making a mess of things. What was reported on was just hardcoded into Action Controller. Things are a lot easier in Rails 3.

New in Rails 3 is ActiveSupport::Notifications. The idea here is to decouple things producing performance data and things consuming them.

1
2
3
ActiveSupport::Notifications.instrument('some_task.my_app', data) do
  #do some processing
end

will record how long it took to do the task, and then yield that data to anyone interested (instances of subclasses of ActiveSupport::LogSubscriber). Action Controller’s process action is wrapped by a call to instrument and one particular subscriber to that is one that writes what Action Controller has been up toto the app’s log file. If you wanted to analyze your app’s data other than by eyeballing the log files you could write your own class that could subscribe to these notifications and do whatever processing you wanted, instead of writing something to parse your log files.

A simple example

Suppose we want the time we spend query solr to be logged. Following the way Active Record does things, you need to do three things: instrument the action we wish to monitor using ActiveSupport::Notifications, record the information in a subscriber and then plug the resulting data into Action Controller.

At its barest, the instrumentation might look like this

1
2
3
4
5
6
7
8
9
10
module Solr
  class Connection
    def send_with_instrumentation(request)
      ActiveSupport::Notifications.instrument("send_query.solr") do
        send_without_instrumentation(request)
      end
    end
    alias_method_chain :send, :instrumentation
  end
end

Here send is a funnel through which all queries to solr flow.

The log subscriber might look a little like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
module Solr
  class LogSubscriber < ActiveSupport::LogSubscriber
    def self.runtime=(value)
      Thread.current["solr_runtime"] = value
    end

    def self.runtime
      Thread.current["solr_runtime"] ||= 0
    end

    def self.reset_runtime
      rt, self.runtime = runtime, 0
      rt
    end

    def send_query(event)
      self.class.runtime += event.duration
    end
  end
end

Solr::LogSubscriber.attach_to :solr

The call to attach_to means that whenever a foo.solr event is fired, our subscriber’s foo method will be called. The only thing we do in our subscriber is keep track of how much time has been spent, using the runtime and runtime= method. reset_runtime resets the counter and returns the time accumulated so far.

The last thing we need to do is use the hooks that Action Controller helpfully provides. The following is lifted pretty much line for line from Active Record[1]:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
module Solr
  module Railties
    module ControllerRuntime
      extend ActiveSupport::Concern

    protected

      attr_internal :solr_runtime_before_render
      attr_internal :solr_runtime_during_render

      def cleanup_view_runtime
        self.solr_runtime_before_render = Solr::LogSubscriber.reset_runtime
        runtime = super
        self.solr_runtime_during_render = Solr::LogSubscriber.reset_runtime
        runtime - solr_runtime_during_render
      end

      def append_info_to_payload(payload)
        super
        payload[:solr_runtime] = (solr_runtime_before_render || 0) +
                                 (solr_runtime_during_render || 0) +
                                 Solr::LogSubscriber.reset_runtime
      end

      module ClassMethods
        def log_process_action(payload)
          messages, solr_runtime = super, payload[:solr_runtime]
          messages << ("Solr: %.1fms" % solr_runtime) if solr_runtime
          messages
        end
      end
    end
  end
end
ActiveSupport.on_load(:action_controller) do
   include Solr::Railties::ControllerRuntime
end

cleanup_view_runtime is used to prevent double counting of time spent performing database queries etc. as both database time and view rendering time. The call to super in cleanup_view_runtime will wind up actually rendering the view, so looking at the solr time before and after this tells us how much time we need to deduct from the view render time (which is what the call to super returns).

append_info_to_payload is where we record how much time has been used in total, which is the sum of solr time before, during and after rendering the view.

log_process_action (a class method) acts on a payload hash and produces an array of components such as ‘ActiveRecord: 32.2ms’.

Finally we add an on_load hook that will include this module in ActionController::Base.

Put all this together and you should see stuff like

Completed 200 OK in 17ms (Views: 10.8ms | ActiveRecord: 0.3ms | Solr: 3.7ms)

in your logs.


[1] At the time of writing rails 3.0.7 actually does this incorrectly because it doesn’t consider time after rendering, or when no view was rendered at all

Showing View Spec Exceptions

When an exception is raised during rendering of a view it is wrapped inside an ActionView::Template::Error which is then reraised.

This leads to unhelpful failure messages in view specs, along the lines of

contact/index.html.haml renders a form to send feedback
     Failure/Error: it "renders a form to send feedback" do render
     ActionView::Template::Error
     # ./app/views/contact/index.html.haml:19
     # ./spec/views/contact/index.html.haml_spec.rb:10:in `block (2 levels) in '

This little monkey patch

1
2
3
4
5
6
7
8
9
10
11
12
13
module RSpec
  module Core
    class Example
      def set_exception(exception)
        if exception.is_a?(ActionView::Template::Error)
          @exception ||= exception.original_exception
        else
          @exception ||= exception
        end
      end
    end
  end
end

ensures that the original exception is the one that is displayed.

Unhelpfully Helpful

I loathe helper :all. What’s the point of nicely divided helper modules if they’re all injected into same namespace, overwriting each other at random? I like to write reusable helpers that individual controllers can use by implementing their version of a foo method. I see no reason why two completely unrelated helper modules couldn’t both has a display_help_text method. helper :all prevents both of these and creates situations where you can accidentally overwrite a helper method in a completely different part of your application.

Obviously not everyone agrees with me because Rails continues to move in the other direction. Once upon a time, a bare controller only pulled in its default helper (ie Bar::FooController gets Bar::FooHelper) and the helpers for its ancestors. Later, the default application skeleton added helper :all to ApplicationController but this was still an opt in scheme. With Rails 3 it is an opt out scheme: you have to put clear_helpers in a controller to clear out the helpers Rails includes automatically for it.

Update:

As of Rails 3.1, you can do config.action_controller.include_all_helpers = false to escape this madness.

View Specs

Rspec tries to do the right thing - if you poke into view_example_group.rb in rspec-rails you can see it determining the default helper class and including that and ApplicationHelper into the view object. Unforunately it also includes controller._helpers. This controller isn’t an instance of the controller class that will use the view - it’s an instance of ActionView::TestCase::TestController, so has Rails’ default of the full set of helpers, irrespective of how the controller in the app is configured.

The first step is of course calling clear_helpers on ActionView::TestCase::TestController but if your application has helper modules that are used by multiple controllers then some of your view specs may start to fail. You could of course add those helpers to the view from your before(:each) blocks but that is a slightly nasty piece of duplication: you’ve already specified once which helpers should be used. More dangerously you could develop a situation where your view specs pass because they specify one set of helpers but your app fails because your controllers specify a different set.

To remedy this I’ve got the following in one of my apps:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
module ViewHelpersSpecHelper
  extend ActiveSupport::Concern

  included do
    if metadata[:type] == :view
      name_parts = metadata[:example_group][:description].split('/')
      name_parts.pop

      controller_name = "#{name_parts.join('/').camelize}"
      unless controller_name =~ /\wMailer/
        controller_name += 'Controller'
      end

      helpers = controller_name.constantize._helpers
      before do
        view.singleton_class.class_eval do
          include helpers unless included_modules.include?(helpers)
        end
      end
    end
  end
end

This works out what controller class the view will normally be used with, for admin/foo/bar.html.haml this would be Admin::FooController and uses the helpers that controller would use normally.

A call to config.include(ViewHelpersSpecHelper) in my RSpec.configure block ensures that this module gets included everywhere relevant.

Cells

This starts to break down for partials that are rendered by multiple different controllers or for shared ones that don’t have a natural controller they can be tested with. This is perhaps the one thing that helper :all doesn’t make worse, since it does mean that a given view is always rendered with the same set of helper functions available. If you do have such units of view and helper code, the cells library is well worth a look.