Dressipi Is Hiring

Aug 24^th, 2011

Dressipi (where I work) is looking to hire a junior-ish software engineer to work on the guts of our various web applications, recommendation systems and mobile applications. We’re a rails shop but we are above all looking for bright minds that can pick up new technologies with ease, so prior knowledge of rails is not required.

If you’re interested drop us a line at jobs ]at[ dressipi.com with a covering letter & CV or any questions you might have.

NSData and RubyCocoa on 10.7

Aug 3^rd, 2011

For some reason code like

OSX::NSData.dataWithBytes_length(data, data.length)

segfaults on lion. Others have noted that this only occurs when the data contains bytes with the high bit set. This smells like something trying to interpret the string in some encoding when the string is in fact arbitrary binary data.

I’d love to use MacRuby instead of RubyCocoa, but unfortunately MacRuby doesn’t seem to be able to handle Active Record at the moment, so I can’t use it (yet). I haven’t had time to delve properly into how RubyCocoa converts between ruby and Objective-C objects but I was able to hack around by using ruby inline

require 'inline'
class CFDataGenerator
  begin
    inline do |builder|
      builder.include "<CoreFoundation/CoreFoundation.h>"
      builder.c_raw_singleton "
      
      static VALUE from_string(int argc, VALUE *args){
        CFDataRef d = CFDataCreate(NULL, RSTRING_PTR(args[0]), RSTRING_LEN(args[0]));    
        
        VALUE result;
        char type = '@';
        ocdata_to_rbobj(Qnil,&type, (const void *)&d, &result, false) ;
        return result;
      }
    "
    end
  end
end

Then you can use CFDataGenerator.from_string to convert ruby strings into nsdata instances. Remember to release the instance when you’re done!

Spork and Parallel_tests

Jul 30^th, 2011

What’s more terrifying than being stabbed in the face with a spork? Being stabbed in the face with 2 sporks!

In these days where even laptops are dual or quad-core parallel_tests is a great tool for using all of those cores when running specs. Spork is another great tool for avoiding the overhead of loading rails, gems etc. when running tests.

Out of the box spork and parallel_tests don’t work together. When spork receives a request to run specs it aborts its current run, so only one of your parallel_spec instances actually gets to run. In addition parallel_spec’s trick of setting an environment variable to indicate which test environment should be used doesn’t work, because the runners forked by spork don’t inherit the environment parallel_spec sets up.

Fortunately there are at least two ways to get around this. I assume that you’ve already got parallel_tests up and running with the required extra databases, changes to database.yml etc.

Method 1: Multiple sporks

Grab the right version of parallel_tests

Make sure you’ve got parallel_tests 0.5 or later (Previous versions deal with the .opts file in a different way)

Gentlemen, start your sporks

Run as many sporks as you need. Run the first one on port 8989 (the default), the second on port 8991, the third on 8992 and so on. You should set TEST_ENV_NUMBER in each one, so your invocations should look like

TEST_ENV_NUMBER='' spork -p 8989
TEST_ENV_NUMBER='2' spork -p 8991
TEST_ENV_NUMBER='3' spork -p 8992

and so on

Setup parallel_spec.opts

Edit spec/parallel_spec.opts so that it looks like

--drb
--drb-port <%= 8989 + ENV['TEST_ENV_NUMBER'].to_i %>
--format progress
--format ParallelSpecs::SpecRuntimeLogger --out tmp/parallel_profile.log

Profit:

Run parallel_spec. When rspec is invoked it passes the .opts file through erb so the first instance (TEST_ENV_NUMBER=‘’) will have –drb-port set to 8989, the second to 8991 and so on. Each rspec will connect to a different spork instance that has the correct TEST_ENV_NUMBER setup.

Method 2: Single hacked spork

A disadvantage to method 1 is that you have multiple instances of spork sitting around, consuming memory. It requires extra keystrokes to ensure each spork instance is restarted properly and more cpu cycles are consumed everytime you restart spork.

Use my spork

My fork of spork doesn’t prohibit running multiple spec runs concurrently and allows environment data parallel_tests requires to be passed in. I’ve only modified the forking strategy.

Setup script/spec

#!/usr/bin/env ruby
require 'rspec'
require 'rspec/autorun'

module RSpec
  module Core
    class DRbCommandLine
      def run(err, out)
        begin
          DRb.start_service("druby://localhost:0")
        rescue SocketError, Errno::EADDRNOTAVAIL
          DRb.start_service("druby://:0")
        end
        spec_server = DRbObject.new_with_uri("druby://127.0.0.1:#{drb_port}")
        spec_server.run(@options.drb_argv, err, out, 'TEST_ENV_NUMBER' => ENV['TEST_ENV_NUMBER'])
      end
    end
  end
end

This ensures that when parallel_spec runs script/spec, the test environment number is passed through to the chikd

Make sure the right database is used

When spork forks a child with a specific TEST_ENV_NUMBER we need to ensure that the child connects to the correct database. This can be done with a Spork.each_run block in your spec_helper.rb. The app I’ve been working with uses mongomapper as well as Active Record, so my each_run block looks like this.

  if !ENV['TEST_ENV_NUMBER'].blank?
  #reconnect to your AR & mongo dbs
    db_config = YAML::load(ERB.new(File.read(Rails.root.join("config", "mongo.yml"))).result)
    if db_config[Rails.env]
      MongoMapper.setup db_config, Rails.env
    end
    ActiveRecord::Base.configurations = DressipiCom::Application.instance.config.database_configuration
    ActiveRecord::Base.establish_connection
  end

Profit

bundle exec spork and run parallel_spec as normal (things won’t work if the ‘normal’ version of the spork binary is run). Your spork will fork multiple instances which will pick up TEST_ENV_NUMBER and reconfigure their database settings as needed.

Sporking With Akephalos

Jul 3^rd, 2011

Spork is a test server for saving on environment startup time when running rspec or cucumber. Akephalos is an awesome capybara driver that wraps the htmlunit library. Because htmlunit is a java library, akephalos spawns a jruby process and then uses DRb to allow non-jruby instances to get at the htmlunit goodness.

If you try and combine the two, you get a somewhat uninviting error message:

Fs-ElephantBook-2:dressipi fred$ cucumber --drb features/user_signs_up.feature 
Using the default profile...
Disabling profiles...
.F---------------------------

(::) failed steps (::)

0x00000083df8904 is not id value (RangeError)
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:375:in `_id2ref'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:375:in `to_obj'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1405:in `to_obj'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1713:in `to_obj'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:613:in `recv_request'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:908:in `recv_request'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1533:in `init_with_client'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1545:in `setup_message'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1497:in `perform'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1592:in `block (2 levels) in main_loop'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1588:in `loop'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1588:in `block in main_loop'
(druby://127.0.0.1:60779) -e:1
./features/step_definitions/authentication_steps.rb:31:in `block (2 levels) in '
./features/step_definitions/authentication_steps.rb:30:in `each_pair'
./features/step_definitions/authentication_steps.rb:30:in `/^I sign up with$/'
features/user_signs_up.feature:10:in `When I sign up with'

Which a shame as spork would be especially useful here because akephalos can take quite a few seconds to launch its jruby process.

DRb

First let’s step back and remind ourselves how DRb works. DRb (short for distributed ruby) is a distributed object system for ruby. In the simplest of cases, when you call a method on a remote object, your arguments are marshalled and transmitted over the wire. In the remote process, the arguments are unpacked, the method called and then the result is marshalled and sent back to you.

Obviously this doesn’t always work - some objects (procs, IO streams etc.) can’t be dumped. A very common example would be if you want to call a method and pass a block. So, in addition to the previously described scheme, DRb also allows making proxies for objects. Instead of getting or passing a marshalled dump of an object it uses a proxy that knows the process the object has come from (a uri like druby://127.0.0.1:8990) and something identifying the object in that process. By default it uses the object_id for the object, which DRb can then use to find the actual object. This can require some careful coding, as if in the remote process the object appears unused it might get garbage collected. If this happens then when you try to call a method on it DRb won’t be able to turn the object_id back into an actual object and you get an exception like the one above.

When you run code such as

some_drb_object.do_foo { }

Then there are actually DRb calls happening in both directions. The client asks the server process to run do_foo, but because there is a block passed to the method DRb wraps up that block in a proxy object and sends it to the server along with the arguments. When do_foo calls yield, a DRb call happens in the reverse direction in order to invoke the block.

Akephalos and Spork

Back to the original problem. When you run some specs or cucumber features with spork, spork forks off a new instance and then uses DRb to run your specs/features in that process. It’s the combination of that use of DRb with akephalos’ use of DRb that causes the problem. After digging around I traced the exception raised to this line in akephalos:

page.find(selector).each { |node| nodes << Node.new(self, node) }

page is a DRb object from akephalos’s jruby process that wraps the htmlunit representation of the page. A little more digging revealed that the invalid id was in fact referring to the block passed to each. You may have noticed that the error is being raised from druby://127.0.0.1:8990, ie from spork’s parent process.

There was already a DRb server running in spork parent process ( druby://127.0.0.1:8990 ), and because our test running process was forked from that process, when DRb packaged up that block into a proxy object it gave the uri for that server as the uri to use when making calls to the block. When each yielded to the block, instead of the method call going to the process that called each, it went to the spork parent process. Because the call went to the wrong process, the object id passed is entirely meaningless.

Fixing it

I’m not entirely sure of the best way to fix this - akephalos is blissfuly ignorant of this aspect of its environment. I was able to get my cucumber features running with both spork and akephalos by doing this in my env.rb

Spork.prefork do
  ... #my usual Spork.prefork stuff
  Akephalos::RemoteClient.manager #force akephalos to be started by the parent process
                                  #so that we don't keep starting it again and again
end

Spork.each_run do
  Thread.current['DRb'] = { 'server' => DRb::DRbServer.new }
end

This ensures that when DRb is constructing a proxy object it uses the uri for the freshly constructed DRbServer as the uri for any proxy objects it passes, and so block invocations and the like go to the correct process.

Adding to Rails’ Log File Performance Breakdown

Apr 28^th, 2011

You’ll all have seen lines like this in your log files:

Completed 200 OK in 14ms (Views: 0.2ms | ActiveRecord: 2.7ms)

While obviously a very high level view of your application’s performance, it’s still a good place to start when you’re wondering where the time has gone. If you’re using a different datastore or if external services (solr, sphinx, external API calls) are important contributors to your application’s performance you might want time spent in those services listed too.

Prior to Rails 3, it was hard to add information to this summary without making a mess of things. What was reported on was just hardcoded into Action Controller. Things are a lot easier in Rails 3.

New in Rails 3 is ActiveSupport::Notifications. The idea here is to decouple things producing performance data and things consuming them.

ActiveSupport::Notifications.instrument('some_task.my_app', data) do
  #do some processing
end

will record how long it took to do the task, and then yield that data to anyone interested (instances of subclasses of ActiveSupport::LogSubscriber). Action Controller’s process action is wrapped by a call to instrument and one particular subscriber to that is one that writes what Action Controller has been up toto the app’s log file. If you wanted to analyze your app’s data other than by eyeballing the log files you could write your own class that could subscribe to these notifications and do whatever processing you wanted, instead of writing something to parse your log files.

A simple example

Suppose we want the time we spend query solr to be logged. Following the way Active Record does things, you need to do three things: instrument the action we wish to monitor using ActiveSupport::Notifications, record the information in a subscriber and then plug the resulting data into Action Controller.

At its barest, the instrumentation might look like this

module Solr
  class Connection
    def send_with_instrumentation(request)
      ActiveSupport::Notifications.instrument("send_query.solr") do
        send_without_instrumentation(request)
      end
    end
    alias_method_chain :send, :instrumentation
  end
end

Here send is a funnel through which all queries to solr flow.

The log subscriber might look a little like this:

module Solr
  class LogSubscriber < ActiveSupport::LogSubscriber
    def self.runtime=(value)
      Thread.current["solr_runtime"] = value
    end

    def self.runtime
      Thread.current["solr_runtime"] ||= 0
    end

    def self.reset_runtime
      rt, self.runtime = runtime, 0
      rt
    end

    def send_query(event)
      self.class.runtime += event.duration
    end
  end
end

Solr::LogSubscriber.attach_to :solr

The call to attach_to means that whenever a foo.solr event is fired, our subscriber’s foo method will be called. The only thing we do in our subscriber is keep track of how much time has been spent, using the runtime and runtime= method. reset_runtime resets the counter and returns the time accumulated so far.

The last thing we need to do is use the hooks that Action Controller helpfully provides. The following is lifted pretty much line for line from Active Record[1]:

module Solr
  module Railties
    module ControllerRuntime
      extend ActiveSupport::Concern

    protected

      attr_internal :solr_runtime_before_render
      attr_internal :solr_runtime_during_render

      def cleanup_view_runtime
        self.solr_runtime_before_render = Solr::LogSubscriber.reset_runtime
        runtime = super
        self.solr_runtime_during_render = Solr::LogSubscriber.reset_runtime
        runtime - solr_runtime_during_render
      end

      def append_info_to_payload(payload)
        super
        payload[:solr_runtime] = (solr_runtime_before_render || 0) +
                                 (solr_runtime_during_render || 0) +
                                 Solr::LogSubscriber.reset_runtime
      end

      module ClassMethods
        def log_process_action(payload)
          messages, solr_runtime = super, payload[:solr_runtime]
          messages << ("Solr: %.1fms" % solr_runtime) if solr_runtime
          messages
        end
      end
    end
  end
end
ActiveSupport.on_load(:action_controller) do
   include Solr::Railties::ControllerRuntime
end

cleanup_view_runtime is used to prevent double counting of time spent performing database queries etc. as both database time and view rendering time. The call to super in cleanup_view_runtime will wind up actually rendering the view, so looking at the solr time before and after this tells us how much time we need to deduct from the view render time (which is what the call to super returns).

append_info_to_payload is where we record how much time has been used in total, which is the sum of solr time before, during and after rendering the view.

log_process_action (a class method) acts on a payload hash and produces an array of components such as ‘ActiveRecord: 32.2ms’.

Finally we add an on_load hook that will include this module in ActionController::Base.

Put all this together and you should see stuff like

Completed 200 OK in 17ms (Views: 10.8ms | ActiveRecord: 0.3ms | Solr: 3.7ms)

in your logs.

[1] At the time of writing rails 3.0.7 actually does this incorrectly because it doesn’t consider time after rendering, or when no view was rendered at all

← Older Blog Archives Newer →

Space Vatican

Ramblings of a curious coder