Space Vatican

Ramblings of a curious coder

Memory Leak in YAML on Ruby 1.9.2

We recently upgraded to delayed_job 3.0 and immediately started seeing major memory leaks in our app: in the delayed job workers, in passenger instances and even in standalone scripts that don’t use delayed job at all. In the end I tracked it down to a bug in YAML.load.

Out of the box, YAML support in ruby 1.9 can be provided by one of two backends: syck and psych. Syck is an older implementation based around a C library that is no longer supported, whereas psych uses the newer and supported libYAML. The default backend is psych, but earlier versions of delayed_job didn’t work with psych, and so forced the yaml engine to syck (which doesn’t have this bug). When we upgraded to 3.0 they had fixed their problems with psych and so we (unintentionally) started using psych. Unfortunately the version of psych that comes with ruby 1.9.2 has a memory leak in YAML.load. If YAML::ENGINE.yamler is 'psych' and Psych::VERSION is 1.0.0 then you are using an affected version.
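The check described above can be scripted. This is only a rough sketch: the helper name affected_yaml_backend? is made up for illustration, and the defined? guards are there because YAML::ENGINE only exists on rubies that ship both backends.

```ruby
require 'yaml'

# Hypothetical helper (not part of any library): true when the backend and
# version match the affected combination described above (psych 1.0.0).
def affected_yaml_backend?(yamler, psych_version)
  yamler.to_s == 'psych' && psych_version.to_s == '1.0.0'
end

# Guard both lookups so the script is harmless on rubies where the
# constants are missing.
yamler = defined?(YAML::ENGINE) ? YAML::ENGINE.yamler : nil
psych_version = defined?(Psych::VERSION) ? Psych::VERSION : nil

if affected_yaml_backend?(yamler, psych_version)
  warn "psych #{psych_version} is the active yaml engine: YAML.load will leak"
end
```

Running this from a rails console or a delayed job worker should tell you whether that particular process is affected.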

In particular this means that each time you load a model with serialised attributes, you leak memory. One of our very frequently used models has some serialized columns so that was why we were leaking. Delayed job obviously does a lot of yaml loading and so its workers were haemorrhaging memory.
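One way to see this kind of leak for yourself is to call YAML.load in a loop and watch the process size. The following is a sketch rather than the script I actually used: the helper names are invented, the document being parsed is arbitrary, and it assumes a unix-like system where ps is available (rss_kb returns nil otherwise).

```ruby
require 'yaml'

# Resident set size of the current process in kilobytes, or nil if ps
# is not available or gives no output.
def rss_kb
  out = `ps -o rss= -p #{Process.pid}`
  out.strip.empty? ? nil : out.to_i
rescue StandardError
  nil
end

# Parse the same document many times and report memory before and after.
# GC.start is called so that ordinary garbage does not inflate the figure.
def rss_around_yaml_loads(doc, iterations)
  before = rss_kb
  iterations.times { YAML.load(doc) }
  GC.start
  [before, rss_kb]
end

doc = "---\nname: example\nsizes:\n- 1\n- 2\n- 3\n"
before, after = rss_around_yaml_loads(doc, 5_000)
puts "rss before: #{before.inspect} KB, after: #{after.inspect} KB"
```

On an affected ruby the second figure should keep growing as you increase the iteration count; on a fixed version it should level off.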

Plugging the leaks

It took a bit of work to narrow down the leaks we were seeing to yaml, but once that was done it turned out that a few people had already written about this, notably over at nerdd.dk. I am somewhat amazed that knowledge of this issue is not more widespread. The issue is perhaps clouded by the fact that if libyaml isn’t available when ruby is built, ruby will just skip building psych (in which case syck is the only backend). Ruby 1.9.3 has a fixed version of psych, but disappointingly the currently available versions of 1.9.2 (p290 at the time of writing) still have this bug, 18 months after the release of 1.9.2.

Luckily there is a gem version of psych. However, using it can be a bit fiddly if (as most rails apps do) you use bundler. Bundler loads psych early on in its setup process, so you can’t just stick psych in your Gemfile: both versions end up being loaded, which causes an ugly mess.

nerdd.dk has a series of posts about how they tackled the various issues. In the end what I did was:

  • set up config/setup_load_paths.rb to keep passenger happy:
require 'rubygems'
gem 'psych'
require 'bundler'
Bundler.setup
  • edited config/boot.rb to do gem 'psych' just after require 'rubygems'
  • hacked the stub executable for bundle to also do gem 'psych' after rubygems is loaded
  • added the same version of psych to the Gemfile as was installed outside of bundler
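With more than one copy of psych around, it is worth confirming which one actually got loaded. A minimal sketch, assuming the psych gem has been installed outside of bundler as in the steps above; the script is an illustration rather than part of the original setup, and the rescue is only there so it still runs where the gem is missing.

```ruby
require 'rubygems'

begin
  # Activate the gem version before anything else requires psych.
  gem 'psych'
rescue Gem::LoadError
  warn 'psych gem not installed; the copy bundled with ruby will be used'
end

require 'psych'

# $LOADED_FEATURES holds the full path of every file that was required,
# so this shows whether psych came from the gem directory or from the
# ruby standard library.
psych_files = $LOADED_FEATURES.grep(%r{/psych\.rb\z})

puts "Psych #{Psych::VERSION} loaded from:"
puts psych_files
```

If the path points into your ruby's standard library rather than a gems directory, the bundled copy won the race.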

A Small Difference Between 1.9.2 and 1.9.3

I was looking at moving an application to ruby 1.9.3 and was getting some strange syntax errors along the lines of syntax error, unexpected keyword_do_block on code that was working fine on 1.9.2. I spent quite a few minutes staring at the code, which looked completely benign.

It turns out that ruby 1.9.2 is a bit too permissive: it allows you to write an extra comma after your argument list but before the do that marks the start of your block.

  some_method arg1, arg2, do
    ...
  end

ruby 1.9.3 on the other hand won’t accept this.
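The fix is simply to drop the trailing comma. A small self-contained illustration, where some_method is a stand-in that yields its arguments back to the block (its definition and the symbol arguments are made up for the example):

```ruby
# Stand-in for whatever method was being called: it just hands its
# arguments to the block.
def some_method(*args)
  yield(*args)
end

collected = []

# No comma between the last argument and do: this form is valid on
# both 1.9.2 and 1.9.3.
some_method :arg1, :arg2 do |a, b|
  collected << [a, b]
end
```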

Passing a Block From a Method Written in C

Every now and again I wind up rewriting a ruby performance hotspot in C. It happens infrequently enough that I always forget the C api for passing a block implemented in C to some ruby code. Hopefully writing this down will help me remember it in the future. Today I wanted to call find_each on a class, using a C function as the block. Before ruby 1.9 you needed to call rb_iterate, which always did my head in, but in 1.9 you use rb_block_call, which is way more straightforward (rb_iterate is still there but deprecated).

Dressipi Is Hiring

Dressipi is looking for a mid to senior level developer to join their development team. If you love solving problems, teasing meaning out of large volumes of data and have a passion for writing well designed and tested code we’d love to hear from you. We develop recommendation systems that make shopping simpler and looking good effortless, using a combination of knowledgeable stylists and clever technology.

We’re looking for:

  • a seasoned software engineer, with at least a year’s experience of Ruby on Rails
  • good knowledge of MySQL (exposure to other databases/datastores a plus)
  • able to hit the ground running with a large and growing Rails 3 application
  • TDD/BDD experience

Some experience of recommendation systems or machine learning techniques would be useful but by no means required. Frontend skills (HTML, CSS, jQuery) appreciated but not required.

You’ll join a small, dynamic team to work on all the technology that powers Dressipi, from consumer facing web applications to heavy backend calculations and mobile apps. No knowledge of (or interest in) fashion or clothes required: our delightful stylists have all the clothing expertise we need.

As well as working with the rest of the development team you will be working closely with the stylists, encoding their unique knowledge and understanding of clothing into Dressipi’s algorithms.

You will take part in the full lifecycle of the product from understanding the business needs, to deploying the solution and analysing the resulting data.

Dressipi tackles problems in a wide range of areas, including recommendation systems, text extraction, visual recognition, crowd sourcing and more.

Request Specs and Authlogic

I was writing some rspec request specs the other day and was surprised to find that

before(:each) do
  activate_authlogic
  UserSession.create(@user)
end

wasn’t working: the code under test didn’t think the user was logged in at all. The same code works fine for controller specs and the authlogic documentation asserts that functional and integration tests should behave identically in this respect. Everything should just work.