Space Vatican

Ramblings of a curious coder

Elasticsearch Native Scripts for Dummies

One of the cool things about elasticsearch is the ability to provide scripts that calculate custom ordering or that filter based on application-specific logic. Out of the box elasticsearch supports mvel, and there are also plugins that support python and javascript. I imagine that it would be pretty simple to provide a jruby one too.

You can also use so-called native scripts, written in java. These are faster than the other alternatives and may also be handier if you need to integrate with some existing java code to calculate your scores. There is some info out there on how to build these, but it presupposes a certain familiarity with java and its environment. If you’re anything like me then you can bumble through java syntax readily enough, but classpaths, jars etc. are a bit of a mystery. So here’s how I got a native script running, with instructions that (hopefully) presuppose almost no knowledge of java. I’m no java wizard - I may well be doing something dumb - but this is working well enough for us in production.

Ruby Bindings for Liblinear

There are already some ruby bindings but I’ve written my own.

I mostly did this for fun, but I think swig sometimes gives you slightly unnatural-feeling interfaces, because you’re focussing too much on mapping C++ classes to your ruby classes. For example some liblinear methods take a struct param argument. liblinear-ruby-swig mirrors this by providing an LParameter class that maps onto this:

  param = LParameter.new(:C => 1, :eps => 0.01)
  model = LModel.new(problem, param)

but I’d rather write:

  model = RubyLinear::Model.new problem, :c => 1, :eps => 0.01

I imagine that SWIG must incur some sort of overhead but I would have thought that was pretty negligible for something like liblinear where most of the heavy lifting happens in the library being wrapped.
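To make the options-hash style concrete, here is a minimal pure-ruby sketch of a constructor that accepts trailing hash options and builds the parameter object internally. The names `SketchParameter` and `SketchModel`, and the default values, are hypothetical stand-ins for illustration; this is not how RubyLinear or liblinear-ruby-swig are actually implemented.

```ruby
# Hypothetical stand-in for liblinear's parameter struct (illustration only).
SketchParameter = Struct.new(:c, :eps)

# Hypothetical model wrapper: callers pass plain options, and the
# parameter object is built behind the scenes.
class SketchModel
  # Assumed defaults, chosen only for this example.
  DEFAULTS = { :c => 1.0, :eps => 0.01 }.freeze

  attr_reader :problem, :parameter

  def initialize(problem, options = {})
    # Reject misspelled keys early instead of silently ignoring them.
    unknown = options.keys - DEFAULTS.keys
    raise ArgumentError, "unknown options: #{unknown.inspect}" unless unknown.empty?

    merged = DEFAULTS.merge(options)
    @problem = problem
    @parameter = SketchParameter.new(merged[:c], merged[:eps])
  end
end

# The problem is just a placeholder symbol here.
model = SketchModel.new :placeholder_problem, :c => 2, :eps => 0.001
```

The caller never sees the parameter struct, which is the kind of interface I find more natural in ruby.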

Seeding CoreData Databases With Ruby

If you’re writing an iOS app that uses Core Data then you may well want to ship it with an initial database (which potentially gets over the air updates later on).

On iOS, CoreData stores always use sqlite3 as their backend. You could create a sqlite database directly, but you’d have to reverse engineer the way Apple uses sqlite, ensure that you use the same name mangling for table and column names, generate the same metadata used for persistent store migration etc. Too brittle for my liking.

Memory Leak in YAML on Ruby 1.9.2

We recently upgraded to delayed_job 3.0 and immediately started seeing some major memory leaks in our app: in the delayed job workers, passenger instances and even standalone scripts which don’t use delayed job at all. In the end I tracked it down to a bug in YAML.load.

Out of the box, YAML support can be provided by one of two backends in ruby 1.9: syck and psych. Syck is an older implementation based around a no longer supported C library, whereas psych uses the newer and supported libYAML. The default backend is psych, but earlier versions of delayed_job didn’t work with psych, and so were forcing the yaml engine to syck (which doesn’t have this bug). When we upgraded to 3.0 they had fixed their problems with psych and so we (unintentionally) started using psych. Unfortunately the version of psych that comes with ruby 1.9.2 has a memory leak in YAML.load. If YAML::ENGINE.yamler is 'psych' and Psych::VERSION is 1.0.0 then you are using an affected version.
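The check described above can be sketched as a small script. The helper name `affected_psych?` is my own invention, and the fallback to 'psych' when YAML::ENGINE is missing is an assumption for interpreters newer than 1.9, where that constant no longer exists.

```ruby
require 'yaml'

# Hypothetical helper: true only for the combination described above,
# i.e. the psych engine at version 1.0.0.
def affected_psych?(engine, psych_version)
  engine.to_s == 'psych' && psych_version.to_s == '1.0.0'
end

# YAML::ENGINE is a ruby 1.9-era constant, so guard the lookup.
# Assumption: if it is absent, psych is the only backend available.
engine = defined?(YAML::ENGINE) ? YAML::ENGINE.yamler : 'psych'

# Psych::VERSION is undefined if ruby was built without libyaml.
version = defined?(Psych::VERSION) ? Psych::VERSION : nil

puts "engine=#{engine} psych=#{version.inspect} affected=#{affected_psych?(engine, version)}"
```

Running this under each of your environments (workers, passenger, standalone scripts) is a quick way to see which ones are exposed.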

In particular this means that each time you load a model with serialised attributes, you leak memory. One of our very frequently used models has some serialized columns so that was why we were leaking. Delayed job obviously does a lot of yaml loading and so its workers were haemorrhaging memory.

Plugging the leaks

It took a bit of work to narrow down the leaks we were seeing to yaml, but once that was done it turned out that a few people had already written about this, notably over at nerdd.dk. I am somewhat amazed that knowledge of this issue is not more widespread. The issue is perhaps clouded by the fact that if libyaml isn’t available when ruby is built, ruby will just skip building psych (in which case syck is the only backend). Ruby 1.9.3 has a fixed version of psych, but disappointingly the available versions of 1.9.2 (currently p290) still have this bug, 18 months after the release of 1.9.2.

Luckily there is a gem version of psych; however, using it can be a bit fiddly if (as most rails apps do) you use bundler. Bundler loads psych early on in its setup process, so you can’t just stick psych in your Gemfile - both versions end up being loaded, which causes an ugly mess.

nerdd.dk has a series of posts about how they tackled the various issues. In the end what I did was:

  • set up config/setup_load_paths.rb to keep passenger happy:
require 'rubygems'
gem 'psych'
require 'bundler'
Bundler.setup
  • edited config/boot.rb to do gem 'psych' just after require 'rubygems'
  • hacked the stub executable for bundle to also have gem 'psych' after rubygems is loaded
  • added the same version of psych to the Gemfile as was installed outside of bundler

A Small Difference Between 1.9.2 and 1.9.3

I was looking at moving an application to ruby 1.9.3 and was getting some strange syntax errors along the lines of "syntax error, unexpected keyword_do_block" on code that was working fine on 1.9.2. I spent quite a few minutes staring at the code, which looked completely benign.

It turns out that ruby 1.9.2 is a bit too permissive: it allows you to write an extra comma after your argument list but before the do that marks the start of your block.

  some_method arg1, arg2, do
    ...
  end

ruby 1.9.3 on the other hand won’t accept this.
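The fix is simply to drop the trailing comma. A minimal runnable sketch follows; `some_method` and its arguments are placeholders of my own, not code from the application in question.

```ruby
# Placeholder method: adds its arguments and yields the sum to the block.
def some_method(arg1, arg2)
  yield arg1 + arg2
end

# No comma between the last argument and `do`, so this parses
# without the syntax error described above.
result = some_method 1, 2 do |sum|
  sum * 10
end
```

Searching a codebase for lines ending in ", do" is a cheap way to find any other occurrences before upgrading.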