Tuesday, May 29, 2012

Faster RSpec on JRuby with guard-jruby-rspec

The biggest complaint I hear about JRuby is how long it takes to run tests or specs.  I feel your pain.  That's why I started hacking on guard-jruby-rspec.

This guard extention allows you to run all of your specs on JRuby without the initial start up cost.  It does not run a subset of your specs like guard-rspec (yet) and it does not trigger a run when a file changes (yet).  Instead, this extension loads all of your application files in advance, and reloads them individually when they change.  That way, when you run RSpec, the JVM is already running, and your files have already been required.

Here's a short video of me using it:

There's still a lot to do.  For instance:
  • Autorun specs like guard-rspec (want to integrate with guard-rspec so as to not duplicate all of it's logic).
  • Allow for extra rspec options
  • Fix the way guard uses stdin so its not flaky on JRuby
  • Work out the kinks in gj-rspec script so that specs can be run in main terminal.
More to come...

Friday, May 4, 2012

Zero-Downtime Deploys with JRuby

One of the most common questions I get from readers of my book is about zero-downtime deployment. That is, how do you deploy new versions of a JRuby web application without missing users' requests?

To answer this question, let's first look at how MRI-based deployments handle zero-downtime.  When a process running an MRI web server needs to load a new version, we shut it down, push the new code, and start it up again.

This leaves a gap where no requests can be handled.  But most MRI deployments use a pool of application processes, which provides a nice way around this problem.  While one process is reloading, we can rely on the other processes to service requests.  The result is a "rolling restart" in which the re-deployment of each process is staggered.

In practice this is a difficult dance to coordinate.  Technologies like Passenger make it a lot easier, but under the covers it's still complicated.

JRuby deployments are different, though.  Instead of having a pool of processes, we deploy our applications to a single JRuby server process, which never gets shutdown (ideally).  The result is that our deployment has just two steps: undeploy and deploy.

However, this still leaves a gap where requests can be dropped, and we don't have other server processes that can take over while we're updating.  To fix this, we simply need to reverse the order of the steps!

A zero-downtime JRuby deployment requires that we fully deploy a new version of the application before we undeploy the old version.  Thus, we will have two version of the app running at the same time, but only one will handle requests.

The good news is that Trinidad essentially does this for us.  All we have to do is redeploy our application. It works because deep within the bowels of Trinidad is a method that looks like this:

In the takeover method, Trinidad is creating a new context for the next version of the application while the old version continues to run.  Then it swaps those contexts in one step.  The result is effectively zero-downtime deployment.

Unfortunately, not all JRuby web servers do this for us, so we may have to script the process ourselves.  Let's take TorqueBox for example.  When we deploy a new version of a TorqueBox application to a running TorqueBox server, it completely undeploys the app before loading the new version.

Getting around this is pretty easy when TorqueBox is running in a cluster (i.e. multiple TorqueBox instances across mutiple physical or virtual servers).  We simply need to deploy a new version of the application to one node at a time.  When the old version is undeployed, the Apache mod_cluster proxy will stop sending it requests.

If you're really paranoid, you can manually disable a node prior to deploying the new version of your application by invoking the disable() operation on the server's jboss.as.modcluter MBean.  The screen shot below shows me doing this from the JMX console.

In my book, I show how to invoke an MBean operation programmatically from a Rake task. That way, you can easily work this step into your deployment scripts.

If you're not running TorqueBox in a cluster, the process is a little more complicated.  Rather than just dropping your Knob file into the deployment directory or relying on Capistrano to create a deployment descriptor, you'll need to create a custom deployment descriptor for each new version of your application.  An example might look like this:

When the YAML file is dropped into the $JBOSS_HOME/standalone/deployments directory, it will deploy the new version of the application under the myapp-v2/ context without undeploying the old version of the application (assuming it is not also using the myapp-v1/ context).  Then you need to configure your proxy to point to myapp-v2/ instead of myapp-v1/.  The resulting process looks like this:

In my experience, if you really care about zero-downtime deployment, then you are probably running a redundant cluster anyways.  So the need to orchestrate the context switching on a single node is unusual.

In any case, it's certainly possible to achieve zero-downtime deployment with JRuby.  And in most cases, it's a lot easier than with MRI.