Monday, July 22, 2013

Why JRuby Arrays are not Threadsafe

Run JRuby-Lint against one of your Rails projects and you'll probably see this message:

It's a warning that the statement below is a potential problem in multi-threaded applications because it's appending values to an Array.

Rails is thread safe, so this line of code won't be a problem. But JRuby-Lint has to warn us anyway, because it's impossible to write a program that can determine whether arbitrary code is thread safe. In your own code, however, you won't have the same guarantee. That's because JRuby uses a different threading model than MRI. That model is a big win for performance, but it also makes the Array class behave a little differently.

Threads in MRI are encumbered by a mechanism called the Global Interpreter Lock (GIL), which is also known as the Giant VM Lock (GVL). The GIL allows only one thread in the process to execute Ruby code at a time. It does this very efficiently, but it will never allow Ruby threads to run in parallel. As a result, there will be little to no increase in speed when running on a multiprocessor machine, which is what most production servers are. To illustrate this, consider the figure below.



Each user thread in the JVM, which is just an instance of the Thread class, is mapped directly to a kernel thread. Kernel threads are scheduled and managed by the operating system kernel, which can schedule two threads from the same process to run on two different CPU cores at the same time. In MRI, the GIL prevents this from happening.

So getting back to the += operation described earlier, let's look at an example that is not thread safe. Put the following code, which populates an array with 256 integers, into a file called threads.rb.
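The original listing is not reproduced here, but a minimal version of threads.rb might look like the sketch below (the 16-thread split is my own choice; any split that appends 256 integers from several threads demonstrates the same thing):

```ruby
data = []

# Spawn 16 threads, each appending 16 integers to the shared array.
threads = 16.times.map do |i|
  Thread.new do
    16.times { |j| data << (i * 16 + j) }
  end
end

threads.each(&:join)
puts "#{data.size} integers"
```

Under MRI this reliably reports 256 integers; under JRuby the count can come up short, as described below.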

It uses << to append to the array, which presents the same problem as the += operator. When we run threads.rb with MRI, we see the following:


The array was correctly populated with 256 integers. But when we run the same program with JRuby, we might see the following (though probably not exactly):


In fact, nearly every time we run the program with JRuby we could get a different result. And we might even encounter a ConcurrencyError. That's because the threads in our program are running in parallel and can corrupt the array. Arrays are not threadsafe in JRuby. They aren't in MRI either. But when we ran the program with MRI, the GIL prevented the threads from executing concurrently. Thus, the array was not corrupted.

The following statement is the critical section:


Even though it is a single statement, it actually involves multiple steps. Both the Java code in JRuby and the C code in MRI implement methods on the Array class that do essentially the same thing as this pseudo-code:
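In rough pseudo-code, the append operation involves three steps:

```
i = array.length       # 1. read the current length
array.resize(i + 1)    # 2. grow the backing store
array[i] = value       # 3. write the element at index i
```
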


In JRuby, you can see exactly this (with some additional error checking) in the RubyArray.java append method (lines 7, 18, and 21 below):


The execution of these three instructions on two threads is illustrated below.



The instructions execute atomically in MRI because the GIL is locked at the start of the method and released at the end of the method, so there is no overlap between two threads. In the JVM, however, the threads can walk all over each other. In this illustration, both threads set the i variable to the same value because Thread 2 executed the first statement before Thread 1 reallocated the array. Then Thread 2 overwrites the value that Thread 1 set instead of adding a new value. That's how our JRuby array can end up with fewer than 256 values.

The lack of thread safety in the JRuby Array class is not insurmountable. The JRuby team could have put a lock around this method. Or we could put a lock around the critical section in our thread. But either change would make the program slower.
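With Ruby's built-in Mutex, that second option looks something like this (a sketch; the 16-thread split is my own choice):

```ruby
data = []
lock = Mutex.new

threads = 16.times.map do |i|
  Thread.new do
    # Only one thread at a time may execute the append.
    16.times { |j| lock.synchronize { data << (i * 16 + j) } }
  end
end

threads.each(&:join)
puts data.size
```

This always yields 256 elements, on both MRI and JRuby, but the lock serializes the threads around every append.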

The real problem is that we are sharing the array between multiple threads. It would be okay if the array were immutable. That is, if we were not changing it. But we are mutating the array. Mutable shared data is the quickest way to make a program not threadsafe. The best, and only foolproof, way to make your Ruby programs threadsafe is to avoid sharing any data between threads.
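One way to follow that advice in the threading example is to give each thread its own private array and merge the results only after every thread has finished (a sketch; the names and the 16-thread split are my own):

```ruby
# Each thread builds its own array; nothing is shared while the threads run.
threads = 16.times.map do |i|
  Thread.new { (0...16).map { |j| i * 16 + j } }
end

# Thread#value joins the thread and returns its block's result.
data = threads.flat_map(&:value)
puts data.size
```

Because no thread ever touches another thread's data, this produces 256 elements on MRI and JRuby alike, with no locking.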

We can consider our code thread-safe if it behaves correctly when accessed from multiple threads without any synchronization or other coordination on the part of the calling code. That's a mouthful. But thread safety is difficult to define. Formal attempts in academia are complicated and don't provide much practical guidance.

Ultimately, thread safety is a matter of program correctness. Whether an execution is correct really depends on your program. In the threading example, we expected our array to have 256 integers when it finished. But with JRuby it doesn't always end up that way. So it's not correct.

Running the program with MRI always gives us 256 integers. But if we were to inspect the array we would find that it is not always in the same order. Does the order of the data array affect the correctness of our program? It depends on the requirements. Thread safety can be situational.

Fortunately, there are some good heuristics for making our code thread-safe. The most important is to avoid mutable shared state. In fact, that was the problem with the threading example. It modified the data array from multiple threads.

The most common way to accidentally share mutable data between threads in a Ruby program is with class variables. Consider the following example:
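A class along these lines exhibits the problem (the class and method names here are hypothetical):

```ruby
class PageView
  @@count = 0   # class variable: shared, mutable state

  def self.record
    @@count += 1   # read-modify-write; not atomic across threads
  end

  def self.count
    @@count
  end
end
```
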


In a multi-threaded environment, like a web server, this class suffers from the same problem illustrated earlier. In general, it's a good idea to avoid class variables unless you protect them with some kind of thread synchronization. Databases are very good at that kind of protection. Thus, if you are going to keep any kind of state in your program, it's usually preferable to do so in a database or anything else that handles concurrent access well.

Ultimately, JRuby Arrays are non-threadsafe because we want them to perform as fast as possible. With this power comes responsibility, but as you can see from the Rails example, it is feasible to use JRuby Arrays safely.

I borrowed some examples from Nick Sieger for this post. Thanks Nick!

Saturday, May 11, 2013

How to Run Rails 4.0.0.rc1 on JRuby


Getting Rails 4.0.0.rc1 running on JRuby isn't that different from running it on MRI, but there are a few minor things you'll need to adjust. Let's start with a new Rails4 app. Make sure you're using JRuby and install the Rails4 release candidate like this:

Then use the rails command to generate a new app.

Move into the newly created my_app directory so we can tweak some settings. Open the Gemfile in an editor and look for these lines:

First, we'll remove jruby-openssl because it's included in JRuby 1.7 (this has been fixed in Rails master). Then we need to set the version of activerecord-jdbcsqlite3-adapter to 1.3.0.beta1.  Next, you should thank Karol Bucek (@kares) for all the hard work he's put into activerecord-jdbc and jruby-rack. Your Gemfile should now have just these lines in place of the code shown above:
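A Gemfile entry along these lines should do it (a sketch; the exact syntax may differ from the original post, and the rest of the Gemfile is unchanged):

```ruby
gem 'activerecord-jdbcsqlite3-adapter', '1.3.0.beta1', platform: :jruby
```
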

Now return to the console and update bundler like this:

Do some basic housekeeping

Now we can smoke test the app by starting the server like this:

Point a browser to http://localhost:3000 and there's a chance you'll see the app running.  But you might encounter the error shown below:

OpenSSL::Cipher::CipherError: Illegal key size: possibly you need to install Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files for your JRE

If you see this error, it means you need to install the unrestricted policy files for the JVM. You can find them on the Oracle website. Download the zip file and extract the two important files it contains: local_policy.jar and US_export_policy.jar. Move these files into your $JAVA_HOME/jre/lib/security directory, replacing the existing files of the same name.  On Mac OS X they are probably located here:

/Library/Java/JavaVirtualMachines/jdk1.7.0_09.jdk/Contents/Home/jre/lib/security/

With the unrestricted policy files installed, restart the server and you'll be good to go. But this may present a problem in deployment. For example, you won't be able to update the JVM on a Heroku dyno. Another option may be to downgrade cryptography as described in this JIRA issue, but I haven't tried that. Hopefully this will all get worked out.

And of course, you probably won't want to use WEBrick in production. Warbler 1.3.8 may work for you, but try the rails4 branch if it doesn't.  I have an example of a working Rails4 app on BitBucket in my warbler-examples repo.

I haven't attempted to run a Rails4 app on Trinidad, Puma or TorqueBox. I would love to hear your results.

Please give this a go, and report back with any problems you find.  We would love to have Rails4 working on JRuby the day it's released.


Tuesday, May 29, 2012

Faster RSpec on JRuby with guard-jruby-rspec

The biggest complaint I hear about JRuby is how long it takes to run tests or specs.  I feel your pain.  That's why I started hacking on guard-jruby-rspec.


This guard extension allows you to run all of your specs on JRuby without the initial start-up cost.  It does not run a subset of your specs like guard-rspec (yet), and it does not trigger a run when a file changes (yet).  Instead, this extension loads all of your application files in advance, and reloads them individually when they change.  That way, when you run RSpec, the JVM is already running, and your files have already been required.

Here's a short video of me using it:



There's still a lot to do.  For instance:
  • Autorun specs like guard-rspec (want to integrate with guard-rspec so as to not duplicate all of its logic).
  • Allow for extra rspec options
  • Fix the way guard uses stdin so it's not flaky on JRuby
  • Work out the kinks in gj-rspec script so that specs can be run in main terminal.
More to come...

Friday, May 4, 2012

Zero-Downtime Deploys with JRuby


One of the most common questions I get from readers of my book is about zero-downtime deployment. That is, how do you deploy new versions of a JRuby web application without missing users' requests?

To answer this question, let's first look at how MRI-based deployments handle zero-downtime.  When a process running an MRI web server needs to load a new version, we shut it down, push the new code, and start it up again.


This leaves a gap where no requests can be handled.  But most MRI deployments use a pool of application processes, which provides a nice way around this problem.  While one process is reloading, we can rely on the other processes to service requests.  The result is a "rolling restart" in which the re-deployment of each process is staggered.


In practice this is a difficult dance to coordinate.  Technologies like Passenger make it a lot easier, but under the covers it's still complicated.

JRuby deployments are different, though.  Instead of having a pool of processes, we deploy our applications to a single JRuby server process, which (ideally) never gets shut down.  The result is that our deployment has just two steps: undeploy and deploy.


However, this still leaves a gap where requests can be dropped, and we don't have other server processes that can take over while we're updating.  To fix this, we simply need to reverse the order of the steps!

A zero-downtime JRuby deployment requires that we fully deploy a new version of the application before we undeploy the old version.  Thus, we will have two versions of the app running at the same time, but only one will handle requests.

The good news is that Trinidad essentially does this for us.  All we have to do is redeploy our application. It works because deep within the bowels of Trinidad is a method that looks like this:



In the takeover method, Trinidad is creating a new context for the next version of the application while the old version continues to run.  Then it swaps those contexts in one step.  The result is effectively zero-downtime deployment.
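In rough pseudo-code (not Trinidad's actual source), the hot swap looks like this:

```
new_context = create_context(new_app_dir)  # load the new version alongside the old
new_context.start                          # old version keeps serving requests
old_context = swap(new_context)            # atomically route traffic to the new context
old_context.stop                           # then tear down the old version
```
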

Unfortunately, not all JRuby web servers do this for us, so we may have to script the process ourselves.  Let's take TorqueBox for example.  When we deploy a new version of a TorqueBox application to a running TorqueBox server, it completely undeploys the app before loading the new version.

Getting around this is pretty easy when TorqueBox is running in a cluster (i.e. multiple TorqueBox instances across multiple physical or virtual servers).  We simply need to deploy the new version of the application to one node at a time.  When the old version is undeployed, the Apache mod_cluster proxy will stop sending it requests.

If you're really paranoid, you can manually disable a node prior to deploying the new version of your application by invoking the disable() operation on the server's jboss.as.modcluster MBean.  The screenshot below shows me doing this from the JMX console.



In my book, I show how to invoke an MBean operation programmatically from a Rake task. That way, you can easily work this step into your deployment scripts.

If you're not running TorqueBox in a cluster, the process is a little more complicated.  Rather than just dropping your Knob file into the deployment directory or relying on Capistrano to create a deployment descriptor, you'll need to create a custom deployment descriptor for each new version of your application.  An example might look like this:
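A deployment descriptor along these lines would do it (a sketch; the paths and context names are hypothetical):

```yaml
# myapp-v2-knob.yml
application:
  root: /var/apps/myapp/releases/v2
web:
  context: /myapp-v2
```
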



When the YAML file is dropped into the $JBOSS_HOME/standalone/deployments directory, it will deploy the new version of the application under the myapp-v2/ context without undeploying the old version of the application (assuming it is not also using the myapp-v1/ context).  Then you need to configure your proxy to point to myapp-v2/ instead of myapp-v1/.  The resulting process looks like this:



In my experience, if you really care about zero-downtime deployment, then you are probably running a redundant cluster anyways.  So the need to orchestrate the context switching on a single node is unusual.

In any case, it's certainly possible to achieve zero-downtime deployment with JRuby.  And in most cases, it's a lot easier than with MRI.

Thursday, March 29, 2012

Clustering TorqueBox

I've created a new screencast to go with the second beta release of my book.

In this video, I demonstrate how TorqueBox scheduled jobs can be run in a cluster without duplicating the job across nodes.



To run the examples I've shown in the video, you'll need to install the torquebox-server gem to your JRuby runtime:


Here are the commands and code I run in the video:






Thursday, March 15, 2012

Talkin' about JRuby

I gave a talk today at hsv.rb on the subject of my book. The video, slides and code samples are below. Enjoy!



Deploying with JRuby
View more presentations from jkutner