Monday, July 22, 2013

Why JRuby Arrays are not Threadsafe

Run JRuby-Lint against one of your Rails projects and you'll probably see this message:

It's a warning that the statement below is a potential problem in multi-threaded applications because it's appending values to an Array.

Rails is thread safe, so this line of code won't be a problem. But JRuby-Lint must warn us because it's impossible to write a program that determines if some code is absolutely thread safe. In your own code, however, you won't have the same guarantee. That's because JRuby uses a different threading model than MRI. It has a big impact on performance, but it also makes the Array class a little different.

Threads in MRI are encumbered by a mechanism called the Global Interpreter Lock (GIL), which is also known as the Giant VM Lock (GVM). The GIL allows only one thread in the process to execute Ruby code at a time. It does this very efficiently but it will never allow Ruby threads to run in parallel. As a result, there will be little to no increase in speed when running on a multiprocessor machine, which is the case for most production servers. To illustrate this, consider the figure below.



Each user thread in the JVM, which is just an instance of the Thread class, is mapped directly to a kernel thread. Kernel threads are scheduled and managed by the operating system kernel, which can schedule two threads from the same process to run on two different CPU cores at the same time. The GIL prevents this from happening.

So getting back to the += operation described earlier, let's look at an example that is not thread safe. Put the following code, which populates an array with 256 integers, into a file called threads.rb.

It uses << to append to the array, which presents the same problem as the += operator. When we run threads.rb with MRI, we see the following:


The array was correctly populated with 256 integers. But when we run the same program with JRuby, we might see the following (though probably not exactly):


In fact, nearly every time we run the program with JRuby we could get a different result. And we might even encounter a ConcurrencyError. That's because the threads in our program are running in parallel and can corrupt the array. Arrays are not threadsafe in JRuby. They aren't in MRI either. But when we ran the program with MRI, the GIL prevented the threads from executing concurrently. Thus, the array was not corrupted.

The following statement is the critical section:


Even though it is a single statement, it actually involves multiple steps. Both the Java code in JRuby and the C code in MRI implement methods on the Array class that do essentially the same thing as this pseudo-code:


In JRuby, you can see exactly this (with some additional error checking) in the RubyArray.java append method (lines 7, 18, and 21 below):


The execution of these three instructions on two threads is illustrated below.



The instructions execute atomically in MRI because the GIL is locked at the start of the method and released at the end of the method. So there is overlap between two threads. In the JVM, however, the threads can walk all over each other. In this illustration, both threads set the i variable to the same value because Thread 2 executed the first statement before Thread 1 reallocated the array. Then, Thread 2 overwrites the value that Thread 1 set instead of adding a new value. That's how our JRuby array can end up with less than 256 values.

The lack of thread safety in the JRuby Array class is not insurmountable. The JRuby team could have put a lock around this method. Or we could put a lock around the critical section in our thread. But either change would make the program slower.

The real problem is that we are sharing the array between multiple threads. It would be okay if the array was immutable. That is, if we were not changing it. But we are mutating the array. Having mutable shared data is the quickest way to make a program not threadsafe. The best, and only full-proof way, to make your Ruby programs threadsafe is to avoid sharing any data between threads.

We can consider our code thread-safe if it behaves correctly when accessed from multiple threads without any synchronization or other coordination on the part of the calling code. That's a mouthful. But thread safety is difficult to define. Formal attempts in academia are complicated and don't provide much practical guidance.

Ultimately, thread safety is a matter of program correctness. Whether the execution is correct or not really depends on your program. In the threading example we expected our array to have 256 integers when it finished. But with JRuby it doesn't always end up that way. So its not correct.

Running the program with MRI always gives us 256 integers. But if we were to inspect the array we would find that it is not always in the same order. Does the order of the data array affect the correctness of our program? It depends on the requirements. Thread safety can be situational.

Fortunately, there are some good heuristics for making our code thread-safe. The most important is to avoid mutable shared state. In fact, that was the problem with the threading example. It modified the data array from multiple threads.

The most common way to accidentally share mutable data between threads in a Ruby program is with class variables. Consider the following example:


In a multi-threaded environment, like a web server, this class suffers from the same problem illustrated earlier. In general, its a good idea to avoid class variables unless you protect them with some kind of thread synchronization. Databases are very good at that kind of protection. Thus, if you are going to keep any kind of state in your program, its usually preferable to do so in a database or anything else that handles concurrent access well.

Ultimately, JRuby Arrays are non-threadsafe because we want them to perform as fast as possible. With this power comes responsibility, but as you can see from the Rails example, it is feasible to use JRuby Arrays safely.

I borrowed some examples from Nick Seiger for this post. Thanks Nick!