Monday, November 25, 2013

Making your Java Class thread safe.

Thread safety is a very hot topic for Java programmers right now. At JavaOne 2007, I noticed a fairly large number of technical sessions on dealing with multithreaded code. This seems to be driven mostly by the evolution of hardware rather than by a paradigm shift in the software world. While most servers have been multiprocessor systems for a long time, most desktop computers now also offer concurrent execution of programs, whether through multiple CPUs or a multi-core CPU. So while developers on the server side have long been focused on building multithreaded systems, desktop developers are now also forced to take multithreaded designs into consideration.

Defining the threading model for your application is probably the most important step you can take toward making your code thread safe. In my opinion the actual coding only comes second: you need to have your threading model settled beforehand so that you can get your code right. The reason is that there are many different ways to make a class thread safe, and you can only pick the right one if you know the thread access pattern for your class.

So what does it mean to have a thread safe class? A thread safe class is a class that guarantees that its internal state, as well as the values returned from its methods, are correct when it is invoked concurrently from multiple threads. There is an important distinction to make: just because a class is thread safe does not mean that you cannot introduce concurrency problems when using it. The most common example of this can be found in the collections framework. Consider the following code:


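import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ThreadUnsafeBookStore {

    // Book is assumed to be defined elsewhere.
    private final Set<Book> availableBooks =
            Collections.synchronizedSet(new HashSet<Book>());

    // Simplification: a single counter stands in for the stock of the book.
    private int numberRemainingCopies = 1;

    public boolean buyBook(Book book) {
        // Check-then-act: another thread can run between contains() and remove().
        if (availableBooks.contains(book)) {
            // Order book ...
            if (--numberRemainingCopies == 0) {
                availableBooks.remove(book);
            }
            return true;
        } else {
            return false;
        }
    }
}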


We are using a synchronized Set to keep track of the available books at this particular bookstore. Any number of threads can access the availableBooks Set concurrently and the set will maintain a consistent internal state. However, that does not mean that the state it represents in the bookstore will remain consistent. Imagine two customers trying to buy the same book at the same time when the bookstore has only one copy left. Two threads enter the buyBook() method, and one possible order of execution is as follows:

Thread 1: availableBooks.contains(book)
Thread 1: Order book ...

Thread 2: availableBooks.contains(book)
Thread 2: Order book ...

Thread 1: if(--numberRemainingCopies == 0)
Thread 1: availableBooks.remove(book)
Thread 1: return true;

Thread 2: if(--numberRemainingCopies == 0)
Thread 2: return true;


In this example, both customers managed to order the book, but one of them will be unhappy because in the physical world the bookstore has only one copy. There are a couple of ways to fix this. One is to declare the buyBook() method synchronized, in which case only one thread at a time can execute it. However, that synchronization is somewhat misleading: synchronizing the method means you are taking the monitor of the BookStore instance, but that is not the resource you are actually worried about. The resource you need to protect from concurrent access is the availableBooks Set. The following code is more explicit about that.


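import java.util.HashSet;
import java.util.Set;

public class ThreadSafeBookStore {

  // Book is assumed to be defined elsewhere.
  private final Set<Book> availableBooks = new HashSet<Book>();

  // Simplification: a single counter stands in for the stock of the book.
  // Only read and written while holding the availableBooks lock.
  private int numberRemainingCopies = 1;

  public boolean buyBook(Book book) {
    // The check and the update now happen as one atomic step.
    synchronized (availableBooks) {
      if (availableBooks.contains(book)) {
        // Order book ...
        if (--numberRemainingCopies == 0) {
          availableBooks.remove(book);
        }
        return true;
      } else {
        return false;
      }
    }
  }
}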


This code clearly indicates that the synchronized block is accessing the availableBooks resource and that we need to protect this resource from concurrent access. A side effect is that you no longer need a synchronized set, provided that every access to it happens inside a synchronized block as shown in buyBook(). This approach is also more flexible when managing threaded access to the BookStore class. Imagine that you have several resources to protect in your class. Synchronizing on the BookStore instance is then inefficient, because one method might access only one of the resources while another method uses a different one. When you synchronize on the BookStore instance, only one thread at a time can work; when you synchronize on individual resources, several threads can work on the bookstore concurrently as long as they don’t access the same resource.
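To make the idea of one lock per resource concrete, here is a minimal sketch. It is not the bookstore from the listings above: the class name, the reviews list, and all method names are hypothetical, and titles are plain strings so that the example stands alone. It also uses dedicated private lock objects instead of locking on the collections themselves; either choice works as long as every access to a resource goes through the same lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all names here are illustrative only.
class MultiResourceBookStore {

    // Resource 1: remaining copies per title, guarded by stockLock.
    private final Object stockLock = new Object();
    private final Map<String, Integer> copiesByTitle = new HashMap<>();

    // Resource 2: customer reviews, guarded by reviewsLock.
    private final Object reviewsLock = new Object();
    private final List<String> reviews = new ArrayList<>();

    void addCopies(String title, int copies) {
        synchronized (stockLock) {
            copiesByTitle.merge(title, copies, Integer::sum);
        }
    }

    // The check and the decrement happen under one lock, so two buyers
    // cannot both take the last copy.
    boolean buyBook(String title) {
        synchronized (stockLock) {
            Integer remaining = copiesByTitle.get(title);
            if (remaining == null) {
                return false;
            }
            if (remaining == 1) {
                copiesByTitle.remove(title);
            } else {
                copiesByTitle.put(title, remaining - 1);
            }
            return true;
        }
    }

    // Uses a different lock, so it never waits for a thread inside buyBook().
    void addReview(String review) {
        synchronized (reviewsLock) {
            reviews.add(review);
        }
    }

    int reviewCount() {
        synchronized (reviewsLock) {
            return reviews.size();
        }
    }
}
```

With this layout, a thread adding a review and a thread buying a book never block each other, while two threads competing for the last copy of the same title are still serialized by stockLock.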
The synchronized keyword is a fairly straightforward, but powerful, tool to make your classes thread safe. However, it is not the only option you have available; let us look at another example of an unsafe class, this time dealing with visibility.


public class ThreadUnsafeBean {
    
    private int x;
    
    public int getX() {
        return x;
    }

    public void setX(int x) {
        this.x = x;
    }
}


Right now you might be wondering what could possibly go wrong with this code; it certainly doesn’t get more basic than this. The problem resides in the Java memory model, which allows the JVM to optimize a number of things, especially when it comes to memory accesses. Reading from main memory is expensive, especially when you have all these nice registers and caches around in your CPU. Each thread is allowed to have its own cache of sorts, so the following sequence of calls could run into a problem.
Thread 1: setX(0)
Thread 2: setX(1)
Thread 2: getX()

Thread 1: getX()
Thread 1 could reasonably expect its getX() call to return 1, since thread 2 previously set the value to 1. However, because the JVM is allowed to optimize access to field values, it can retrieve the value of x from the thread’s own cache, which at this point contains the value 0. This code is so simple and appears in so many programs, so why don’t we see this error more often? Mainly because most JVM implementations will only optimize this on systems with many CPUs, or on hardware whose memory management supports this kind of optimization. For example, the x86 architecture has a fairly conservative memory model, so you’re unlikely to see this problem on that platform. However, that might change, and if you move your application to a platform that performs more memory optimizations, the problem might show up a lot more often. One of the ideas behind Java is write once, run anywhere, and that is possible here because the Java memory model is the same regardless of the underlying hardware platform or OS. So how do we fix this code?
One option is to declare the getX and setX methods as synchronized: in addition to making sure that only one thread at a time can execute the method, the thread will synchronize its cache with the values stored in memory. This works, and you are assured of always getting the right value of x. It is not the best solution, however, because it can cause quite a lot of thread contention and is overkill: you don’t want to restrict access to the method to one thread at a time, you just want the value of x to be visible to all threads at all times. Therefore, consider the second option shown below.


public class ThreadSafeBean {
    
    private volatile int x;
    
    public int getX() {
        return x;
    }

    public void setX(int x) {
        this.x = x;
    }
}
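A common place where this visibility guarantee matters is a stop flag shared between threads. The following is a minimal sketch under that assumption; StoppableWorker, requestStop() and stopsWithin() are hypothetical names and not part of the bean above.

```java
// Hypothetical sketch: StoppableWorker and its members are illustrative names.
class StoppableWorker implements Runnable {

    // volatile guarantees that a write by one thread is visible to
    // subsequent reads of the field by other threads.
    private volatile boolean stopRequested;

    private long iterations; // only touched by the worker thread

    @Override
    public void run() {
        // Without volatile, the JVM could legally hoist this read out of the
        // loop, and the worker might never observe the request to stop.
        while (!stopRequested) {
            iterations++;
        }
    }

    void requestStop() {
        stopRequested = true;
    }

    // Starts a worker, asks it to stop, and reports whether it terminated
    // within the given timeout.
    static boolean stopsWithin(long timeoutMillis) throws InterruptedException {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.setDaemon(true); // never keep the JVM alive because of the demo
        thread.start();
        Thread.sleep(50);       // let the worker spin for a moment
        worker.requestStop();
        thread.join(timeoutMillis);
        return !thread.isAlive();
    }
}
```

Note that volatile only covers visibility of single reads and writes. A compound operation such as x++ on a volatile field is still a read followed by a write, so it is not atomic and still needs synchronization.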


The volatile keyword ensures that we will always get the in-memory value of the integer x. This is a lot cheaper than synchronizing on the entire bean instance, which might contain other values than just x. This is a good solution. A better, but more constraining, solution is to make the bean immutable, which is only possible if you intend to set the value of x just once.


public class ThreadSafeBean {
    
    private final int x;

    public ThreadSafeBean(int x) {
        this.x = x;
    }
    
    public int getX() {
        return x;
    }

}


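The final keyword means that the value of this field cannot be changed once it has been set. This also has repercussions for thread safety, because Java then guarantees that the value will be seen by all threads without the need for any locking, provided the object is properly constructed (that is, the this reference does not escape from the constructor).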
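If the value does need to change over time, one way to keep the benefits of immutability is to replace the whole object instead of modifying it. The sketch below assumes that approach; ImmutableValue, withX() and ValueHolder are hypothetical names introduced only for illustration.

```java
// Hypothetical sketch: ImmutableValue and ValueHolder are illustrative names.
final class ImmutableValue {

    private final int x;

    ImmutableValue(int x) {
        this.x = x;
    }

    int getX() {
        return x;
    }

    // "Changing" the value means creating a new instance;
    // the original object is left untouched.
    ImmutableValue withX(int newX) {
        return new ImmutableValue(newX);
    }
}

class ValueHolder {

    // The reference itself is mutable, so it is declared volatile to make
    // each replacement visible to other threads.
    private volatile ImmutableValue current = new ImmutableValue(0);

    ImmutableValue get() {
        return current;
    }

    void set(ImmutableValue next) {
        current = next;
    }
}
```

A thread that calls get() always sees a fully constructed ImmutableValue, and any instance it already holds can never change underneath it.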
These few pieces of advice should help you implement thread safe classes. We looked at the problems of both state consistency and state visibility, and at the language tools at our disposal to prevent them. In general, synchronize as little as possible: synchronization is no longer a big performance hit, but it can still introduce thread contention problems if you have many threads running. So try to use volatile when possible, and ideally make classes immutable. However, too much synchronization is still better than too little. Another important aspect is to document thread safety in the Javadoc of your classes, for thread safe and unsafe classes alike. Documenting your class clearly signals to other developers what they can and can’t do with it, and as a lucky side effect it might also make them ponder thread safety issues in their own code. If you are interested in reading more about threading issues and solutions in Java, I recommend Java Concurrency in Practice by Brian Goetz. It is an excellent book, well written and complete, about multithreading in Java.
