Multithreaded elements of style in javascript

This other day I was just coding and stuff when it suddenly struck me – multithreaded programming! My general ideas about it always revolved around things like scalability, synchronization, and those philosophers who refuse to eat with their hands. What I completely missed was the human side of it – the way it helps developers express their thoughts in code with so much ease and elegance even when they’re not pursuing head-spinning performance. In fact, lots of multithreaded code out there has nothing to do with scaling up and is all about breaking down code into logical sections.

Core Javascript Multithreading

You’re probably thinking I’m way off here because everyone knows javascript is completely single-threaded. That’s right – those DOM events, AJAX, timed events – all executed by the same thread! (see John Resig’s article). The general idea is that javascript executes short-burst pieces of code and can afford to run all events in the same thread. By doing that, it avoids tricky synchronization issues and greatly simplifies the way we work with it.

AJAX and DOM events in javascript are a great example of using multithreaded paradigms (simulated multithreading that is) to simplify coding. The alternative would have been to write your own loop that does something like:

while(true){
runDOMEvents();
runScheduledEvents();
runAJAXRequests();
runEverythingElse();
}

Instead, the javascript runtime already does that for you, so all you need to do is supply the callbacks. In addition, it also runs each event with its appropriate call stack so the multithreaded illusion looks very real in firebug.

Dojo Promises

Building on the existing body of knowledge on futures and promises, Dojo defines the concept of a Deferred.You may have also seen this exact same concept in Python, or a very similar one in Java’s java.util.concurrent.Futures. Java Futures are literally running in a separate thread so getting the future value can afford to block. In Javascript otoh, no blocking is possible since there’s only one thread executing the whole shebang, so the alternative is to provide a callback to execute when the future value becomes available. This means the promised value always gets processed in a different call stack even though it’s all in the same physical thread. However, that’s not a big issue in Javascript since the concept of an execution context is extremely flexible thanks to functions like apply() and call() (or wrappers like dojo.hitch()).

You may have noticed that Java Futures already return an object of the expected type, so you can use the Future as a proxy for the concrete value. For example, we could invoke the process() method below using a String

process("abc");

or we can just as well give it a future String value:

Future<String> future = ...
process(future.get()); // blocks until String is ready

The interface therefore remains uniform whether we use a Future or a concrete value. Since this is not the case in Dojo, we have a solid workaround for it in the form of dojo.when(), which does the extra work of normalizing promises and concrete values for us.

A Pretty Specific Generics Problem

So today I’m like coding and stuff (but you already knew that) when I ran into this conundrum with generics. Can you see why this doesn’t compile? Assume Car is the superclass of Ford and Toyota.

List<? extends Car> list = new ArrayList<Car>();
list.add(new Ford());
...
Compile error: cannot find symbol method add(Ford)

What??? Regroup. I have a List of anything that extends Car and I can’t put a Ford in it?  To see why this is actually correct, let’s go back to the basics. Consider:

Car myCar = new Ford();

Here we have a Car reference to a specific Car implementation, namely Ford. We can use this Ford object with the Car reference according to Car’s exposed interface. However, we can’t use that reference to access Ford-specific functionality. Now take another look at the list reference:

List<? extends Car> list = ...;

What this means is that list is a reference to a List of something that extends Car (or is Car), but we don’t really know what that something is. In other words, it could be a reference to a List<Toyota>, List<Ford>, or List<Car>. The only functionality available through the List<? extends Car> reference is the lowest common denominator for all such List objects – basically little more than iterating. Hence when we tried to do

list.add(new Ford());

this had to blow up – the target list could just as well have been a List<Toyota>, which would have been really bad.

—–

Ok, two questions now. First, how do I declare a List that can store any type of Car? Second, how is this unknown type genericising even useful?

The first one is really easy. All you have to do is declare it as a List of Cars:

List<Car> list = new ArrayList<Car>();

Now the type is known – it’s Car, which includes all its subtypes like Toyota and Ford, and you can add a Ford instance to the list – that works!

The second question is a bit harder to answer because we first have to realize the limitations of the genericised List. Specifically, the following will not compile:

List<Car> list = new ArrayList<Ford>();
...
Compile error:
incompatible types found : java.util.ArrayList<Main.Ford> required: java.util.List<Main.Car>

Even though Car has the inheritance relationship with Ford and Toyota, List<Car> does not share that same relationship with List<Ford> and List<Toyota>. In other words, List<Car> is not a superclass of List<Ford> and List<Toyota> (d-uh) so it cannot reference them. If you want a generic reference to all of these lists, such as passing any of these lists to a method, you have to resort to the even more generic List<? extends Car> variant:

public void doStuffWithListOfSomeCars(List<? extends Car> theList) // accepts List<Car>, List<Toyota> and List<Ford> 

In other words, the “? extends …” construct introduces a new form of polymorphism that’s really specific to generics. A little funky but still much better than List<Object>.


A Memory Leak where you least expect it

The other day I got some heap dumps that made me rub my eyes in disbelief. There’s this Map that stores a few thousand small strings (about 30 characters each, never more than 50), which in theory should never take up more than a couple of megs of RAM. However, according to MAT (Memory Analysis Tool – you know, the Eclipse plugin), this Map was taking up about 2.5GB instead!

So I went through a bunch of those Strings thinking I’d find a few that were way bigger than 50 characters and call mystery solved. However, they all seemed correct! So what’s taking up the extra 2.49GB?

Here’s a neat little snippet that demonstrates the problem:

 List<String> list = new ArrayList();
  for(int i = 0; i < Integer.MAX_VALUE; ++i)
  {
      String substr = ONE_MB_STRING.toString().substring(0,1); // get first character
      list.add(substr); // keep track of the tiny substring
      System.out.println(i); // how far will we get before it blows up?
  }

We take in a 1MB StringBuilder, convert it to a String, and grab just the first character. The one-character String is stored in the list. This simulates the conditions surrounding the huge memory leak.

So how many iterations before we run out of a 128MB heap?

Answer:

It turns out by the end of the 59th iteration, my test blows up with a big fat OOM. All the memory is eaten up by the list even though all it contains is 59 one-character String objects, provably. What???

It’s actually an interesting Java optimization that just didn’t work too well for us. Java takes advantage of String’s immutability to avoid copying bytes around when it builds substrings. The String class has an inner char[] buffer, as you might expect, along with two integers: offset and count. Needless to say, when you call substring(), you don’t get a String with a brand new buffer but rather a new String that points to the internal buffer of the source String, with only offset and count adjusted to point to the location of the substring. The behavior of the resulting String is exactly as expected, except that it drags around the original buffer with it (technically this is what you might call the Flyweight pattern). In our case, each 1-character substring  was actually pointing to the 1 MB buffer of the original String. Give it a few dozen of those and your heap becomes crammed indeed.

Solution:

The solution is really simple – construct a new String based on the substring. While substring() and split() use the flyweight pattern, the String constructor creates a new buffer based on the toString() value of the original:

list.add(new String(substr)); // keep track of the tiny substring

Update:

I was just taking a nice bubble bath while reading some Java source code when I came across this String constructor:

public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
      if (originalValue.length > size) {
         // The array representing the String is bigger than the new
         // String itself.  Perhaps this constructor is being called
         // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
     } else {
         // The array representing the String is the same
         // size as the String, so no point in making a copy.
        v = originalValue;
     }
    this.offset = 0;
    this.count = size;
    this.value = v;
    }

Check out those comments – looks like someone ran into this problem before…

Stop That Thread!

You start a long-running job in a separate thread and then decide it’s not really worth it. You’re thinking there’s got to be an easy way to just kill the thread and be done with it.

However, the thread may have acquired system resources like a database connection or a lock. It may have created a bunch of temporary files. If you stop the thread in its tracks, those resources may not be released. Worse, if it’s a lock or a semaphore, you can wind up with a deadlocked process.

The solution is to cancel jobs cooperatively. This means you code the background job so that it checks for a cancel signal as often as it can, and if that signal is set the thread cleans up and exits gracefully. The signal can be as primitive as creating a file called canceljob.txt in a predetermined location so the background job checks for it every so often and acts accordingly.

There’s a problem with that approach. If the thread is stuck waiting on a lock, your cancellation routine can wind up waiting for that thread for quite some time. Ideally you should be able to just send a signal to the thread even if it’s asleep. In Java, use Thread.interrupt() to do just that and in the thread itself check Thread.interrupted() as often as you can:

 
while(!Thread.currentThread().interrupted() && hasMoreWork)
{
    // process next batch
}