A Memory Leak where you least expect it

The other day I got some heap dumps that made me rub my eyes in disbelief. There’s this Map that stores a few thousand small strings (about 30 characters each, never more than 50), which in theory should never take up more than a couple of megs of RAM. However, according to MAT (Memory Analysis Tool – you know, the Eclipse plugin), this Map was taking up about 2.5GB instead!

So I went through a bunch of those Strings thinking I’d find a few that were way bigger than 50 characters and call mystery solved. However, they all seemed correct! So what’s taking up the extra 2.49GB?

Here’s a neat little snippet that demonstrates the problem:

 List<String> list = new ArrayList();
  for(int i = 0; i < Integer.MAX_VALUE; ++i)
      String substr = ONE_MB_STRING.toString().substring(0,1); // get first character
      list.add(substr); // keep track of the tiny substring
      System.out.println(i); // how far will we get before it blows up?

We take in a 1MB StringBuilder, convert it to a String, and grab just the first character. The one-character String is stored in the list. This simulates the conditions surrounding the huge memory leak.

So how many iterations before we run out of a 128MB heap?


It turns out by the end of the 59th iteration, my test blows up with a big fat OOM. All the memory is eaten up by the list even though all it contains is 59 one-character String objects, provably. What???

It’s actually an interesting Java optimization that just didn’t work too well for us. Java takes advantage of String’s immutability to avoid copying bytes around when it builds substrings. The String class has an inner char[] buffer, as you might expect, along with two integers: offset and count. Needless to say, when you call substring(), you don’t get a String with a brand new buffer but rather a new String that points to the internal buffer of the source String, with only offset and count adjusted to point to the location of the substring. The behavior of the resulting String is exactly as expected, except that it drags around the original buffer with it (technically this is what you might call the Flyweight pattern). In our case, each 1-character substring  was actually pointing to the 1 MB buffer of the original String. Give it a few dozen of those and your heap becomes crammed indeed.


The solution is really simple – construct a new String based on the substring. While substring() and split() use the flyweight pattern, the String constructor creates a new buffer based on the toString() value of the original:

list.add(new String(substr)); // keep track of the tiny substring


I was just taking a nice bubble bath while reading some Java source code when I came across this String constructor:

public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
      if (originalValue.length > size) {
         // The array representing the String is bigger than the new
         // String itself.  Perhaps this constructor is being called
         // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
     } else {
         // The array representing the String is the same
         // size as the String, so no point in making a copy.
        v = originalValue;
    this.offset = 0;
    this.count = size;
    this.value = v;

Check out those comments – looks like someone ran into this problem before…

Post a comment or leave a trackback: Trackback URL.


  • Stephan  On August 17, 2011 at 9:28 pm

    That’s interesting. Is there any solution? Does “new String(substr)” get rid of the link to the 1MB original or does that just create an additional reference?

    • vvakar  On August 18, 2011 at 7:45 am

      Hi Stephan – yes, new String() will do it. I updated the post. Thanks!

  • Dennis Fung  On August 31, 2011 at 9:45 pm

    wow, Interesting findings val. I’ve been working on a small java program during my down time (its a log parser, so I’m dealing with lots of Strings), and I’m using substring() quite frequently. I haven’t ran into any OOME yet, but perhaps I should go back and reconsider using an alternative method…

    btw, just out of curiosity, was this the heap dump I asked you to look at a few weeks ago?

    • vvakar  On September 1, 2011 at 9:30 am

      Hi Dennis – yes, I got this from the heap dump we looked at together. Note that the OOM condition happens because there are some pretty large strings and lots of them. If your application is not having issues with your heap size it seems unnecessary to complicate the code just to be theoretically clean.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: