A Tough Call

The other day I’m like coding and stuff when a co-worker walks by and asks to have a word. It turns out he needed some advice after getting himself in a pickle by checking in slightly more code than he was supposed to. Something about refactoring a bunch of code instead of just fixing the dozen or so bugs he was assigned, which resulted in some frowns.

At this point you’re probably wondering why he came to me, given that I have nothing to do with his project. To that I would say I sometimes play the role of the bartender among developers, with the exception that I don’t serve drinks and expect no tips. Hopefully that makes sense.

So it turns out this guy set out to fix those bugs when he realized he was dealing with a bunch of spaghetti with a full can of past-expiration-date bolognese sauce on top. Being a fearless engineer, he took matters into his own hands and decided to refactor the offending code. But there’s trouble in paradise – the refactorings didn’t go over too well with management.

As we talk about this, I can’t help remembering so many past instances when some prototypical loose cannon made frivolous changes all over the place and broke a bunch of stuff. The worst part was when I got stuck cleaning up after them. But there were also times when refactoring was the right thing to do and it brought about a lot of good. The question is – which is it this time?

I put on my metaphorical referee uniform and tried to reason through this. Refactoring can definitely be risky and its benefit must be weighed against the risk. However, after deliberating for about 10 minutes I called the move – legal. The deciding factors were:

  1. This engineer stands behind his work. He expressed a willingness to “work all night” to rectify any issues that may arise as a result of this.
  2. He knows his s*%t. He made a convincing case for why that old code was no good (monolithic methods, incorrect abstractions, etc.), and why his code is likely to be more maintainable.
  3. He has a history of good coding practices.

In addition, I decided to give his superiors yellow cards for taking the short-sighted approach of “if it ain’t broke don’t fix it”, because they don’t know enough about the code in question (or code in general IMO) to make that call. Surely there are cases when it’s better not to fool around with messy code, namely when it’s fully contained and debugged, but this was not one of them.

As a result of my decision, nothing happened. This was in part because we didn’t tell anyone so in the end it was just two dudes talking about some work stuff. For what it’s worth, I remain very pleased with my decision.

A Memory Leak Where You Least Expect It

The other day I got some heap dumps that made me rub my eyes in disbelief. There’s this Map that stores a few thousand small strings (about 30 characters each, never more than 50), which in theory should never take up more than a couple of megs of RAM. However, according to MAT (Memory Analyzer Tool – you know, the Eclipse plugin), this Map was taking up about 2.5GB instead!

So I went through a bunch of those Strings thinking I’d find a few that were way bigger than 50 characters and call mystery solved. However, they all seemed correct! So what’s taking up the extra 2.49GB?

Here’s a neat little snippet that demonstrates the problem:

  // ONE_MB_STRING is a StringBuilder holding about 1MB of characters
  List<String> list = new ArrayList<String>();
  for (int i = 0; i < Integer.MAX_VALUE; ++i)
  {
      String substr = ONE_MB_STRING.toString().substring(0, 1); // get first character
      list.add(substr); // keep track of the tiny substring
      System.out.println(i); // how far will we get before it blows up?
  }

We take in a 1MB StringBuilder, convert it to a String, and grab just the first character. The one-character String is stored in the list. This simulates the conditions surrounding the huge memory leak.

So how many iterations before we run out of a 128MB heap?

Answer:

It turns out by the end of the 59th iteration, my test blows up with a big fat OOM. All the memory is eaten up by the list even though all it contains is 59 one-character String objects, provably. What???

It’s actually an interesting Java optimization that just didn’t work too well for us. Java takes advantage of String’s immutability to avoid copying characters around when it builds substrings (this is how JDKs up through Java 6 behave; as of Java 7 update 6, substring() copies the characters instead). The String class has an inner char[] buffer, as you might expect, along with two integers: offset and count. When you call substring(), you don’t get a String with a brand new buffer but rather a new String that points to the internal buffer of the source String, with only offset and count adjusted to mark the location of the substring. The resulting String behaves exactly as expected, except that it drags the original buffer around with it (you might loosely call this the Flyweight pattern). In our case, each 1-character substring was actually pointing to the 1 MB buffer of the original String, and as long as the substring is reachable, that buffer can’t be garbage collected. Give it a few dozen of those and your heap becomes crammed indeed.
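
To make the mechanics concrete, here is a minimal sketch of that layout. SharedString and SubstringLeakDemo are made-up names for illustration, not JDK classes; the model only mimics the value/offset/count trio described above and leaves out everything else a real String does.

```java
/** Hypothetical stand-in for the old String layout: a shared char[] plus offset and count. */
final class SharedString {
    private final char[] value; // backing buffer, possibly shared with other instances
    private final int offset;   // index of the first character inside value
    private final int count;    // number of characters this instance represents

    SharedString(char[] chars) {
        this(chars.clone(), 0, chars.length);
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** No copying: the result points into the same buffer with adjusted offset and count. */
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException();
        }
        return new SharedString(value, offset + begin, end - begin);
    }

    int length() {
        return count;
    }

    /** Size of the buffer this instance keeps reachable, however short it looks. */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

class SubstringLeakDemo {
    public static void main(String[] args) {
        SharedString big = new SharedString(new char[1 << 20]); // about one million chars
        SharedString tiny = big.substring(0, 1);
        System.out.println(tiny.length());        // 1
        System.out.println(tiny.retainedChars()); // 1048576
    }
}
```

As a rough sanity check, and assuming the source held about a million chars at two bytes each, every retained buffer is roughly 2MB, so a 128MB heap could fit at most 64 of them. Dying at iteration 59 is in the right ballpark once you account for everything else living on the heap.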

Solution:

The solution is really simple – construct a new String based on the substring. While substring() and split() hand back Strings that share the source buffer, the String(String) constructor copies just the substring’s characters into a new buffer of its own:

list.add(new String(substr)); // keep track of the tiny substring

Update:

I was just taking a nice bubble bath while reading some Java source code when I came across this String constructor:

public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}

Check out those comments – looks like someone ran into this problem before…
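
The interesting bit is the trimming branch, which boils down to a single Arrays.copyOfRange call. Here is a small sketch of just that step on a plain char[]. TrimDemo and trim are hypothetical names of mine, and the sizes are only for illustration.

```java
import java.util.Arrays;

class TrimDemo {
    /** Copies only the count characters starting at offset, like the trimming branch above. */
    static char[] trim(char[] value, int offset, int count) {
        return Arrays.copyOfRange(value, offset, offset + count);
    }

    public static void main(String[] args) {
        char[] big = new char[1 << 20]; // the baggage: about one million chars
        big[0] = 'x';
        char[] small = trim(big, 0, 1);
        System.out.println(small.length); // 1
        System.out.println(small[0]);     // x
    }
}
```

Once nothing else refers to the big array, the garbage collector is free to reclaim it, because the copy no longer points at it.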

The Agile Man-Month, Part II

You have ten developers for a five man-year project. How long will it take to complete the project?

Analogously, you sell lemonade for 10 a glass. Yes, 10. The currency is totally immaterial. Canadian dollars, Kazakhstani tenge, or sea shells – I give ten, you give lemonade.

How are these two analogous?

It is a truth universally acknowledged that a software developer is like a ninja. Each is unique, carries a distinct set of abilities, and, surely, has a different force multiplier. One’s force multiplier varies depending on the type of work. For example, some ninjas do better during mortal-style kombats. Others excel at stealth assassinations. Others still specialize in sabotage and infiltration. The total amount of damage that can be inflicted by a team of ninjas depends on the collective force multiplier of that team for that specific type of mission.

Likewise, the total amount of work attainable by a team of developers is the sum of those individual developers’ abilities for that particular type of work.

But, you will argue, the man-month number is just an estimate, and as such is perfectly acceptable. After all, how big can the difference in force multipliers among ninjas be? A ninja is a ninja, and a developer is a resource with some variation.

There are no scientific methods to quantify the difference, but, according to Fred Brooks’ The Mythical Man-Month, “good” developers are five to ten times as productive as mediocre ones. If that’s true (and many developers would agree), one man’s year can be as valuable as another’s ten! This means one developer’s five-year project can be another’s six-month stint.
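
Here is a back-of-the-envelope sketch of that arithmetic. It assumes, as above, that a team’s capacity is simply the sum of its members’ force multipliers, and that a man-year is sized for a developer with a multiplier of 1.0. TeamEstimate and calendarYears are made-up names, and the model deliberately ignores coordination overhead and everything else that makes real projects messy.

```java
import java.util.Arrays;

class TeamEstimate {
    /**
     * Calendar years needed for a project sized in man-years, assuming team
     * capacity is just the sum of the individual force multipliers.
     */
    static double calendarYears(double manYears, double... multipliers) {
        double capacity = Arrays.stream(multipliers).sum();
        if (capacity <= 0) {
            throw new IllegalArgumentException("team has no capacity");
        }
        return manYears / capacity;
    }

    public static void main(String[] args) {
        System.out.println(calendarYears(5, 1.0));  // 5.0 (one baseline developer)
        System.out.println(calendarYears(5, 10.0)); // 0.5 (one developer at ten times the baseline)

        double[] halfSpeedTeam = new double[10];
        Arrays.fill(halfSpeedTeam, 0.5);
        System.out.println(calendarYears(5, halfSpeedTeam)); // 1.0 (ten developers at half the baseline)
    }
}
```

Same project, same man-year figure, and the answer swings by a factor of ten depending on who shows up.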

You have ten arbitrarily-chosen developers for a five man-year project. How long will it take to complete the project?

 

To be continued…