Monday, July 16, 2012

StringBuilder or not StringBuilder, that is the question...

Ten years ago, I achieved competence in my first programming language, Java. Now, a decade later, I find myself back in Java-land, programming apps on Android. Even though Java is verbose like a bad Russian novel, I'm enjoying it in a homecoming sort of way. I've decided to periodically share some of the nitty-gritty of Java, in the hope that folks will find it useful. Today I want to talk about string concatenation.

String concatenation happens all the time in our code. Most of the time, we need to generate a short message to display to the user or send a snippet of text to the log.

The problem with strings in Java is that they're immutable. Conceptually, every time we concatenate two strings we create a third string. So the above code has the same effect as:

The intermediate strings from the concat operations become garbage immediately. And since strings are objects, all the overhead of object allocation and instantiation applies, as well as garbage collection at some time in the future.
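
The snippets this passage refers to are not reproduced here, so what follows is only a hypothetical sketch; the class, method, and variable names are invented for illustration. It builds a short message with +, then builds the same message one step at a time so the intermediate strings that get thrown away are visible. A compiler is free to optimize a single + expression, so treat the second method as a mental model rather than a description of the actual bytecode.

```java
final class ConcatDemo {
    // Builds a short log-style message with the + operator.
    static String withPlus(String user, int count) {
        return "User " + user + " has " + count + " new messages";
    }

    // Conceptual equivalent: every step produces a new String,
    // and s1, s2 and s3 become garbage once the method returns.
    static String withIntermediates(String user, int count) {
        String s1 = "User " + user;
        String s2 = s1 + " has ";
        String s3 = s2 + count;
        return s3 + " new messages";
    }

    public static void main(String[] args) {
        System.out.println(withPlus("alice", 3));
        System.out.println(withIntermediates("alice", 3));
    }
}
```

Both methods return the same text; only the number of named temporary strings differs.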

To solve this problem, Java's designers came up with StringBuffer and, much later (in Java 5), StringBuilder. These classes are supposed to give programmers a more efficient way to concatenate strings. The biggest difference between the two is that StringBuffer is thread-safe: its methods are synchronized. It turns out that 99.9% of string concatenations are not done across multiple threads, so synchronization is overkill. Since synchronization is not free, StringBuilder is expected to outperform StringBuffer.

The previous example, using StringBuilder, becomes:
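
The original snippet is not available here, so the following is a hedged stand-in with hypothetical names rather than the code the post was written around. It shows the general shape of building a short message with StringBuilder and its default constructor.

```java
final class BuilderDemo {
    // Appends each piece to one mutable buffer and creates
    // a single String at the end via toString().
    static String withBuilder(String user, int count) {
        StringBuilder sb = new StringBuilder(); // default initial capacity
        sb.append("User ")
          .append(user)
          .append(" has ")
          .append(count)
          .append(" new messages");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withBuilder("alice", 3));
    }
}
```

StringBuffer offers the same append and toString methods, so swapping the class name is enough to get the synchronized variant.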

But the $100 question is: does it actually perform better?

I created a microbenchmark (code here) using Caliper to compare three ways of concatenating strings: the + operator, StringBuilder, and StringBuffer. Since phones have limited memory, I ran the benchmark with three VM heap configurations: 16 MB, 32 MB, and 512 MB. The results are surprising:

memoryMax      benchmark  ns linear runtime
  -Xmx16M StringAddition 226 =========================
  -Xmx16M  StringBuilder 246 ===========================
  -Xmx16M   StringBuffer 269 ==============================
  -Xmx32M StringAddition 143 ===============
  -Xmx32M  StringBuilder 155 =================
  -Xmx32M   StringBuffer 166 ==================
 -Xmx512M StringAddition 135 ===============
 -Xmx512M  StringBuilder 152 ================
 -Xmx512M   StringBuffer 166 ==================

As expected, StringBuilder outperforms StringBuffer, but StringBuilder is about 10% slower than the + operator.
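
For anyone who wants to try a comparison like this without Caliper, here is a rough hand-rolled sketch. It is not the benchmark behind the numbers above: the class name, the three-piece message, and the repetition count are all assumptions, and a simple System.nanoTime() loop is far less rigorous than a real benchmarking harness, so expect noisy results.

```java
import java.util.function.Supplier;

final class ConcatTiming {
    // Written to so the JIT cannot discard the work being timed.
    static volatile long sink;

    static String viaPlus(String a, String b, String c) {
        String s = a;
        s = s + b; // separate statements, so each + yields its own String
        s = s + c;
        return s;
    }

    static String viaBuilder(String a, String b, String c) {
        return new StringBuilder().append(a).append(b).append(c).toString();
    }

    static String viaBuffer(String a, String b, String c) {
        return new StringBuffer().append(a).append(b).append(c).toString();
    }

    // Average nanoseconds per call over `reps` timed calls,
    // measured after an equally long warm-up pass.
    static double nanosPerCall(Supplier<String> task, int reps) {
        long total = 0;
        for (int i = 0; i < reps; i++) total += task.get().length(); // warm-up
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) total += task.get().length(); // timed
        long elapsed = System.nanoTime() - start;
        sink = total;
        return (double) elapsed / reps;
    }

    public static void main(String[] args) {
        int reps = 1_000_000; // arbitrary choice for this sketch
        String a = "Hello, ", b = "world", c = "!";
        System.out.printf("plus    %.1f ns%n", nanosPerCall(() -> viaPlus(a, b, c), reps));
        System.out.printf("builder %.1f ns%n", nanosPerCall(() -> viaBuilder(a, b, c), reps));
        System.out.printf("buffer  %.1f ns%n", nanosPerCall(() -> viaBuffer(a, b, c), reps));
    }
}
```

Running it with different -Xmx values (for example `java -Xmx16M ConcatTiming.java` on a recent JDK) gives a crude version of the memory comparison.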

 

It turns out that StringBuilder (and StringBuffer) stores its contents in an internal character array. That array has to expand when more characters are appended than its capacity allows. The default constructor creates an array with a capacity of 16 characters. It would appear that expanding the underlying array is more expensive than creating and throwing away a few strings.
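
The growth can be observed directly through StringBuilder's capacity() method. The small sketch below (class and method names are hypothetical) reports the capacity of a default-constructed builder before and after it overflows. How much the array grows by is an implementation detail that differs between runtimes, so the code makes no assumption about the new size.

```java
final class CapacityDemo {
    // Capacity of a freshly constructed builder (documented as 16).
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity after appending `chars` single characters to a default builder.
    static int capacityAfter(int chars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println("default:        " + defaultCapacity());
        System.out.println("after 16 chars: " + capacityAfter(16));
        System.out.println("after 17 chars: " + capacityAfter(17)); // forces one expansion
    }
}
```

Each expansion allocates a larger array and copies the existing characters into it, which is the cost the paragraph above is pointing at.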

 

To test this, I created a second benchmark (code here) that, instead of using the default constructor, passes an initial capacity of 100 (more than enough to fit the test result). And voila!

memoryMax      benchmark  ns linear runtime
  -Xmx16M StringAddition 217 =============================
  -Xmx16M  StringBuilder 202 ===========================
  -Xmx16M   StringBuffer 221 ==============================
  -Xmx32M StringAddition 143 ===================
  -Xmx32M  StringBuilder 126 =================
  -Xmx32M   StringBuffer 138 ==================
 -Xmx512M StringAddition 143 ===================
 -Xmx512M  StringBuilder 125 ================
 -Xmx512M   StringBuffer 137 ==================

Without array expansion, StringBuilder appears to be around 10% faster than the + operator.
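
As an illustration of the presized approach, here is a minimal sketch with hypothetical names and inputs (it is not the benchmark code). The builder is created with an initial capacity of 100, and the capacity is unchanged after the appends because the result fits.

```java
final class PresizedDemo {
    // Builds a string in a builder whose backing array is sized up front.
    static StringBuilder build(int initialCapacity, String... parts) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (String part : parts) {
            sb.append(part);
        }
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = build(100, "User ", "alice", " has ", "3", " new messages");
        System.out.println(sb);
        System.out.println("length:   " + sb.length());
        System.out.println("capacity: " + sb.capacity()); // still the initial capacity
    }
}
```

The catch is that the right capacity has to be known, or at least guessed, before the appends happen.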
 
With the evidence in hand, I come to 3 conclusions.
  1. You almost never want to use StringBuffer.
  2. StringBuilder may be more efficient, but it's tricky to use properly. If you initialize it with too small a capacity, it will be slower (due to array expansion costs), and if you give it too big a capacity, you're wasting memory.
  3. I'm going to stick to using + to concatenate my strings. For a small number of concatenations, the performance boost from StringBuilder is not worth the extra typing, and I don't have to think as hard.