The lowly string concatenation operator.

So the other day I encountered the following code. (I’ve rewritten this to protect the innocent.)

        HashMap<String,String> map = ...;

        for (int i = 0; i < 20; ++i) {
            if (check(map.get("label" + i))) break;
            String thing = map.get("thing" + i);
            String other = map.get("other" + i);

Now for most applications, this is perfectly fine. I like idioms which are clear, even if they are slightly more computationally expensive. However, for a high performance application, it’s surprising how much stuff is going on behind the rather innocuous statement map.get(“label” + i);.

According to the Java spec, string concatenation is a shortcut for using a StringBuilder class. So map.get(“label” + i) becomes:

    StringBuilder tmp = new StringBuffer();

And the append(int i) call is translated into a call to Integer.toString():

    StringBuilder tmp = new StringBuffer();
    String tmp2 = Integer.toString(i);

But wait, it gets worse! The StringBuilder.append() routine has to (a) check to see if there is enough buffer space allocated to store the string, growing the array (and copying the contents using System.arraycopy() to copy the memory across if not), and (b) use System.arraycopy() to copy the contents at the end of the buffer.

So each tmp.append() call involves a call to System.arraycopy() and an array length compare.

Then the tmp.toString() call allocates another buffer, copying the internal StringBuilder copy into the string buffer. (The internal source for String is permitted to simply copy a reference to the internal array in StringBuilder over rather than copying the bytes, but it doesn’t appear to do this.)

And I haven’t even gotten started!

Then we pass tmp to the map.get() method, which does a lookup, calling into the string’s hashCode() and equals() call. And because we have a brand-new string, hashCode() has to iterate through the entire array of characters, doing a math operation on each character, accumulating the results into a hash code.

Every time we go through this routine.

The answer, of course, if you’re writing a high performance application, is not to repeatedly do the same thing over and over again, especially if you can store away the results in memory.

In our above case, it would be easy enough to pre-populate an array of strings. Thus:

        /* Do this initialization once */
        String[] labels = new String[20];
        String[] things = new String[20];
        String[] others = new String[20];
        for (int i = 0; i < 20; ++i) {
            labels[i] = "label" + i;
            things[i] = "label" + i;
            others[i] = "label" + i;

This pre-populates an array of strings with the labels we’ll be checking against. Then:

        /* Then do this a lot */
        for (int i = 0; i < 20; ++i) {
            if (check(map.get(labels[i]))) break;
            String thing = map.get(things[i]);
            String other = map.get(others[i]);

Now each time we pass through the loop, instead of creating a StringBuilder object, allocating two character arrays (or more, if our key string is too big to fit in the StringBuilder’s default buffer) for each call to get() above, which winds up allocating 60 StringBuilder objects and 120 arrays, along with 60 calls to hashMap() each time through, instead we move those internal allocations into an initializer.

And each time we go through our loop, we’ve replaced 60 StringBuilder allocations and 120 array allocations with 60 array lookups–which is a hell of a lot faster.

2 thoughts on “The lowly string concatenation operator.

  1. Small mistake

    for (int i = 0; i < 20; ++i) {
    labels[i] = "label" + i;
    things[i] = "thing" + i;
    others[i] = "other" + i;

    But we understood your point


  2. I think the original code’s use of a map is odd and confusing. At first blush it looks as If what is being represented is an ordered collection that is indexed by an integer– there are twenty in you example– and therefore you should use an ordered collection. I think this post should be about changing the tortured use of a map rather then optimizing it. Even if the map is required, the relationship between labels, things and others should be made more explicit.


Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s