Monday, January 22, 2018

Playing with strings and performance in C#

A few weeks ago, I was solving a problem at HackerRank and after completing a solution that ran  under 100 milliseconds, the program would still timeout during execution. HackerRank limit is usually 1 second, so something was taking more than 900ms.

It was the strings!

Yes, we know handling strings is costly, but this time there was no way around it.  At the final stage of the problem I had to construct a string over 200k chars long from an array of Boolean values that represented the bits of the number.  

Disclaimer: Strings are slow, mainly because they are immutable. Concatenating two strings, creates a third one and possibly 2 for the garbage collector to pickup.

Looking through the code I noticed I was simply using "+=" instead of StringBuilder.Append method. After using StringBuilder the solution was faster and the submission was approved. It was  fast now, but it got me thinking "how much faster indeed?" which ended with me measuring the performance of concatenating strings, because we always knew StringBuilder was faster than regular operation with strings but I still wanted to see the numbers.

I used C# and tested the same set of examples in NetCore and full .Net Framework. I did not notice any performance differences between the two of them.

What was measured?

I measured 3 ways to concatenate strings, and each of them in reverse mode. Here are the 6 variants:
  • Using the + operator. (str += "1";)
  • Using + operator to insert in front (str = "1"+str)
  • Using string.Concat (str = string.Concat(str,"1"))
  • Using string.Concat to insert in front (str = string.Concat("1",str))
  • Using StringBuilder.Append to add at the end
  • Using StringBuilder.Insert to place in front
Every operation was ran against data sets of multiple lengths, for 10, 20, 30, 40 and 50k characters. The chart below shows the results in milliseconds. 

It was a surprise to see that StringBuilder.Append is not only faster, but appears to be in constant time O(1). Also, to see that the Insert method is for certain sizes slower than other operations, so StringBuilder is not always faster.  

Now if we run the tests for a data set with 100k characters, we get a new surprise: the insert method starts being faster at a bit over 50k chars in length, with the other operations degenerating in performance. 

If you like competitive programming or have an interesting in high performance solutions, make sure to always measure. Do experiments a lot in code and test different scenarios. In my case, I had an array reversed so I walked the array backwards and used the Append method instead of Insert. That was the difference between failing or passing a problem.   

Check the code for StringBuilder at GitHub. The implementation uses string.wstrcpy, while is blazing fast, this is still O(n). 

No comments: