Treating GUIDs as strings

We all know what GUIDs are and there’s a good chance you’ve had to generate one at some point. For the most part I’ve always considered them to be a series of hexadecimal characters which represent a unique string per domain. And they are, but GUIDs are just 128-bit values. I might be stating the obvious here but I do see developers from time to time treat them as strings. It makes me wonder if they consider the underlying value of a GUID.

So 987a8aff-02c5-410d-92f1-19803043fb5f and 127581288875099940198086380968295959295 are the same value, just represented differently. The former is hexadecimal and the latter is decimal (the exact decimal number depends on the byte order used to read the 16 bytes).
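To make the "same value" point concrete, here is a minimal sketch that reads a Guid as one big integer. The class and method names are hypothetical. It assumes the byte order returned by Guid.ToByteArray(), which stores the first three groups little-endian, and the little-endian constructor of System.Numerics.BigInteger; the decimal number quoted above appears to correspond to this layout.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, for illustration only.
public static class GuidValue
{
    // Interprets the 16 bytes of a Guid as an unsigned little-endian integer.
    public static BigInteger ToBigInteger(Guid guid)
    {
        byte[] bytes = guid.ToByteArray();             // always 16 bytes (128 bits)
        byte[] unsigned = new byte[bytes.Length + 1];  // trailing zero byte keeps the result non-negative
        Array.Copy(bytes, unsigned, bytes.Length);
        return new BigInteger(unsigned);               // BigInteger reads the array as little-endian
    }
}
```

Calling GuidValue.ToBigInteger(Guid.Parse("987a8aff-02c5-410d-92f1-19803043fb5f")).ToString() gives the decimal form of the same 128 bits. A different byte order would give a different decimal number for the same GUID, which is one more reason not to treat any textual form as the value itself.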

Why are you telling me this?

I’m writing about this because I recently saw a method signature in our code base at work which looked similar to this:

public void DoStuff(string[] guids) {
    // Method body
}

The problem here is the disconnect between the value and what’s being presented to the developer. They, like me at one point, saw the textual representation but didn’t think about the underlying value. With this reasoning it seems legitimate to pass GUIDs around as strings, but it’s not good practice, in the same way that it’s not good to use strings to pass integers between functions. The one exception here is serialization, which I won’t discuss.
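As a rough sketch of the difference, compare the two shapes side by side. The Worker class and the method bodies are invented for illustration; only the string[] signature comes from the code I saw.

```csharp
using System;

// Hypothetical class, for illustration only.
public static class Worker
{
    // String shape: any text compiles, so validity is only known at run time.
    // Returns how many elements actually parse as GUIDs.
    public static int DoStuff(string[] guids)
    {
        int valid = 0;
        foreach (string s in guids)
        {
            if (Guid.TryParse(s, out _)) valid++;
        }
        return valid;
    }

    // Typed shape: every element is a 128-bit value by construction,
    // so there is nothing left to validate inside the method.
    public static int DoStuff(Guid[] guids)
    {
        return guids.Length;
    }
}
```

With the string overload, Worker.DoStuff(new[] { "hello" }) compiles happily and the mistake only shows up when the code runs. With the Guid overload the caller cannot get that far without producing a real Guid first.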

Strings in .NET are encoded using UTF-16, which means a single character in a GUID consumes one 16-bit code unit. Luckily the characters which make up a GUID never need the two-code-unit (surrogate pair) form that UTF-16 uses for some characters. So now our 128-bit number is consuming 576 bits of memory for its 36 characters alone. If you take away the hyphens it’s 512 bits. That’s four to four and a half times the space the value actually requires.
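The arithmetic is easy to check. This sketch counts only the character data of the string and ignores the object header that every .NET string also carries; the GuidSize class is a hypothetical name.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class GuidSize
{
    // A Guid is 16 bytes, i.e. 128 bits.
    public const int GuidBits = 16 * 8;

    // Bits used by the UTF-16 character data of a string
    // (2 bytes per code unit), excluding any object overhead.
    public static int CharBits(string s)
    {
        return s.Length * sizeof(char) * 8;
    }
}
```

GuidSize.CharBits(guid.ToString("D")) covers the 36-character hyphenated form and GuidSize.CharBits(guid.ToString("N")) the 32-character form without hyphens, against GuidSize.GuidBits for the value itself.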

When I asked the developer at the time, he didn’t seem too fussed. It ran fast on his machine and, as far as he was concerned, the method would never see more than 5 GUIDs at a time. My concerns were:

  1. It’s impossible to predict how code will be used in the future. Someone might copy and paste this function and make small changes so it can be used elsewhere.
  2. There’s no reason to do it in the first place. If you’re going to pass around GUIDs, use the appropriate data type. For .NET-based languages it’s the System.Guid data type.
  3. The mindset of the developer writing the code multiplied by all the lines of code written.

To me the last point is important. If you multiply this mindset by the thousands of lines of code which will be committed to the code base over time, you’ll start to see sluggish performance which is difficult to track down. You’ll also have calls to Guid.TryParse() and Guid.ToString() scattered all over the code base. This is poor style.
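One way to keep those calls from spreading is to convert once, at the point where text enters the system, and pass System.Guid everywhere after that. This is a sketch of the idea rather than the code we ended up with; GuidInput and ParseAll are hypothetical names.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class GuidInput
{
    // Converts text to Guid values once, at the edge of the system.
    // Throws FormatException on the first element that is not a valid GUID.
    public static Guid[] ParseAll(string[] raw)
    {
        var result = new Guid[raw.Length];
        for (int i = 0; i < raw.Length; i++)
        {
            if (!Guid.TryParse(raw[i], out result[i]))
            {
                throw new FormatException(
                    "Element " + i + " is not a valid GUID: '" + raw[i] + "'");
            }
        }
        return result;
    }
}
```

Everything downstream of ParseAll can then take Guid[] and never needs to parse or format again until the values leave the system.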

It’s easy to get lazy and it often happens to me. I want to get on with the important things. Changing this code to accept the System.Guid data type would have required a few modifications further up the call stack, so there would have been a little extra work to do, but I still thought it was worth doing.

This is not about memory consumption either. There are gigabytes of RAM on our production machine, CPUs are getting faster, and the hardware upgrades which sometimes fix performance problems are cheaper than a developer’s hourly rate. But it’s important to remember that software systems are also growing larger and more complex at the same time. There’s also an ever-increasing volume of data to process. It’s impossible to predict how long the code you write will be around for and what new pressures it will be put under.

Avoid death by 1000 paper cuts and try to do it right the first time.

For further reading on GUIDs I recommend a three-part series by Eric Lippert.