Here’s a bug which took me the better part of an hour to figure out why a primary key was being violated in our database causing the code which runs the billing system to fail. Once again the error message was right in front of me telling me what the problem was, but knowing how the code worked it just didn’t seem possible for a duplicate to occur. Worse, the code had been “working” for years.
Several customers were delayed in receiving their invoices which meant they would also pay later, increasing debtor days.
You’ve probably already guessed that using date time as part of a primary key is what caused the issue, but there are two parts to this bug and one which I find fascinating. I’ll get to it soon.
Here’s a simplified version of the table schema. It’s basically a log of every e-mail sent to the customer. It’s not always one invoice per e-mail address and sometimes, it’s the same invoice to the same e-mail address but split due to restrictions on attachment size on the receiving end.
InvoiceID INT InvoiceEmail NVARCHAR(255) InvoiceSendDate DATETIME
The bug was in our .NET application layer and from the use of
DateTime.Now which (as I know now) has a very low resolution. You can think of it as a grandfather clock slowly ticking away. If you ask it for the date and time quick enough, you’ll get the same value back. And surely that broke what was once working in production for years.
So the question is, why did it stop “working” all of a sudden? Well believe it or not, an optimization to some of our billing routines made the system process invoices faster. It made the managers happy, remember debtor days? A side-effect of these changes surfaced a few bugs, mostly related to date time.
All of this happened during our billing cycle. We worked hours trying to fix and reverse some of the problems this caused, and although the solution will embarrass me, it saved my ass.
Here’s the problematic code found in several places within in a tight loop. Obviously the person who wrote this code was aware of the issue as illustrated by the
AddSeconds(1) method call. What they didn’t realise is that it was operating on the same value being returned by
table.Rows.Add(row["InvoiceID"], DateTime.Now.AddSeconds(1), email.address, 1, genDate);
Same code with a workaround applied.
var now = DateTime.Now; // Declared in the outer scope table.Rows.Add(row["InvoiceID"], (now = now.AddSeconds(1)), email.address, 1, genDate);
There are other issues with this code. For a starters it doesn’t fix the root cause of the problem and a better solution beckons. The second issue is that adding all of these seconds results in incorrect values being entered in the database. If you look at the logs today, the entries all appear as though they happened in the future.
There’s technical debt to be paid. We know the problem, it’s documented and it is now with the gods.
So the lesson? Use something with a higher resolution such as timestamp and have it generated on record insertion in the database itself. Or, discuss another solution with the team.