Microsoft has just announced that a software error in calculating dates (over a leap year) caused a major outage in Windows Azure last week.
Was it really a simple error in judgement working around
DateTime.Now.AddYears(1) on a leap year?
What coding practices could have prevented this?
As dcstraw pointed out
DateTime.Now.AddYears(1) on a leap year does in fact return the correct date in .NET. So it's not a framework bug, but evidently a bug in Date calculations.
asked Mar 10 '12 at 14:03
Use a better date and time API
The built-in .NET date and time libraries are horribly hard to use properly. They do let you do everything you need, but you can't express yourself clearly through the type system.
DateTime is a mess,
DateTimeOffset may lull you into thinking you're actually preserving the time zone information when you're not, and
TimeZoneInfo doesn't force you to think about everything you ought to be considering.
None of these provide a nice way of saying "just a time of day" or "just a date", nor do they make a clear distinction between "local time" and "time in a particular time zone". And if you want to use a calendar other than the Gregorian one, you need to go through the
Calendar class the whole time.
Some points you may want to think about, which are easy to miss if you're not aware of them:
- Mapping a local date/time to one in a particular time zone isn't as simple as you might think. A specific local date/time might occur once, twice (ambiguity) or zero times (it's skipped) due to daylight saving transitions
- Time zones vary historically - more than
TimeZoneInfo is generally willing to reveal, frankly. (It doesn't support a time zone whose idea of "standard time" changes over time, or which goes into permanent daylight saving time.)
- Even with the zoneinfo database, time zone IDs aren't necessarily stable. (CLDR addresses this; something I'm hoping to support in Noda Time eventually.)
- Textual representations of dates and times are a nightmare, not just in terms of ordering, but date separators, time separators, and odd things like genitive month names
- The start of the day isn't always midnight - in Brazil, for example, the spring daylight saving transition moves the wall clock from 11:59:59pm to 1am
- In some cases (well, one that I know about) a time zone can force a whole day to be skipped - December 30th 2011 didn't occur in Samoa! I suspect most developers can probably ignore this one, but...
- If you're going to use a calendar other than the Gregorian one, be careful and make sure you really know how you expect it to behave.
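To make the "once, twice or zero times" point concrete, here is a minimal sketch using only the built-in TimeZoneInfo API. The zone is made up for the example (UTC-5 with US-style transition rules) so that it doesn't depend on the time zone data installed on the machine; DstDemo, CreateDemoZone and Classify are hypothetical names, not part of any library.

```csharp
using System;

public static class DstDemo
{
    // Builds a made-up zone: UTC-5, one hour of daylight saving starting at 02:00 on the
    // second Sunday of March and ending at 02:00 on the first Sunday of November.
    public static TimeZoneInfo CreateDemoZone()
    {
        var twoAm = new DateTime(1, 1, 1, 2, 0, 0);
        var start = TimeZoneInfo.TransitionTime.CreateFloatingDateRule(twoAm, 3, 2, DayOfWeek.Sunday);
        var end = TimeZoneInfo.TransitionTime.CreateFloatingDateRule(twoAm, 11, 1, DayOfWeek.Sunday);
        var rule = TimeZoneInfo.AdjustmentRule.CreateAdjustmentRule(
            DateTime.MinValue.Date, DateTime.MaxValue.Date, TimeSpan.FromHours(1), start, end);
        return TimeZoneInfo.CreateCustomTimeZone(
            "Demo", TimeSpan.FromHours(-5), "Demo zone", "Demo Standard", "Demo Daylight",
            new[] { rule });
    }

    // Classifies a local wall-clock time (DateTimeKind.Unspecified) within the given zone.
    public static string Classify(TimeZoneInfo zone, DateTime local)
    {
        if (zone.IsInvalidTime(local)) return "skipped";      // occurs zero times
        if (zone.IsAmbiguousTime(local)) return "ambiguous";  // occurs twice
        return "unique";                                      // occurs once
    }
}
```

With this zone, 02:30 on March 10th 2024 falls in the gap and 01:30 on November 3rd 2024 happens twice; code that maps local times to instants has to decide what to do in both cases.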
As far as specific development practices:
- Think about what you're really trying to represent. I expect the core benefit of Noda Time to be forcing developers to choose between various different types to represent their data. Get that right, and everything else is simpler.
- Unit test everything you can think of. That will depend on exactly what your system does, of course, but particularly consider different time zones, what happens across daylight saving transitions, and of course leap years.
- I'd advise injecting a "clock-like interface" - a service for telling the current time - rather than explicitly calling
DateTime.UtcNow; it makes it easier (feasible!) to unit test
- If you're performing multiple operations with "now", obtain that date/time once and remember it, rather than repeatedly requesting "now" - otherwise the value could change in unfortunate ways between the calls.
- "Do everything in UTC" isn't always the answer either - if I want to know "when exactly does 'two weeks from now' occur in my local time zone?" then I need to store the local date/time as well as the time zone.
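As a rough illustration of the "clock-like interface" and "read now once" points, something along these lines would do. All of the type names here (IClock, SystemClock, FixedClock, TokenIssuer) are invented for the sketch; they are not from the framework, and the token scenario is only an assumed example.

```csharp
using System;

// Hypothetical abstraction over "what time is it?" so tests can control the answer.
public interface IClock
{
    DateTime UtcNow { get; }
}

// Production implementation: delegates to the real system clock.
public sealed class SystemClock : IClock
{
    public DateTime UtcNow => DateTime.UtcNow;
}

// Test implementation: always reports the instant it was constructed with.
public sealed class FixedClock : IClock
{
    private readonly DateTime _utcNow;
    public FixedClock(DateTime utcNow) { _utcNow = utcNow; }
    public DateTime UtcNow => _utcNow;
}

// Hypothetical consumer that issues something valid for one year.
public sealed class TokenIssuer
{
    private readonly IClock _clock;
    public TokenIssuer(IClock clock) { _clock = clock; }

    public (DateTime ValidFrom, DateTime ValidTo) Issue()
    {
        // Read the clock once and reuse the value, so both ends of the
        // validity window are derived from the same instant.
        DateTime now = _clock.UtcNow;
        return (now, now.AddYears(1));
    }
}
```

A unit test can then construct the issuer with a FixedClock set to February 29th and check the result, without touching the machine's clock.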
It's worth noting that the bug probably wasn't due to a line like the one you posted:
DateTime.Now.AddYears(1)
That doesn't create an invalid date. If you run:
(new DateTime(2012, 2, 29)).AddYears(1)
you get Feb 28, 2013. I don't know what Azure's guest agent is written in but it must have been a different call that failed. A bad way to have done this in .NET would have been:
new DateTime(today.Year + 1, today.Month, today.Day)
That throws an exception if
today is leap day. However the Microsoft blog about the Azure issue said that they created an invalid date of Feb 29, 2013, which I'm not sure is possible to do with
DateTime in .NET.
I'm not saying that DateTime and
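For anyone who wants to check the difference between the two approaches, here is a small sketch. LeapDayDemo and its method names are made up for illustration; the only framework calls are DateTime.AddYears and the DateTime constructor.

```csharp
using System;

public static class LeapDayDemo
{
    // Lets the framework do the arithmetic: February 29th is clamped to February 28th
    // when the target year is not a leap year.
    public static DateTime SafeNextYear(DateTime today)
    {
        return today.AddYears(1);
    }

    // Rebuilds the date from its components: throws ArgumentOutOfRangeException
    // when today is February 29th, because the following year has no such day.
    public static DateTime NaiveNextYear(DateTime today)
    {
        return new DateTime(today.Year + 1, today.Month, today.Day);
    }
}
```

Calling SafeNextYear with February 29th 2012 gives February 28th 2013, while NaiveNextYear with the same input throws rather than producing an invalid date.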
DateTimeOffset aren't error-prone, just that I don't think they would have caused this particular issue.
How can we develop coding practices designed to protect against leap year bugs? What coding practices could have prevented this?
Unit testing specific dates, as John mentioned, is one coding practice that will help; however, nothing beats what I call a 'manual integration test':
change the clock on your development/testbed server and watch what happens when the time ticks over.
Don't get bogged down in whether this counts as a 'coding practice'. Obviously you can't do this for every date on the calendar, so pick the dates you are concerned with, be that the 29th of February, end-of-month dates or daylight saving changeover dates.
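The same "pick the dates you care about" idea can be applied in automated tests too. A possible sketch, assuming the code under test can be expressed as a function from one date to another (EdgeDates, ForYear and FindFailures are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EdgeDates
{
    // Yields the last day of every month in the given year; in a leap year
    // this includes February 29th.
    public static IEnumerable<DateTime> ForYear(int year)
    {
        for (int month = 1; month <= 12; month++)
        {
            yield return new DateTime(year, month, DateTime.DaysInMonth(year, month));
        }
    }

    // Runs the calculation against each date and returns the dates on which it throws.
    public static List<DateTime> FindFailures(Func<DateTime, DateTime> calculation, IEnumerable<DateTime> dates)
    {
        return dates.Where(date =>
        {
            try { calculation(date); return false; }
            catch (Exception) { return true; }
        }).ToList();
    }
}
```

Feeding it the end-of-month dates for 2012 and a calculation that rebuilds the date with Year + 1 reports February 29th 2012 as the only failure. Daylight saving changeover dates would need to be added per time zone.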