
Why Store Everything in UTC is Bad Advice

Let's explain the basics so developers understand the weak spots of date and time handling.
Handling dates and times is not an easy part of software development. In many cases, we see implementations that work only for a limited set of scenarios and fail later, with data malformed in a way that is hard to fix.

Introduction

Dates, and especially times, are tricky. Of course, the easiest and most convenient way would be to declare UTC as the basis for everything, and many (well-written) articles provide exactly that advice. Unfortunately, it is a shortcut which doesn't pay off.

If you search the internet for “how to store timestamp”, many articles give you the universal mantra “store everything in UTC”. But is it really so blazingly universal?

When UTC Might Work (And Why I Still Discourage It)

It is still worth discussing when UTC alone can work and why it doesn't work for some use cases.

So let's assume we store, according to the given advice, everything in UTC.
Here is a record produced by our collector software which we want to store in the database:

JavaScript
{
 DeviceId: "edge-001",
 Created: "2021-08-02T05:06:07.888Z",
 Values: "{'a': 1, 'b': 2}"
}

and let's assume our database is PostgreSQL and the target table looks like this:

SQL
CREATE TABLE edge.telemetry(
 id SERIAL,
 device_id text,
 created timestamp,
 "values" jsonb   -- quoted, because VALUES is a reserved keyword
);
> For the sake of simplicity, we omit any partitioning like the one provided by TimescaleDB.

What happens to the created field when we store that record:

  • no conversion happens: any timezone information in the created literal is silently ignored
  • the UTC wall-clock value is stored as-is, as a timestamp without time zone

Refer to the PostgreSQL documentation, Date/Time Types section on timestamps; a short demonstration follows.
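
A minimal sketch of that behaviour, assuming a psql session and the table above (the jsonb literal below uses valid JSON with double quotes, unlike the collector record):

SQL
-- The trailing Z in the literal is silently dropped for a timestamp column
INSERT INTO edge.telemetry(device_id, created, "values")
VALUES ('edge-001', '2021-08-02T05:06:07.888Z', '{"a": 1, "b": 2}');

SELECT created FROM edge.telemetry;
-- 2021-08-02 05:06:07.888   -- no zone attached, no conversion performed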

Let's Distill the Prerequisites for This Scenario to Work

  • the input timestamp must be converted from local time to UTC prior to storage
  • we have to use PostgreSQL or a database with compatible behaviour
  • our PostgreSQL instance must run in UTC, and so must its operating system
  • the operating system where our software generates records must be set to the proper date, time and timezone

So we have a couple of dependencies: some concern development, some are rather architectural, and others belong to operations. Pretty complex by itself. The database-side prerequisites can at least be checked quickly, as sketched below.
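
A minimal sanity check for the last two prerequisites, assuming a psql session on the database server:

SQL
-- Check the session/server timezone and compare the clock against UTC
SHOW timezone;                                -- expected: 'UTC' (or 'Etc/UTC')
SELECT now()                    AS server_now,
       now() AT TIME ZONE 'UTC' AS utc_now;   -- the wall-clock parts match when running in UTC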

UTC and Visualisation

How we visualise or represent the data should be quite straightforward: we take the created value and show it in the proper locale using a timezone conversion.

Practically, let's assume we generate the record in Prague (CEST, UTC+2) at 10AM local time.
Our record will be:

JavaScript
{
 DeviceId: "dfa-edge-test-001",
 Created: "2021-08-02T08:00:00Z",
 Values: "{'a': 1, 'b': 2}"
}
 
Notice the conversion to UTC: the time is now 8AM, and it gets stored that way.
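
As a small sketch of that locale conversion, here is how the stored 08:00 UTC value renders for viewers in two different zones (the zone names are illustrative):

SQL
-- The same stored instant (08:00 UTC), rendered in two viewer timezones
SELECT TIMESTAMP '2021-08-02 08:00:00' AT TIME ZONE 'UTC'
           AT TIME ZONE 'Europe/Prague'   AS prague_view,    -- 2021-08-02 10:00:00
       TIMESTAMP '2021-08-02 08:00:00' AT TIME ZONE 'UTC'
           AT TIME ZONE 'Europe/Helsinki' AS helsinki_view;  -- 2021-08-02 11:00:00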

Later that day, our app engineer Pedro, now located in Helsinki, calls the dev team (L4 support) located in London due to some problems he spotted in the data.

> *Pedro*: "Hey, how are you, all good?"
> *Ronda*: "Hi, yes, thanks for asking, all well…any issue you need to help with?"
> *Pedro*: "Yes, actually I do have some hiccups on 2nd August, at 10AM local time"
> *Ronda*: "Hi, let me look at it, just a minute"
>
> ….after a while…
>
> *Ronda*: "No, all looks the same, normative, no problem…"
> *Pedro*: "Hmm, strange, we don’t see KPIs, something is not calculated"
> *Ronda*: "But I see you had some issue with data collection at 8, didn’t you?"
> *Pedro*: "8? No, 10…no…"
> *Ronda*: "Aha, are you in **Prague** ?"
> *Pedro*: "No, I am in Helsinki, but **the device is in Prague**"
> *Ronda*: "Ok, let me change my chart browser timezone…"

From now on, everything will be fine, except one will see 8AM and the other 10AM. But knowing the difference will let them operate successfully with a bit of inconvenience.

That's why Grafana has a timezone setting in its time filter; it is the only way to make UTC-based timestamps with no TZ information render charts that make sense to operations or production personnel.

It might look like a slightly artificial dialogue, but I have been part of quite a few such discussions.
And it becomes even more evident when Pedro is not a human but an algorithm running every day.

UTC Timestamps and Aggregations

Imagine we have three edge devices located in Tokyo, Berlin and Boston.
They are running fine, data collection is smooth, we distribute the data over MQTT to a single cloud gateway and store it in a cloud database, of course using plain UTC according to the golden advice.

After a couple of weeks of smooth production, product management decides to develop a feature requiring aggregated calculations on the telemetry. The aggregations shall happen on time intervals representing shifts in the factories: typical collective KPIs like the number of good and bad parts, etc.

So the dev team develops aggregation algorithms, maybe in Python, maybe using something else, anyway running as a lambda function in the cloud.

Shifts are defined as:

Morning:   06 - 14
Afternoon: 14 - 22
Night:     22 - 06

We can see the shifts are defined the same way in every factory, in local time of course.

So what will happen at 23:59, when the aggregations are calculated?
Strange things… First of all, what is 23:59? It is in the cloud, presumably UTC, so while today's afternoon shifts for Berlin and Tokyo can be calculated, Boston's isn't finished yet. So for the complete picture, we have to wait more than half a day (13 hours between Tokyo and Boston).

But isn't it even worse, even once we have all the telemetry collected?
Well, that depends on the particular edge device settings.

Do you remember that we required all devices to be set to local time? In fact, it is not a strict requirement, but it is highly logical. Shifts are also defined in local time, and 6:00AM in Tokyo is 21:00 UTC (of the previous day).

So do you see it? Tokyo data from 6AM will obviously be reported as 21:00 UTC, and since we have no information about the source timezone (it was lost during conversion), the algorithm in the cloud will produce mad results having nothing to do with the shifts.
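
A hedged illustration of the failure, assuming a naive bucketing on the stored UTC hour with the shift boundaries above: Tokyo's 06:00 local sample, stored as 21:00 UTC, lands in the wrong shift.

SQL
-- Naive shift bucketing on the stored (UTC) hour; the sample is Tokyo's 06:00 local
SELECT CASE
         WHEN date_part('hour', ts) >= 6  AND date_part('hour', ts) < 14 THEN 'morning'
         WHEN date_part('hour', ts) >= 14 AND date_part('hour', ts) < 22 THEN 'afternoon'
         ELSE 'night'
       END AS shift_guess    -- returns 'afternoon', but the sample belongs to Tokyo's morning shift
FROM (SELECT TIMESTAMP '2021-08-01 21:00:00' AS ts) AS sample;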

Remedies to Plain UTC Storage - Keep TZ Info

Ah, the previous example is creepy; one could say there is a quick fix: set our devices to the GMT timezone.
It might help, but one can feel it is not really helpful to have GMT on a device in Tokyo. It is like having breakfast at 1AM: doable, but not very comfy.

There're essentially three remedies to the above problem:

  1. Add timezone information to your timestamp (either as timezone + local time, or as UTC + offset)
    1. Use the embedded timezone format as ISO 8601 specifies
    2. Use local time and add the timezone or UTC offset as metadata
  2. Know the timezone of the device and fix the time afterwards
  3. Do the calculations locally

But option 3) is viable only if the managers never want to see KPIs common to all factories. Why? Because many aggregations cannot be computed by combining per-site results, e.g., the median.

Option 2) requires conversions in different places, which is not generic, and every new design will have to handle time and zone in the same complex way.

The option 1) variants are functionally equal.
In all my designs, I prefer and recommend local time in ISO 8601 format with timezone info, stored in the PostgreSQL timestamptz type, for several reasons:

  • A timestamp with timezone is concise and self-contained.
  • It is optimized at the database level; internally, PostgreSQL normalizes it to UTC and converts to the session timezone on output.
  • Carried as an ISO 8601 literal, it keeps the location-specific information on the way in; keep the zone as explicit metadata if you also need it on the database side.
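
Here is a minimal sketch of that recommendation, with an extra device_tz column kept as explicit metadata (variant 1.2 above); the table, column and zone names are illustrative:

SQL
-- Telemetry with a timestamptz column plus the device's zone kept as metadata
CREATE TABLE edge.telemetry_tz(
  id SERIAL,
  device_id text,
  device_tz text,          -- IANA zone name, e.g. 'Asia/Tokyo'
  created timestamptz,     -- accepts ISO 8601 literals with an offset
  "values" jsonb
);

INSERT INTO edge.telemetry_tz(device_id, device_tz, created, "values")
VALUES ('edge-tokyo-001', 'Asia/Tokyo', '2021-08-02T06:00:00+09:00', '{"a": 1}');

-- Shift bucketing can now happen in the device's local time
SELECT device_id,
       date_part('hour', created AT TIME ZONE device_tz) AS local_hour   -- 6, i.e., morning shift
FROM edge.telemetry_tz;

Note that PostgreSQL normalizes timestamptz to UTC internally and does not retain the original offset, which is exactly why the device_tz metadata is worth keeping alongside it.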

Be Careful With Local Time

[Edit based on comments]

Obviously, there are also examples of misuse of local time, or of UTC + timezone or UTC offset.

I discovered one of them in July 2021, when coming back from a vacation spent on the famous island of Tenerife. My family is fully vaccinated except for the youngest daughter, so we had to arrange a Covid test for her upon arrival in our homeland, the Czech Republic.

So I opened the reservation system reservatic.com, chose the place, Frýdek (a city in CZ), and the sampling time, 7:30AM. What was my surprise when I saw the confirmation in the browser for 5:30AM and, later on, received an email notification for 6:30AM! Two different times, and at neither of them could you take the sample, since they don't work that early.

Two bugs at the same time, both coming from the fact that the reservation system converts the chosen schedule time to local Czech time based on my browser timezone (yeah, wow). What I choose is treated as the user's local time and then converted. I was in the Canary Islands, where the time difference is one hour, so that's clearly it.

Where the mistake is should be obvious. Reservation systems, whether for Covid tests, flights, trains, events or anything else locally scheduled, all have to work with schedules based on the local time at the destination (or at least present local times at the destination). If you have ever booked a flight, it always informs you that the times are local to the destination.
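
A sketch of how such a locally scheduled reservation could be stored, keeping the venue's wall-clock time plus the venue's zone regardless of the booking user's browser timezone (the table, column names and date are illustrative, not the real system's schema):

SQL
-- The slot is the wall-clock time at the venue, not an instant derived from the user's browser
CREATE TABLE booking(
  id SERIAL,
  venue text,
  venue_tz text,            -- e.g. 'Europe/Prague'
  slot_local timestamp      -- 07:30 at the venue, whoever books it and from wherever
);

INSERT INTO booking(venue, venue_tz, slot_local)
VALUES ('Frýdek test site', 'Europe/Prague', '2021-07-19 07:30:00');  -- illustrative date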

And, analogically, back to the stardate: if we wanted to reserve a call to an astronaut on a spaceship, the local time at the destination might be a stardate, UTC, or whatever is valid at the destination, which is the starship.

So again, think about the use case twice before implementing date and time handling. Having the timezone or UTC offset in the data is usually highly appreciated, but there are scenarios where it doesn't make sense. That's also why naming a column that holds date and time information just "timestamp" is not a good idea; it should always say explicitly what it means.

Conclusion

I believe all the mess comes from the naming. UTC stands for "Coordinated Universal Time"; the time is indeed coordinated, but it is far from universal.

Our spacetime is four-dimensional, and an attempt to drop the space dimensions and create a unilaterally universal time coordinate just confuses people, so they tend to think "aha, UTC is my Star Trek stardate, so it solves everything".

It would be nice; unfortunately, it is not the case, which confirms that we always have to think about the use case before choosing a solution.


History

  • 3rd August, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)