189 points GordonS | 28 comments
1. zzzeek ◴[] No.19533216[source]
I completely disagree with this advice:

> However, when you come across timestamp, you can’t possibly know in what time zone the timestamp is just by looking at it.

yes I do, it's in UTC, because I assume the application is written to correctly handle datetimes.

> It is merely a wall time. Maybe it’s UTC,

no, it's UTC.

> or maybe the developer stored it in local time.

right, maybe the application was not programmed correctly, that is always a possibility. But if they are using the timestamp with timezone datatype, it is 100% more likely they have made mistakes in doing so because it is much more confusing to have to fight against PostgreSQL's assumptions and implicit conversions, as the complexity of its behaviors and decisions using this datatype are described in this article. Does the application come up with the wrong answer if the server it runs on, or the webserver container, is mistakenly set to Eastern Timezone rather than UTC? Is the application ensuring that it knows the correct timezone of timestamps as they enter the application, and if so, why isn't it converting to UTC at that moment? Is the application otherwise receiving datetimes of arbitrary timezone, letting them move around business and display logic in their raw form, and waiting until they enter the database to be correctly normalized? that's too late.

A correctly coded application does explicit normalization of datetime values the moment they enter the boundaries of the application, just like it would also convert encoded bytes into unicode-aware strings up front, so that within the application there is no ambiguity, both when persisting and when handling.
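
A minimal Python sketch of this normalize-at-the-boundary rule (the helper name and the assumed default zone are illustrative, not from any particular codebase):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def normalize_to_utc(dt: datetime, assumed_zone: str = "UTC") -> datetime:
    """Convert an incoming datetime to UTC at the application boundary.

    Naive datetimes are interpreted in `assumed_zone` (an assumption for
    this sketch); aware datetimes are converted directly.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=ZoneInfo(assumed_zone))
    return dt.astimezone(timezone.utc)

# A client sends an Eastern wall-clock value; inside the app it is UTC.
incoming = datetime(2019, 4, 1, 9, 30, tzinfo=ZoneInfo("America/New_York"))
stored = normalize_to_utc(incoming)
# April 1 is during EDT (UTC-4), so 09:30 local is 13:30 UTC.
```

Everything past this boundary (business logic, persistence) then deals in a single, unambiguous representation.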

replies(7): >>19533282 #>>19533498 #>>19534065 #>>19534073 #>>19534203 #>>19534221 #>>19551427 #
2. grzm ◴[] No.19533282[source]
> "because I assume the application is written to correctly handle datetimes."

> "it is 100% more likely they have made mistakes in doing so because it is much more confusing to have to fight against PostgreSQL's assumptions and implicit conversions"

If you're assuming the application is correctly written to handle datetimes, you can safely assume it's also correctly using time zone aware date time data types, and using their database driver correctly, so there's no worry about PostgreSQL's assumptions or implicit conversions.

Datetime handling is hard. However, most of the issues you're talking about here are independent of whether you're storing instants as timestamp or timestamptz in PostgreSQL. Persistence is only one layer in an application stack.

As an aside, I haven't looked, but if someone hasn't already created a "Zoned Datetime" datatype extension for Postgres, it'd certainly be interesting to try.

replies(1): >>19534655 #
3. colanderman ◴[] No.19533498[source]
Your assumptions are wrong. PostgreSQL's time functions provide exactly the semantics described in the article. If you pretend the two types work the other way around, you will get incorrect results when using e.g. `AT TIME ZONE`.
replies(1): >>19534151 #
4. welder ◴[] No.19534065[source]
Edit: I'm being downvoted, but I know more than you about timestamps. Check my bio before downvoting.

zzzeek's comment is correct, he should be upvoted not down.

> A correctly coded application does explicit normalization of datetime values the moment they enter the boundaries of the application, just like it would also convert encoded bytes into a unicode-aware strings up front, so that within the application there is no ambiguity, both when persisting and when handling.

This is 100% true, about handling datetimes and unicode. I built a time tracking application which has proved this true many times.

> I completely disagree with this advice

Me too, the article OP clearly doesn't know what he's talking about.

replies(2): >>19534406 #>>19535306 #
5. djrobstep ◴[] No.19534073[source]
Why is this correct take being downvoted?

Add consistency to your codebase by saving all timestamps in UTC.

Edited: Storage size is the same.

replies(2): >>19534077 #>>19535382 #
6. grzm ◴[] No.19534077[source]
There's no difference in storage between timestamp and timestamptz in PostgreSQL. Both are 8 bytes.
7. welder ◴[] No.19534151[source]
No, he's right. Before you downvote me, check my bio for my experience.

Literally everything in zzzeek's comment I've personally seen causing live bugs in production code. Always use UTC without time zone and always convert to UTC as soon as possible in your application.

> Your assumptions are wrong.

zzzeek isn't assuming, he's speaking from experience. He wrote the SQLAlchemy library, so he knows what he's talking about.

replies(1): >>19535277 #
8. hathawsh ◴[] No.19534203[source]
Thank you for chiming in. In my SQLAlchemy+Postgres app I've chosen to store datetimes as timestamp without timezone columns where the time zone is always UTC. I'm quite happy with that choice. The last thing I want is for Postgres to start implicitly adjusting timestamps. By making the timestamps non-timezone-aware, I'm telling Postgres that if I ever want it to do any time zone conversion, I'll ask for the conversion explicitly, never implicitly. Explicit time zone conversion works quite nicely in Postgres. I want no magic.
9. alboy ◴[] No.19534221[source]
> yes I do, it's in UTC, because I assume the application is written to correctly handle datetimes.

This seems to assume that only past events may be stored in a database. How would one schedule events at some clock-on-the-wall point in the future when either the exact applicable time zone is unknown or the mapping of a particular time zone to UTC isn't guaranteed to remain unchanged?

replies(1): >>19535665 #
10. mypalmike ◴[] No.19534406[source]
> A correctly coded application does explicit normalization of datetime values the moment they enter the boundaries of the application... within the application there is no ambiguity

> This is 100% true... I built a time tracking application which has proved this true many times.

The potential for timezone rule changes means that future events specified in local timezones are fundamentally ambiguous with regard to any absolute timeframe. Thus, normalization of such events to UTC can result in errors.

Consider TicketMaster. They have a database of future events around the globe, sometimes years in advance.

Imagine a band scheduled to play on 24-Jul-2020 at 8pm in Tokyo. Japan's government was considering reintroducing daylight saving time in 2020 so as to reduce heat stress on the 2020 Olympics athletes. In August 2018, they decided against it, but it could have happened. Let's pretend it did happen.

If you had stored this band's gig in a database a few months ago (before the decision to make the change), your UTC-normalized timestamp would be 2020-07-24T11:00Z (UTC+9). But the gig is going to happen at 8pm local time regardless of the rule change. The actual event time is 2020-07-24T10:00Z (UTC+10), an hour earlier than originally scheduled.

Normalizing to UTC makes sense in so many cases that it can seem to be a silver bullet for all cases. But there are definitely business cases that do not work well with this approach.
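
The alternative for such cases is to store the wall-clock time plus the IANA zone and resolve to UTC lazily, so a tzdata update changes the answer without touching stored rows. A Python sketch (the record shape and function name are illustrative):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical stored record: wall-clock time plus IANA zone, not UTC.
event = {"local": "2020-07-24T20:00:00", "zone": "Asia/Tokyo"}

def resolve_utc(record):
    """Attach the zone at read time, so the current tzdata rules apply."""
    wall = datetime.fromisoformat(record["local"])
    return wall.replace(tzinfo=ZoneInfo(record["zone"])).astimezone(ZoneInfo("UTC"))

# Under today's rules Japan is UTC+9, so this resolves to 11:00Z; had the
# DST change gone through, the same stored record would resolve an hour
# earlier after a tzdata update, with no UPDATE to the database.
print(resolve_utc(event).isoformat())  # 2020-07-24T11:00:00+00:00
```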

replies(3): >>19534465 #>>19534843 #>>19535755 #
11. Izkata ◴[] No.19534465{3}[source]
You don't even need rare(ish) occurrences like rule changes to trigger this. Consider recurring events, like "every other Monday at 1pm", and what happens to them at the twice-yearly DST shifts.
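
This is easy to demonstrate with a real DST boundary (a Python sketch; note that adding a timedelta to an aware datetime in Python does wall-clock arithmetic, which is exactly what a recurring appointment wants):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/Chicago")

# "Every other Monday at 1pm" local time, spanning the Nov 3, 2019 DST shift.
first = datetime(2019, 10, 21, 13, 0, tzinfo=tz)  # CDT, UTC-5
occurrences = [first + timedelta(weeks=2 * i) for i in range(2)]

utc_times = [dt.astimezone(ZoneInfo("UTC")) for dt in occurrences]
# Oct 21 (CDT) normalizes to 18:00Z; Nov 4 (CST) to 19:00Z: same wall
# clock, different UTC instants, so a fixed stored UTC offset would drift.
```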
replies(1): >>19535057 #
12. tofflos ◴[] No.19534655[source]
You've nailed the problem. Too many layers in the application stack are tampering with time. As the author of the application I now have to understand every layer and all their nuances. All the good intentions of the database authors, driver authors and library authors are making it worse. Just leave my time alone and let me deal with it in the application.
13. welder ◴[] No.19534843{3}[source]
Yes, I forgot to make an exception for scheduling future local times, so use UTC except for future local datetimes.
replies(1): >>19534966 #
14. jasode ◴[] No.19534966{4}[source]
I didn't downvote any of your comments but "In this case [...] store the time zone string" -- seems to contradict your earlier advice of "Always use UTC without time zone".

The issue is that there are 2 types of future times:

(1) scientific-absolute times -- like future solar eclipses; you can normalize these types of times to a future UTC with no ambiguity (other than esoteric adjustments such as TAI leap seconds)

(2) cultural-relative-wall-clock times -- like future recurring appointments, future New Year's midnight celebrations, etc.; These can't be normalized to UTC perfectly because one doesn't know what the exact future UTC time will be due to unknown government changes to DST rules or TZ boundaries. If you simply normalize to UTC, you will throw away information and it will cause bugs. Your eventual addendum of "store the time zone string" basically acknowledges this edge case for future datetimes.

15. wiredfool ◴[] No.19535057{4}[source]
For an app that I'm working on, we're storing dates/times canonically as a recurrence rule of dates, plus one or more start times in the event's local time zone.

Internally for indexing purposes, we expand the dates + time into a series of datetimes at UTC, at which point we can easily search on them.
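
The expansion step might look like this in Python (variable names and the example zone are illustrative; the point is that re-running the expansion after a tzdata change keeps the index correct):

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

# Canonical storage: recurrence dates + local start time + event zone.
zone = ZoneInfo("Europe/Dublin")
dates = [date(2019, 3, 25) + timedelta(weeks=w) for w in range(3)]
start = time(9, 0)

# Expand into concrete UTC instants for the search index. The first date
# falls before the Mar 31 DST change (UTC+0), the later two after (UTC+1).
index = [datetime.combine(d, start, tzinfo=zone).astimezone(ZoneInfo("UTC"))
         for d in dates]
```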

16. jeltz ◴[] No.19535277{3}[source]
No, zzzeek is wrong in this case. The general advice from all PostgreSQL contributors and consultants I have heard talking on the subject is to use timestamptz by default and only use timestamp in a few select cases. And PostgreSQL itself uses timestamptz in its system views (the pg_stat_* views for example).

Consider the following queries, both ran on a machine where everything is in UTC (I always configure my production machines in UTC):

    SELECT date(('2019-01-01T23:00:00Z'::timestamptz) AT TIME ZONE 'Europe/Berlin');
        date
    ------------
     2019-01-02
    (1 row)

    SELECT date(('2019-01-01T23:00:00Z'::timestamp) AT TIME ZONE 'Europe/Berlin');
        date
    ------------
     2019-01-01
    (1 row)
The first one is likely what you want. When would you write a query like this? For example when you have a non-UTC business day and want to group sales by business day. Doing that kind of aggregation in the application is ugly, error prone and removes many of the benefits of having an SQL database.
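
The semantics of the two queries can be mirrored in application code to see what each type implies (a Python sketch; variable names are illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

instant = datetime(2019, 1, 1, 23, 0, tzinfo=timezone.utc)

# timestamptz semantics: the instant is converted into Berlin wall time,
# so 23:00Z lands on the next calendar day (Berlin is UTC+1 in January).
berlin_date = instant.astimezone(ZoneInfo("Europe/Berlin")).date()

# timestamp semantics: the zone information is discarded first, so the
# bare wall time 23:00 keeps its original calendar date.
naive_date = instant.replace(tzinfo=None).date()
```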
replies(2): >>19535552 #>>19535672 #
17. twic ◴[] No.19535306[source]
FWIW, i downvoted you purely because of your edit. If you're right, explain carefully why, don't appeal to your own authority.
18. jeltz ◴[] No.19535382[source]
I downvoted because it is dangerous advice. If you use timestamp in PostgreSQL rather than timestamptz you will run into even more time-related bugs due to misusing PostgreSQL's data types; for example, it will make AT TIME ZONE counterintuitive.
19. zzzeek ◴[] No.19535552{4}[source]
The blog post and this comment are oriented towards direct commandline use of PostgreSQL, and to the extent you want to use the database as your primary front facing programming API, you need all the conveniences of opinionated types and conversion functions. But when the database is in its more traditional role as persistence for a large layer of business and presentation logic running in a front facing process, having a stricter model with less chance of implicit decisions made by the database itself is better. I think that's part of the disconnect here: why some people are downmodding like crazy while others are agreeing. People are coming from two different perspectives, perhaps data scientists vs backend engineers.
replies(1): >>19542466 #
20. zzzeek ◴[] No.19535665[source]
You would still not be using "TIMESTAMP WITH TIME ZONE", and you would not use a TIMESTAMP of any kind - for a schedule, it's clearest if you have three columns, for date, scheduled time, and the desired timezone in which the event is scheduled. That removes any ambiguity; when one needs to deal with these times for date arithmetic, as in the extremely common case of sorting them, the developer is forced to coerce the three values into ad-hoc UTC or TZ-aware timestamp objects in the application in order to sort them.
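
That coercion step might look like this in Python (a sketch of the three-column scheme; the row layout and helper name are illustrative):

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

# Three stored columns per event: date, local time, desired zone.
rows = [
    (date(2020, 7, 24), time(20, 0), "Asia/Tokyo"),
    (date(2020, 7, 24), time(19, 0), "America/New_York"),
]

def as_instant(d: date, t: time, zone: str) -> datetime:
    """Coerce the three columns into an aware datetime for sorting/arithmetic."""
    return datetime.combine(d, t, tzinfo=ZoneInfo(zone))

ordered = sorted(rows, key=lambda r: as_instant(*r))
# Tokyo 20:00 is 11:00Z; New York 19:00 (EDT) is 23:00Z, so Tokyo sorts first.
```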

More realistically, for an application that is storing only near future dates within a limited geographic range, you can probably get away with using UTC for storage, as that's what we did at MLB anyway, but today I would likely not give this advice.

21. zzzeek ◴[] No.19535672{4}[source]
I don't use the "AT TIME ZONE" operator. The date I'd be feeding into my query is a UTC timestamp that I converted early in my Python application (edited for clarity). Similarly, I get UTC datetimes from the database into my Python application and handle display logic in the application, not in the database (and of course we are not talking about future timestamps here; neither is this blog post).

edit edit: oh, you mean you want PG to actually show you groupings of business days. Well sure, that is a highly specialized kind of query that has to know where holidays are and such depending on what you are trying to do (if you need to count business days, perhaps). If I were writing such a query, I'd have to be very careful regardless so I'd ensure the appropriate casts are used against my source timestamps so that they are properly interpreted by PG's calendar-aware functions as well as how to determine what's a business day beyond simple weekend days.

Using PG's calendaring functions is nice but in reality this is more of a data analyst feature, which is what PG spends a lot of time advertising. Use timestamptz all you want in data analyst queries but in your CREATE TABLE use timestamp.

replies(2): >>19535756 #>>19570902 #
22. zzzeek ◴[] No.19535755{3}[source]
> The potential for timezone rule changes mean that future events specified in local timezones are fundamentally ambiguous with regard to any absolute timeframe. Thus, normalization of such events to UTC can result in errors.

OK, but you still don't use TIMESTAMP WITH TIME ZONE to store those either. For future events where you want to protect against time zone changes, you store the intended time, which is what this thread is calling "wall clock time". I think you shouldn't use a timestamp object for such a time because it's not a timestamp at all, it's a human description of something that has a moving linkage to a timestamp. However, this makes being able to search and filter for these events more challenging if the database lacks functions to coerce these values into current UTC timestamps, or if you are trying to optimize for indexes that don't make use of these functions.

> Consider TicketMaster. They have a database of future events around the globe, sometimes years in advance.

Most of my datetime experience is from working at MLB.com with Ticketmaster servers in the early 2000s. We used Oracle and we stored the next season's worth of baseball schedules in straight UTC, in addition to storing the original "wall clock plus timezone" version of that time, after our first year online where we were stupidly storing in Oracle's default timezone which was of course EST, and on October 30th at 2 AM the whole site broke because in the perspective of some of our database queries time had moved one hour backwards.

It was not an option to store only "wall clock time" here because we needed to be able to search and sort these events in the database and Oracle did not have the calendaring / indexing capabilities to make it feasible to be coercing from wall clock to UTC on the database side. We were not too worried about timezones being changed within the US/Canada within the next year, as in the rare occurrence that this happened, we simply ran an UPDATE for the UTC timestamps in question based on their desired timezone. I notice the blog post we are talking about here makes no mention of future dates or the challenges in storing them accurately while still being able to query them efficiently; for future dates we had to store both the UTC and the wall clock data.

> Normalizing to UTC makes sense in so many cases that it can seem to be a silver bullet for all cases.

My disagreement with this blog post was first triggered by it being a "You should always do X" kind of post. The word "future" doesn't even appear in the text of this post, nor do the words "index" or "sort", which is a funny thing for an article that seems to assure the inexperienced reader that this is all you need to know about timestamps in databases.

23. colanderman ◴[] No.19535756{5}[source]
The parent's example wasn't about display logic, it was about aggregation. I trust you don't do that at the application layer.
replies(1): >>19535788 #
24. ◴[] No.19535788{6}[source]
25. jeltz ◴[] No.19542466{5}[source]
My background is a backend engineer and I would never use SQL for presentation (outside the occasional custom reporting query), but that does not mean you should not use timestamptz. You should just use a database driver which treats a timestamptz as an absolute point in time and throws away the time zone after parsing. By using timestamptz with a good PostgreSQL driver you ensure that PostgreSQL's own time functions work as expected while also getting sane data types from your queries, types where the time zone is handled in the presentation layers.

In the industries I have worked in aggregating events per date (according to some time zone) has been a common operation and that would have been error prone with the timestamp type.

replies(1): >>19542916 #
26. zzzeek ◴[] No.19542916{6}[source]
I think there are mistakes to be made with the use of either type. On the Python side, using "timestamp with time zone" means we're going to get back timezone-aware datetime objects, which are perfectly fine but need to be handled correctly on the application side. I would blame the implicit decisions PostgreSQL makes in operators like "AT TIME ZONE": having the same operator do two different things based on the flavor of timestamp, and returning the opposite flavor from the one given, makes the whole area more difficult than it has to be. On MySQL, the convert_tz() function is clear and unambiguous: you pass it the timestamp and the "from" and "to" timezones. The difference is that PostgreSQL is trying to be an "object relational database" with a lot of SQL language complexity whereas MySQL is just trying to be a persistence engine.
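
A Python analogue of that explicit convert_tz() style, where both zones are named and nothing is inferred from the value's type (a sketch; the function is hypothetical, not part of any driver):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def convert_tz(dt: datetime, from_zone: str, to_zone: str) -> datetime:
    """Unambiguous conversion: the naive input is interpreted in
    from_zone, then expressed in to_zone. No flavor-dependent behavior."""
    return dt.replace(tzinfo=ZoneInfo(from_zone)).astimezone(ZoneInfo(to_zone))

wall = datetime(2019, 1, 1, 23, 0)  # a naive wall-clock value
print(convert_tz(wall, "UTC", "Europe/Berlin"))  # 2019-01-02 00:00:00+01:00
```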
27. fphilipe ◴[] No.19551427[source]
> > However, when you come across timestamp, you can’t possibly know in what time zone the timestamp is just by looking at it.

> yes I do, it's in UTC, because I assume the application is written to correctly handle datetimes.

> > It is merely a wall time. Maybe it’s UTC,

> no, it's UTC.

No, you don't know that if you're looking at the database without knowledge of the app's conventions.

I don't understand your argument about ambiguity and implicit conversions. Yes, if your session or DB is configured to something other than UTC, it will convert the timestamptz values in your output to that time zone's offset, but it'll still include the offset.

Thus the value "2019-04-01 00:00:00+00" output when using timestamptz with UTC (compared to "2019-04-01 00:00:00" when using plain timestamp) becomes "2019-03-31 19:00:00-05" when using EST. Both values represent the same moment in time, obviously, and are equivalent. Assuming your app and ORM convert everything to UTC, that shouldn't matter at all.
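
The equivalence is easy to check, since offset-aware comparison looks at the instant rather than the rendering (a Python sketch of the two forms above):

```python
from datetime import datetime

# The two renderings PostgreSQL might print for one timestamptz value.
utc_form = datetime.fromisoformat("2019-04-01 00:00:00+00:00")
est_form = datetime.fromisoformat("2019-03-31 19:00:00-05:00")

# Different offsets, same instant: aware comparison sees them as equal.
print(utc_form == est_form)  # True
```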

28. ants_a ◴[] No.19570902{5}[source]
I think the described use of timestamp is driven by shortcomings of Python's datetime library not having proper timezone handling. I don't think an approach where database design is driven by the opinions of one single application is good. The timestamptz datatype is universally understood to store an unambiguous point in time, which is usually what is intended for recording events. If the output formatting done by the database is a problem for the application, set the session TimeZone configuration to UTC.