←back to thread

189 points GordonS | 4 comments | | HN request time: 0.641s | source
Show context
zzzeek ◴[] No.19533216[source]
I completely disagree with this advice:

> However, when you come across timestamp, you can’t possibly know in what time zone the timestamp is just by looking at it.

yes I do, it's in UTC, because I assume the application is written to correctly handle datetimes.

> It is merely a wall time. Maybe it’s UTC,

no, it's UTC.

> or maybe the developer stored it in local time.

right, maybe the application was not programmed correctly, that is always a possibility. But if they are using the timestamp with timezone datatype, it is 100% more likely they have made mistakes in doing so because it is much more confusing to have to fight against PostgreSQL's assumptions and implicit conversions, as the complexity of its behaviors and decisions using this datatype are described in this article. Does the application come up with the wrong answer if the server it runs on, or the webserver container, is mistakenly set to Eastern Timezone rather than UTC? Is the application ensuring that it knows the correct timezone of timestamps as they enter the application, and if so, why isn't it converting to UTC at that moment? Is the application otherwise receiving datetimes of arbitrary timezone, letting them move around business and display logic in their raw form, and waiting until they enter the database to be correctly normalized? that's too late.

A correctly coded application does explicit normalization of datetime values the moment they enter the boundaries of the application, just like it would also convert encoded bytes into a unicode-aware strings up front, so that within the application there is no ambiguity, both when persisting and when handling.

replies(7): >>19533282 #>>19533498 #>>19534065 #>>19534073 #>>19534203 #>>19534221 #>>19551427 #
colanderman ◴[] No.19533498[source]
Your assumptions are wrong. PostgreSQL's time functions provide exactly the semantics described in the article. If you pretend the two types work the other way around, you will get incorrect results when using e.g. `AT TIME ZONE`.
replies(1): >>19534151 #
welder ◴[] No.19534151[source]
No, he's right. Before you downvote me, check my bio for my experience.

Literally everything in zzzeek's comment I've personally seen causing live bugs in production code. Always use UTC without time zone and always convert to UTC as soon as possible in your application.

> Your assumptions are wrong.

zzzeek isn't assuming, he's speaking from experience. He wrote the SQLAlchemy library, so he knows what he's talking about.

replies(1): >>19535277 #
jeltz ◴[] No.19535277[source]
No, zzzeek is wrong in this case. The general advice from all PostgreSQL contributors and consultants I have heard talking on the subject is to use timestamptz by default and only use timestamp in a few select cases. And PostgreSQL itself uses timestamptz in its system views (the pg_stat_* views for example).

Consider the following queries, both ran on a machine where everything is in UTC (I always configure my production machines in UTC):

    SELECT date(('2019-01-01T23:00:00Z'::timestamptz) AT TIME ZONE 'Europe/Berlin');
        date    
    ------------
     2019-01-02
   (1 row)


    SELECT date(('2019-01-01T23:00:00Z'::timestamp) AT TIME ZONE 'Europe/Berlin');
        date    
    ------------
     2019-01-01
    (1 row)
The first one is likely what you want. When would you write a query like this? For example when you have a non-UTC business day and want to group sales by business day. Doing that kind of aggregation in the application is ugly, error prone and removes many of the benefits of having an SQL database.
replies(2): >>19535552 #>>19535672 #
1. zzzeek ◴[] No.19535672[source]
I don't use the "AT TIME ZONE" operator. The date I'd be feeding into my query is a UTC timestamp that I converted early in my Python application (Edited for clarity). Similarly, I get UTC datetimes from the database into my Python application and handle display logic in the application, not in the database. (and of course we are not talking about future timestamps which neither is this blog post).

edit edit: oh, you mean you want PG to actually show you groupings of business days. Well sure, that is a highly specialized kind of query that has to know where holidays are and such depending on what you are trying to do (if you need to count business days, perhaps). If I were writing such a query, I'd have to be very careful regardless so I'd ensure the appropriate casts are used against my source timestamps so that they are properly interpreted by PG's calendar-aware functions as well as how to determine what's a business day beyond simple weekend days.

Using PG's calendaring functions is nice but in reality this is more of a data analyst feature, which is what PG spends a lot of time advertising. Use timestamptz all you want in data analyst queries but in your CREATE TABLE use timestamp.

replies(2): >>19535756 #>>19570902 #
2. colanderman ◴[] No.19535756[source]
The parent's example wasn't about display logic, it was about aggregation. I trust you don't do that at the application layer.
replies(1): >>19535788 #
3. ◴[] No.19535788[source]
4. ants_a ◴[] No.19570902[source]
I think the described use of timestamp is driven by shortcomings of Pythons datetime library not having proper timezone handling. I don't think an approach where database design is driven by the opinions of one single application is good. Timestamptz datatype is universally understood to store an unambiguous point in time, which is usually what is intended for recording events. If the output formatting done by the database is a problem for the application, set client_timezone configuration to UTC.