←back to thread

189 points GordonS | 8 comments | | HN request time: 0.605s | source | bottom
Show context
zzzeek ◴[] No.19533216[source]
I completely disagree with this advice:

> However, when you come across timestamp, you can’t possibly know in what time zone the timestamp is just by looking at it.

yes I do, it's in UTC, because I assume the application is written to correctly handle datetimes.

> It is merely a wall time. Maybe it’s UTC,

no, it's UTC.

> or maybe the developer stored it in local time.

right, maybe the application was not programmed correctly, that is always a possibility. But if they are using the timestamp with timezone datatype, it is 100% more likely they have made mistakes in doing so because it is much more confusing to have to fight against PostgreSQL's assumptions and implicit conversions, as the complexity of its behaviors and decisions using this datatype are described in this article. Does the application come up with the wrong answer if the server it runs on, or the webserver container, is mistakenly set to Eastern Timezone rather than UTC? Is the application ensuring that it knows the correct timezone of timestamps as they enter the application, and if so, why isn't it converting to UTC at that moment? Is the application otherwise receiving datetimes of arbitrary timezone, letting them move around business and display logic in their raw form, and waiting until they enter the database to be correctly normalized? that's too late.

A correctly coded application does explicit normalization of datetime values the moment they enter the boundaries of the application, just like it would also convert encoded bytes into a unicode-aware strings up front, so that within the application there is no ambiguity, both when persisting and when handling.

replies(7): >>19533282 #>>19533498 #>>19534065 #>>19534073 #>>19534203 #>>19534221 #>>19551427 #
colanderman ◴[] No.19533498[source]
Your assumptions are wrong. PostgreSQL's time functions provide exactly the semantics described in the article. If you pretend the two types work the other way around, you will get incorrect results when using e.g. `AT TIME ZONE`.
replies(1): >>19534151 #
welder ◴[] No.19534151[source]
No, he's right. Before you downvote me, check my bio for my experience.

Literally everything in zzzeek's comment I've personally seen causing live bugs in production code. Always use UTC without time zone and always convert to UTC as soon as possible in your application.

> Your assumptions are wrong.

zzzeek isn't assuming, he's speaking from experience. He wrote the SQLAlchemy library, so he knows what he's talking about.

replies(1): >>19535277 #
1. jeltz ◴[] No.19535277[source]
No, zzzeek is wrong in this case. The general advice from all PostgreSQL contributors and consultants I have heard talking on the subject is to use timestamptz by default and only use timestamp in a few select cases. And PostgreSQL itself uses timestamptz in its system views (the pg_stat_* views for example).

Consider the following queries, both ran on a machine where everything is in UTC (I always configure my production machines in UTC):

    SELECT date(('2019-01-01T23:00:00Z'::timestamptz) AT TIME ZONE 'Europe/Berlin');
        date    
    ------------
     2019-01-02
   (1 row)


    SELECT date(('2019-01-01T23:00:00Z'::timestamp) AT TIME ZONE 'Europe/Berlin');
        date    
    ------------
     2019-01-01
    (1 row)
The first one is likely what you want. When would you write a query like this? For example when you have a non-UTC business day and want to group sales by business day. Doing that kind of aggregation in the application is ugly, error prone and removes many of the benefits of having an SQL database.
replies(2): >>19535552 #>>19535672 #
2. zzzeek ◴[] No.19535552[source]
The blog post and this comment are oriented towards direct commandline use of PostgreSQL, and to the extent you want to use the database as your primary front facing programming API, you need all the conveniences of opinionated types and conversion functions. But when the database is in its more traditional role as persistence for a large layer of business and presentation logic running in a front facing process, having a stricter model with less chance of implicit decisions made by the database itself is better. I think that's part of the disconnect here why some people are downmodding like crazy while others are agreeing. People are coming from two different perspectives, perhaps data scientists vs backend engineers.
replies(1): >>19542466 #
3. zzzeek ◴[] No.19535672[source]
I don't use the "AT TIME ZONE" operator. The date I'd be feeding into my query is a UTC timestamp that I converted early in my Python application (Edited for clarity). Similarly, I get UTC datetimes from the database into my Python application and handle display logic in the application, not in the database. (and of course we are not talking about future timestamps which neither is this blog post).

edit edit: oh, you mean you want PG to actually show you groupings of business days. Well sure, that is a highly specialized kind of query that has to know where holidays are and such depending on what you are trying to do (if you need to count business days, perhaps). If I were writing such a query, I'd have to be very careful regardless so I'd ensure the appropriate casts are used against my source timestamps so that they are properly interpreted by PG's calendar-aware functions as well as how to determine what's a business day beyond simple weekend days.

Using PG's calendaring functions is nice but in reality this is more of a data analyst feature, which is what PG spends a lot of time advertising. Use timestamptz all you want in data analyst queries but in your CREATE TABLE use timestamp.

replies(2): >>19535756 #>>19570902 #
4. colanderman ◴[] No.19535756[source]
The parent's example wasn't about display logic, it was about aggregation. I trust you don't do that at the application layer.
replies(1): >>19535788 #
5. ◴[] No.19535788{3}[source]
6. jeltz ◴[] No.19542466[source]
My background is a backend engineer and I would never use SQL for presentation (outside the occasional custom reporting query), but that does not mean you should not use timestamptz. You should just use a database driver which treats a timstamptz as an absolute point in time and throws away the time zone after parsing. By using timestamptz with a good PostgreSQL driver you ensure that PostgreSQL's own timefunctions work as expected while also getting sane data types from your queries, data types where time zone should be handled in the presentation layers.

In the industries I have worked in aggregating events per date (according to some time zone) has been a common operation and that would have been error prone with the timestamp type.

replies(1): >>19542916 #
7. zzzeek ◴[] No.19542916{3}[source]
I think there are mistakes to be made with the use of either type. On the Python side using "timestamp with time zone" means we're going to get back timezone-aware datetime objects, which are perfectly fine but they need to be handled correctly on the application side. I would blame the implicit decisions made by PostgreSQL of operators like "AT TIME ZONE", choosing to make the same operator do two different things based on the flavor of timestamp as well as returning the opposite type of type from that given that makes the whole area more difficult than it has to be. On MySQL, the convert_tz() function is clear and unambiguous, you pass it the timestamp, and the "from" and "to" timezones. The difference is that PostgreSQL is trying to be an "Object relational database" with a lot of SQL language complexity whereas MySQL is just trying to be a persistence engine.
8. ants_a ◴[] No.19570902[source]
I think the described use of timestamp is driven by shortcomings of Pythons datetime library not having proper timezone handling. I don't think an approach where database design is driven by the opinions of one single application is good. Timestamptz datatype is universally understood to store an unambiguous point in time, which is usually what is intended for recording events. If the output formatting done by the database is a problem for the application, set client_timezone configuration to UTC.