
189 points GordonS | 3 comments
zzzeek ◴[] No.19533216[source]
I completely disagree with this advice:

> However, when you come across timestamp, you can’t possibly know in what time zone the timestamp is just by looking at it.

yes I do, it's in UTC, because I assume the application is written to correctly handle datetimes.

> It is merely a wall time. Maybe it’s UTC,

no, it's UTC.

> or maybe the developer stored it in local time.

right, maybe the application was not programmed correctly, that is always a possibility. But if they are using the timestamp with time zone datatype, it is far more likely they have made mistakes in doing so, because it is much more confusing to have to fight against PostgreSQL's assumptions and implicit conversions; the complexity of its behaviors and decisions with this datatype is described in this very article. Does the application come up with the wrong answer if the server it runs on, or the webserver container, is mistakenly set to Eastern time rather than UTC? Is the application ensuring that it knows the correct timezone of timestamps as they enter the application, and if so, why isn't it converting to UTC at that moment? Or is the application receiving datetimes of arbitrary timezone, letting them move through business and display logic in their raw form, and waiting until they reach the database to be normalized? That's too late.

A correctly coded application does explicit normalization of datetime values the moment they enter the boundaries of the application, just as it would decode encoded bytes into unicode-aware strings up front, so that within the application there is no ambiguity, both when persisting and when handling.
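That boundary-normalization rule can be sketched in plain Python with the stdlib `zoneinfo` module. `normalize_to_utc` is a hypothetical helper, and the policy of treating naive datetimes as being in an explicit `assumed_tz` is an illustrative assumption, not something from the thread:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def normalize_to_utc(dt: datetime, assumed_tz: str = "UTC") -> datetime:
    """Normalize an incoming datetime to UTC at the application boundary.

    Naive datetimes are assumed to be in `assumed_tz` -- an explicit,
    documented policy rather than a silent guess made at persistence time.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=ZoneInfo(assumed_tz))
    return dt.astimezone(timezone.utc)

# An aware datetime from any zone comes out as the same instant in UTC:
paris = datetime(2019, 1, 1, 23, 30, tzinfo=ZoneInfo("Europe/Paris"))
print(normalize_to_utc(paris))  # 2019-01-01 22:30:00+00:00
```

Inside the application everything downstream of this call can then rely on one invariant: every datetime is aware and in UTC.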

replies(7): >>19533282 #>>19533498 #>>19534065 #>>19534073 #>>19534203 #>>19534221 #>>19551427 #
colanderman ◴[] No.19533498[source]
Your assumptions are wrong. PostgreSQL's time functions provide exactly the semantics described in the article. If you pretend the two types work the other way around, you will get incorrect results when using e.g. `AT TIME ZONE`.
replies(1): >>19534151 #
welder ◴[] No.19534151[source]
No, he's right. Before you downvote me, check my bio for my experience.

Literally everything in zzzeek's comment I've personally seen causing live bugs in production code. Always use UTC without time zone and always convert to UTC as soon as possible in your application.

> Your assumptions are wrong.

zzzeek isn't assuming, he's speaking from experience. He wrote the SQLAlchemy library, so he knows what he's talking about.

replies(1): >>19535277 #
jeltz ◴[] No.19535277[source]
No, zzzeek is wrong in this case. The general advice from all PostgreSQL contributors and consultants I have heard talking on the subject is to use timestamptz by default and only use timestamp in a few select cases. And PostgreSQL itself uses timestamptz in its system views (the pg_stat_* views for example).

Consider the following queries, both run on a machine where everything is in UTC (I always configure my production machines in UTC):

    SELECT date(('2019-01-01T23:00:00Z'::timestamptz) AT TIME ZONE 'Europe/Berlin');
        date
    ------------
     2019-01-02
    (1 row)

    SELECT date(('2019-01-01T23:00:00Z'::timestamp) AT TIME ZONE 'Europe/Berlin');
        date
    ------------
     2019-01-01
    (1 row)

The first one is likely what you want. When would you write a query like this? For example, when you have a non-UTC business day and want to group sales by business day. Doing that kind of aggregation in the application is ugly and error prone, and it removes many of the benefits of having an SQL database.
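For contrast, here is roughly what the application-side version of that grouping key looks like in Python. `business_day` is a hypothetical helper, and the Berlin business zone is taken from the queries above:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def business_day(utc_instant: datetime, business_tz: str = "Europe/Berlin"):
    """Return the local calendar date a UTC instant falls on in the
    business time zone -- the grouping key for per-day aggregation."""
    return utc_instant.astimezone(ZoneInfo(business_tz)).date()

# 23:00 UTC on Jan 1 is already Jan 2 in Berlin, matching the
# timestamptz query result above.
sale = datetime(2019, 1, 1, 23, 0, tzinfo=timezone.utc)
print(business_day(sale))  # 2019-01-02
```

Each row would need this conversion applied before grouping, which is the per-row work the SQL expression does in a single pass inside the database.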
replies(2): >>19535552 #>>19535672 #
1. zzzeek ◴[] No.19535552[source]
The blog post and this comment are oriented towards direct command-line use of PostgreSQL, and to the extent you want to use the database as your primary front-facing programming API, you need all the conveniences of opinionated types and conversion functions. But when the database is in its more traditional role, as persistence for a large layer of business and presentation logic running in a front-facing process, a stricter model with less chance of implicit decisions made by the database itself is better. I think that's part of the disconnect here and why some people are downmodding like crazy while others are agreeing: people are coming from two different perspectives, perhaps data scientists vs. backend engineers.
replies(1): >>19542466 #
2. jeltz ◴[] No.19542466[source]
My background is backend engineering and I would never use SQL for presentation (outside the occasional custom reporting query), but that does not mean you should not use timestamptz. You should just use a database driver which treats a timestamptz as an absolute point in time and throws away the time zone after parsing. By using timestamptz with a good PostgreSQL driver you ensure that PostgreSQL's own time functions work as expected, while also getting sane data types from your queries: data types where the time zone is handled in the presentation layer.
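That driver-side behaviour can be sketched in plain Python. The literal string below stands in for what a driver would hand back from a timestamptz column (an aware datetime carrying the session's offset); that mapping is an assumption for illustration:

```python
from datetime import datetime, timezone

# Stand-in for a timestamptz value as returned by a driver: an aware
# datetime carrying whatever offset the server session rendered it in.
row_value = datetime.fromisoformat("2019-01-02 00:00:00+01:00")

# "Throw away the time zone after parsing": keep only the absolute
# instant, normalized to UTC, so the rest of the application sees a
# single consistent zone regardless of the session's TimeZone setting.
instant = row_value.astimezone(timezone.utc)
print(instant)  # 2019-01-01 23:00:00+00:00
```

The presentation layer then converts that UTC instant to whatever zone the user should see, and nowhere else in the stack does the offset matter.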

In the industries I have worked in aggregating events per date (according to some time zone) has been a common operation and that would have been error prone with the timestamp type.

replies(1): >>19542916 #
3. zzzeek ◴[] No.19542916[source]
I think there are mistakes to be made with the use of either type. On the Python side, using "timestamp with time zone" means we're going to get back timezone-aware datetime objects, which are perfectly fine but need to be handled correctly on the application side. I would blame the implicit decisions PostgreSQL makes in operators like "AT TIME ZONE": making the same operator do two different things based on the flavor of timestamp, and return the opposite type from the one it was given, makes the whole area more difficult than it has to be. On MySQL, the convert_tz() function is clear and unambiguous: you pass it the timestamp and the "from" and "to" timezones. The difference is that PostgreSQL is trying to be an "object-relational database" with a lot of SQL language complexity, whereas MySQL is just trying to be a persistence engine.