I guess to store a timestamp and a timezone I should use a timestamp column plus a second column with the timezone value (maybe a string, though I'd like a clean way of telling the database I want an 'enum' that dynamically adds new values as it sees them; yes, for PostgreSQL I'd do that with a secondary lookup table and an insert/update trigger).
It is interesting that a timestamp without time zone is internally represented the same way as a timestamptz at UTC.
Postgres also has some great functionality for converting between timezones:
https://www.postgresql.org/docs/11/functions-datetime.html#F...
> However, when you come across timestamp, you can’t possibly know in what time zone the timestamp is just by looking at it.
Yes I do: it's in UTC, because I assume the application is written to correctly handle datetimes.
> It is merely a wall time. Maybe it’s UTC,
no, it's UTC.
> or maybe the developer stored it in local time.
Right, maybe the application was not programmed correctly, that is always a possibility. But if they are using the timestamp with time zone datatype, it is 100% more likely they have made mistakes in doing so, because it is much more confusing to have to fight against PostgreSQL's assumptions and implicit conversions, given the complexity of the behaviors and decisions this article describes for that datatype. Does the application come up with the wrong answer if the server it runs on, or the webserver container, is mistakenly set to Eastern Time rather than UTC? Is the application ensuring that it knows the correct timezone of timestamps as they enter the application, and if so, why isn't it converting to UTC at that moment? Is the application otherwise receiving datetimes of arbitrary timezone, letting them move around business and display logic in their raw form, and waiting until they enter the database to be correctly normalized? That's too late.
A correctly coded application does explicit normalization of datetime values the moment they enter the boundaries of the application, just like it would also convert encoded bytes into unicode-aware strings up front, so that within the application there is no ambiguity, both when persisting and when handling.
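That normalize-at-the-boundary rule is easy to sketch in Python with the stdlib `zoneinfo`; the function name and the assume-a-zone policy here are mine, not from any particular framework:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def normalize_to_utc(dt: datetime, assumed_zone: str = "UTC") -> datetime:
    """Convert an incoming datetime to an aware UTC datetime.

    Naive values are assumed to be in `assumed_zone`; rejecting them
    outright is an equally valid policy.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=ZoneInfo(assumed_zone))
    return dt.astimezone(timezone.utc)

# A wall-clock time entered in New York becomes an unambiguous instant:
entered = datetime(2019, 4, 1, 9, 30, tzinfo=ZoneInfo("America/New_York"))
stored = normalize_to_utc(entered)
print(stored.isoformat())  # 2019-04-01T13:30:00+00:00
```

Everything downstream of this boundary then deals only in UTC instants, whether it is persisting, comparing, or rendering.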
> "it is 100% more likely they have made mistakes in doing so because it is much more confusing to have to fight against PostgreSQL's assumptions and implicit conversions"
If you're assuming the application is correctly written to handle datetimes, you can safely assume it's also correctly using time zone aware date time data types, and using their database driver correctly, so there's no worry about PostgreSQL's assumptions or implicit conversions.
Datetime handling is hard. However, most of the issues you're talking about here are independent of whether you're storing instants as timestamp or timestamptz in PostgreSQL. Persistence is only one layer in an application stack.
As an aside, I haven't looked, but if someone hasn't already created a "Zoned Datetime" datatype extension for Postgres, it'd certainly be interesting to try.
If datetime libraries were more thoroughly patterned after that of Java 8, there would be less need for the 'misconceptions programmers believe about datetimes' posts, and less need for the 'just use UTC' or 'maybe always-UTC isn't the right advice' posts, because the data-types are thoroughly reflective of their intended use, and their misuse is harder than in other APIs.
[1] https://docs.oracle.com/javase/8/docs/api/java/time/package-... [2] https://news.ycombinator.com/item?id=12364805
What are you trying to achieve?
1) the record naturally has the semantics of a timestamp that is chiefly used for comparisons between values (e.g. to sort, min, max, subtract to get an approximate interval) and whose conversions to human-meaningful calendrical datetime are not needed or only need to be approximate (given unix timestamps' lack of leap seconds),
AND 2) you've thought about overflow (of both unix time and your column), signed-ness, sentinel and special values, and NULL handling,
AND 3) you're not enthused by the ANSI SQL definitions of TIMESTAMP types or think you, your RDBMS, your database driver, your programming language, your datetime library, or the code's next maintainer might screw something up.
If only the first factor isn't true, use an RFC 3339 UTC timestamp in a string column instead. If only the second factor isn't true, use one of the SQL TIMESTAMP types instead, so you can pick up a better-defined suite of behaviors.
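One reason the string-column fallback is workable: an RFC 3339 UTC string sorts lexicographically in chronological order. A quick Python illustration (the helper name is mine):

```python
from datetime import datetime, timezone

def to_rfc3339_utc(dt: datetime) -> str:
    """Render an aware datetime as an RFC 3339 UTC string ('Z' suffix)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

a = to_rfc3339_utc(datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
b = to_rfc3339_utc(datetime(2020, 1, 2, 3, 4, 6, tzinfo=timezone.utc))
print(a)      # 2020-01-02T03:04:05Z
print(a < b)  # True: string order matches chronological order
```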
Out of these, 'TIMESTAMP' and 'TIMESTAMP WITH TIME ZONE' do exactly what you'd expect without needing a big tutorial.
'DATE' is a 'TIMESTAMP' that can't store fractional seconds.
And 'TIMESTAMP WITH LOCAL TIMEZONE' is an unmitigated footgun that you'll probably misuse.
You want a column that can store both timestamp and timezone, so why not just have two columns? You already have to convert the timestamp into the user's timezone when rendering, because the user's timezone can change. Storing timestamps with timezone just complicates things. For example:
display_in_user_zone(TIMESTAMP UTC)
display_in_user_zone(TIMESTAMP_WITH_TIMEZONE)
The two calls above do exactly the same thing, but the UTC one will prevent bugs in the long run.
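A sketch of that display step in Python (the function and its signature are hypothetical; the point is that a stored UTC instant plus the user's *current* zone is all you need):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def display_in_user_zone(utc_dt: datetime, user_zone: str) -> str:
    """Render a stored UTC instant in whatever zone the user has now."""
    return utc_dt.astimezone(ZoneInfo(user_zone)).strftime("%Y-%m-%d %H:%M %Z")

stored = datetime(2019, 7, 1, 12, 0, tzinfo=timezone.utc)
print(display_in_user_zone(stored, "America/New_York"))  # 2019-07-01 08:00 EDT
print(display_in_user_zone(stored, "Asia/Tokyo"))        # 2019-07-01 21:00 JST
```

If the user changes their timezone preference, nothing stored has to change; only the rendering argument does.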
The only time it's acceptable to store time with timezone is with DATES that do not have a TIME component.
If you really need to display the timestamp in the timezone the user originally entered, store the original timezone along with every timestamp.
It really should have been called "TIMESTAMP IN UTC" or "NORMALIZED TIMESTAMP" or something else instead. It's especially dangerous because the results you get back from using it look correct with casual testing and you might not notice that your database is subtly wrong for a long time.
zzzeek's comment is correct, he should be upvoted not down.
> A correctly coded application does explicit normalization of datetime values the moment they enter the boundaries of the application, just like it would also convert encoded bytes into a unicode-aware strings up front, so that within the application there is no ambiguity, both when persisting and when handling.
This is 100% true, about handling datetimes and unicode. I built a time tracking application which has proved this true many times.
> I completely disagree with this advice
Me too, the article op clearly doesn't know what he's talking about.
What's the difference? Duration-time does not have leap-seconds.
I'll say it again: Duration-time does not have leap-seconds.
Once more: Duration-time does not have leap-seconds.
For duration-time without leap-seconds, give me TAI:
> International Atomic Time (TAI) is a statistical atomic time scale based on a large number of clocks operating at standards laboratories around the world that is maintained by the Bureau International des Poids et Mesures; its unit interval is exactly one SI second at sea level. The origin of TAI is such that UT1-TAI is approximately 0 (zero) on January 1, 1958. TAI is not adjusted for leap seconds. It is recommended by the BIPM that systems which cannot handle leapseconds use TAI instead.
https://tycho.usno.navy.mil/leapsec.html
Document that it is TAI.
Include in the documentation that TAI does not include leap-seconds.
Thank you.
This should be "A Valid Use Case For timestamp with time zone".
This article gets it all wrong. The only time it's acceptable to store time with time zone is for DATES that do not have a TIME component.
Literally everything in zzzeek's comment I've personally seen causing live bugs in production code. Always use UTC without time zone and always convert to UTC as soon as possible in your application.
> Your assumptions are wrong.
zzzeek isn't assuming, he's speaking from experience. He wrote the SQLAlchemy library, so he knows what he's talking about.
This seems to assume that only past events may be stored in a database. How would one schedule events at some clock-on-the-wall point in the future when either the exact applicable time zone is unknown or the mapping of a particular time zone to UTC isn't guaranteed to remain unchanged?
> This is 100% true... I built a time tracking application which has proved this true many times.
The potential for timezone rule changes mean that future events specified in local timezones are fundamentally ambiguous with regard to any absolute timeframe. Thus, normalization of such events to UTC can result in errors.
Consider TicketMaster. They have a database of future events around the globe, sometimes years in advance.
Imagine a band scheduled to play on 24-Jul-2020 at 8pm in Tokyo. Japan's government was considering reintroducing daylight saving time in 2020 so as to reduce heat stress on the 2020 Olympics athletes. In August 2018, they decided against it, but it could have happened. Let's pretend it did happen.
If you had stored this band's gig in a database a few months ago (before the decision to make the change), your UTC-normalized timestamp would be 2020-07-24T11:00Z (UTC+9). But the gig is going to happen at 8pm local time regardless of the rule change. The actual event time is 2020-07-24T10:00Z (UTC+10), an hour earlier than originally scheduled.
Normalizing to UTC makes sense in so many cases that it can seem to be a silver bullet for all cases. But there are definitely business cases that do not work well with this approach.
Let's say I'm writing a calendar application. Naturally, the user wants to enter all events in their own timezone, which in my case is Europe/Amsterdam. So I can just convert that to UTC, and store it along with a timezone string? Well, that's going to break horribly when the EU abolishes summer time next year, because all of those UTC stamps will be wrong!
The only sane option here is to store local time, and a string representing the timezone, which is what the name "timestamp with timezone" suggests it is. Yes, I could do the conversion manually, but it would be a lot more convenient to have it in the database directly.
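A minimal sketch of that scheme in Python: persist the wall-clock fields plus the IANA zone name, and resolve to UTC only when you need an instant, so a later tzdata rule change is picked up automatically (the variable and function names here are made up):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# What you persist for a future event: wall time as text + zone name as text.
event_local = "2020-07-24T20:00:00"   # what the organiser promised
event_zone = "Asia/Tokyo"             # IANA zone name, stored alongside

def resolve_to_utc(local_iso: str, zone: str) -> datetime:
    """Attach the zone under *current* tz rules, then convert to UTC."""
    wall = datetime.fromisoformat(local_iso)
    return wall.replace(tzinfo=ZoneInfo(zone)).astimezone(timezone.utc)

print(resolve_to_utc(event_local, event_zone).isoformat())
# 2020-07-24T11:00:00+00:00 under today's rules; had Japan adopted DST,
# the same stored row would resolve an hour earlier with no UPDATE needed.
```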
Personally I find the API confusing and arbitrary. Why include a Year class? Why include both OffsetDateTime and ZonedDateTime? What is the difference between Instant and LocalDateTime for my application that doesn't travel across time zones? Why isn't there a Range class with a start and a stop?
I guess I find "just use UTC" much easier to deal with.
The issue is that there are 2 types of future times:
(1) scientific-absolute times -- like future solar eclipses; you can normalize these types of times to a future UTC with no ambiguity (other than esoteric adjustments such as TAI leap seconds)
(2) cultural-relative-wall-clock times -- like future recurring appointments, future New Year's midnight celebrations, etc.; These can't be normalized to UTC perfectly because one doesn't know what the exact future UTC time will be due to unknown government changes to DST rules or TZ boundaries. If you simply normalize to UTC, you will throw away information and it will cause bugs. Your eventual addendum of "store the time zone string" basically acknowledges this edge case for future datetimes.
At least for me, writing an article like that takes time and lots of thought (plus forces you to grok the stuff deeply). I wish I could write more often (or at least, carve out the time and energy for it), haven’t written much since.
Internally for indexing purposes, we expand the dates + time into a series of datetimes at UTC, at which point we can easily search on them.
CREATE TYPE datetimetz AS (
    dt timestamptz,
    tz varchar
);
Then in your table def you can use your new type to avoid multiple columns:
CREATE TABLE example (
    id integer,
    created_at datetimetz
);
He'd be someone I want on any new RFC effort.
In neither case is timestamptz useful or necessary; it will actually end up causing more bugs by converting timezones to/from the database instead of just returning the value and letting your application deal with it.
[0] https://phili.pe/posts/timestamps-and-time-zones-in-postgres...
Consider the following queries, both ran on a machine where everything is in UTC (I always configure my production machines in UTC):
SELECT date(('2019-01-01T23:00:00Z'::timestamptz) AT TIME ZONE 'Europe/Berlin');
date
------------
2019-01-02
(1 row)
SELECT date(('2019-01-01T23:00:00Z'::timestamp) AT TIME ZONE 'Europe/Berlin');
date
------------
2019-01-01
(1 row)
The first one is likely what you want. When would you write a query like this? For example, when you have a non-UTC business day and want to group sales by business day. Doing that kind of aggregation in the application is ugly, error prone, and removes many of the benefits of having an SQL database.
SQL timestamp == java.time.LocalDateTime
SQL timestamptz == java.time.Instant, but converted into java.time.OffsetDateTime on read
?
That is, neither of them actually stores a timezone. But timestamp is a date and time that needs to be interpreted in the context of some separately-supplied timezone, whereas timestamptz is an actual instant in time?
IME, it's rare to need LocalDateTime. I work in finance. I often need LocalTime, to specify things like "the exchange opens at 07:30 every day", where the timezone is implicitly that of the exchange, and if the exchange moved (lol Brexit), its timezone might change even if its opening time didn't. I also often need LocalDate, mostly when storing or retrieving data in archives organised by date, because we tend to use the local date for that, so that all the data for one session of an exchange, from open to close, is on one date, wherever in the world it is [1]. But I very rarely need LocalDateTime. Perhaps if I worked in calendaring instead, I would use it all the time.
It's a shame that SQL doesn't have a ZonedDateTime or OffsetDateTime, but as others have pointed out, you can just store a timestamp along with a timezone or offset.
[1] Actually, it's worse than this, in that we use the trade date, which is sometimes not the same as the local date - for example, on the Chicago Mercantile Exchange, the market for the E-mini S&P 500 future opens at 17:00 and closes at 16:00 the next day [2]. But that entire session is considered to have a trade date of the calendar date it closes on. In the olden days, the market would open at 07:00 and close at 16:00 (so you had an hour before the banks closed to settle any cash movements, or possibly because traders had very big lunches in those days). Trade date was calendar date. But then the 21st century came along, and they decided to open all day. But because there was so much crucial stuff that happened at the close of trading, they did it by moving the open back to the previous evening, rather than moving the close forward.
Finally, boycott Oracle's table server. It's horrible in this respect.
I think this is spot on, and is the reason why it's so hard to design time libraries, particularly in language standard libraries. How do you allow all users to express what they need, while not requiring any users to express what they do not need?
Bear in mind that Joda/java.time are still oversimplifying, because they take UTC as their ground truth, which means they can't account for leap seconds. A really thorough time library would take TAI as the ground truth, then have a layer which knew about leap seconds to convert to UTC [1], then a layer which knew about timezones to convert to local times. Not knowing about leap seconds means you can't calculate the time between two Instants accurately - if there is a leap second in the interval, the time will be off by one. However, since most people don't mind about gaining or losing a second every now and then, UTC is a decent tradeoff.
> Personally I find the API confusing and arbitrary. Why include a Year class?
So you can't accidentally pass the year and month, which would otherwise both be integers, the wrong way round to a method which takes those, as I did earlier this week.
> Why include both OffsetDateTime and ZonedDateTime?
There's a time format which looks like '1918-11-11T11:00+01:00'. You can parse that to an OffsetDateTime, but you don't have enough information to parse it to a ZonedDateTime. Was that Paris winter time, or Lagos standard time? If you wanted to find the exact six-month anniversary of that moment, you would need to know, because it would be at 11:00 local time [2], which could be 1919-05-11T11:00+02:00 in Paris but 1919-05-11T11:00+01:00 in Lagos, and those are different moments in time.
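The same distinction exists in Python's stdlib: a fixed `timezone` offset is the OffsetDateTime-like thing, while a `ZoneInfo` zone knows the DST rules. Moving a date across a DST boundary shows the difference (modern dates used here so the tzdata is uncontroversial):

```python
from datetime import datetime, timezone, timedelta
from zoneinfo import ZoneInfo

# Same wall time, once with a bare +01:00 offset, once with a real zone.
offset_dt = datetime(2021, 1, 15, 11, 0, tzinfo=timezone(timedelta(hours=1)))
zoned_dt = datetime(2021, 1, 15, 11, 0, tzinfo=ZoneInfo("Europe/Paris"))

# Move both six months later, into Paris summer time.
offset_later = offset_dt.replace(month=7)
zoned_later = zoned_dt.replace(month=7)

print(offset_later.utcoffset())     # 1:00:00 -- a bare offset can't change
print(zoned_later.utcoffset())      # 2:00:00 -- the zone applied CEST
print(offset_later == zoned_later)  # False: they are different instants
```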
> What is the difference between Instant and LocalDateTime for my application that doesn't travel across time zones?
If your application, or one of its users, resides in one of the timezones which have summer and winter times, then you still have two offsets to think about. Every Instant identifies a unique moment in time for you (up to leap seconds!), but a LocalDateTime does not, because there is one hour a year of LocalDateTime values which identify two different moments an hour apart, and one hour a year of values which don't identify anything at all.
[1] inb4 someone goes off about relativity
[2] because everyone knows that the armistice was signed at 11 o'clock on the 11th day of the 11th month, so if you rock up and say "bonjour mes amis, le cérémonie de l'anniversaire c'est à 10h" you will get bayonetted
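The ambiguous-hour point above is modelled in Python with the `fold` attribute: during the fall-back hour, one LocalDateTime-style wall time maps to two distinct instants (Europe/Amsterdam's 2021 fall-back used as the example):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ams = ZoneInfo("Europe/Amsterdam")

# 02:30 on 2021-10-31 happened twice: once in CEST, once in CET.
first = datetime(2021, 10, 31, 2, 30, tzinfo=ams, fold=0)
second = datetime(2021, 10, 31, 2, 30, tzinfo=ams, fold=1)

print(first.astimezone(timezone.utc))   # 2021-10-31 00:30:00+00:00
print(second.astimezone(timezone.utc))  # 2021-10-31 01:30:00+00:00
```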
Because it has units of time (seconds, often) and it's done using time-stamps, as in begin-time and end-time.
> It seems obvious that for an event that lasted 5.43s for example, (a "duration", better wording) that it's not counted with leap-seconds.
Obvious until you try to get someone to understand why their time-stamps can't be UTC if they're going to be used for duration-time instead of wallclock-time.
Obvious until you try to get a library or language implementer to understand that UTC isn't Universal.
More realistically, for an application that is storing only near future dates within a limited geographic range, you can probably get away with using UTC for storage, as that's what we did at MLB anyway, but today I would likely not give this advice.
edit edit: oh, you mean you want PG to actually show you groupings of business days. Well sure, that is a highly specialized kind of query that has to know where holidays are and such depending on what you are trying to do (if you need to count business days, perhaps). If I were writing such a query, I'd have to be very careful regardless so I'd ensure the appropriate casts are used against my source timestamps so that they are properly interpreted by PG's calendar-aware functions as well as how to determine what's a business day beyond simple weekend days.
Using PG's calendaring functions is nice but in reality this is more of a data analyst feature, which is what PG spends a lot of time advertising. Use timestamptz all you want in data analyst queries but in your CREATE TABLE use timestamp.
OK, but you still don't use TIMESTAMP WITH TIME ZONE to store those either. For future events where you want to protect against time zone changes, you store the intended time, which is what this thread is calling "wall clock time". I think you shouldn't use a timestamp object for such a time because it's not a timestamp at all; it's a human description of something that has a moving linkage to a timestamp. However, this makes being able to search and filter for these events more challenging if the database lacks functions to coerce these values into current UTC timestamps, or if you are trying to optimize for indexes that don't make use of these functions.
> Consider TicketMaster. They have a database of future events around the globe, sometimes years in advance.
Most of my datetime experience is from working at MLB.com with Ticketmaster servers in the early 2000s. We used Oracle and we stored the next season's worth of baseball schedules in straight UTC, in addition to storing the original "wall clock plus timezone" version of that time, after our first year online where we were stupidly storing in Oracle's default timezone which was of course EST, and on October 30th at 2 AM the whole site broke because in the perspective of some of our database queries time had moved one hour backwards.
It was not an option to store only "wall clock time" here because we needed to be able to search and sort these events in the database and Oracle did not have the calendaring / indexing capabilities to make it feasible to be coercing from wall clock to UTC on the database side. We were not too worried about timezones being changed within the US/Canada within the next year, as in the rare occurrence that this happened, we simply ran an UPDATE for the UTC timestamps in question based on their desired timezone. I notice the blog post we are talking about here makes no mention of future dates or the challenges in storing them accurately while still being able to query them efficiently; for future dates we had to store both the UTC and the wall clock data.
> Normalizing to UTC makes sense in so many cases that it can seem to be a silver bullet for all cases.
My disagreement with this blog post was first triggered by the fact that it is a "You should always do X" kind of post. The word "future" doesn't even appear in the text of this post, nor do the words "index" or "sort", which is a funny thing for an article that seems to assure the inexperienced reader that this is all you need to know about timestamps in databases.
If you are recording a duration you have another unit. If you use begin-time and end-time you are calculating a duration, which is different of course.
If you calculate durations, then you might consider leap seconds, if they are important for your measurement. If not, then you just ignore them. I don't see a problem here.
Besides, leap-seconds are known, so removing them from a calculated duration is not a problem.
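A sketch of that correction, assuming a (partial, hand-maintained) leap-second table; UTC datetime arithmetic in most libraries silently skips leap seconds, so you add back any that fall inside the interval:

```python
from datetime import datetime, timedelta, timezone

# UTC instants immediately after which a positive leap second occurred.
# Partial table for illustration; a real one comes from IERS Bulletin C.
LEAP_SECONDS = [
    datetime(2015, 6, 30, 23, 59, 59, tzinfo=timezone.utc),
    datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
]

def true_elapsed(start: datetime, end: datetime) -> timedelta:
    """Elapsed physical time, re-adding leap seconds the subtraction skipped."""
    naive = end - start
    leaps = sum(1 for ls in LEAP_SECONDS if start <= ls < end)
    return naive + timedelta(seconds=leaps)

a = datetime(2016, 12, 31, 23, 59, 0, tzinfo=timezone.utc)
b = datetime(2017, 1, 1, 0, 1, 0, tzinfo=timezone.utc)
print(b - a)               # 0:02:00 -- the leap second is invisible
print(true_elapsed(a, b))  # 0:02:01 -- one leap second re-added
```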
I have a side project that stores radios' schedules and started with timestamptz (while storing in UTC anyway) and thought I did the right thing.
Then I converted part of the project to Elixir and discovered that Ecto uses timestamp by default. I thought it was odd, but looked into the reasoning and discovered the misnaming of the "with timezone" part, and that José Valim was right as usual ;)
What I ended up using in my select queries is something along the lines of :
> s.date_time_start at time zone 'UTC' at time zone 'Europe/Paris' as start_at
Unrelated, but at work on Friday (while reading this article, as fate would have it) I had to debug an erroneous time displayed to some users by text. I found one line in the service that forgot to specify the timezone while formatting the datetime. It used to work, so what changed? Well, our users are increasingly international, and we recently moved those cron jobs from one server in Ireland to multiple AWS zones, so what seemed to work for years showed up now as a bug. I also know they had a hectic debug day six months ago during the last DST event. In conclusion, dates are hard.
Right now, supposing you have two columns, EventTime TIMESTAMPTZ and EventTimeZone TEXT, you'd have to do something like the following in your query to get the data returned to the application properly:
SELECT (EventTime AT TIME ZONE EventTimeZone) || ' ' || EventTimeZone
When you insert into a 'TIMESTAMP', it's as if it did input LocalDateTime(x).atZone(UTC).toInstant(), and saved that; and when you read it, it's as if it does storedInstant.atZone(UTC).toLocalDateTime().
A 'TIMESTAMPTZ' is stored the same way, but on string output is converted to the session timezone, and an underspecified input is enriched with the session timezone.
When you insert into a 'TIMESTAMPTZ' it'd take the input OffsetDateTime or ZonedDateTime, or enrich the input with the session zone until one were obtained, call .toInstant(), and save it. On read, it'd do storedInstant.atZone(sessionZone) to obtain a ZonedDateTime, and its default output format only includes the offset.
LocalDateTime is useful for calendaring, and as an intermediate step when joining a LocalDate and a LocalTime and working towards a Zoned or Instant.
Having the database store anything other than what you provide is always a source of bugs.
In the industries I have worked in aggregating events per date (according to some time zone) has been a common operation and that would have been error prone with the timestamp type.
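Grouping stored UTC instants by a business-local date is straightforward once you pick the zone explicitly; here is a hedged Python equivalent of that kind of aggregation (zone and data are made up for illustration):

```python
from collections import defaultdict
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

BUSINESS_ZONE = ZoneInfo("Europe/Berlin")  # the zone that defines a "day"

def business_date(utc_dt: datetime):
    """Map a UTC instant to the calendar date in the business zone."""
    return utc_dt.astimezone(BUSINESS_ZONE).date()

sales = [
    datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2019, 1, 1, 23, 0, tzinfo=timezone.utc),  # already Jan 2 in Berlin
]
by_day = defaultdict(int)
for s in sales:
    by_day[business_date(s)] += 1
print(dict(by_day))
# {datetime.date(2019, 1, 1): 1, datetime.date(2019, 1, 2): 1}
```

Pushing this into SQL instead keeps the aggregation next to the data, which is exactly the point being made above.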
> yes I do, it's in UTC, because I assume the application is written to correctly handle datetimes.
> > It is merely a wall time. Maybe it’s UTC,
> no, it's UTC.
No, you don't know that if you're looking at the database without knowledge of the app's conventions.
I don't understand your argument about ambiguity and implicit conversions. Yes, if your session or DB is configured to something other than UTC, it will convert the timestamptz values in your output to that time zone's offset, but it'll still include the offset.
Thus, a value "2019-04-01 00:00:00+00" that is output when using timestamptz with UTC (compared to "2019-04-01 00:00:00" when using timestamp), is output as "2019-03-31 19:00:00-05" when using EST. Both values represent the same moment in time, obviously, and are equivalent. Assuming your app and ORM convert everything to UTC, that shouldn't matter at all.
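Those two renderings really are the same instant, which is easy to verify in Python, since aware datetimes compare by the moment they denote rather than by their textual form:

```python
from datetime import datetime

utc_form = datetime.fromisoformat("2019-04-01 00:00:00+00:00")
est_form = datetime.fromisoformat("2019-03-31 19:00:00-05:00")
print(utc_form == est_form)  # True: same moment, different rendering
```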