←back to thread

115 points saisrirampur | 1 comments | | HN request time: 0.211s | source
Show context
Joker_vD ◴[] No.44568895[source]
The tl;dr is that Postgres, as any long-running "server" process (especially as a DBMS server!) does not run with SIG_DFL as the handler for SIGTERM; it instead sets up the signal handler that merely records the fact that the signal has happened, in hopes that whatever loops are going on will eventually pick it up. As usual, some loops don't but it's very hard to notice.
replies(2): >>44569721 #>>44570211 #
1. namibj ◴[] No.44570211[source]
Feels like they should come with watchdog timers that alert when a loop fails to check in on this status often enough? Then the only thing that still needs manual checking to cover these more-exotic states is that the code path for loop watchdog-and-signal-checkin will indeed lead to early exit from the loop it's placed in (and all surrounding loops).

Sure, it's more like fuzzing then, but running in production on systems that are willing to send in big reports for logged watchdog alerts would give nearly free monitoring of very diverse and realistic workloads to hopefully cover at least relevant-in-practice codepaths with many samples before a signal ever happens to those more-rare but still relevant code paths...