Pdeathsig is almost never what you want

scottlamb 3 months ago

Feeling deja vu. There was a previous discussion of this article, though it had a different title and URL then.

https://news.ycombinator.com/item?id=43268713

https://web.archive.org/web/20250306200642/https://www.recal...

nine_k 3 months ago

Well, yes, the difference between a "process" and a "thread" under Linux is that after a fork() the new "process" has COW data pages, while a new "thread" just gets access to the same data pages. Otherwise they are basically the same thing. No wonder that a Chromium process spawned from a thread lists that thread as a parent process, and gets reaped when the thread exits, whether it's the main thread, or some other thread.

And no, automatically listing the main thread as the parent wouldn't be significantly more logical. The main thread can detach spawned threads and then exit via pthread_exit() (not regular exit()), and the spawned threads will continue to run. So the Linux kernel does the most straightforward thing. The same PDEATHSIG approach would likely work if the spawning thread then reparented the Chromium child process to the main thread.
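
The thread-level behavior described above can be reproduced with a small sketch. This is an illustration, not code from the article: it assumes Linux, uses Python's ctypes to call prctl(PR_SET_PDEATHSIG) in the child between fork and exec, and the helper names (spawn_from_short_lived_thread, child_exit_status) are made up for this example. The child is spawned from a worker thread that exits right away, while the process itself keeps running.

```python
import ctypes
import signal
import subprocess
import threading

PR_SET_PDEATHSIG = 1  # value from <linux/prctl.h>

# Load libc before forking so the child does not have to.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
_libc.prctl.restype = ctypes.c_int


def _arm_pdeathsig():
    # Runs in the child between fork() and exec().
    _libc.prctl(PR_SET_PDEATHSIG, int(signal.SIGTERM), 0, 0, 0)


def spawn_from_short_lived_thread(argv):
    """Spawn argv from a worker thread, then let that thread exit."""
    box = {}

    def worker():
        box["child"] = subprocess.Popen(argv, preexec_fn=_arm_pdeathsig)

    t = threading.Thread(target=worker)
    t.start()
    t.join()  # the spawning thread is gone; the process is still alive
    return box["child"]


def child_exit_status(argv=("sleep", "30"), timeout=10):
    """Return the child's exit status (negative means killed by a signal)."""
    child = spawn_from_short_lived_thread(list(argv))
    return child.wait(timeout=timeout)
```

If the kernel treats the spawning thread as the parent, child_exit_status() should return -signal.SIGTERM well before the 30-second sleep finishes, even though the Python process never exited.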

wahern 3 months ago

> Otherwise they are basically the same thing.

There's definitely more that differentiates threads from processes in the kernel. Process signals, for example, where the kernel walks the thread list for the process to choose which thread to deliver the signal to--some might have it masked. Process groups and sessions are another--setpgrp and setsid effectuate a change across all threads atomically. And let's not forget descriptor tables. In fact, every thread in the kernel has pointers to these shared data structures.

The issue is that while the Linux kernel maintains the process abstraction for some interfaces, for others it doesn't, seemingly haphazardly, out of convenience, or sheer oversight, not out of some sophisticated architectural vision, though perhaps with some handwaving to performance or flexibility (i.e. semantics that don't fit into the traditional Unix process/thread dichotomy). setuid, for example, only changes the UID of the thread, not all threads in the process. So to maintain the semantics of setuid, the userspace pthreads implementation in libc (glibc, musl, etc.) has to go through herculean efforts to effectuate the change across all threads. The semantics of the kernel's setuid might be useful in theory, but the overwhelming majority of people just want the UID to change atomically process-wide. The kernel could also provide the ability to change the UID process-wide in addition to its more "flexible" interface. It has to do this for a lot of other stuff, as previously mentioned. It just chooses not to. So what we get today is a hodge-podge of often unintuitive behaviors and in many cases the worst of all worlds--a lot of "flexibility", except for the semantic everybody wants.

The behavior of PDEATHSIG seems more accidental than anything. As mentioned in another thread[1], it's probably a casualty of LinuxThreads, the first approach to implementing POSIX Threads semantics on Linux. Contra the wisdom about how the kernel is currently architected (i.e. what you describe), LinuxThreads leaned more heavily on implementing threading semantics in userspace, keeping the kernel task-oriented (i.e. in the parlance of 1990s Linux, user space thread == kernel process). But eventually the kernel bit the bullet and moved more of the semantics into the kernel, for example signal handling, where process semantics (in the POSIX sense) could be implemented more cleanly and performantly.

[1] See https://news.ycombinator.com/item?id=43155583, especially the last comment, where they dug up an old mailing list post.

mperham 3 months ago

I use Pdeathsig but with SIGTERM, not SIGKILL, to ensure clean shutdown of a child Redis process. Seems crazy to use SIGKILL.
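
A rough sketch of the SIGTERM variant, assuming Linux and using Python's ctypes to call prctl(PR_SET_PDEATHSIG) directly. The helper name spawn_with_pdeathsig is hypothetical, and the getppid() check is a common precaution against the parent dying before prctl runs, not something taken from the comment.

```python
import ctypes
import os
import signal
import subprocess

PR_SET_PDEATHSIG = 1  # value from <linux/prctl.h>

# Load libc before forking so the child does not have to.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
_libc.prctl.restype = ctypes.c_int


def spawn_with_pdeathsig(argv, sig=signal.SIGTERM, **popen_kwargs):
    """Start argv so the kernel sends `sig` when the spawning thread dies."""
    parent_pid = os.getpid()

    def arm():
        # Runs in the child between fork() and exec().
        if _libc.prctl(PR_SET_PDEATHSIG, int(sig), 0, 0, 0) != 0:
            raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")
        # If the parent process already exited, we were reparented and the
        # signal will never arrive, so deliver it ourselves.
        if os.getppid() != parent_pid:
            os.kill(os.getpid(), sig)

    return subprocess.Popen(argv, preexec_fn=arm, **popen_kwargs)
```

A call such as spawn_with_pdeathsig(["redis-server"]) (the command is only an example) would then ask the kernel to send SIGTERM rather than SIGKILL, so the child gets a chance to shut down cleanly. The signal is still tied to the thread that spawned the child, as discussed elsewhere in this thread.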

cryptonector 3 months ago

Then it should be called PTDEATHSIG, not PDEATHSIG, and there should be a way to ask for the child to get killed when the parent _process_ exits -- that is (would be) very useful.

mmastrac 3 months ago

There was a recent blog post on this exact issue from another company, wasn't there?

https://news.ycombinator.com/item?id=43153901, I think

nine_k 3 months ago

Indeed, and it correctly explains the issue in one of the close-to-top comments: https://news.ycombinator.com/item?id=43162857