I think multi-threaded exec behavior needs to be illegal:
https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.15.pidfs&id=56f235da15d02c550c0ae25da9c62ecfe7222d38
de_thread() is an insult to humanity.
@brauner can we also make mremap() of anon mappings illegal too?
Thanks
Might spoil my topic at lsf though
@brauner if at least only the main thread could exec, that'd be an improvement...
1/n
I don't think that would be out of the question really. I mean, I'm writing tests for non-thread-group leader exec and I'm sure that there are similar tests out there, but how common is it in actual workloads to exec from a non-thread-group leader? Not very, I think. It feels like a very strong anti-pattern.
The problem with non-thread-group leader exec is that it makes zombie and reaping information undefined or at least semantically vague...
2/n
Polling is a good example for that. A pidfd can be polled for two states: exit and reaped. The reaped notification is well-defined but exit isn't because of multi-threaded exec.
If you hold a thread-specific pidfd to the thread-group leader, i.e., pidfd_open(thread-group-leader-pid, PIDFD_THREAD), then the exit notification for that specific thread is vague because the struct pid will be reused...
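For reference, here is a minimal sketch of the well-behaved baseline: polling a process-level pidfd for exit and then reaping. It assumes Python 3.9+ on Linux 5.3+ for os.pidfd_open(); the helper name poll_exit_then_reap is made up for illustration. It deliberately uses a single-threaded child and no PIDFD_THREAD, so it does not reproduce the multi-threaded exec quirk itself.

```python
import os
import select


def poll_exit_then_reap(exit_code=3, timeout_ms=5000):
    """Fork a child that exits at once, poll its pidfd, then reap it.

    Returns (became_readable, exit_status), or None when pidfd_open()
    is not available on this system (hypothetical helper).
    """
    child = os.fork()
    if child == 0:
        os._exit(exit_code)  # child: exit immediately, becoming a zombie

    try:
        pidfd = os.pidfd_open(child)  # process-level pidfd, no PIDFD_THREAD
    except (AttributeError, OSError):
        os.waitpid(child, 0)  # still reap the child before giving up
        return None

    try:
        poller = select.poll()
        poller.register(pidfd, select.POLLIN)
        events = poller.poll(timeout_ms)
        # The pidfd turns readable once the process has exited,
        # even though nobody has reaped it yet.
        readable = any(fd == pidfd and ev & select.POLLIN for fd, ev in events)
    finally:
        os.close(pidfd)

    _, status = os.waitpid(child, 0)  # now reap the zombie
    return readable, os.WEXITSTATUS(status)
```

The child stays a zombie until the final waitpid(), so opening the pidfd after the child has already exited is still safe here.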
@ljs @jann Yes, in that case all other threads in the thread-group are SIGKILLed by the kernel, but they all exit with a success exit code and are autoreaped. The execing thread then takes over the struct pid of the thread-group leader, i.e., it assumes the leader's PID number and starts a new thread-group.
Think of it as a parasite taking over. :D
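The takeover is easy to observe from userspace. A rough sketch, assuming Linux and Python 3.8+ (for threading.get_native_id()); the helper name exec_from_non_leader is hypothetical. A forked child starts a second thread, that thread records its own TID and execs a fresh interpreter, and the new image reports the PID it sees.

```python
import os
import sys
import threading


def exec_from_non_leader():
    """Return (leader_pid, execing_tid, pid_after_exec) for a child
    process whose non-leader thread calls exec (hypothetical helper)."""
    r, w = os.pipe()
    child = os.fork()
    if child == 0:
        try:
            os.close(r)

            def worker():
                # Record the leader PID and this thread's own TID, then exec.
                tid = threading.get_native_id()
                os.write(w, f"{os.getpid()} {tid} ".encode())
                os.dup2(w, 1)  # stdout of the new image goes to the pipe
                os.execv(sys.executable,
                         [sys.executable, "-c", "import os; print(os.getpid())"])

            t = threading.Thread(target=worker)
            t.start()
            t.join()  # not expected to return: exec kills this thread
        finally:
            os._exit(1)  # only reached if the exec failed

    os.close(w)
    data = b""
    while chunk := os.read(r, 4096):
        data += chunk
    os.close(r)
    os.waitpid(child, 0)  # reap the child

    fields = data.split()
    after = int(fields[2]) if len(fields) > 2 else None
    return int(fields[0]), int(fields[1]), after
```

The execing thread starts out with a TID different from the leader's PID, yet the program it execs sees the leader's PID as its own.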
3/n
So if a non-thread-group leader thread execs and you poll for exit on the thread-specific pidfd for the thread-group leader before the non-thread-group leader thread has gone through with the exec, you get notified about the old thread-group leader exiting. If you poll after the thread has gone through with the exec, you don't, and you block until the new thread-group exits.
And that's just one of the quirks
@jann The lazy thing to do would be to add a prctl() that is inherited similarly to NNP (no_new_privs) and that restricts exec to the thread-group leader. Or a Kconfig option like CONFIG_SANE_THREAD_GROUP_EXEC. Or the honest thing: deprecate it.
@brauner @jann Better yet, there are threaded programs doing suid exec! :) https://lore.kernel.org/lkml/20220910211215.140270-1-jorge.merlino@canonical.com/
But yeah, the threaded exec is weird. Where does pidfd actually attach for the polling? Exec itself has workarounds for the de_thread mess and ptrace, which I did some recent archaeology on:
https://lore.kernel.org/lkml/0B25310A-0907-481E-8ADF-EEFA78927BFF@kernel.org/
Is something like this needed for pidfd, or is the thread ID changing due to de_thread the same root cause as pidfd's problem?
@wrybane @ljs @jann In general, you cannot wait() on individual threads and there's no SIGCHLD generated for them.
The parent's wait works just fine. The execing thread will take over the role of the original thread-group leader, so from wait's perspective you automagically wait on the correct task, i.e., the thread-group leader.
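A small sketch of that, assuming Linux and Python 3; the helper name wait_across_threaded_exec is made up for illustration. A non-leader thread in the child execs a program that exits with a chosen code, and the parent simply calls waitpid() on the PID it got from fork().

```python
import os
import sys
import threading


def wait_across_threaded_exec(code=42):
    """Return (same_pid, exited_normally, exit_status) as seen by a
    parent waiting on a child whose non-leader thread execs
    (hypothetical helper)."""
    child = os.fork()
    if child == 0:
        try:
            def worker():
                # Exec from the non-leader thread; the new image exits with `code`.
                os.execv(sys.executable,
                         [sys.executable, "-c", f"import sys; sys.exit({code})"])

            t = threading.Thread(target=worker)
            t.start()
            t.join()  # not expected to return: exec kills this thread
        finally:
            os._exit(1)  # only reached if the exec failed

    pid, status = os.waitpid(child, 0)
    return pid == child, os.WIFEXITED(status), os.WEXITSTATUS(status)
```

The parent gets the exit status of the exec'd program under the original child PID, even though the task that exited is not the thread that fork() created.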