What's that mysterious workaround?
The core Huff6 decode step is described in https://fgiesen.wordpress.com/2023/10/29/entropy-decoding-in-oodle-data-x86-64-6-stream-huffman-decoders/
A customer managed to get a fairly consistent repro for transient decode errors by overclocking an i7-14700KF by about 5% from stock settings ("performance" multiplier 56->59).
It took weeks of back and forth and forensic debugging to figure out what actually happens, but TL;DR: the observed decode errors are all consistent with a single instruction misbehaving.
I've hinted at this a bit, but it's finally at a point where I feel comfortable with other people using it.
I've spent the last 8 months iterating on different ways to debug really large and complicated applications. systing is the tool that's come out of this work. I wrote a post introducing it, and there's some documentation in the repo. It's really just designed for people like me who have a pretty deep knowledge of the kernel and how userspace interacts with the kernel. Hopefully my fellow kernel developers find this useful.
https://josefbacik.github.io/kernel/systing/debugging/2025/05/08/systing.html
I've done a series that adds support for AF_UNIX sockets in coredumps. Userspace provides an AF_UNIX socket path via core_pattern and the kernel connects to it, shuts down the read side and writes the coredump to the socket.
This means no more super privileged usermode helper upcalls and makes for a very nice API experience. I captured coredumps simply via socat:
https://lore.kernel.org/20250430-work-coredump-socket-v1-0-2faf027dbb47@kernel.org
The receiver can use SO_PEERPIDFD to get a stable handle on the crashed process.
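For anyone who wants something more programmable than socat on the receiving end, here is a minimal Python sketch of a listener. It is not code from the series: `make_listener`, `receive_one`, and `peer_pidfd` are hypothetical helper names, and the fallback value of 77 for `SO_PEERPIDFD` is an assumption taken from the generic Linux socket headers (some architectures use a different number). On kernels without the option the sketch just carries on without a pidfd.

```python
import os
import socket

# Assumption: Python's socket module may not export SO_PEERPIDFD, so fall back
# to 77, the value in the generic Linux headers. Some architectures differ.
SO_PEERPIDFD = getattr(socket, "SO_PEERPIDFD", 77)


def make_listener(path):
    """Create a listening AF_UNIX stream socket bound to `path`."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(1)
    return listener


def peer_pidfd(conn):
    """Return a pidfd for the connected peer, or None if unavailable."""
    try:
        return conn.getsockopt(socket.SOL_SOCKET, SO_PEERPIDFD)
    except OSError:
        # Older kernel, non-Linux system, or the peer is already gone.
        return None


def receive_one(listener, out_path, chunk_size=65536):
    """Accept one connection and stream everything it sends into `out_path`.

    Returns the number of bytes written.
    """
    conn, _ = listener.accept()
    with conn:
        # The pidfd is a stable handle on the sending process; a real receiver
        # could inspect the process through it before deciding what to keep.
        pidfd = peer_pidfd(conn)
        total = 0
        try:
            with open(out_path, "wb") as out:
                while True:
                    chunk = conn.recv(chunk_size)
                    if not chunk:  # sender closed its write side: dump complete
                        break
                    out.write(chunk)
                    total += len(chunk)
        finally:
            if pidfd is not None:
                os.close(pidfd)
    return total
```

To try it, bind the listener to the socket path that core_pattern points at (using the syntax from the linked series, which is not reproduced here) and call `receive_one(listener, "core.out")` in a loop.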