"A mysterious CPU spike in ClickHouse Cloud on GCP led to months of debugging, revealing a deeper issue within the #LinuxKernel's memory management. What started as random performance degradation turned into a deep dive into #kernel internals, where engineer Sergei Trifonov uncovered a hidden livelock. His journey through #eBPF tracing, perf analysis, and a reproducible test case ultimately led to a surprising fix - only for another kernel bug […]"
https://clickhouse.com/blog/a-case-of-the-vanishing-cpu-a-linux-kernel-debugging-story #Linux
@kernellogger Adding this to my "Tough bugs" favorites. Right now joining these other two:
https://dirtypipe.cm4all.com/?utm_source=thenewstack&utm_medium=website&utm_campaign=platform
@kernellogger "It is a bit of a mystery why unlock shows up in the flamegraph and not lock, which is where CPU should be spent while waiting on a spinlock."
No, this is no mystery at all. It is _raw_spin_unlock_irq, which means the whole critical path, including the matching _raw_spin_lock_irq, was running with interrupts disabled. The profiling interrupt could not be delivered until interrupts were re-enabled during unlock.
@ptesarik @kernellogger hmm I thought perf sampling interrupts were NMI, no?
@vbabka @kernellogger As usual: it depends. Hardware PMUs indeed trigger NMIs, but generic software counters are implemented with hrtimers. The described symptom suggests that hardware PMU counters could not be used for some reason here.
@pavel If hardware counters cannot be used, perf falls back to hrtimers. The article refers to some old kernel versions, so I suspect the environment is a VM under a hypervisor which does not virtualize PMC. But that's merely a guess; there may be other reasons.
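To make the attribution effect concrete, here is a toy simulation, not kernel code. The trace, the tick durations, and the `sample` helper are all made up for illustration; the only thing taken from the thread is the idea that a maskable (hrtimer-style) sample stays pending while interrupts are off, whereas an NMI-style sample is recorded where it fires. The model also simplifies by treating the whole of _raw_spin_unlock_irq as running with interrupts enabled.

```python
from collections import Counter

# Hypothetical trace: (function name, duration in ticks, interrupts enabled?)
TRACE = [
    ("do_work", 3, True),
    ("_raw_spin_lock_irq", 10, False),   # spinning with IRQs off
    ("critical_section", 4, False),      # still IRQs off
    ("_raw_spin_unlock_irq", 1, True),   # IRQs come back on here
]


def sample(trace, period, maskable, repeats=100):
    """Count which function each periodic sample is attributed to.

    maskable=True models an hrtimer-driven sampler: a sample that fires
    while IRQs are off stays pending until IRQs are enabled again.
    maskable=False models an NMI-driven sampler: recorded immediately.
    """
    hits = Counter()
    pending = False
    tick = 0
    for _rep in range(repeats):
        for name, duration, irqs_on in trace:
            for _step in range(duration):
                tick += 1
                if tick % period == 0:
                    pending = True  # several pending timers collapse into one
                if pending and (irqs_on or not maskable):
                    hits[name] += 1
                    pending = False
    return hits


if __name__ == "__main__":
    print("hrtimer-like:", dict(sample(TRACE, period=7, maskable=True)))
    print("NMI-like:    ", dict(sample(TRACE, period=7, maskable=False)))
```

In this model the maskable sampler never attributes a sample to the lock or the critical section, only to the unlock (and to code that runs with interrupts on), while the NMI-like sampler puts most samples on the spinning lock, which is what the article's author expected to see.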
@kernellogger