Thorsten Leemhuis (acct. 1/4)

"A mysterious CPU spike in ClickHouse Cloud on GCP led to months of debugging, revealing a deeper issue within the Linux kernel’s memory management. What started as random performance degradation turned into a deep dive into kernel internals, where engineer Sergei Trifonov uncovered a hidden livelock. His journey through tracing, perf analysis, and a reproducible test case ultimately led to a surprising fix - only for another kernel bug […]"

https://clickhouse.com/blog/a-case-of-the-vanishing-cpu-a-linux-kernel-debugging-story

@raulinbonn

/me now wonders if there is a repo on a git forge somewhere containing a README.md linking "Tough, surprising, funny, and odd bugs" 🧐

@kernellogger “It is a bit of a mystery why unlock shows up in the flamegraph and not lock, which is where CPU should be spent while waiting on a spinlock.”

No, this is no mystery at all. It is _raw_spin_unlock_irq, which means the whole critical section, including the matching _raw_spin_lock_irq, was running with interrupts disabled. The profiling interrupt could not be delivered until interrupts were re-enabled during unlock.

Vlastimil Babka 🇨🇿🇪🇺🇺🇦

@ptesarik @kernellogger hmm I thought perf sampling interrupts were NMI, no?

@vbabka @kernellogger As usual: it depends. Hardware PMUs indeed trigger NMIs, but generic software counters are implemented with hrtimers. The described symptom suggests that hardware PMU counters could not be used for some reason here.

@ptesarik @kernellogger I thought profiling is normally done from NMI?

@pavel @kernellogger If hardware counters cannot be used, perf falls back to hrtimers. The article refers to some old kernel versions, so I suspect the environment is a VM under a hypervisor that does not virtualize the PMCs. But that's merely a guess; there may be other reasons.