social.kernel.org

Thorsten Leemhuis (acct. 1/4)

"A mysterious CPU spike in ClickHouse Cloud on GCP led to months of debugging, revealing a deeper issue within the #LinuxKernel’s memory management. What started as random performance degradation turned into a deep dive into #kernel internals, where engineer Sergei Trifonov uncovered a hidden livelock. His journey through #eBPF tracing, perf analysis, and a reproducible test case ultimately led to a surprising fix - only for another kernel bug […]"

https://clickhouse.com/blog/a-case-of-the-vanishing-cpu-a-linux-kernel-debugging-story #Linux

Raul

raulinbonn@social.treehouse.systems

29 days ago

Reply to @kernellogger@fosstodon.org

@kernellogger Adding this to my "Tough bugs" favorites. Right now joining these other two:

https://dirtypipe.cm4all.com/?utm_source=thenewstack&utm_medium=website&utm_campaign=platform

https://lore.kernel.org/regressions/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

29 days ago

Reply to @raulinbonn@social.treehouse.systems

@raulinbonn

/me now wonders if there is a repo on a git forge somewhere containing a README.md linking "Tough, surprising, funny, and odd #Linux #kernel bugs" 🧐

Petr Tesarik

ptesarik@fosstodon.org

27 days ago

Reply to @kernellogger@fosstodon.org

@kernellogger “It is a bit of a mystery why unlock shows up in the flamegraph and not lock, which is where CPU should be spent while waiting on a spinlock.”

No, this is no mystery at all. It is _raw_spin_unlock_irq, which means the whole critical path, including the matching _raw_spin_lock_irq, was running with interrupts disabled. The profiling interrupt could not be delivered until interrupts were re-enabled during unlock.

Vlastimil Babka 🇨🇿🇪🇺🇺🇦

vbabka@mastodon.social

27 days ago

Reply to @ptesarik@fosstodon.org

@ptesarik @kernellogger hmm I thought perf sampling interrupts were NMI, no?

Petr Tesarik

ptesarik@fosstodon.org

27 days ago

Reply to @vbabka@mastodon.social

@vbabka @kernellogger As usual: it depends. Hardware PMUs indeed trigger NMIs, but generic software counters are implemented with hrtimers. The described symptom suggests that hardware PMU counters could not be used for some reason here.

Pavel Machek

pavel

27 days ago

Reply to @ptesarik@fosstodon.org

@ptesarik @kernellogger I thought profiling is normally done from NMI?

Petr Tesarik

ptesarik@fosstodon.org

27 days ago

Reply to @pavel

@pavel If hardware counters cannot be used, perf falls back to hrtimers. The article refers to some old kernel versions, so I suspect the environment is a VM under a hypervisor which does not virtualize PMC. But that's merely a guess; there may be other reasons.
@kernellogger
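
The effect discussed in this thread can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not kernel code and not how perf is implemented: the timeline, the tick counts, and the helper names build_timeline and sample are all hypothetical. It models a CPU that spends most of its time spinning in _raw_spin_lock_irq with interrupts off, then compares a sampler that cannot be masked (NMI-like) with one that is held off until interrupts are re-enabled (hrtimer-like).

```python
from collections import Counter


def build_timeline(spin_ticks=50, critical_ticks=5, rounds=20):
    """Return one (function_name, irqs_enabled) tuple per simulated tick."""
    timeline = []
    for _ in range(rounds):
        # Assumption: interrupts are already off while spinning for the lock.
        timeline += [("_raw_spin_lock_irq", False)] * spin_ticks
        timeline += [("critical_section", False)] * critical_ticks
        # Unlock is the first point where interrupts are enabled again.
        timeline += [("_raw_spin_unlock_irq", True)]
        timeline += [("other_work", True)] * 4
    return timeline


def sample(timeline, period, maskable):
    """Count samples per function for a sampler firing every `period` ticks.

    A maskable sampler stays pending while interrupts are disabled and is
    delivered at the first tick with interrupts enabled. Several expirations
    during one masked stretch collapse into a single pending sample.
    """
    hits = Counter()
    pending = False
    for tick, (func, irqs_on) in enumerate(timeline):
        if tick % period == 0:
            pending = True
        if pending and (irqs_on or not maskable):
            hits[func] += 1
            pending = False
    return hits


timeline = build_timeline()
nmi_like = sample(timeline, period=7, maskable=False)
timer_like = sample(timeline, period=7, maskable=True)

print("NMI-like sampler:  ", dict(nmi_like))
print("timer-like sampler:", dict(timer_like))
```

In this model the NMI-like sampler attributes most samples to _raw_spin_lock_irq, while the timer-like sampler never sees the lock or the critical section and instead piles its samples onto _raw_spin_unlock_irq, which matches the flamegraph symptom described above.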
