Conversation

I've been debugging an approx. 10% regression in iperf3. It's not fully analysed yet, but I can already share this neat trick to revert about one third of it (IOW 3 percent perf boost):

echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink
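The write is global and, like any sysfs setting, does not survive a reboot, so a benchmark wrapper can save the old state and put it back afterwards. A minimal sketch, assuming root privileges; the function name `with_cursor_blink_off` and the overridable `CURSOR_BLINK` variable are made up here for illustration and are not part of the original post:

```shell
#!/bin/sh
# Path from the post; overridable so the function can be tried on a scratch file.
CURSOR_BLINK="${CURSOR_BLINK:-/sys/devices/virtual/graphics/fbcon/cursor_blink}"

# Run a command with cursor blinking disabled, then re-enable it.
# (Hypothetical helper, not something the kernel or iperf3 ships.)
with_cursor_blink_off() {
    old=$(cat "$CURSOR_BLINK") || return 1
    if [ "$old" != 1 ]; then
        # Already off (0) or not available (-1): nothing to change.
        "$@"
        return
    fi
    echo 0 > "$CURSOR_BLINK" || return 1
    "$@"
    rc=$?
    # Blinking was on before, so turn it back on.
    echo 1 > "$CURSOR_BLINK"
    return "$rc"
}
```

Usage would look like `with_cursor_blink_off iperf3 -c SERVER`, where `SERVER` is a placeholder for the iperf3 server address.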
@ptesarik Wow. Last time I debugged a crash in wifi, I ended up finding a bug in cursor handling...

@penguin42 Standard process. Use perf to find when the process is off-CPU. Discover context switches to a kworker thread. Trace the workqueue:workqueue_queue_work event and see something like this:

workqueue:workqueue_queue_work: work struct=0xffff961d092d84d8 function=fb_flashcursor workqueue=events_power_efficient req_cpu=8192 cpu=207

That said, it's suboptimal that this work runs on the same CPU as iperf3. It didn't work like that in v6.4 (hence the regression), and I suspect it's an unintended side effect of NO_HZ improvements.

https://docs.kernel.org/timers/no_hz.html
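The tracing step in this post can be sketched roughly as follows. The post does not give the exact commands, so the `perf record` / `perf script` invocation in the comments is an assumption, and `count_queued_work` is a hypothetical helper that only summarises trace lines of the shape shown above:

```shell
#!/bin/sh
# Count queued work items per (function, cpu) from `perf script` text on stdin.
# Only lines for the workqueue:workqueue_queue_work tracepoint are considered.
count_queued_work() {
    awk '/workqueue:workqueue_queue_work:/ {
        fn = ""; cpu = ""
        for (i = 1; i <= NF; i++) {
            # "function=" is 9 characters, "cpu=" is 4; req_cpu= is skipped
            # because the patterns are anchored at the start of the field.
            if ($i ~ /^function=/) fn = substr($i, 10)
            else if ($i ~ /^cpu=/) cpu = substr($i, 5)
        }
        if (fn != "") count[fn " cpu=" cpu]++
    }
    END { for (k in count) print count[k], k }' | sort -rn
}

# Assumed invocation (needs root and perf; the 10 s window is arbitrary):
#   perf record -a -e workqueue:workqueue_queue_work -- sleep 10
#   perf script | count_queued_work
```

If a work function such as fb_flashcursor keeps showing up with the same cpu= as the benchmark process, that is the pattern described in the post.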

@pavel I ran into my first cursor-blink memory corruption on resume from disk about 15 years ago. Blinking cursors have come straight from hell.

@ptesarik @pavel with a recent enough kernel (not sure which), the vt cursor on my sparc machine is off by like four characters or so :(

@lkundrak Is it indeed only a SPARC (i.e. 32-bit), or is it in fact an UltraSPARC?
@pavel

@ptesarik @pavel it is certainly ultrasparc, 64bit

@ptesarik Ah nice; I assume you are actually at the fb console rather than in a GUI, so at least it makes sense it is flashing.

@penguin42 No, this is a dual-socket AMD EPYC 7713 (total 128 cores, 256 threads), mounted in a rack, so I'm logged in with SSH.
But, you're right, GUI is not active. It's not even installed.

@penguin42 In fact, this is part of the grief: There are 100+ idle cores that could do the job, but the kernel decides to steal process time from the iperf3 benchmark. Le sigh.

@ptesarik @pavel isn't it a little bit disappointing that after supersparc and ultrasparc came ultrasparc II instead of turbosparc

@ptesarik i thought perhaps megasparc and then cancel the project??

@ptesarik Is this something like the cursor flash gets put on a queue for the future, then your iperf core gets an interrupt for a packet or timer, and while the kernel is dealing with that it cleans up other outstanding things before it returns?

@ptesarik Nice find! Just checked the current cursor_blink value on three systems: one has 0 (this is a desktop, so VT is not visible anyway), one has -1 (a headless server running on real hardware; it has HDMI output but no screen is connected), one has 1 (a virtual server at some commercial hoster; no idea where the VT would be displayed, if at all). Any idea what these cursor_blink values actually mean?

@ollibaba The meaning is more or less what you observe:

  • 1: cursor blinks
  • 0: cursor doesn't blink
  • -1: not available (e.g. no fb console)
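For checking several machines, the three values above can be turned into a small lookup. This is only a sketch: `describe_cursor_blink` is a made-up helper name, and the meanings are taken from the list in this reply, not from kernel documentation:

```shell
#!/bin/sh
# Translate a cursor_blink value into the meanings listed in this reply.
describe_cursor_blink() {
    case "$1" in
        1)  echo "cursor blinks" ;;
        0)  echo "cursor does not blink" ;;
        -1) echo "not available (e.g. no fb console)" ;;
        *)  echo "unexpected value: $1" ;;
    esac
}

# Report the state of the local machine, if the sysfs file is readable.
f=/sys/devices/virtual/graphics/fbcon/cursor_blink
if [ -r "$f" ]; then
    describe_cursor_blink "$(cat "$f")"
else
    echo "$f not present"
fi
```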

@penguin42 It's more subtle than that. Cursor blinking is a “power-efficient event”. Among other things, it means it won't wake up a CPU that is currently in a power-saving state. The iperf3 benchmark (with a single client) is a bit special, because all CPUs are stopped except the one which runs the benchmark.

I'm glad you asked. My original toot contains hidden criticism of benchmarks, because they are often detached from reality. Now, it has already gained some boosts, so disabling cursor blink may become one of many cargo cults that certain people blindly apply to any scenario. In short, my toot is a joke for those who know, and trolling to those who are merely being smart.

Last but not least, it's very likely that the Linux kernel will be fixed for this specific scenario, making this “advice” obsolete even in situations where it may actually make a difference.

@ptesarik @penguin42 Two contributing factors are that almost all Linux text consoles these days are fbcon consoles, not VGA consoles, and that Linux mostly turned off blanking the (text) console by default some time back¹. So if you install a Linux server, you get a fbcon console that's always on and potentially always blinking the cursor, even for racked servers.

¹ my notes say 2017 for the change in the kernel defaults, in 4.12.

Continuing through the analysis, it seems that every other part in the puzzle has become just a tiny bit shittier between SLE15-SP6 and SL-16.0. The deeper you dig, the smaller (and less reliably reproducible) the difference…

Is this my first time telling you that understanding kernel performance regressions is a challenging job?

@ptesarik oh damn I can already see the headlines:

SUSE Kernel Performance Engineer Publicly Admits Enshittification of Company Flagship Product

@ljs @ptesarik ah right. Well it's annoying enough to step into it with shoes on, can't even imagine being shoeless.

@vbabka @ljs
SUSE Kernel Engineer Publicly Admits Stepping Shoeless In Company Flagshit

Have I merged all patches correctly?
