Conversation

Vlastimil Babka

Achieve 3888.9% performance improvement thanks to this one weird trick https://lore.kernel.org/all/202411072132.a8d2cf0f-oliver.sang@intel.com/

@vbabka So my endless loop will finish 10 seconds before it starts.


@vbabka I maintain that a lot of this performance testing and reporting is opaque voodoo.


@vbabka Huh, it's not "use LD_PRELOAD to nerf sync and fsync away" this time.

@brauner sure, but it can sometimes be useful. Also, this only fixes a regression from 6.7, but don't tell phoronix, shh...

Vlastimil Babka

@hyeyoo @brauner haha amazing. We discussed it yesterday on IRC and it seems the test does malloc(128MB), where malloc prepends a header with the size and calls mmap(sizeof(header)+128MB). Before the commit, that mapping would be aligned to the 2MB boundary, so the header write of a few bytes would fault in a whole THP, which is of course more costly than a 4KB page (mostly due to memory zeroing). The test doesn't access the malloc()'d memory at all otherwise. So yeah, a very specific corner case.
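
A rough sketch of that access pattern in C, for anyone who wants to poke at it. This is not the benchmark from the report and not glibc's real allocator layout: HEADER_SIZE and the helper names (is_thp_aligned, resident_pages, touch_header_only) are made up for illustration, and it assumes Linux with 2MB THP. It maps header + 128MB, writes only a few header bytes, and asks mincore() how many base pages in the first 2MB ended up resident.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAYLOAD_SIZE (128UL << 20)  /* 128MB, as in the test described above */
#define HEADER_SIZE  16UL           /* hypothetical allocator header size */
#define THP_SIZE     (2UL << 20)    /* assumed 2MB huge page size */

/* Returns nonzero if addr sits on a 2MB boundary. */
static int is_thp_aligned(const void *addr)
{
    return ((uintptr_t)addr & (THP_SIZE - 1)) == 0;
}

/* Count resident base pages in the first len bytes of a mapping. */
static long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t n = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(n);
    long count = 0;

    if (!vec)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        count += vec[i] & 1;        /* low bit set means page is resident */
    free(vec);
    return count;
}

/* Mimic a large malloc: map header + payload, then write only the header.
 * Returns the number of pages resident in the first 2MB, or -1 on error.
 * If aligned_out is non-NULL, it reports whether the mapping was 2MB-aligned. */
static long touch_header_only(int *aligned_out)
{
    size_t len = HEADER_SIZE + PAYLOAD_SIZE;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    long resident;

    if (p == MAP_FAILED)
        return -1;
    if (aligned_out)
        *aligned_out = is_thp_aligned(p);
    memcpy(p, &len, sizeof(len));   /* the only write: a few header bytes */
    resident = resident_pages(p, THP_SIZE);
    munmap(p, len);
    return resident;
}
```

Calling touch_header_only() and printing the result should show a single resident page when the write is served by a 4KB fault. On a kernel that aligns the mapping and has THP enabled, one would expect the whole 2MB to show up as resident instead, though that depends on the THP settings of the machine.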
@vbabka @brauner

haha but most people won't focus on how specific/synthetic it is. The number just kinda attracts attention.

I still think it's really hard to demonstrate how a patch will affect realistic scenarios or how much of a clear win it is in most situations... as we all have limited resources to evaluate them.

@vbabka it goes to 11 now!!!
