I had a feeling that kernel compilation got slower recently and tried to find the slowest file across randconfig builds. It turned out to be arch/x86/xen/setup.c, which takes 15 seconds to preprocess on a reasonably fast Apple M1 Ultra.
This all comes from one line "extra_pages = min3(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)), extra_pages, max_pages - max_pfn);" that expands to 47MB of preprocessor output after commits 80fcac55385c..867046cc70277.
@arnd Opening arch/x86/xen/setup.i and going to the end of the offending line looks like a good DoS-attach on vim.
@arnd 47M from a single composition of max()?? I am terrified to go look now ...
@arnd But ... how? Even if it expanded to 47 kilobytes, that would be excessive, but 47 *megabytes*? 🤯
@KeyJ it nests min() multiple levels deep with the use of min3(), and each one expands its argument 20 times times now (up from 6 back in linux-6.6). This gets 8000 expansions for each of the arguments, plus a lot of extra bits with each expansion. PFN_DOWN(MAXMEM) contributes a bit to the initial size as well.
See https://pastebin.com/MmfWH7TM for the first few pages of it.
@ljs Unfortunately, it seems a lot of code started relying on the added features, so it's not as easy as reverting it, and the original version wasn't great either.
Not sure how to best fix it, maybe there is a chance to make a neat version that doesn't produce a constexpr and change all the callers that actually need a constexpr to use a simpler macro.
@dirksteins @arnd @KeyJ It's a huge constant expression that gets evaluated by the compiler. The final code is (probably) fine.
If anyone wonders what other files are bad, this is the list of all files with over five second compile times: https://pastebin.com/raw/fXJe7y9P.
The worst case for this particular file was 44 seconds, three times slower than in defconfig.
The median compile time across all files is 0.26s.
@dirksteins @arnd @KeyJ There's no reason to doubt the correctness, but as has been noted, it is somewhat slow.
Now as shocking as this might be, it is quite tame compared to the boost C++ library.
@arnd I guess the AMD ones are all the huge register definitions they have?
@conor the other thing to realize about this driver is that it's about the same size as linux-2.6.12, 5.8M lines of code. I had a look at what makes it so slow to compile and as far as I can tell it's just the size of the individual files and how they contain functions with thousands of lines and dozens of arguments.
@ljs I think @oblomov is suggesting an intermediate variable in the specific caller, which is a trivial fix and will of course work as it reduces the 47MB to a few KB. The thing we can't easily do is have a temporary variable inside of the min()/max() macros themselves because that prevents them from being used as a C constant expression, e.g. as an array length inside of a type definition.
@ljs @oblomov Not a lot. In an x86 allmodconfig build I had to fix up these 11 files with a simplified min()/max(). There are probably another dozen for other configs.
drivers/edac/sb_edac.c
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
drivers/gpu/drm/drm_color_mgmt.c
drivers/input/touchscreen/cyttsp4_core.c
drivers/md/dm-integrity.c
drivers/net/can/usb/etas_es58x/es58x_devlink.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
fs/btrfs/tree-checker.c
lib/vsprintf.c
net/ipv4/proc.c
net/ipv6/proc.c
@arnd @ljs yeah, my idea was to do it in the caller, not in the macro. stuff like (types made up. I have no idea what they are):
const int pfn_down_max = PFN_DOWN(MAXMEM);
const int extra_ratio = EXTRA_MEM_RATIO*min(max_pfn, pfn_down_max);
const int leftover = max_pages - max_pfn;
extra_pages = min3(extra_ratio, extra_pages, leftover);
or whatever. Might even increase readability (esp. with comments) in addition to limiting macro expansion blowout.
@ljs @arnd ah, that makes perfect sense.
It's a bit frustrating because this would be “trivial” to fix with C++ thanks to templates and constexpr functions (would even work “everywhere” including array indexing).
I remember there used to be a discussion on LKML about having an automatic selector for constant vs non-constant expressions with some fancy tricks, and it was specifically to do this kind of selection, but, did it got anywhere?
@ljs @arnd I found the discussion I was thinking of. Thread starts here:
https://lkml.org/lkml/2018/3/20/805
and this message specifically is what I was thinking of: