Conversation

I had a feeling that kernel compilation got slower recently and tried to find the slowest file across randconfig builds. It turned out to be arch/x86/xen/setup.c, which takes 15 seconds to preprocess on a reasonably fast Apple M1 Ultra.

This all comes from one line "extra_pages = min3(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)), extra_pages, max_pages - max_pfn);" that expands to 47MB of preprocessor output after commits 80fcac55385c..867046cc70277.

6
19
4

@arnd Opening arch/x86/xen/setup.i and going to the end of the offending line looks like a good DoS-attach on vim.

0
0
0

@arnd 47M from a single composition of max()?? I am terrified to go look now ...

0
3
0

@arnd But ... how? Even if it expanded to 47 kilobytes, that would be excessive, but 47 *megabytes*? 🤯

1
0
0

@KeyJ it nests min() multiple levels deep with the use of min3(), and each one expands its argument 20 times times now (up from 6 back in linux-6.6). This gets 8000 expansions for each of the arguments, plus a lot of extra bits with each expansion. PFN_DOWN(MAXMEM) contributes a bit to the initial size as well.

See https://pastebin.com/MmfWH7TM for the first few pages of it.

1
0
0
@arnd we going to revert that series then?

I feel there is pedantry at play...
2
0
1

@ljs Unfortunately, it seems a lot of code started relying on the added features, so it's not as easy as reverting it, and the original version wasn't great either.

Not sure how to best fix it, maybe there is a chance to make a neat version that doesn't produce a constexpr and change all the callers that actually need a constexpr to use a simpler macro.

1
0
0
@arnd worth posting to the linux-mm ML at least?

I mean this is unacceptable
0
0
3

@ljs @arnd eh I don't think it's fair to call this pedantry. This looks like a legitimate quality of life improvement for a common annoyance with min/max, but given the consequences, it's fair to say that it's not worth it.

1
0
0
@osandov @arnd Regardless, this is an unacceptable build regression.

Chatted to @arnd on IRC who suggests he is tied up with stuff to dig into this more so I'll have a wee look :)

EDIT: Think that'll be more likely a report to list + leave others to solve as this seems quite involved and I have a ton of things to do :>)
3
1
7

@arnd @KeyJ and this is effective and efficient code? Doesn’t look like it to me - but I’m not a kernel developer.

1
0
0

@dirksteins @arnd @KeyJ It's a huge constant expression that gets evaluated by the compiler. The final code is (probably) fine.

1
0
0

If anyone wonders what other files are bad, this is the list of all files with over five second compile times: https://pastebin.com/raw/fXJe7y9P.

The worst case for this particular file was 44 seconds, three times slower than in defconfig.

The median compile time across all files is 0.26s.

3
2
2

@arnd clicked through and pretty much got what I expected 😔

0
0
0

@mansr @arnd @KeyJ probably. But I still ask myself if a 47MB expression is useful and if the compiler is able to process this correctly (it probably is, but is it efficient?).

1
0
0

@dirksteins @arnd @KeyJ There's no reason to doubt the correctness, but as has been noted, it is somewhat slow.

Now as shocking as this might be, it is quite tame compared to the boost C++ library.

0
0
0

@arnd I guess the AMD ones are all the huge register definitions they have?

0
0
0
@arnd so you're telling me that, not only does amdgpu produce most of the build issues that I run into, it also is one of the worst things to enable when it comes to my build time? Hmmge
1
0
3

@ljs @osandov @arnd bro just get a faster pdp-11

1
1
1

@conor the other thing to realize about this driver is that it's about the same size as linux-2.6.12, 5.8M lines of code. I had a look at what makes it so slow to compile and as far as I can tell it's just the size of the individual files and how they contain functions with thousands of lines and dozens of arguments.

0
0
1

@lkundrak @ljs @osandov @arnd All the arguments against preprocessor are always won by preprocessor.

0
0
1
@osandov @arnd For those following, posted https://lore.kernel.org/linux-mm/c83c17bb-be75-4c67-979d-54eee38774c6@lucifer.local/T/#u summarising situation + showing some results on my local box (ignore cringe hostname of said box)
0
2
10

@ljs @osandov @arnd haven't seen the code, but would it possible to cut this down by using some static const intermediary values?

1
0
0
@oblomov @osandov @arnd I'm no expert on the behaviour of the pre-processor or C const behaviour + don't really have time to dig deep but based on discussions on IRC - there's some really subtle complexity around const behaviour (esp. in scenarios like array indices where it truly has to be const) that I believe makes a relatively straight forward solution trickier.

There are bits of existing code that do this but again, seems super subtle.

Anyway I will leave the details of a solution to others :)
1
0
3

@ljs I think @oblomov is suggesting an intermediate variable in the specific caller, which is a trivial fix and will of course work as it reduces the 47MB to a few KB. The thing we can't easily do is have a temporary variable inside of the min()/max() macros themselves because that prevents them from being used as a C constant expression, e.g. as an array length inside of a type definition.

2
0
1
@arnd @oblomov ah right yeah.

The problem is we have tons of callers so actually doing that would be a giant pain.

I feel like a way forward may be to have a specific macro for the extremely strict case (type decl) and a looser one for everywhere else that eliminates this.

But question is - how often are we using min()/max() + friends in places like array lengths where we must strictly be const?
1
0
3

@ljs @oblomov Not a lot. In an x86 allmodconfig build I had to fix up these 11 files with a simplified min()/max(). There are probably another dozen for other configs.

drivers/edac/sb_edac.c
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
drivers/gpu/drm/drm_color_mgmt.c
drivers/input/touchscreen/cyttsp4_core.c
drivers/md/dm-integrity.c
drivers/net/can/usb/etas_es58x/es58x_devlink.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
fs/btrfs/tree-checker.c
lib/vsprintf.c
net/ipv4/proc.c
net/ipv6/proc.c

1
0
0
@arnd @oblomov well this solution seems viable then? maybe have a cmin()/cmax() for these scenarios or something like that.
0
0
2

@arnd @ljs yeah, my idea was to do it in the caller, not in the macro. stuff like (types made up. I have no idea what they are):

const int pfn_down_max = PFN_DOWN(MAXMEM);
const int extra_ratio = EXTRA_MEM_RATIO*min(max_pfn, pfn_down_max);
const int leftover = max_pages - max_pfn;
extra_pages = min3(extra_ratio, extra_pages, leftover);

or whatever. Might even increase readability (esp. with comments) in addition to limiting macro expansion blowout.

1
0
0
@oblomov @arnd Not criticsing this to be clear that's a fine suggestion [text can be shit for conveying this] :)

But in my tests I found that it's not just xen that's the problem, so you'd have to do this everywhere, which isn't really practical.

I mean it's worth whacking this egregious case but it's a band aid, we definitely need a general solution.

I am pretty certain people have also whacked other egregious cases individually too, got major de ja vous over this.
1
0
3

@ljs @arnd ah, that makes perfect sense.

It's a bit frustrating because this would be “trivial” to fix with C++ thanks to templates and constexpr functions (would even work “everywhere” including array indexing).

I remember there used to be a discussion on LKML about having an automatic selector for constant vs non-constant expressions with some fancy tricks, and it was specifically to do this kind of selection, but, did it got anywhere?

1
0
1
@oblomov @arnd yeah there's stuff like https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm_types.h#n507 which maybe that refers to

But not sure how useful that'd be here, unless you were thinking of something else?
1
0
2

@ljs @arnd I found the discussion I was thinking of. Thread starts here:

https://lkml.org/lkml/2018/3/20/805

and this message specifically is what I was thinking of:

https://lkml.org/lkml/2018/3/21/223

1
0
1

@ljs @arnd on second thoughts, it might not help in this case unless __builtin_choose_expr does some trick on macro expansion to avoid the multi-megabyte nested hell in the non-const case.

Wrong way, damn 8-(

1
0
1