Conversation

Recently, the stable kernel has become nonsense. Those hundreds or thousands of patches are not properly reviewed before being pulled in. Every release cycle it's the same: let's pull a shitload of crap and let the user figure out what broke and why. Stable kernels should be used only by those bots that generate the stream of patches for them. No wonder XFS prohibited stable backports; maybe other subsystems should just do the same.

@oleksandr

or maybe people should just help improve the situation, which has various causes (I'm fighting one of them: regression fixes that are queued for the next cycle instead of being applied immediately).

Things already slowly got better: some of the early stable kernels had 1000+ patches; I haven't checked this for real, but I got the impression that this number is slowly shrinking.

@oleksandr

and btw, I'd say "not properly reviewed" is somewhat hyperbolic or not valuing what maintainers did, as 99+% of those patches were applied by the appropriate subsystem maintainers.

At least when they were merged for mainline. For the stable backport (did you maybe mean that?) it's different, but no subsystem maintainer is obliged to participate in the maintenance of the stable and longterm kernels. That's maybe the real problem here, but that's how it is for now.

@kernellogger I'm currently looking at 4baf12181509, for instance. Even if it is not the cause of my issue, I'm still not quite sure why it was backported into the stable kernel. There's no Cc: that'd say "take it into stable". Yes, it got Fixes:, but this is not an urgent regression fix, the absence of which would render thousands of machines unusable.

@kernellogger I'm talking about stable kernel backports only.

@kernellogger where I personally can help, I do help. I soak all the stable releases on the machine I use daily before deploying them across the server fleet and, what's more important, my wife's laptop. If I find something, I rant loudly, I go to LKML, I collaborate, things get fixed eventually, but hear me out please, this is not what the stable kernel is for. I do not understand why I, as a user, get slapped with hundreds of blindly backported patches. I may just go and use -rc releases instead with more success, because they are consistent and do not contain patches picked at random with the help of some AI bot.

@oleksandr

I'm out here. Yes, there is a lot to criticize and to improve wrt stable trees, but you want to check your claims before making them:

That AI bot is called "autosel". It suggested some backports to 6.6.y; about 180 afaics: https://lore.kernel.org/all/?q=%22%5Bpatch+autosel+6.6%22 At least some (maybe all) of them are not yet in 6.6.y afaics.

Fixes tags were afaics the main reason why most of the patches went into 6.6.2:

git log v6.6.1..v6.6.2 --grep='Fixes' --grep 'C[Cc].*stable' --oneline | wc -l
554

@kernellogger cherry-picking commits using the "Fixes:" tag is some sort of AI as well. And FWIW:

$ git log --oneline v6.6.1..v6.6.2 --all-match --grep 'Fixes:' --grep 'C[Cc].*stable' | wc -l
6
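
The two counts differ because of how `git log` combines several `--grep` options: by default a commit is listed if its message matches any of the patterns, while `--all-match` requires every pattern to match. A minimal Python sketch of that difference follows. The sample commit messages and hashes are made up for illustration; only the two patterns are taken from the command above.

```python
import re

# Patterns copied from the `git log` command above.
FIXES = re.compile(r"Fixes:")
STABLE = re.compile(r"C[Cc].*stable")

# Invented commit messages, for illustration only.
SAMPLE = [
    'usb: fix foo\n\nFixes: 123456789abc ("usb: add foo")\n',
    'mm: fix bar\n\nFixes: abcdef012345 ("mm: add bar")\nCc: stable@vger.kernel.org\n',
    "net: fix baz\n\nCc: <stable@vger.kernel.org>\n",
    "docs: typo\n",
]


def count_any(messages):
    """Model of the default: a commit counts if ANY pattern matches."""
    return sum(1 for m in messages if FIXES.search(m) or STABLE.search(m))


def count_all(messages):
    """Model of --all-match: a commit counts only if ALL patterns match."""
    return sum(1 for m in messages if FIXES.search(m) and STABLE.search(m))


if __name__ == "__main__":
    print("any pattern :", count_any(SAMPLE))
    print("all patterns:", count_all(SAMPLE))
```

Read this way, the first number in the thread counts commits that carry either tag, and the second counts only commits that carry both.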

@oleksandr

I wouldn't call Greg and his scripts an AI 😬

And the lack of stable tags is well known: many developers assume that a Fixes: tag is enough.

@kernellogger as if AI is something exclusively bad? As for the lack of stable tags, this, of course, deserves more coverage, be it an LWN article, summit discussion or whatever. But maybe sometimes an absence of a stable Cc: tag means the fix is not intended to be backported?

@oleksandr @kernellogger No, it is not an AI. It's a simple and fixed decision: Is this commit fixing a bug in a previous release? If yes, then let's backport to fix that bug.

Now you claim that the "Fixes" tag is for commits not fixing bugs, or that the bugs are not important. In the first case, the Fixes tag would have been added incorrectly to the commit. It is purely for fixing known bugs. In the second case, how do you know which bugs are important and which are not? All bugs are bugs which we want to fix...

@kernellogger @krzk this particular fix comes from a series of 16 patches (https://lore.kernel.org/all/20231019102924.2797346-1-mathias.nyman@linux.intel.com/) titled "xhci features". There's another patch in that series, a5d6264b638e, which looks tightly coupled to 4baf12181509, but it was not picked (of course, as it doesn't have a Fixes: tag). Again, I'm not claiming this is the cause of my issue (I've just rebuilt the kernel with v6.6.2 + reverted 4baf12181509 and will test it for a couple of days), but this approach looks completely wrong: pick a random patch that happens to have a Fixes: tag from a series of 16, which is titled as a "features" series, and do not pick another one which is tightly related. This backport was not properly reviewed. Yes, probably, the -next submission was not properly tagged either.

@oleksandr

The absence can mean various things. Most notably it can mean "I only care about mainline and don't want to have anything to do with a backport".

I think I once suggested a "nostable" tag to greg (or maybe even at a maintainers summit), but new tags are (rightfully) frowned upon.

@kernellogger My humble and ignorant opinion is that the stable backport should happen only if it is approved/reviewed/requested by a subsystem_maintainer/patch_author/user_who_verified_it, and this procedure would not need any new tags. The key word here is "reviewed" however, because looking at hundreds of patches flowing into the stable kernel as backports I do not think they are really reviewed properly, that's physically impossible.

@oleksandr @kernellogger That's not a problem of the stable backporting process. At that stage it is not known that this was part of a 16-patch features set. It is backported from Git. At this stage all the patchset hierarchy is gone and not really important. What's in the Git tree is important.

The problem here was the original submission:
1. Mixing fixes with features in same patchset.
2. Putting fixes into the middle of a patchset.
3. And maybe: tagging a commit as "Fixes" for something that is not a fix.

Therefore please complain to the submitter and optionally to the maintainer, not about the stable backport.

Otherwise please explain to me why a fix for a known bug should not be backported, as I described in my previous post.

@oleksandr @kernellogger I think you start to appreciate why @suse kernel engineers don't care much about stable kernels and do their own tracking and backporting.

Now, they are a whole team of engineers who work full time mostly on doing that and get paid for the job…

@ptesarik @oleksandr @suse

which brings us to the old claim "stable kernels would be a lot better and good enough for almost every use case, *if* all the effort that companies invest in locally maintaining some old version were spent on helping to improve stable kernels" 🥴

@kernellogger @krzk First of all, I do not complain to anyone in particular at this point (and please don't tell me what to do within my microblog account). I will bisect it as needed, and I will work with the appropriate people once I gather enough evidence to approach them in a proper way, publicly, via email + mailing list. Second, I cannot agree the patch hierarchy is gone: there's lore.kernel.org that preserves it. Third, as expressed previously, I do agree the original submission is likely flawed as well. Fourth, I do insist that reviewing what's being backported is necessary because of what likely happened in this very case: a fix might be mislabeled and split into two commits, one of which may be missed.

@kernellogger @oleksandr @suse which brings up the old question how the maintenance of such a commodity kernel should be funded 🤔

@oleksandr @kernellogger mm has this arrangement that only patches with Cc: stable can be taken, and we apply it very deliberately. Every once in a while the arrangement is forgotten and we have to explain again that yes, we want to keep it.
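
As a rough illustration of the two selection rules mentioned in this thread (take a commit that carries a Fixes: or Cc: stable tag, versus the mm arrangement of taking only Cc: stable), here is a hypothetical Python sketch. The function name, the `mm/` path-prefix check, and the regular expressions are assumptions made for illustration. They are not the stable team's actual scripts, and any human judgment applied on top of the tags is not modelled.

```python
import re

# Assumed tag formats: trailers at the start of a line in the commit message.
FIXES_TAG = re.compile(r"^Fixes:", re.MULTILINE)
STABLE_TAG = re.compile(r"^Cc:.*stable@", re.MULTILINE | re.IGNORECASE)

# Hypothetical list of path prefixes whose maintainers want Cc: stable only.
STABLE_TAG_ONLY_PREFIXES = ("mm/",)


def is_backport_candidate(message, paths):
    """Return True if a commit would be a backport candidate under this toy policy."""
    has_stable = bool(STABLE_TAG.search(message))
    has_fixes = bool(FIXES_TAG.search(message))
    # If any touched file belongs to an opt-in-only subsystem, require Cc: stable.
    opt_in_only = any(p.startswith(STABLE_TAG_ONLY_PREFIXES) for p in paths)
    if opt_in_only:
        return has_stable
    return has_stable or has_fixes


if __name__ == "__main__":
    msg = "subsys: fix something\n\nFixes: 123456789abc\n"
    print(is_backport_candidate(msg, ["drivers/usb/host/xhci.c"]))
    print(is_backport_candidate(msg, ["mm/slub.c"]))
```

The point of the sketch is only that the same Fixes:-only commit can be a candidate in one subsystem and not in another, which is the predictability question raised in this thread.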

@kernellogger @vbabka set joke_mode on; From every -rc I'll pick only Cc: stable@ patches and call it linux-steady.git; set joke_mode off. Or maybe this should not be a joke?..

@oleksandr I always found it weird that there are two "stable"s and that "longterm" kernels exist at all; these shouldn't be upstream's responsibility.

@kernellogger that may be true but isn’t what the https://www.kernel.org/doc/html/latest/_sources/process/stable-kernel-rules.rst.txt doc says. If Cc stable is optional, then someone should post a patch to update the text.

I strongly agree with @oleksandr. Stable kernels are too aggressive on backporting fixes, and I’ve seen that cause more harm than good many times. Probably I’m biased since I only see the fallout, but still the process should match the documented rules IMO.

@javierm @oleksandr

It is not true and Greg occasionally reminds people about that. But he afaics has to deal with reality. And in that there afaics are a lot of fixes that *definitely should* be backported (like regression fixes) that for one reason or another lack a stable tag.

FWIW, don't get me wrong, I partly agree with "stable kernel are too aggressive on backporting fixes" as well[1]. But at the same time I can understand why it's like that under the current circumstances.

[1] That…

@javierm @oleksandr

…being said, I think the bigger problem is somewhere else: Greg from my point of view backports some fixes too quickly (e.g. before normal people have a chance to find and report a regression in mainline). But that afaics is also mainly due to current reality, as there is no easy way for Greg to tell "quick backport needed" and "can wait a bit" patches apart.

@krzk @oleksandr @kernellogger which bugs are important? The rules say it quite explicitly https://www.kernel.org/doc/html/latest/_sources/process/stable-kernel-rules.rst.txt
But in practice anything goes, I don't really get it.

@kernellogger @oleksandr the process should be followed IMO; whether it needs to be adapted to “reality” or not, I don’t know, but people need to know what to expect. I’m considering just not using Fixes: anymore; unfortunately, it is something that’s useful for distros, who usually do due diligence on the fixes that are backported.

@javierm @oleksandr

Which process or rule do you mean when you say "process should be followed"?

@kernellogger @oleksandr the one that’s documented in the stable kernel rules text I shared.

@javierm @oleksandr

"I’m considering just not using Fixes: anymore". That way we'd make the problem worse. In fact, that's how the problem already got worse, as I know some people already stopped using stable tags because a Fixes tag seemed to be enough.

@javierm @oleksandr

but it does not say that the stable team is not allowed to pick up other fixes as they see fit.

@kernellogger @oleksandr it does not say that they will either. And that is exactly what I’m arguing: that they are using different criteria than what’s documented in their own rules.

@javierm @oleksandr

Maybe.

Let's leave it at that.

From the outside my toots will look like I'm defending the stable process as it is and actually liking it that way. But that is not the case at all, I have my beef with it as well. I just tried to bring in the nuanced view of why things are as they are currently.

I'll put it on my list of things to bring up wrt regressions at next year's maintainers summit; but the list is already long. 🥴

@kernellogger @oleksandr I’m not saying that is bad, just that the nuances should be documented.

@javierm
On the contrary, you should use Fixes: as much as possible, BUT also use Cc: stable@ where appropriate. This, along with reframing the criteria for stable backports, would make stable kernels much more predictable.
@kernellogger

@kernellogger
Last time I had a chance to discuss this with the Right Honourable member of linux-stable.git I was told that if I wanted the stable kernel updates to shrink, I should ask upstream developers to send fewer fixes. Which is utter nonsense. I hope the Right Honourable member of linux-stable.git is aware of all the nuances and caveats that popped up in this discussion, or in case he isn't, there's definitely an opportunity to talk about it more intensively and extensively, and I thank you in advance for putting this on your List of Things.
@javierm

@oleksandr @kernellogger yes I know, but if that documented process is not followed then I prefer to just opt out of the automatic backporting and just send another patch and Cc stable when I consider it appropriate. All I’m asking is for the _real_ process and expectations to be documented in the stable rules, that’s all.

@ptesarik @oleksandr @kernellogger @suse Oh yes, the franken-kernels with selected fixes for a 4-year-old kernel... Or maybe even older, because Suse customers do not like to update and test their stuff (reasonable, no one likes testing!).

@krzk
Given the current real-world practices, "stable" kernels are similarly "frankenkernels" too, just much less tested and reviewed.
@suse @kernellogger @ptesarik

@oleksandr @suse @kernellogger @ptesarik "Frankenkernel" is a very old kernel which consists of thousands of backports, thus like Frankenstein.
Stable kernels are not that, because they cease to exist at some point. You need to move to a newer kernel, thus backporting stops. Beyond that point, it's the distro (Redhat, Suse, also Canonical for some extended support) who creates the Frankenkernel.

And if you ever want to call stable upstream a "Frankenkernel", then Suse and Redhat create Uber-super-Franken-master-kernel...

@oleksandr What do you mean by "properly reviewed"? They have all passed the normal review process in that they are in Linus's tree. If they are good enough for the next release, why are they not good enough for the previous one?

And as always, reviews for things that you think should not be included are greatly appreciated, we can't do any of this without you!

@oleksandr @kernellogger That commit was tagged that way so it would be properly backported, if it's buggy then please let the developers know about it!

@oleksandr @kernellogger @vbabka You can't do that, as many many developers do not properly tag real bugfixes with cc: stable, which is why we now take anything with Fixes: on it, when they seem sane.

Again, the patch that caused you problems here was marked this way so that it did get some "soaking" in linux-next and delayed the stable backport for a few weeks on purpose. It flowed into stable in the correct way, this is as designed.

Well, except for the breakage, but that's what normally happens with hardware, go blame the vendors for that :)

@krzk
There's v4.14.330, and there's v4.18.0-513.5.1.el8_9. "Stable" LTS can be as old.
@suse @kernellogger @ptesarik

@oleksandr @suse @kernellogger @ptesarik
v4.14 is a SLTS, so it is old indeed, but we do not compare it with RHEL v4.18. A few years ago (2020) RHEL was still updating v3.10. This is the Uber-Franken-kernel we talk about.
In 2018 RHEL released a v2.6 kernel. v2.6, can you imagine?

And do you know that also features get backported to enterprise distro Frankenkernels?

@oleksandr @kernellogger @ptesarik @suse Although to be honest, that v2.6 (v2.6.32 to be specific) was kept as SLTS by the community till 2016.

@krzk
I work for RH in one of the Kernel teams, I do know stuff :).
@suse @kernellogger @ptesarik

@ptesarik @oleksandr @kernellogger @suse Appreciate @suse kernels? That's bollocks! Suse has the same 4baf12181 commit in their SLE15-SP6 tree! A whole team of engineers backported the same commit questioned here, instead of working on upstream stable backports, and this should be the argument to "appreciate" @suse.
You get absolutely nothing with enterprise kernels. If a bug is in mainline or upstream stable, Suse and Redhat have the bug as well.

@oleksandr @krzk @suse @kernellogger But only Red Hat managed to release a 2.6.40 kernel. 😜

@ptesarik @oleksandr @suse @kernellogger v2.6.40 is a development kernel, so it's actually a perfectly fine and valid release. :)

@krzk @oleksandr @suse @kernellogger I'm just teasing you, of course. We both know the true reason: Too many tools assumed that the kernel was always 2.6.x, and this was the easiest way out.

@kernellogger @javierm @oleksandr but the last summit was just a week ago :O

@vbabka @oleksandr @kernellogger there’s no need to wait for a summit though. I don’t understand why updating a process doc to match reality should be a controversial take.

@vbabka @oleksandr @javierm

But I wasn't there. And even if I had been, for now I have other things higher on my priority list anyway. 🥴

@javierm @vbabka @oleksandr

Then why don't you submit a patch? Yes, I touched that document last, but only because I saw people struggling with it in the scope of regressions and because I needed a distraction on a long train ride.

@kernellogger @vbabka @oleksandr I won’t submit a patch because I don’t honestly know what the process is or the criteria that stable kernel maintainers use to pick the patches for backporting. Someone who already knows this should document that.
