Recently, the stable kernel has become a nonsense. Those hundreds or thousands of patches are not properly reviewed before being pulled in. Every release cycle it's the same: let's pull a shitload of crap and let the user figure out what broke and why. Stable kernels should be used only by those bots that generate the stream of patches for them. No wonder XFS prohibited stable backports; maybe other subsystems should just do the same.
or maybe people should just help improve the situation, which has various causes (I'm fighting one of them: regression fixes that are queued for the next cycle instead of being applied immediately).
Things already slowly got better: some of the early stable kernels had 1000+ patches; I haven't checked this for real, but I got the impression that this number is slowly shrinking.
and btw, I'd say "not properly reviewed" is somewhat hyperbolic or not valuing what maintainers did, as 99+% of those patches were applied by the appropriate subsystem maintainers.
At least when they were merged for mainline. For the stable backport (did you maybe mean that?) it's different, but no subsystem maintainer is obliged to participate in the maintenance of the stable and longterm kernels. That's maybe the real problem here, but that's how it is for now.
@kernellogger I'm currently looking at 4baf12181509, for instance. Even if it is not the cause of my issue, I'm still not quite sure why it was backported into the stable kernel. There's no Cc: that'd say "take it into stable". Yes, it got Fixes:, but this is not an urgent regression fix the absence of which would render thousands of machines unusable.
@kernellogger I'm talking about stable kernel backports only.
@kernellogger where I personally can help, I do help. I soak all the stable releases through the machine I use daily before deploying them across the server fleet and, what's more important, my wife's laptop. If I find something, I rant loudly, I go to LKML, I collaborate, things get fixed eventually, but hear me out please, this is not what the stable kernel is for. I do not understand why I, as a user, get slapped by hundreds of blindly backported patches. I may just go and use -rc releases instead with more success, because they are consistent and do not contain patches randomly picked with the help of some AI bot.
I'm out here. Yes, there is a lot to criticize and to improve wrt stable trees, but you want to check your claims before making them:
That AI bot is called "autosel". It suggested some backports to 6.6.y; about 180 afaics: https://lore.kernel.org/all/?q=%22%5Bpatch+autosel+6.6%22 At least some (maybe all) of them are not yet in 6.6.y afaics.
Fixes tags were afaics the main reason why most of the patches went into 6.6.2:
git log v6.6.1..v6.6.2 --grep='Fixes' --grep 'C[Cc].*stable' --oneline | wc -l
554
@kernellogger cherry-picking commits using the "Fixes:" tag is some sort of AI as well. And FWIW:
$ git log --oneline v6.6.1..v6.6.2 --all-match --grep 'Fixes:' --grep 'C[Cc].*stable' | wc -l
6
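The two counts above differ because of how git combines several --grep patterns: by default a commit is listed if it matches at least one pattern, while --all-match requires every pattern to match somewhere in the commit message. A minimal sketch of that behaviour, assuming a throwaway repository with made-up commits (the commit subjects and the hashes in the Fixes: lines are placeholders, not real kernel commits):

```shell
#!/bin/sh
# Sketch: show how multiple --grep patterns combine, using a scratch repo.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo

# Helper for the demo: create an empty commit with the given message.
commit() {
    git commit -q --allow-empty -m "$1"
}

commit "base"
git tag base
commit "fix a

Fixes: 111111111111 (\"placeholder\")"
commit "fix b

Cc: stable@vger.kernel.org"
commit "fix c

Fixes: 222222222222 (\"placeholder\")
Cc: stable@vger.kernel.org"
commit "feature d"

# Default: a commit is listed if ANY pattern matches (logical OR).
either=$(git log --oneline base..HEAD --grep 'Fixes:' --grep 'C[Cc].*stable' | wc -l | tr -d ' ')

# --all-match: a commit is listed only if ALL patterns match (logical AND).
both=$(git log --oneline base..HEAD --all-match --grep 'Fixes:' --grep 'C[Cc].*stable' | wc -l | tr -d ' ')

echo "either tag: $either"   # fix a, fix b, fix c
echo "both tags:  $both"     # fix c only
```

Read this way, the 554 counts commits carrying a Fixes: tag or a stable Cc (or both), and the 6 counts commits carrying both.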
I wouldn't call Greg and his scripts an AI 😬
And the lack of stable tags is well known: many developers assume that a Fixes: tag is enough.
@kernellogger as if AI is something exclusively bad? As for the lack of stable tags, this, of course, deserves more coverage, be it an LWN article, summit discussion or whatever. But maybe sometimes an absence of a stable Cc: tag means the fix is not intended to be backported?
@kernellogger @krzk this particular fix comes from a series of 16 patches (https://lore.kernel.org/all/20231019102924.2797346-1-mathias.nyman@linux.intel.com/) titled as "xhci features". There's another patch in that series, a5d6264b638e, which looks tightly coupled to 4baf12181509, but it was not picked (of course, as it doesn't have a Fixes: tag). Again, I'm not claiming this is the cause of my issue (I've just rebuilt the kernel with v6.6.2 + reverted 4baf12181509 and will test it for a couple of days), but this approach looks completely wrong: pick a random patch that happens to have a Fixes: tag from a series of 16, which is titled as a "features" series, and do not pick another one which is tightly related. This backport was not properly reviewed. Yes, probably, the -next submission was not properly tagged either.
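One way to check which patches of a series actually landed in a stable release is to search the stable commit messages for the upstream hash, since backports usually carry a line such as "commit <hash> upstream." or "[ Upstream commit <hash> ]". A rough sketch under stated assumptions: is_backported is a hypothetical helper, the scratch repository and its tags stand in for v6.6.1 and v6.6.2, and the demo message abbreviates the hash where a real backport would carry the full 40-character one.

```shell
#!/bin/sh
# Sketch: check whether an upstream commit is referenced by a backport in a range.
set -eu

# Hypothetical helper: succeeds if any commit in range $1 has a message
# containing "commit <hash>", which covers both common reference styles.
# Caveat: it would also match other mentions, e.g. "This reverts commit <hash>".
is_backported() {
    [ -n "$(git log --oneline --grep "commit $2" "$1")" ]
}

# Scratch repo standing in for a stable tree (all commits are made up).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo
git commit -q --allow-empty -m "stand-in for the older release"
git tag old
git commit -q --allow-empty -m "stand-in backport

commit 4baf12181509 upstream.
(hash abbreviated for this demo)"
git tag new

for h in 4baf12181509 a5d6264b638e; do
    if is_backported old..new "$h"; then
        echo "$h: referenced by a backport"
    else
        echo "$h: not found in range"
    fi
done
```

Against a real stable tree the range would be something like v6.6.1..v6.6.2, and the result depends on what was actually queued.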
The absence can mean various things. Most notably it can mean "I only care about mainline and don't want to have anything to do with a backport".
I think I once suggested a "nostable" tag to greg (or maybe even at a maintainers summit), but new tags are (rightfully) frowned upon.
@kernellogger My humble and ignorant opinion is that the stable backport should happen only if it is approved/reviewed/requested by a subsystem_maintainer/patch_author/user_who_verified_it, and this procedure would not need any new tags. The key word here is "reviewed" however, because looking at hundreds of patches flowing into the stable kernel as backports I do not think they are really reviewed properly, that's physically impossible.
@oleksandr @kernellogger I think you start to appreciate why @suse kernel engineers don't care much about stable kernels and do their own tracking and backporting.
Now, they are a whole team of engineers who work full time mostly on doing that and get paid for the job…
which brings us to the old claim "stable kernels would be a lot better and good enough for almost every use case, *if* all the effort that companies invest in locally maintaining some old version were spent on helping improve stable kernels" 🥴
@kernellogger @suse @ptesarik in Red Hat we do the same :)
@ptesarik @suse @kernellogger cannot disagree with this
@kernellogger @krzk First of all, I do not complain to anyone in particular at this point (and please don't tell me what to do within my microblog account). I will bisect it as needed, and I will work with the appropriate people once I gather enough evidence to approach them in a proper way, publicly, via email + mailing list. Second, I cannot agree the patch hierarchy is gone: there's lore.kernel.org that preserves it. Third, as expressed previously, I do agree the original submission is likely flawed as well. Fourth, I do insist that reviewing what's being backported is necessary because of what likely happened in this very case: a fix might be mislabeled and split into two commits, one of which may be missed.
@kernellogger @oleksandr @suse which brings up the old question how the maintenance of such a commodity kernel should be funded 🤔
@kernellogger @vbabka set joke_mode on; From every -rc I'll pick only Cc: stable@ patches and call it linux-steady.git; set joke_mode off. Or maybe this should not be a joke?..
@kernellogger that may be true but isn’t what the https://www.kernel.org/doc/html/latest/_sources/process/stable-kernel-rules.rst.txt doc says. If Cc stable is optional, then someone should post a patch to update the text.
I strongly agree with @oleksandr. Stable kernels are too aggressive on backporting fixes and so many times I’ve seen that causing more harm than good. Probably I’m biased since I only see the fallout, but still the process should match the documented rules IMO.
It is not true and Greg occasionally reminds people about that. But he afaics has to deal with reality. And in reality there afaics are a lot of fixes that *definitely should* be backported (like regression fixes) that for one reason or another lack a stable tag.
FWIW, don't get me wrong, I partly agree with "stable kernel are too aggressive on backporting fixes" as well[1]. But at the same time I can understand why it's like that under the current circumstances.
[1] That being said, I think the bigger problem is somewhere else: Greg, from my point of view, backports some fixes too quickly (e.g. before normal people have a chance to find and report a regression in mainline). But that afaics is also mainly due to current reality, as there is no easy way for Greg to tell "quick backport needed" and "can wait a bit" patches apart.
@kernellogger @oleksandr the process should be followed IMO; whether it needs to be adapted to “reality” or not I don’t know, but people need to know what to expect. I’m considering just not using Fixes: anymore; unfortunately it is something that’s useful for distros, who usually do due diligence on the fixes that are backported.
Which process or rule do you mean when you say "process should be followed"?
@kernellogger @oleksandr the one that’s documented in the stable kernel rules text I shared.
"I’m considering just not using Fixes: anymore". That way we'd make the problem worse. In fact, that's how the problem already got worse, as I know some people already stopped using stable tags because a Fixes: tag seemed to be enough.
but it does not say that the stable team is not allowed to pick up other fixes as they see fit.
@kernellogger @oleksandr it does not say that they will either. And that is exactly what I’m arguing: that they are using different criteria than what’s documented in their own rules.
Maybe.
Let's leave it at that.
From the outside my toots will look like I'm defending the stable process as it is and actually like it that way. But that is not the case at all, I have my beef with it as well. I just tried to bring in the nuanced view of why things are as they are currently.
I'll put it on my list of things to bring up wrt regressions at next year's maintainers summit; but the list is already long. 🥴
@kernellogger @oleksandr I’m not saying that is bad just that the nuances should be documented.
@javierm On the contrary, you should use Fixes: as much as possible, BUT also use Cc: stable@ where appropriate. This, along with reframing the criteria for stable backports, would make stable kernels much more predictable.
@kernellogger
@kernellogger Last time I had a chance to discuss this with the Right Honourable member of linux-stable.git I was told that if I wanted the stable kernel updates to shrink, I should have asked upstream developers to send fewer fixes. Which is utter nonsense. I hope the Right Honourable member of linux-stable.git is aware of all the nuisances and caveats that popped up in this discussion, or in case he isn't, there's definitely an opportunity to talk about it more intensively and extensively, and I thank you in advance for putting this on your List of Things.
@javierm
@oleksandr @kernellogger yes I know, but if that documented process is not followed then I prefer to just opt out of the automatic backporting and just send another patch and Cc stable when I consider it appropriate. All I’m asking is for the _real_ process and expectations to be documented in the stable rules, that’s all.
@krzk
Given the current real-world practices, "stable" kernels are similarly "frankenkernels" too, just much less tested and reviewed.
@suse @kernellogger @ptesarik
@krzk
There's v4.14.330, and there's v4.18.0-513.5.1.el8_9. "Stable" LTS can be as old.
@suse @kernellogger @ptesarik
@krzk
I work for RH in one of the Kernel teams, I do know stuff :).
@suse @kernellogger @ptesarik
@oleksandr @krzk @suse @kernellogger But only Red Hat managed to release a 2.6.40 kernel. 😜
@krzk @oleksandr @suse @kernellogger I'm just teasing you, of course. We both know the true reason: Too many tools assumed that the kernel was always 2.6.x, and this was the easiest way out.
@vbabka @oleksandr @kernellogger there’s no need to wait for a summit though. I don’t understand why updating a process doc to match reality should be a controversial take.
But I wasn't there. And even if I had been, for now I have other things higher on my priority list anyway. 🥴
Then why don't you submit a patch? Yes, I touched that document last, but only because I saw people struggling with it in the scope of regressions and because I needed a distraction on a long train ride.
@kernellogger @vbabka @oleksandr I won’t submit a patch because I don’t honestly know what the process is or the criteria that stable kernel maintainers use to pick the patches for backporting. Someone who already knows this should document that.