I fully understand why #Linux mainline developers do not have to care about stable #kernel maintenance (IOW: backporting to earlier, still supported series) at all.
And I'm mostly fine with it. But I think it's wrong when it comes to recent mainline regressions that bother users.
Especially when they cause severe damage like disk corruption (as seen by multiple reporters), as in this case:
https://lore.kernel.org/all/20241003160454.3017229-1-Basavaraj.Natikar@amd.com/
Backstory: https://bugzilla.kernel.org/show_bug.cgi?id=219331 & https://lore.kernel.org/all/90f6ee64-df5e-43b2-ad04-fa3a35efc1d5@leemhuis.info/
@kernellogger This is why I learned to value stability above all else on Linux over the years.
I think we should stop using the word "stability" when it comes to software.
* Some people read it as "don't change anything, unless there is a damn strong reason".
* Some people read it as "changes are fine, as long as everything works reliably and does not force me to do any work (like adjusting configuration files)".
* Most people think of different points somewhere in between those two.
That leads to people talking past each other.
@kernellogger Do you have anything watching rates of reports on different CPU types? It was surprising that it turned out to be AMD until the bisect pointed the finger.
Nope. 🙁
Hopefully sooner or later someone will first revive/rewrite kerneloops[1] and afterwards make things like that possible, too.
2/ BTW, if a kernel developer tells you something like "looks like the crash is not related to my change"[1], try to validate that statement and object if needed.
In this particular case the problem then most likely would have been noticed and fixed more than two months ago!
I partly blame myself that it took so long to get this fixed (which afaics would have avoided a lot of trouble for some people!), as I should have told the reporter back then to do the above.
[1] https://lore.kernel.org/all/da9ccae0-504a-48d3-ade5-a16e53b4a9b5@amd.com/
@kernellogger IMHO maintainers very much need to (and should) care about stable! It's what people run, they don't run your development tree or the pristine new release from Linus. It's what distros rely on. Maintainers saying they don't care about stable is lazy and irresponsible.
@axboe @kernellogger I run Linus releases only exactly because I'm fed up with the "stable" stability.
@oleksandr @kernellogger probably these two problems go hand in hand. If maintainers took better care of stable, this would not be an issue.
@ljs: reminder: autosel and the script that quickly backports nearly everything that contains a Fixes: tag are two different things. And I think the former is not much of a problem these days due to some adjustments Sasha did -- but that is just a feeling, I have no stats to back that up.
@vbabka @oleksandr @kernellogger you can just do that if you want. I just watch the queue and complain if something should not get auto picked. Either way works, and I rarely see things that should not get picked.
@ljs @oleksandr @kernellogger @vbabka There's no one true way, and I definitely don't care if maintainers say "don't include us in stable!" if it means that they handle it themselves and send whatever needs to go into stable to stable. What I find problematic is autosel + maintainers ignoring it, and that's clearly an autosel issue. I think it does more good than harm, but it definitely picks up patches it should not because dependencies aren't understood.
@ljs @oleksandr @kernellogger @vbabka auto-select of patches should probably be a subsystem opt-in type of deal, rather than a default thing because of that.
The main issue here is saying stable is, well, unstable. If that's the case, then either maintainers messed up, or the process needs improving (or both). There are bugs in everything, and we need stable point releases. That part is fact. Hence we need to make it work.
@pavel @oleksandr @kernellogger @ljs @vbabka checking it is not necessarily easy. It may look fine and apply fine, but it depends on behavior introduced by a patch that isn't referenced at all, and which may have gone in in an earlier release. So I don't think blaming Sasha is fair, there's just no way anyone can make a 100% safe call on a patch across all subsystems in the kernel.
Maybe autosel should be emailed out, and depend on the maintainer(s) acking it to actually get included.
@pavel @kernellogger Of course they care about stable, implying otherwise is nonsense. You may disagree with how they do that work, but that's a different argument.
@gregkh @axboe @kernellogger @ljs @vbabka Two maintainers and a script cannot be smarter than ten other maintainers.
@ljs @axboe @kernellogger @gregkh @vbabka Don't call it "stable" then?
@ljs @axboe @kernellogger @gregkh @vbabka Both are unstable. And yes, if you are doing things other people should do, those people will not start doing things themselves because why would they. Same with children.
@gregkh @oleksandr @axboe @kernellogger @ljs @vbabka from my perspective, certainly not enough patches are being backported to stable, which makes quality a real issue for stable kernels. Products can't move to the latest and greatest mainline easily. Most people who hit bugs don't end up reporting them, and that is assuming the bug is obvious. Performance and power regressions, for example, are harder to quantify. Stable kernels are what real end users are on after all, not mainline kernels.