social.kernel.org

Conversation

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Jeremy Allison writes:

"The data shows that “frozen” vendor #Linux kernels, created by branching off a release point and then using a team of engineers to select specific patches to back-port to that branch, are buggier than the upstream “stable” Linux #kernel created by Greg Kroah-Hartman."

https://ciq.com/blog/why-a-frozen-linux-kernel-isnt-the-safest-choice-for-security/ #LinuxKernel

Reto

reto@pleroma.labrat.space

Reply to @kernellogger@fosstodon.org

@kernellogger is this really surprising anyone?

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @reto@pleroma.labrat.space

Edited 1 year ago

Within certain circles: rarely.

Outside of them: a lot I'd say.

Side note: https://xkcd.com/1053/

Vegard

vegard@mastodon.social

Reply to @kernellogger@fosstodon.org

@kernellogger On the other hand, what's the difference between a distro branching off and backporting stuff from mainline and upstream stable branching off and backporting stuff from mainline? Why can the upstream stable maintainers do this and "a team of engineers" cannot? I think the difference could be better characterized and (if you pardon the expression) makes all the difference.

Greg K-H

gregkh

Reply to @vegard@mastodon.social

@vegard @kernellogger "a team of engineers" COULD do that (hint, Android does, by just taking the stable updates), but the paper shows that a specific "team of engineers" currently does NOT do that, which puts the users of those kernels potentially at a greater risk.

Read the paper, it's interesting.

Disclosure, I read it before it was published, but had no influence on it at all.

Pavel Machek

pavel

Reply to @kernellogger@fosstodon.org

@kernellogger a) distros want no-regressions, not no-bugs. b) -stable people are CVE authority.

bluca

bluca@fosstodon.org

Reply to @kernellogger@fosstodon.org

Edited 1 year ago

@kernellogger as usual, the point is not that these are bug free, but that they are regression free. The kernel upstream releases break userspace on every new release, and kernel maintainers don't care. See https://github.com/torvalds/linux/commit/a1912f712188291f9d7d434fba155461f1ebef66 for example, as Daan just found out, which removed a mount option without caring that it is still being used, so since 6.8 every btrfs device can no longer be mounted by systemd

Eric Sandeen

sandeen@infosec.exchange

Reply to @gregkh

@gregkh @kernellogger @vegard
If I am reading the whitepaper correctly, the count of "missing fixes" includes fixes for subsystems that are disabled and not shipped in the RHEL kernel, of which there are many:

"In addition, some of these bugs may be in code paths that are disabled via kernel config file settings. No analysis has been done on which bugs may be enabled or disabled for a specific vendor kernel config."

Not restricting the analysis to shipped code makes it very hard to take the paper seriously.

In filesystems alone, there are over 1500 upstream "Fixes" commits for filesystems which are not shipped in RHEL8.8.

That's fully 1/3 of the 4594 "unfixed bugs" they cite.

Am I missing something?
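
As a rough check of the "fully 1/3" figure, the two numbers quoted in this post can be divided directly. This is only a sketch: the variable names are made up for illustration, and 1500 is a lower bound ("over 1500"), so the real share would be at least this large.

```python
# Both figures are the ones quoted in the post above; nothing is taken
# from the whitepaper directly. Variable names are illustrative only.
unshipped_fs_fixes = 1500   # "over 1500" upstream Fixes commits for unshipped filesystems
cited_unfixed_bugs = 4594   # total "unfixed bugs" figure cited from the paper

share = unshipped_fs_fixes / cited_unfixed_bugs
print(f"{share:.1%}")  # 32.7%
```

So "1/3" is a fair rounding of the quoted numbers, assuming the 1500 commits are all counted inside the 4594.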

Lars Marowsky-Brée 😷

larsmb@mastodon.online

Reply to @kernellogger@fosstodon.org

@kernellogger That's to be expected, but it is also not the point of them.
I agree they shouldn't need to exist, but the realities of how many an organization manages its IT necessitate their existence.
The industry doesn't want to go through the withdrawal phase of building a better world.

John Wyatt 🐧

sageofredondo@mastodon.social

Reply to @sandeen@infosec.exchange

@sandeen @gregkh @kernellogger @vegard haven't read the paper yet.

A few things: RHEL kernel developers do not backport all security fixes. We do not mandate backporting moderates and lower (unless they are FedRamp) because customers pay for EUS (which 8.8 is) for stability reasons and backporting certain security fixes may affect stability for this limited term release.

John Wyatt 🐧

sageofredondo@mastodon.social

Reply to @sageofredondo@mastodon.social

@sandeen @gregkh @kernellogger @vegard

Also "In addition, some of these bugs may be in code paths that are disabled via kernel config file settings. No analysis has been done on which bugs may be enabled or disabled for a specific vendor kernel config."

This is a big gap. There are a lot of things we do not support, like btrfs, which is disabled in RHEL.

ljs

Reply to @sageofredondo@mastodon.social

@sageofredondo @sandeen @gregkh @kernellogger @vegard also the "Our data shows that there is a large overlap between these new CVEs and the ongoing growth of unresolved bugs in the RHEL kernels." statement, when these CVEs are just 'oh this is something that got fixed, we assume it might have been a security bug so we're marking it as a CVE'.

Which basically makes the argument 'unless you take stable you have bugs' which is a bit circular...

And keeping in mind that stable is anything but stable, and thus subject to, err... regressions, which is just ignored, you're left with what? This seems very weak.

I guess it's why it's a 'white paper' rather than an actually peer-reviewed paper or something more substantial.

John Wyatt 🐧

sageofredondo@mastodon.social

Edited 1 year ago

@ljs @kernellogger @sandeen @vegard @gregkh Regressions is a good point. I just had to report an rt networking issue we found in RHEL8 that was also in upstream. It was verified, a fix was written for preempt-rt, and Linus even pulled it in an rc. This fix was cleanly cherry-picked back to RHEL9 and 8. Our backports can be so good that our testing on 5+ year old RHEL8 finds issues in the upstream kernel.

ljs

Reply to @sageofredondo@mastodon.social

@sageofredondo @kernellogger @sandeen @vegard @gregkh if only there were somebody in this thread who was somehow concerned with regressions *cough* Thorsten *cough* ;)

John Wyatt 🐧

sageofredondo@mastodon.social

Reply to @sageofredondo@mastodon.social

@ljs @kernellogger @sandeen @vegard @gregkh this also while maintaining a limited kABI guarantee that our customers need for hardware and software compatibility. (Don't get me wrong, I would be very happy if all that was upstreamed.) This is one of the reasons why customers prefer and pay for our kernels.

Greg K-H

gregkh

Reply to @sageofredondo@mastodon.social

@sageofredondo @ljs @kernellogger @sandeen @vegard Android kernels provide a kABI guarantee as well, and have been taking the LTS releases for many many years now (using the tools that a RH developer helped create to ensure a stable kABI, and contributing to them to make them better for you to use as well), so that's really not a valid reason to refuse stable updates, sorry :)

Martyn Welch

MWelchUK@mastodon.social

Reply to @gregkh

@gregkh @kernellogger @vegard As an example, I'm looking into the NXP SDK for their QorIQ Layerscape SoCs. Their released Yocto-based system is based on an old revision (3-4 revisions out of date), and the latest they seem to have in their public git isn't much better: it's based on a version that gets EOLed this month.

Kernel-wise, the latest trees I've found are based on the v6.1.y stable tree. But the latest version merged in is v6.1.55. I believe the upstream kernel tree is currently on v6.1.91.
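
The gap described here can be expressed as a count of missed stable point releases. The following is a minimal sketch, assuming tags of the form `vX.Y.Z`; `parse_stable` and `point_releases_behind` are hypothetical helper names, not part of any kernel tooling.

```python
def parse_stable(tag: str) -> tuple[int, int, int]:
    """Split a stable tag such as 'v6.1.55' into (major, minor, patch)."""
    major, minor, patch = tag.lstrip("vV").split(".")
    return int(major), int(minor), int(patch)


def point_releases_behind(vendor: str, upstream: str) -> int:
    """Count the stable point releases the vendor tree has not merged yet."""
    v, u = parse_stable(vendor), parse_stable(upstream)
    if v[:2] != u[:2]:
        # Comparing patch levels only makes sense within one stable series.
        raise ValueError("tags are from different stable series")
    return u[2] - v[2]


lag = point_releases_behind("v6.1.55", "v6.1.91")
print(lag)  # 36
```

With the versions quoted in the post, the vendor tree is 36 point releases behind the 6.1.y series.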

Martyn Welch

MWelchUK@mastodon.social

Reply to @MWelchUK@mastodon.social

@gregkh @kernellogger @vegard Actually, that's not quite right. The latest kernel tree they have is based on v6.6.y; however, that's not in one of their releases and also isn't up to date with stable releases.

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @bluca@fosstodon.org

Well, to claim "kernel maintainers don't care" you have to at least report the bug to them[1]. That afaics has not happened yet (or I could not find it).

"since 6.8 every btrfs device can no longer be mounted by systemd": then why was this only noticed 2+ months after a release with that commit went out? This raises the question: what kind of problem did users actually run into?

[1] yes, sure, ideally they would have done a code search first, but we are all imperfect…

Nik | Klampfradler 🎸🚲

nik@toot.teckids.org

Reply to @bluca@fosstodon.org

@bluca @kernellogger It has been deprecated for three years according to the commit message?

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @nik@toot.teckids.org

what is considered "deprecated" by the developers afaics does not matter much when it comes to Linus' interpretation of the Linux kernel's "no regressions" rule.

At the same time there is neither a stable API nor ABI; so things are free to change (like in case of the culprit), as long as nothing breaks.

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @pavel

reg. the "distros want no-regressions, not no-bugs":

from my point of view the whole situation could be a lot better if distros would take some of the money they currently invest in CI and instead invest it in workflow improvements and other things that ensure regressions do not happen in the first place or are quickly resolved.

Pratham Patel

thefossguy@fosstodon.org

Reply to @kernellogger@fosstodon.org

@kernellogger @pavel I’d be interested in knowing how you would improve the workflows. What’s missing, what can be improved and what shouldn’t be done. I would love to help with this however I can. :)

bluca

bluca@fosstodon.org

Reply to @kernellogger@fosstodon.org

@kernellogger well, the kernel doesn't have a bug tracker - not for real anyway, bugzilla.kernel.org might as well be pointed to /dev/null, so no idea what "reporting" would even mean in this case. I do not use BTRFS so I am not affected, just sharing what was reported to me. It looks like it was reported against the Debian kernel package too now: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071420

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @bluca@fosstodon.org

reg. bug reporting:

https://docs.kernel.org/admin-guide/reporting-issues.html

https://docs.kernel.org/admin-guide/reporting-regressions.html

Some of it does not apply in this case.

I also make sure to handle regressions that are submitted to bugzilla.kernel.org

フェリックスたん

felix@misskey.io

Reply to @bluca@fosstodon.org

@kernellogger@fosstodon.org @bluca@fosstodon.org but following upstream, passing patches through, and maintaining downstream compatibility is the duty of middleware like systemd, which sits between the kernel and the user interface, no?

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @felix@misskey.io

duty? no!

nice behaviour, cooperative, up to some point kinda expected, and what most people do: sure.

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @bluca@fosstodon.org

and thx for the link to the debian bug tracker; but I want to see more details first what went wrong there, as I'd expect it would be pretty unlikely that this is the first debian btrfs user that updated to 6.8 or higher; so why did it break for that user, but apparently not for the others?

bluca

bluca@fosstodon.org

Reply to @kernellogger@fosstodon.org

@kernellogger 6.8 has just arrived in Debian unstable 3 days ago: https://tracker.debian.org/pkg/linux

vbabka

Reply to @bluca@fosstodon.org

@bluca @kernellogger so why doesn't the same thing happen with opensuse? also I grepped old messages and never saw the warning of deprecated option (I have btrfs root). Is the option passed only under some circumstances that might be distro specific?

d4nuu8

d4nuu8@fosstodon.org

Reply to @vbabka

@vbabka @bluca @kernellogger I've just updated my Arch installation with (encrypted) btrfs root. Seems to be no problem.

bluca

bluca@fosstodon.org

Reply to @d4nuu8@fosstodon.org

@d4nuu8 @vbabka @kernellogger I have no clue, I don't use BTRFS, I just get bug reports

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @thefossguy@fosstodon.org

@thefossguy @pavel

There is no easy answer here, as there are lots of details; but there is a decent chance I need to write this up soon anyway; if I do, I'll get back to you!

James Bottomley

jejb@mastodon.online

Reply to @kernellogger@fosstodon.org

@kernellogger @bluca I can confirm 6.9-rc5 is running just fine for me with openSUSE and a btrfs root filesystem on my main laptop so it looks like this may be specific to something Debian did.

James Bottomley

jejb@mastodon.online

Reply to @kernellogger@fosstodon.org

@kernellogger I'm afraid I can't support the counting methodology in the paper either. Besides the not-applicable-because-of-config issues the RH people cite, there's also the fact that not everything that has a cc: stable tag is an exploitable bug. Plus every fix backported carries risk (just look at the number of regressions in stable due to backports), so that risk has to be set against the benefit of the backport. A general rule would be: if it's not exploitable, don't backport it.

Daniel Micay

DanielMicay@grapheneos.social

Reply to

@triskelion @kernellogger I didn't warn against using the upstream LTS branches although the older ones do get much less backported.

bluca

bluca@fosstodon.org

Reply to @bluca@fosstodon.org

@kernellogger this is now being reverted, fortunately: https://lore.kernel.org/all/44c367eab0f3fbac9567f40da7b274f2125346f3.1716285322.git.wqu@suse.com/

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @bluca@fosstodon.org

Edited 1 year ago

thx, yeah, I already have been watching that.

1/ FWIW, I think you owe the kernel developers an apology, as you made a lot of noise and claimed "kernel maintainers don't care", when they clearly do once the problem was properly reported -- and quite quickly even. And yes, sure, in the ideal world they would have cared some more and performed a code-search before removing this option to prevent it in the first place. But we are all imperfect and make mistakes. Same for @pid_eins, who…

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @kernellogger@fosstodon.org

Edited 1 year ago

@bluca @pid_eins

2/ …wrote "And my main beef here is that they claim they wouldnt do it ever..."[1], as that is not even true. They often try changes or removals to see if it breaks something – and if it does, it's reverted. Even the removal of the support for the original i386 was handled like that by Linus himself.

[1] https://mastodon.social/@pid_eins/112456728724286680

Lennart Poettering

pid_eins@mastodon.social

Reply to @kernellogger@fosstodon.org

@kernellogger @bluca sure, but then the rule is not "we never break userspace" but more "move fast and break things, and sometimes revert where people protest too loudly".

I mean, that's fine by me, but maybe they should communicate it like that then.

The thing is that removing a widely documented mount option is very *obviously* a compat breakage. You cannot discount that. It's not just a "mistake" to remove something like that, it's an *obvious* attempt to break compat.

🐧sima🐧

sima@chaos.social

Reply to @pid_eins@mastodon.social

@pid_eins @kernellogger @bluca yeah in graphics we go with a 10 year delay for the obvious compat breakages

so either wait 10 years after the last known user was updated to the new interfaces (where we know of them, which is the usual case since it's all open source)

or 10 years after the replacement shipped for more script interfaces like some of the stuff in sysfs

🐧sima🐧

sima@chaos.social

Reply to @sima@chaos.social

@pid_eins @kernellogger @bluca 10 years seems to be enough where the only people you would end up breaking are those who don't upgrade kernels anyway, ever

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @pid_eins@mastodon.social

@pid_eins @bluca

the exact relevant rule is just "no regressions", nothing more. Everything else is just left to the interpretation by people (in typical Linus manner you might say). 🥴

🐧sima🐧

sima@chaos.social

Reply to @sima@chaos.social

@pid_eins @kernellogger @bluca of course there have been screw-ups and misses. but when those happen we try to put the references to the relevant userspace we broke into the reverts, so that people can start the 10 year clock at the right time

Lennart Poettering

pid_eins@mastodon.social

Reply to @kernellogger@fosstodon.org

@kernellogger @bluca

Actually, the exact relevant rule is "WE DO NOT BREAK USERSPACE", all in uppercase.

https://lkml.org/lkml/2012/12/23/75

I find the sound of that mail quite different from your much weaker "let's maybe undo the worst shit if people complain too loudly"... And of course "uh, sometimes we fucked up so hard, we cannot fix it anymore, let's add a new api instead" (which is what happened in the block device capabilities/media change api).

Lennart Poettering

pid_eins@mastodon.social

Reply to @pid_eins@mastodon.social

@kernellogger @bluca

(again, I actually find it OK if API is broken from time to time, just be honest about it, and communicate properly, and do a bit of research first. Don't claim that uppercase extremism and then do not even superficially follow through)

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @pid_eins@mastodon.social

Edited 1 year ago

@pid_eins @bluca

hmmm:

$ grep -ri 'no regressions' Documentation/ | wc -l
13

$ grep -ri 'not break userspace' Documentation/ | wc -l
0

Also:

"WE DO NOT BREAK USERSPACE": 2 hits – https://lore.kernel.org/all/?q=f%3ATorvalds+%22WE+DO+NOT+BREAK+USERSPACE%22

"no regressions": 44 hits – https://lore.kernel.org/all/?q=f%3ATorvalds%20%22no%20regressions%22

SchwarzeLocke

SchwarzeLocke@ohai.social

Reply to @kernellogger@fosstodon.org

@kernellogger @pid_eins My impression, having more of an outside perspective and working with higher level languages: should deprecations perhaps always be gated with a config flag, perhaps even a common one similar to BROKEN?

With Java/Scala, it's always quite clear for me where deprecated methods are used. Also I can have builds fail due to that or not, so that I notice new deprecations when building / in CI.

Thorsten Leemhuis (acct. 1/4)

kernellogger@fosstodon.org

Reply to @SchwarzeLocke@ohai.social

@SchwarzeLocke @pid_eins

there are various things that can work and I guess it depends on the situation what is reasonable and effective.

For the kernel I sometimes think "add delays (together with a msg in the logs) that grow longer and longer over time when people use deprecated stuff, at some point people get curious and will investigate" might be something that might help, OTOH it's a kind of stupid idea 😂
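
The "growing delay plus log message" idea can be illustrated with a small userspace sketch. Everything here is an assumption for illustration: the `DeprecatedOption` class, the doubling per use, and the cap are invented, and the post does not say whether the delay would grow per use or per release. It is not kernel code.

```python
import time
from typing import Callable


class DeprecatedOption:
    """Hypothetical helper: log a warning and stall a bit longer on each use."""

    def __init__(self, name: str, base_delay_ms: int = 10,
                 max_delay_ms: int = 1000,
                 sleep: Callable[[float], None] = time.sleep,
                 log: Callable[[str], None] = print) -> None:
        self.name = name
        self.base_delay_ms = base_delay_ms
        self.max_delay_ms = max_delay_ms
        self.sleep = sleep
        self.log = log
        self.uses = 0

    def use(self) -> int:
        # Double the delay with every use, but never exceed the cap.
        delay_ms = min(self.base_delay_ms * 2 ** self.uses, self.max_delay_ms)
        self.uses += 1
        self.log(f"{self.name} is deprecated; delaying {delay_ms} ms")
        self.sleep(delay_ms / 1000)
        return delay_ms


# Demo with a no-op sleep so it finishes instantly.
option = DeprecatedOption("old_mount_option", sleep=lambda seconds: None)
delays = [option.use() for _ in range(4)]
print(delays)  # [10, 20, 40, 80]
```

The cap keeps the slowdown annoying rather than fatal, which matches the stated goal of making people curious enough to investigate.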

Daniel Micay

DanielMicay@grapheneos.social

Reply to

@triskelion GrapheneOS has responded to it.
