Conversation

This is such a bad bad API compat breakage:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e81cd5a983bb35dabd38ee472cf3fea1c63e0f23

It's used all over the place in userspace. In systemd we use it:

1. to detect if a block device has partition scanning off or on
2. In our udev test suite, to validate devices are in order
3. udev rules use it for some feature checks (in older versions of systemd).

And it's even a frickin documented userspace API:

https://www.kernel.org/doc/html/v5.5/block/capability.html

So much about that nonsensical "we don't break userspace" kernel mantra.


@pid_eins impossible: the kernel never, ever breaks userspace compatibility. You must have been holding the sysfs wrong.

@pid_eins "without ever adding a value to an UAPI header"

Is this not correct? Or did you hardcode?

Does anyone know where the kernel's github/gitlab project is? Would love to file an issue or placeholder revert PR, but somehow I cannot find it! Anyone?

(Yes, this is a joke, I am fully aware of the concept of mailing lists – as a historical concept from the 2005 era... Yes, I am too lazy to figure out how to report this properly. Hence social media it is.)


@pid_eins At least there is a place where issues won't get solved, but we can complain. In our office it's usually at the coffee machine.

Have you tried complaining on Reddit?


@pid_eins Imho the Linux development process has to be one of the biggest turn offs from contributions.

It's nuts how a piece of software that underpins vital systems has such a backwards and opaque development process, mailing lists in 2024.


@ljs Look at the docs link I provided. It documents literally that the documentation for the bits is to be found in include/linux/genhd.h.

Sorry, but the ship has sailed. Adding such a comment to the documentation and expecting this not to be public API doesn't work.

Also, iirc the way I understood Linus the "we don't break userspace" actually is supposed to mean "we don't break userspace". Here it certainly breaks things all over the place.


@zygoon Which no one looks at supposedly, and which is scheduled to be turned off soon, because mailing lists are apparently much better.


@pid_eins

There you go:
BLOCK LAYER
M: Jens Axboe <axboe@kernel.dk>
L: linux-block@vger.kernel.org
S: Maintained
F: block/


@cJ I tried to click on this, but it didn't open a bug report form. What am I doing wrong?

(Dude, I know, I am just making fun of kernel development mechanisms, it's such a turn-off.)

@pid_eins A header that no longer exists? :)

It's not great that that's referenced, and the doc page in general isn't wonderful (nor was the idea of 'just exposing' something without abstracting in uAPI).

But at the same time, surely having to hardcode a kernel-specific value might be a clue that it's not a great interface to use?

So on one hand I agree with you the doc is bad and shouldn't have implied you can do that, but on the other hand a user relying on this should have been more than a little wary.

Let me mention that I have sent an email to linux-block ML btw. I am not sure it went through though, can't find it on any mailing list archives.

It's the reliability and synchronous feedback I particularly love about submitting bug reports, patches, and reviews via email.


@pid_eins Don't forget trying to deduce if you need to subscribe or not, or if they use moderation.

I once got a lecture from an angry maintainer (non-Linux) for "not appreciating timezones" after resending, 'cause it turned out they had undocumented moderation??


@ljs Well, how are userspace folks supposed to navigate kernel APIs then? I mean, a good chunk of kernel APIs are not documented at all, another chunk is pretty badly documented, and the stuff that actually *is* somewhat documented is something one should not have trusted, as you say?

I mean, the stuff is not just documented, it's also part of sysfs, i.e. one of the primary concepts for exposing kernel APIs to userspace which even has a rough concept of introspection and everything.


@ljs this is not a frickin ioctl of some niche subsys. It's very very core stuff exposed via sysfs, and documented (yes, badly documented, but still documented), with no other known way to get this very basic info.

Come on!


@ljs Also, the header definitely *did* exist when we started checking for the capability field.

@pid_eins mate I'm writing a literal book about -mm I'm all on board with the documenting things more train 🤣

Like I said, I agree the doc page was not clear and referencing a kernel header was a bad idea. With you on that.

But if you're hardcoding values you've copied from a kernel header, that right away means you are relying on internal kernel implementation details no?

We can agree entirely that saying to do that in the doc was not great. I would say the doc was just wrong to do that.

@pid_eins

You may not know that I know you know.

But as you say, you're making fun of it.

You know that the kernel has a history of striving to have a distributed and "down to the essentials" development workflow, and sending and receiving plain-text e-mails can be done securely and by anyone.

The way you're making fun, it looks like you're simply diminishing other people's core values.

Maybe you can be the one to find something better than mailing lists (which are older than 2005....)?


@cJ @pid_eins I cut my software engineering teeth in the era of sending patches to mailing lists. There are many good reasons I don't do that any more.

There are just so many better solutions. In my previous role we used Gerrit, currently it's GitHub. Gerrit was superior, imho, and is self-hosted. Both are so vastly superior to mailing lists they're not in the same class. It's like the difference between git and CVS.


@mattb @pid_eins

The low-level protocol uses e-mail messages, what prevents those who really want it from creating a front-end over it to get a "forge experience"?

@pid_eins @cJ @mattb this same conversation comes up on a regular basis with neither side listening to one another and round and round we go on the merry-go-round.

Personally I think the way forward is incremental because it won't change overnight, and some people do make efforts to try to improve things (with little praise!).

I do think over time things will evolve.

Also the 'kernel does the simplest dev thing' is not a great argument when you look back at the sorry state of kernel source control in the past. Git was a colossal improvement, and the kernel was able to shift.

In theory subsystems could do development via forge/github/wherever and push things upstream via scripts so could have bottom-up improvement. For top-down it'll need people at the top to push for it, which I think is unlikely to happen soon.

At least we can all agree systemd is the most sensible init system without any controversy or flamewar though right? ;)

@ljs @mattb @pid_eins

I've been an early systemd user and have included it in most of my embedded Linux work too.
Still, I don't think there's a point in qualifying systemd as "the most sensible" unless you mean "in most cases" or for a particular purpose, otherwise the only answer is "it depends".

In the same way that I'm a "low-digits" github and gitlab user.

I am, like you, annoyed when people want to force their thing onto others while ignoring their valid concerns.

@cJ @mattb @pid_eins I think the best way forward is dialogue.

I am personally a fan of systemd and find the arguments against it utterly uncompelling.

But, whether about kernel dev practices or init systems, I think civil dialogue wins out, and shouting past each other as if one side is morally superior to the other is just useless.

I mean I quite like drama and such, but I don't think it's hugely productive :)

Mostly the email dev is difficult to change for legacy reasons, but there are some legit resilience concerns there.

Probably you could address the latter with a well-run forge (I'm not expert so I'll say probably here), or something similar.

The former needs incremental change like tooling from e.g. b4, lei etc.

Then bottom-up changes and advocating to top people and it'll probably change eventually.

Maybe when (if) Linus gets on board with it...

@ljs @mattb @pid_eins @cJ be kind lorenzo

@pid_eins If you are too lazy to report it properly, I'm sure someone else in your company can do a better job.

@ljs btw, the flag we used actually *was* fully documented even:

https://www.kernel.org/doc/html/v5.15/block/capability.html

Quoting:

"This file documents the sysfs file block/<disk>/capability."

"GENHD_FL_NO_PART_SCAN (0x0200): partition scanning is disabled. Used for loop devices in their default settings and some MMC devices."

That's such a bad API break
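
For readers unfamiliar with the check being discussed, here is a minimal sketch of how userspace could test that flag. It assumes the `capability` file lives at `/sys/block/<disk>/capability` and holds a single hexadecimal number; the flag value `0x0200` is the one quoted above from the v5.15 docs. The function names are hypothetical, and this is not systemd's actual code. On kernels where the file has been removed, the sketch returns `None` rather than guessing.

```python
from pathlib import Path
from typing import Optional

# Value quoted above from the v5.15 capability docs. It is a kernel-internal
# constant that was never exported in a UAPI header, so it is hardcoded here.
GENHD_FL_NO_PART_SCAN = 0x0200


def parse_capability(text: str) -> int:
    """Parse the file contents, assumed to be one hexadecimal number."""
    return int(text.strip(), 16)


def partition_scanning_disabled(capability: int) -> bool:
    """Return True if the NO_PART_SCAN bit is set in the capability mask."""
    return bool(capability & GENHD_FL_NO_PART_SCAN)


def read_capability(disk: str, sysfs_root: str = "/sys") -> Optional[int]:
    """Read block/<disk>/capability; return None if missing or unparsable."""
    path = Path(sysfs_root) / "block" / disk / "capability"
    try:
        return parse_capability(path.read_text())
    except (OSError, ValueError):
        # The attribute may be absent on kernels that dropped it.
        return None


if __name__ == "__main__":
    cap = read_capability("loop0")  # "loop0" is only an example device name
    if cap is None:
        print("capability attribute not available")
    else:
        print("partition scanning disabled:", partition_scanning_disabled(cap))
```

Returning `None` when the file is absent keeps "flag not set" distinct from "kernel no longer exposes the attribute", which is exactly the case this thread is about.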


Small update: it's worse than originally thought:

1. Kernel broke API not once, but twice already on this.

2. The sysfs API was actually fully documented, but that didn't matter. Full docs are here:

https://www.kernel.org/doc/html/v5.16/block/capability.html

@pid_eins Apparently you are more interested in trolling than in getting the regression fixed. No one says kernel development is easy but you are not even trying to do a good job. :-( (hint: describe the real thing it breaks, cc hch, cc linus).

@pavel @pid_eins no i am saying kernel development is easy
i mean if @ljs can do it
🍷

@pid_eins yeah ok that makes it much worse...

But the big mistake was that we exported kernel-specific values like that without abstracting in uAPI.

The fact a whole sysfs node just got dropped isn't great either...

I feel like this is a big doc fail because the fact that it existed wasn't even taken into account.

Ugh!

@pavel @pid_eins yeah of course Mr. well-known-troll is a well-known-troll (I mean I can't speak TOO much because I do enjoy a good shit post myself).

This is a bad break, now I'm convinced after seeing that. Before, I thought it was a bit of a shit interface with some questionable docs, but the fact it's a whole sysfs node AND that the doc page literally says 'here use these values' means it's not great.

If you filter out the usual whining about email development I think he did email the list in the end anyway.

And I think @kernellogger is tracking...

@pid_eins seems nobody yet pointed it out or I missed it. But note the "we don't break userspace" rule doesn't mean it can't ever happen - that would mean no bugs can ever happen as well. What it means is that if it happens and is reported (the sooner the better), it has to be fixed (with some exceptions that I think wouldn't apply here).

@vbabka @pid_eins does "whining on social media because sending an email is hard" count as reporting? 🤣

@vbabka well, I am actually somewhat OK with breaking APIs if need be. I am just a bit annoyed about carrying this claim "we don't break userspace" around all the time and not actually being really any good at it at all, and when it comes to actually breaking things, we are usually brushed off, called "whiney", and told that we "did it wrong" in the first place, and that the kernel's own docs were just "unfortunate", and it's not a bug, because the docs should not have been written like that anyway.


@vbabka I mean, systemd has APIs too, it's hard to keep them stable, we try, and every now and then we fuck up. But we never made it our frickin' mantra, and claimed we actually were really good at it. We have invested a bit in trying to be OK at it though, i.e. we have test suites, integration tests and shit, that validate the APIs automatically. It's not perfect, but it exists.

So, if the kernel would actually try to live up to its claim that they don't break userspace


@vbabka … then maybe have a testsuite for this, that actually checks this, and make maintainers care, to google for their interfaces before dropping them to see if they are used.

But as it appears right now, there's a massive difference between what the kernel community claims and what it actually does about it.

The "unbind" uevent thing was by far worse btw, showed total ignorance from kernel folks about any userspace API issues.

So this isn't really an isolated event, it happens regularly.


@pid_eins @vbabka now that the kernel folks can get cve numbers for cheap they should assign some to the abi breakages too


@ljs @pid_eins @pavel

I do, but I decided to only watch this thread and not become involved (but I was tempted to reply a few times already, but until now I was able to resist…). 😄

@lkundrak @pid_eins @vbabka we avoided an abi break by removing the api first

@pid_eins @vbabka FYI from my point of view regardless of the fact the doc exposure was 'unfortunate', this was definitely a regression. That's not mutually exclusive.

As somebody who's likely to do a lot more in the kernel, soon, I am actually taking this as a lesson that, if I touch any API, I will very carefully examine usage _including grepping systemd source_ as well as checking for any docs first.

I also firmly believe in improved testing (+ will do my best to actually add some).

So some of us listen and pay attention ;)

Not sure whining about sending an email shows an equal level of sincere effort though.

@vbabka @lkundrak @pavel @pid_eins "can" break -next several times

@ljs @pid_eins @vbabka the search function on Github has improved massively in the past year or so, and it's now really powerful. Given pretty much everything is at least mirrored if not developed on GH these days, a quick search for the API in question before removing it would most likely save _a lot_ of future headaches.

@bluca @pid_eins @vbabka indeed, that's a good shout too.

@pid_eins @vbabka There is in theory full coverage of at least the syscalls interface (I don’t think I trust it since it was churned out to spec but it’s there), I’m never sure why the people doing that stuff aren’t as excited by stuff like sysfs
