hm I thought there was a blog or rant or kernel documentation page about persistent device naming, similar to stable kernel abi nonsense from @gregkh but I can't find it, was I dreaming?

https://www.kernel.org/doc/Documentation/process/stable-api-nonsense.rst

@sima @gregkh
I remember a mailing list post by Greg which pointed at systems where PCI slot numbers or similar would vary randomly between boots.

@sourcejedi @sima Yeah, no blog posts, just loads of lkml emails over the years, sorry.

That machine that randomly reassigned PCI ids at boot time is no longer with me, but it was great for testing. I'm sure you can do the same with virtual machines these days, passing in different virtual pci devices to them.

Persistent naming is for userspace to handle, the kernel just uses "grab the next free number" as its naming "policy" as that's all it can really do.

@gregkh @sima @sourcejedi Was the ordering stable though?
Although I guess with PCIe being somewhat hotpluggable it would still not be reliable.

@lanodan @sima @sourcejedi @gregkh I had modules loaded in different order, and got different interface names. Switched to persistent naming after that.

(Though when drivers were built-in rather than modules, ordering was stable.)

@sima @gregkh What should one use to get a persistent identifier for PCI devices? Anyone doing PCI device passthrough that persists across reboots needs this.

@gregkh @sima @sourcejedi I really wish that the kernel never reused /dev nodes or major:minor numbers. Right now, one must do verification after calling open() if one wants to avoid race conditions. Of course, lots of programs do not do that.
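
One way the "verify after open()" step could look, as a rough sketch only: open the node first, then ask sysfs which device the open file descriptor actually refers to, and compare that with the sysfs path you expected. The function names and the `sysfs_root` parameter are made up for illustration, and this assumes the usual `/sys/dev/char/MAJOR:MINOR` and `/sys/dev/block/MAJOR:MINOR` symlinks are present. Whether this closes the race completely is not something this sketch claims.

```python
import os
import stat


def sysfs_path_for_fd(fd, sysfs_root="/sys"):
    """Resolve the sysfs device directory for an already-open device fd."""
    st = os.fstat(fd)
    if stat.S_ISCHR(st.st_mode):
        kind = "char"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block"
    else:
        raise ValueError("file descriptor does not refer to a device node")
    # /sys/dev/{char,block}/MAJOR:MINOR is a symlink into /sys/devices/...
    link = os.path.join(
        sysfs_root, "dev", kind,
        f"{os.major(st.st_rdev)}:{os.minor(st.st_rdev)}",
    )
    return os.path.realpath(link)


def open_verified(dev_path, expected_sysfs_path, sysfs_root="/sys"):
    """Open dev_path, then check the fd against the expected sysfs path.

    Hypothetical helper: the check happens on the open fd, not on the
    path name, so a node swapped before open() is detected afterwards.
    """
    fd = os.open(dev_path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        actual = sysfs_path_for_fd(fd, sysfs_root)
        if actual != os.path.realpath(expected_sysfs_path):
            raise RuntimeError(f"{dev_path} is {actual}, not the expected device")
    except BaseException:
        os.close(fd)
        raise
    return fd
```

The expected sysfs path would come from whatever topology parsing was done beforehand; a mismatch means the node now points at a different device than the one that was inspected.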

@sima @gregkh What is actually wanted is to be able to detect the physical topology of the system, as determined by what card is plugged into what slot.

@alwayscurious @sima Then do that, the kernel provides you this information through sysfs, that is what it was explicitly designed for.

But yes, the race condition of "parse the topology, determine the device node, and go to open it" when the device is removed and a different one added right between those last two steps is real. Luckily, in real-hardware situations it is extremely rare, if not physically impossible, due to hardware debounce times, and it is one that we explicitly did not care about when we created sysfs and udev (i.e. physical access trumps anything).

@alwayscurious @sima @sourcejedi That ship sailed decades ago as we had to support device node reuse a long time ago, it was a requirement! But obviously not your requirement :)

You have full control over device node creation in userspace, that's what udev gives you if you want (or any udev-like program). Set up a whole different /dev/ with just your naming/numbering scheme. The kernel gives you the interface and the information to do this, why not take advantage of it if you need it?

@gregkh @sima Some more questions:

  1. Which entry in sysfs corresponds to physical (as opposed to logical) topology? Is it the path under /sys/devices?
  2. How can I go from this path to a PCI bus/slot/function?
  3. Will this path change when other cards are added or removed or if the system firmware is updated?
  4. Is there a way for driver probing to be deferred until after userspace can check the device against the actual topology of the machine? That would allow checking if the device that claims to be a serial port in slot X is actually supposed to be a serial port, or if it is a GPU passed through to a VM that the VM compromised and is now pretending to be a serial console. In the latter case the device would never be allowed to be used except for passthrough.

@gregkh @sima @sourcejedi Even if the path is never reused, the device major and minor number can still be reused. Right now I think one needs a custom FUSE filesystem if one wants opening e.g. /dev/disk/by-diskseq/1 to be race-free, and that’s bad.

@alwayscurious @sima PCI devices, at the bus/slot/function level do not have device nodes, so I don't understand the issue here.

They might have a specific PCI driver bound to them, at the function level, and if so, the parent of the device node for that class device (i.e. input, tty, drm, etc.) will then point to that PCI function. But PCI slots don't always match up to PCI bus and device numbers either, as that's a physical thing and many PCI systems don't expose or even know that information (i.e. the BIOS doesn't know.)

Also, PCI bus numbers can change at boot, so you can't know what is happening.
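
To make the "parent points to that PCI function" part concrete, here is a small sketch that pulls the PCI addresses out of a resolved sysfs device path. It assumes path components for PCI functions use the usual `domain:bus:device.function` hex form; the `PciAddress` type and `pci_chain` name are invented for this example. The last entry is the function the class device hangs off, earlier entries are the bridges above it. Given the point about bus numbers changing at boot, treat the bus field as a snapshot of this boot rather than a persistent identifier.

```python
import re
from typing import List, NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


# Matches components such as "0000:03:00.0" but not the "pci0000:00" root.
_BDF = re.compile(r"^([0-9a-f]{4,}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])$")


def pci_chain(sysfs_path: str) -> List[PciAddress]:
    """Return every PCI function found along a sysfs device path, root first."""
    chain = []
    for part in sysfs_path.split("/"):
        m = _BDF.match(part)
        if m:
            chain.append(PciAddress(*(int(g, 16) for g in m.groups())))
    return chain
```

For example, feeding it the result of `os.path.realpath("/sys/class/net/eth0")` (interface name assumed) would give the bridge chain down to the NIC's PCI function, or an empty list if the device is not on PCI at all.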

Driver probing can be deferred at any time by userspace for USB devices, and I think that was recently added for PCI devices too, look for the "trusted" device information in the documentation somewhere.

Good luck!

@alwayscurious @sima @sourcejedi Again, reuse of major/minor numbers was a design requirement at the time. And, you know the path / label / metadata / whatever for the block device before you mount it, so go off of that information if you don't trust the device major/minor number information.

@gregkh @sima The problem is that whether a device should be trusted depends on what slot it is plugged into 😞. Are there systems that do expose slot information? If so, which ones are they, and is there a way for userspace to get it on these systems?
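
For checking what a given machine exposes, a minimal sketch: where the firmware or a hotplug driver registers physical slots, Linux lists them under `/sys/bus/pci/slots/`, one directory per slot with an `address` file. This assumes that layout; on systems that register no slots the directory is empty or missing and the function returns an empty mapping. The function name and `sysfs_root` parameter are made up for illustration.

```python
import os
from typing import Dict


def read_pci_slots(sysfs_root: str = "/sys") -> Dict[str, str]:
    """Map slot name -> PCI address string for every slot sysfs lists."""
    slots_dir = os.path.join(sysfs_root, "bus", "pci", "slots")
    result: Dict[str, str] = {}
    try:
        names = sorted(os.listdir(slots_dir))
    except FileNotFoundError:
        # No slot information exposed on this system.
        return result
    for name in names:
        try:
            with open(os.path.join(slots_dir, name, "address")) as f:
                result[name] = f.read().strip()
        except OSError:
            # Slot vanished or has no readable address; skip it.
            continue
    return result
```

An empty result does not say anything about the hardware itself, only that no slot data was registered with the kernel.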

@gregkh @sima @sourcejedi Is this because dev_t was 32 bits back then?
