Conversation

reading vfs code is wild

Vlastimil Babka 🇨🇿🇪🇺🇺🇦

@sima yep it's all @brauner's fault

@vbabka @sima /me hides behind Al.

@vbabka @sima I've heard that complaint from Simona not too long ago about mm code. For all we know this could all just be PEBCAK. 😛

@brauner @vbabka they're wild for different reasons, mm feels like standing on what's not even quick sand, vfs feels like digging through an endless dungeon of layers

I was trying to stitch together the magic open on procfs fd files and how that ends up in no_open for dma-bufs. I think with a bit of hollering and some blog posts as guidance I did walk the entire path (pun intended)

Vlastimil Babka 🇨🇿🇪🇺🇺🇦

@brauner @sima everyone hides behind AI these days

Christian Brauner 🦊🐺

@sima @vbabka it's inode->i_fop still pointing at the default no_open_fops (whose ->open() is no_open()) that causes ENXIO. It's how sockets and so on prevent themselves from being reopened through procfs or any open function that expects to call f_op->open().
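
A minimal userspace sketch of the ENXIO behaviour described above, assuming a Linux system with /proc mounted. `reopen_errno()` is a hypothetical helper written for this illustration, not a kernel or libc function; it tries to reopen an existing descriptor through its /proc/self/fd entry and reports the errno that open() produced.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: reopen `fd` via /proc/self/fd/<fd>.
 * Returns 0 if the reopen succeeded, otherwise the errno set by open().
 */
static int reopen_errno(int fd)
{
	char path[64];
	int newfd;

	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	newfd = open(path, O_RDONLY);
	if (newfd < 0)
		return errno;

	close(newfd);
	return 0;
}
```

Calling it on a descriptor from `socket(AF_UNIX, SOCK_STREAM, 0)` should report ENXIO on such a system, whereas a descriptor for a regular file would normally reopen without error.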

Christian Brauner 🦊🐺

@sima @vbabka that specific thing trips up a lot of people. But it's crucial to prevent stuff from being opened again. There are other ways to prevent this but that's the main one. Though I always found the ENXIO errno to be a bit strange.

@sima @vbabka also you can just ask me instead of sifting through the web ofc.

@sima @brauner @vbabka and both MM and VFS have exciting code that deliberately sometimes violates what a purist might consider memory safety (specifically the SLUB allocator fastpath in MM and d_path() in VFS)

@jann @sima @vbabka Come on, you forgot to mention our SLAB_TYPESAFE_BY_RCU insanity, combined with dead and saturation zones for refcounts. __fget_files_rcu() is a piece of art?

Christian Brauner 🦊🐺

@jann @sima @vbabka OOOOOOR: our struct fd handling where we steal bits from the pointer alignment to add a fastpath for the single-threaded case. :D It's always fun when people discover this for the first time.
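
The bit-stealing idea can be sketched in plain userspace C. This is an illustration under stated assumptions, not the kernel's actual struct fd: `struct file_like`, `tagged_fd`, `TAG_NEED_PUT` and the helper names are all hypothetical, and the refcount is a plain int for brevity where real code would need an atomic. Because the pointed-to object is at least 4-byte aligned, the two low bits of its address are always zero and can carry a flag saying whether the lookup took a reference that must be dropped later.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a refcounted file object. */
struct file_like {
	int refcount;
};

/* The two low pointer bits are only free if alignment is at least 4. */
_Static_assert(_Alignof(struct file_like) >= 4, "need two spare low bits");

#define TAG_NEED_PUT ((uintptr_t)1) /* lookup took a reference */
#define TAG_MASK     ((uintptr_t)3) /* all bits reserved for flags */

/* A pointer and its flags packed into one machine word. */
typedef struct {
	uintptr_t word;
} tagged_fd;

static struct file_like *fd_file(tagged_fd t)
{
	return (struct file_like *)(t.word & ~TAG_MASK);
}

static bool fd_need_put(tagged_fd t)
{
	return (t.word & TAG_NEED_PUT) != 0;
}

/*
 * If the descriptor table is shared with other threads, take a reference
 * and record that fact in the low bit. If it is not shared, nobody else
 * can close the file underneath us, so skip the refcount entirely.
 */
static tagged_fd fd_lookup(struct file_like *f, bool table_shared)
{
	tagged_fd t = { (uintptr_t)f };

	if (table_shared) {
		f->refcount++;
		t.word |= TAG_NEED_PUT;
	}
	return t;
}

/* Drop the reference only if fd_lookup() actually took one. */
static void fd_release(tagged_fd t)
{
	if (fd_need_put(t))
		fd_file(t)->refcount--;
}
```

The caller always pairs `fd_lookup()` with `fd_release()` and never has to know which path was taken; the flag travels inside the word that is returned by value.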

@brauner @sima @vbabka somehow it feels more offensive to me to deliberately let d_path() sometimes copy out-of-bounds kernel memory because reading a pointer+length pair atomically is too slow

@jann @sima @vbabka oh, and I love the absolutely bonkers allocation scheme in getname_flags() and getname_kernel().

@jann @sima @vbabka copy_from_kernel_nofault() and our lord and savior sequence counters

@jann @sima @vbabka fwiw, I think @paulmckrcu's RCU put sequence counters on steroids because it made it almost trivial to futz around with pointers protected by sequence counters.
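
For readers who have not met sequence counters, here is a minimal userspace sketch of the reader-retry pattern using C11 atomics. It is not the kernel's seqcount API: `struct seq_pair`, `pair_write()` and `pair_read()` are hypothetical names, and the sketch assumes a single writer. The writer makes the counter odd while it updates the data; a reader retries whenever it saw an odd counter or the counter changed during its read.

```c
#include <stdatomic.h>

/* Hypothetical pair of values that must be read consistently. */
struct seq_pair {
	atomic_uint seq; /* even: stable, odd: write in progress */
	atomic_int a;
	atomic_int b;
};

/* Single writer assumed: bump to odd, update, bump back to even. */
static void pair_write(struct seq_pair *p, int a, int b)
{
	unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->a, a, memory_order_relaxed);
	atomic_store_explicit(&p->b, b, memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Lockless reader: loop until a snapshot was taken with no writer active. */
static void pair_read(struct seq_pair *p, int *a, int *b)
{
	unsigned int s1, s2;

	do {
		s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		*a = atomic_load_explicit(&p->a, memory_order_relaxed);
		*b = atomic_load_explicit(&p->b, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

Readers never block the writer and never take a lock; the cost is that a read may have to be repeated, and anything read inside the loop must be safe to look at even when it turns out to be stale.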

@brauner @jann @vbabka @paulmckrcu I might be too kernel-brained, but these all feel fairly benign. they're very clever, but their complexity doesn't leak. it's the stuff where you need to load massive amounts of context and myriads of callchains into your brain first before you can even start to ask the right questions, much less find the answer

@brauner @vbabka so aside from getting derailed a bit by nd_jump_link() and how the name lookup machinery works it's honestly all very easy to read and follow, some really nice code. just a pile of context to load before I could piece the story together in its entirety

@sima @brauner @vbabka @paulmckrcu yeah, I agree with that. when a new local optimization is added that has a simple-looking diff but actually relies on adding a special case to the undocumented safety rules observed by the whole subsystem, things get real exciting...

@ljs @sima @brauner @vbabka yeaaah but as some bugs have shown there has also been some "we'll add this special rule over here to make things work, probably fine, and that special rule over there to speed things up, also probably fine, and oops nobody considered how those two interact with each other"...

Christian Brauner 🦊🐺

@jann @sima @vbabka @paulmckrcu I mean that's one of our absolute major tasks: push back on anything that bleeds VFS guts into anything not-core VFS because it is just going to be an absolute shit show.

And it's absolutely proven that it's going to be a shit show. Take the d_path() example: bpf exposed d_path() via bpf_d_path() to random bpf programs in userspace and it caused hilarious crashes. Only to then come back and demand a "speculative" variant that exposed even more VFS internals.

Christian Brauner 🦊🐺

@jann @sima @vbabka @paulmckrcu That's one of the major "Christian is an asshole" moments for most people I'm pretty sure. They think I push back aggressively because I dislike whatever it is that they're working on when really I push back because I don't want to expose complex internal VFS behavior to the whole world.

@sima @brauner @jann @vbabka RCU is benign? May I please get that in writing? ;-)

@paulmckrcu @sima @jann @brauner @vbabka whoever writes this for Paul, please use rcu_assign_pointer()

@sima bro, have you seen net code yet?
