Also earlier, found another core kernel bug (in the IOMMU subsystem, again). Turns out if a device is assigned to multiple IOMMUs, and the first one probes but the second one defers, everything breaks.

I knew that code smelled when I last had to track down a race condition in it...

Edit: removed the edit.

@marcan Do you have an idea why you encounter all these bugs? Is the ARM Mac using such "obscure" code paths? From a listener perspective it sounds like x64 desktops aren't affected.

@cpy Well, x64 desktops tend to barely use IOMMUs even today and wouldn't have two IOMMUs for the same device. As for all the PHY/Type C stuff, desktops don't use any of that.

The reason why x64 desktops work is the code is a buggy mess to begin with, but the bugs that affect the major platforms get fixed. All the others don't.

@marcan So I imagine you mean a dt node a bit like this?
foo@12340000 {
        <stuff>
        iommus = <&some_iommu 0>,
                 <&different_iommu 1>;
};
And that different_iommu hasn't probed yet?

If that's it, I can say from Qualcomm DTs that I've never seen something like this. Often different SIDs from the same IOMMU are used, but I think most devices only have one IOMMU nowadays. And on older ones there's e.g. a GPU IOMMU that is only used by the GPU, and no other IOMMU is used there.

@z3ntu Yes, like that. Apple devices have multiple split IOMMUs for USB and ISP, and we also unify all individual display controller IOMMUs under one domain/virtual device to make the DRM stuff a lot less painful than trying to manage them individually for each display controller. All the IOMMUs are completely independent, at widely spread MMIO addresses (usually adjacent to whatever device they're attached to).

Apple themselves implement this with a stupid hack in the IOMMU driver to replicate register accesses in the backend and treat it like a single IOMMU, and that's probably what most random Linux embedded vendors would've done too (instead of fixing the bugs in Linux multi-IOMMU support), but we do things the right way here, so...

@z3ntu Note that this only works (by design) if the IOMMUs use the same driver, since you do end up with a shared IOMMU domain, shared page tables, etc. It wouldn't work with heterogeneous IOMMU types, but that seems rather crazy. But yes, they are separate IOMMU device instances using the same driver.
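
The scenario in this sub-thread (one device listing several IOMMU instances, where one has probed and another has not yet) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not kernel code and not the actual fix. Every name in it (`Iommu`, `Device`, `configure`, `ProbeDefer`, the `toy-iommu` driver string) is hypothetical, and `ProbeDefer` merely stands in for the kernel's "probe deferred, retry later" result. It shows one way to keep setup all-or-nothing: nothing is attached until every listed IOMMU is ready and all of them use the same driver.

```python
from dataclasses import dataclass, field


class ProbeDefer(Exception):
    """Toy stand-in for a 'probe deferred, try again later' result."""


@dataclass
class Iommu:
    name: str
    driver: str
    probed: bool = False


@dataclass
class Device:
    name: str
    iommus: list                      # every IOMMU the device lists
    attached: list = field(default_factory=list)


def configure(dev):
    """Attach the device to all of its IOMMUs, or to none of them."""
    # A shared domain and shared page tables only make sense if every
    # instance is handled by the same driver.
    if len({iommu.driver for iommu in dev.iommus}) != 1:
        raise ValueError("a shared domain needs a single IOMMU driver")
    # Check readiness of every instance before touching any state, so a
    # deferral leaves nothing half-configured for the retry to trip over.
    if not all(iommu.probed for iommu in dev.iommus):
        raise ProbeDefer(dev.name)
    dev.attached = list(dev.iommus)


first = Iommu("some_iommu", "toy-iommu", probed=True)
second = Iommu("different_iommu", "toy-iommu", probed=False)
foo = Device("foo", [first, second])

try:
    configure(foo)
    deferred = False
except ProbeDefer:
    deferred = True
attached_after_defer = list(foo.attached)   # still empty at this point

second.probed = True    # the second IOMMU finishes probing later
configure(foo)          # the retry now succeeds
```

The design point is the ordering: readiness is checked for all instances first, and state is only changed once the whole set can succeed, so a deferred attempt can be retried from a clean slate.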

@marcan tbh I find this and related posts quite toxic. The IOMMU maintainers are nice guys (one of them is my team colleague) and they don't deserve to have people reading your posts getting the impression that their subsystem is crap.

In the thread you mention doing some unusual (or maybe first time) scenario with it. It's completely normal to encounter bugs in such situations and simply fix them along the way, no need to gloat about it.

@vbabka @marcan I somewhat second this comment. Marcan you are excellent at finding and exposing and fixing bugs, and your philosophy of doing things the right way can only be praised. Mainstream platforms get the fixes, priorities always at play, things evolve, some software falls back and becomes broken... It is how it is. It just needs someone to do the fixing some day. Whether that person attacks that reality aggressively or passive aggressively does not really improve anything. So it could indeed be regarded as somewhat toxic to attack that state of affairs.

@vbabka The aforementioned smelly code was already identified as smelly long ago and patches were recently posted to the mailing list to improve it. I don't know yet whether they fix this particular bug, but this entire part of the code had already been identified as problematic.

There's a difference between criticizing code and criticizing people. And I have no shame in criticizing kernel code, because I'm a) part of the team, and b) subject to the inane kernel submission process when I contribute elsewhere than my corner, and a subset of kernel maintainers still believe that deliberate gatekeeping keeps code quality up, and that's just a patent lie that needs to be destroyed.

@marcan Excuse my ignorance with this, but why would you need to do that? I was under the impression that IOMMU groups were for virtual machines.

@gudenau Because Apple SoCs sometimes have more than one IOMMU ganged together for a peripheral (USB, ISP). Also because ganging up all the display controller IOMMUs into one virtual display device that handles all the outputs is a lot easier than trying to dynamically map and unmap framebuffers to individual ones.

@marcan @vbabka There's indeed a difference between criticizing code and people. However, implying that criticizing code is no big deal is toxic. You can't possibly expect someone who worked on something for weeks, months or years to take a critique on what they did like it's no biggie. It does have a very real impact.

@mripard @vbabka If the Linux kernel maintainers want to be treated more politely they need to start by treating contributors more politely. There's a very real toxicity problem here and it's not me. Plus the Linux kernel isn't someone's little pet project where a single person is responsible for code quality either.

@marcan @mripard @vbabka You are being called out here for your behaviour Marcan. It's good to acknowledge that and take the feedback on board.

@shenki @mripard @vbabka 🤷‍♂️

Sorry, I don't have any patience left for the kernel contribution process and the attitude of certain maintainers and how that drags the project down. Just a few minutes ago I ran into a kernel build issue for which a fix was submitted in January and, in traditional kernel fashion, was never merged.

I don't do this with other projects because other projects have functional communities and contribution models and a much less toxic atmosphere. The only way I have left to vent steam about this mess is on Fedi. If people are going to hold *me* to a higher standard while nothing gets done I'm just going to stop contributing to the kernel entirely.

Edit: updated the top post with an explanation of exactly why that code is bad.

@marcan @shenki @mripard @vbabka you should learn to model the behaviour you'd like to see in others, not switch your behaviour between toxic and non-toxic depending on the project. If I read this sort of stuff and was a highlevel kernel maintainer who didn't know any better, I'd think you were someone I'd prefer not to deal with.

@airlied @shenki @mripard @vbabka Because I said that code "smelled"? I know it's not a shining compliment but come on... this happens all the time, there's bad code in the kernel and we run into it and fix it.

What annoys me is there is a contingent (both inside and outside the kernel) that believes that the gatekeeping leads to higher code quality, which is clearly nonsense.

@marcan @vbabka So it's not about code anymore? We can have that conversation though. Some maintainers are indeed impolite, and I believe it hurts Linux and its community as well. From experience, it's not the majority, so saying that we all deserve to be treated impolitely is a bit of a stretch. And I agree with you that the quality and contribution process can be improved. But how can you possibly expect to convince someone when you open with "what you did during most of your career is trash"?

@mripard @vbabka It's about code, but I find it seriously ridiculous that the kernel community would call out someone for saying there is bad code in the kernel instead of dealing with all its much worse people/process issues first. I'm not personally insulting any of the IOMMU maintainers here, and I wouldn't feel personally insulted if you point out bad code in one of my projects.

I really don't know how people are reading "there is smelly code in the IOMMU subsystem" as "what you did during most of your career is trash", especially kernel developers. Come on. These are the kinds of comments you abstain from when dealing with a new contributor's code who would be discouraged by such feedback. If you've been doing this for years and you're offended when someone points out bugs, you're doing it wrong.

What I'm trying to do with these comments is 1) Make it clear that the kernel isn't a panacea of perfect code, 2) shut up the people (and they do exist and have used this argument against me) who say the kernel process is great and keeps code quality up, 3) Give our users some insight into just how much of the work we do is fixing common code vs. writing our own drivers/platform code.

If the #2 people didn't exist, then yeah, maybe I'd have a slightly different tone about it all, but if you want that you need to fix those people first. And it wouldn't change the fact that pointing out bad code in general is not a capital offense here, and shouldn't be.

@marcan @airlied @shenki @mripard @vbabka critiquing the process and gatekeeping is absolutely on point, and I've done plenty of that

but I agree that shredding the code isn't much better than shredding people directly

like there's plenty of damp corners in drm, but a) they're generally known b) there's generally solid pragmatic reasons, most often the fact we don't have limitless amounts of contributors and infinite time, and so "dp mst w/ races&lifetime oopsies" beats "no code, black screen"

@marcan @airlied @shenki @mripard @vbabka and it took years to get the worst bugs out of the dp mst stack, and we still have plenty of work left to do ...

@marcan @airlied @shenki @mripard @vbabka The fact that this kind of post is repeated over and over again doesn't help tbh.

@sima @airlied @shenki @mripard @vbabka Sure, and that's why I pointed out the bad code and didn't say anything about the people. If the bad code is known, where's the offense?

@emersion @airlied @shenki @mripard @vbabka Because we keep running into these things over and over again...

I've had very positive experiences from these posts, e.g. a conversation with the relevant maintainer leading to both a quick fix on my end and a proper one on his shortly thereafter. But if you think I should stop, fine 🤷‍♂️

@marcan @airlied @shenki @mripard @vbabka I don't think it was your intention, but given your general stance your other tweet right at the same time can easily be read as "beat up the people":

"... what's left is beating into submission the relevant Linux kernel core subsystems ..."

https://social.treehouse.systems/@marcan/111425916438403801

together with the IOMMU rant it does not read great, because "subsystem" here could as easily mean the code as the maintainers controlling patch merging

the IOMMU rant alone is entirely ok

@marcan @airlied @shenki @mripard @vbabka or in other words

you're a public figure, some people are guaranteed to misread everything you type and say maximally

@sima @airlied @shenki @mripard @vbabka Sigh.

FWIW, the other toot was referring to the USB/TypeC/Thunderbolt mess. Which is indeed a mess. Nothing to do with IOMMUs, the IOMMU thing was just a drive by "here we go again, random bug in a random place breaking something unrelated" situation.

And I certainly didn't intend that to read as "beat the maintainers into submission". If anything I expect fixing that mess to be less of a fight on the personal level, it's just a LOT to fix.

@marcan @airlied @shenki @mripard @vbabka Some people will show a lot of good faith and will react this way when their work is criticized. Some people won't.

@marcan @airlied @shenki @mripard @vbabka 🤷‍♀️

in public discourse it doesn't matter what you think, it doesn't matter what you thought you've said, it doesn't even matter what you actually said

it also doesn't matter what people actually understood

all that matters is what people think you've thought

and in this case here I can easily see how some people read this all as "yet another kernel maintainer I need to go and beat into shape", and that's not a good look

@marcan @airlied @shenki @mripard @vbabka my rants don't look like it, because it would take away some of the magic

but a lot of them have gone through days, weeks, sometimes months of private peer review and critique to make sure the risk for misunderstandings is minimal. it's effing hard work

I still fuck up on the regular

@sima @airlied @shenki @mripard @vbabka I don't have the spoons left for that kind of communication around kernel stuff. Guess I'll just shut up then 🤷‍♂️

@sima @marcan @airlied @shenki @mripard @vbabka
Well, let me tell you that you shouldn't assume what I (or other people) have thought. Because in no way am I thinking he is going against someone, but rather against a portion of code.

It sounds to me that you guys are the ones taking it more personally, which I assume is given by how close to the project you are. But if something is "smelly", it is, and that's a fact.

We are now the losers who won't get these informative threads. So thank you...

@sima @marcan @airlied @shenki @mripard @vbabka

Highlighting that comment by sima as quite important:
"some people are guaranteed to misread everything you type and say maximally."

As a bystander who joined this instance just days ago, I can say, Marcan, that quite a high percentage of your posts that I've read in the last few days have been attacks on some journalist's negligence or maliciousness, or attacks on Apple's negligence, or attacks on some other coder's negligence. I'm sure you might be technically correct on all of that, but that's beside the point. The point is that the bulk of your posts over just a few days has consisted mostly of attacks and rants. Rants don't really improve things; they just communicate an angry/frustrated stance. Each rant is in essence a packet of negativity. And as the comment above indicates, everything you say ought to be misread maximally, so I myself might be magnifying everything, but that even contributes to the point. I am just one of your many followers.

@hector_sab @marcan @airlied @shenki @mripard @vbabka I forgot to add the qualifier "some" in this toot, earlier ones had it already

it's fixed now

thanks for pointing this out

also somewhat interesting for you to assume I took this personally given that I have a years-long track record of massively criticizing the linux kernel community, while still being a major contributor myself. it's absolutely possible to do both, just really hard to do this for years without burning out

@marcan @sima @airlied @shenki @mripard @vbabka the point isn't to get you to shut up about topics like that, just to simply tone it down a little.

We aren't machines and some people will feel personally attacked by harsh comments like that. Is it valid to point out upstream process fails? Sure, it is.

But we are humans, and most take pride in the work they are doing and needlessly strong language can be perceived as hostile.

Your technical explanations on the bugs are great and informative.

@jambya @sima @marcan @airlied @shenki @mripard @vbabka
Yes those emotions are valid and necessary. Yet a sustained flow of rants with no concrete plan or proposal/roadmap to improve anything, again, is not bound to improve anything. On the contrary, it ends up being just a stream of negativity packets thrown at whoever is listening out there. Something that can easily start to bring to mind the word "toxic."

@marcan @emersion @airlied @shenki @mripard @vbabka I just want to say that language can be pretty difficult when dealing on a global scale with different cultural differences. What marcan said seemed perfectly fine to me. Obviously it can sound different to others with different experiences.

@marcan @cpy Actually, we've recently started hitting weird IOMMU bugs on x86_64 on AMD systems, especially when using an AMD GPU. It dogged the Linux 6.5 cycle in Fedora.

They are getting fixed as you said, but it looks like we're in for a wild ride for a few kernel releases...

Neal Gompa (ニール・ゴンパ) fedora

@Tionisla @emersion @marcan @airlied @shenki @mripard @vbabka Let me ask this question: As a user who often encounters Linux kernel related problems, what am I supposed to do with them? For most projects, there's somewhere to report them that would be handled. The Linux kernel lacks this. I could report it to my distribution's bug tracker, and this *mostly* works in Fedora. But other distributions can do literally nothing with them. So now what?

@Tionisla @emersion @marcan @airlied @shenki @mripard @vbabka As a "somewhat" contributor to Linux, this problem makes everything worse. At the core of it, it is literally impossible to consistently and coherently identify flaws, report them, and track the progress of them being fixed. Every other project I work on has some coherent way of doing this.

It would not surprise me if a large part of the frustration among kernel contributors is because of this. It massively increases the busywork!

@Tionisla @emersion @marcan @airlied @shenki @mripard @vbabka The lack of "culture of error" is a problem too. We are humans and make mistakes, but a lot of people keep trying to make contributors work as if they aren't. It's practically begging for burnout. I've seen more give in super-critical commercial development. Because `git revert` is a thing that we can use in the end. Thankfully, this is not a universal problem in the kernel, but it's enough of one.

@Tionisla @emersion @marcan @airlied @shenki @mripard @vbabka And at the end of the day, I posit that one of the reasons this is tolerated is that the majority of the Linux kernel maintainers have been around since the early days, so the Overton window is shifted quite a bit from what everyone else has.

So what happens when they aren't around for one reason or another? I don't know.

@marcan @vbabka
I'm not defending the gatekeeping approach. In fact, I remember a video with Kroah-Hartman saying something like "hey you send four or five patches to the kernel, companies are looking, you are almost guaranteed to land a job." I immediately thought well, that must be very true, but good luck with that. When people deep in the know complain about so much trouble getting even one single patch in, and important patches at that... To get four or five of those approved?

And yet, a plain call for destruction sounds like the voice of pure entropy. Again, just plain negativity. Let's destroy X, it's bad. Ok let's assume it's destroyed and it's no more, but then what replaces X?

Maybe someone or you have already written a measured article or blog post somewhere, pointing out the problems in that process, analogies of why that is counterproductive or doesn't work well here or there, what could be done instead? If you have written such a text, ignore this post. But if you haven't, why not? I mean not just another hostile attack, not just another rant. A measured analysis of pros and cons, plus workable or possible alternatives, maybe with successful active examples backing them up?

Notice the difference. Such a measured write up (e.g. not just a fuming voice of entropy calling for unconditional destruction) would not count as just another venting off of frustration about something that you view as clearly wrong. It could be a plan for action. It could trigger discussions, even possible small changes.

If you think any such measured write up would have extremely limited to null impact, well, almost certainly just ranting will have a much worse prospect of making any difference.

@raulinbonn @vbabka ... Okay, now you're just proving @sima's point that people will interpret my words in whatever random way they feel like, even if I didn't say that.

Read my post again. I said a certain *belief* is a lie that needs to be destroyed. Lies do not need to be replaced with anything in exchange, they just need to go. This isn't about the *process*. Yes, the process needs to go too and I've talked at length about better available options.

@raulinbonn @vbabka @sima This isn't a tit for tat thing. You don't need to propose some kind of opposite alternative to call out toxic beliefs. Yes, we can talk at length about the problems with the kernel and how to improve it, but an elaborated discussion is not a prerequisite to be able to call out certain things as bad.
