"Why does ACPI exist?" In the beforetimes, power management on x86 was done by jumping to an opaque BIOS entry point and hoping it would do the right thing. It frequently didn't. Failed to program your graphics card exactly the way the BIOS expected? Hurrah! Data corruption for you. ACPI made the reasonable decision that, well, maybe it should be up to the OS to set state and be able to recover it. But how should the OS deal with state that's fundamentally device specific?

One way to do that would be to have the OS know about the device specific details. Unfortunately that means you can't ship the computer until you modify every single OS you want to support and get new releases out there. This, uh, was not an option the PC industry seriously considered. The alternative is that you ship something that abstracts the details of the specific hardware. This is what ACPI does, and it's also what things like Device Tree do.

The main distinction between Device Tree and ACPI is that Device Tree is purely a description of the hardware that exists, and so still requires the OS to know what's possible - if you add a new type of power controller, for instance, you need to add a driver for that to the OS before you can express that via Device Tree. ACPI decided to include an interpreted language to allow vendors to expose functionality to the OS without the OS needing to know about the underlying hardware.

So, for instance, ACPI allows you to associate a device with a function to power down that device. That function may, when executed, trigger a bunch of register accesses to a piece of hardware otherwise not exposed to the OS, and that hardware may then cut the power rail to the device to power it down entirely. And that can be done without the OS having to know anything about the control hardware.
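
As a rough illustration of that split, here is a toy model in Python. It is only a sketch: real ACPI methods are AML bytecode run by the OS's interpreter, and the device name, opcode, register addresses and values below are all hypothetical.

```python
# Toy model only: real ACPI methods are AML bytecode, not Python tuples.
# Device name, opcode, addresses and values are hypothetical.

class FakeRegisters:
    """Stands in for control hardware the OS has no driver for."""

    def __init__(self):
        self.writes = []

    def write(self, address, value):
        self.writes.append((address, value))


def run_method(ops, regs):
    """Interpret a firmware-supplied list of (opcode, address, value) steps."""
    for opcode, address, value in ops:
        if opcode == "write":
            regs.write(address, value)
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# What the firmware ships: a device plus the steps that power it down.
FIRMWARE_NAMESPACE = {
    "GPU0": {
        "power_off": [
            ("write", 0x62, 0x81),  # hypothetical: select a power controller command
            ("write", 0x66, 0x01),  # hypothetical: cut the GPU power rail
        ],
    },
}


def os_power_off(device, regs):
    """OS side: look up the method and run it, knowing nothing about the registers."""
    run_method(FIRMWARE_NAMESPACE[device]["power_off"], regs)


regs = FakeRegisters()
os_power_off("GPU0", regs)
print(regs.writes)
```

The OS-side function never mentions an address. Everything board-specific lives in the data the firmware hands over, and because the OS is the one interpreting it, the OS can see which registers get touched.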

How is this better than just calling into the firmware to do it? Because the fact that ACPI declares that it's going to access these registers means the OS can figure out that it shouldn't, because it might otherwise collide with what the firmware is doing. With APM we had no visibility into that - if the OS tried to touch the hardware at the same time APM did, boom, almost-impossible-to-debug failures.

(This is why various hardware monitoring drivers refuse to load by default on Linux - the firmware declares that it's going to touch those registers itself, so Linux decides not to in order to avoid race conditions and potential hardware damage. In many cases the firmware offers a collaborative interface to obtain the same data, and a driver can be written to get that (https://bugzilla.kernel.org/show_bug.cgi?id=204807#c37 discusses this for a specific board))
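
A rough sketch of the kind of check described above, assuming the firmware's declared regions are available as simple inclusive address ranges. The ranges, the function names and the refusal policy are illustrative assumptions, not the actual Linux implementation.

```python
# Illustrative only: the ranges and names are made up, and a real kernel
# check involves more than a plain interval overlap.

def overlaps(a, b):
    """True if two inclusive (start, end) address ranges share any address."""
    return a[0] <= b[1] and b[0] <= a[1]


def may_load_driver(driver_range, firmware_ranges):
    """Refuse a native driver whose registers the firmware has declared it uses."""
    return not any(overlaps(driver_range, fw) for fw in firmware_ranges)


# Hypothetical regions the firmware declares it will access itself.
FIRMWARE_DECLARED = [(0x290, 0x299), (0x400, 0x47F)]

# A sensor chip sitting inside a declared region: the driver is refused.
print(may_load_driver((0x295, 0x296), FIRMWARE_DECLARED))

# A region the firmware never mentioned: the driver may load.
print(may_load_driver((0x2F8, 0x2FF), FIRMWARE_DECLARED))
```

The point is that the refusal is only possible because the declaration exists. With an opaque firmware call there would be nothing to compare the driver's range against.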

Unfortunately ACPI doesn't entirely remove opaque firmware from the equation - ACPI methods can still trigger System Management Mode, which is basically a fancy way to say "Your computer stops running your OS, does something else for a while, and you have no idea what". This has all the same issues that APM did, in that if the hardware isn't in exactly the state the firmware expects, bad things can happen.

Historically there were a bunch of ACPI-related issues because the spec didn't define every single possible scenario and also there was no conformance suite (eg, should the interpreter be multi-threaded? Not defined by spec, but influences whether a specific implementation will work or not!). These days overall compatibility is pretty solid and the vast majority of systems work just fine, but we do still have some issues that are largely associated with System Management Mode.

One example is a recent Lenovo one, where the firmware appears to try to poke the NVMe drive on resume. There's some indication that this is intended to deal with transparently unlocking self-encrypting drives on resume, but it seems to do so without taking IOMMU configuration into account and so things explode. It's kind of understandable why a vendor would implement something like this, but it's also kind of understandable that doing so without OS cooperation may end badly.

This isn't something that ACPI enabled - in the absence of ACPI, firmware vendors would just be doing this unilaterally with even less OS involvement and we'd probably have even more of these issues. Ideally we'd "simply" have hardware that didn't support transitioning back to opaque code, but we don't (ARM has basically the same issue with TrustZone)

By and large ACPI has been a net improvement in Linux compatibility on x86 systems. It certainly didn't remove the "Everything is Windows" mentality that many vendors have, but it meant we largely only needed to ensure that Linux behaved the same way as Windows in a finite number of ways rather than in every single hardware driver, and so the chances that a new machine will work out of the box are much greater than they were in the pre-ACPI period

(The alternative of teaching the kernel about every piece of hardware it should run on? We've seen that in the ARM world. Most code simply never reaches mainline, and most users are stuck running ancient kernels as a result. Imagine every x86 device vendor shipping their own kernel optimised for their hardware, and now imagine how well that works out given the quality of their firmware. Does that really seem better to you?)

@neversphere Honestly version 1 of the spec? It's a reasonable size compared to later versions and covers most of what's still relevant today

@mjg59 @neversphere Isn't this something SBSA was meant to address?

@jwp @neversphere Yeah ARM+SBSA in theory brings ARM into the same space, but SBSA-conforming devices make up a tiny amount of deployed ARM

@mjg59 @neversphere I always ponder as to why. Is there a lack of good reference SBSA definitions that OEMs/integrators can pluck and attach to their DTB? Is this a paywall/IP issue? Or is someone guarding the certification process? Why is an SBC/SoC developed in 2023 still not adhering to or implementing it?

@jwp @neversphere SBSA has Opinions like what sort of interrupt controller you have and a bunch of mobile stuff just doesn't conform to those opinions (Windows on ARM deals with this by using a vendor-supplied HAL)

@mjg59 every now and then benh would start banging on about adding a bytecode feature to devicetree. Never quite got around to implementing it though

@ncommander tbf SBSA does do that in the server space (and for exactly this reason)

@mjg59 well another difference is that (practically speaking) you have to ship your device tree as part of the OS right?

@mwhudson there's no inherent reason not to have a standard way to read that from the firmware or first-stage bootloader (effectively equivalent)

@mjg59 well sure but in practice the device tree format changes, doesn't it? I guess it may settle down eventually...

@mjg59 thanks. ACPI was always a mystery to me.

But one thing I still don't get. The kernel needs a driver to talk to every device and that driver needs to know how to do everything else. Why is turning the device on and off so uniquely tricky that it would be a problem to do that in the driver too?

@stark That kind of thing ends up being *very* platform dependent. Say you have a GPU - the GPU driver has no idea how the power line to the GPU is controlled, because that's up to how it was wired up in the specific machine, and the control mechanism is likely also hardware-specific (is it controlled via the embedded controller? Is there some other power controller that needs to be spoken to? That sort of thing)

That seems like something that could have been standardized through data rather than code, though. For instance, a standard interface for power supply hardware, with enumerable power lines, and tables that say "this device is attached to this power line".

Having *tables* in firmware seems like a great thing, for everyone except vendors who think it'll destroy their ability to "differentiate" and "value add". Why give vendors a language to drive arbitrary non-standard functionality?
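
To make the "data rather than code" proposal concrete, a minimal sketch follows. Everything in it is hypothetical: no such standard power controller interface is described in this thread, and the rail numbers and device names are invented.

```python
# Hypothetical sketch of a tables-only scheme; not an existing standard.

# Enumerable rails on an imagined standard power controller.
POWER_LINES = {0: "on", 1: "on", 2: "on"}

# Firmware-provided table: which device hangs off which rail.
DEVICE_TO_LINE = {"GPU0": 1, "WLAN": 2}


def power_off(device):
    """Generic OS code: consults the table, runs no vendor-supplied code."""
    line = DEVICE_TO_LINE[device]
    POWER_LINES[line] = "off"
    return line


power_off("GPU0")
print(POWER_LINES)
```

This works for exactly the cases the table format anticipates. Anything else (say, a rail that needs a vendor-specific sequence before it can be cut) has no place to be expressed, which is the gap an interpreted method fills.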

@josh @stark And, well, the fundamental problem is still that you need to identify all possible scenarios people might reasonably want to implement in advance, and it's clear the industry isn't interested in that

@mjg59 genuinely admire your ability to respond calmly to someone who wants to start a riot over their inability to see their cpu temperature

@eevee Therapy has all kinds of useful side-effects

Sure, but that happened well after vendors had the option of ACPI and could just say "this new thing sucks, where do we shove all the proprietary custom stuff" and ignore it.

Not saying it's politically feasible to push standardized interfaces with no custom code, just that the result seems wildly better.

(Then again, at this point I'd just take "you can't get certified for Windows if you use SMI, ever".)

@josh @stark Yeah, but it feels more likely that in the face of "You can't do this in ACPI" we'd just have more "We're doing this in SMM instead", and I don't think that's better. SMM is a fundamental part of the x86 security model (how do you manage authenticated flash access otherwise?) so removing it entirely seems complicated? And honestly I suspect that it would be too big a change for even Microsoft to impose on the industry.

@mjg59 What I don't understand is why all of this functionality is implemented on the CPU at all, rather than on the EC/BMC.

(and then have the EC/BMC expose a standardized, abstracted API to the OS for things like voltage/temperature sensors)

Power management also seems like the kind of thing that makes more sense to have been architected as an out of band feature that software was unaware of except for giving high level directives to the EC.

@azonenberg It being entirely in the EC would be basically equivalent to ACPI except we'd have less insight into what's going on behind the scenes

@mjg59 I'm thinking more about things like SMM (and ME, and even UEFI).

What does SMM do that the EC couldn't do better?

All of the hardware I design these days has a MCU that does management stuff (sometimes two, one really low level one for power/reset sequencing and one that comes up later to do everything else) and then talks to an FPGA that does most of the work of the board.

The FPGA doesn't care about polling sensors or controlling fan speeds or anything else that's needed to make the system work. It just lives off in its own temperature-controlled world and does its thing, and has a SPI interface to the MCU when it needs something.

@azonenberg The EC doesn't have access to the CPU's buses, SMM does.

@mjg59 Which buses in particular? PCIe etc I assume, not SMBus / eSPI which it definitely does?

And that's an issue with how we currently have systems architected. Not a fundamental limitation if this were 30 years ago and we were figuring out how to add things like power management to the i386 platform with ISA buses etc.

I'm wondering why it made sense to go this way in the first place.

@azonenberg Vendors use SMM to do things like directly access PCI devices on resume, which is kind of entirely counter to the entire point of ACPI but still

@mjg59 I'm not asking how it's currently (ab)used :)

It just seems like power management is the kind of thing that makes sense to do in a super low power coprocessor that's running all the time, independent of the CPU and OS (and incapable of touching end user data or main RAM - exclusively control plane), and cares about all of these low level board specific details.

Ideally that coprocessor would also do things like poke the necessary registers in the northbridge to initialize the RAM controller so the OS bootloader could come up out of reset with fully working DRAM like all of my FPGA-based systems do, but that's probably wishful thinking :)

@mjg59 Well OK *ideally* the RAM controller would just be an RTL state machine that knows how to parse the SPD and make the RAM work without any software ever being in the loop.

But it seems that ship sailed years ago.

@mjg59 In general it seems like we have this crisscrossing of low level management stuff (reset, power management, etc) and high level stuff (business logic of the hardware) that are living in commingled register spaces on the same bus, but that's an engineering decision.

There's no reason why it couldn't be separated. For example, on most of the MCUs I work on, the registers for reset, clock enable, and power enable to various peripherals are completely independent (totally separate address range) from the registers for that peripheral. They might not even be on the same AMBA segment.

I can totally imagine standardization of all power management over SMBus or eSPI by the EC, for example, with the PCIe side only used by the OS.

As far as I can see it's an architectural decision (one I'm not a fan of, due to making privilege separation and compartmentalization more difficult), not a fundamental limitation.

@azonenberg In that model, how does the firmware transparently manage self-encrypting drive unlock? I agree that there's a strong argument that it *shouldn't*, but otherwise it's something that you don't get to launch without OS support and that's not how the market works.

@mjg59 I'm also a bit skeptical of the benefits of SEDs vs OS-managed FDE. Always seemed like a gimmick to me (especially since it's implemented in black-box firmware).

Treat the drive as an untrusted remote server that you write bits to and hope the same bits later come back. Assume anyone else can read those bits and encrypt anything you don't want the world to see.

@mjg59 If the drive wants to use AES internally as a whitening function to provide desirable statistical properties to your data for media wear leveling, simultaneous switching output control, forward error correction? Totally fine.

But don't trust the drive to keep your secrets.

@azonenberg Look I fundamentally agree but that's not what we have

@azonenberg @mjg59 Didn't we have PCIe config space versus BAR space for the OS management versus usage? What happened to that?

@dascandy42 @azonenberg There's no way to speak to either without going via the CPU

@jwp @mjg59 @neversphere once you have SBSA compliant hardware you are expected to go UEFI and ACPI rather than DeviceTree.

I work on SBSA Reference Platform in QEMU and we provide EDK2 with a set of ACPI tables.

Arm has the SystemReady certification program, where you have to pass ACS and show that you are able to boot a set of operating systems.

@hrw @mjg59 @neversphere SystemReady cert, I assume, requires some non-trivial resources to get into the club?

@jwp @mjg59 @neversphere I have never worked on SystemReady certification. If your hardware is right then the rest is firmware.

I would say that once you have NetBSD 10, FreeBSD 14, OpenBSD 7.4 and Debian 12 working you can start applying.

When those systems work then the WinPE requirement should be easy.

No idea about ESXi Arm as they require registration to download the ISO.

(I run those systems as part of my SBSA Reference Platform work).

@mjg59 And if device tree hadn't been GPL preventing even BSD from adopting it, we might not have wound up with ACPI on arm. As I complained about back in the day... https://www.uwsg.indiana.edu/hypermail/linux/kernel/1505.3/00292.html

@mjg59 The funny thing is that many of the ARM devices that support ACPI boot Linux with device trees. I'm not sure quite why, I'm guessing undocumented extensions or something.
I've found device trees fine enough for embedded devices but really wish more stuff supported ACPI if you would even think about using a distro other than buildroot or yocto

@artemist @mjg59 ACPI has a very strong model of what the hardware will look like so if you want to do things that don’t fit well into that model things will often work better if you use a less opinionated spec like DT. This does rely on the OS understanding the hardware so there’s a tradeoff with software compatibility and sometimes that’s a call for the end user.

@azonenberg @mjg59 But don't forget how complex the set of hardware is and how configurable it is for a given system; unlike a mostly fixed SoC. You could have hundreds of PCIe devices, each with their own power management firmware, wakeups from all over; and the OS is asking you to juggle where/when it can use the limited power and cooling.

@ncommander @mjg59 SBSA has some ideas about how systems should look that aren’t a good fit for all markets. There are some other BSAs that target some of those markets, though demand varies.

EFI is much more widely available these days, u-boot implements it which really helps.

@mjg59 Not to mention that many ACPI issues were/are caused by the motherboard-side implementation of ACPI being buggy, incomplete or self-contradictory. One thing that ACPI decidedly isn't: simple.

@klausman @mjg59 and there were still plenty of issues where the ACPI expects the OS to be some flavor of Windows, e.g. "if win95 do X elseif win2k do Y" and does nothing on Linux, so some feature just doesn't work. Typically we'd fix these by dumping the DSDT and rewriting it, but nowadays dynamically loadable DSDTs are deprecated even though those types of problems are just as prevalent as ever.

@hyc @klausman And the answer is just to claim to be Windows, because Windows has an established contract with the firmware in a way that Linux never has

@klausman @mjg59 The real test is whether it makes the system simpler overall. And I'd argue the one-kernel-per-device model seen on Android phones is complicated in a different way, even if each individual kernel might be simpler.

@jamesh @mjg59 I agree. But then again, embedded devices, phones and desktop computers and data centre servers all have different parameters and benefit from different approaches.

Just as long as we don't go back to SET BLASTER="A220 I5 D1"

@klausman @mjg59 every smartphone I've owned so far has stopped receiving OS version upgrades before it became unusable.

In contrast, I've got a 10+ year old x86 server in my closet running a recent Linux distro. It just works because no one has to do hardware enablement for that specific system in the new OS release.

@mjg59 @klausman That's the stock answer but it's inadequate. You have to know to claim to be a specific version of Windows, otherwise you still get breakage.

@hyc @klausman You claim to be whatever is the latest version of Windows whose behaviour you've attempted to model
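
A toy model of that handshake, for illustration. The query strings imitate the "Windows <year>" style that firmware commonly probes for, but the specific list, the "default" fallback and the firmware logic are assumptions made up for this sketch.

```python
# Toy model of OS identification; the list and firmware logic are illustrative.

# Versions of Windows whose behaviour this imaginary OS has tried to model.
MODELLED_WINDOWS = ["Windows 2001", "Windows 2006", "Windows 2009", "Windows 2012"]


def osi(query):
    """OS side: answer yes to every Windows version whose behaviour we model."""
    return query in MODELLED_WINDOWS


def firmware_pick_behaviour(osi_fn):
    """Firmware side: probe known versions in order and keep the newest match."""
    chosen = "default"
    for version in ["Windows 2001", "Windows 2006", "Windows 2009",
                    "Windows 2012", "Windows 2015"]:
        if osi_fn(version):
            chosen = version
    return chosen


# An OS that claims the Windows versions it models gets a tested code path.
print(firmware_pick_behaviour(osi))

# An OS that only answers to "Linux" falls through to the untested default.
print(firmware_pick_behaviour(lambda query: query == "Linux"))
```

In this sketch the OS ends up on the newest path it has claimed, so answering yes to a version it has not modelled would put it on a path it cannot handle.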

@hyc @mjg59 @klausman web guy here points wildly at user agent strings to show where that path leads ;D

@mjg59

The worst part of ACPI is that they invented a stupid crappy unique language for it.

The world would be a better place if they had just picked an already existing scripting language like Tcl, Lua or even Python...

@bsdphk you want to put one of those in-kernel?

@bsdphk @mjg59 It is pretty hard to see how these would map to the tasks that AML does TBH. I'd get someone proposing eBPF based BIOS/firmware (perhaps), or even WebAssembly would make more sense in this context :-)
@bsdphk @mjg59 WebAssembly because it is sort of a stack machine, which sort of fits the tasks that ACPI is good for... could be any Forth-alike. But since AML does its job pretty well, and we know it pretty well, I'd hate it to be switched more than I hate AML (which I don't like but I *know* it).
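
For anyone unfamiliar with the term, this is roughly what a stack machine interpreter looks like. It is a minimal sketch with an invented instruction set, not AML, WebAssembly or Forth, and it makes no claim about how AML itself is encoded.

```python
# Minimal stack machine with made-up opcodes, for illustration only.

def run(program):
    """Execute (opcode, *args) tuples against a value stack; return the top."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "or":
            b, a = stack.pop(), stack.pop()
            stack.append(a | b)
        elif op == "and":
            b, a = stack.pop(), stack.pop()
            stack.append(a & b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1]


# Set a bit in a hypothetical register value, then mask off the low nibble.
program = [
    ("push", 0x10),
    ("push", 0x01),
    ("or",),         # stack now holds 0x11
    ("push", 0x0F),
    ("and",),        # stack now holds 0x01
]
result = run(program)
print(hex(result))
```

Nothing in the interpreter depends on the host CPU's registers or word layout, which is the portability property being pointed at.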

Jarkko Sakkinen

@bsdphk @mjg59 And stack machines are best for the task because they work for any ISA created by us (humans).

Jarkko Sakkinen

@bsdphk @mjg59 One innovation in this area that might be cool or even useful would be to translate AML to eBPF and use a unified interpreter for both inside the kernel (perhaps). It would be useful for e.g. ACPI debugging when you could try an ACPI table fixup with an eBPF program... Not sure how feasible it would be, but if it was feasible I could see applications...

@mjg59 @jwp @neversphere Ultimately, SBSA was too late. If it had been introduced early enough when vendors were not used to the pain, there would have been more drive to adapt to it and adopt it. But by the time it was settled and being promoted, everyone was used to the problems of the current model.

It doesn't help that Arm doesn't think it should be pushing for stuff like this either...
