Ok, thinking about secure hibernation again. We can't rely on LUKS because we can't trust the initramfs, and someone could simply drop a LUKS swap partition on top of the existing one and resume from that. So we need this to be kernel-mediated (ChromeOS doesn't have this restriction so has an easier job).
My initial implementation generated a key on every hibernation, which meant we needed to control access to a PCR to prevent userland being able to achieve the same state, which annoyed people who wanted to use that PCR. That's fair. But we can't use an nvram PCR because those aren't stored in creationdata, so on resume we wouldn't be able to prove it was created by the kernel.
My proposal is this: have the kernel generate and store a TPM key pair in a UEFI variable, and reuse it. But we still need the key to be generated in a way userland can't mimic. So we need to do this provably before userland runs. And that's hard, because the kernel doesn't cap PCRs, so someone could boot an old kernel and reproduce the same PCR state and generate their own key that the kernel would then trust.
But! Calling ExitBootServices() happens in the context of the kernel and is supposed to extend PCR 5. So, we measure a statement that the kernel supports this into PCR 5 before calling EBS and now we have a PCR state that can't be faked with existing kernels (userland can't perform any extensions until after EBS so the measurement will be different)
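A minimal sketch of why the ordering matters, assuming SHA-256 PCR banks and using made-up event payloads (`MARKER`, `EBS` and the starting value are placeholders, not the real measured data). It only models the extend operation, not a real TPM:

```python
import hashlib


def extend(pcr: bytes, data: bytes) -> bytes:
    """Model of a PCR extend: new = SHA256(old || SHA256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


# Placeholder values, not the real event data.
FIRMWARE_STATE = extend(bytes(32), b"whatever firmware measured into PCR 5")
MARKER = b"kernel supports secure hibernation"  # hypothetical statement
EBS = b"ExitBootServices event"                 # stand-in for the firmware's EBS measurement

# New kernel: measures the marker, then EBS extends the PCR.
new_kernel = extend(extend(FIRMWARE_STATE, MARKER), EBS)

# Old kernel: no marker, only the EBS measurement.
old_kernel = extend(FIRMWARE_STATE, EBS)

# Userland on the old kernel can only extend after EBS has happened.
userland_fake = extend(old_kernel, MARKER)

print(new_kernel.hex())
print(old_kernel.hex())
print(userland_fake.hex())
```

Because extend is order-sensitive, marker-then-EBS and EBS-then-marker end up at different values, which is the property the scheme leans on.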
Kernel boots, checks the creationdata in the stored key matches the expected PCR values, re-creates the key if not, and then extends PCR 5 again. The key stays in the kernel keyring and is used to verify the hibernation image before resume or to sign it on suspend
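Roughly, the boot-time flow could look like the sketch below. Everything here is a stand-in: `StoredKey`, `kernel_boot` and the cap string are hypothetical names, the key blob is just random bytes, and a real check would need the TPM to vouch for the creation data rather than trusting a stored field.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional, Tuple


def extend(pcr: bytes, data: bytes) -> bytes:
    """Model of a PCR extend: new = SHA256(old || SHA256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


@dataclass
class StoredKey:
    blob: bytes           # stand-in for the key blob kept in a UEFI variable
    creation_pcr5: bytes  # stand-in for the PCR state recorded in creationdata


def kernel_boot(stored: Optional[StoredKey], pcr5: bytes) -> Tuple[StoredKey, bytes, bool]:
    """Reuse the stored key if it was created at this PCR 5 state, else re-create it.

    Returns (key, capped PCR 5 value, whether the key was re-created).
    """
    recreated = stored is None or stored.creation_pcr5 != pcr5
    if recreated:
        stored = StoredKey(blob=os.urandom(32), creation_pcr5=pcr5)
    # Extend PCR 5 again so userland never sees the state the key was created at.
    capped = extend(pcr5, b"hibernation key loaded")  # hypothetical cap event
    return stored, capped, recreated
```

The second extend is what keeps userland from generating a key whose creation data matches: by the time init runs, PCR 5 has already moved on.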
I think that covers things without blocking any TPM resources from userland while retaining security, and should also work on TPM1.2
Also we should probably cap PCRs before executing init
This would all be much easier if our entire boot chain was integrated in some meaningful way but alas we can't even do resume in early kernel because we rely on userland to prompt for the disk password if swap is encrypted (which it should be!)
@mjg59 Could you explain why this isn't a problem on ChromeOS? Do they do things fundamentally different here? Do they validate things in the bootloader?
@sheogorath initramfs is part of the signed payload, which isn't how most mainstream distros work
@mjg59 Could we have the kernel automatically encrypt and decrypt swap using the TPM? Theoretically nothing about the bootloader or firmware or hardware should be able to change between hibernation and resume, so the resume-time PCR values should be predictable at hibernation time, no?
@AdrianVovk hmm that's an interesting question! In theory this could be used to auto encrypt all swap, I think
@AdrianVovk (we might need a new swap header format)
@mjg59 just bonk the mainstream distros on their silly heads until they adopt UKIs
@valpackett eh it's still awkward you still need to ensure there are zero ways to get code exec in the initramfs which is actually a very hard problem
@mjg59 maybe generate tpm hmac key during early kernel init with pcr policy binding it to pcr 9 or so with its current value, then measure it to pcr 9, thus making it impossible to ever get access to the key again -- unless you reboot the very same boot path. Then use that key to encrypt hibernation image, and store the exported version of said tpm key as part of it.
Generally i think cutting off access to keys via separator pcr measurements sounds like a *waaaay* better idea than...
@mjg59 ... trying to access control tpm resources in the kernel, because interposers do exist...
I understood that some pcrs were reserved for the OS via localities, though that wasn't actually implemented correctly. Maybe I've misremembered or misunderstood things you previously said.
If there are some that should be reserved for the OS but aren't because of implementation failures, doing something similar to your PCR 17 filter for them would seem reasonable, no?
@seanfurey localities just don't work in most cases (jejb was wrong), so reserving was an option but there's various scenarios that could maybe work around it and it breaks any userland that was using it
@pid_eins that's basically what I'm suggesting, but based on config someone else could generate their own key with the same PCR 9 data which is why I'm suggesting PCR 5
@mjg59 PCR5? that's weird, it's where firmware measures the GPT partition table of the selected boot disk. That's something that quite likely changes between subsequent boots. At least on systems deploying systemd-repart, we quite likely grow or add partitions on first boot, and in fact any following boot too, if necessary.
Hence, by binding to PCR 5 you break the stuff under the wrong conditions. You want something that breaks when wrong os is booted, which would normally suggest pcr 4+9
@mjg59 but frankly i would stay away from measuring anything into pcr 0-7 after firmware measured its separators in there. in particular if whatever you measure there is not stable, because you break predicting measurements for use in policies then.
hence, lock against pcr 4+9, and measure into 9, for capping.
@mjg59 but please, before the kernel starts to measure more stuff i'd really appreciate if it would actually expose a log somewhere of what it measures, maybe in CEL format or so, because we can only guess otherwise. systemd-pcrlock really cares about these measurements after all.
@pid_eins PCR 9 is unsuitable because we don't guarantee it's modified before userland starts (and no IMA doesn't work here unless every valid kernel a vendor has signed has IMA enabled, otherwise boot a kernel without IMA and do the key generation from userland). We execute code before PCR 5 is extended by ExitBootServices() so we can do stuff there. And yes, this would be handled in a predictable way. I'll write up a design doc, we can discuss at LPC?
@mjg59 i am suggesting *you* measure something derived off the key you generated into pcr 9. That caps the pcr, no one can fake it later, because a) you already measured something into it, and b) what you are measuring there is derived from a secret nobody knows but you (and the tpm)
@pid_eins they can fake it later by booting an older kernel that doesn't do that
@mjg59 they cannot, because as mentioned you lock against both pcr4 and pcr9. One of the two is going to change if you change kernels.
@pid_eins I'm not certain that's true if using grub
@mjg59 to summarize alg: early in the kernel: either codepath A: ask tpm to generate hmac key, with access policy locked to current pcr4+pcr9. Download key in both plaintext and exported forms from tpm. Keep plaintext version in memory in kernel and use for hibernation image. Store exported version somewhere reasonable (could be: efivar or nvindex, doesnt matter). Or codepath B: load previously exported/saved key from codepath A from efivar/nvindex into the tpm. If it accepts it, ...
@mjg59 ..., great, use it, extract plaintext key, use for hibernation image later. If it doesnt accept it (because pcrs on this boot changed) erase from storage, and use codepath A instead.
After codepath A or B completed: measure something hashed from the plaintext key to pcr 9.
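The codepath A / codepath B logic above can be sketched against a toy in-memory TPM. `ToyTPM` is not a real TPM API and every name in it is made up for illustration: the "exported" blob is just an opaque handle, the policy is a hash of PCR 4 and PCR 9, and `storage` stands in for the efivar or nvindex.

```python
import hashlib
import os
from typing import Dict, Optional, Tuple


class ToyTPM:
    """In-memory stand-in for a TPM. Not a real TPM API."""

    def __init__(self) -> None:
        self.pcrs: Dict[int, bytes] = {}
        # exported blob -> (policy digest at creation, plaintext key)
        self._objects: Dict[bytes, Tuple[bytes, bytes]] = {}

    def reboot(self, pcr4: bytes, pcr9: bytes) -> None:
        """Reset PCRs to what the boot chain measured before the kernel runs."""
        self.pcrs = {4: pcr4, 9: pcr9}

    def extend(self, index: int, data: bytes) -> None:
        digest = hashlib.sha256(data).digest()
        self.pcrs[index] = hashlib.sha256(self.pcrs[index] + digest).digest()

    def _policy(self) -> bytes:
        return hashlib.sha256(self.pcrs[4] + self.pcrs[9]).digest()

    def create(self) -> Tuple[bytes, bytes]:
        """Codepath A: new key locked to the current PCR 4 + PCR 9."""
        key, blob = os.urandom(32), os.urandom(16)
        self._objects[blob] = (self._policy(), key)
        return key, blob

    def load(self, blob: bytes) -> Optional[bytes]:
        """Codepath B: release the key only if PCR 4 + PCR 9 match creation time."""
        entry = self._objects.get(blob)
        if entry is None or entry[0] != self._policy():
            return None
        return entry[1]


def early_kernel_init(tpm: ToyTPM, storage: Dict[str, bytes]) -> bytes:
    key = None
    blob = storage.get("hibernate-key")
    if blob is not None:
        key = tpm.load(blob)              # codepath B
        if key is None:
            del storage["hibernate-key"]  # PCRs changed on this boot
    if key is None:
        key, blob = tpm.create()          # codepath A
        storage["hibernate-key"] = blob
    # Cap: measure something hashed from the plaintext key into PCR 9.
    tpm.extend(9, hashlib.sha256(key).digest())
    return key
```

After `early_kernel_init` returns, `tpm.load()` on the stored blob fails until the machine reboots through the same PCR 4 + PCR 9 path, which is the cut-off the proposal is after.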
@mjg59 well, if grub doesnt measure the kernel it invokes your chain of trust is broken anyway, maybe dont bother? And if it measures it somewhere other than 9, fine, include that pcr in your access policy too.
@pid_eins @mjg59 The problem with grub, is that it measures too much. It measures (into PCR 9 IIRC) everything it reads, including grub.cfg and grubenv, the latter changing all the time due to the automatic boot_success variable (which is cleared by grub and set by userspace after a timer or on some conditions).
@pid_eins @mjg59 Not across a hibernation though, right?
Presumably the kernel would freeze userspace, put the key into the TPM, store the encoded blob somewhere, and shut down. Resume happens on next startup
I think it's fine to say that if someone messes with the ptable between hibernation and resume they just won't get a resume. Especially because they'd have to do it from another OS they'd boot in between hibernation and resume
@AdrianVovk @mjg59 the problem is that the key is generated during early boot (because the goal here after all is to enforce it can *only* be generated by the kernel at an early point where userspace isn't there yet). So if you do a first boot all the way through and then hibernate it, then the gpt table shortly before hibernation might look quite different from when the key was generated during early boot. And hence you are fucked. So it's not about what happens between hibernate+resume, but...
@AdrianVovk @mjg59 ... what might happen between early boot and the time you enter hibernation.
@pid_eins @mjg59 I'd be careful doing this kind of thing because it introduces a nasty edge case with sysupdate:
1. We transparently install an update for the user in the background. This drops in a new kernel in the ESP. We wait for the user to reboot to apply the update
2. Next boot: sd-boot picks up the new kernel
3. New kernel fails to unlock the hibernation recovery image because pcr4 changed. So hibernation is aborted
4. User loses their data
@AdrianVovk @mjg59 so, systemd stores hibernation info in an efi var when hibernating, we could use that from sd-boot to pick the right kernel, for starters. Would be quite easy. But of course, if you delete the old kernel entirely, then you are fucked, but frankly that's something we should check for at hibernation time and simply refuse hibernation if kernel is gone.
@pid_eins @mjg59 I think it does matter in this case. Consider what happens if it measures a grubenv with boot_success=0, then you use the computer for a while (so the timer sets boot_success=1 in the grubenv file), then you hibernate; it will fail to recover the hibernation key, because it was generated with one value in PCR9 (for the grubenv with boot_success=0) but now PCR9 has a different value (the grubenv file it measures has boot_success=1).
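A small illustration of the grubenv case, assuming grub extends PCR 9 with each file it reads. The file contents are invented and `pcr9_after_grub` is a hypothetical helper, not how grub actually structures its measurements:

```python
import hashlib
from typing import List


def pcr9_after_grub(files_read: List[bytes]) -> bytes:
    """Model PCR 9 after grub has measured every file it read, in order."""
    pcr = bytes(32)
    for contents in files_read:
        pcr = hashlib.sha256(pcr + hashlib.sha256(contents).digest()).digest()
    return pcr


grub_cfg = b"menuentry ... (invented grub.cfg contents)"

# Boot where the key is generated: grubenv still says boot_success=0.
at_keygen = pcr9_after_grub([grub_cfg, b"boot_success=0\n"])

# Boot where we try to resume: userspace flipped it to 1 in the meantime.
at_resume = pcr9_after_grub([grub_cfg, b"boot_success=1\n"])

print(at_keygen == at_resume)
```

Nothing about the kernel or bootloader changed between the two boots, but a key locked to the first PCR 9 value would not be released on the second.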
@pid_eins @mjg59 Generating the key and enrolling it into the TPM don't have to be one step
- Before EBS() measure in a "this kernel supports secure hibernation" magic value
- EBS() caps the PCR before userspace can execute. This prevents someone from abusing an older kernel to spoof the aforementioned measurement
- During the first swapon, generate a random key to be used for encrypting swap and hibernation. Or do it at boot. When the key gets created doesn't matter
- Immediately before hibernation freeze userspace and then wrap the key via the TPM. Make sure you bind to the PCR that includes the magic "support secure hibernation" measurement
- Because measurements were captured immediately before suspend, ideally the state should be up to date for when the system is resumed during next boot
Notably the only thing that needs to be measured during very early boot is entirely static
@AdrianVovk @pid_eins you need to generate the TPM key at a point before userland is running, otherwise they can generate a key with identical creationdata
@AdrianVovk @pid_eins I swap out the key for one I control, write out a hibernation image containing my code, and use the kernel to resume it thus obtaining arbitrary code in ring 0
@mjg59 @pid_eins Why would the kernel allow userspace to swap out the key, or even know that encryption is happening?
I can't see a scenario where userspace needs to worry about the keys the kernel is using here. So instead the kernel can generate ephemeral keys per swap superblock at swapon time, and transparently encrypt the swap so that userspace cannot poke at it
@AdrianVovk @pid_eins assume hostile userland. Nothing stops it from performing the same steps as the kernel in your scenario, which means the kernel will happily resume an image containing arbitrary code.
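To make the hostile-userland argument concrete, here is a toy model in which the only thing the kernel can check about a key is the PCR state recorded when it was created. `ToyKey`, `tpm_create` and `kernel_trusts` are hypothetical names, and the event strings are placeholders:

```python
import hashlib
import os
from dataclasses import dataclass


def extend(pcr: bytes, data: bytes) -> bytes:
    """Model of a PCR extend: new = SHA256(old || SHA256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


@dataclass(frozen=True)
class ToyKey:
    secret: bytes
    creation_pcr: bytes  # PCR state the toy TPM recorded at creation time


def tpm_create(current_pcr: bytes) -> ToyKey:
    """Anyone who can talk to the TPM can call this, kernel or userland."""
    return ToyKey(os.urandom(32), current_pcr)


def kernel_trusts(key: ToyKey, expected_pcr: bytes) -> bool:
    return key.creation_pcr == expected_pcr


firmware = extend(bytes(32), b"firmware measurements")
kernel_only = extend(extend(firmware, b"marker"), b"EBS")  # before any userland runs
userland_visible = extend(kernel_only, b"cap")             # what userland sees afterwards

# Key created late (e.g. at swapon or right before hibernation):
# userland sits at the same PCR state, so its key looks identical.
attacker_key = tpm_create(userland_visible)
late_accepts_attacker = kernel_trusts(attacker_key, userland_visible)

# Key created before userland runs, with the PCR capped afterwards:
# userland can never get back to that state.
kernel_key = tpm_create(kernel_only)
early_accepts_kernel = kernel_trusts(kernel_key, kernel_only)
early_accepts_attacker = kernel_trusts(attacker_key, kernel_only)

print(late_accepts_attacker, early_accepts_kernel, early_accepts_attacker)
```

In the late-creation case the attacker's key passes the same check the kernel's own key would, which is the objection being made here.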