
Lorenzo Stoakes

One of the ways you can get tripped up in kernel code is that sometimes something really major happens and it isn't in any way highlighted.

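For instance, when reading from disk into a folio (whether file-backed or from swap), the folio is locked before the read is issued and is unlocked once the read completes. If the read succeeded, the folio is also marked 'uptodate' (i.e. has the PG_uptodate flag set) before it is unlocked.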

That turns every attempt to lock a folio into a barrier. The caller either waits for the read to complete or, as is often the case, reports 'I need to wait on I/O, try again later', much as userland does with async I/O.
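
The 'lock as barrier' idea can be made concrete with a minimal userspace sketch in C, assuming POSIX threads (compile with -pthread). struct model_folio and the model_* functions are hypothetical stand-ins I made up for illustration. They are not kernel API. The folio is locked before the read is submitted. The completion marks it uptodate only on success and unlocks it either way, so anyone locking it afterwards blocks until the read has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a folio: just the two flags that matter here. */
struct model_folio {
	pthread_mutex_t mu;     /* protects the fields below */
	pthread_cond_t cv;      /* broadcast whenever 'locked' is cleared */
	bool locked;            /* stands in for PG_locked */
	bool uptodate;          /* stands in for PG_uptodate */
};

static void model_folio_init(struct model_folio *f)
{
	pthread_mutex_init(&f->mu, NULL);
	pthread_cond_init(&f->cv, NULL);
	f->locked = false;
	f->uptodate = false;
}

/* Sleep until the lock bit is clear, then take it (roughly folio_lock()). */
static void model_folio_lock(struct model_folio *f)
{
	pthread_mutex_lock(&f->mu);
	while (f->locked)
		pthread_cond_wait(&f->cv, &f->mu);
	f->locked = true;
	pthread_mutex_unlock(&f->mu);
}

/* Optionally mark uptodate, then clear the lock bit and wake any waiters. */
static void model_folio_unlock(struct model_folio *f, bool set_uptodate)
{
	pthread_mutex_lock(&f->mu);
	assert(f->locked);      /* unlocking an unlocked folio is a bug */
	if (set_uptodate)
		f->uptodate = true;
	f->locked = false;
	pthread_cond_broadcast(&f->cv);
	pthread_mutex_unlock(&f->mu);
}

/* Simulated read completion, like an I/O completion callback. */
struct model_read { struct model_folio *folio; bool success; };

static void *model_end_read(void *arg)
{
	struct model_read *rd = arg;

	/* Uptodate only on success, but unlocked either way. */
	model_folio_unlock(rd->folio, rd->success);
	return NULL;
}

/* Returns the uptodate state seen by whoever locks the folio after the read. */
static bool model_read_then_lock(bool success)
{
	struct model_folio folio;
	struct model_read rd = { &folio, success };
	pthread_t io;
	bool async, uptodate;

	model_folio_init(&folio);
	model_folio_lock(&folio);       /* locked before the read is submitted */

	/* Complete on another thread (async); fall back to a synchronous read. */
	async = pthread_create(&io, NULL, model_end_read, &rd) == 0;
	if (!async)
		model_end_read(&rd);

	model_folio_lock(&folio);       /* barrier: blocks until the read completes */
	uptodate = folio.uptodate;      /* the read is finished, so the flag is final */
	model_folio_unlock(&folio, false);

	if (async)
		pthread_join(io, NULL);
	pthread_cond_destroy(&folio.cv);
	pthread_mutex_destroy(&folio.mu);
	return uptodate;
}
```

model_read_then_lock(true) returns true and model_read_then_lock(false) returns false. Acquiring the lock tells you the read has finished, not that it succeeded.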

A good example of this is do_swap_page() [0], where we kick off either a synchronous or an asynchronous read from disk (see the block starting at line 3766), but later (line 3878) return VM_FAULT_SIGBUS if the page isn't uptodate.

The key here? Line 3826:

locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);

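We try to lock the page. If the I/O has already completed (as it will have if the read was synchronous), we get the lock straight away. Otherwise, depending on vmf->flags, we either return VM_FAULT_RETRY to indicate that we need to wait for the I/O and try again, or we wait to acquire the lock.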

Once we hold the lock, we can safely assume the read from swap has completed, though not that it succeeded. That is what the uptodate check is for.
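
The ordering described above can be summarised in a small, simplified C model. This is my own sketch of how I read the v6.0 logic, not kernel code. MODEL_ALLOW_RETRY, MODEL_RETRY_NOWAIT, model_lock_or_retry() and model_swap_fault() are hypothetical names standing in for the FAULT_FLAG_ALLOW_RETRY/FAULT_FLAG_RETRY_NOWAIT bits, lock_page_or_retry() and do_swap_page(). FAULT_FLAG_TRIED, FAULT_FLAG_KILLABLE and the many other checks do_swap_page() performs between taking the lock and testing uptodate are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT. */
#define MODEL_ALLOW_RETRY  (1u << 0)
#define MODEL_RETRY_NOWAIT (1u << 1)

enum model_fault_result { MODEL_MAPPED, MODEL_RETRY, MODEL_SIGBUS };

/*
 * Simplified model of lock_page_or_retry(): true means "we now hold the lock".
 * 'lock_busy' means the trylock failed, e.g. because the swap read still holds it.
 */
static bool model_lock_or_retry(bool lock_busy, unsigned int flags)
{
	/* This model only understands the two flags defined above. */
	assert((flags & ~(MODEL_ALLOW_RETRY | MODEL_RETRY_NOWAIT)) == 0);

	if (!lock_busy)
		return true;    /* trylock succeeded: any read has already finished */
	if (flags & MODEL_ALLOW_RETRY) {
		/*
		 * Without RETRY_NOWAIT the real code drops mmap_lock and sleeps
		 * until the folio is unlocked. Either way it reports failure so
		 * the caller returns VM_FAULT_RETRY.
		 */
		return false;
	}
	return true;            /* retry not allowed: sleep until we get the lock */
}

/*
 * Hypothetical model of the ordering do_swap_page() relies on: lock first,
 * then check uptodate. 'uptodate' is the flag's value once the read completes.
 */
static enum model_fault_result model_swap_fault(bool lock_busy, unsigned int flags,
						bool uptodate)
{
	if (!model_lock_or_retry(lock_busy, flags))
		return MODEL_RETRY;
	/* Holding the lock means the read completed, but it may have failed. */
	if (!uptodate)
		return MODEL_SIGBUS;
	return MODEL_MAPPED;
}
```

For example, model_swap_fault(true, 0, false) returns MODEL_SIGBUS. With retry disallowed we sleep until the read completes, and only then discover that it failed.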

I kind of feel that a comment would be helpful here :>) [checking tip, I see there isn't one; might add one myself], but at the same time it's sort of assumed you'd know this was the case...

While I'm writing the book primarily to learn the mm in more depth, I hope pointing out things like this in it will help others over these kinds of stumbling blocks.

[0]: https://elixir.bootlin.com/linux/v6.0/source/mm/memory.c#L3718