Every (serious) debugger has a memory cache. It's fun to ponder what the theoretical justification for caching is here. The debugger effectively becomes a caching agent (like a core or a cache-coherent peripheral), but without the benefit of bus snooping or a directory. Because most debuggers use a stop-the-world model as their baseline, the job is simpler, but with no coherence metadata you typically have to be very conservative about flushing your cache if you want to maintain coherence.
Not usually a big deal in practice: most debuggers just flush their cache every time they resume the target. The cache is there to batch and amortize the cost of target interactions and to isolate individual debugger functions from having to worry about the piecemeal efficiency of every target memory access. It's similar to caches in other kinds of systems, but a debugger can end up doing a lot of accesses between target resumes, so full cache invalidation is usually acceptable.
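The flush-on-resume scheme above can be sketched in a few lines. This is a minimal illustration, not any particular debugger's implementation; `read_pages` is a hypothetical stand-in for the real target-read primitive (ReadProcessMemory, ptrace, /proc/pid/mem, etc.):

```python
# Minimal sketch of a page-granular target-memory cache that batches reads
# and is fully invalidated on every resume. `read_pages` is a hypothetical
# callback standing in for the actual target read primitive.

PAGE_SIZE = 4096

class TargetMemoryCache:
    def __init__(self, read_pages):
        self.read_pages = read_pages   # (page_addr, count) -> bytes
        self.pages = {}                # page_addr -> cached page bytes

    def read(self, addr, size):
        # Serve arbitrary reads from whole cached pages, fetching on miss.
        data = bytearray()
        end = addr + size
        while addr < end:
            page = addr - addr % PAGE_SIZE
            if page not in self.pages:
                self.pages[page] = self.read_pages(page, 1)
            off = addr - page
            take = min(PAGE_SIZE - off, end - addr)
            data += self.pages[page][off:off + take]
            addr += take
        return bytes(data)

    def on_resume(self):
        # No coherence metadata: be conservative and drop everything.
        self.pages.clear()
```

Individual debugger functions just call `read` and never think about access granularity; the batching and the conservative invalidation are both hidden behind it.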
Anyway, I was thinking about this in the context of whether a debugger could use something like GetWriteWatch() on Windows and https://lwn.net/Articles/940704/ on Linux so that it only has to do page-level invalidation of its target memory cache. :)
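The resume-time logic that a write-watch would enable might look like this. Everything here is hypothetical: `cache` is assumed to map page-aligned addresses to cached bytes, and `dirty_pages()` is an imaginary wrapper over GetWriteWatch() (with reset) or the Linux equivalent, returning the page addresses written since the last query:

```python
# Sketch: page-level invalidation on stop/resume instead of a full flush.
# `cache` maps page-aligned addresses to bytes; `dirty_pages` is a
# hypothetical wrapper around an OS write-watch facility that returns
# the pages the target wrote since the last query.

def invalidate_dirty(cache, dirty_pages):
    for page in dirty_pages():
        cache.pop(page, None)   # evict only what the target actually wrote
```

Everything the target didn't touch stays warm across resumes, which is exactly the win over the flush-everything baseline.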
@pervognsen Interesting... That makes sense, kind of like file I/O buffers that batch up reads and writes. Though I'm guessing most debuggers also cache memory-related data outside the memory cache (e.g. loaded binaries, their addresses, and their symbols)?
So I could imagine trying to cache much more non-writeable memory and intercept syscalls that unmap it or make it writeable.
I'm guessing the relative cost of a write-watch would outweigh the cache benefit? But idk, I'd have to measure.
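The "cache non-writable memory, invalidate on intercepted syscalls" idea above could look something like this. The function name and signature are illustrative, not from any real debugger; `cache` is assumed to map page-aligned addresses to bytes:

```python
# Sketch: if non-writable pages stay cached across resumes, intercepted
# memory-management syscalls (mprotect/munmap/mmap) become the invalidation
# points. Evict every cached page the syscall's range overlaps.

PAGE = 4096

def on_mm_syscall(cache, addr, length):
    start = addr - addr % PAGE   # page-align the start of the range
    end = addr + length
    for page in list(cache):
        if start <= page < end:
            del cache[page]
```

Since cached pages are page-aligned, the overlap test reduces to a simple range check against the aligned start of the affected region.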
@dougall Yeah, it's not solving a real problem I have either. It's more of a what-if. It might also address some actually real issues with non-stop-the-world debugging, where you don't just have the coherence issue for the cache as a whole: you might also want to be able to take coherent snapshots of some part of the heap asynchronously (versus free-running threads), or at least know when your attempt at a coherent snapshot was invalidated.
@dougall Although depending on how intrusive the overhead of these kinds of page table and TLB invalidations is, at some point you should really just stop all the threads for a moment if you want or need a coherent snapshot of some part of the heap, because it could end up being a lot cheaper even in terms of net latency impact on the threads.
@dougall Regarding loaded binaries and such, yeah, that's maintained outside of the memory cache, so the coherence mechanism isn't the same. I think most people just use a notify or poll approach to look for loads and unloads (e.g. on Windows you get LOAD_DLL_DEBUG_EVENT and UNLOAD_DLL_DEBUG_EVENT), and on Linux I think you'd have to scan /proc/pid/maps (or similar) on every sync event to see if anything changed, but I'm not 100% sure what people do on Linux for that.
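The /proc/pid/maps scanning approach is easy to sketch: snapshot the set of mapped files at each sync event and diff it against the previous snapshot. This parses the maps text format directly so the example is self-contained; a real debugger would re-read /proc/&lt;pid&gt;/maps each time the target stops:

```python
# Sketch: detect module load/unload on Linux by diffing /proc/<pid>/maps
# snapshots. Each maps line is "addr perms offset dev inode [pathname]";
# we keep only file-backed mappings (pathname starting with '/').

def mapped_files(maps_text):
    files = set()
    for line in maps_text.splitlines():
        parts = line.split(None, 5)
        if len(parts) == 6 and parts[5].startswith('/'):
            files.add(parts[5])
    return files

def diff_maps(old_text, new_text):
    old, new = mapped_files(old_text), mapped_files(new_text)
    return sorted(new - old), sorted(old - new)   # (loaded, unloaded)
```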
@dougall And when you detect those events (by notification or inference), you can go and sync up the debugger's view of any metadata you care about relating to the newly loaded/unloaded binaries. So yeah, the synchronization wouldn't be handled the same way as for memory but by a completely independent mechanism. Ideally something notification-based like on Windows, but sometimes you just have to scan metadata and manually check for changes.
@pervognsen Yeah, I feel like you could break the "/proc/pid/maps" method if you tried, replacing the code/symbols without changing the mappings, but unloading binaries is one of the shadier and more broken things anyway, so I don't think it's likely to come up in practice.
@dougall Aside from those specifics, general stuff like this is also why stop-the-world is infinitely easier from a soundness perspective than free-running. It's not just the memory coherence, there's all kinds of process state and when everything is in flight you're extremely limited in what you can assume about the process if you care at all about soundness.
@dougall Obviously intercepting syscalls from free-running threads is one way of at least trying to stay on top of things, but if you try to interfere too much along those lines it can also start to negate the point of free-running support in the first place. Similarly, you could even imagine a pseudo-free-running mode kind of like what rr does, where you're only allowing one thread to run at a time and the debugger/rr is acting as a user-mode scheduler. So everyone gets to progress.
@osandov @dougall On further thought, I suppose it would have to be different on Unix, since the kernel isn't involved in dynamic loading in the traditional Unix approach. Even on Windows you can do manual, hand-rolled DLL loading and that will evade the usual debugger notification mechanism as well, and I don't think there is a way to manually notify the debugger in that scenario even if you wanted to.
@osandov @dougall On the off chance anyone is interested (I was), here's where it happens in NT. ReactOS is an independent reverse engineered implementation but it follows the internal function call structure 1:1 so it's usually a reliable guide to the basic flow. https://doxygen.reactos.org/dd/d83/ntdllp_8h.html#aa72c37f2e665e3bd248399f76b5588b3