Conversation
If the RCU implementations that you are working with are to complex, reliable, and boring, this blog post is for you! https://people.kernel.org/paulmck/stupid-rcu-tricks-corner-case-rcu-implementations
2
4
13

@paulmckrcu
Azul has the C4 GC which (I think) uses virtual memory mapping to implement the GC read barrier.

doesn't do compacting reclamation so memory fragmentation would kill you. But if you had a data structure what didn't do that, like a Michael-Scott queue, you could use mmap memory protection to trap any stale references, trap the segfault to restart a restartable RCU locked region.

2
0
0
@jwseigh I confess to missing the connection, but I do agree that if you wanted to compact an RCU data structure, you would need that data structure's implementation to cooperate explicitly. And then wait a grace period (as always) before freeing the old uncompacted data items.

Or am I missing something subtle here?
2
0
0

@paulmckrcu
The data structures don't need to be self compacting. Just all the memory objects have to be ephemeral, e.g. FIFO queue w/ linked nodes.
Malloc would be a get and increment form a block of memory.
free would be noop.
Restartable RCU w/ RSEQ might be a better choice than mprotect.

1
0
0
@jwseigh But then aren't they automatically self-compacting? ;-)
0
0
0

@paulmckrcu i worked at a place that ran time-based reclamation in userspace on ~1000 machines, for a while (: it's not stupid when it works!

2
0
0

@pkhuong @paulmckrcu that much reclamation latency surely has to translate into lots of memory overhead?

1
0
0

@tobinbaker @paulmckrcu depends on the update rate (and the ability to defer/batch/merge updates).

0
0
0
@pkhuong Understood, and this is why I said "Choose wisely" rather than something like "Never ever do it this way". In your case, I am glad that you guys found something that worked for you.
1
0
1

@paulmckrcu (we eventually switched to a more reasonable approach, but it did keep us afloat for a while)

1
0
0
@pkhuong Unless you object, I will add this to the blog. Something like "Paul Khuong reports that this technique was used in production for some time, until a more principled and sufficiently performant technique was put in its place."
0
0
1

@paulmckrcu
Depends on the data structure. For a lock-free queue, no RCU would be needed. You could do it w/ 3 interlocked ops for enqueue and 1 for dequeue. Not as good as a ring buffer though which only needs 2 interlocked ops for enqueue.

The memory management can be though of like arena memory but with a sliding window containing the live objects.

1
0
0
@jwseigh I completely agree with your "depends on the data structure"!
0
0
0