@lkundrakexpert? me?
@vbabka @ljs @sj are the experts.
I am just a dumb/curious undergraduate without Ph.D nor B.S. (yet) XD
The main benefit of NUMA architecture is to distribute memory bus traffics to several memory buses instead of a single global bus, because the global bus can be bottleneck as the number of CPUs and memory capacity grows.
A set of CPUs and memory near to those CPUs is called a NUMA node. If a CPU wants to access memory not in the local node, it reads data from a remote node via interconnect (instead of the local, faster bus)
Because local (to cpu) and remote NUMA node has different access latency and bandwidth, OS tries to utilize local node's memory first (ofc that depends on NUMA memory policy of the task/VMA)
But a laptop is too cheap and small system for a single bus to be a bottleneck, so I don't get why the hardware designer decided to adopt NUMA architecture.
And it's really strange that different ranges of physical memory from a single DIMM chip belongs to different NUMA nodes. Do they really have different performance characteristics?