Posts: 469 · Following: 90 · Followers: 94
n00b Kernel Hacker
- Intern @ NVIDIA Korea (Security System Software) (2024.06 ~)
- Ex-Intern @ Panmnesia (CXL emulation stuff) (~2023.12)
- Undergraduate majoring in CSE (estimated graduation: Feb. 2025)
- Working as a reviewer for the Linux slab subsystem
- Born on August 6, 2000

Opinions are my own.

My interests are:
Memory Management,
Computer Architecture,
Circuit Design,
Virtualization
first time using GCOV, it's pretty nice.
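For future me, this is roughly the kernel-side setup (a sketch based on Documentation/dev-tools/gcov.rst; the slub.o line is just my example, so double-check the option names against your tree):

```
# kernel config for coverage:
#   CONFIG_DEBUG_FS=y
#   CONFIG_GCOV_KERNEL=y
#   CONFIG_GCOV_PROFILE_ALL=y        # instrument the whole kernel, or per object file:
#   GCOV_PROFILE_slub.o := y         # (added to mm/Makefile instead)
# after booting the instrumented kernel, the data shows up in debugfs:
mount -t debugfs none /sys/kernel/debug
ls /sys/kernel/debug/gcov/           # *.gcda / *.gcno mirroring the build tree
```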
0
0
1
@ljs @vbabka I truly support both of you
1
0
2
Edited 1 year ago
now I'm slowly getting how CXL manages cache coherence of device memory (HDM-{H,D,DB}). my head aches.

BTW this is good material for grasping some important concepts of CXL:

https://arxiv.org/abs/2306.11227

Even though it's not the final version yet.
0
0
2
Edited 1 year ago
progress of this week:

#kerneltesting side project

Now my testing system can submit and run LAVA [1] test jobs automatically after a Jenkins [2] build!

[1] https://lava.kerneltesting.org/scheduler/alljobs
[2] https://jenkins.kerneltesting.org/job/slab/

currently it runs the following:
- a subset of mm LTP testcases
- hackbench for a very short time
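The glue is basically a single post-build step, something like this (just a sketch: the identity name and YAML file name are made up, and I'm assuming lavacli's `jobs submit/wait/show` subcommands, so check against your lavacli version):

```
# hypothetical post-build step in the Jenkins job:
JOB_ID=$(lavacli -i kerneltesting jobs submit slab-tests.yaml)   # submit prints the new job ID
lavacli -i kerneltesting jobs wait "$JOB_ID"                     # block until the job finishes
lavacli -i kerneltesting jobs show "$JOB_ID"                     # health/state of the run
```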
0
1
6
@chrisg
No I didn't, but it looks so interesting, thanks!
0
0
1

@hyeyoo Have you watched Ben Eater's videos on YT, where he builds a CPU on breadboards?

If you follow his videos from the start, a lot of concepts that you've heard about but don't currently get will click.

At least they did for me.

1
1
1
Edited 1 year ago
@ptesarik

Now I wonder how the error rate is kept so well under control....

And as a person who doesn't know quantum mechanics, I have a question:
Are even systems with ECC memory safe enough?
Safe enough to be used for electronic money transfers, satellites, and nuclear weapons?
1
0
0
and I always confuse the transaction layer with the transport layer in the OSI and TCP/IP models
0
0
0
Edited 1 year ago
```
As a bit of history, in the early days of PCI the spec writers anticipated cases
where PCI would actually replace the processor bus.
```
from "PCI Express Technology 3.0", MindShare

And yeah, that seems to have been revived in 2019 by Intel under the name IAL (Intel Accelerator Link), which was later renamed to CXL (Compute Express Link). The main reason Intel donated IAL was to avoid fragmenting the industry. Later the CXL Consortium was founded, and over 250 companies now contribute to the specification. Including my company (the one that hired me as an intern)!
0
1
3
Edited 1 year ago
it is interesting to learn how PCIe endpoints and switches implement their own protocols (similar to network protocols) for communication (the PCIe transaction/data link/physical layers)

but since I know very little about electronics, some topics are hard to understand, like why it's difficult to increase speed in a parallel bus architecture, why errors occur during transmission, or why DC components should be avoided.
3
0
0
Edited 1 year ago
@ljs @bagder
it was the first Linux kernel book I read since I started (not) learning it
0
0
2
@Aissen @lkundrak @ljs @sj @vbabka

It's a Lenovo ThinkBook 15 G4 ABA,
AMD Ryzen 5 5625U with Radeon Graphics.

hmm, the 'bios vendor' field is a bit unclear, but it looks like it was written by Lenovo itself?

# dmidecode -t memory
# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.

Handle 0x0022, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 64 GB
Error Information Handle: 0x0025
Number Of Devices: 2

Handle 0x0023, DMI type 17, 92 bytes
Memory Device
Array Handle: 0x0022
Error Information Handle: 0x0026
Total Width: 64 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: SODIMM
Set: None
Locator: DIMM 0
Bank Locator: P0 CHANNEL A
Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 3200 MT/s
Manufacturer: Unknown
Serial Number: 0CCD0E1E
Asset Tag: Not Specified
Part Number: KD4BGSA80-32N220A
Rank: 2
Configured Memory Speed: 3200 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Unknown
Module Manufacturer ID: Bank 9, Hex 0x98
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None

Handle 0x0024, DMI type 17, 92 bytes
Memory Device
Array Handle: 0x0022
Error Information Handle: 0x0027
Total Width: 64 bits
Data Width: 64 bits
Size: 8 GB
Form Factor: Row Of Chips
Set: None
Locator: DIMM 0
Bank Locator: P0 CHANNEL B
Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 3200 MT/s
Manufacturer: Hynix
Serial Number: 00000000
Asset Tag: Not Specified
Part Number: HMAA1GS6CJR6N-XN
Rank: 1
Configured Memory Speed: 3200 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Unknown
Module Manufacturer ID: Bank 1, Hex 0xAD
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 8 GB
Cache Size: None
Logical Size: None
2
0
0
@ljs @lkundrak @sj @vbabka

node 1 is cpuless, and
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 0 size: 11937 MB
node 0 free: 6770 MB
node 1 cpus:
node 1 size: 23954 MB
node 1 free: 23854 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

I was like "wtf did I turn on fake numa?" but no.

$ cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt3)/vmlinuz-6.6.0-rc4+ root=UUID=96f9e501-caa5-4c39-bc11-5d104517f08d ro rootflags=subvol=root loglevel=8
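(For reference, emulated NUMA on x86 needs an explicit boot parameter, so its absence above is what rules it out:)

```
# what fake NUMA would have looked like on the kernel command line (x86):
#   numa=fake=2      # split memory into 2 emulated nodes
#   numa=fake=4G     # or into emulated nodes of a given size
# nothing like that is there, so the two nodes presumably come from the firmware's ACPI SRAT
```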
2
0
0
@ljs @lkundrak @sj @vbabka

no, I have only an 8 GB DIMM and a 32 GB DIMM, and node 0 has 12 GB of memory,
node 1 has 23 GB of memory lol
1
0
0
@lkundrak
expert? me?
@vbabka @ljs @sj are the experts.
I am just a dumb/curious undergraduate without a Ph.D. or a B.S. (yet) XD

The main benefit of a NUMA architecture is to distribute memory traffic across several memory buses instead of a single global bus, because the global bus can become a bottleneck as the number of CPUs and the memory capacity grow.

A set of CPUs and the memory near those CPUs is called a NUMA node. If a CPU wants to access memory that is not in its local node, it reads the data from a remote node via an interconnect (instead of the local, faster bus).

Because the local (to the CPU) and remote NUMA nodes have different access latency and bandwidth, the OS tries to use the local node's memory first (ofc that depends on the NUMA memory policy of the task/VMA).

But a laptop is too cheap and small a system for a single bus to be a bottleneck, so I don't get why the hardware designer decided to adopt a NUMA architecture.

And it's really strange that different ranges of physical memory from a single DIMM belong to different NUMA nodes. Do they really have different performance characteristics?
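A small sketch of what local vs. remote means in practice, using the numactl tools (node numbers match my machine above; the ./workload binary is made up):

```
numactl --cpunodebind=0 --membind=0 ./workload   # run on node 0 CPUs, allocate only on node 0 (local)
numactl --cpunodebind=0 --membind=1 ./workload   # same CPUs, but all allocations land on remote node 1
numastat -p "$(pidof workload)"                  # per-node breakdown of the pages the process uses
```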
1
1
4
@vbabka that would have been much slower 🤣
hmm it makes no sense because it has an 8 GB and a 32 GB DIMM and node 0 has 12 GB ❓

Maybe the board designer knows why
0
0
1
until yesterday I didn't know that my laptop has 2 NUMA nodes, but why?
2
0
3