Marcin Juszkiewicz 🙃

Why RISC-V is not yet ready? Because it is sluggish...

Let's look at build times of the binutils 2.45.1-4.fc43 package in Fedora:

- x86_64: 30m
- i686: 25m
- aarch64: 36m
- ppc64le: 47m
- s390x: 37m
- riscv64: 240m

I think that with 120 minutes we can start thinking of saying "it is just slow". Reaching one hour would mark "starts to be ready to be used".
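
To put those numbers side by side, here is a quick sketch in Python that expresses each build time as a multiple of the x86_64 one and checks it against the two thresholds mentioned above. The helper names (`classify`, `slowdown_vs`) and the exact cut-offs (at or below 60 minutes, at or below 120 minutes) are my own reading of the post, not anything Fedora defines; the minutes are the ones listed above.

```python
# Build times in minutes for binutils 2.45.1-4.fc43, copied from the list above.
BUILD_MINUTES = {
    "x86_64": 30,
    "i686": 25,
    "aarch64": 36,
    "ppc64le": 47,
    "s390x": 37,
    "riscv64": 240,
}

# Thresholds as I read them from the post (assumption: both are inclusive).
JUST_SLOW_MINUTES = 120   # "it is just slow"
READY_MINUTES = 60        # "starts to be ready to be used"


def classify(minutes):
    """Label a build time using the two thresholds (hypothetical helper)."""
    if minutes <= READY_MINUTES:
        return "ready"
    if minutes <= JUST_SLOW_MINUTES:
        return "just slow"
    return "not ready"


def slowdown_vs(baseline, table=BUILD_MINUTES):
    """Return each architecture's build time as a multiple of the baseline's."""
    base = table[baseline]
    return {arch: minutes / base for arch, minutes in table.items()}


if __name__ == "__main__":
    ratios = slowdown_vs("x86_64")
    for arch, minutes in BUILD_MINUTES.items():
        print(f"{arch:8s} {minutes:4d}m  {ratios[arch]:.1f}x  {classify(minutes)}")
```

With these inputs riscv64 comes out at 8x the x86_64 build time, while every other architecture stays under 2x.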

I lack old AArch64 hardware to check how fast/slow it was in 2013.

@hrw aarch64 2015/2016 build time for GCC (at the time) was an hour on a 48-core ThunderX1 running with -j48 (yes, ThunderX1, because that is what I had access to at the time).
GCC takes much longer to build than binutils, something like 3-4x longer.
I can't remember what the build time without -j was, though.

FULL-FIFO DEVELOPER 🇺🇦🇨🇿

@hrw bottom line: use 32-bit x86

@pinskia @hrw ThunderX was wider and faster than anything you can actually buy in RISC-V land, at least last I checked (and likely significantly wider and faster than the Fedora build servers, unless they got an upgrade).

@palmer @hrw ThunderX1 (note the 1 there) might have had more cores. The cores themselves were clocked at 1.8GHz (maybe 1.6GHz); they could not get them higher due to the design.
Also, the ThunderX1 cores were in-order cores (well, they had a small window [3 cycles IIRC] for out-of-order execution, mostly for cache misses).

@pinskia who cares about single threaded builds when you have multiple cores ;D

I was building LLVM over the last two days (about 4-5h per build). Full load on 80 cores for most of the time.

@pinskia @palmer The Cavium ThunderX1 CPU had 48 cores, but many systems used 2 CPU sockets.

Several systems outperformed it even in AArch64 space.

@pinskia @hrw ya, I had to go look it up before posting because I wasn't sure how bad it was.

RISC-V hardware is in a really bad state, it's mostly sub-1GHz in-order cores. There's a few things out there clocked a bit higher and some OOO cores, but they don't tend to be all that good.

Plus there tends to be some pretty horrific memory system performance going on, as these things aren't really production systems...

@pinskia @hrw and just bouncing around the numbers here, to make sure I didn't screw anything up (they're all marketing numbers, though, so not sure how much I trust them):

* Best benchmark I can find for the ThunderX is c-ray, where it seems to roughly match a Xeon D-1587.
* There's 16 cores in the D-1587, and 48 cores in the tested ThunderX. So that means we're talking about 1/3 of the per-core performance (assuming c-ray scales well, I'm kind of assuming that given it's the marketing number for an early many-core CPU) for a ThunderX vs a Xeon D-1587.
* I can't find SPECInt for a Xeon D-1587. A D-1527 has a SPECInt Rate 2k6 of 165, so ~40/core (at a 2.1GHz base, so a little less than 20 SPECInt/GHz). That roughly seems to match with the single-core results for these Broadwell designs from other points on the SPEC lists.
* SiFive claims 8.6 SPECInt/GHz for the P550, so 12 at 1.4 GHz. That's also about 1/3 of those Xeon cores in terms of single-thread performance.
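
The arithmetic in that list can be written out as a rough sketch in Python. It only restates the quoted figures; it assumes c-ray scales linearly with core count and that a D-1587 core performs about like a D-1527 core, and the constant and function names are made up for illustration.

```python
# Figures quoted in the list above (marketing and SPEC-list numbers, not new measurements).
XEON_D1587_CORES = 16
THUNDERX_CORES = 48
XEON_SPECINT_PER_CORE = 40.0   # the "~40/core" estimate derived from the D-1527
P550_SPECINT_PER_GHZ = 8.6     # SiFive's claimed figure
P550_CLOCK_GHZ = 1.4


def thunderx_per_core_vs_xeon():
    """If both chips tie on a benchmark that scales linearly with cores,
    the per-core ratio is just the inverse ratio of the core counts."""
    return XEON_D1587_CORES / THUNDERX_CORES


def p550_specint():
    """Estimated single-core score of the P550 at its quoted clock."""
    return P550_SPECINT_PER_GHZ * P550_CLOCK_GHZ


def p550_per_core_vs_xeon():
    """P550 estimate as a fraction of the assumed Xeon per-core score."""
    return p550_specint() / XEON_SPECINT_PER_CORE


if __name__ == "__main__":
    print(f"ThunderX core vs Xeon core: {thunderx_per_core_vs_xeon():.2f}")
    print(f"P550 estimated score:       {p550_specint():.2f}")
    print(f"P550 core vs Xeon core:     {p550_per_core_vs_xeon():.2f}")
```

Both ratios land at roughly 0.3, which is where the "about 1/3" in both bullets comes from.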

So we're basically talking the same per-core performance level between the ThunderX1 and the SiFive P550, and the SiFive designs have only 4 cores compared to 48 (IIUC there's also a dual-die SiFive configuration that's possible, but I'm not sure if they ever shipped). I don't know of any faster RISC-V cores that exist in publicly available silicon. I'd bet there's some workloads where the C920 is faster, but the available chips have some crazy memory system stuff going on so I'm not sure how that'd go.

So that means we're talking single-core performance levels around a 2016 Arm server, if you can even call the ThunderX a server (IMO it's more of a network accelerator than a proper server).

To get back to single-core performance levels this low in x86 land you're talking about SPECInt scores something in the realm of the best Prescott or K8 based chips, but not as good as Intel's mobile-derived stuff from after that. I have no idea if SPEC scores from back in 2006 actually mean anything when compared to today, though...

Sources:

I got the ThunderX numbers from a Serve The Home post from 2016, which IIUC is before the X2 launch so it must be an X1 (though they're not specific):
https://www.servethehome.com/exclusive-first-cavium-thunderx-dual-48-core-96-core-total-arm-benchmarks/

The SiFive numbers are just from their marketing material, I don't usually trust that but I think it's good enough for this sort of thing. Here's a press release, but there's a lot of these online: https://www.sifive.com/press/sifive-performance-p550-core-sets-new-standard-as-highest . They're not quoted on the actual board page: https://www.sifive.com/boards/hifive-premier-p550 .

All the SPEC numbers came from the official list, which usually I don't really put much meaning behind: https://www.spec.org/cpu2006/results/cpu2006/

@palmer

It is easy to differentiate ThunderX from ThunderX2.

The first one has 48 cores and was often used in two-socket setups.

The second one has 28 or 32 cores, and SMT4 can be used to get 4 threads per core. Most systems had two CPU sockets.

@pinskia

@palmer @pinskia

The problem with RISC-V hardware is the low core count and small amount of memory.

4-8 slow cores and 8/16/32 GB of RAM is a typical setup. The Fedora RISC-V port disabled LTO to cut build times.

I use the 80 cores of my Ampere Altra systems to emulate RISC-V during package builds. And there are packages which saturate them all.

High CPU count and multichannel memory are what this architecture needs. In proper, datacentre-ready servers.

@hrw @pinskia cool, looks like I managed to find the right ThunderX benchmarks then. I still don't really trust the SPEC side of things, but I think the comparisons ended up in about the right place.