social.kernel.org

Marcin Juszkiewicz 🙃

hrw@society.oftrolls.com

2 months ago

Why RISC-V is not yet ready? Because it is sluggish...

Let's look at the build times of the binutils 2.45.1-4.fc43 package in Fedora:

- x86_64: 30m
- i686: 25m
- aarch64: 36m
- ppc64le: 47m
- s390x: 37m
- riscv64: 240m

I think that with 120 minutes we can start thinking of saying "it is just slow". Reaching one hour would mark "starts to be ready to be used".
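
A quick sketch of the ratios behind those thresholds, using only the build times listed above. The variable names are illustrative, and it assumes the listed times are directly comparable across builders:

```python
# Build times in minutes for binutils 2.45.1-4.fc43, as quoted in the post.
build_minutes = {
    "x86_64": 30,
    "i686": 25,
    "aarch64": 36,
    "ppc64le": 47,
    "s390x": 37,
    "riscv64": 240,
}

# Thresholds proposed in the post, in minutes.
JUST_SLOW = 120
STARTS_TO_BE_READY = 60

riscv = build_minutes["riscv64"]

# How many times longer the riscv64 build takes than each other architecture.
slowdown = {
    arch: riscv / minutes
    for arch, minutes in build_minutes.items()
    if arch != "riscv64"
}

# Speedup riscv64 would need to reach each threshold.
speedup_for_just_slow = riscv / JUST_SLOW
speedup_for_ready = riscv / STARTS_TO_BE_READY

for arch, factor in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"riscv64 takes {factor:.1f}x as long as {arch}")
print(f"needed speedup: {speedup_for_just_slow:.0f}x for 'just slow', "
      f"{speedup_for_ready:.0f}x for 'starts to be ready'")
```

So the riscv64 build would need to get about 2x faster to count as "just slow" and about 4x faster to reach the one-hour mark.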

I lack old AArch64 hardware to check how fast/slow it was in 2013.

Andrea (Drea) Tamar Pinski

pinskia@hachyderm.io

2 months ago

Reply to @hrw@society.oftrolls.com

@hrw aarch64 2015/2016 build time for GCC (at the time) was an hour on a 48-core ThunderX1 running with -j48 (yes, ThunderX1, because that is what I had access to at the time).
GCC has a much worse build time than binutils, something like 3-4x worse.
I can't remember what the build time without -j was, though.

NEPŘÁTELSKÉ EMOCE 🇺🇦🇨🇿

lkundrak@metalhead.club

2 months ago

Reply to @hrw@society.oftrolls.com

@hrw bottom line: use 32-bit x86

Palmer Dabbelt

palmer

2 months ago

Reply to @pinskia@hachyderm.io

@pinskia @hrw ThunderX was wider and faster than anything you can actually buy in RISC-V land, at least last I checked (and likely significantly wider and faster than the Fedora build servers, unless they got an upgrade).

Andrea (Drea) Tamar Pinski

pinskia@hachyderm.io

2 months ago

Reply to @palmer

@palmer @hrw ThunderX1 (note the 1 there) might have had more cores. The cores themselves were clocked at 1.8GHz (maybe 1.6GHz); they could not get them higher due to the design.
Also, the ThunderX1 cores were in-order cores (well, they had a small window [3 cycles IIRC] for out-of-order execution, mostly for cache misses).

Marcin Juszkiewicz 🙃

hrw@society.oftrolls.com

2 months ago

Reply to @pinskia@hachyderm.io

@pinskia who cares about single threaded builds when you have multiple cores ;D

I was building llvm during the last two days (about 4-5h per build). Full load on 80 cores for most of the time.

Marcin Juszkiewicz 🙃

hrw@society.oftrolls.com

2 months ago

Reply to @pinskia@hachyderm.io

@pinskia @palmer The Cavium ThunderX1 CPU had 48 cores, but many systems used 2 CPU sockets.

Several systems outperformed it even in AArch64 space.

Palmer Dabbelt

palmer

2 months ago

Reply to @pinskia@hachyderm.io

@pinskia @hrw ya, I had to go look it up before posting because I wasn't sure how bad it was.

RISC-V hardware is in a really bad state, it's mostly sub-1GHz in-order cores. There's a few things out there clocked a bit higher and some OOO cores, but they don't tend to be all that good.

Plus there tends to be some pretty horrific memory system performance going on, as these things aren't really production systems...

Palmer Dabbelt

palmer

2 months ago

Reply to @palmer

@pinskia @hrw and just bouncing around the numbers here, to make sure I didn't screw anything up (they're all marketing numbers, though, so not sure how much I trust them):

* Best benchmark I can find for the ThunderX is c-ray, where it seems to roughly match a Xeon D-1587.
* There's 16 cores in the D-1587, and 48 cores in the tested ThunderX. So that means we're talking about 1/3 of the per-core performance (assuming c-ray scales well, I'm kind of assuming that given it's the marketing number for an early many-core CPU) for a ThunderX vs a Xeon D-1587.
* I can't find SPECInt for a Xeon D-1587. A D-1527 has a SPECInt Rate 2k6 of 165, so ~40/core (at a 2.1GHz base, so a little less than 20SPECInt/GHz). That roughly seems to match with the single-core results for these Broadwell designs from other points on the SPEC lists.
* SiFive claims 8.6 SPECInt/GHz for the P550, so 12 at 1.4 GHz. That's also about 1/3 of those Xeon cores in terms of single-thread performance.
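
The arithmetic in those bullets can be sketched as follows. This is only a back-of-the-envelope check under the same assumptions as the bullets (c-ray scales with core count, and the quoted marketing and SPEC figures are taken at face value); the variable names are illustrative:

```python
# All inputs are the figures quoted in the bullets above.
xeon_d1587_cores = 16
thunderx_cores = 48

# If both chips score roughly the same on c-ray and c-ray scales with
# core count, per-core ThunderX performance is the core-count ratio.
thunderx_vs_xeon_per_core = xeon_d1587_cores / thunderx_cores

# Xeon per-core SPECInt Rate 2k6, "~40/core" as quoted, at a 2.1 GHz base.
xeon_per_core_specint = 40.0
xeon_base_ghz = 2.1
xeon_specint_per_ghz = xeon_per_core_specint / xeon_base_ghz

# SiFive P550: claimed SPECInt/GHz scaled to the quoted 1.4 GHz clock.
p550_specint_per_ghz = 8.6
p550_ghz = 1.4
p550_specint = p550_specint_per_ghz * p550_ghz
p550_vs_xeon_per_core = p550_specint / xeon_per_core_specint

print(f"ThunderX per core vs Xeon: {thunderx_vs_xeon_per_core:.2f}")
print(f"Xeon SPECInt/GHz: {xeon_specint_per_ghz:.1f}")
print(f"P550 SPECInt at 1.4 GHz: {p550_specint:.1f}")
print(f"P550 per core vs Xeon: {p550_vs_xeon_per_core:.2f}")
```

Both ratios land near 0.3, which is where the "about 1/3" figure for each core comes from.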

So we're basically talking the same per-core performance level between the ThunderX1 and the SiFive P550, and the SiFive designs have only 4 cores compared to 48 (IIUC there's also a dual-die SiFive configuration that's possible, but I'm not sure if they ever shipped). I don't know of any faster RISC-V cores that exist in publicly available silicon. I'd bet there are some workloads where the C920 is faster, but the available chips have some crazy memory system stuff going on, so I'm not sure how that'd go.

So that means we're talking single-core performance levels around a 2016 Arm server, if you can even call the ThunderX a server (IMO it's more of a network accelerator than a proper server).

To get back to single-core performance levels this low in x86 land you're talking about SPECInt scores something in the realm of the best Prescott or K8 based chips, but not as good as Intel's mobile-derived stuff from after that. I have no idea if SPEC scores from back in 2006 actually mean anything when compared to today, though...

Sources:

I got the ThunderX numbers from a Serve The Home post from 2016, which IIUC is before the X2 launch so it must be an X1 (though they're not specific):
https://www.servethehome.com/exclusive-first-cavium-thunderx-dual-48-core-96-core-total-arm-benchmarks/

The SiFive numbers are just from their marketing material, I don't usually trust that but I think it's good enough for this sort of thing. Here's a press release, but there's a lot of these online: https://www.sifive.com/press/sifive-performance-p550-core-sets-new-standard-as-highest . They're not quoted on the actual board page: https://www.sifive.com/boards/hifive-premier-p550 .

All the SPEC numbers came from the official list, which usually I don't really put much meaning behind: https://www.spec.org/cpu2006/results/cpu2006/

Marcin Juszkiewicz 🙃

hrw@society.oftrolls.com

2 months ago

Reply to @palmer

@palmer

It is easy to differentiate ThunderX from ThunderX2.

The first one has 48 cores. It was often used in a two-socket setup.

The second one has 28 or 32 cores, and SMT4 can be used to get 4 threads per core. Most systems had two CPU sockets.

@pinskia

Marcin Juszkiewicz 🙃

hrw@society.oftrolls.com

2 months ago

Reply to @palmer

@palmer @pinskia

The problem with RISC-V hardware is the low core count and small amount of memory.

4-8 slow cores and 8/16/32 GB of RAM is a typical setup. The Fedora RISC-V port disabled LTO to cut build times.

I use 80 cores of my Ampere Altra systems to emulate RISC-V during package builds. And there are packages which saturated them all.

A high CPU count and multichannel memory are what this architecture needs, in proper, datacentre-ready servers.

Palmer Dabbelt

palmer

2 months ago

Reply to @hrw@society.oftrolls.com

@hrw @pinskia cool, looks like I managed to find the right ThunderX benchmarks then. I still don't really trust the SPEC side of things, but I think the comparisons ended up in about the right place.
