Proposing that C++ acknowledge that there are exactly 8 bits in a byte.
I’d love feedback, especially from embedded / DSP folks who think this is a terrible idea.
https://isocpp.org/files/papers/P3477R0.html
@jfbastien How are we going to run C++26 code on our PDP-10s, then?
@paulmckrcu @jfbastien That's only a third of the gorgeous 36-bit word of good ol' PDP-10.
@jfbastien This sounds like it would disallow my totally serious architecture with a 63-bit byte and 64, 65, 66, and 67 bits for short, int, long, and long long
@jfbastien I'd agree that the most recent CHAR_BIT=24 device I worked with (the SigmaTel STMP3500) and its 56000-based relatives are not relevant to modern C++, nor vice versa. But there is no sane way to program those devices except in C, so the only sentence of your proposal that gives me pause is "Ideally, [the C and C++] committees would be aligned". It would not be ideal if C dropped support for such architectures, though I guess they can hardly _retroactively_ drop it from C99, C11, etc.
@TalesFromTheArmchair @jfbastien The trouble with the STMP3500 was it really shouldn’t have existed. There’s a language otherwise used for it, which resembles a pretty neat pseudocode for DSP ops. The original chips had tiny program memory, usually EPROM, and even tinier SRAM. You wouldn’t use C because it wouldn’t be efficient enough. Then SigmaTel came along and stuck 300KB on it. I’m not sure there are other useful examples of architectures like this.
@TalesFromTheArmchair Right, it wouldn't be old versions that change! That's not something that could be done. Only newer language versions.
@jfbastien @siracusa Are there embedded processors designed and made today that don't use 8-bit bytes, or do you mean people who for some reason have to keep supporting a chip from the 1970s?
@scottearle they’re too busy saying “well actually CHAR_BIT is…” to notice the paper. I will free them of this burden and unleash them onto the world.
@jfbastien In a way, tying C/C++ together might make this decision worse. A world where the C standard doesn't support a lot of very standard DSP architectures seems unlikely or at least very odd, whereas a world where the same is only true for C++ doesn't actually seem that odd. However, I might just have an incorrect view about the penetration of C++ into those markets.
@jfbastien But then how will I feel really clever when using i/CHAR_BIT, i%CHAR_BIT in yet another reimplementation of a bit set? Years of useless standardese trivia down the drain.
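For context, the pattern in question is the classic index-split bit set: i/CHAR_BIT picks the byte, i%CHAR_BIT picks the bit within it. A minimal sketch (the BitSet type here is hypothetical, not from the thread):

    #include <climits>
    #include <cstddef>
    #include <vector>

    // Classic bit set built on CHAR_BIT-sized storage units.
    struct BitSet {
        std::vector<unsigned char> bytes;

        explicit BitSet(std::size_t nbits)
            : bytes((nbits + CHAR_BIT - 1) / CHAR_BIT) {}

        void set(std::size_t i) {
            bytes[i / CHAR_BIT] |= 1u << (i % CHAR_BIT);
        }
        bool test(std::size_t i) const {
            return bytes[i / CHAR_BIT] & (1u << (i % CHAR_BIT));
        }
    };

With CHAR_BIT mandated to be 8, the divisions become fixed shifts and masks, and the "clever" generality buys nothing.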
@jfbastien
I'm no longer in embedded, but the only situation I can think of is some crazy PIC microcontroller that has 12/14-bit instructions but is an otherwise 8-bit processor and uses a Harvard architecture.
@dominikg @jfbastien I’ve worked on Unisys mainframes with 9-bit bytes.
@aDot I put exactly this clever personality in the paper ☺️
Forever immortalized.
@jfbastien What about weirdass old systems, too? I guess that's similar to embedded.
@jfbastien I dunno if C++ has a hosted/freestanding distinction like C does, but if so, a very good compromise would be mandating the 8-bit byte for hosted. Then the DSP folks can do their freestanding thing. A full stdlib without an 8-bit byte is BS, IMO.
@dalias oh that’s a great point. Will add to the paper (yes C++ has this distinction).
@jfbastien As someone who's been doing C++ embedded development for quite a few years, I can't remember the last time I saw a non-8-bit byte.
Certainly not in the last decade.
Very much in favor of mandating CHAR_BIT == 8.
@jfbastien Good point, still though.
It also limits anyone wanting to build new Weird Computer Architectures.
@jfbastien Though, on the other paw, if specifying it gets compiler makers to stop making weird assumptions around it for ✨OPTIMIZATION✨...
@IceWolf compilers don't optimize on this. It's hard-coded pretty strongly in the frontend, so there's zero ambiguity about this value from the compiler's perspective.
@jfbastien but like, ideally the fix for that is more to get compiler makers to not do quite so much "it's undefined behavior so we can do whatever we want!", and less "you're not allowed to use C++ on wonky architectures". I would think.
But we've never even used C++ yet (only C) and have never run into the weird UB things people complain about, so yeah, no personal experience with this stuff.
@jfbastien @zwarich I also think it is unlikely C will follow anytime soon
@jfbastien Oh huh. That makes more sense; why specify it at the spec level, then? There's no benefit except for shafting weird architectures.
Unless there is a benefit somehow?
@IceWolf That's the "why bother?" part of the paper. I don't think it's world-shattering. Others think it is. So I wrote the paper.
@jfbastien @dominikg got pretty used to reading octal at the time.
@jfbastien In practice, any embedded architecture that supports modern C++ also has 8-bit bytes.
While some DSPs and other specialized hardware cannot address 8-bit values, they usually rely on vendor-specific or hacked-up open-source toolchains. No vendor has the resources to keep up with modern C++ compilers; they will stick to C. In effect, C++ has complicated itself out of being a target for oddball toolchains.
@shac I hear that this is true for *almost* all of them 😅
*almost*
@jfbastien (for that matter I would also like to formalize that integers are two's complement and signed overflow wraps, which is how literally every mainstream ISA works anyway)
@jfbastien Or, worst case, make it implementation defined and provide a standard mechanism to determine the actual behavior (saturate / wrap / trap) on overflow.
@azonenberg @jfbastien iirc integers are two's complement already, but signed overflow is still UB
@whitequark @jfbastien Yeah, and I am very much in favor of making the C++ standard reflect the realities of the 99.99% of hardware everyone actually uses, and not leaving things UB because some architecture from the 1970s does it differently.
For example, defining that sizeof(P*) == sizeof(Q*) for any P and Q, and allowing printf("%p") to be used directly on any pointer type as a consequence.
Make the actual size implementation-defined, by all means. As long as it's the same for any two pointer types.
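For reference, the status quo this would relax: %p is only specified for void*, so strictly portable code has to cast first. A minimal sketch:

    #include <cstdio>

    int main() {
        int x = 42;
        int* p = &x;
        std::printf("%p\n", static_cast<void*>(p)); // portable today
        // std::printf("%p\n", p); // strictly UB today: %p wants void*
    }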
@azonenberg @whitequark
Yeah I wrote that paper and gave a talk about the outcome https://youtu.be/JhUxIVf1qok?si=aSPEivvr84c27pVk
@jfbastien @whitequark An opt-in overflowing integer type would be fine by me (i.e. "int32_overflow_t" or similar). As long as there's a way to do it in a well-defined manner when I want to.
@azonenberg @jfbastien @whitequark 1 for each of the different possible behaviors, IMO: int32_wrap_t, int32_sat_t, and int32_trap_t
@egallager @jfbastien @whitequark I would not be opposed to that. And then software emulate any that aren't natively supported by hardware (with some means of querying if this is being done).
@jfbastien Those people can keep using the same compilers they’re currently using. It’s not like you’re cancelling their C++ license.
@jfbastien It’s also good to push hardware vendors to stop making dumb parts and firmware developers to stop making dumb decisions. Making a DSP without 8-bit load/store capability is a poor choice; it isn’t 1994 anymore. Running C++27 on your 13-bit-word engine is also a terrible idea.
The concept that we should cater to all bad decisions forever is silly.
@azonenberg @egallager @jfbastien soooo... Rust's integer types, more or less?
@whitequark @egallager @jfbastien Similar. It looks like in Rust saturation is a method (saturating_add) rather than a type "integer with saturating operations".
I don't have strong feelings on how it's done in C++ other than wanting a standard way to get overflowing, saturating, or trapping behavior on demand.
@azonenberg @whitequark @egallager
A subset of what you want is in C++26
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p0543r3.html
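If it helps, the C++26 facility from P0543 looks roughly like this (saturating free functions in <numeric>, pending compiler support):

    #include <cstdint>
    #include <numeric> // std::add_sat, std::saturate_cast (C++26)

    int main() {
        std::int32_t big = 2'000'000'000;
        // Clamps to INT32_MAX instead of overflowing (UB with plain +):
        std::int32_t s = std::add_sat(big, big);
        // Clamps on narrowing conversions, too (here to INT16_MAX):
        std::int16_t t = std::saturate_cast<std::int16_t>(big);
    }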
@jfbastien @azonenberg @whitequark @egallager this is nice and clearly useful for many things, but I really hope this all stays explicit and opt-in. The usual behaviour may be UB in the spec, but in practice it is very well defined in all non-alien implementations.
Using these different types will have very different consequences for the generated code, and the programmer should be aware of them.
@jfbastien at work we use a DSP architecture with 16-bit bytes (TI TMS320F28xx; char = int = 16-bit; long = 32-bit), definitely a bit odd, but not "a PDP-8 from the 70s" as many quip around here. TI provides a C and C++ compiler (AFAIK stuck at C++03), definitely a freestanding implementation with its own extra quirks. Clearly, mandating char = 8 bits would make it impossible to support such a platform (or require bizarre compiler contortions to address half-words).
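For what it's worth, code that already assumes octets can make that assumption explicit today, so a platform like this fails loudly at compile time instead of miscompiling silently; a minimal guard:

    #include <climits>

    // Rejects 16-bit-char targets like the TMS320F28xx at compile time.
    static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");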
@cvtsi2sd yeah but docs like this place it squarely as a C++03 product. Does TI intend to move?
https://www.ti.com/lit/ug/spru514p/spru514p.pdf
@jcelerier @whitequark @jfbastien Sorry, I meant any pointer to object/data (vs function pointers).
Maybe I should have called this paper “8ers gonna 8” 😂
Thanks for all the feedback so far! Keep it coming.
Here is an updated draft on making CHAR_BIT == 8 in C++:
@jfbastien Ok, I'm not an expert, but why not make every byte be one bit? Seems to be the most reasonable choice for building an architecture. We can address every bit individually and there will be no wasted space...
@odoruhako because most hardware that exists can’t address bits individually.
Oh hey Hacker News found me 🫣
https://news.ycombinator.com/item?id=41874394
@jfbastien So French!
Maybe we should call them Freedom Bytes®©™ now 🤔
@DanielaKEngert
We errmmmm… should not ask the French to talk about “bite” 😳
The Germans, however, can say it kindly! Please 🙏
@dgregor79 people like you appreciate this kind of stirring! ☺️
@richardday @shac
See the updated paper https://wg21.link/D3477R1