Proposing that C++ acknowledge that there are exactly 8 bits in a byte.
I’d love feedback, especially from embedded / DSP folks who think this is a terrible idea.
https://isocpp.org/files/papers/P3477R0.html
@jfbastien How are we going to run C++26 code on our PDP-10s, then?
@paulmckrcu @jfbastien That's only a third of the gorgeous 36-bit word of good ol' PDP-10.
@jfbastien This sounds like it would disallow my totally serious architecture with a 63-bit byte and 64, 65, 66, and 67 bits for short, int, long, and long long
@jfbastien I'd agree that the most recent CHAR_BIT=24 device I worked with (the SigmaTel STMP3500) and its 56000-based relatives are not relevant to modern C++, nor vice versa. But there is no sane way to program those devices except in C, so the only sentence of your proposal that gives me pause is "Ideally, [the C and C++] committees would be aligned". It would not be ideal if C dropped support for such architectures, though I guess they can hardly _retroactively_ drop it from C99, C11, etc.
@TalesFromTheArmchair @jfbastien The trouble with the STMP3500 was it really shouldn’t have existed. There’s a language otherwise used for it, which resembles a pretty neat pseudocode for DSP ops. The original chips had tiny program memory, usually EPROM, and even tinier SRAM. You wouldn’t use C because it wouldn’t be efficient enough. Then SigmaTel came along and stuck 300KB on it. I’m not sure there are other useful examples of architectures like this.
@TalesFromTheArmchair Right, it wouldn't be old versions that change! That's not something that could be done. Only newer language versions.
@jfbastien @siracusa Are there embedded processors designed and made today that don't use 8-bit bytes, or do you mean people who for some reason have to keep supporting a chip from the 1970s?
@scottearle they’re too busy saying “well actually CHAR_BIT is…” to notice the paper. I will free them of this burden and unleash them onto the world.
@jfbastien In a way, tying C/C++ together might make this decision worse. A world where the C standard doesn't support a lot of very standard DSP architectures seems unlikely or at least very odd, whereas a world where the same is only true for C++ doesn't actually seem that odd. However, I might just have an incorrect view about the penetration of C++ into those markets.
@jfbastien But then how will I feel really clever when using i/CHAR_BIT, i%CHAR_BIT in yet another reimplementation of a bit set? Years of useless standardese trivia down the drain.
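For context, the pattern in question is the classic index-split bit set: i/CHAR_BIT picks the byte, i%CHAR_BIT picks the bit within it. A minimal sketch (the BitSet type here is hypothetical, not from the thread):

    #include <climits>
    #include <cstddef>
    #include <vector>

    // Classic bit set built on CHAR_BIT-sized storage units.
    struct BitSet {
        std::vector<unsigned char> bytes;

        explicit BitSet(std::size_t nbits)
            : bytes((nbits + CHAR_BIT - 1) / CHAR_BIT) {}

        void set(std::size_t i) {
            bytes[i / CHAR_BIT] |= 1u << (i % CHAR_BIT);
        }
        bool test(std::size_t i) const {
            return bytes[i / CHAR_BIT] & (1u << (i % CHAR_BIT));
        }
    };

With CHAR_BIT mandated to be 8, the divisions become fixed shifts and masks, and the "clever" generality buys nothing.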
@jfbastien
I'm no longer in embedded, but the only situation I can think of is some crazy PIC microcontroller that has 12/14-bit instructions but is an otherwise 8-bit processor and uses a Harvard architecture.
@dominikg @jfbastien I’ve worked on Unisys mainframes with 9-bit bytes.
@aDot I put exactly this clever personality in the paper ☺️
Forever immortalized.
@jfbastien What about weirdass old systems, too? I guess that's similar to embedded.
@jfbastien I dunno if C++ has a hosted/freestanding distinction like C does, but if so, a very good compromise would be mandating the 8-bit byte for hosted. Then the DSP folks can do their freestanding thing. A full stdlib without an 8-bit byte is BS, IMO.
@dalias oh that’s a great point. Will add to the paper (yes C++ has this distinction).
@jfbastien As someone who's been doing C++ embedded development for quite a few years, I can't remember the last time I saw a non-8-bit byte.
Certainly not in the last decade.
Very much in favor of mandating CHAR_BIT == 8.
@jfbastien Good point, still though.
It also limits anyone wanting to build new Weird Computer Architectures.
@jfbastien Though, on the other paw, if specifying it gets compiler makers to stop making weird assumptions around it for ✨OPTIMIZATION✨...
@IceWolf compilers don't optimize on this. It's hard-coded pretty strongly in the frontend, so there's zero ambiguity about this value from the compiler's perspective.
@jfbastien but like, ideally the fix for that is more to get compiler makers to not do quite so much "it's undefined behavior so we can do whatever we want!", and less "you're not allowed to use C++ on wonky architectures". I would think.
But we've never even used C++ yet (only C) and have never run into the weird UB things people complain about, so yeah, no personal experience with this stuff.
@jfbastien @zwarich I also think it is unlikely C will follow anytime soon
@jfbastien Oh huh. That makes more sense; why specify it at the spec level, then? There's no benefit except for shafting weird architectures.
Unless there is a benefit somehow?
@IceWolf That's the "why bother?" part of the paper. I don't think it's world-shattering. Others think it is. So I wrote the paper.
@jfbastien @dominikg got pretty used to reading octal at the time.
@jfbastien In practice, any embedded architecture that supports modern C++ also has 8-bit bytes.
While some DSPs and other specialized hardware cannot address 8-bit values, they usually rely on vendor-specific or hacked-up open-source toolchains. No vendor has the resources to keep up with modern C++ compilers; they will stick to C. In effect, C++ has complicated itself out of being a target for oddball toolchains.
@shac I hear that this is true for *almost* all of them 😅
*almost*
@jfbastien (for that matter I would also like to formalize that integers are two's complement and signed overflow wraps, which is how literally every mainstream ISA works anyway)
@jfbastien Or, worst case, make it implementation defined and provide a standard mechanism to determine the actual behavior (saturate / wrap / trap) on overflow.
@azonenberg @jfbastien iirc integers are two's complement already, but signed overflow is still UB
@whitequark @jfbastien Yeah, and I am very much in favor of making the C++ standard reflect the realities of the 99.99% of hardware everyone actually uses, and not leaving things UB because some architecture from the 1970s does it differently.
For example, defining that sizeof(P*) == sizeof(Q*) for any P and Q, and allowing printf("%p") to be used directly on any pointer type as a consequence.
Make the actual size implementation-defined, by all means. As long as it's the same for any two pointer types.
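For reference, the status quo this would relax: %p is only specified for void*, so strictly portable code has to cast first. A minimal sketch:

    #include <cstdio>

    int main() {
        int x = 42;
        int* p = &x;
        std::printf("%p\n", static_cast<void*>(p)); // portable today
        // std::printf("%p\n", p); // strictly UB today: %p wants void*
    }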
@azonenberg @whitequark
Yeah I wrote that paper and gave a talk about the outcome https://youtu.be/JhUxIVf1qok?si=aSPEivvr84c27pVk
@jfbastien @whitequark An opt-in overflowing integer type would be fine by me (i.e. "int32_overflow_t" or similar). As long as there's a way to do it in a well-defined manner when I want to.
@azonenberg @jfbastien @whitequark 1 for each of the different possible behaviors, IMO: int32_wrap_t, int32_sat_t, and int32_trap_t
@egallager @jfbastien @whitequark I would not be opposed to that. And then software emulate any that aren't natively supported by hardware (with some means of querying if this is being done).
@jfbastien Those people can keep using the same compilers they’re currently using. It’s not like you’re cancelling their C++ license.
@jfbastien It’s also good to push hardware vendors to stop making dumb parts and firmware developers to stop making dumb decisions. Making a DSP without 8-bit load/store capability is a poor choice; it isn’t 1994 anymore. Running C++27 on your 13-bit-word engine is also a terrible idea.
The concept that we should cater to all bad decisions forever is silly.
@azonenberg @egallager @jfbastien soooo... Rust's integer types, more or less?
@whitequark @egallager @jfbastien Similar. It looks like in Rust saturation is a method (saturating_add) rather than a type "integer with saturating operations".
I don't have strong feelings on how it's done in C++ other than wanting a standard way to get overflowing, saturating, or trapping behavior on demand.
@azonenberg @whitequark @egallager
A subset of what you want is in C++26
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p0543r3.html
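If it helps, the C++26 facility from P0543 looks roughly like this (saturating free functions in <numeric>, pending compiler support):

    #include <cstdint>
    #include <numeric> // std::add_sat, std::saturate_cast (C++26)

    int main() {
        std::int32_t big = 2'000'000'000;
        // Clamps to INT32_MAX instead of overflowing (UB with plain +):
        std::int32_t s = std::add_sat(big, big);
        // Clamps on narrowing conversions, too (here to INT16_MAX):
        std::int16_t t = std::saturate_cast<std::int16_t>(big);
    }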
@jfbastien @azonenberg @whitequark @egallager this is nice and clearly useful for many things, but I really hope this all stays explicit and opt-in. The usual behaviour may be UB in the spec, but in practice it is very well defined in all non-alien implementations.
Using these different types will have very different consequences for the generated code, and the programmer should be aware of them.
@jfbastien at work we use a DSP architecture with 16-bit bytes (TI TMS320F28xx; char = int = 16-bit; long = 32-bit), definitely a bit odd, but not "a PDP-8 from the 70s" as many quip around here. TI provides a C and C++ compiler (AFAIK stuck at C++03), definitely a freestanding implementation with its own extra quirks. Clearly, mandating char = 8 bits would make it impossible to support such a platform (or require bizarre compiler contortions to address half-words).
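For what it's worth, code that already assumes octets can make that assumption explicit today, so a platform like this fails loudly at compile time instead of miscompiling silently; a minimal guard:

    #include <climits>

    // Rejects 16-bit-char targets like the TMS320F28xx at compile time.
    static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");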
@cvtsi2sd yeah but docs like this place it squarely as a C++03 product. Does TI intend to move?
https://www.ti.com/lit/ug/spru514p/spru514p.pdf
@jcelerier @whitequark @jfbastien Sorry, I meant any pointer to object/data (vs function pointers).
Maybe I should have called this paper “8ers gonna 8” 😂
Thanks for all the feedback so far! Keep it coming.
Here is an updated draft on making CHAR_BIT == 8 in C++:
@jfbastien Ok, I'm not an expert, but why not make every byte be one bit? Seems to be the most reasonable choice for building an architecture. We can address every bit individually and there will be no wasted space...
@odoruhako because most hardware that exists can’t address bits individually.
Oh hey Hacker News found me 🫣
https://news.ycombinator.com/item?id=41874394
@jfbastien So French!
Maybe we should call them Freedom Bytes®©™ now 🤔
@DanielaKEngert
We errmmmm… should not ask the French to talk about “bite” 😳
The Germans, however, can say it kindly! Please 🙏
@dgregor79 people like you appreciate this kind of stirring! ☺️
@richardday @shac
See the updated paper https://wg21.link/D3477R1