social.kernel.org

Conversation

Zack Weinberg

zwol@hackers.town

It's late enough to be hacker hours, if you're as old as I am. Gonna write down a bunch of rambly thoughts about #xz and #autoconf and capital-F Free Software sustainability and all that jazz. Plan is to edit it into a Proper Blog Post™ tomorrow. Rest of the thread will be unlisted but boosts and responses are encouraged.

1

2

0

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

Starting with the very specific: I do not think it was an accident that the xz backdoor's exploit chain started with a modified version of a third party .m4 file to be compiled into xz's configure script.

It's possible to write incomprehensible, underhanded code in any programming language. There's competitions for it, even. But when you have a programming language, or perhaps a mashup of two languages, that everyone *expects* not to be able to understand — no matter how careful the author is — well, then you have what we might call an attractive nuisance. And when blobs of code in that language are passed around in copy-and-paste fashion without much review or testing or version control, that makes it an even easier target.

So, in my capacity as one of the last few people still keeping autoconf limping along, I'm thinking pretty hard about what could be done to replace its implementation language, and concurrently what could be done to improve development practice for both autoconf and its extensions (the macro archive, gnulib, etc.)

3

3

1

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

Side bar, I know a lot of people are saying "time to scrap autotools for good, everyone should just use cmake/meson/GN/Basel/..." I have a couple different responses to that depending on my mood, but the important one right now is: Do you honestly believe that your replacement of choice is *enough* better, readability wise, that folks will actually review patches to build machinery carefully enough to catch this kind of insider attack?

2

2

1

sam

thesamesam@social.treehouse.systems

Reply to @zwol@hackers.town

Edited 2 years ago

@zwol Yes, I genuinely do believe this for Meson, because it doesn't let you implement custom functions. It generally tries to avoid the metaprogramming thing which e.g. CMake does.

I can say this because I do review the whole build system diff for packages whenever I can.

CMake in particular didn't really learn a lot of the lessons from Autotools - it has the same "weak typing" problem and encourages the globs of modules/macros.

https://mesonbuild.com/FAQ.html#why-doesnt-meson-have-user-defined-functionsmacros covers this.

But I'm not going to pretend that fixes everything, it's just a point that's been on my mind. Not trying to devalue the meaningful discussion you're starting here which is brilliant so far.

1

0

0

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

On the subject of implementation language, I have one half-baked idea and one castle in the air.

The half-baked idea is: Suppose ./configure continues to be a shell script, but it ceases to be a *generated* shell script. No more M4. Similarly, the Makefile continues to be a Makefile but it ceases to be generated from Makefile.am. Instead, there is a large library of shell functions and a somewhat smaller library of Make rules that you include and then use.

For ./configure I'm fairly confident it would be possible to do this and remain compatible with POSIX.1-2001 "shell and utilities". (Little known fact: for a long time now, autoconf scripts *do* use shell functions! Internally, wrapped in multiple layers of M4 goo, but still — we haven't insisted on backcompat all the way to System V sh in a long, long time.) For Makefiles I believe it would be necessary to insist on GNU Make.

This would definitely be an improvement on the status quo, but would it be *enough* of one? And would it be less work than migration to something else? (It would be a compatibility break and it would *not* be possible to automate the conversion. Lots of work for everyone no matter what.)

3

2

0

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

Suppose that's not good enough. Bourne shell is still a shitty programming language, and in particular it is really dang hard to read, especially if you're worried about malicious insiders. Which we are.

Now we have another problem. The #1 selling point for autotools vs all other build orchestrators is "no build dependencies if you're working from tarballs," and the only reason that works is you can count on /bin/sh to exist on anything that purports to be Unix. If we want to stop using /bin/sh, we're going to have to make people install something else first, and that something else needs to be a small and stable Twinkie. Python need not apply (sorry, Meson).

What's small and stable enough? Lua is already too large, and at the same time, too limited.

There's one language that's famous for being tiny, flexible, and pleasantly readable once you wrap your head around it: Forth.

If I had investments to live off, I would be sorely tempted to take the next year or so and write my own Forth that was also a shell language and a build orchestrator, and then have a look at rewriting Autoconf in *that.* This is the castle in the air.

5

1

0

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

Edited 2 years ago

Side bar 2: Let's table the whole "shouldn't everyone build from git nowadays?" discussion. I'm quite sure the xz insider could've found a way to hide the stage 0 exploit in a checked-in file. If you care about ways to make the output of "make dist" verifiable and reproducible, and to facilitate building from VCS checkout for those who want that, we're actually having a productive discussion about that on one of the autotools mailing lists right now.

(Not sure which list — I sort them all into one mailbox — and I have to warn you that several other less helpful conversations are happening under the same subject line.)

2

1

0

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

Moving to the more general.

I said this over on the autoconf lists earlier today: just as I think it is a mistake to focus on the stage 0 exploit having been concealed by not checking it into the VCS, I also think it is a mistake to focus on the next few stages having been concealed in a binary file. There are binary files that are naturally editable and auditable as themselves (raster images, for instance) and there are text files that nobody wants to look at at all (ever tried to fix a merge conflict in an SVG image?)

A more interesting line to draw, IMO, is between code and tests. I feel quite confident in saying that the files written to $prefix by "make install" should never need to have any sort of dependence on the project's test suite, and that is something that ought to be possible to detect mechanically (the biggest challenge is determining what files of the source repo are exclusively part of the test suite).

1

1

0

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

tea break

1

0

0

sam

thesamesam@social.treehouse.systems

Reply to @zwol@hackers.town

@zwol You've no doubt seen https://lists.gnu.org/archive/html/bug-autoconf/2024-03/msg00000.html already.

It's quite common for me to come across stale autoconf-archive (or other shared) macros. Just last week, I did https://github.com/LibRaw/LibRaw/pull/633.

I don't know what the fix is for that. Perhaps making autoreconf -fiv note if it did/didn't replace would help.

But gnulib isn't something we can ignore, either. The use of gnulib across projects is inconsistent. Not everyone uses its bootstrap script, some people use the Python verson of it, some people use gnulib as a submodule, some people use it as a dir called lib, some gnulib, sometimes it's buried.

Bisecting projects which use gnulib isn't easy partly because of the above.

For projects not using submodules, it's also not easy to figure out how to reproduce it at all.

https://lists.gnu.org/archive/html/groff/2024-03/msg00211.html got me thinking about this as well. I don't know what the answer is.

(Please don't get me wrong, I can't stand submodule UX.)

3

0

0

Colin Watson

cjwatson@mastodon.ie

Reply to @thesamesam@social.treehouse.systems

@thesamesam @zwol Minor detail: is that "not using submodules and not using GNULIB_REVISION"? Because I'm probably biased but I don't see why it should be difficult to reproduce them in the latter case.

1

0

0

sam

thesamesam@social.treehouse.systems

Reply to @cjwatson@mastodon.ie

Edited 2 years ago

@cjwatson @zwol I absolutely forgot that was a thing. Yeah, totally fine when that exists.

I think I'm mostly complaining about the inconsistency + when people make up their own method?

(although I think documenting it would be useful -- check out what nano does, there's no real way to discover this kind of thing automatically right now)

0

0

0

Josh Triplett

josh@joshtriplett.org

Reply to @zwol@hackers.town

While I completely agree that shell is not a great language to work with, I'll take it over Forth any day. And even more importantly, I think it's important to not introduce a new implementation of a language for this purpose. I'd rather have not just an established language but an established implementation of that language.

If you're already expecting people to have gmake installed, how about using Guile/Scheme, which gmake already has built-in support for? That would be a substantial improvement over shell, and Scheme has a respectable library and macro system with which to provide a variety of helpful functions people can use.

(That said, I think Python is still a reasonable choice as well, and one that many many systems will already have installed.)

1

0

0

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

Last bit. Community, sustainability, and trust.

The early free software movement (1983–1994 give or take) was, as I've heard the tales, consciously revolutionary, and, as revolutions often do, it ran on the spare time of relatively young people with time and energy to spare.

I came on the scene in 1997, right about the time it became reasonably possible to run Linux as your only desktop OS if you knew what you were doing — or, to put it another way, right about the time the original goal of the GNU Project had been achieved.

Like many other revolutions, GNU had no answer, and still doesn't, to the question: now what?

This is not the only reason the young, energetic revolutionaries of 1997 are now the exhausted maintainers of an archipelago of individual "projects" that sort of add up to a computing environment that one might fairly describe as "the worst (except for all the others)". But I think it's an important reason.

1

1

0

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

Side bar 3: In the middle 1990s someone — either Eric Raymond or Guy Steele — wrote as part of the "Portrait of J. Random Hacker" appendix to the Jargon File

> [Among hackers] racial and ethnic prejudice is notably uncommon and tends to be met with freezing contempt.

This was not true even at the time, and twenty years later ESR was cheerfully making common cause with Vox Day and the Sad Puppies.

I'm a white guy (albeit some of my grandparents weren't). I already knew how to program when I got to college. If I'd made different choices in the early 2000s, I could very well now be sitting on enough investment income to take a sabbatical and invent a new shell language.

When we look around and say "where do we find the helping hands we so desperately need?" we must recognize that part of the problem is that hacking was never as inclusive a club as we claimed.

(This sidebar is not *only* a response to the commenters who saw a name like "Jia Tan" and immediately started hating on China as a whole.)

1

1

1

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

Last thought for tonight: Riding on the time and energy of revolutionaries no longer works. Giving away stuff for free and then asking corporations to pay us *never* worked. Grants from governments and NGOs works only for the stuff you can successfully write grants for, which is almost never "five years' salary for invisible maintenance tasks." What's left?

2

3

1

Scott M. Stolz

scott@authorship.studio

Reply to @zwol@hackers.town

Edited 2 years ago

I think that is why many people are adopting a hybrid strategy regarding open source, such as:

1. Basic features free; commercial features require a paid license.
2. Offering services related to the software (hosting, SaaS, custom development, support, etc.)
3. Creating an Association that collects membership dues, contributions, and grants.

There are ways to make money on open source software. The purists shy away from that, but the reality is that people need to pay their bills, and monetization is how you pay your bills without having to work for an employer.

0

0

0

Jamey Sharp

Reply to @zwol@hackers.town

@zwol I'm not sure I'm quite prepared to accept Forth as the standard build language, but I certainly wouldn't be sad about it if that turned out to be the answer everyone went with.

That makes me curious, though, how well WebAssembly might do as a compromise. Its text representation can be reasonably clear, at least as far as stack languages go, and it's nicely explicit about what non-computational (I/O) capabilities you're asking for. I can think of various ways that could work that seem nice to me, but this is already a thought experiment on top of a thought experiment so I'll stop there.

1

0

0

Zack Weinberg

zwol@hackers.town

Reply to @jamey@toot.cat

@jamey For possibly irrational reasons having to do with my past employment history I don't trust WebAssembly to continue being a going concern. I might change my mind in another 15 years or so.

2

0

0

Josh Triplett

josh@joshtriplett.org

Reply to @zwol@hackers.town

Leaving aside *which* specific sandbox we use, there's something to be said for build systems being a sandbox that has well-defined capabilities and non-capabilities. Earlier in the thread you mentioned having test suites not write to the source directory, for instance; a sandbox would allow enforcing that. Similarly, we could set rules for what's unreasonable for a build system to read and to write in which phases, and such constraints could make it easier to review and catch weirdness.

0

0

0

Jamey Sharp

Reply to @zwol@hackers.town

@zwol Fair enough! It was pretty much idle speculation anyway.

0

0

0

Zack Weinberg

zwol@hackers.town

Reply to @josh@joshtriplett.org

@josh Huh, you dislike Forth that much? The *only* non-esoteric programming languages I would personally rate as worse than Bourne shell are C shell, DOS batch, VMS DCL, and Tcl, just so you know where I'm coming from here.

I do not actually *like* the idea of requiring GNU Make, and I think Guile is a non-starter for the same reason I don't think Python or Perl are an option: it would make architecture bootstrap worse. I get why you want an established implementation of an established language, but that's very much in tension with "the interpreter for the replacement implementation language for ./configure should not need a ./configure itself".

1

0

0

Josh Triplett

josh@joshtriplett.org

Reply to @zwol@hackers.town

I would *much* rather program build systems or anything else in shell than in stack-oriented RPN, yes. And that aside, I think Forth or any other non-everyday language fails in a similar fashion to m4: most people will treat it as "you are not expected to understand (or review) this" magic to be copy-pasted, precisely *because* the only time most people would encounter it is in this hypothetical successor to autoconf. We should strive for code that's easy and inviting to review using skills many people already have from non-build-system code.

And I don't think we should make everyone's experience with build systems worse just to make architecture bootstrap marginally easier. Let's find a solution for architecture bootstrap, and then let people write Python. If that means we need to ensure we can build Python without Python, or handle Python via build-an-old-version-then-successively-newer-versions, so be it.

1

0

0

Zack Weinberg

zwol@hackers.town

Reply to @thesamesam@social.treehouse.systems

@thesamesam I feel like this feeds into the sustainability issues; it's great that you have the time and energy to review whole build system diffs, but we need institutions and processes that are resilient in the *absence* of people prepared to go to that much effort.

Re Meson and user defined functions/macros/rules/whatever, when I tried to convert libxcrypt to Meson I got 90% done and ran right into that limitation. I *think* I could have worked around it with even more custom scripts but that was about the same time Florian Weiner (iirc) asked me not to make Red Hat's architecture bootstrap pull in Python.

1

0

0

sam

thesamesam@social.treehouse.systems

Reply to @zwol@hackers.town

@zwol Yes - to be clear, I don't consider what I do to be sustainable, and it wears me out. I wanted to share the perspective there on "I do this and something which is a limited DSL would help me a lot".

Totally agree.

0

0

0

Rep. Eric Gallager (no "h"!)

egallager@social.treehouse.systems

Reply to @zwol@hackers.town

@zwol I think mostly what autoconf needs is stricter style guidelines and better diagnostics; an entire change of implementation language would be too much, IMO

0

0

0

Zack Weinberg

zwol@hackers.town

Reply to @thesamesam@social.treehouse.systems

@thesamesam gnulib makes my head hurt as well. If I had funding and helpers (guesstimate this needs 4 or 5 highly experienced C programmers fulltime for 3 months) I'd like to go through all of gnulib and all of autoconf-archive and write down which pieces are still useful when, which pieces are partially or completely redundant to core autoconf macros, etc. Goal being to really nail down who actually needs what anymore.

1

2

1

Rep. Eric Gallager (no "h"!)

egallager@social.treehouse.systems

Reply to @zwol@hackers.town

@zwol @thesamesam That's something I've kinda been working on off-and-on-again over the years: https://github.com/autoconf-archive/autoconf-archive/pull/289

0

0

0

Rep. Eric Gallager (no "h"!)

egallager@social.treehouse.systems

Reply to @thesamesam@social.treehouse.systems

Edited 2 years ago

@thesamesam @zwol some sort of consistent recommendation about whether or not it's generally a good idea to use aclocal's `--install` flag would be helpful; I personally always put it in the ACLOCAL_AMFLAGS of all my automake Makefiles, but as discussions have shown, there are some drawbacks to that

0

1

0

Wendy (prev Jason) A

norgralin@hachyderm.io

Reply to @zwol@hackers.town

@zwol it’s 2024. Assuming Python exists would not be out of line. That would cover all the hobbyists leaving behind organizations that should probably be coughing up money to solve their own problems.

2

0

0

Zack Weinberg

zwol@hackers.town

Reply to @norgralin@hachyderm.io

@norgralin the problem with assuming python exists is it has an enormous dependency list and a big hairy configure script itself

0

0

0

Rep. Eric Gallager (no "h"!)

egallager@social.treehouse.systems

Reply to @norgralin@hachyderm.io

@norgralin @zwol IMO, a statement to the effect of "it's safe to assume the existence of Python" ought to get added to POSIX first

0

0

0

Rep. Eric Gallager (no "h"!)

egallager@social.treehouse.systems

Reply to @zwol@hackers.town

@zwol I've been trying to get governments to fund this kind of stuff without requiring people to file grant applications first, but it's tough going... I think if we could lobby for more people who understand FOSS to be appointed as state IT directors, that could help, though. It would require political organizing and lobbying, but it could be worth it.

1

0

0

Rep. Eric Gallager (no "h"!)

egallager@social.treehouse.systems

Reply to @egallager@social.treehouse.systems

@zwol Specifically, bird-dogging gubernatorial candidates would be one strategy. If candidates for governor in your state hold publicly-open town halls, go and ask, "What sort of qualities would you look for when nominating a state IT director?" and then grade them based on how FOSS-maintenance-friendly their answers are.

0

0

0

Zack Weinberg

zwol@hackers.town

Reply to @josh@joshtriplett.org

@josh That's a really good point about languages people already know. I suppose we could try to define a subset of a fixed older version of Python (3.6 or so) that was sufficient to run Meson. A Scheme subset as suggested in another branch (GNU Mes) seems like it would be less work, though.

1

0

1

Josh Triplett

josh@joshtriplett.org

Reply to @zwol@hackers.town

I don't think it's sustainable to force an older version of anything, except for bootstrapping (e.g. use old Python to build new Python).

Scheme would be less work for bootstrapping but more work for users (since most users won't be using Scheme for anything else). I'd rather have less work for users, which means more potential reviewers.

It's challenging to make Python code opaque, and doing so is automatically suspicious.

(As an aside, when I'm saying "Python" I'm not automatically assuming "to run Meson". Meson has some nice properties, but is not the end-all be-all of build systems, and seems far too opinionated to be a universal build system for everyone.)

1

0

0

Josh Triplett

josh@joshtriplett.org

Reply to @josh@joshtriplett.org

(Using Mes to bootstrap the world is still a good idea, though. But the tools used for bootstrap don't need to be the same as the tools used for day-to-day builds.)

0

0

1

Xdej

Reply to @zwol@hackers.town

@ska do you still have have an alternative shell to Forth which is small and not limited, and do you want to suggest it to @zwol?

1

0

0

Laurent Bercot

ska@social.treehouse.systems

Reply to @xdej@mamot.fr

@xdej @zwol I do have execline - https://skarnet.org/software/execline/ - as a noninteractive scripting language that is simpler and more logical than the shell, is trivial to audit because there's very little code, and is easy to programmatically generate.

However, as much as I can recommend it if the goal is "autogenerate scripts that will not have quoting nightmares", I simply cannot recommend it 1. as a programming language if you're going to do anything else than combine binaries invocations in weird Unix ways, 2. as something that would be easier to read than the shell if someone's going to take a look at the generated scripts.

execline would certainly have its uses somewhere in a build system (typically I like to use it sometimes to replace shell invocations in a Makefile) but it's not a good fit to replace a shell as is used by autotools. It utterly lacks any programming language features.

However, since we're talking about Forth, I should mention T1 - https://t1lang.github.io/ - which is a Forth-like in progress and incredibly minimal. It's written by the author of BearSSL, whose code I have studied and can vouch for. It's certainly not applicable to autotools as is either, but as a programming - not scripting - language, if we're just exploring and throwing ideas around, it should be in the conversation.

1

0

0

Zack Weinberg

zwol@hackers.town

Reply to @ska@social.treehouse.systems

@ska @xdej To be clear, any replacement for autoconf that I design will *not* generate a script from a custom input language. I want a developer experience much more like programming in a normal scripting language with a bunch of specialized libraries to hand. Whether there are syntactic conveniences enough to call it a DSL is an open question.

There might need to be a tool that packs up an appropriate subset of the library for inclusion in tarball releases, depending on how that winds up working.

1

0

0

Laurent Bercot

ska@social.treehouse.systems

Reply to @zwol@hackers.town

@zwol @xdej Also note that the "replacing is only a good idea if the replacement is actually better" argument (which I agree with) is also valid for autoconf-next.

The big advantage of autotools over other build systems, IME, is that autoconf produces a configure script that has NO dependencies. The developer has to install m4 and autoconf, but the user doesn't need anything more than a vaguely POSIX environment.

If you change autoconf so that it produces something that needs to be interpreted by something that the user doesn't natively have, it becomes something entirely different. It cannot be used in bootstrapping anymore. The interpreter needs its own build system. And it's much less convenient for the user - now you're in Meson or Ninja territory, which means that in order to have some value you need to be *better* than Meson or Ninja. It's a risky proposition.

The fact that /bin/sh is available everywhere is, I believe, a *really* strong argument in favor of keeping /bin/sh, and looking for autoconf improvements elsewhere.

1

0

0

Zack Weinberg

zwol@hackers.town

Reply to @ska@social.treehouse.systems

@ska @xdej I provisionally agree with this, although you might be interested to know that some of the other respondents to this thread believe that scenarios where Python is unavailable are unimportant.

1

0

0

Laurent Bercot

ska@social.treehouse.systems

Reply to @zwol@hackers.town

@zwol @xdej Yeah, of course they do. But the starting point of all this is a supply chain attack and the end goal is to simplify and minimize chains of trust. Adding Python to the dependencies doesn't exactly feel like an improvement using that metric.

0

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @zwol@hackers.town

@zwol M4 always was horrible, and I say that as the original author of GNU M4.

The original Unix M4 was weird, and there weren't really any good explanations for why it was the way it was. Apparently someone at Bell labs needed a preprocessor and wrote M4, sometimes in the '70, for no other greater purpose than to scratch a personally itch.

GNU M4 only exists because RMS wanted GNU to have what Unix had, and while I wanted to do something different and better, RMS convinced me to do M4 first.

12

20

4

Sumana Harihareswara

brainwane@social.coop

Reply to @seindal@mastodon.social

@seindal Thanks for sharing this! May I boost this post? @zwol

2

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @brainwane@social.coop

@brainwane @zwol Naturally you can.

1

0

0

Sumana Harihareswara

brainwane@social.coop

Reply to @seindal@mastodon.social

Thanks. (I try to be cautious when someone's marked their post as unlisted, since different people have different expectations and etiquette about that.)

1

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @brainwane@social.coop

@brainwane @zwol I didn't know it was unlisted, but if might be because it's a reply.

1

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @seindal@mastodon.social

@brainwane @zwol I'm using the standard app. How do I see if a post is listed or not?

1

0

0

Sumana Harihareswara

brainwane@social.coop

Reply to @seindal@mastodon.social

I believe your post was marked as unlisted because you replied to an unlisted post and your reply defaulted to the same status as @zwol 's.

I use a different interface than you, but: in the web interface, you can see that https://mastodon.social/@seindal/112223948014049603 has a partly-occluded moon icon [indicating it is unlisted], whereas Zack's post *starting* the thread has a globe icon [indicating that it is visible for all/"public" to searches and other discovery mechanisms].

1

0

0

Sumana Harihareswara

brainwane@social.coop

Reply to @brainwane@social.coop

Some applications and web interfaces indicate unlisted posts with an icon of an *unlocked padlock* alongside. That seems a popular choice so your app may do that too.

0

0

0

Jarkko Sakkinen

jarkko

Reply to @seindal@mastodon.social

@seindal @zwol reminded me of this presentation where the presenter totally mixes up with the toy example ;-) https://youtu.be/ULZxHSPWn98?t=1005

imho, m4 has its time and place in history and inspired bunch of other tools.

i remember using it ages ago for e.g. static web site generation where it did the job (today i'd probably pick something more modern), and of course still continuing enforced use with horrific autotools from time to time :-)

1

0

3

Demi Marie Obenour

alwayscurious@infosec.exchange

Reply to @seindal@mastodon.social

@seindal @zwol Why do you dislike M4?

1

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @alwayscurious@infosec.exchange

@alwayscurious @zwol It has an awful syntax, especially around quoting, and while you can do a lot, some very simple problems have really contorted solutions.

I wrote it, but have hardly ever used it.

0

1

0

Demi Marie Obenour

alwayscurious@infosec.exchange

Reply to @seindal@mastodon.social

@seindal @zwol M4 is indeed horrible. Some of my problems with it:

Macro names don’t have a prefix.
Quoting rules that make it very hard to escape quotes.
No embedded scripting language (such as Lua).
Purely textual rescanning.

Ironically, RPM’s macro language is probably the best I have seen.

1

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @alwayscurious@infosec.exchange

@alwayscurious @zwol Exactly. The way quoting works is a huge problem.

I added the possibility of having an escape character for macros.

0

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @jarkko

@jarkko @zwol I think it was made as a preprocessor for ratfor, rational fortran, I believe. It was in Unix v7, but then it got picked up by Berkeley and was used in BSD for generating sendmail configurations, which was already a crime in itself.

Lucky we're well rid of sendmail, but then autotools picked up m4, for god knows which reasons, prolonging it's agony by decades.

1

0

1

This account has moved

thebluewizard@hackers.town

Reply to @seindal@mastodon.social

@seindal @zwol Makes sense. RMS envisioned GNU as a something of drop-in replacement for UNIX. So, if M4 is in UNIX, then M4 shall be in GNU.

Nowadays, there zillions of flavors of UNI* and various programs that it is no longer really that important now. But, still...

2

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @thebluewizard@hackers.town

@thebluewizard @zwol That was exactly the reasoning. It was before POSIX, so the model was v7 with some Berkeley enhancements.

0

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @thebluewizard@hackers.town

@thebluewizard @zwol Once every major computer company has their own Unix lookalike, and they were often very different.

I've worked with System V, BSD, Ultrix, HP-UX, Aix, Xenix, SunOS, Solaris, and maybe some more.

A program which compiled on one system rarely did in another.

It's so much easier now.

0

0

0

Jef Poskanzer

jef@mastodon.social

Reply to @seindal@mastodon.social

@zwol @seindal This got me to read up on the history of m4 and TIL that it's based on the ratfor macro stuff. I always assumed it was the other way around!

0

0

0

Sumana Harihareswara

brainwane@social.coop

Reply to @brainwane@social.coop

Also, I have never been to Venice, but now I am eager to go someday so I can hire you for a tour! And I am wondering what a "walk/boat tour" of the history of GNU would be like.

1

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @brainwane@social.coop

@brainwane @zwol 😄

I never was central to anything in the GNU project, I just wrote a program and a manual to give something back for all the software we got for free.

My colleagues back then agreed and covered my back for a month while I did the job.

0

0

0

Jarkko Sakkinen

jarkko

Reply to @seindal@mastodon.social

@seindal @zwol I could imagine that at the time make + m4 was quickest way to PoC a higher level build system than raw makefile because you get away without having to implement a macro language so i'd guess that autotools was an unfortunate accident :-)

1

0

1

Zack Weinberg

zwol@hackers.town

Reply to @jarkko

@jarkko @seindal It was well before my time, but I am pretty sure the logic of whoever started autoconf went like

* Imake is terrible
* A big chunk of why Imake is terrible is because the C preprocessor isn't Turing complete and isn't designed to crunch anything but C
* Are there any other macro languages lying around that are more capable than CPP?
* Oh, hey, M4! Let's see how that goes.

0

0

0

Ludovic Courtès

civodul@toot.aquilenet.fr

Reply to @zwol@hackers.town

@zwol In #bootstrapping circles, we have GNU Mes and Gash (the combination of which is good enough to run ./configure scripts).

The Racket folks have switched to Zuo as their build system, also based on a minimal Scheme implementation.

Maybe not a universal option, but I can imagine a build system based on Mes/Zuo, at least in the circles I care about.

2

0

0

Philip McGrath

LiberalArtist@mastodon.social

Reply to @civodul@toot.aquilenet.fr

@civodul @zwol Upstream Chez Scheme now also uses Zuo. And by “minimal scheme implementation” there is a single file, zuo.c, that builds with `cc -o zuo zuo.c`—no configure required, and even very old or limited C compilers supported.

1

0

0

Philip McGrath

LiberalArtist@mastodon.social

Reply to @LiberalArtist@mastodon.social

@civodul @zwol In this context, maybe it’s worth mentioning that Racket and Chez Scheme use Zuo to replace make (keeping a stub makefile): they specifically don’t try to replace configure. Racket uses Autoconf; Chez Scheme uses a handwritten shell script.

The Zuo language certainly could be used to write a configure script. I mention this just to reaffirm that implementing ./configure does have specific challenges!

1

0

0

Philip McGrath

LiberalArtist@mastodon.social

Reply to @LiberalArtist@mastodon.social

@civodul @zwol In fact, I nudged Zuo to use Autoconf for its optional configure script to get the standard installation flags and such to work.

1

0

0

Andy Wingo

wingo@mastodon.social

Reply to @LiberalArtist@mastodon.social

@LiberalArtist @civodul @zwol probably the biggest obstacle for guile to switch away from autoconf would be gnulib (include-only lib to polyfill modern posix+gnu extensions); it integrates tightly with autoconf/m4/automake

0

0

0

Simon Tournier

zimoun@sciences.re

Reply to @civodul@toot.aquilenet.fr

@civodul Hum the attack would be a bit more sophisticated for GNU Mes and Gash as implemented in Guix. But still…

Instead of targeting plain Bash, one needs to target the Guix package ’guile-bootstrap’. This package depends on tar, bash, mkdir and xz; it adds some surface.

Else, it would also be possible to exploit the non-deterministic Gash compilation to hide stuff.

https://simon.tournier.info/posts/2023-10-01-bootstrapping.html

The attack would be much more complicated, I guess.

1

0

0

Zack Weinberg

zwol@hackers.town

Reply to @zimoun@sciences.re

@zimoun @civodul I thought about it some more and absolute size is not the most important issue here; the most important issues are (1) how difficult is it to install the thing, and (2) how much more readable than sh(+m4)+make do you get for the effort. That said, size does matter in that someone might want to audit the language they're being asked to install, on top of everything else. And the big popular interpreted languages tend to have large dependency graphs, which makes their true size even bigger, makes them harder to install, and makes problems for bootstrapping.

Python and Perl are very large (current releases are ~1.2M lines of code each according to SLOCCount), nontrivial to install from source, and problematic at the lowest levels of the bootstrap chain.

A mostly complete implementation of POSIX shell and utils, namely busybox, can be fit into 200,000 lines. bash+coreutils has important missing pieces (grep, sed, awk, find, diff are the ones I know about) and is about twice as big.

mes+gash+gash-utils is ~70,000 lines. Lua is ~20,000. Neither Scheme nor Lua feels like *enough* of a readability improvement over sh to be worth the switching costs.

I would say that 20,000 lines of C is about the upper limit for what I'd feel comfortable demanding people install before they can build the thing they actually wanted to build.

Furthermore, any such component cannot require a complex configure+build process itself lest we have a circular dependency.

2

0

0

Ludovic Courtès

civodul@toot.aquilenet.fr

Reply to @zwol@hackers.town

@zwol @zimoun To be fair, Mes includes a C library, a C compiler with 4 backends, etc. The parts that would matter here are the interpreter, which is ~6K lines of C under src/.

Zuo has an interpreter with ~8K lines of C and ~5K lines of Zuo (Scheme).

This should be compared with the line counts of Perl + Auto{conf,make} + Make or CMake + Make/Ninja.

1

0

0

Ludovic Courtès

civodul@toot.aquilenet.fr

Reply to @civodul@toot.aquilenet.fr

@zimoun @zwol Speaking of build systems: in 2008, Tom Tromey wrote Quagmire, a proof-of-concept replacement of Autoconf + Automake, mostly compatible with the latter, implemented in GNU Make (~1K lines).

https://tromey.com/blog/?cat=16
https://github.com/tromey/quagmire

It’s appealing because GNU Make is ubiquitous and ‘Quagmire’ files looked very much like ‘Makefile.am’.

The downside is that it’s hard to debug and work with (lots of ‘eval’ tricks…). Less appealing than Zuo or similar to me.

1

0

1

Zack Weinberg

zwol@hackers.town

Reply to @civodul@toot.aquilenet.fr

@civodul @zimoun Oh, I remember when Tom announced Quagmire. Hard to debug is exactly what we don't want, though — I bet it would be easy to hide a back door in 1000 lines of cryptic Makefile tricks.

0

0

0

js

Reply to @zwol@hackers.town

@zwol @zimoun @civodul I’ve seen Fossil use Tcl for their configure script and shipping a micro-Tcl called jim that’s a single .c file for when you don’t have Tcl

0

0

0

ArneBab

ArneBab@rollenspiel.social

Reply to @seindal@mastodon.social

@seindal wasn’t the reason for M4 that they needed some templating language that’s easy to implement very low level for bootstrapping?

Building it in a month sounds like it fulfilled that requirement.
@zwol

1

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @ArneBab@rollenspiel.social

@ArneBab @zwol I had the printed Unix v7 manual as a reference, but I had to stay clear of the actual Unix v7 sources, which we had on a tape, but I couldn't use.

0

0

0

Caleb Maclennan

alerque@mastodon.social

Reply to @seindal@mastodon.social

@seindal @zwol Bookmarking this so I have somebody to direct my rants at next time m4 makes me want to pull out hair...

Ironically I spent a couple hours just this morning before seeing this thread wrangling some new autoconf macros and don't feel the need for any ranting; it's a decades old love/hate for me. I still sometimes reach for m4 even apart from autotools, but it is eccentric for sure.

1

0

0

Zack Weinberg

zwol@hackers.town

Reply to @seindal@mastodon.social

@seindal Thanks for the history. I've never thought much about M4 considered as itself, separate from the autoconf DSL (where it isn't a great fit but I see why it was chosen) "At least it's more capable than the C preprocessor" is what I probably would have said ... But its clearly not what anyone would build today.

2

0

0

Lukáš Zapletal

lzap@mastodon.social

Reply to @zwol@hackers.town

@zwol @seindal SELinux makes heavy use of it. And sendmail as far as I remember or was it not m4? :)

1

0

0

Eli the Bearded

elithebearded@fed.qaz.red

Reply to @zwol@hackers.town

@zwol @seindal It's been a while since I was running sendmail, but that has an m4 cf file generator. It was my first real exposure to m4, back in the nineties.

1

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @lzap@mastodon.social

@lzap @zwol Sendmail used M4 to generate the config files.

I didn't know SElinux uses it.

1

0

0

jaj

jaj@mastodon.social

Reply to @elithebearded@fed.qaz.red

@elithebearded
sendmail m4 config was really horrible. I did however write a paper once using m4 to enhance pandoc markdown and it was quite nice
@zwol @seindal

0

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @alerque@mastodon.social

@alerque @zwol I might have written the program but I didn't put it into autoconf.

It didn't even exist back then. I wrote GNU m4 in '89, while autoconf is from '91.

0

0

0

irenes@mastodon.social

Reply to @seindal@mastodon.social

@seindal @zwol oh ha, happy to see you weigh in here

yeah... we've read a lot of the Bell Labs cohort's papers. it cannot be over-stated the degree to which EVERYTHING in Unix, including the kernel, was somebody's personal thing they did for fun. mind, we see that as a good thing, but it does mean people today need to think critically about it now and again.

0

0

0

Zack Weinberg

zwol@hackers.town

Reply to @seindal@mastodon.social

@seindal @lzap@mastodon.social I had a work-study sysadmin job in college, long enough ago to be *before* sendmail introduced the M4-generated .cf file. Writing that shit by hand was not a fun time.

1

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @zwol@hackers.town

@zwol Luckily I never had to do that kind of stuff. I do, however, remember the regular moaning from the mail guy the next table over.

0

0

0

d@nny disc@

hipsterelectron@circumstances.run

Reply to @seindal@mastodon.social

@seindal @zwol question if anyone has a moment: i have always wanted to try to hack autoconf/autotools to parallelize ./configure checks, and i have been told there is e.g. a lot of reliance on statefully modifying a hardcoded file path. in light of the above framing, would you recommend i avoid trying to do this with autotools at all and make a replacement from scratch without m4 (which would parallelize ./configure checks) or would you suggest some sort of translation layer, or something else?

1

0

0

Zack Weinberg

zwol@hackers.town

Reply to @hipsterelectron@circumstances.run

@hipsterelectron @seindal In the current architecture, I think AC_CHECK_HEADERS and _FUNCS could be parallelized easily, and that would give 80% of the potential benefit. Parallelizing anything but the CHECK_[plural] macros, however, is hopeless — not because of state in a file (that file is unordered and append-only, it'll be fine even with parallel checks) — but because of every existing configure.ac expecting their checks to occur in sequence.

Regardless, I would be delighted to have any help whatever with autoconf and/or a replacement ;-)

1

0

0

d@nny disc@

hipsterelectron@circumstances.run

Reply to @zwol@hackers.town

@zwol @seindal gettext is (reasonably) in the dependency tree for a lot of c/c++ code, and a tool like @spack (very similar to nix and guix) which builds from source often ends up spending just over a minute blocking on configure checks for gettext (we optimized this a while ago by emulating arrays in posix sh with eval). i really appreciate your feedback -- it sounds like parallelizing AC_CHECK_HEADERS and _FUNCS would still be a useful contribution even if we probably need to break compatibility to get at the remaining parallelism.

i'm also particularly wondering whether (with e.g. a new architecture for labeling/composing checks) we might be able to have a global cache of checksummed configure tests with unique identifiers, so that two separate projects A and B can both depend (at configure time) on labelled definitions for compiler checks from project C, and if A and B are both configured with the same value for CC/etc, then the result of the check can be retrieved from the cache and reused without needing to invoke the compiler again. this would depend on some interface for composing checks that avoids any implicit serialization, and refers to some checksum to deduplicate checks performed with the same parameters.

i recently implemented a similar system to deduplicate python sdist builds in pip, sharding the cache by python interpreter version and interacting with HTTP caching methods. unlike pip, we don't need to conform to an existing language's compatibility/caching decisions, so we could key our global cache of compiler checks by a checksum of the check's implementation code + environment variable values (maybe we checksum the file path CC points to as well, or we parse the output of cc --version), and we get:
(1) compiler check scripts with checksummed contents (autoreconf won't change output bytes -- more secure)
(2) most checks simply won't need to be run again since checks are not frequently updated (?), so from-source packagers like spack/nix/guix get free speedup
(3) even after wiping the global cache, checks can still be parallelized (this isn't magic; we would need to develop a community of checks that can safely be run in parallel. but i think developers should be able to do this if we can give them a tool that leverages that parallelism)

the above is all brainstorming, but i feel really excited by the prospect of:
(a) parallelizing AC_CHECK_HEADERS/_FUNCS
(b) the more general global check caching approach
as a path to a better build space. @irenes has always said that the portability from autotools configure checks is really valuable and worth preserving, and i would be super interested in expanding that approach to languages like rust as well where it makes sense.

2

0

0

Zack Weinberg

zwol@hackers.town

Reply to @hipsterelectron@circumstances.run

@hipsterelectron @seindal @spack @irenes@mastodon.social thanks for the brain dump! I will reply at more length when I'm not halfway out the door.

(Have you tried the existing config.cache/config.site mechanism to bypass doing checks over and over? I'm vaguely aware it has problems, which is why it's off by default, but I do not actually know what the problems are.)

1

0

0

d@nny disc@

hipsterelectron@circumstances.run

Reply to @zwol@hackers.town

@zwol @seindal @spack my next step should absolutely be to learn the format of the existing config cache file!

1

0

0

ArneBab

ArneBab@rollenspiel.social

Reply to @hipsterelectron@circumstances.run

@hipsterelectron back when I used Gentoo there used to be confcache, but IIRC that had some recurrent issues so it didn’t become the default.
@zwol @seindal @spack @irenes

1

0

0

d@nny disc@

hipsterelectron@circumstances.run

Reply to @hipsterelectron@circumstances.run

@zwol @seindal i'm not familiar with config.site but it absolutely sounds like the right place for a global cache

0

0

0

d@nny disc@

hipsterelectron@circumstances.run

Reply to @ArneBab@rollenspiel.social

@ArneBab @zwol @seindal @spack @irenes it sounds like people have been trying to solve this problem for a while then! :) maybe we can finally solve it once and for all

1

0

0

ArneBab

ArneBab@rollenspiel.social

Reply to @hipsterelectron@circumstances.run

@hipsterelectron good luck! @zwol @seindal @spack @irenes

0

0

0

forrestthewoods

forrestthewoods@mastodon.gamedev.place

Reply to @zwol@hackers.town

@zwol I would like to see ./configure die for non-bootstrap users. It’s so very “C project from the 90s”.

Zig has the right approach imho. Pre-generate headers and “import libs” for target platforms / library versions. It’s an exceedingly tractable problem afaict.

Configuring for a given target has a deterministic output. Forcing everyone to repeat themselves feels deeply wrong.

1

0

0

forrestthewoods

forrestthewoods@mastodon.gamedev.place

Reply to @forrestthewoods@mastodon.gamedev.place

@zwol do I think Bazel/Buck2 is better enough to catch this style attack? Probably not.

Do I think they’re better than autotools/make/cmake? Absolutely, by leaps and bounds!

I live in a wildly crossplatform world. Linux is so much more complex and painful to build for. But it really doesn’t have to be! It can be done better. Imho. Respectfully.

1

0

0

Zack Weinberg

zwol@hackers.town

Reply to @forrestthewoods@mastodon.gamedev.place

@forrestthewoods I have no experience with Bazel or Buck2; I'd certainly look real hard at them before beginning any hypothetical "autoconf 3.0" project.

I do not like the idea of pre-generated "this is what this system is like" files basically because I do not want to rule out the possibility of a new era of diversity in low-level APIs. Feature probes adapt more incrementally to new systems. It *would* be interesting to try to resurrect the idea of a shared probe result cache, though.

1

0

1

Zack Weinberg

zwol@hackers.town

Reply to @zwol@hackers.town

@forrestthewoods also, "ruling out" xz-style insider malware implants is not something one can do with technical measures. It's a people problem and it has to be addressed with people measures.

0

0

1

Martin Seeger

masek@infosec.exchange

Reply to @seindal@mastodon.social

@seindal @zwol m4 was the rope sendmail was hanged with.

1

0

0

René Seindal (MOVED)

seindal@mastodon.social

Reply to @masek@infosec.exchange

@masek @zwol
I don't think GNU m4 had anything to do with that.

0

0

0

About social.kernel.org

Terms of service

Please do not use this service in violation of the Linux Kernel Code of Conduct. Doing so will result in your account suspension with the referral of the matter to the CoC committee.
"Repeating"/"boosting" someone else's status on this platform will be treated as endorsement and will fall under rule #1.
You are encouraged to use this platform to promote your work on the Linux Kernel, but there is no restriction on permitted topics (with the exception of anything covered by #1 above).
There is no requirement to post in English, but it should be considered the primary language of communication on this platform.

Privacy notice

The admins of this service have access to all posted statuses. They aren't looking, but if it's something they shouldn't know about, then you should not post it on this platform.

Please see the Linux Foundation Privacy Policy, which applies to this platform as well.

Getting your own account

If you would like an account on this instance, please check that the following applies to you:

You are listed in MAINTAINERS or CREDITS
OR: You have a kernel.org account or email address
OR: You have a long and established history of involvement with the Linux Kernel

If the above is true and you agree with the Terms of Service and Privacy Notice listed above, please use these instructions to request an account:

How to request an account on social.kernel.org