Conversation

"Repository name should contain only alphanumeric, dash ("-"), underscore ("_") and dot (".") characters."

Hot take: we're too attached to the Latin script. It's a historical accident that I can't call my repo "привет" on a code forge, and it's pushing a cool part of the native cultures of billions of humans to the margin. Git itself can do it no problem!

And why should Дима switch to Latin for naming a project if John doesn't switch to Cyrillic anyway?
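
To make the quoted rule concrete, here is a minimal sketch of such a check. The function name `is_valid_repo_name` and the pattern are assumptions made for illustration (in particular, that "alphanumeric" means ASCII letters and digits); this is not code from any actual forge.

```python
import re

# Assumption: "alphanumeric" in the error message means ASCII letters and digits.
_REPO_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_repo_name(name: str) -> bool:
    """Hypothetical check mirroring the quoted rule; not taken from a forge's source."""
    return _REPO_NAME.fullmatch(name) is not None


print(is_valid_repo_name("my-repo_1.0"))  # True
print(is_valid_repo_name("привет"))       # False
```

Under that assumption, every name written in a non-Latin script is rejected outright, which is the restriction the post complains about.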

@dcz The vile Anglo-Saxons even devised a bigger hit: they invented programming languages in Latin script, a clear attack on Slavic culture to undermine equality!111

@ruff Not just on Slavic culture, also on Japanese culture. And Indian. And Chinese. And Native American. And, and, and...

Or we could just understand that this is a historical accident and stop sticking to it as the One True Way.

@ruff @dcz I mean, almost every non-English language uses at least an extended Latin alphabet, so it excludes virtually every language except English.

@trimethylpentan @dcz Exactly. And the reason is that no one wants a headache with stringprep, and without it you risk losing the uniqueness or readability of the identifier. So to make everyone equally unhappy, the possible solution is to move to UUIDs only.
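
The uniqueness concern can be illustrated with a small sketch. It uses plain Unicode normalization from Python's standard library rather than a full stringprep profile, and the helper name `canonical_name` is hypothetical.

```python
import unicodedata


def canonical_name(name: str) -> str:
    """Hypothetical helper: NFC-normalize and case-fold a name before comparing."""
    return unicodedata.normalize("NFC", name).casefold()


composed = "\u010capek"     # "Čapek" with the precomposed letter U+010C
decomposed = "C\u030capek"  # "C" followed by the combining caron U+030C

# Both strings render the same, but they are different code point sequences.
print(composed == decomposed)                                  # False
print(canonical_name(composed) == canonical_name(decomposed))  # True
```

Without a step like this, two names that look identical on screen could coexist as distinct identifiers.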

@ruff @trimethylpentan No one (developers) wants a headache with stringprep, so let's give the other 99% of the world (users) headache with code switching.

Yeah, that doesn't sound so great.

Maybe having identifiers is indeed the root of the problem. Why can't the identifier just be the name? People are referring to things by name in the real world, not by a vaguely related identifier.

@dcz I would find it annoying if a git repository name or file name contained Czech characters. Fortunately no one does that. Containing Chinese, for example, would be... very annoying. Just don't. I still run a system limited to US-ASCII.

@dcz @trimethylpentan Hm, I thought source code repositories were for developers, not for users. But I could just be narrow-minded.

@ruff @trimethylpentan Developers (coders) are not the only people participating in creating software.

But have it your way. Make it 80% of all people having headaches due to code switching.

@dcz Just self-host your repositories using something simple like git-instaweb or cgit; those should be able to handle repos with such names just fine. All these extra fancy modern UIs are more limiting than necessary compared to “hey here’s a patch over email enjoy” and a simple web interface that simply does the job and gets out of the way.

@tyil I could do that, but it won't change the fact that all coders are not so subtly nudged to use names for their projects drawn from a set of cultures representing a minority of the population.

(Also I won't do that any time soon: https://dorotac.eu/posts/git-botch-email )

@pavel I am also immersed in the English-language culture, not expecting projects to stray far from it. But I can easily see a version of myself who grew up in X and stayed in X collaborating with Xians to prefer to name projects in X script.

That's a choice that is not available on platforms which are positioning themselves globally. (Free Software also belongs to this category if you see it as cross-culture.)

We can write software in Chinese but not name it in Chinese?

http://chinesepython.org/english/english.html

@pavel What you're really annoyed about is code switching. The US keyboard doesn't have an easy č.

As a kid I was fascinated by the keys Q, and V, and X. Those are not letters of the Polish alphabet. We don't learn them much at school.

Why we can type them easily is - again - a historical artifact of typing tech coming from the English zone (not unavoidable: https://en.wikipedia.org/wiki/%C4%84%C5%BDERTY?useskin=vector#%C4%84%C5%BDERTY_(Lithuanian) )

@pavel A culturally optimal keyboard would not have useless keys, so it would not be able to type in the name of the project called, say, "xorg".

Exactly the same way that en-us can't type in "Čapek".

The one-sided difficulty of code switching favors en-us names, which is the problem I'm seeing.

@dcz Would it help if repository names supported more characters, while profile and org names did not?

This might actually be worth considering.

In recent years, there have been a lot of attacks using lookalike Unicode characters, so although it is technically feasible, many are afraid to accept non-ASCII characters.

But since repositories of the same author already carry a certain trust, it might be worth taking that one step.

Happy to start the conversation about this in #Forgejo.

~f

@Codeberg I think it would be a very good first step. There are good reasons for not internationalizing domains. That's all because the namespace owner does not vet the entries.

In the case of projects belonging to a user, the user *does* control the entries, so I'm not seeing any downsides.

@Codeberg Unicode libraries exist to "normalise" strings in order to look for similarities in appearance.

@dcz
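
As a rough illustration of the lookalike problem, the sketch below flags names that mix scripts. It is not a real confusables check (the Python standard library does not ship one); it only guesses the script from the first word of each character's Unicode name, and `scripts_used` is a hypothetical helper.

```python
import unicodedata
from typing import Set


def scripts_used(name: str) -> Set[str]:
    """Rough script guess: the first word of each letter's Unicode character name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in name
        if ch.isalpha()
    }


genuine = "xorg"
spoof = "x\u043erg"  # second letter is CYRILLIC SMALL LETTER O, not Latin "o"

print(genuine == spoof)              # False, although both render alike
print(scripts_used(genuine))         # {'LATIN'}
print(len(scripts_used(spoof)) > 1)  # True: Latin and Cyrillic are mixed
```

A forge could treat a mixed-script name as suspicious while still accepting names written entirely in one non-Latin script.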

@nemobis @Codeberg I think it could be a starting point for detecting duplicate internationalized user names, which is already being done anyway.

@dcz Well, a culturally optimal keyboard would be useless, as you could not write programs on it. Many people use en-US here... "Capek" is understood even without special characters.

@pavel It's not true that every culturally optimal keyboard would be useless. The cause and effect is reversed here: programming languages are set in US-Latin *because* keyboards are US-Latin. And only typically, because you could program this one on a Chinese-optimal keyboard:

http://chinesepython.org/english/english.html

"Capek" may be understood, but "bąk" is very much not the same as "bak".

@dcz No. Programming languages are US-ASCII because they are in English, and that in turn means we can all collaborate on projects easily. We do actually have a programming language based on Czech, it is called "Karel"... but it would not really be suitable for global collaboration.

to_the_wall znamena
  dokud neni zed
    krok
  konec
konec

Also, iirc, it is based on the Czech language, but it still uses US-ASCII.

@pavel Right, but why are programs in English?

Some of that is because people want to collaborate on something.

The rest of that is the same historical accident that gave us Latin keyboards.

That accident feeds into programming languages, and programming languages favor Latin keyboards again.

That doesn't mean that Latin is good or bad, but it does mean it's overrepresented.

@dcz (translated from Czech) The computing world decided to use English as its common language. (Just like the aviation and maritime worlds.) And that is a good thing, because thanks to it we can talk to each other without trouble, without having to learn lots of foreign languages.

@pavel (translated from Polish) English is a foreign language for me too.
The computing world is not a single world. The open source world is one, the world of students in Warsaw is another, and the world of engineers in China is yet another.
The latter know only enough English to type "if" and "else" without understanding them.
Why make life harder for those who can write in their own language?

@dcz The amount of natural language in programming languages is really low. The ability to share code worldwide is far more important than having to learn the 10 or so English words you use in a programming language.

@pavel That's a value judgement :) As far as personal values go, there's no right or wrong, just preferences.

To put something else into consideration, does learning familiar words make English-language kids more likely to engage with programming as an activity?

From chinesepython.org:

"basic computer programming concepts are, not difficult at all, but turned out to be so difficult for some chinese people because they have to first break a language barrier"

@dcz I don't believe English helps you understand programming languages just because they usually share a few words.

But I do agree that Iran, Russia, China and North Korea should get their own localized versions of all the languages in use. They also should get culturally optimal charsets (UTF-8 uses too many bytes) and should get local versions of HTTP and HTML...

@pavel I'd look for some paper about the rate of language learning by youngsters, but I'm too lazy.

But languages are only a small part of the point. You're not typically meant to come up with your own language for a project.

Even then, naming variables in Spanish or Russian is normal and expected (Python lets me). Except you can't culturally adjust the most important thing, your project name, if you use mainstream forges.
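
A small sketch of what "Python lets me" looks like in practice. Python 3 accepts non-ASCII letters in identifiers; the variable and function names below are made up for illustration.

```python
# Python 3 accepts non-ASCII letters in identifiers.
años = 3           # Spanish: "years"
привет = "hello"   # Russian: "hi"


def приветствие(имя: str) -> str:
    """Build a greeting; both the function and parameter names are Cyrillic."""
    return f"{привет}, {имя}!"


print(приветствие("Дима"))         # hello, Дима!
print("años".isidentifier())       # True
print("привет".isidentifier())     # True
```

So the same word that works as a variable name inside the code can be rejected as the name of the repository that holds it.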

Is code meant for humans or computers? (*ahem* )
