|
That's what I said: no common base.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Gerry Schmitz wrote: Some languages do not have words for concepts that exist in other languages That is certainly not limited to programming languages! It is a general problem for any translator of natural languages.
From an end user point of view, you can expect the user to have the required terms in his own language. If English (i.e. the programming language) lacks the ability to express the problem concepts, then that is a problem belonging to the programming language, not to the problem solution as experienced by the customer / end user. You simply cannot go to a customer and say: Sorry, mate, your suggested solution is perfectly fine, but we cannot use it, because the English language doesn't have words for those concepts! We have to solve the problem in a different way!
When programming in a high level language, you will almost always make use of concepts that do not exist in machine code. Even a 'high level machine code', such as .net CIL, lacks a lot of higher concepts. Yet you can express a problem solution that can be handled in CIL. You can even express problem solutions in lots of different languages, and they can all be handled by CIL!
Or moving yet another level up: With the GNU Compiler Collection, each source language - which on the surface may be quite different from the other source languages - is parsed down to an intermediate representation common to all of them.
And then: I never was considering any "universal language interpreter". You need not lump everything from Lisp to Algol68 to APL to Erlang into one single structure to get away from programming being forced to be done in English. Different languages have different uses; that should be maintained. It is sufficient that the standard representation of a language such as C# uses abstract tokens: Rather than 'w', 'h', 'i', 'l', 'e' in a 7-bit-ASCII file (you still see that in a lot of places!), the representation is [while loop], which can be displayed in various languages. A variable reference is not coded as 't', 'o', 't', 'a', 'l', '_', 's', 'u', 'm', but as [variable 277], which may be assigned the external identifier 'total_sum' for English presentation, and 'totalbeløp' for Norwegian.
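A minimal sketch of this token idea, assuming a purely hypothetical storage format - the token kinds, the slot number 277 and the Norwegian keyword 'mens' are illustrative choices, not taken from any real compiler:

```python
# Hypothetical sketch: a program stored as abstract tokens, with per-locale
# display tables, so the same stored code renders in different languages.

# Keyword tokens map to localized spellings ('mens' is an illustrative choice).
KEYWORDS = {"while": {"en": "while", "no": "mens"}}

# Variable slots carry numbers; maintainers assign per-language identifiers.
IDENTIFIERS = {277: {"en": "total_sum", "no": "totalbeløp"}}

# One stored statement fragment: [while loop] [variable 277] < 100
program = [("kw", "while"), ("var", 277), ("text", "<"), ("text", "100")]

def render(tokens, locale):
    """Map the abstract token stream to concrete text for one locale."""
    words = []
    for kind, value in tokens:
        if kind == "kw":
            words.append(KEYWORDS[value][locale])
        elif kind == "var":
            words.append(IDENTIFIERS[value][locale])
        else:
            words.append(value)
    return " ".join(words)
```

With this shape, render(program, "en") gives 'while total_sum < 100' and render(program, "no") gives 'mens totalbeløp < 100' - the stored form never changes, only the presentation.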
I frequently get the feeling that we programmers actively want our code to be unintelligible to the customer (and, for that matter, to programmers of a different clan), so that we maintain full control over it. We do not ask the customer for his opinion about how the problem can be solved; at most we present some top-level box diagrams of how we will solve it. We most certainly don't want to discuss algorithms and code structures with the customer and future users!
I think we ought to. I think learning how the customer approaches the problem would improve our code significantly. It could improve the user interface tremendously! But then the customer and end user must understand the solution. It is far from enough that we, the programmers, understand it!
You can use e.g. ER for modelling the user's data (it is so well suited for communicating with end users that it is a pity it has essentially been abandoned today). You can describe your solution methods in pseudocode based on the customer / end user's native language. The problem is that most programmers either refuse to do so, calling it doing the work twice (i.e. pseudocoding and then programming), or treat it as a mere act of courtesy: when the customer meeting is over, all the pseudocode is thrown away and the programmers do it the way they see fit, not the way the customer and the pseudocode indicated.
To me, communicating with the customer / end user is equally or more important than communicating with workmates. And the fact is that even when I discuss program code with another Norwegian workmate, we speak Norwegian. Usually, we will even use Norwegian words for coding terms such as 'metode', 'variabel', 'løkke' (loop) and 'unntak' (exception). If we could write down what we say, it would even be possible to discuss the code with a customer who is not fluent in English!
|
|
|
|
|
trønderen wrote: I frequently get the feeling that we programmers actively want our code to be unintelligible for the customer...I think we ought to.
It has been tried. On large scale and small.
And it continues to be tried.
But it does not work.
I repeatedly run into feature discussions where the claim is made that a feature must support complex scenarios but must also be 'simple' enough for users to understand. That has resulted in the following real cases.
1. Customer is now responsible for learning a programming language and providing programmers for it. Different projects required C#, Java and even a variation (and not a good one) of C. (The solution requires them to write code, application compiles it and dynamically injects it into the application.)
2. Solution is provided that does not provide full functionality for the actual known more complex cases. The customer is told they cannot have that feature.
One always reaches a point where the complexity of the allowed solution requires specialized knowledge and training just to achieve the task. Thus, no matter how one wraps it up, there must always be a 'programmer'.
trønderen wrote: it could even be possible to discuss the code with a customer who is not fluent in English!
But only if they can program. And except for algorithm discussions they would also need to be a programmer in the language you are discussing.
|
|
|
|
|
jschell wrote: trønderen wrote:I frequently get the feeling that we programmers actively want our code to be unintelligible for the customer...I think we ought to. Rather creative quoting you are doing there! I do not think we ought to make "our code to be unintelligible for the customer"!
jschell wrote: It has been tried. On large scale and small. And it continues to be tried. But it does not work. I know of lots of end user 'macro' languages that exist in different language varieties; even system functions are localized. People with no programming background are capable of adapting applications to their own needs without having to learn English. It certainly works in the small.
I do not know of any compiler storing the code as a semi-parsed tree of abstract tokens, applying a concrete syntax only in the presentation for a human developer. I am simply unfamiliar with any large-scale failed attempt to localize any tool, whether a programming tool or a tool for other application areas, where a significant deployment of localized versions was pulled back and replaced with English-language versions.
If you can point to one example of failure: One failed project does not imply that the principle has no merit. If you are eager to 'prove' that English Is The Answer, you may of course justify your attitude by referring to the failure. Otherwise, you may study the failure to learn why it failed, and what could be done better.
An example: The first release of localized Excel formulas did localize function names. In multi-language corporations, you could not share a spreadsheet between those working in an English context with those in a Norwegian context - the function names from the 'other' language were not found. In your approach, it seems like the proper solution would be to force everyone back to English. Rather, a later Excel version replaced the internal representation of system functions (which was by the localized name) with an abstract reference, sort of like 'built-in 37', which was displayed as 'average' in English versions, 'gjennomsnitt' in Norwegian versions. (In a spreadsheet, 'variables' are referenced by row and column, so the problem of localized variable names does not occur.)
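The fix described here boils down to storing a stable internal function ID and localizing only parsing and display. A toy sketch - the ID 37 and the name tables are invented for illustration and are not Excel's actual encoding:

```python
# Spreadsheet functions stored as locale-independent IDs ('built-in 37'),
# with per-locale display names. Numbers and names are invented.
BUILTINS = {37: {"en": "AVERAGE", "no": "GJENNOMSNITT"}}

def parse_function(name, locale):
    """Map a localized function name to its stable internal ID."""
    for fid, names in BUILTINS.items():
        if names.get(locale) == name:
            return fid
    raise ValueError(f"unknown function {name!r} in locale {locale!r}")

def display_function(fid, locale):
    """Render the stored ID in the viewer's language."""
    return BUILTINS[fid][locale]
```

A sheet saved by a Norwegian user typing GJENNOMSNITT stores the ID 37, and an English colleague opening the same file sees AVERAGE - sharing across locales works because the stored form is language-neutral.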
jschell wrote: it could even be possible to discuss the code with a customer who is not fluent in English! My old mother could not distinguish between a PC and an electric heater, but she was fluent in English. When I started programming, in the old Pascal days, she was curious about it, and I spent some time taking her through a number of Pascal programs. She was really fascinated by the orderly, disciplined way of approaching a problem and building a solution! She never programmed a line herself, but she was fully capable of following my walkthrough of a moderately sized Pascal program. But then: Pascal was far more readable than today's C++!
I also had a strongly visually handicapped daughter and had to write her a few support programs. She was at the outset (age from 9-10 years) curious about daddy's work, and got really excited when I was making something for her. Discussing how to structure the solution with her was very simple, and she could see how I shaped that into program functions (even though she never coded a single line herself).
In my professional work, I have discussed topics like data flow in the city administration and library organization, all with non-computer people. I have taught macro programming in an office automation system to users who had never seen an electric typewriter (so my analogy from the on/off switch on the terminal to the on/off switch of a typewriter failed...). I have taught '101 Programming' to people who had never before sat down at a computer (this was around 1990). In other words: I have long experience in helping non-computer people understand a systematic way of organizing the user's data structures, breaking the total problem down into well-defined, orderly tasks, and discussing alternative solution methods.
I know that users with domain knowledge and experience are very good at understanding even tiny little details in a computer solution, if you are willing to listen to them when they tell you something and to talk in a similar language when you explain your proposals to them. And you must be prepared for your proposals being exactly that: proposals, which the domain expert may have objections to. You are no sort of god, even if you are the one mastering the compiler.
|
|
|
|
|
trønderen wrote: I do not think we ought to make "our code to be unintelligible for the customer"!
I realize what you are saying.
trønderen wrote: My old mother could not distinguish between a PC
I didn't claim people were stupid. I never do that. I disdain programmers that think users are stupid.
But as I pointed out, your idea is not new. COBOL was created with that in mind: someone besides a programmer should be able to read the code and more easily understand it.
The problem, however, is still that to actually create a complex application, the details and process require that someone somewhere must still be a 'programmer'. And all attempts to move that out of the developer space either result in something that only supports simplistic examples, or require that someone else (like a customer) must then become a 'developer'.
trønderen wrote: I have been teaching '101 Programming' to people who had never before sat down at a computer
So you were teaching them to be programmers. Not users.
trønderen wrote: I know that users with domain knowledge and experience are very good at understand even tiny little details in a computer solution, if you are willing to listen to them when they tell you something and try to talk in a similar language when you explain your proposals to them.
I have written requirements for entire systems based on user/customer requests, and designed architectures and designs to meet the needs as they describe them - while leading them through the process of not only describing what they want and need, but also picking through the parts that they understand but have not verbalized, such as the (very common) question of how to handle failure scenarios.
But I do that so they can focus on what they do best while others (developers) focus on what they do best.
|
|
|
|
|
I found that if I deliver software that "I would like to use", it never fails to please. I also write in such a way, that I don't need to create help files. If I needed something, a video screen capture would be all that was needed; perhaps 10 minutes.
As for "communicating", I learn the users job, and lingo, to the point I can do it. In English.
No new languages were created.
(And the end user doesn't care what "coding" language I use; as long as they're happy)
And I have no problem talking customers out of software and services, including my own, that they don't need.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Gerry Schmitz wrote: I have no problem talking customers out of software and services, including my own, that they don't need.
My understanding now however is that...
Customers need applications to do certain tasks. From the service provider point of view you must provide that so they continue to be a customer.
Customers want applications to do certain tasks. From the service provider point of view you must provide that so they become a customer.
The two might overlap but certainly sometimes they do not.
|
|
|
|
|
Quote: Customers need applications to do certain tasks. From the service provider point of view you must provide that so they continue to be a customer.
The customer who wants an application to do certain tasks is "wrong".
They tell me what they "need", and I tell them what "tasks" the application will perform, if any.
e.g.
(1.) "We send sports statistics for evaluation and reports. It is very slow and we need you to write a new app to speed up the process."
All they needed to do was zip the files. (True story).
(2.) Streamlining a law office.
I sent them happily off using SharePoint Online subscriptions.
§
You're suggesting you would write them a redundant app and take the money ... because "I must provide it, etc." Not for me you don't.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
I believe you are talking either about implementation rather than features, or just about doing a better analysis to find a solution to the problem the user stated.
Doesn't change what I said, however, in that the user is still stating needs and wants. They might want the application to be a hundred times faster, but they do not need that. They might need a way to enter an external invoice number into the tracking system, but they might want that to replace the internal invoice number.
Gerry Schmitz wrote: You're suggesting you would write them a redundant app and take the money
No that is not what I am suggesting.
I am talking primarily about SaaS and the difficulty in creating something that produces a profit (not just revenue) on a continuing basis.
|
|
|
|
|
You think you know a lot about how I operate but you don't.
I also "build" systems. People will tell me I need to "add this". I show them "if you do it this way, then that works too".
Which gets back to: I learn the customer's business ... so I don't build pointless functionality ... just to get paid.
And I only work as a "lead", so, I would have a lot to say about your M.O.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Gerry Schmitz wrote: And I only work as a "lead", so, I would have a lot to say about your M.O.
Sigh...no idea what that comment has to do with anything that I said.
But since it seems to be questioning my competence....
I have been a senior developer for more than 30 years. Including in supervisory roles.
I have written requirements, architectures and designs too many times to count.
I have analyzed customer requests, including official Statements of Work, to ensure that the needs of both a company (one I was working with/for at the time) and the customer were being met.
I have seen others fail to do that - such as the case where one third of the customer company refused to use the new system because of the failure to meet (and even discover) a single need (not a want.)
Besides working with customers, I have also worked with Sales - the people attempting to convince others to provide new business, not just existing business.
I will also note that I have worked with at least one consultant who took several hours to present a solution (a complex one) that was really 'cool', but for which the company had neither a need nor a want. And when I questioned the assumptions behind it, I was told I didn't know what I was talking about. At which point the CEO and founder also spoke up and noted that he had no idea what assumptions the consultant was using either, because they did not fit the actual business model.
So I will stick with what I was saying. There are 'needs' and 'wants'. They can and do overlap. They can and do often serve different purposes.
|
|
|
|
|
You post something that should be an article, not as a mere comment.
trønderen wrote: From a factual point of view, you are perfectly right. Well, I'm never wrong..
trønderen wrote: In application development, the great majority of your communication is done in the local language I only work in English, all comments in code are in Engrish, and so is all documentation. Never, ever, do I code in the local language, as no dog is ever gonna learn Dutch just for maintaining a code base. Ever.
trønderen wrote: rather than assuming that the U.S. interpretation is globally valid. They stuck with Fahrenheit and inches.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Eddy Vluggen wrote: Never, ever, do I code in the local language, as no dog is ever gonna learn Dutch just for maintaining a code base. Ever. My idea of a token-style representation of a program is that no one should have to learn Dutch to maintain your code base. If all they master is English, then the tokens are mapped to their English representation for that programmer. You map the tokens to Dutch when discussing the solution with your Dutch customer. Or to German if the customer (or end user) is German.
The idea is not having to learn a different language. Not even English.
|
|
|
|
|
Adding an interpreter adds another burden and potential failure point to the tool chain. Plus the requirement for an in-house translator between the local language and the interpreter's.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Aight, my turn, and I'll provide justification.
I'm a non English native, but all documentation is in English. It has nothing to do with imperialism, it's just the ring that binds us all. Engrish is simple to learn, and as such it became the language of documentation.
So. If you want to code, you better learn English and not French.
As for you Frenchies: I do not take your questions into consideration if you cannot speak English. Any code written containing French isn't worth the time. Or simpler, I'd delete it:
Ctrl+A, Delete.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Eddy Vluggen wrote: So. If you want to code, you better learn English and not French. My concern is not that I have to master English (I guess I do master it far above the level required for programming), but the customer / end user.
We programmers have a tendency to lock ourselves up in an ivory tower, working in total isolation (although maybe as a programming team, not as individuals) - most certainly isolated from the customer and the users. We refuse to communicate with anyone who doesn't master our tribal language to perfection.
I think this is very bad for our profession. We have a lot to learn from those who have the problems/tasks that we are trying to solve. They do not speak our tribal language. We have to speak their language.
|
|
|
|
|
You're talking about an IT "shop"; individuals and smaller outfits wouldn't be able to function in the outside world with that attitude.
The one thing that progress did, was create middle men (i.e. Business analysts) that separated user and "creator".
A Technical Lead cannot be a lead without having interacted with the eventual users, IMO.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Eddy Vluggen wrote: Engrish is simple to learn
I wish you'd tell some Brits that!
(Many "native" English speakers ... don't. At least not half as well as most non-native speakers.)
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
trønderen wrote: For keywords, this is simple, but it would require a mechanism where code maintainers could assign alternate, language dependent tokens for programmer assigned names of variables, methods, constants, comments,
I would say...no.
First, at least as of the last time I created a compiler (much less studied them), it is not "simple" to replace keywords.
Second, the latter part of that statement seems to suggest exactly the problem that compilers need to solve, per the first part of what I said. Compilers (and interpreters) already convert keywords into tokens. Those that do not are very inefficient (as I know, since I had to work with one long ago).
trønderen wrote: I think the global software culture would be enrichened if we could disengage from the absolute binding to the English-speaking culture.
No - as you already pointed out prior to that, most discussion about what the software does happens in a natural language. Very likely, in the vast majority of cases, a single natural language.
Requirements, Architecture, Design are all in that natural language. All of those have much more impact on the solution than the actual code itself.
Significant failures do not happen at the code level. They happen due to a failure in the above processes - such as the failure of the Mars Climate Orbiter. The specifications were exact and numerical (which is universal). The communication was not.
|
|
|
|
|
jschell wrote: trønderen wrote:For keywords, this is simple, but it would require a mechanism where code maintainers could assign alternate, language dependent tokens for programmer assigned names of variables, methods, constants, comments,
I would say...no.
Or perhaps yes. Many years ago, when I was just starting university, the community college had a very old (even then) IBM mini. I don't recall the model number, but it was somewhat larger than an "executive" office desk, with a disk pack to one side, and one of those 7-foot-tall chain-driven line printers on the other. It might have been something from the 1400 series: IBM 1400 series - Wikipedia. I think the instructor said that he knew of one other example of that model of computer, but it was in a museum!
Anyway, we used that computer to learn Algol and Fortran. The Algol compiler was written somewhere like McGill University, in Quebec. As such, I seem to recall that the keywords were bilingual: you could use either English or French, so either if or si. But the error messages were all in French. So maybe at the time the requirement that we take first-year French wasn't so pointless after all. Maybe it was written in Paris: https://dl.acm.org/doi/pdf/10.1145/872738.807150 See the note about bilingual details on p. 113.
Keep Calm and Carry On
|
|
|
|
|
Algol68 was explicitly defined for adaptation to different languages: The syntax was defined using abstract tokens that could be mapped to various sets of concrete tokens.
This is no more difficult than having a functional API definition with mappings to C++, PHP, Fortran, Java, ... Obviously, to define these mappings, you should both thoroughly understand the API, and of course the language you are mapping to. It is not always a trivial thing to do.
When you choose concrete tokens for a programming language, it is not something that you do on a Friday night over a few beers. It is professional work, where you must know the semantics of those abstract tokens, and you must know the natural language from which you select your keywords. You must be just as careful when selecting a term as the English-speaking language designers are when they select their English terms. If the language defines some tokens as reserved, you must honor that even for your alternate concrete mapping.
In your French Algol version, I assume that the source code was maintained in a plain text file (probably in EBCDIC, for IBM in those days), handled by the editor of your choice. Switching between English and French would require a textual replacement. If the source code was instead stored as abstract tokens, maybe even as a syntax tree, it would require an editor specifically made for this format. (Note that you could still have a selection of editors for the same format!) The editor might choose to look up the concrete syntax only for the part of the tree that is at that moment displayed on screen. 'Translation' is done by redrawing the screen, using another table of concrete symbols.
This is certainly extremely difficult, probably across the borderline to the impossible, if we insist on thinking along exactly the same tracks as we always have, refusing to change our ways even a tiny little bit. I can certainly agree that it is fully possible to construct obstacles to prevent any sort of change in our ways of thinking. I am not hunting for that kind. Like you, k5054, I observe that 'It happens, so it must be possible'.
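As a sketch of how such re-rendering could work, assume a toy parse-tree node shape and two invented keyword tables; 'translating' the program is then just redrawing the same tree with a different table, with no textual search-and-replace involved:

```python
# Toy parse tree rendered with interchangeable concrete keyword tables.
# Node shapes and keyword tables are invented for illustration.
KEYWORDS = {
    "en": {"if": "if", "then": "then"},
    "fr": {"if": "si", "then": "alors"},
}

def render(node, locale):
    """Produce concrete syntax for one locale from the abstract tree."""
    kw = KEYWORDS[locale]
    if node["kind"] == "if":
        return (f"{kw['if']} {render(node['cond'], locale)} "
                f"{kw['then']} {render(node['body'], locale)}")
    if node["kind"] == "ident":   # user identifiers are not translated
        return node["name"]
    raise ValueError(node["kind"])

# The stored form: one [if] node over two identifier nodes.
tree = {"kind": "if",
        "cond": {"kind": "ident", "name": "x"},
        "body": {"kind": "ident", "name": "y"}}
```

Here render(tree, "en") yields 'if x then y' while render(tree, "fr") yields 'si x alors y' - the same stored tree, two presentations.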
|
|
|
|
|
trønderen wrote: The syntax was defined using abstract tokens that could be mapped to various sets of concrete tokens.
In Computer Science the area of Compiler Theory is very old and very well studied.
Your statement is describing something that well-designed compilers (and interpreters) already do. The only time I have ever seen a 'compiler' not do that, it was coded by someone who had zero training in the science of compilers.
As I suggested before, the problem is not in creating tokens. The problem is, first, in creating the language in the first place such that it is deterministic, and second, in creating a compiler that can report errors. That last part is the most substantial part of every modern compiler (even toy ones).
trønderen wrote: If the source code was rather stored as abstract tokens,
Parsing text into tokens is the first part of what all compilers/interpreters do.
Following is one source of the very well known process Compilers already do.
Compiler Design - Phases of Compiler[^]
What you are describing does not have anything to do with the actual problem.
The English version of a standard (very standard) construct in programming languages:
if x then y
Now the French version
si x alors y
So in the above, for just two natural languages, you already have 4 keywords in the language.
Let's add Swedish:
om x så y
So for every language added, it is reasonable to expect the number of keywords to multiply. Keywords often cannot be used as identifiers in code, both because that makes it much harder for the compiler to figure the code out and because it hampers correct error reporting. Additionally, even when the context allows the compiler to figure it out, it is not ideal for human maintenance.
Consider the following statement. If one was using a different native language to drive the compiler, then the following should be legal. But in the English version, do you really want to see this code?
int if = 0;
So not only would the number of keywords increase but the programmer would still need to be aware of all of those keywords while coding.
Now, besides the increasing number of keywords, the following are some of the problems that I see.
1. Two programmers are working on the same file. The file MUST be syntactically correct before developer A (English) goes on vacation, because otherwise the mechanism (code) that must translate it back from the English form will not work when developer B is French.
2. Comments cannot be supported.
3. Third-party APIs would still require whatever is supported by the 3rd-party service (library, REST, TCP, whatever).
4. Adding new natural languages to the compiler after the first release would mean that existing applications could break, because existing code might already use the new keywords. This is a well-known problem that exists right now when new functionality is added to an existing compiler. So all known languages would need to be supported in the first release.
|
|
|
|
|
jschell wrote: In Computer Science the area of Compiler Theory is very old and very well studied. I sure wish that was true for everybody creating new languages! (Note that I did not refer explicitly to C and all the languages derived from it.)
The problem is in creating the language in the first place such that it is deterministic, and second creating a compiler that can report errors. That last part is the most substantial part of every modern compiler (even toy ones.) Reminds me of VAX/VMS: Every message delivered by system software (including compilers) was headed by a unique but language-independent numeric code. Support people always asked you to supply the code; the message text could be in any language - they never read that anyway.
So for every language added it is reasonable to expect that the number of keywords would be duplicated. You are missing my point completely. Neither if, then, si, alors, om nor så would be reserved words in the language. The language would define non-text tokens - call them [if] and [then] if you like - but the representation is binary, independent of any text.
Keywords often cannot be used in code both because it makes it much harder for the compiler to figure it out and for it to correctly report on errors. No one is suggesting that you be allowed to use the binary [if] token as a user-defined symbol.
The display representation of the binary [if] token could be e.g. as (boldface) if, or as [if], si, [si], om, [o] or some other way to visually highlight that this is not a user identifier but a control statement token. For the creation of new control structures, an IDE working directly on a parse tree representation could provide function keys for inserting complete control skeletons. I have worked with several systems working that way, for data structures, graphic structures - and for program code, although the latter inserted textual keywords, not binary tokens the way I wish it would. Once you get out of the habit of thinking of your program as a flat string of 7-bit-ASCII characters, it is actually quite convenient! (You can assign the common structures, like if/else, loops, methods etc. to F1-F13 keys so that you don't have to move your hand over to the mouse to select from a menu.)
So not only would the number of keywords increase but the programmer would still need to be aware of all of those keywords while coding. Quite to the contrary! The programmer might very well define a variable named if, which is distinct from the binary token [if]. There would be no reserved words at the textual level.
A not very well known fact: Classic FORTRAN actually managed without reserved words. I just posted an entry in 'The Weird and the Wonderful' - something from my student days that I found in a box in the basement - to illustrate the point. Note, however, that F77 philosophy is not what I am asking for: It did not represent control (and other) structures by binary tokens, but relied on semantic analysis of plain text source code.
1. Two programmers are working on the same file. The file MUST be syntactically correct before developer A (English) goes on vacation. Because otherwise the mechanism (code) that must translate it back from the English form will not work when developer B is French. I say again: You missed my point completely. If the IDE stores the code as a parse tree, it is syntactically correct, otherwise the IDE would not have accepted it. Of course developer B may define user variables and methods with French names, but so he can in any IDE environment.
2. Comments cannot be supported. Why can't the parser define a binary 'comment' token, and store that in the parse tree? In one project I am currently working on (which is not a general programming language, but an application specific control language), we are doing exactly that. The comment token may have a value field with several alternate texts, each identified by a language code, so that if you select, say, French as your UI language and there is a French version of the comment, that is the one to be displayed. (Otherwise, when the English, say, comment is displayed, you can add a French translation of it.)
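A minimal sketch of such a comment token, with hypothetical class and field names, might look like this in Python:

```python
# Hypothetical sketch: a comment node carrying alternate texts keyed by
# language code; display picks the UI language, falling back to any
# available translation.
class CommentNode:
    def __init__(self, texts):
        self.texts = texts  # e.g. {"en": "...", "fr": "..."}

    def display(self, ui_lang, fallback="en"):
        if ui_lang in self.texts:
            return self.texts[ui_lang]
        return self.texts.get(fallback, next(iter(self.texts.values())))

c = CommentNode({"en": "adjust for leap years",
                 "fr": "ajuster pour les années bissextiles"})
print(c.display("fr"))  # ajuster pour les années bissextiles
print(c.display("de"))  # adjust for leap years (English fallback)
```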
3. Third party APIs would still require whatever is supported by the 3rd party service (library, Rest, TCP, whatever.) I have been working with third party APIs with French method and parameter names, in an otherwise English language environment; it was a nightmare ... If you define a language along the lines I am suggesting, a library would be delivered as a parse tree as well, along with one or more symbol tables (i.e. for different languages) for use in the API. (This is how we do it in that application control language mentioned above.) Otherwise, if the binary interface is given and the library comes in a compiled, linkable format with given entry point symbols, your parse tree interface to that library should include a mapping from a call token to the entry point symbol, decoupling that symbol from the external display. Establishing this mapping is a one-time operation; the mapping could follow the library file, similar to how a '.h' file follows a C library.
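The call-token-to-entry-point mapping could be sketched as below. The entry point symbols, token IDs and display names are all invented for illustration; they stand in for whatever a real linker interface would use:

```python
# Hypothetical sketch: a library interface delivered as a mapping from
# abstract call tokens to linker entry-point symbols, plus per-language
# display names - the '.h file' analogue described in the text.
ENTRY_POINTS = {1001: "_lib_sort", 1002: "_lib_find"}  # call token -> symbol
API_NAMES = {
    "en": {1001: "sort_list", 1002: "find_item"},
    "fr": {1001: "trier_liste", 1002: "chercher_element"},
}

def display_name(call_token, lang):
    """What the programmer sees in the chosen UI language."""
    return API_NAMES[lang][call_token]

def link_symbol(call_token):
    """What the linker sees - independent of any UI language."""
    return ENTRY_POINTS[call_token]

print(display_name(1001, "fr"))  # trier_liste
print(link_symbol(1001))         # _lib_sort
```

The French and English programmer call the same entry point; only the displayed name differs.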
4. Adding new languages to the compiler after first release would mean that existing applications could break because existing code might use them. Assuming that you refer to language features, introducing new keywords. If there are no keywords, the problem you are pointing to vanishes. Adding a new binary token, with its unique token ID, would not invalidate any program whatsoever. Of course there is the question of where the display mapping is done: If the IDE does it, and imports a new compiler with new binary tokens, it might not have a proper French or Swedish word to represent them. If the new and extended compiler is delivered with a token display mapping table for a number of languages, the problem is significantly reduced. (The user may have a language fallback list, both for comments and other binary tokens, so that something meaningful is displayed, although not in the primary language.)
As I wrote in my post,
trønderen wrote: This is certainly extremely difficult, probably across the borderline to the impossible, if we insist on thinking along exactly the same tracks as we have always done before, refusing to change our ways even a tiny little bit. Almost all of your comments are fundamentally based on the idea that a source program really, as a matter of fact, is a string of 7-bit-ASCII characters, and this will always remain true. I am suggesting that it is not.
Compare an old style text formatter such as troff with, say, MS Word: You may argue that '\fI' is like a reserved word for italicizing text; you cannot use it as plain text (without quoting). Troff stores everything as plain text. MS Word does not - prior to .docx, the storage format was a true binary format, and even XML is just a storage encoding - internally, the working format is binary, just like before. In MS Word, '\fI' is freely available as document text without quoting. Furthermore, you can move a document from an English MS Word to a French one and then to a Swedish one: The menu texts, help texts etc. change language, yet an edit made in one language version is equally valid in other language versions of MS Word.
I certainly can imagine a programming language, and its parse tree storage format, being designed along the same principles.
trønderen wrote: I sure wish that was true for everybody creating new languages! (Note that I did not refer explicitly to C and all the languages derived from it.)
C? Compiler theory applies to any language (including interpreters.)
trønderen wrote: Neither if, then, si, alors, om or så, are reserved words in the language. The language would define non-text tokens, call them [if] and [then] if you like, but the representation is binary, independent of any text.
That is a non-starter.
The human needs to write the code. Using token representations that the user is responsible for memorizing would not work. If the user at any time uses something like 'if' and 'then' then those are keywords for the language. That is how it works, just as it works that way in natural languages. Changing semantics (English) does not alter the role of what a system that eventually must run code must still do, in that it still must convert the keywords into something else.
And defining keywords is necessary for any computer language because it is not deterministic otherwise.
trønderen wrote: The display representation of the binary [if] token could be e.g. as (boldface) if, or as [if], si, [si], om, [o] or some other way to visually highlight that this is not a user identifier but a control statement token.
Errr...no idea what you are talking about.
The 'bold' just becomes part of the textual representation of the keyword. No different than requiring that the keyword is in lower case.
You seem to think that because you use bold on a keyword that it is no longer a keyword. It doesn't matter how you differentiate it in the language specification; it is still a keyword.
And no developer is going to work in a language where they need to make keywords by switching from bold and back.
trønderen wrote: Why can't the parser define a binary 'comment' token,
Because the content of the comment is NOT the token that tells the compiler that it is a comment. The content is what is contained by the comment. So in the following, the value of the comment is the text, not the '//'
trønderen wrote: I have been working with third party APIs with French method and parameter names
Only when named parameters are supported and used can the parameter names matter.
And when you use it in English exactly how are you going to use that method unless you have English that tells you how to use it?
trønderen wrote: Assuming that you refer to language features, introducing new keywords. If there are no keywords, the problem you are pointing to, vanishes
I have studied Compiler Theory formally and informally for a long time. That statement, by itself, is not possible.
As I said, tokenization itself is not something that is new in Compiler Theory. It has been there for a very long time; that very word names the process of converting keywords to tokens. You seem to think you are going to be able to remove keywords from the definition of the language, but you fail to describe, in detail, how a user is then going to be able to do something without using keywords.
trønderen wrote: Furthermore, you can move a document from an English MS Word to a French one and then to a Swedish one: The menu texts, help texts etc. change language, yet an edit made in one language version is equally valid in other language versions of MS Word.
You do understand that an MS Word doc is a binary file which has embedded symbols in it which define the format?
The text of the document is NOT the relevant part. The analogy to code for a Word doc is that all of the text that you see in MS Word is a 'comment'.
However when you write code most of what you write and what you debug is not comments. So you are proposing that the keywords of the language would be written using combination key presses. For every single thing that one wrote.
trønderen wrote: I certainly can imagine a programming language, and its parse tree storage format, being designed along the same principles.
Knock yourself out. It is called BNF - Backus-Naur Form notation.
The Java example (however it has bugs in it.)
Chapter 18. Syntax[^]
jschell wrote: C? Compiler theory applies to any language (including interpreters.) Well, of course. And it sure is a good idea to know at least fundamental compiler theory before you sit down to create a new language, if you want to make a good one. History has shown that not all language makers have had extensive compiler theory background. Hence my comment.
The human needs to write the code. Using token representations that the user is responsible for memorizing would not work. If the user at any time uses something like 'if' and 'then' then those are keywords for the language. That is how it works.
Once again: Try to liberate yourself from this fixation on a code file always and invariably maintained and stored as a flat string of ASCII characters.
Hopefully, you are able to do that in document processing systems: You create a new chapter level two by hitting a function key or making a menu selection, not by inserting e.g. the strings '< h2>' and '< /h2>' in the body text. Sorry about the extra space after the '< 's - it is required here, because this is not a proper document editor. In, say, MS Word, I could have written the markup without any such considerations. In a document processor, there are no reserved text body words, character sequences or characters.
There is no law of nature that says there must be keywords / reserved words just because that document is source code for a compiler / interpreter, or that structure must be represented textually - that is not 'how it works'. Any WYSIWYG document processor will prove you wrong.
And defining keywords is necessary for any computer language because it is not deterministic otherwise. You certainly need to define a representation for structural elements, but try to understand that once you liberate yourself from the flat-sequence-of-characters mindset, those structure elements need not be alphabetic. In a document processor file, there are no 'keywords' to represent a hierarchical chapter / section structure; the structure is maintained in binary, non-textual format. You could do the same for a program code file. (I said this earlier; it appears necessary to repeat it.)
Errr...no idea what you are talking about.
The 'bold' just becomes part of the textual representation of the keyword. No different than requiring that the keyword is in lower case. That is because you seem to be completely stuck in the mindset of a code file by definition being a flat sequence of printable characters, maintained using 'vi', or 'TECO' if you are old style ('emacs' if you are more up to date on tools).
And no developer is going to work in a language where they need to make keywords by switching from bold and back. So you have not understood a word of what I am talking about. How is that in a document processor? You do not create a new chapter by inserting some extra space, then switching to a larger, possibly bolder and different typeface, possibly enter the next higher chapter number before typing the chapter heading, add some extra space to the first paragraph, and then resetting to the standard body text format.
No, that is not the way you work: You press e.g. Alt-1 (that is how my MSWord is set up), and the editor takes care of inserting a binary structure representing a level-1 chapter heading. It is displayed with extra space before and after, in a larger, bold typeface etc., not because I inserted space or changed the typography. I inserted a structure element that was displayed that way.
If my code editor lets me insert a structure element, say a conditional if, or a loop, or a method definition, in a similar way, the code editor may display those structure markers in one of several possible ways. I suggested that the display could be 'keyword-like', but typographically marked e.g. by being enclosed in brackets or boldfaced so that the programmer would not mistake them for being plain ASCII strings (such as user specified variable / method names). Displaying space before and after method headings (similar to how document chapters are highlighted by a document editor) would be another display indicator of structure. One obvious way of displaying code structure is to indent a loop body, a 'then clause' or 'else clause'.
The programmer will not switch to boldface, add brackets or blank lines to create a structure element, not even hit the tab key or space bar to indent a loop body, but e.g. press Alt-1 to create a namespace, Alt-2 to create a method, Alt-3 to create a conditional, Alt-4 to create a loop, and so on. The editor would display something to show the structure, but whatever it displays, it is not 'keywords' in the textual sense. And it is not editable in TECO or vi.
In a document editor, you may insert blank lines, select a larger and bolder typeface, type a number and a line of text, add a new blank line and after that revert to the typography of body text. That might look like whatever a chapter heading is displayed as, but it won't make it a chapter object, in the sense of the document editor data structure. Similarly, if your IDE represents your program as a parse tree, writing 'if' (rather than hitting the function key to create a conditional statement) will not create a conditional statement. If 'if' is not a known symbol, the IDE might ask you, as soon as you complete that token, "'if' is not a known symbol - do you want to (1) create a local variable named 'if', (2) create a static variable in this module, called 'if', (3) ...". If the IDE has an English language UI, it might even suggest "(4) did you intend to insert a conditional statement in your code?", but in a Norwegian language UI, this would not be triggered for 'if', but maybe for 'hvis'. If the programmer selects this alternative, the 'keyword' is not inserted into the program; the binary conditional statement object is.
Because the content of the comment is NOT the token that tells the compiler that it is comment. The content of the content is what is contained by the comment. So in the following the value of the comment in text not the '//' Are you completely unable to imagine a binary element that is displayed as, say, '// comment text'? You might want to display it in bold, or maybe italics, to show that this is a comment, and not user code inserted by two presses of the '/' key. If you do that, the two slashes will not be displayed in bold / italics, and will not have comment semantics - similar to how writing '< h1>' in a Word document does not create a new top level chapter.
// A comment in english is useless in french. Did you notice the sentence in my previous post,
trønderen wrote: The comment token may have a value field with several alternate texts, each identified by a language code, so that if you select, say, French as your UI language and there is a French version of the comment, that is the one to be displayed. That is exactly what I am doing in my current project (which, as I mentioned earlier, is more like a scripting language than a programming language).
jschell wrote: trønderen wrote:I have been working with third party APIs with French method and parameter names
Only when named parameters are supported and used can the parameter names matter. Named parameters are quite standard in modern programming languages. But even in K&R C, you will see the parameter names in .h files, and often you have to deduce the semantics from the variable name.
If your program representation was a parse tree, as I suggest, even variables, types and methods would have internal text-independent representations; the display of them could be based on looking up that internal ID in a symbol table. This symbol table could exist in several language variants. (I said this before; you obviously overlooked it.)
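The same mechanism covers identifiers: the [variable 277] idea from earlier in this thread can be sketched as below, reusing the 'total_sum' / 'totalbeløp' example already given in the discussion (the table layout itself is invented):

```python
# Hypothetical sketch: variables stored as numeric IDs; each language's
# symbol table maps the ID to a display name. Renaming in one language
# does not touch the stored code or the other languages' tables.
SYMBOLS = {
    "en": {277: "total_sum"},
    "no": {277: "totalbeløp"},
}

def show_assignment(var_id, expr, lang):
    """Display an assignment to [variable var_id] in the chosen language."""
    return f"{SYMBOLS[lang][var_id]} = {expr}"

print(show_assignment(277, "a + b", "en"))  # total_sum = a + b
print(show_assignment(277, "a + b", "no"))  # totalbeløp = a + b
```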
And when you use it in English exactly how are you going to use that method unless you have English that tells you how to use it? If you select English as your UI language, then you see the English identifiers and English language comments. As I wrote in my previous post:
trønderen wrote: The comment token may have a value field with several alternate texts, each identified by a language code, so that if you select, say, French as your UI language and there is a French version of the comment, that is the one to be displayed. As I have stated in other posts, this goes for all program elements as well, including user defined symbols and how binary structural elements are displayed.
In my current project, the user language preference is actually a preference list: If no name / comment is available in your preferred language, your second, third, ... choice is taken. It may of course happen that no one has translated the symbol table to any language that makes sense to you. That can be done at any later time, and the problem is significantly reduced from forcing every non-native-English-speaker to work in a foreign language. My project includes a search function for symbol table entries and comment texts that do not yet have a translation to the current UI language, so that you can easily find those terms that you have forgotten to translate to, say, Norwegian before presenting the code to a Norwegian speaker.
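The preference-list lookup and the missing-translation search might, as a sketch with invented data and function names, look like:

```python
# Hypothetical sketch of a language preference list: take the first
# available text in the user's ranked language order, and find entries
# that still lack a translation in a given language.
def resolve(texts, preferences):
    """Return the first available text following the user's language list."""
    for lang in preferences:
        if lang in texts:
            return texts[lang]
    return next(iter(texts.values()))  # last resort: anything available

def missing(entries, lang):
    """IDs of entries (id -> texts) with no translation in 'lang' yet."""
    return [eid for eid, texts in entries.items() if lang not in texts]

entries = {1: {"en": "sum"}, 2: {"en": "count", "no": "antall"}}
print(resolve(entries[1], ["no", "en"]))  # sum (no Norwegian text yet)
print(missing(entries, "no"))             # [1]
```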
You may very well choose to insist on always including an English symbol table, as a fallback when a translation is missing, but you should be prepared to accept that not all foreigners will agree with you that English is a better fallback in their native environments.
trønderen wrote:Assuming that you refer to language features, introducing new keywords. If there are no keywords, the problem you are pointing to, vanishes
I have studied Compiler Theory formally and informally for a long time. That statement, by itself, is not possible. Obviously, you have limited your study of Compiler Theory to purely textual input. You are clearly incapable of comprehending how an English MSWord user can select 'Heading 1' and a Norwegian MSWord user select 'Overskrift 1', and both actions lead to the same result. If the Norwegian document is moved to an English MSWord, the 'Overskrift 1' style magically is identified as 'Heading 1' - believe it or not. (I understand that this is completely incompatible with your Compiler Theory knowledge, but it is a fact.)
Again, comparing to document processors: I don't know when 'hidden text' was introduced to MSWord, or whether it was a new binary object, a new parameter or a new parameter value for an existing binary object definition - that makes no difference. No matter what any of my documents looked like at the time, they couldn't possibly be invalidated by the new possibility of hiding text.
Let me exemplify the same in a programming language context:
I was programming in a language where 'for' loops could be conditionally terminated prematurely by a 'while <condition>', comparable to C 'if !<condition> break': Sometimes, you want different treatment if the loop iterates to its end or if it is terminated prematurely. E.g. when you search a list or array, and find what you are looking for (exiting prematurely), or you reach the end without finding it, requires different handling. In this language, you could specify the two alternatives by adding to the loop an 'exitwhile' clause for the premature termination, and/or an 'exitfor' clause for loop completed termination. Both clauses were executed in the context of the loop body with access to e.g. loop local variables.
If 'exitwhile' and 'exitfor' clauses were added to a textual programming language, then programs using variable names 'exitwhile' and 'exitfor' would be invalidated. If a loop is rather represented by a binary object, and this object is augmented with two new fields: One pointer to an 'exitwhile' code block, another to an 'exitfor' code block, both initially null / nil / void, then no old program would be invalidated. The updated IDE would need to provide a way for the programmer to insert exitwhile/exitfor clauses, but not through any such keyword. They might be displayed in a similar way to the 'for' and 'endfor' markers (note: not as editable text, but typographically highlighted so that you would recognize it as structure indicators) with initially empty clauses. Until you start using this facility, you and your code are completely unaffected by the new fields in the binary loop object.
I am assuming that your old loop object missing these fields would still be valid: E.g. all objects should contain a size value, and any software handling the program file would know that a shorter loop object is a loop without the new clauses, not making any fuss about it. (The IDE could even store any loop object not making use of the extra fields in the short form.)
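A size-prefixed encoding along these lines can be sketched in Python with the struct module. The field layout (a body reference plus two optional clause references) is invented purely to illustrate the backward-compatibility argument:

```python
# Hypothetical sketch: a size-prefixed loop record where the new
# exitwhile/exitfor fields are optional. A short (old-format) record is
# read as a loop without the new clauses, so old files stay valid.
import struct

def pack_loop(body_ref, exitwhile_ref=None, exitfor_ref=None):
    fields = [body_ref]
    if exitwhile_ref is not None or exitfor_ref is not None:
        fields += [exitwhile_ref or 0, exitfor_ref or 0]
    payload = struct.pack(f"<{len(fields)}I", *fields)
    return struct.pack("<I", len(payload)) + payload  # size prefix first

def unpack_loop(buf):
    (size,) = struct.unpack_from("<I", buf, 0)
    n = size // 4                       # number of 32-bit fields present
    fields = struct.unpack_from(f"<{n}I", buf, 4)
    body = fields[0]
    exitwhile = fields[1] if n > 1 else None
    exitfor = fields[2] if n > 2 else None
    return body, exitwhile, exitfor

old = pack_loop(42)                      # short form, no new clauses
new = pack_loop(42, exitwhile_ref=7, exitfor_ref=9)
print(unpack_loop(old))  # (42, None, None)
print(unpack_loop(new))  # (42, 7, 9)
```

The reader never rejects the short record; it simply reports the absent clauses as empty, which is exactly the "no fuss" behaviour argued for above.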
You seem to think you are going to be able to remove keywords from the definition of the language but failing to describe, in detail, how a user is then going to be able to do something without using keywords. Your mind seems to be completely fixed on 'keywords' being textual. When I press Alt-1 in MSWord, you may consider that a 'binary keyword' resembling the textual '< h1>'. In a programming language, Alt-1 might resemble the textual 'namespace', Alt-3 might resemble 'if ... then ... else ... endif'. One essential reason for saying that they 'resemble' textual keywords: The IDE's input processor will immediately process them, just like MSWord processes Alt-1 immediately to insert a top level chapter object, to create the binary structure objects. The file will never store the Alt-1 or Alt-3. You might assign 'Create new namespace' or 'Create conditional statement' to any function key, menu selection etc., and different users may make different assignments - the binary objects are the same. So the assignments made by any one user are not any sort of 'reserved word'.
You do understand that an MS Word doc is a binary file which has embedded symbols in it which define the format? Most certainly - but you obviously fail to understand that I am suggesting exactly the same for a programming language code file. The specification of the binary format might be treated as the formal language definition, just like ISO/IEC is the formal definition of OOXML objects. (This is what I have been talking about all the time!)
You may consider OOXML to be a 'document programming language' - it is not defined in BNF, but as an XML schema. Functionally, those are roughly equivalent.
Knock yourself out. It is call BNF - Backus Naur Form notation. BNF is certainly not limited to specification of the syntactical interpretation of flat sequences of printable characters.
One of my fellow students, in his first job, was set to identify various kinds of bacteria in microscopy photos. The various kinds of bacteria, i.e. the shapes of them in the images, were described in BNF format. The images were scanned and the scan lines 'compiled' according to the BNF defined syntax. The same image was 'compiled' according to different BNFs, each for a different bacterium, and the one(s) giving the fewest 'syntax errors' were considered primary candidates for the identification. (This was in the early 80s, when technology was less sophisticated than today, and they did not rely completely on automatic identification; they used the BNF analysis to rule out those hundreds of alternatives that most certainly did not match. A medic had to confirm the identification. Yet, this was a real work saver.)
If you create a new language to be stored as a parse tree rather than as a linear character sequence, you would most likely create even that definition in some BNF variant, or in a similar definition language. Plain BNF is semi-abstract; it uses character strings in the definition, but the only structure representation is the BNF itself. For a non-textual structure representation, it does not define a unique storage format.
So if you were to create a binary language representation, you should rather use something like ASN.1, which resembles BNF in that it defines abstract objects. Then you can select one of the defined 'encoding rules' for generating concrete object representations that can be stored or transmitted. If you go for ASN.1 for the abstract specification, but dislike all the existing encoding rules, you can even make up your own new encoding rules - though that choice is usually driven by 'Not Invented Here' rather than by a qualified professional evaluation of existing alternatives.
It is interesting to note that BNF was initially developed to describe natural languages, based on Chomsky's production rules and transformations (and even earlier linguistic studies). The metasymbols used by Backus and Naur were adapted to standard keyboard characters, but the principles are essentially those of Chomsky.
Curiously enough, the Java example you point to diverges strongly from 'classical' BNF, the way Backus and Naur defined it. They have even redefined the very basic '::=' symbol. If you want to refer to a BNF programming language definition, you should rather select Pascal, which was originally defined in 'classical' BNF (see Appendix D of Jensen & Wirth: Pascal User Manual and Report), although it is frequently presented in some revised BNF variant. You'll find one that is fairly close to the original at Syntax von Pascal In Backus-Naur Form (BNF)[^]
BNF is still used today, but is considered somewhat outdated by quite a few people. So various groups have extended and augmented it significantly, and later replaced it by similar languages - which might be viewed as alternative derivatives of Chomsky, rather than derivatives of BNF. Some of the changes are cosmetic, or in the style of 'I want to save a couple keystrokes when typing!', such as reducing '::=' to ':'. Whether or not any one of these alternatives is "better" than classical BNF is a matter of personal taste.
Don't misunderstand my comments: All that you say is perfectly valid as long as we limit ourselves to program code represented as a linear sequence of printable characters, all editable by the programmer.
That is a very limiting context. From my very first post in this thread (almost two months ago), I have suggested that we extend the scope to other representation formats:
trønderen wrote: My hope (but I am not very optimistic!) is that all application programming will move over to an abstract representation where the language form is merely a display phenomenon; the program itself is stored in an abstract, language independent form. For keywords, this is simple, but it would require a mechanism where code maintainers could assign alternate, language dependent tokens for programmer assigned names of variables, methods, constants, comments, ... This is what I have pushed in all my following posts.
I also wrote, in a more recent post (well ... two days later, November 14):
This is certainly extremely difficult, probably across the borderline to the impossible, if we insist on thinking along exactly the same tracks as we have always done before, refusing to change our ways even a tiny little bit. I sure can agree that it is fully possible to construct obstacles for preventing any sort of change in our ways of thinking. I am not hunting for that kind. I guess this point has been extensively highlighted by now.