|
That's what I said: no common base.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Gerry Schmitz wrote: Some languages do not have words for concepts that exist in other languages That is certainly not limited to programming languages! It is a general problem for any translator of natural languages.
From an end user point of view, you can expect the user to have the required terms in his own language. If English (i.e. the programming language) lacks the ability to express the problem concepts, then that is a problem belonging to the programming language, not to the problem solution as experienced by the customer / end user. You simply cannot go to a customer and say: Sorry, mate, your suggested solution is perfectly fine, but we cannot use it, because the English language doesn't have words for those concepts! We have to solve the problem in a different way!
When programming in a high level language, you will almost always make use of concepts that do not exist in machine code. Even a 'high level machine code', such as .net CIL, lacks a lot of higher concepts. Yet you can express a problem solution that can be handled in CIL. You can even express problem solutions in lots of different languages, and they can all be handled by CIL!
Or moving yet another level up: With the GNU Compiler Collection, each source language - which on the surface may be quite different from the other source languages - is parsed down to an intermediate representation common to all of them.
And then: I never was considering any "universal language interpreter". You need not lump everything from Lisp to Algol68 to APL to Erlang into one single structure to get away from programming being forced to be done in English. Different languages have different uses; that should be maintained. It is sufficient that the standard representation of a language such as C# uses abstract tokens: Rather than 'w', 'h', 'i', 'l', 'e' in a 7-bit-ASCII file (you still see that in a lot of places!), the representation is [while loop], which can be displayed in various languages. A variable reference is not coded as 't', 'o', 't', 'a', 'l', '_', 's', 'u', 'm', but as [variable 277], which may be assigned the external identifier 'total_sum' for English presentation, and 'totalbeløp' for Norwegian.
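A minimal sketch of this token idea, assuming a purely hypothetical storage format - the token kinds, the slot number 277 and the Norwegian keyword 'mens' are illustrative choices, not taken from any real compiler:

```python
# Hypothetical sketch: a program stored as abstract tokens, with per-locale
# display tables, so the same stored code renders in different languages.

# Keyword tokens map to localized spellings ('mens' is an illustrative choice).
KEYWORDS = {"while": {"en": "while", "no": "mens"}}

# Variable slots carry numbers; maintainers assign per-language identifiers.
IDENTIFIERS = {277: {"en": "total_sum", "no": "totalbeløp"}}

# One stored statement fragment: [while loop] [variable 277] < 100
program = [("kw", "while"), ("var", 277), ("text", "<"), ("text", "100")]

def render(tokens, locale):
    """Map the abstract token stream to concrete text for one locale."""
    words = []
    for kind, value in tokens:
        if kind == "kw":
            words.append(KEYWORDS[value][locale])
        elif kind == "var":
            words.append(IDENTIFIERS[value][locale])
        else:
            words.append(value)
    return " ".join(words)
```

With this shape, render(program, "en") gives 'while total_sum < 100' and render(program, "no") gives 'mens totalbeløp < 100' - the stored form never changes, only the presentation.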
I frequently get the feeling that we programmers actively want our code to be unintelligible to the customer (and, for that matter, to programmers of a different clan), so that we maintain full control over it. We do not ask the customer for his opinion about how the problem can be solved; at most we present some top-level box diagrams of how we will solve it. We most certainly don't want to discuss algorithms and code structures with the customer and future users!
I think we ought to. I think learning how the customer approaches the problem would improve our code significantly. It could improve the user interface tremendously! But then the customer and end user must understand the solution. It is far from enough that we, the programmers, understand it!
You can use e.g. ER for modelling the user's data (it is so well suited for communicating with end users that it is a pity it has essentially been abandoned today). You can describe your solution methods in pseudocode based on the customer / end user's native language. The problem is that most programmers either refuse to do so, calling it doing the work twice (i.e. pseudocoding and then programming), or treat it as a mere act of courtesy: when the customer meeting is over, all the pseudocode is thrown away and the programmers do it the way they see fit, not the way the customer and the pseudocode indicated.
To me, communicating with the customer / end user is equally or more important than communicating with workmates. And the fact is that even when I discuss program code with another Norwegian workmate, we speak Norwegian. Usually, we will even use Norwegian words for coding terms such as 'metode', 'variabel', 'løkke' (loop) and 'unntak' (exception). If we could write down what we say, it would even be possible to discuss the code with a customer who is not fluent in English!
|
|
|
|
|
trønderen wrote: I frequently get the feeling that we programmers actively want our code to be unintelligible for the customer...I think we ought to.
It has been tried. On large scale and small.
And it continues to be tried.
But it does not work.
I repeatedly run into feature discussions where the claim is made that a feature must support complex scenarios but must also be 'simple' enough for users to understand. That has resulted in the following real cases.
1. Customer is now responsible for learning a programming language and providing programmers for it. Different projects required C#, Java and even a variation (and not a good one) of C. (The solution requires them to write code, application compiles it and dynamically injects it into the application.)
2. Solution is provided that does not provide full functionality for the actual known more complex cases. The customer is told they cannot have that feature.
One always reaches a point where the complexity of the allowed solution requires specialized knowledge and training just to achieve the task. Thus, no matter how one wraps it up, there must always be a 'programmer'.
trønderen wrote: it could even be possible to discuss the code with a customer who is not fluent in English!
But only if they can program. And except for algorithm discussions they would also need to be a programmer in the language you are discussing.
|
|
|
|
|
jschell wrote: trønderen wrote:I frequently get the feeling that we programmers actively want our code to be unintelligible for the customer...I think we ought to. Rather creative quoting you are doing there! I do not think we ought to make "our code to be unintelligible for the customer"!
jschell wrote: It has been tried. On large scale and small. And it continues to be tried. But it does not work. I know of lots of end user 'macro' languages that exist in different language varieties; even system functions are localized. People with no programming background are capable of adapting applications to their own needs without having to learn English. It certainly works in the small.
I do not know of any compiler storing the code as a semi-parsed tree of abstract tokens, applying a concrete syntax only in the presentation for a human developer. I am simply unfamiliar with any large-scale failed attempt to localize any tool, whether a programming tool or a tool for other application areas, where a significant deployment of localized versions was pulled back and replaced with English-language versions.
If you can point to one example of failure: One failed project does not imply that the principle has no merit. If you are eager to 'prove' that English Is The Answer, you may of course justify your attitude by referring to the failure. Otherwise, you may study the failure to learn why it failed, and what could be done better.
An example: The first release of localized Excel formulas did localize function names. In multi-language corporations, you could not share a spreadsheet between those working in an English context with those in a Norwegian context - the function names from the 'other' language were not found. In your approach, it seems like the proper solution would be to force everyone back to English. Rather, a later Excel version replaced the internal representation of system functions (which was by the localized name) with an abstract reference, sort of like 'built-in 37', which was displayed as 'average' in English versions, 'gjennomsnitt' in Norwegian versions. (In a spreadsheet, 'variables' are referenced by row and column, so the problem of localized variable names does not occur.)
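The fix described here boils down to storing a stable internal function ID and localizing only parsing and display. A toy sketch - the ID 37 and the name tables are invented for illustration and are not Excel's actual encoding:

```python
# Spreadsheet functions stored as locale-independent IDs ('built-in 37'),
# with per-locale display names. Numbers and names are invented.
BUILTINS = {37: {"en": "AVERAGE", "no": "GJENNOMSNITT"}}

def parse_function(name, locale):
    """Map a localized function name to its stable internal ID."""
    for fid, names in BUILTINS.items():
        if names.get(locale) == name:
            return fid
    raise ValueError(f"unknown function {name!r} in locale {locale!r}")

def display_function(fid, locale):
    """Render the stored ID in the viewer's language."""
    return BUILTINS[fid][locale]
```

A sheet saved by a Norwegian user typing GJENNOMSNITT stores the ID 37, and an English colleague opening the same file sees AVERAGE - sharing across locales works because the stored form is language-neutral.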
jschell wrote: it could even be possible to discuss the code with a customer who is not fluent in English! My old mother could not distinguish between a PC and an electric heater, but she was fluent in English. When I started programming, in the old Pascal days, she was curious about it, and I spent some time taking her through a number of Pascal programs. She was really fascinated by the orderly, disciplined way of approaching a problem and building a solution! She never programmed a line herself, but she was fully capable of following my walkthrough of a moderately sized Pascal program. But then: Pascal was far more readable than today's C++!
I also had a strongly visually handicapped daughter and had to write her a few support programs. She was at the outset (age from 9-10 years) curious about daddy's work, and got really excited when I was making something for her. Discussing how to structure the solution with her was very simple, and she could see how I shaped that into program functions (even though she never coded a single line herself).
In my professional work, I have discussed topics like data flow in the city administration and library organization, all with non-computer people. I have taught macro programming in an office automation system to users who had never seen an electric typewriter (so my analogy from the on/off switch on the terminal to the on/off switch of a typewriter failed...). I have taught '101 Programming' to people who had never before sat down at a computer (this was around 1990). In other words: I have long experience in helping non-computer people understand a systematic way of organizing the user's data structures, breaking the total problem down into well-defined, orderly tasks, and discussing alternative solution methods.
I know that users with domain knowledge and experience are very good at understanding even tiny little details in a computer solution, if you are willing to listen to them when they tell you something and to talk in a similar language when you explain your proposals to them. And you must be prepared for your proposals being exactly that: proposals, which the domain expert may have objections to. You are no sort of god, even if you are the one mastering the compiler.
|
|
|
|
|
trønderen wrote: I do not think we ought to make "our code to be unintelligible for the customer"!
I realize what you are saying.
trønderen wrote: My old mother could not distinguish between a PC
I didn't claim people were stupid. I never do that. I disdain programmers that think users are stupid.
But as I pointed out, your idea is not new. COBOL was created with that in mind: someone besides a programmer should be able to read the code and more easily understand it.
The problem, however, is still that to actually create a complex application, the details and process require that someone somewhere must still be a 'programmer'. And all attempts to move that out of the developer space either result in something that only supports simplistic examples, or require that someone else (like a customer) must then become a 'developer'.
trønderen wrote: I have been teaching '101 Programming' to people who had never before sat down at a computer
So you were teaching them to be programmers. Not users.
trønderen wrote: I know that users with domain knowledge and experience are very good at understand even tiny little details in a computer solution, if you are willing to listen to them when they tell you something and try to talk in a similar language when you explain your proposals to them.
I have written requirements for entire systems based on user/customer requests, and designed architectures and designs to meet the needs as they describe them - while leading them through the process of not only describing what they want and need, but also picking through the parts that they understand but have not verbalized, such as the (very common) question of how to handle failure scenarios.
But I do that so they can focus on what they do best while others (developers) focus on what they do best.
|
|
|
|
|
I found that if I deliver software that "I would like to use", it never fails to please. I also write in such a way, that I don't need to create help files. If I needed something, a video screen capture would be all that was needed; perhaps 10 minutes.
As for "communicating", I learn the users job, and lingo, to the point I can do it. In English.
No new languages were created.
(And the end user doesn't care what "coding" language I use; as long as they're happy)
And I have no problem talking customers out of software and services, including my own, that they don't need.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Gerry Schmitz wrote: I have no problem talking customers out of software and services, including my own, that they don't need.
My understanding now however is that...
Customers need applications to do certain tasks. From the service provider point of view you must provide that so they continue to be a customer.
Customers want applications to do certain tasks. From the service provider point of view you must provide that so they become a customer.
The two might overlap but certainly sometimes they do not.
|
|
|
|
|
Quote: Customers need applications to do certain tasks. From the service provider point of view you must provide that so they continue to be a customer.
The customer who wants an application to do certain tasks is "wrong".
They tell me what they "need", and I tell them what "tasks" the application will perform, if any.
e.g.
(1.) "We send sports statistics for evaluation and reports. It is very slow and we need you to write a new app to speed up the process."
All they needed to do was zip the files. (True story).
(2.) Streamlining a law office.
I sent them happily off using SharePoint Online subscriptions.
§
You're suggesting you would write them a redundant app and take the money ... because "I must provide it, etc." Not for me you don't.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
I believe you are talking either about implementation rather than features, or just about doing a better analysis to find a solution to the problem the user stated.
Doesn't change what I said, however, in that the user is still stating needs and wants. They might want the application to be a hundred times faster, but they do not need that. They might need a way to enter an external invoice number into the tracking system, but they might want that to replace the internal invoice number.
Gerry Schmitz wrote: You're suggesting you would write them a redundant app and take the money
No that is not what I am suggesting.
I am talking primarily about SaaS and the difficulty in creating something that produces a profit (not just revenue) on a continuing basis.
|
|
|
|
|
You think you know a lot about how I operate but you don't.
I also "build" systems. People will tell me I need to "add this". I show them "if you do it this way, then that works too".
Which gets back to: I learn the customer's business ... so I don't build pointless functionality ... just to get paid.
And I only work as a "lead", so, I would have a lot to say about your M.O.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Gerry Schmitz wrote: And I only work as a "lead", so, I would have a lot to say about your M.O.
Sigh...no idea what that comment has to do with anything that I said.
But since it seems to be questioning my competence....
I have been a senior developer for more than 30 years. Including in supervisory roles.
I have written requirements, architectures and designs too many times to count.
I have analyzed customer requests, including official Statements of Work, to ensure that the needs of both a company (one I was working with/for at the time) and the customer were being met.
I have seen others fail to do that - such as the case where one third of the customer company refused to use the new system because of the failure to meet (and even discover) a single need (not a want.)
Besides working with customers, I have also worked with Sales - the people attempting to convince others to provide new business, not just existing business.
I will also note that I have worked with at least one consultant who took several hours to present a solution (a complex one) that was really 'cool', but for which the company had neither a need nor a want. And when I questioned the assumptions behind it, I was told I didn't know what I was talking about. At which point the CEO and founder also spoke up and noted that he had no idea what assumptions the consultant was using either, because they did not fit the actual business model.
So I will stick with what I was saying. There are 'needs' and 'wants'. They can and do overlap. They can and do often serve different purposes.
|
|
|
|
|
You post something that should be an article, not as a mere comment.
trønderen wrote: From a factual point of view, you are perfectly right. Well, I'm never wrong..
trønderen wrote: In application development, the great majority of your communication is done in the local language I only work in English, all comments in code are in Engrish, and so is all documentation. Never, ever, do I code in the local language, as no dog is ever gonna learn Dutch just for maintaining a code base. Ever.
trønderen wrote: rather than assuming that the U.S. interpretation is globally valid. They stuck with Fahrenheit and inches.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Eddy Vluggen wrote: Never, ever, do I code in the local language, as no dog is ever gonna learn Dutch just for maintaining a code base. Ever. My idea of a token-style representation of a program is that no one should have to learn Dutch to maintain your code base. If all they master is English, then the tokens are mapped to their English representation for that programmer. You map the tokens to Dutch when discussing the solution with your Dutch customer. Or to German if the customer (or end user) is German.
The idea is not having to learn a different language. Not even English.
|
|
|
|
|
Adding an interpreter adds another burden and potential failure point to the tool chain. Plus the requirement for an in-house translator between the local language and the interpreter's.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Aight, my turn, and I'll provide justification.
I'm a non English native, but all documentation is in English. It has nothing to do with imperialism, it's just the ring that binds us all. Engrish is simple to learn, and as such it became the language of documentation.
So. If you want to code, you better learn English and not French.
As for you Frenchies: I do not take your questions into consideration if you cannot speak English. Any code written containing French isn't worth the time. Or simpler, I'd delete it:
Ctrl+A, Delete.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Eddy Vluggen wrote: So. If you want to code, you better learn English and not French. My concern is not that I have to master English (I guess I do master it far above the level required for programming), but the customer / end user.
We programmers have a tendency to lock ourselves up in an ivory tower, working in total isolation (although maybe as a programming team, not as individuals) - most certainly isolated from the customer and the users. We refuse to communicate with anyone who doesn't master our tribal language to perfection.
I think this is very bad for our profession. We have a lot to learn from those who have the problems/tasks that we are trying to solve. They do not speak our tribal language. We have to speak their language.
|
|
|
|
|
You're talking about an IT "shop"; individuals and smaller outfits wouldn't be able to function in the outside world with that attitude.
The one thing that progress did, was create middle men (i.e. Business analysts) that separated user and "creator".
A Technical Lead cannot be a lead without having interacted with the eventual users, IMO.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Eddy Vluggen wrote: Engrish is simple to learn
I wish you'd tell some Brits that!
(Many "native" English speakers ... don't. At least not half as well as most non-native speakers.)
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
trønderen wrote: For keywords, this is simple, but it would require a mechanism where code maintainers could assign alternate, language dependent tokens for programmer assigned names of variables, methods, constants, comments,
I would say...no.
First, at least as of the last time I created a compiler (much less studied them), it is not "simple" to replace keywords.
Second, the latter part of that statement seems to suggest exactly the problem that compilers need to solve, per the first part of what I said. Compilers (and interpreters) already convert keywords into tokens. Those that do not are very inefficient (as I know, since I had to work with one long ago).
trønderen wrote: I think the global software culture would be enrichened if we could disengage from the absolute binding to the English-speaking culture.
No - as you already pointed out prior to that, most discussion about what the software does happens in a natural language. Very likely, in the vast majority of cases, a single natural language.
Requirements, Architecture, Design are all in that natural language. All of those have much more impact on the solution than the actual code itself.
Significant failures do not happen at the code level. They happen due to a failure in the above processes - such as the failure of the Mars Climate Orbiter. The specifications were exact and numerical (which is universal). The communication was not.
|
|
|
|
|
jschell wrote: trønderen wrote:For keywords, this is simple, but it would require a mechanism where code maintainers could assign alternate, language dependent tokens for programmer assigned names of variables, methods, constants, comments,
I would say...no.
Or perhaps yes. Many years ago, when I was just starting university, the community college had a very old (even then) IBM mini. I don't recall the model number, but it was somewhat larger than an "executive" office desk, with a disk pack to one side, and one of those 7-foot-tall chain-driven line printers on the other. It might have been something from the 1400 series: IBM 1400 series - Wikipedia. I think the instructor said that he knew of one other example of that model of computer, but it was in a museum!
Anyway, we used that computer to learn Algol and Fortran. The Algol compiler was written somewhere like McGill University, in Quebec. As such, I seem to recall that the keywords were bilingual: you could use either English or French, so either if or si. But the error messages were all in French. So maybe at the time the requirement that we take first-year French wasn't so pointless after all. Maybe it was written in Paris: https://dl.acm.org/doi/pdf/10.1145/872738.807150 See the note about bilingual details on p. 113.
Keep Calm and Carry On
|
|
|
|
|
Algol68 was explicitly defined for adaptation to different languages: The syntax was defined using abstract tokens that could be mapped to various sets of concrete tokens.
This is no more difficult than having a functional API definition with mappings to C++, PHP, Fortran, Java, ... Obviously, to define these mappings, you should both thoroughly understand the API, and of course the language you are mapping to. It is not always a trivial thing to do.
When you choose concrete tokens for a programming language, it is not something that you do on a Friday night over a few beers. It is professional work, where you must know the semantics of those abstract tokens, and you must know the natural language from which you select your keywords. You must be just as careful when selecting a term as the English-speaking language designers are when they select their English terms. If the language defines some tokens as reserved, you must honor that even for your alternate concrete mapping.
In your French Algol version, I assume that the source code was maintained in a plain text file (probably in EBCDIC, for IBM in those days), handled by the editor of your choice. Switching between English and French would require a textual replacement. If the source code was instead stored as abstract tokens, maybe even as a syntax tree, it would require an editor specifically made for this format. (Note that you could still have a selection of editors for the same format!) The editor might choose to look up the concrete syntax only for the part of the tree that is at that moment displayed on screen. 'Translation' is done by redrawing the screen, using another table of concrete symbols.
This is certainly extremely difficult, probably across the borderline to the impossible, if we insist on thinking along exactly the same tracks as we always have, refusing to change our ways even a tiny little bit. I can certainly agree that it is fully possible to construct obstacles to prevent any sort of change in our ways of thinking. I am not hunting for that kind. Like you, k5054, I observe that 'It happens, so it must be possible'.
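As a sketch of how such re-rendering could work, assume a toy parse-tree node shape and two invented keyword tables; 'translating' the program is then just redrawing the same tree with a different table, with no textual search-and-replace involved:

```python
# Toy parse tree rendered with interchangeable concrete keyword tables.
# Node shapes and keyword tables are invented for illustration.
KEYWORDS = {
    "en": {"if": "if", "then": "then"},
    "fr": {"if": "si", "then": "alors"},
}

def render(node, locale):
    """Produce concrete syntax for one locale from the abstract tree."""
    kw = KEYWORDS[locale]
    if node["kind"] == "if":
        return (f"{kw['if']} {render(node['cond'], locale)} "
                f"{kw['then']} {render(node['body'], locale)}")
    if node["kind"] == "ident":   # user identifiers are not translated
        return node["name"]
    raise ValueError(node["kind"])

# The stored form: one [if] node over two identifier nodes.
tree = {"kind": "if",
        "cond": {"kind": "ident", "name": "x"},
        "body": {"kind": "ident", "name": "y"}}
```

Here render(tree, "en") yields 'if x then y' while render(tree, "fr") yields 'si x alors y' - the same stored tree, two presentations.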
|
|
|
|
|
trønderen wrote: The syntax was defined using abstract tokens that could be mapped to various sets of concrete tokens.
In Computer Science the area of Compiler Theory is very old and very well studied.
Your statement is describing something that well-designed compilers (and interpreters) already do. The only time I have ever seen a 'compiler' not do that, it was coded by someone who had zero training in the science of compilers.
As I suggested before, the problem is not in creating tokens. The problem is, first, in creating the language in the first place such that it is deterministic, and second, in creating a compiler that can report errors. That last part is the most substantial part of every modern compiler (even toy ones).
trønderen wrote: If the source code was rather stored as abstract tokens,
Parsing text into tokens is the first part of what all compilers/interpreters do.
Following is one source of the very well known process Compilers already do.
Compiler Design - Phases of Compiler[^]
What you are describing does not have anything to do with the actual problem.
The English version of a standard (very standard) construct in programming languages:
if x then y
Now the French version
si x alors y
So in the above, for just two natural languages, you already have 4 keywords in the language.
Let's add Swedish:
om x så y
So for every language added, it is reasonable to expect the number of keywords to multiply. Keywords often cannot be used as identifiers in code, both because that makes it much harder for the compiler to figure the code out and because it hampers correct error reporting. Additionally, even when the context allows the compiler to figure it out, it is not ideal for human maintenance.
Consider the following statement. If one was using a different native language to drive the compiler, then the following should be legal. But in the English version, do you really want to see this code?
int if = 0;
So not only would the number of keywords increase but the programmer would still need to be aware of all of those keywords while coding.
Now, besides the increasing number of keywords, the following are some of the problems that I see.
1. Two programmers are working on the same file. The file MUST be syntactically correct before developer A (English) goes on vacation, because otherwise the mechanism (code) that must translate it back from the English form will not work when developer B is French.
2. Comments cannot be supported.
3. Third-party APIs would still require whatever is supported by the 3rd-party service (library, REST, TCP, whatever).
4. Adding new natural languages to the compiler after the first release would mean that existing applications could break, because existing code might already use the new keywords. This is a well-known problem that exists right now when new functionality is added to an existing compiler. So all known languages would need to be supported in the first release.
|
|
|
|
|
jschell wrote: In Computer Science the area of Compiler Theory is very old and very well studied. I sure wish that was true for everybody creating new languages! (Note that I did not refer explicitly to C and all the languages derived from it.)
The problem is in creating the language in the first place such that it is deterministic, and second creating a compiler that can report errors. That last part is the most substantial part of every modern compiler (even toy ones.) Reminds me of VAX/VMS: Every message delivered by system software (including compilers) was headed by a unique but language-independent numeric code. Support people always asked you to supply the code; the message text could be in any language - they never read that anyway.
So for every language added it is reasonable to expect that the number of keywords would be duplicated. You are missing my point completely. Neither if, then, si, alors, om nor så would be reserved words in the language. The language would define non-text tokens - call them [if] and [then] if you like - but the representation is binary, independent of any text.
Keywords often cannot be used in code both because it makes it much harder for the compiler to figure it out and for it to correctly report on errors. No one is suggesting that you be allowed to use the binary [if] token as a user-defined symbol.
The display representation of the binary [if] token could be e.g. as (boldface) if, or as [if], si, [si], om, [o] or some other way to visually highlight that this is not a user identifier but a control statement token. For the creation of new control structures, an IDE working directly on a parse tree representation could provide function keys for inserting complete control skeletons. I have worked with several systems working that way, for data structures, graphic structures - and for program code, although the latter inserted textual keywords, not binary tokens the way I wish it would. Once you get out of the habit of thinking of your program as a flat string of 7-bit-ASCII characters, it is actually quite convenient! (You can assign the common structures, like if/else, loops, methods etc. to F1-F13 keys so that you don't have to move your hand over to the mouse to select from a menu.)
So not only would the number of keywords increase but the programmer would still need to be aware of all of those keywords while coding. Quite to the contrary! The programmer might very well define a variable named if, which is distinct from the binary token [if]. There would be no reserved words at the textual level.
A not very well known fact: Classic FORTRAN actually managed without reserved words. I just posted an entry in 'The Weird and the Wonderful' - something from my student days that I found in a box in the basement - to illustrate the point. Note, however, that F77 philosophy is not what I am asking for: It did not represent control (and other) structures by binary tokens, but relied on semantic analysis of plain text source code.
1. Two programmers are working on the same file. The file MUST be syntactically correct before developer A (English) goes on vacation. Because otherwise the mechanism (code) that must translate it back from the English form will not work when developer B is French. I say again: You missed my point completely. If the IDE stores the code as a parse tree, it is syntactically correct, otherwise the IDE would not have accepted it. Of course developer B may define user variables and methods with French names, but so he can in any IDE environment.
2. Comments cannot be supported. Why can't the parser define a binary 'comment' token, and store that in the parse tree? In one project I am currently working on (which is not a general programming language, but an application specific control language), we are doing exactly that. The comment token may have a value field with several alternate texts, each identified by a language code, so that if you select, say, French as your UI language and there is a French version of the comment, that is the one to be displayed. (Otherwise, when the English, say, comment is displayed, you can add a French translation of it.)
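A minimal sketch of such a comment token, with hypothetical class and field names, might look like this in Python:

```python
# Hypothetical sketch: a comment node carrying alternate texts keyed by
# language code; display picks the UI language, falling back to any
# available translation.
class CommentNode:
    def __init__(self, texts):
        self.texts = texts  # e.g. {"en": "...", "fr": "..."}

    def display(self, ui_lang, fallback="en"):
        if ui_lang in self.texts:
            return self.texts[ui_lang]
        return self.texts.get(fallback, next(iter(self.texts.values())))

c = CommentNode({"en": "adjust for leap years",
                 "fr": "ajuster pour les années bissextiles"})
print(c.display("fr"))  # ajuster pour les années bissextiles
print(c.display("de"))  # adjust for leap years (English fallback)
```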
3. Third party APIs would still require whatever is supported by the 3rd party service (library, Rest, TCP, whatever.) I have been working with third party APIs with French method and parameter names, in an otherwise English language environment; it was a nightmare ... If you define a language along the lines I am suggesting, a library would be delivered as a parse tree as well, along with one or more symbol tables (i.e. for different languages) for use in the API. (This is how we do it in that application control language mentioned above.) Otherwise, if the binary interface is given and the library comes in a compiled, linkable format with given entry point symbols, your parse tree interface to that library should include a mapping from a call token to the entry point symbol, decoupling that symbol from the external display. Establishing this mapping is a one-time operation; the mapping could follow the library file, similar to how a '.h' file follows a C library.
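The call-token-to-entry-point mapping could be sketched as below. The entry point symbols, token IDs and display names are all invented for illustration; they stand in for whatever a real linker interface would use:

```python
# Hypothetical sketch: a library interface delivered as a mapping from
# abstract call tokens to linker entry-point symbols, plus per-language
# display names - the '.h file' analogue described in the text.
ENTRY_POINTS = {1001: "_lib_sort", 1002: "_lib_find"}  # call token -> symbol
API_NAMES = {
    "en": {1001: "sort_list", 1002: "find_item"},
    "fr": {1001: "trier_liste", 1002: "chercher_element"},
}

def display_name(call_token, lang):
    """What the programmer sees in the chosen UI language."""
    return API_NAMES[lang][call_token]

def link_symbol(call_token):
    """What the linker sees - independent of any UI language."""
    return ENTRY_POINTS[call_token]

print(display_name(1001, "fr"))  # trier_liste
print(link_symbol(1001))         # _lib_sort
```

The French and English programmer call the same entry point; only the displayed name differs.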
4. Adding new languages to the compiler after first release would mean that existing applications could break because existing code might use them. Assuming that you refer to language features, introducing new keywords. If there are no keywords, the problem you are pointing to vanishes. Adding a new binary token, with its unique token ID, would not invalidate any program whatsoever. Of course there is the question of where the display mapping is done: If the IDE does it, and imports a new compiler with new binary tokens, it might not have a proper French or Swedish word to represent them. If the new and extended compiler is delivered with a token display mapping table for a number of languages, the problem is significantly reduced. (The user may have a language fallback list, both for comments and other binary tokens, so that something meaningful is displayed, although not in the primary language.)
As I wrote in my post,
trønderen wrote: This is certainly extremely difficult, probably across the borderline to the impossible, if we insist on thinking along exactly the same tracks as we have always done before, refusing to change our ways even a tiny little bit. Almost all of your comments are fundamentally based on the idea that a source program really, as a matter of fact, is a string of 7-bit-ASCII characters, and this will always remain true. I am suggesting that it is not.
Compare an old style text formatter such as troff with, say, MS Word: You may argue that '\fI' is like a reserved word for italicizing text; you cannot use it as plain text (without quoting). Troff stores everything as plain text. MS Word does not - prior to .docx, the storage format was a true binary format, and even XML is just a storage encoding - internally, the working format is binary, just like before. In MS Word, '\fI' is freely available as document text without quoting. Furthermore, you can move a document from an English MS Word to a French one and then to a Swedish one: The menu texts, help texts etc. change language, yet an edit made in one language version is equally valid in other language versions of MS Word.
I certainly can imagine a programming language, and its parse tree storage format, being designed along the same principles.
trønderen wrote: I sure wish that was true for everybody creating new languages! (Note that I did not refer explicitly to C and all the languages derived from it.)
C? Compiler theory applies to any language (including interpreters.)
trønderen wrote: Neither if, then, si, alors, om or så, are reserved words in the language. The language would define non-text tokens, call them [if] and [then] if you like, but the representation is binary, independent of any text.
That is a non-starter.
The human needs to write the code. Using token representations that the user is responsible for memorizing would not work. If the user at any time uses something like 'if' and 'then' then those are keywords for the language. That is how it works, just as it works that way in natural languages. Changing semantics (English) does not alter the role of what a system that eventually must run code must still do, in that it still must convert the keywords into something else.
And defining keywords is necessary for any computer language because it is not deterministic otherwise.
trønderen wrote: The display representation of the binary [if] token could be e.g. as (boldface) if, or as [if], si, [si], om, [o] or some other way to visually highlight that this is not a user identifier but a control statement token.
Errr...no idea what you are talking about.
The 'bold' just becomes part of the textual representation of the keyword. No different than requiring that the keyword is in lower case.
You seem to think that because you use bold on a keyword that it is no longer a keyword. It doesn't matter how you differentiate it in the language specification; it is still a keyword.
And no developer is going to work in a language where they need to make keywords by switching from bold and back.
trønderen wrote: Why can't the parser define a binary 'comment' token,
Because the content of the comment is NOT the token that tells the compiler that it is a comment. The content is what is contained by the comment. So in the following, the value of the comment is the text, not the '//'
trønderen wrote: I have been working with third party APIs with French method and parameter names
Only when named parameters are supported and used can the parameter names matter.
And when you use it in English exactly how are you going to use that method unless you have English that tells you how to use it?
trønderen wrote: Assuming that you refer to language features, introducing new keywords. If there are no keywords, the problem you are pointing to, vanishes
I have studied Compiler Theory formally and informally for a long time. That statement, by itself, is not possible.
As I said, tokenization itself is not something that is new in Compiler Theory. It has been there for a very long time; that very word names the process of converting keywords to tokens. You seem to think you are going to be able to remove keywords from the definition of the language, but you fail to describe, in detail, how a user is then going to be able to do something without using keywords.
trønderen wrote: Furthermore, you can move a document from an English MS Word to a French one and then to a Swedish one: The menu texts, help texts etc. change language, yet an edit made in one language version is equally valid in other language versions of MS Word.
You do understand that an MS Word doc is a binary file which has embedded symbols in it which define the format?
The text of the document is NOT the relevant part. The analogy to code for a Word doc is that all of the text that you see in MS Word is a 'comment'.
However when you write code most of what you write and what you debug is not comments. So you are proposing that the keywords of the language would be written using combination key presses. For every single thing that one wrote.
trønderen wrote: I certainly can imagine a programming language, and its parse tree storage format, being designed along the same principles.
Knock yourself out. It is called BNF - Backus-Naur Form notation.
The Java example (however it has bugs in it.)
Chapter 18. Syntax[^]
jschell wrote: C? Compiler theory applies to any language (including interpreters.) Well, of course. And it sure is a good idea to know at least fundamental compiler theory before you sit down to create a new language, if you want to make a good one. History has shown that not all language makers have had extensive compiler theory background. Hence my comment.
The human needs to write the code. Using token representations that the user is responsible for memorizing would not work. If the user at any time uses something like 'if' and 'then' then those are keywords for the language. That is how it works.
Once again: Try to liberate yourself from this fixation on a code file always and invariably maintained and stored as a flat string of ASCII characters.
Hopefully, you are able to do that in document processing systems: You create a new chapter level two by hitting a function key or making a menu selection, not by inserting e.g. the strings '< h2>' and '< /h2>' in the body text. Sorry about the extra space after the '< 's - it is required here, because this is not a proper document editor. In, say, MS Word, I could have written the markup without any such considerations. In a document processor, there are no reserved text body words, character sequences or characters.
There is no law of nature that says there must be keywords / reserved words just because that document is source code for a compiler / interpreter, or that structure must be represented textually - that is not 'how it works'. Any WYSIWYG document processor will prove you wrong.
And defining keywords is necessary for any computer language because it is not deterministic otherwise. You certainly need to define a representation for structural elements, but try to understand that once you liberate yourself from the flat-sequence-of-characters mindset, those structure elements need not be alphabetic. In a document processor file, there are no 'keywords' to represent a hierarchical chapter / section structure; the structure is maintained in binary, non-textual format. You could do the same for a program code file. (I said this earlier; it appears necessary to repeat it.)
Errr...no idea what you are talking about.
The 'bold' just becomes part of the textual representation of the keyword. No different than requiring that the keyword is in lower case. That is because you seem to be completely stuck in the mindset of a code file by definition being a flat sequence of printable characters, maintained using 'vi', or 'TECO' if you are old style ('emacs' if you are more up to date on tools).
And no developer is going to work in a language where they need to make keywords by switching from bold and back. So you have not understood a word of what I am talking about. How is that in a document processor? You do not create a new chapter by inserting some extra space, then switching to a larger, possibly bolder and different typeface, possibly enter the next higher chapter number before typing the chapter heading, add some extra space to the first paragraph, and then resetting to the standard body text format.
No, that is not the way you work: You press e.g. Alt-1 (that is how my MSWord is set up), and the editor takes care of inserting a binary structure representing a level-1 chapter heading. It is displayed with extra space before and after, in a larger, bold typeface etc., not because I inserted space or changed the typography. I inserted a structure element that was displayed that way.
If my code editor lets me insert a structure element, say a conditional if, or a loop, or a method definition, in a similar way, the code editor may display those structure markers in one of several possible ways. I suggested that the display could be 'keyword-like', but typographically marked e.g. by being enclosed in brackets or boldfaced so that the programmer would not mistake them for being plain ASCII strings (such as user specified variable / method names). Displaying space before and after method headings (similar to how document chapters are highlighted by a document editor) would be another display indicator of structure. One obvious way of displaying code structure is to indent a loop body, a 'then clause' or 'else clause'.
The programmer will not switch to boldface, add brackets or blank lines to create a structure element, not even hit the tab key or space bar to indent a loop body, but e.g. press Alt-1 to create a namespace, Alt-2 to create a method, Alt-3 to create a conditional, Alt-4 to create a loop, and so on. The editor would display something to show the structure, but whatever it displays, it is not 'keywords' in the textual sense. And it is not editable in TECO or vi.
In a document editor, you may insert blank lines, select a larger and bolder typeface, type a number and a line of text, add a new blank line and after that revert to the typography of body text. That might look like whatever a chapter heading is displayed as, but it won't make it a chapter object, in the sense of the document editor data structure. Similarly, if your IDE represents your program as a parse tree, writing 'if' (rather than hitting the function key to create a conditional statement) will not create a conditional statement. If 'if' is not a known symbol, the IDE might ask you, as soon as you complete that token, "'if' is not a known symbol - do you want to (1) create a local variable named 'if', (2) create a static variable in this module, called 'if', (3) ...". If the IDE has an English language UI, it might even suggest "(4) did you intend to insert a conditional statement in your code?", but in a Norwegian language UI, this would not be triggered for 'if', but maybe for 'hvis'. If the programmer selects this alternative, the 'keyword' is not inserted into the program; the binary conditional statement object is.
Because the content of the comment is NOT the token that tells the compiler that it is comment. The content of the content is what is contained by the comment. So in the following the value of the comment in text not the '//' Are you completely unable to imagine a binary element that is displayed as, say, '// comment text'? You might want to display it in bold, or maybe italics, to show that this is a comment, and not user code inserted by two presses of the '/' key. If you do that, the two slashes will not be displayed in bold / italics, and will not have comment semantics - similar to how writing '< h1>' in a Word document does not create a new top level chapter.
// A comment in english is useless in french. Did you notice the sentence in my previous post,
trønderen wrote: The comment token may have a value field with several alternate texts, each identified by a language code, so that if you select, say, French as your UI language and there is a French version of the comment, that is the one to be displayed. That is exactly what I am doing in my current project (which, as I mentioned earlier, is more like a scripting language than a programming language).
jschell wrote: trønderen wrote:I have been working with third party APIs with French method and parameter names
Only when named parameters are supported and used can the parameter names matter. Named parameters are quite standard in modern programming languages. But even in K&R C, you will see the parameter names in .h files, and often you have to deduce the semantics from the variable name.
If your program representation was a parse tree, as I suggest, even variables, types and methods would have internal text-independent representations; the display of them could be based on looking up that internal ID in a symbol table. This symbol table could exist in several language variants. (I said this before; you obviously overlooked it.)
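The same mechanism covers identifiers: the [variable 277] idea from earlier in this thread can be sketched as below, reusing the 'total_sum' / 'totalbeløp' example already given in the discussion (the table layout itself is invented):

```python
# Hypothetical sketch: variables stored as numeric IDs; each language's
# symbol table maps the ID to a display name. Renaming in one language
# does not touch the stored code or the other languages' tables.
SYMBOLS = {
    "en": {277: "total_sum"},
    "no": {277: "totalbeløp"},
}

def show_assignment(var_id, expr, lang):
    """Display an assignment to [variable var_id] in the chosen language."""
    return f"{SYMBOLS[lang][var_id]} = {expr}"

print(show_assignment(277, "a + b", "en"))  # total_sum = a + b
print(show_assignment(277, "a + b", "no"))  # totalbeløp = a + b
```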
And when you use it in English exactly how are you going to use that method unless you have English that tells you how to use it? If you select English as your UI language, then you see the English identifiers and English language comments. As I wrote in my previous post:
trønderen wrote: The comment token may have a value field with several alternate texts, each identified by a language code, so that if you select, say, French as your UI language and there is a French version of the comment, that is the one to be displayed. As I have stated in other posts, this goes for all program elements as well, including user defined symbols and how binary structural elements are displayed.
In my current project, the user language preference is actually a preference list: If no name / comment is available in your preferred language, your second, third, ... choice is taken. It may of course happen that no one has translated the symbol table to any language that makes sense to you. That can be done at any later time, and the problem is significantly reduced from forcing every non-native-English-speaker to work in a foreign language. My project includes a search function for symbol table entries and comment texts that do not yet have a translation to the current UI language, so that you can easily find those terms that you have forgotten to translate to, say, Norwegian before presenting the code to a Norwegian speaker.
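The preference-list lookup and the missing-translation search might, as a sketch with invented data and function names, look like:

```python
# Hypothetical sketch of a language preference list: take the first
# available text in the user's ranked language order, and find entries
# that still lack a translation in a given language.
def resolve(texts, preferences):
    """Return the first available text following the user's language list."""
    for lang in preferences:
        if lang in texts:
            return texts[lang]
    return next(iter(texts.values()))  # last resort: anything available

def missing(entries, lang):
    """IDs of entries (id -> texts) with no translation in 'lang' yet."""
    return [eid for eid, texts in entries.items() if lang not in texts]

entries = {1: {"en": "sum"}, 2: {"en": "count", "no": "antall"}}
print(resolve(entries[1], ["no", "en"]))  # sum (no Norwegian text yet)
print(missing(entries, "no"))             # [1]
```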
You may very well choose to insist on always including an English symbol table, as a fallback when a translation is missing, but you should be prepared to accept that not all foreigners will agree with you that English is a better fallback in their native environments.
trønderen wrote:Assuming that you refer to language features, introducing new keywords. If there are no keywords, the problem you are pointing to, vanishes
I have studied Compiler Theory formally and informally for a long time. That statement, by itself, is not possible. Obviously, you have limited your study of Compiler Theory to purely textual input. You are clearly incapable of comprehending how an English MSWord user can select 'Heading 1' and a Norwegian MSWord user select 'Overskrift 1', and both actions lead to the same result. If the Norwegian document is moved to an English MSWord, the 'Overskrift 1' style magically is identified as 'Heading 1' - believe it or not. (I understand that this is completely incompatible with your Compiler Theory knowledge, but it is a fact.)
Again, comparing to document processors: I don't know when 'hidden text' was introduced to MSWord, or whether it was a new binary object, a new parameter or a new parameter value for an existing binary object definition - that makes no difference. No matter what any of my documents looked like at the time, they couldn't possibly be invalidated by the new possibility of hiding text.
Let me exemplify the same in a programming language context:
I was programming in a language where 'for' loops could be conditionally terminated prematurely by a 'while <condition>', comparable to C 'if !<condition> break': Sometimes, you want different treatment if the loop iterates to its end or if it is terminated prematurely. E.g. when you search a list or array, and find what you are looking for (exiting prematurely), or you reach the end without finding it, requires different handling. In this language, you could specify the two alternatives by adding to the loop an 'exitwhile' clause for the premature termination, and/or an 'exitfor' clause for loop completed termination. Both clauses were executed in the context of the loop body with access to e.g. loop local variables.
If 'exitwhile' and 'exitfor' clauses were added to a textual programming language, then programs using variable names 'exitwhile' and 'exitfor' would be invalidated. If a loop is rather represented by a binary object, and this object is augmented with two new fields: One pointer to an 'exitwhile' code block, another to an 'exitfor' code block, both initially null / nil / void, then no old program would be invalidated. The updated IDE would need to provide a way for the programmer to insert exitwhile/exitfor clauses, but not through any such keyword. They might be displayed in a similar way to the 'for' and 'endfor' markers (note: not as editable text, but typographically highlighted so that you would recognize it as structure indicators) with initially empty clauses. Until you start using this facility, you and your code are completely unaffected by the new fields in the binary loop object.
I am assuming that your old loop object missing these fields would still be valid: E.g. all objects should contain a size value, and any software handling the program file would know that a shorter loop object is a loop without the new clauses, not making any fuss about it. (The IDE could even store any loop object not making use of the extra fields in the short form.)
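A size-prefixed encoding along these lines can be sketched in Python with the struct module. The field layout (a body reference plus two optional clause references) is invented purely to illustrate the backward-compatibility argument:

```python
# Hypothetical sketch: a size-prefixed loop record where the new
# exitwhile/exitfor fields are optional. A short (old-format) record is
# read as a loop without the new clauses, so old files stay valid.
import struct

def pack_loop(body_ref, exitwhile_ref=None, exitfor_ref=None):
    fields = [body_ref]
    if exitwhile_ref is not None or exitfor_ref is not None:
        fields += [exitwhile_ref or 0, exitfor_ref or 0]
    payload = struct.pack(f"<{len(fields)}I", *fields)
    return struct.pack("<I", len(payload)) + payload  # size prefix first

def unpack_loop(buf):
    (size,) = struct.unpack_from("<I", buf, 0)
    n = size // 4                       # number of 32-bit fields present
    fields = struct.unpack_from(f"<{n}I", buf, 4)
    body = fields[0]
    exitwhile = fields[1] if n > 1 else None
    exitfor = fields[2] if n > 2 else None
    return body, exitwhile, exitfor

old = pack_loop(42)                      # short form, no new clauses
new = pack_loop(42, exitwhile_ref=7, exitfor_ref=9)
print(unpack_loop(old))  # (42, None, None)
print(unpack_loop(new))  # (42, 7, 9)
```

The reader never rejects the short record; it simply reports the absent clauses as empty, which is exactly the "no fuss" behaviour argued for above.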
You seem to think you are going to be able to remove keywords from the definition of the language but failing to describe, in detail, how a user is then going to be able to do something without using keywords. Your mind seems to be completely fixed on 'keywords' being textual. When I press Alt-1 in MSWord, you may consider that a 'binary keyword' resembling the textual '< h1>'. In a programming language, Alt-1 might resemble the textual 'namespace', Alt-3 might resemble 'if ... then ... else ... endif'. One essential reason for saying that they 'resemble' textual keywords: The IDE's input processor will immediately process them, just like MSWord processes Alt-1 immediately to insert a top level chapter object, to create the binary structure objects. The file will never store the Alt-1 or Alt-3. You might assign 'Create new namespace' or 'Create conditional statement' to any function key, menu selection etc., and different users may make different assignments - the binary objects are the same. So the assignments made by any one user are not any sort of 'reserved word'.
You do understand that an MS Word doc is a binary file which has embedded symbols in it which define the format? Most certainly - but you obviously fail to understand that I am suggesting exactly the same for a programming language code file. The specification of the binary format might be treated as the formal language definition, just like ISO/IEC is the formal definition of OOXML objects. (This is what I have been talking about all the time!)
You may consider OOXML to be a 'document programming language' - it is not defined in BNF, but as an XML schema. Functionally, those are roughly equivalent.
Knock yourself out. It is call BNF - Backus Naur Form notation. BNF is certainly not limited to specification of the syntactical interpretation of flat sequences of printable characters.
One of my fellow students, in his first job, was set to identify various kinds of bacteria in microscopy photos. The various kinds of bacteria, i.e. the shapes of them in the images, were described in BNF format. The images were scanned and the scan lines 'compiled' according to the BNF defined syntax. The same image was 'compiled' according to different BNFs, each for a different bacterium, and the one(s) giving the fewest 'syntax errors' were considered primary candidates for the identification. (This was in the early 80s, when technology was less sophisticated than today, and they did not rely completely on automatic identification; they used the BNF analysis to rule out those hundreds of alternatives that most certainly did not match. A medic had to confirm the identification. Yet, this was a real work saver.)
If you create a new language to be stored as a parse tree rather than as a linear character sequence, you would most likely create even that definition in some BNF variant, or in a similar definition language. Plain BNF is semi-abstract; it uses character strings in the definition, but the only structure representation is the BNF itself. For a non-textual structure representation, it does not define a unique storage format.
So if you were to create a binary language representation, you should rather use something like ASN.1, which resembles BNF in that it defines abstract objects. Then you can select one of the defined 'encoding rules' for generating concrete object representations that can be stored or transmitted. If you go for ASN.1 for the abstract specification, but dislike all the existing encoding rules, you can even make up your own new encoding rules - though that choice is usually driven by 'Not Invented Here' rather than by a qualified professional evaluation of existing alternatives.
It is interesting to note that BNF was initially developed to describe natural languages, based on Chomsky's production rules and transformations (and even earlier linguistic studies). The metasymbols used by Backus and Naur were adapted to standard keyboard characters, but the principles are essentially those of Chomsky.
Curiously enough, the Java example you point to diverges strongly from 'classical' BNF, the way Backus and Naur defined it. They have even redefined the very basic '::=' symbol. If you want to refer to a BNF programming language definition, you should rather select Pascal, which was originally defined in 'classical' BNF (see Appendix D of Jensen & Wirth: Pascal User Manual and Report), although it is frequently presented in some revised BNF variant. You'll find one that is fairly close to the original at Syntax von Pascal In Backus-Naur Form (BNF)[^]
BNF is still used today, but is considered somewhat outdated by quite a few people. So various groups have extended and augmented it significantly, and later replaced it by similar languages - which might be viewed as alternative derivatives of Chomsky, rather than derivatives of BNF. Some of the changes are cosmetic, or in the style of 'I want to save a couple keystrokes when typing!', such as reducing '::=' to ':'. Whether or not any one of these alternatives is "better" than classical BNF is a matter of personal taste.
Don't misunderstand my comments: All that you say is perfectly valid as long as we limit ourselves to program code represented as a linear sequence of printable characters, all editable by the programmer.
That is a very limiting context. From my very first post in this thread (almost two months ago), I have suggested that we extend the scope to other representation formats:
trønderen wrote: My hope (but I am not very optimistic!) is that all application programming will move over to an abstract representation where the language form is merely a display phenomenon; the program itself is stored in an abstract, language independent form. For keywords, this is simple, but it would require a mechanism where code maintainers could assign alternate, language dependent tokens for programmer assigned names of variables, methods, constants, comments, ... This is what I have pushed in all my following posts.
I also wrote, in a more recent post (well ... two days later, November 14):
This is certainly extremely difficult, probably across the borderline to the impossible, if we insist on thinking along exactly the same tracks as we have always done before, refusing to change our ways even a tiny little bit. I sure can agree that it is fully possible to construct obstacles for preventing any sort of change in our ways of thinking. I am not hunting for that kind. I guess this point has been extensively highlighted by now.