Niklaus Wirth
Jump to navigation
Jump to search
Scholar
| Scholar | |
|---|---|
| wikiDataId | Q92604→Q92604 |
| name | Wirth |
| firstName | Niklaus |
| description | Swiss computer scientist (1934–2024) |
| homepage | https://people.inf.ethz.ch/wirth/ |
| orcid | → |
| dblpId | w/NiklausWirth→w/NiklausWirth |
| googleScholarUser | → |
| linkedInId | → |
| researchGate | → |
| gndId | 120777983→120777983 |
| smartCRMId | → |
| affiliations | →[[]] |
Works
On the design or programming languages
OCR by deepseek
ON THE DESIGN OF PROGRAMMING LANGUAGES*
N. WIRTH
This paper reports on some past experiments in the design of programming languages. It presents the view that a language should be simple, and that simplicity must be achieved by transparence and clarity of its features and by a regular structure, rather than by utmost conciseness and unwanted generality. The paper contains an overview of the language designer's problems and dilemmas, and ends with some hints drawn from past experience.
In order to prevent misunderstanding or even disappointment, I should like to warn the reader not to interpret this title as an announcement of a general critique of commonly used languages. Although this might be a very entertaining subject, probably all that can be said about it has been said and heard by those willing to listen. There is no reason to repeat it now. Neither do I intend to present an objective assessment of the general situation in the development of programming languages, the technical trends, the commercial influences, and the psychology of their users. The activities in this field were enormous over the past years, and it would be presumptuous to believe that a single person could present a comprehensive, objective picture. Moreover, many aspects have been aptly reviewed and commented elsewhere (1-3).
Instead, I should like to convey a view of the development of design attitudes, of the shift of emphasis in design goals over the past decade. Also, I will try to provide some insight into the multitude of problems aspects, and demands that the designer of a language is facing, and to woo gently for recognition of the difficulties of this profession. Let me start by recalling some of my own reminiscences of incidents that caused me to end up in this role of language designer.
AN EXCURSION INTO "MISTORY"
My interest in computers had been awakened when I was an engineering student in the late 1950s; I was fascinated on the one hand by simplicity and inherent reliability of the basic building blocks, the digital circuits, and on the other hand by the astounding variety of effects that could be obtained by combining many of them in different ways. Moreover, these combinations could be varied not by extensive and cumbersome use of the soldering iron, but by merely composing ingenious sequences of hexadecimal digits placed on a magnetic drum. As the general task of an engineer is the improvement of his technical gadgets, I perceived that computers were an ideal ground for engineering activities, and felt that there was ample room for further improvement. This assessment turned out to be correct up to the present day. But how was progress to be achieved? One way was by enhancing the reliability and effectiveness of their electronic components. The other - hardly defined path - seemed to be to make computers more conveniently usable, to make them less of an exclusive domain of the highly trained specialist.
This was the situation when I entered graduate school at Berkeley: In one room there stood a huge prototype of a computer with a bewildering number of wires and tubes. In a much smaller room nearby there were a few such specialists, talking about a "language" and a "translator". Luckily for me, a student helping to bring the big monster of a computer into operation didn't report too enthusiastically about that project, and I perceived the feeling that the future of computer hardware design did not lie in a university department anyway. Hence, I decided to explore the other alley, although that group was surrounded by skepticism, as word passed that some of these people did neither know Ohm's law nor Maxwell's equations.
The new and fascinating project under the direction of H.D. Huskey consisted in adding facilities to the programming language WELLAC by extending the translator program. This program was, remarkably, coded in the very language that it was compiling and, in retrospect, quite advanced for its time (4). Indeed, the fascination of the project originated much more from the sense of adventurousness than from the satisfaction of achieved perfection. For, although programming in WELLAC proved to be considerably more convenient than exercises in the cryptology of hexadecima codes, the room for still further improvements continued to appear unlimited. Looking at it from the distance, that compiler was a horror; the excitement came precisely from having insight into a machinery that nobody understood fully. There was the distinct air of sorcery.
Then Algol 60 appeared in the literature and - after the difficulties of learning the new syntactic formalism were mastered - began to provide some relief from the oppressive feeling that there was no way to bring order into large languages and compilers. Yet even then, the task of constructing an Algol
*Reprinted from Proc. IFIP Congress 74, 386-393, North-Holland, Amsterdam, North-Holland Publishing Company.
The author is with the Institut für Informatik, Eidg. Technische Hochschule, Zurich, Switzerland.
Page 2
24 Programming Languages: History and Good Design
compiler generating acceptably good code appeared enormous. The more the Algol compiler project neared completion, the more vanished order and clarity of purpose. It was then that I clearly felt the distinct yearning for simplicity for the first time. I became convinced that we should learn to master simpler tasks before tackling big ones, and what we need to be equipped with much better linguistic and mental tools. But apparently "useful" languages had to be big.
SIMPLICITY IN GENERALITY
In this situation, A. van Wijngaarden appeared like a prophet with his idea of Generalised Algol (5). His point was that languages were not only too complex, but due to this very complexity also too restrictive. "In order that a language be powerful and elegant it should not contain many concepts and it should not be defined with many words" The new trend was to discover the fundamental concepts of algorithms, to extract them from their various incarnations in different language features, and to present them in a pure, distilled form, free from arbitrary and restrictive rules of applicability.
The following short example may illustrate the principle. In Algol 60, the concept of a subprogram appears in two features: as declared procedures, and as name parameter to procedures. The passing of a parameter is realised as an assignment of an object (the actual parameter) to a variable (the formal parameter). A simplification and concurrent increase in power and flexibility of Algol can therefore be obtained by unifying the notions of procedure and parameter, by letting them become objects that can be assigned to variables like numbers or logical values. Let such an object be denoted by the program text enclosed by quote marks; the correspondence between constructs of Algol 60 and the generalised notation are shown by the following table:
Algol 60 Generalisation
procedure P: <statement> P := <statement>
Q(<expression>) Q(*<expression>)
The gain in simplicity of language is obvious and the resulting gain in flexibility is striking. For example, it is now possible to assign different subroutines to a variable at different times, or even to replace a function subroutine by a constant! We are suddenly offered the power so far reserved to the assembly language coder letting his program modify some of its own instructions.
Implementation of such far reaching generalisations were on open challenge. I decided to investigate whether these concepts could be condensed into a minimal language and compiler, as a language without compiler seemed to be of marginal value to me. This effort resulted in my first programming language, called Euler (5). It was a success in several ways, certainly if measured by the number of subsequent implementations on a wide variety of computers. The language was accepted as an intellectual challenge, a flexible and powerful vehicle. Its simplicity and compactness made it an ideal implementation exercise for many prospective compiler engineers. Its greatest value, however, lay in revealing how simplicity should not be understood and achieved.
The premise that a language should not be burdened by (syntactical) rules that define meaningful texts (5) led to a language where it was difficult and almost impossible to detect a flaw in the logic of a program. It led to what I like to call a high-level Turing machine. Making mistakes is human (particularly in programming), and we need all the help possible to avoid committing them. But how can one expect a language to aid in avoiding mistakes, if it is even incapable of assisting in their detection.
The lesson is, then, that if we try to achieve simplicity through generality of language we may end up with programs that through their very conciseness and lack of redundancy elude our limited intellectual grasp. It is a mistake to consider the prime characteristic of high-level languages to be the they allow to express programs merely in their shortest possible form and in terms of letters, words, and mathematical symbols instead of coded numbers. Instead, the language is to provide a framework of abstractions and structures that are appropriately adapted to our mental habits, capabilities, and limitations. The "distance" of these abstractions from the actual realisation in terms of a computer is an established measure for the "height" of a language's level.
The key, then, lies not so much in minimising the number of basic features of a language, but rather in keeping the included facilities simple to understand in all their consequences of usage and free from unexpected interactions when they are combined. A form must be found for these facilities which is convenient to remember and intuitively clear to a programmer, and which acts as a natural guidance in the formulation of his ideas. (The language should not be burdened with syntactical rules, it must be supported by them. They must therefore be purposeful, and prohibit the construction of ambiguities.) It is a good idea to employ adequate, concise key words, and to (privilege) that they can be used in any other way. Prolixity is to be avoided, as it introduces a wrong kind of redundancy.
LEVELS OF ABSTRACTIONS
One of the most crucial steps in the design of a language is the choice of the abstraction upon which programs are to base. They can be selected only if the designer has a clear picture of the purpose of his language, or the area of its intended application. But all too often that purpose is not neatly specified and includes so many diverse aspects that a designer is given only inadequate guidance from prospective users. But at least he should restrict his selection to abstractions from the same level which are in some sense compatible with each other. I should like to offer three examples to this topic.
Page 3
Algol 60 has chosen a well-defined set of abstract objects of computation: numbers and logical values, replacing bits and words as used on a lower level. The operations that are applicable to them are governed by mathematical laws and axioms which can be understood without referring to the number's representation in terms of bits and words. In the realm of control structures, the language introduces the operations of selective execution and repetition in the form of neatly structured statements. But their form was not sufficiently flexible - e.g. repetition is intimately coupled with a variable progressing through an arithmetic series of values, and selection is only provided among two alternatives. To provide the user with a facility for cases not covered by these control structures, the designers of Algol resorted to borrowing the universally applicable jump order from the lower level of machine coding. The goto statement is but a polished form of the jump.
This may not seem too serious in itself. But consider that now the programmer is able to use these facilities combined. The following example — which an honest Algol programmer will refrain from using, or otherwise will at least get a bad conscience — shows the point:
for i := 1 step 1 until 100 do begin S; if p then goto 17 end
The whole purpose of the for clause is to proclamate to the reader: "the qualified statement is going to be executed once for every i = 1,2,3,...,100". The jump, however, may sneakily cause this promise to be broken! Whereas jumping out of a substructure may be considered to be a matter of morale only, jumping back into a structure even raises technical problems. They require the establishment of protective rules and restrictions which are afflicted by the stigma of improvise and afterthought. They complicate the language by burdening its definition, its comprehension, and its compiler.
The second example concerns the notion of pointers or references in high-level languages. When programming in assembly code, probably the most powerful pitfall is the possibility to compute the address of a storage cell that is to be changed. The effective address may range over the entire store (and even be that of the instruction itself). A very essential feature of high-level languages is that they permit a conceptual dissection of the store into disjoint parts by declaring distinct variables. The programmer may then rely on the assertion that every assignment affects only the variable which explicitly appears to the left of the assignment operator in his program. He may then focus his attention to the change of that single variable, whereas in machine coding he always has — in principle — to consider the entire store as the state of the computation. The necessary prerequisite for being able to think in terms of safely independent variables is of course the condition that no part of the store may assume more than a single name. Whereas this highly desirable abstraction is possible for the data declared in a program as simple and array variables, it cannot be maintained in the case of variables referenced through pointers. There, the crucial rule that "no variable may be denoted by more than one name" is broken, with the consequence that an assignment to a variable P may unexpectedly affect the value of a variable Q known under a different name. This breach of abstraction is again borrowed from the lower level of absolute addressing, and consequently it introduces all the complications inherent in the lower level, particularly if the pointers may be used to address any kind of variable, as it is the case in PL/1. The less powerful (and therefore less flexible) pointer concept of Pascal avoids this confusion of levels by restricting pointers to reference a single, specific type of data. This maintains the possibility to consider the referenced data as belonging to a separate, "anonymous" store not accessible under explicitly declared names. In this way, pointers are not used to circumvent the principle of distinct names and thereby do not jeopardize the technique of localized reasoning.
The third example deals with the notion of data types. The concept of type, as it has been introduced in languages like Algol 60, Pascal, and others, is a marvellous tool to formulate abstractions of objects occurring in computational processes. By requiring that each constant, variable, parameter, and function designator be associated with a type, the programmer is forced to specify for each object precisely its abstract structure. The compiler can then verify that the operations applied to the object are in accordance with its definition. This verification is essentially a static (compile-time) check, and as such serves the purpose of detecting an inconsistency even before a program is executed. However, due to static checks being limited to properties decidable at compile time, the language designer is under strong pressure to include loopholes in order to cover cases that are not decidable, but in the programmer's mind "just couldn't go wrong". Such loopholes are, e.g., the so-called variant records with a tagfield that is not present, or — more generally — the programmer's option to disregard types and to relax checking in certain cases. The arguments for the inclusion of such features are indeed hard to refute. Let me propose a typical case: We are requested to construct a program that merges two files of persons with data identified by their names. The most efficient method to do this is to sort both files. For an efficient sort we must be able to compare names. The operation of comparison must be meaningful for the data "name". Hence, it must either be an explicitly programmed comparison of the component characters, or we must allow the built-in comparison of names as an operation applicable to operands of a certain type. If names are defined as character arrays of varying length, the latter possibility is not provided by the language (Pascal). But should we therefore be allowed to declare a type representing names in a way so that comparison of entire names is possible? An implementor might find this highly desirable; but the designer of a general purpose language would do well to resist such temptations and to keep his language as simple and safe as possible, and to consider the desire for a built-in "total comparison" operation as a justified request for a type with associated operations, i.e. a new basic type. The "string" is precisely such a type, and many programming languages regrettably ignore it. The designer must be aware, however, that he then opens a new dimension, and that soon he will be asked for strings with variable length, and string concatenation, etc. So he may well consider string handling to be beyond the intended scope of his language, and therefore decide to leave it out altogether. The important thing is to recognize that a built-in facility, in order to be genuinely useful and in order not to jeopardize the very concept of type, must be a data type in its own right, complete with a set of applicable operators. The issue of providing loopholes to the type system under the pretext of "efficiency" is a highly controversial subject, and must be handled with extreme care. A language without such loopholes is safe, but may be too confining for many applications; a language with them is flexible, but also hazardous and difficult to comprehend.
From these examples, I should like to draw the following conclusions: a language should offer facilities that are abstractions of concepts that naturally occur in applications. They must be selected from one level only, and there should be no temptation to borrow a concept from a lower level under the pretext of "efficiency" or "flexibility". The borrowing invariably leads to a breach of the chosen level of abstraction, with consequences that are technically and intellectually difficult to master. The designer of a new language should be aware of the pitfalls of providing "general" solutions, of making allowances for every possible case, of implementing every feature that seems desirable to him. In his effort not to be restrictive, he might end up with a language that is restrictive after all because nobody fully understands it. In the name of elegance and expressive power he may in fact produce the opposite. One may well formulate the rule: A language should be extended only if a feature is left out, not if a new one is added.
I am perfectly aware that my argumentation will be considered outdated, if not foolish, by many of my colleagues. In particular, it seems to be contrary to the most fashionable recent trends in programming methodology. Indeed, in the past years, the advocates of what I may call "safe" languages have been on the retreat. The emphasis is now on "extensible" languages, on "programming systems" rather than "languages". This raises the question of whether extensible languages are really the answer to the language designer's dilemmas.
EXTENSIBLE LANGUAGES
The desire to let the user tailor the language to his own specific needs is certainly an attractive one. What else can better satisfy everybody's wishes than a facility to extend a given nucleus of features? This would shift the burden from the designer's to the user's shoulders; and since there are many users, each can adapt the language to his own specific needs. The idea sounds convincing and almost irresistible. However, I shall try to show that extensibility has a number of snags.
First of all, it is important to note that an extension is something quite different from a mere abbreviation. An extension introduces new facilities, i.e. new notations for new concepts. But the main issues of language design, namely the choice of appropriate concepts, are not resolved by providing extensibility; they are merely passed on to the programmer. And an extension mechanism can only be built upon a set of basic concepts that constitute the nucleus. The crucial decision thereby remains: the choice of the nucleus. The quest for a good nucleus is exactly the same as for a good language. In fact, the extensibility idea can only be successful if the nucleus itself is so powerful, so general that all possible extensions can be mapped into it without effort, and that this mapping yields efficient object code. The history of extensible languages (6) shows that this is difficult, if not impossible, to achieve. There is a similarity here to the idea of Generalised Algol. It attempted to achieve simplicity through a minimal set of powerful concepts. The outcome was Euler, which is a marvellous demonstration of how ununderstandable a "simple" language can be.
The real cause of the difficulty lies in the fact that an extension must, in practice, be bound by the principle of "closure": the extended facility must be usable in exactly the same way as the basic facilities, i.e. it must be seamlessly integrated into the language. As a consequence, a language allowing extensions must allow them to be made for any data type and any operation. In other words, the programmer must be able to define new types and their associated operations. At this point, we suddenly realize that we have already progressed far beyond the original "simple" language: we are now dealing with a language with an extensible type system, in which types are objects themselves that can be manipulated by the program. In other words, the type concept is now part of the program's data universe. This, in turn, introduces a new level of complexity: types are now objects that can be created dynamically, and the compiler can no longer perform static type checking. The burden of type checking is shifted to run time, with a corresponding loss of efficiency and safety.
In conclusion, the idea of extensible languages, while appealing, leads to a significant increase in complexity, both for the implementor and for the user. It is not a panacea for the language designer's problems; it merely moves them to a different level.
SOME HINTS FOR DESIGNERS
After this long discussion, I should like to offer some practical hints, drawn from my own experience, for those who are about to embark on the design of a new programming language.
Define the purpose. Be clear about the intended application area of your language. Do not try to design a universal language; it will be too large and too complex. Restrict yourself to a well-defined domain.
Keep it simple. Simplicity is the most important goal. But simplicity does not mean a minimal number of features; it means that each feature is easy to understand in isolation and in combination with others. Avoid features that interact in unexpected ways.
Choose the right abstractions. Select a set of abstractions that are appropriate for the intended application area and that are at the same level. Do not mix abstractions from different levels.
Provide strong typing. A strong type system is the best tool for detecting errors early. It also serves as a form of documentation. But make the type system complete and consistent; do not provide loopholes.
Design for readability, not just writability. Programs are read much more often than they are written. The language should encourage the writing of clear, self-documenting code. Avoid cryptic notations.
Consider implementability. A language that cannot be implemented efficiently is of little practical use. Keep the implementation in mind during the design. A simple language is usually easy to implement.
Be conservative. Do not include a feature just because it seems fashionable or because it exists in another language. Every feature adds to the complexity. Be prepared to leave things out.
Test your design. Implement the language and use it for substantial projects. Only through practical experience will you discover the flaws in your design.
CONCLUSION
In this paper, I have tried to convey some of the insights I have gained from my own involvement in programming language design. The task is a difficult one, full of dilemmas and trade-offs. There is no perfect language, and there never will be. But by adhering to some basic principles, we can avoid the worst pitfalls and create languages that are useful, reliable, and even pleasant to use. The most important of these principles is simplicity, but simplicity must be understood in the right way: not as minimalism, but as clarity, transparency, and regularity. It is my hope that future language designers will keep this in mind.
REFERENCES
Sammet, J.E., Programming Languages: History and Fundamentals, Prentice-Hall, 1969.
Wegner, P., Programming Languages, Information Structures, and Machine Organization, McGraw-Hill, 1968.
Rosen, S. (Ed.), Programming Systems and Languages, McGraw-Hill, 1967.
Huskey, H.D., et al., The WELLAC System, Comm. ACM, Vol. 3, No. 3, 1960.
Wirth, N., and Weber, H., EULER: A Generalization of ALGOL, and its Formal Definition, Comm. ACM, Vol. 9, No. 1 and 2, 1966.
Solntseff, N., and Yezerski, A., A Survey of Extensible Programming Languages, Annual Review in Automatic Programming, Vol. 7, 1972.