Programming Languages – Power, Pedigree and Purpose
(or: What the 11001000110001011100001111010010¹ is it all about?)
Introduction
This article gives an overview of how and why programming languages have developed in the way that they have. It encompasses three main themes:
- Power: how languages can be differentiated in terms of their productivity rates
- Pedigree: how languages have developed over time
- Purpose: why power is not the sole determinant of choice
By the end the reader will be able to appreciate why clockwork is sometimes better than nuclear fusion and why Shakespeare was never asked to write flat-pack assembly instructions for IKEA.
Power
Programming languages have proliferated since the 1950s, each with its own focus (commercial, mathematical, algebraic) and power relative to that of other languages. For instance, if the code for an equivalent function requires 40 instructions to be written in Language1 (L1), 13 in Language2 (L2) and 11 in Language3 (L3), we can say that L2 is 40/13 (about 3.1) times as powerful as L1, that L3 is 40/11 (about 3.6) times as powerful as L1, and that L3 is 13/11 (about 1.2) times as powerful as L2.
So, whereas there is not much difference in the amount of program code (in terms of programmer-produced lines or statements) that can be accurately written in a given period using different languages, there can be considerable variation in what that code achieves.
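The arithmetic is simply one statement count divided by another. A minimal Python sketch using the hypothetical counts above (`statements` and `relative_power` are illustrative names, not anything from Function Point Analysis):

```python
from fractions import Fraction

# Instructions needed to code an equivalent function in each language
# (the hypothetical figures from the example above).
statements = {"L1": 40, "L2": 13, "L3": 11}

def relative_power(a: str, b: str) -> Fraction:
    """How many times as powerful language `a` is as language `b`:
    b's statement count divided by a's."""
    return Fraction(statements[b], statements[a])

for a, b in [("L2", "L1"), ("L3", "L1"), ("L3", "L2")]:
    ratio = relative_power(a, b)
    print(f"{a} vs {b}: {ratio} = {float(ratio):.2f}")
# L2 vs L1: 40/13 = 3.08
# L3 vs L1: 40/11 = 3.64
# L3 vs L2: 13/11 = 1.18
```

Exact fractions keep the ratios consistent: L3's advantage over L1 is exactly its advantage over L2 multiplied by L2's advantage over L1.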
The concept of the relative power of programming languages has been explored before. Readers familiar with Function Point Analysis may have come across the language list authored by Capers Jones, a trailblazer and luminary in the study of software productivity, and published by Software Productivity Research in the 1990s. As an example, Capers Jones reckons that C++ out-achieves PASCAL in the approximate ratio 9:5. In other words, C++ is about 1.8 times as productive as PASCAL, or nearly twice, for realising the equivalent functionality.
This is what we mean by relative power.
But how did these differences arise? And why doesn’t everyone use L3? Let’s start with a short history.
Pedigree
Every programming language is in some way a representation of the operations performed by a computer, though there may be many intermediate levels of translation and interpretation between the statements coded by the programmer on the one hand and the computer’s raw binary on the other. For example, an instruction to copy two characters from one part of the computer’s memory to another on the IBM System/360 architecture (born in the 1960s and still going strong) might look like this in binary:
110100100000000100110000000000010101000000000100²
You can infer from the foregoing, if you didn’t already know, that binary notation consists solely of the digits 0 and 1. These values correspond to the possible settings of a two-way switch, namely On/Off (or Yes/No or Open/Closed if you prefer). Electronic processors use a lot of these switches, at extreme speed, but that’s essentially it. So now you know – a computer is just a vast array of highly trained miniature refrigerator doors.
As the above example shows, binary notation is unwieldy to say the least. However, a more concise representation can be obtained by portraying binary numbers (whose base is 2 and wherein only the digits 0 and 1 exist) in a system with a higher base. The more familiar decimal system (base 10), supporting the digits 0 to 9, is an example of such a higher-base notation system.
For computing purposes it makes sense to use a system whose base is a power of 2, so that each digit stands for a fixed number of binary digits. Octal (base 8, or 2 to the third power: three binary digits per octal digit) and hexadecimal (base 16, or 2 to the fourth power: four binary digits per hexadecimal digit) both satisfy this requirement, the choice generally reflecting the hardwired architecture of the system in question. It may occur to you that hexadecimal, having a larger base than decimal, requires a way of representing the numbers 10 through 15 as single digits. You surmise correctly; the convention is to use the letters A through F.
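Because 8 and 16 are powers of 2, converting from binary needs no real arithmetic: split the bits into threes (octal) or fours (hexadecimal), counting from the right, and read each group as a single digit. A minimal Python sketch of that idea; the function name `regroup` is illustrative, and the sample byte 11010010 is simply the first eight bits of the instruction shown above:

```python
DIGITS = "0123456789ABCDEF"  # values 10 to 15 are written A to F

def regroup(bits: str, bits_per_digit: int) -> str:
    """Convert a binary string to base 2**bits_per_digit by cutting it
    into fixed-size groups of bits, counted from the right."""
    # Pad on the left so the length is a whole number of groups
    # (the -(-n // k) idiom is ceiling division).
    width = -(-len(bits) // bits_per_digit) * bits_per_digit
    bits = bits.zfill(width)
    groups = [bits[i:i + bits_per_digit] for i in range(0, width, bits_per_digit)]
    # Each group of bits becomes exactly one digit.
    return "".join(DIGITS[int(group, 2)] for group in groups)

print(regroup("11010010", 3))  # 011|010|010 -> 322 (octal)
print(regroup("11010010", 4))  # 1101|0010  -> D2  (hexadecimal)
```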
As an example of how brevity is determined by base: the decimal number 123 is 01111011 in binary and 7B in hexadecimal (7 × 16 + 11, where B is the hexadecimal digit for 11). Using hexadecimal, the string of harpoons and tadpoles at the end of the first paragraph of this section dwindles to D20130015004.
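Python's built-in `int(text, 2)` and `format(value, "X")` can confirm both conversions in the paragraph above. A quick check, assuming the 48-bit instruction from the start of this section:

```python
# Binary form of the System/360 instruction shown at the start of this section.
bits = "110100100000000100110000000000010101000000000100"

value = int(bits, 2)             # read the string as a base-2 number
hex_form = format(value, "X")    # re-express it in base 16, upper-case digits
print(hex_form)                  # D20130015004
print(len(bits), "binary digits ->", len(hex_form), "hexadecimal digits")  # 48 -> 12

# The smaller example: decimal 123.
print(format(123, "08b"), format(123, "X"))  # 01111011 7B
```

Each hexadecimal digit replaces exactly four binary digits, hence the fourfold shrinkage.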
Less of a risk to sanity, to be sure, but not exactly a-quiver with meaning. So now we invent Assembler Language. Its instructions still correspond one-for-one with binary machine instructions (which is why it’s referred to as a ‘low-level’ language), but they can be written using mnemonics. Of course, you have to write a special program, itself called an assembler, to translate the programmer’s mnemonic-based code into binary, which raises the question: in what language is that special program itself written? But that’s another, albeit very interesting, story.
So we might see:
MVC 1(2,3),4(5)
MVC is the mnemonic for MoVe Characters (though in actuality it’s a Copy that’s being performed).
In plain (?) English:
COPY 2 CHARACTERS TO THE STORAGE THAT IS ONE CHARACTER ON FROM THE ADDRESS IN GENERAL REGISTER 3 FROM THE STORAGE THAT IS 4 CHARACTERS ON FROM THE ADDRESS IN GENERAL REGISTER 5
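For the curious, this is roughly what the assembler does with MVC 1(2,3),4(5): it packs the operation code, the length and the two base-register/displacement pairs into six bytes, giving the D20130015004 shown earlier. A minimal sketch, assuming the System/360 SS instruction layout (8-bit operation code, 8-bit length stored as length minus one, then a 4-bit base register and a 12-bit displacement for each operand). `encode_mvc` is a hypothetical helper that handles only this one instruction and none of the symbolic names a real assembler resolves:

```python
def encode_mvc(d1: int, length: int, b1: int, d2: int, b2: int) -> str:
    """Assemble MVC D1(L,B1),D2(B2) into SS-format machine code (as hex).
    Hypothetical helper: covers MVC only, no symbols, no error listing."""
    if not 1 <= length <= 256:
        raise ValueError("MVC moves 1 to 256 bytes")
    if not (0 <= d1 < 4096 and 0 <= d2 < 4096):
        raise ValueError("displacements are 12-bit values")
    if not (0 <= b1 < 16 and 0 <= b2 < 16):
        raise ValueError("general registers are numbered 0 to 15")
    word = 0xD2                        # operation code for MVC
    word = (word << 8) | (length - 1)  # the machine stores length minus one
    word = (word << 4) | b1            # base register of the destination
    word = (word << 12) | d1           # displacement of the destination
    word = (word << 4) | b2            # base register of the source
    word = (word << 12) | d2           # displacement of the source
    return format(word, "012X")        # 6 bytes = 12 hexadecimal digits

code = encode_mvc(1, 2, 3, 4, 5)       # MVC 1(2,3),4(5)
print(code)                            # D20130015004
print(format(int(code, 16), "048b"))   # the same six bytes in binary
```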
In practice, the areas of storage concerned would normally be given names, and the Assembler programmer would code something like this: MVC PUB,AUNTIE. This is clearer, though perhaps counter-intuitive: in English we would normally move something from one place to another; in Assembler we move to somewhere from somewhere else. PUB resolves into the 1(2,3) part of the above instruction and AUNTIE into the 4(5) part.
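To make the to/from direction concrete, here is a toy Python model of what MVC does when it runs. It assumes a flat `bytearray` for storage and a dict for the general registers, both simplifications made up for illustration (real storage holds EBCDIC, not ASCII). Note that registers 3 and 5 must hold sensible addresses before the copy, which is exactly the priming the Assembler programmer has to arrange:

```python
def mvc(memory: bytearray, regs: dict, d1: int, length: int,
        b1: int, d2: int, b2: int) -> None:
    """Toy model of MVC D1(L,B1),D2(B2): copy `length` bytes TO the first
    operand's address FROM the second operand's address."""
    dest = regs[b1] + d1               # first operand: where the bytes go
    src = regs[b2] + d2                # second operand: where they come from
    for i in range(length):            # one byte at a time, left to right
        memory[dest + i] = memory[src + i]

memory = bytearray(b"----------OK----")  # 16 bytes of toy storage
regs = {3: 0, 5: 6}                      # registers 3 and 5 primed beforehand

mvc(memory, regs, 1, 2, 3, 4, 5)         # MVC 1(2,3),4(5)
print(memory.decode())                   # -OK-------OK----
```

The source bytes at offsets 10 and 11 are still there afterwards, which is why the text calls MVC a copy rather than a move.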
You would still have to ensure that the correct values were in General Registers 3 and 5 before you started, and sundry other variables would need to have been primed – requiring maybe a half-dozen or so instructions before the Move could be performed, so… time for a more sophisticated mode of expression. Step up COBOL and show us what you can do:
MOVE AUNTIE TO PUB
That’s better, even though we’ve had to write another special program, a COMPILER, to translate our pseudo-English into computer-comprehensible instructions (and even though the Compiler itself runs on that very same computer: so what compiles the Compiler? I told you it was interesting).
I describe COBOL as ‘pseudo-English’ because the Compiler recognises a precise range of English words and hyphenated constructs (IF, END-IF, MOVE, EVALUATE, SPACES, SUBTRACT, DIVIDE etc.) and rejects any statement that does not fit its rules – you can MOVE AUNTIE TO PUB but you can’t DRIVE her there, nor can you MOVE AUNTIE FROM PUB (one should not infer anything about Auntie’s habits from these examples).
Not only is COBOL easier to read than Assembler, it also takes care of chores that would otherwise have to be coded explicitly, such as initialising the general registers and establishing exactly where in memory the program has been loaded. The Compiler generates the required support instructions with no effort on the part of the programmer.
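A fixed vocabulary is easy to model. The toy Python translator below is a hypothetical illustration, nowhere near a real COBOL grammar or compiler: it recognises only the form MOVE source TO target, rejects everything else, and turns what it accepts into the Assembler MVC form used earlier, swapping the operand order along the way:

```python
import re

# A deliberately tiny, made-up vocabulary: MOVE <name> TO <name>, nothing else.
NAME = r"[A-Z][A-Z0-9-]*"
MOVE_STATEMENT = re.compile(rf"MOVE\s+({NAME})\s+TO\s+({NAME})")

def translate(statement: str) -> str:
    """Accept 'MOVE source TO target' and emit 'MVC target,source';
    raise ValueError for anything outside the vocabulary."""
    match = MOVE_STATEMENT.fullmatch(statement.strip())
    if match is None:
        raise ValueError(f"not recognised: {statement!r}")
    source, target = match.groups()
    return f"MVC {target},{source}"    # Assembler puts the destination first

print(translate("MOVE AUNTIE TO PUB"))  # MVC PUB,AUNTIE
for bad in ("DRIVE AUNTIE TO PUB", "MOVE AUNTIE FROM PUB"):
    try:
        translate(bad)
    except ValueError as err:
        print(err)
```

A real compiler would also generate the register set-up and other support instructions around that MVC.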
COBOL is an early example of what are called ‘high-level’ languages, where there is a one-to-many relationship between the instructions coded by the programmer and those generated therefrom by the compiler.
By the way, in case you were wondering, clarity is not necessarily a feature of linguistic sophistication, as this excerpt from a Visual Basic 6 program illustrates:
Private Declare Function RegCreateKeyEx _
Lib "advapi32" Alias "RegCreateKeyExA" (ByVal hKey As Long, _
ByVal lpSubKey As String, ByVal Reserved As Long, ByVal lpClass As String, _
ByVal dwOptions As Long, ByVal samDesired As Long, _
ByRef lpSecurityAttributes As SECURITY_ATTRIBUTES, phkResult As Long, _
lpdwDisposition As Long) As Long
So that’s clear.
Purpose
So why not always opt for the most powerful language available? One reason is that the code generated by, for example, a COBOL Compiler tends to be far less efficient than hand-crafted Assembler. The functionality of a 1000-instruction Assembler program could typically be replicated in about 280 COBOL statements, but from those 280 statements the COBOL Compiler might generate as many as 3000 machine instructions. Such ‘verbosity’ could be significant where performance is an issue or memory is constrained. But there is another, more compelling reason.
In the first paragraph of the section titled Power (above) I purposely said ‘an equivalent’ rather than ‘the same’ function. Why? Because some operations are simply not available in certain languages (you couldn’t use RPG to manipulate Windows-style forms, or XML to calculate square roots). This also explains why, given a choice, we might opt for an apparently less productive language: it can reach problems that other languages cannot.
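The figures above can be put through the same arithmetic as in the Power section. A short Python sketch using only the typical numbers quoted in this paragraph (they are illustrative, not measurements):

```python
# Typical figures quoted above.
assembler_instructions = 1000  # hand-crafted Assembler program
cobol_statements = 280         # COBOL statements for the same functionality
generated_instructions = 3000  # machine instructions the Compiler may emit

power = assembler_instructions / cobol_statements          # programmer-side gain
expansion = generated_instructions / cobol_statements      # instructions per COBOL statement
overhead = generated_instructions / assembler_instructions # versus hand-crafted code

print(f"COBOL is about {power:.1f} times as powerful as Assembler")
print(f"each COBOL statement yields about {expansion:.1f} machine instructions")
print(f"the compiled program has about {overhead:.0f} times as many instructions")
```

The same 280 statements that make COBOL roughly three and a half times as powerful to write also make the result roughly three times as bulky to run.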
Yes, you can manipulate individual binary digits in COBOL, but you really don’t want to. Trust me. Horses for courses, as they say. It wouldn’t be the same if Homer Simpson exclaimed ‘11000100011111011101011011001000’,³ would it?
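The binary strings in the title and in Homer’s exclamation are 8-bit EBCDIC codes. A short Python sketch that produces such strings, assuming code page 037 (Python’s `cp037` codec, one common EBCDIC variant); it needs a straight apostrophe, because the typographic ’ has no code in that code page:

```python
def ebcdic_bits(text: str) -> str:
    """Encode text as EBCDIC (code page 037) and spell out each byte
    as eight binary digits."""
    return "".join(format(byte, "08b") for byte in text.encode("cp037"))

print(ebcdic_bits("HECK"))  # 11001000110001011100001111010010
print(ebcdic_bits("D'OH"))  # 11000100011111011101011011001000
```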
1 This is the word HECK in EBCDIC (Extended Binary Coded Decimal Interchange Code), written out as 8-bit binary.
2 The 8-bit binary form of the hexadecimal string D20130015004, which appears later in this section.
3 You guessed correctly. It’s D’OH in EBCDIC, written out in binary.