Steven D'Aprano wrote:
On Fri, 30 Jul 2010 05:48:29 pm Frank Heckenbach wrote:
A typical example for me is BP (or, back then, TP), which, for all we know, was written with a parser hand-written in assembler. They designed some problematic syntax, worst of all "^C"-style character constants. A parser generator would have told them right away that these cause serious problems (and thus were a bad idea). Writing a parser by hand doesn't tell you so, so they happily added them, probably without even noticing that their own new feature didn't work in many cases because they had not thought of those cases.
I'm sorry, I don't understand what you mean by "^C" style constants, or why they are a bad idea. Can you explain please?
Borland had the brilliant idea of adding the syntax ^C for character constants to represent "Ctrl-C", i.e. Chr (3).
The major purpose seemed to be that key handlers could be written like this:
  case ReadKey of
    ^A: ...;
    ^B: ...;
  end;
The problem? "^" is already used in Pascal syntax, to define pointer types and to dereference pointers.
At first sight this might seem harmless, since the two uses seem to occur only in different contexts. But they don't, and that's why I said an automatic parser generator helps: it finds the problem immediately (as Bison did for us when we implemented this feature):
  type
    C = Integer;    { or whatever }
    X = ^C;         { pointer to C }
    Y = ^C .. ^D;   { character subrange }
As you see, X and Y look the same until "..", but the part before ".." has a radically different meaning. It's not impossible to handle this -- in fact GPC does it with some tricks. But BP itself can't handle it, which obviously means they didn't understand their own feature.
Of course, the whole issue is so ridiculous because this feature is so superfluous. It's a new syntax element for the sole purpose of defining 26 possible constants.(*) It would have been far easier (and unproblematic) just to define them as symbolic constants, say in the CRT unit:
  const
    CtrlA = Chr (1);
    [...]
(Sure, "CtrlA" is longer to type in the source code than "^A", but handling the complete alphabet costs only 3 extra characters per letter, 78 characters in total -- not an issue even for 1980s PCs.)
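For illustration, here is a minimal sketch of the symbolic-constant approach as a complete program, assuming a Borland-compatible dialect (e.g. Free Pascal or GPC). Only three of the 26 constants are shown, and the function name KeyName is hypothetical; in practice the constants would come from a unit such as CRT rather than being declared locally.

```pascal
program CtrlKeys;

{ Hypothetical excerpt: three of the symbolic constants suggested above.
  Declared locally here so the program compiles on its own. }
const
  CtrlA = Chr (1);
  CtrlB = Chr (2);
  CtrlC = Chr (3);

{ Hypothetical dispatcher, as a ReadKey-based key handler might use.
  The case labels are plain constants, so no "^" syntax is needed. }
function KeyName (Key: Char): String;
begin
  case Key of
    CtrlA: KeyName := 'Ctrl-A';
    CtrlB: KeyName := 'Ctrl-B';
    CtrlC: KeyName := 'Ctrl-C';
  else
    KeyName := 'other'
  end
end;

begin
  WriteLn (KeyName (Chr (3)));   { prints Ctrl-C }
  WriteLn (KeyName ('x'))        { prints other }
end.
```

Since the case labels are ordinary constant identifiers, the parser never has to guess whether "^" starts a pointer type or a character constant.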
(*) Actually, that's not quite correct. BP accepts not only letters after "^" in this meaning, but any character -- yes, even "{". I leave it to your darkest fantasies to imagine the ambiguities this can cause. (I suppose this was unintentional, given that the formula determining which characters they produce is quite strange and only makes sense for letters, but how hard can it be to check that the character is actually a letter?)
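As for checking that the character is actually a letter, a sketch of such a check might look like this. It uses the common ASCII convention (Ctrl-A = Chr (1), ..., Ctrl-Z = Chr (26)), which agrees with ^C = Chr (3) above; it is not meant to reproduce BP's exact formula for other characters. The function name CtrlOf is hypothetical.

```pascal
program CtrlLetter;

(* Hypothetical helper: map a letter to its ASCII control character
   (A or a -> Chr (1), ..., Z or z -> Chr (26)) and reject anything else.
   Returns True and sets Ctrl only if Letter is a letter. *)
function CtrlOf (Letter: Char; var Ctrl: Char): Boolean;
var
  U: Char;
begin
  U := UpCase (Letter);
  if U in ['A' .. 'Z'] then
  begin
    Ctrl := Chr (Ord (U) - Ord ('A') + 1);
    CtrlOf := True
  end
  else
    CtrlOf := False
end;

var
  C: Char;
begin
  if CtrlOf ('c', C) then
    WriteLn (Ord (C));       { prints 3 }
  if not CtrlOf ('{', C) then
    WriteLn ('rejected')     { a brace is not a letter }
end.
```

The whole check is a single set-membership test, so rejecting non-letters costs essentially nothing.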
Frank