Based on a discussion taking place in another thread, I created a little program to help me test for mixed line endings in my source files; just thought I'd give it here in case it would help anyone else. Consider it public domain, no warranty, etc.
To use it, just compile (with GPC), then give it as command line arguments the name(s) of the text file(s) to be tested.
For example:
bash-2.05a$ ./countem countem.pas
countem.pas
---------------------
CRLF: 0
CR : 0
LF : 59
bash-2.05a$
I just thought someone might find this useful. No external dependencies, written and tested under MacOS X, but should just work in virtually any GPC-supported environment.
===== ======= Frank D. Engel, Jr.
Modify the equilibrium of the vertically-oriented particle decelerator to result in the reestablishment of its resistance to counterproductive atmospheric penetration.
"Frank D. Engel, Jr." wrote:
Name: countem.pas
Type: application/x-unknown
Encoding: base64
Description: countem.pas
I have quoted your binary attachment to meet normal usenet standards. It takes less resources (no base 64 encoding) and is safer as pure text within the message.
PROGRAM countem;

VAR
  f : FILE OF CHAR;
  cr, lf, cl, i : INTEGER;
  m : BOOLEAN;
  ch : CHAR;

BEGIN { Main Program }
  IF (ParamCount < 1) THEN
    BEGIN
      WRITELN('Usage: countem <filename>');
      HALT
    END;

  { Note that at least on *NIX, wildcards are expanded by the shell,
    so this allows 'countem *.pas' to work, for example }
  FOR i := 1 TO ParamCount DO
    BEGIN
      ASSIGN(f, ParamStr(i));
      RESET(f);
      cr := 0;    { counts Mac-type CR line endings }
      lf := 0;    { counts *NIX-type LF line endings }
      cl := 0;    { counts WinDOS-type CRLF line endings }
      m := FALSE; { helps to distinguish CR, LF from CRLF }
      WHILE NOT EOF(f) DO
        BEGIN
          READ(f, ch);
          IF m THEN
            BEGIN
              m := FALSE;
              IF (ch = CHR(10)) THEN
                INC(cl)
              ELSE IF (ch = CHR(13)) THEN
                BEGIN
                  INC(cr);
                  m := TRUE
                END
              ELSE
                INC(cr)
            END
          ELSE IF (ch = CHR(13)) THEN
            m := TRUE
          ELSE IF (ch = CHR(10)) THEN
            INC(lf)
        END;
      CLOSE(f);
      IF m THEN
        INC(cr);
      WRITELN(ParamStr(i));
      WRITELN('---------------------');
      WRITELN('CRLF: ', cl);
      WRITELN('CR : ', cr);
      WRITELN('LF : ', lf);
      WRITELN
    END
END.
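For readers who want to try the counting logic without a Pascal compiler, here is a minimal sketch of the same state machine in Python (not the language of the posted program). The function name `count_line_endings` and the sample bytes are illustrative assumptions; `pending_cr` plays the role of `m` in countem.pas.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in raw bytes."""
    counts = {"CRLF": 0, "CR": 0, "LF": 0}
    pending_cr = False  # True when the previous byte was a CR (like `m`)
    for byte in data:
        if pending_cr:
            pending_cr = False
            if byte == 0x0A:        # CR followed by LF: one CRLF ending
                counts["CRLF"] += 1
            elif byte == 0x0D:      # CR followed by CR: first one stands alone
                counts["CR"] += 1
                pending_cr = True
            else:                   # CR followed by ordinary data
                counts["CR"] += 1
        elif byte == 0x0D:
            pending_cr = True
        elif byte == 0x0A:
            counts["LF"] += 1
    if pending_cr:                  # input ended on a lone CR
        counts["CR"] += 1
    return counts


# Hypothetical sample with one CRLF, two lone CRs and one lone LF.
sample = b"one\r\ntwo\rthree\nfour\r"
print(count_line_endings(sample))  # {'CRLF': 1, 'CR': 2, 'LF': 1}
```

Working on bytes read in binary mode sidesteps any translation of line endings by the run-time, which is the concern raised in the reply below.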
Let me point out that your results are entirely dependent on how the OS and run-time handle line ending sequences. Remember that the default mode for files is text, and thus such translations would be enabled. For valid results you have to treat the file as binary, and you are then running into system variations. Probably the most portable method is basically:
TYPE
  byte = char;
  binfile = FILE OF byte;

VAR
  phyle : binfile;

BEGIN
  ....
  reset(phyle);
  WHILE NOT eof(phyle) DO BEGIN
    (* classify on the basis of phyle^ *)
    CASE ord(phyle^) OF
      10: (* depends on char encoding *)
      13:
      OTHERWISE (* ignore *)
    END; (* case *)
    get(phyle);
  END;
Also you should realize that, in Pascal, there is no reason for any end-of-line char or sequence to exist. When (for a text file) EOL is true, f^ is required to hold a blank. Many systems fail to implement this correctly. The raw file system may delimit lines in any way it pleases, including such things as counts in auxiliary streams, etc.
Your use of "read(f, ch)" above is the equivalent of:
ch := f^; get(f);
in any system that remotely implements any standard. Thus the presence of any so-called <lf> characters in your files is probably an illusion or an implementation failure.
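The general concern about a run-time translating line endings in text mode can be illustrated outside Pascal. The sketch below uses Python, whose text mode happens to normalize line endings on input; it is only an analogy and says nothing about how GPC treats `FILE OF CHAR`. The file name `mixed.txt` and the sample bytes are assumptions made for the demonstration.

```python
import os
import tempfile

raw = b"one\r\ntwo\rthree\n"  # one CRLF, one lone CR, one lone LF

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Text mode with the default newline handling: the run-time
    # translates CRLF and CR to "\n" before the program sees them.
    with open(path, "r", encoding="ascii") as f:
        as_text = f.read()

    # Binary mode: the bytes arrive exactly as stored.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(repr(as_text))   # 'one\ntwo\nthree\n'
print(as_bytes)        # b'one\r\ntwo\rthree\n'
```

A counter fed from the text-mode read would report three LF endings and nothing else, which is why the file has to be read untranslated for the counts to mean anything.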
A further mild criticism is that your detection of cr/lf sequences is in error, even if the non-standard assumptions made are valid.
CBFalconer wrote:
PROGRAM countem;
VAR f : FILE OF CHAR;
[...]
Let me point out that your results are entirely dependent on how the OS and run-time handle line ending sequences. Remember that the default mode for files is text,
Who says so? At least in GPC, that's the default only for textfiles.
and thus such translations would be enabled. For valid results you have to treat the file as binary, and you are then running into system variations. Probably the most portable method is basically:
TYPE byte = char; binfile = FILE OF byte;
How is this any more "binary" than `file of Char'?
Also you should realize that, in Pascal, there is no reason for any end-of-line char or sequence to exist.
That's true. -- But he only claimed portability to any GPC target, not any Pascal system. It's probably impossible to do this in pure standard (including Extended) Pascal. (But I'm ready to see examples to the contrary, especially from those who claim that standard/Extended Pascal is wholly suitable for real world programs ...)
Actually, if I read the standard correctly, the `Char' type doesn't need to contain anything but the digits.
When (for a text file) EOL is true, f^ is required to hold a blank. Many systems fail to implement this correctly. The raw file system may delimit lines in any way it pleases, including such things as counts in auxiliary streams, etc.
Your use of "read(f, ch)" above is the equivalent of:
ch := f^; get(f);
in any system that remotely implements any standard.
I think you're confusing `file of Char' with `Text'. Only textfiles have the special handling of line endings, `EOLn', etc.
A further mild criticism is that your detection of cr/lf sequences is in error, even if the non-standard assumptions made are valid.
I don't think so.
Frank
Frank Heckenbach wrote:
CBFalconer wrote:
PROGRAM countem;
VAR f : FILE OF CHAR;
[...]
Let me point out that your results are entirely dependent on how the OS and run-time handle line ending sequences. Remember that the default mode for files is text,
Who says so? At least in GPC, that's the default only for textfiles.
Ah - I didn't realize that gpc separated textfiles and file of char.
and thus such translations would be enabled. For valid results you have to treat the file as binary, and you are then running into system variations. Probably the most portable method is basically:
TYPE byte = char; binfile = FILE OF byte;
How is this any more "binary" than `file of Char'?
It isn't - I failed to read the source properly :-(
Also you should realize that, in Pascal, there is no reason for any end-of-line char or sequence to exist.
That's true. -- But he only claimed portability to any GPC target, not any Pascal system. It's probably impossible to do this in pure standard (including Extended) Pascal. (But I'm ready to see examples to the contrary, especially from those who claim that standard/Extended Pascal is wholly suitable for real world programs ...)
Actually, if I read the standard correctly, the `Char' type doesn't need to contain anything but the digits.
When (for a text file) EOL is true, f^ is required to hold a blank. Many systems fail to implement this correctly. The raw file system may delimit lines in any way it pleases, including such things as counts in auxiliary streams, etc.
Your use of "read(f, ch)" above is the equivalent of:
ch := f^; get(f);
in any system that remotely implements any standard.
I think you're confusing `file of Char' with `Text'. Only textfiles have the special handling of line endings, `EOLn', etc.
A further mild criticism is that your detection of cr/lf sequences is in error, even if the non-standard assumptions made are valid.
I don't think so.
And, re-reading that code, I agree with you.
CBFalconer wrote:
Ah - I didn't realize that gpc separated textfiles and file of char.
Yes, as required for (strict) standard conformance. ("Readln shall only be applied to textfiles." etc.)
Frank
On Wed, Aug 13, 2003 at 05:33:20AM +0200, Frank Heckenbach wrote:
Actually, if I read the standard correctly, the `Char' type doesn't need to contain anything but the digits.
And '''', per 6.1.9. Moreover, IIUIC, 6.10.* implicitly require that Char contains ' ', '-', '+', '.', and the letters 'a', 'e', 'f', 'l', 'r', 's', 't', 'u' in an implementation-defined case. Anyway, I seriously doubt that 6.4.2.2 d) was intended as "no letters at all is fine"; it might be a bug in the standard.
Emil
Emil Jerabek wrote:
Actually, if I read the standard correctly, the `Char' type doesn't need to contain anything but the digits.
And '''', per 6.1.9.
Yes -- but it doesn't have to stand for the apostrophe. So it would be allowed, e.g., to let '''' mean the space, while ' ' is not valid (i.e., the space is not a string-character), to maintain the required one-to-one correspondence.
Moreover, IIUIC, 6.10.* implicitly require that Char contains ' ', '-', '+', '.', and the letters 'a', 'e', 'f', 'l', 'r', 's', 't', 'u' in an implementation-defined case.
Indeed. Which leaves the question if 6.4.2.2 d) 2)/3) apply if only some letters exist. I wouldn't think so. So I propose the following character encoding for the "Really Stupid Pascal Compiler":
 0  -
 1  f
 2  A
 3  L
 4  s
 5  E
 6  u
 7  r
 8  T
 9  0
10  1
11  2
...
18  9
19  (space)
20  .
21  +
This encoding has the obvious advantage that `fALsE' and `TruE' are represented by the characters 1 to 5 and 8 downto 5 respectively. So instead of storing them as string constants, a compiler could construct them using `for' loops internally which should add a lot in the areas of inefficiency and space overhead.
The space follows the last digit, so that (somewhat common) overrun errors when outputting digits manually are less likely to result in visible faults.
The fact that '-' < '0' < '+' is, of course, mathematically a big improvement over ASCII.
Furthermore, the digit '2' is represented by 11 which might be useful for Roman numeral applications.
For character-strings I suggest (see above) '''' to mean the space and ' ' to mean '.' and nothing else. Not allowing too many characters here will ease the work for the compiler, whereas the programmer can use `Chr', anyway.
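The claimed properties of this encoding can be checked mechanically. The sketch below, in Python rather than Pascal, rebuilds the table from the listing above (filling the elided rows 12 to 17 with the digits 3 to 8, as the listing implies) and tests each claim; the names `RSPC` and `rspc_ord` are made up for the illustration.

```python
# Index in the list = ordinal value in the proposed encoding.
RSPC = (
    ["-", "f", "A", "L", "s", "E", "u", "r", "T"]   # codes 0..8
    + [str(d) for d in range(10)]                    # codes 9..18: digits 0..9
    + [" ", ".", "+"]                                # codes 19..21
)


def rspc_ord(ch: str) -> int:
    """Ordinal of a character in the proposed encoding."""
    return RSPC.index(ch)


# `fALsE` is characters 1 to 5, `TruE` is characters 8 downto 5.
false_text = "".join(RSPC[i] for i in range(1, 6))
true_text = "".join(RSPC[i] for i in range(8, 4, -1))
print(false_text, true_text)                # fALsE TruE

# '-' < '0' < '+', and the digit '2' sits at 11.
print(rspc_ord("-") < rspc_ord("0") < rspc_ord("+"))   # True
print(rspc_ord("2"))                        # 11

# The space directly follows the last digit.
print(rspc_ord(" ") == rspc_ord("9") + 1)   # True

# 22 characters in total, so a 5-bit code leaves 10 slots unused.
print(len(RSPC), 2**5 - len(RSPC))          # 22 10
```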
Anyway, I seriously doubt that 6.4.2.2 d) was intended as "no letters at all is fine"; it might be a bug in the standard.
Perhaps they mean that only upper or lower case letters are ok, but they don't seem to say so.
Frank
On Wed, Aug 13, 2003 at 04:26:13PM +0200, Frank Heckenbach wrote:
Emil Jerabek wrote:
Actually, if I read the standard correctly, the `Char' type doesn't need to contain anything but the digits.
And '''', per 6.1.9.
Yes -- but it doesn't have to stand for the apostrophe. So it would be allowed, e.g., to let '''' mean the space, while ' ' is not valid (i.e., the space is not a string-character), to maintain the required one-to-one correspondence.
Moreover, IIUIC, 6.10.* implicitly require that Char contains ' ', '-', '+', '.', and the letters 'a', 'e', 'f', 'l', 'r', 's', 't', 'u' in an implementation-defined case.
Indeed. Which leaves the question if 6.4.2.2 d) 2)/3) apply if only some letters exist. I wouldn't think so. So I propose the following character encoding for the "Really Stupid Pascal Compiler":
0 - 1 f 2 A 3 L 4 s 5 E 6 u 7 r 8 T 9 0 10 1 11 2 ... 18 9 19 (space) 20 . 21 +
Great! A 5-bit encoding, still having 10 free slots for i18n extensions :)
Emil
Emil Jerabek wrote:
Indeed. Which leaves the question if 6.4.2.2 d) 2)/3) apply if only some letters exist. I wouldn't think so. So I propose the following character encoding for the "Really Stupid Pascal Compiler":
0 - 1 f 2 A 3 L 4 s 5 E 6 u 7 r 8 T 9 0 10 1 11 2 ... 18 9 19 (space) 20 . 21 +
Great! A 5-bit encoding, still having 10 free slots for i18n extensions :)
Indeed. Perhaps we should add one Cyrillic, one Japanese and one Klingon letter for a start.
OTOH, if we don't add any more characters we could encode it in 4.5 bits which is certainly worth considering.
Frank
--- Frank Heckenbach frank@g-n-u.de wrote:
Emil Jerabek wrote:
Indeed. Which leaves the question if 6.4.2.2 d) 2)/3) apply if only some letters exist. I wouldn't think so. So I propose the following character encoding for the "Really Stupid Pascal Compiler":
0 - 1 f 2 A 3 L 4 s 5 E 6 u 7 r 8 T 9 0 10 1 11 2 ... 18 9 19 (space) 20 . 21 +
Great! A 5-bit encoding, still having 10 free slots for i18n extensions :)
Indeed. Perhaps we should add one Cyrillic, one Japanese and one Klingon letter for a start.
OTOH, if we don't add any more characters we could encode it in 4.5 bits which is certainly worth considering.
Particularly for PACKED data types ;-)
Frank
-- Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/, 7977168E GPC To-Do list, latest features, fixed bugs: http://www.gnu-pascal.de/todo.html GPC download signing key: 51FF C1F0 1A77 C6C2 4482 4DDC 117A 9773 7F88 1707
"Frank D. Engel, Jr." wrote:
--- Frank Heckenbach frank@g-n-u.de wrote:
Emil Jerabek wrote:
Indeed. Which leaves the question if 6.4.2.2 d) 2)/3) apply if only some letters exist. I wouldn't think so. So I propose the following character encoding for the "Really Stupid Pascal Compiler":
0 - 1 f 2 A 3 L 4 s 5 E 6 u 7 r 8 T 9 0 10 1 11 2 ... 18 9 19 (space) 20 . 21 +
Great! A 5-bit encoding, still having 10 free slots for i18n extensions :)
Indeed. Perhaps we should add one Cyrillic, one Japanese and one Klingon letter for a start.
OTOH, if we don't add any more characters we could encode it in 4.5 bits which is certainly worth considering.
Particularly for PACKED data types ;-)
We can store 7 of them in a 32 bit record, by using the technique of multiplying 7 values in the range 1..23. We can't allow a 0 char representation for this, but this allows expanding the char. set by 1 item. This is 4.57 bits per char.
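The arithmetic behind this can be sketched as follows, in Python rather than Pascal. This is one reading of the technique, assuming ordinary positional (base-23) packing; the values 1..23 are shifted to digits 0..22 internally, which is a choice made here and not something the post specifies. The names `pack` and `unpack` are hypothetical.

```python
import math

BASE = 23    # values per character, in the range 1..23
COUNT = 7    # characters per 32-bit word


def pack(values):
    """Pack COUNT values in 1..BASE into one integer below BASE**COUNT."""
    if len(values) != COUNT or not all(1 <= v <= BASE for v in values):
        raise ValueError("need exactly 7 values in the range 1..23")
    word = 0
    for v in values:
        word = word * BASE + (v - 1)
    return word


def unpack(word):
    """Inverse of pack: recover the COUNT values from the integer."""
    values = []
    for _ in range(COUNT):
        word, digit = divmod(word, BASE)
        values.append(digit + 1)
    return values[::-1]


print(BASE**COUNT, BASE**COUNT < 2**32)    # 3404825447 True
print(round(32 / COUNT, 2))                # 4.57 bits per character
print(round(math.log2(22), 2))             # information content of 22 symbols

chars = [23, 1, 12, 7, 23, 2, 19]          # arbitrary sample values
print(unpack(pack(chars)) == chars)        # True
```

Since 23^7 = 3404825447 is below 2^32 = 4294967296, the largest packed word still fits in 32 bits, which is where the figure of 32/7, about 4.57 bits per character, comes from.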
CBFalconer wrote:
We can store 7 of them in a 32 bit record, by using the technique of multiplying 7 values in the range 1..23. We can't allow a 0 char representation for this, but this allows expanding the char. set by 1 item. This is 4.57 bits per char.
By using a special technique called bit-tuning, we can in fact store 7 of them in the range 1..24, which gives us an important extra Klingon character. 24^7=4586471424 > 4294967296= 2^32, so how is this done ? Well, (24^7)=(12^7) * 128. The numbers 7, 12 and 128 are important ... if you take a closer look at the seven octaves of a piano.
In music, an octave consists of 12 halftones on the diatonic tone scale (on which all western music is based), where the frequency proportion of an octave is 1:2. The seven octaves of a piano have a total frequency proportion of 1:2^7 = 1:128.
Now, of course, we don't only play octaves, which would be boring, but also fifths (7 halftones, frequency proportion 2:3), fourths (5 halftones, 3:4), major thirds (4 halftones, 4:5) and minor thirds (3 halftones, 5:6), et cetera.
We see that a fifth plus a fourth sum to an octave, with a frequency proportion of (2:3) * (3:4) = 2:4 = 1:2. Likewise, a major third plus a minor third sum to a fifth (a quint), with a frequency proportion of (4:5) * (5:6) = 4:6 = 2:3.
Now, the quint is the crucial thing in bit-tuning techniques. Twelve of them form the circle of fifths, which fits exactly in the seven octaves of a piano. Those 12 quints sum to a frequency proportion of (3:2)^12 = 129.746... > 128 = 2^7, the frequency proportion of the 7 octaves. How is this done? Well, a good piano tuner can tell you; some of the best indeed stretch the octaves somewhat in the descant (the high notes), although this is officially not allowed.
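The numbers quoted in this message can be verified with exact arithmetic. A small Python sketch (variable names are illustrative) follows; the ratio printed last, by which twelve fifths overshoot seven octaves, is the gap the piano tuner has to hide.

```python
from fractions import Fraction

# 24^7 does not fit in 32 bits, and factors as 12^7 * 128.
print(24**7, 24**7 > 2**32)          # 4586471424 True
print(24**7 == 12**7 * 128)          # True

# Interval arithmetic: stacking intervals multiplies frequency ratios.
fifth, fourth = Fraction(2, 3), Fraction(3, 4)
major_third, minor_third = Fraction(4, 5), Fraction(5, 6)
print(fifth * fourth)                # 1/2, an octave
print(major_third * minor_third)     # 2/3, a fifth

# Twelve pure fifths against seven octaves.
twelve_fifths = Fraction(3, 2) ** 12
seven_octaves = Fraction(2) ** 7
print(float(twelve_fifths))          # 129.746337890625
print(twelve_fifths / seven_octaves) # 531441/524288, slightly above 1
```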
There is also a second technique, called bit resonance, used by the masters of the 16th and 17th century, but the technique has since been long forgotten, which I regret.
Not all platforms do support bit tuning (not to mention bit resonance) but --target=steinway-piano-grand does. This is a new platform, so it would be a good thing to start with a finger exercise, for example by writing a Turing machine for it. It has to use musical notation, of course, to make bit-tuning work ...
Regards,
Adriaan van Os