"Frank D. Engel, Jr." wrote:
Based on a discussion taking place in another thread, I created a little program to help me test for mixed line endings in my source files; just thought I'd give it here in case it would help anyone else. Consider it public domain, no warranty, etc.
To use it, just compile (with GPC), then give it as command line arguments the name(s) of the text file(s) to be tested.
For example:
bash-2.05a$ ./countem countem.pas countem.pas
CRLF: 0
CR : 0
LF : 59
bash-2.05a$
I just thought someone might find this useful. No external dependencies, written and tested under MacOS X, but should just work in virtually any GPC-supported environment.
Name: countem.pas
Type: application/x-unknown
Encoding: base64
Description: countem.pas
I have quoted your binary attachment to meet normal Usenet standards. Pure text within the message takes fewer resources (no base64 encoding) and is safer.
PROGRAM countem;
VAR
  f : FILE OF CHAR;
  cr, lf, cl, i : INTEGER;
  m : BOOLEAN;
  ch : CHAR;

BEGIN { Main Program }
  IF (ParamCount < 1) THEN
    BEGIN
      WRITELN('Usage: countem <filename>');
      HALT
    END;

  { Note that at least on *NIX, wildcards are expanded by the shell,
    so this allows 'countem *.pas' to work, for example }
  FOR i := 1 TO ParamCount DO
    BEGIN
      ASSIGN(f, ParamStr(i));
      RESET(f);
      cr := 0; { counts Mac-type CR line endings }
      lf := 0; { counts *NIX-type LF line endings }
      cl := 0; { counts WinDOS-type CRLF line endings }
      m := FALSE; { helps to distinguish CR, LF from CRLF }
      WHILE NOT EOF(f) DO
        BEGIN
          READ(f, ch);
          IF m THEN
            BEGIN
              m := FALSE;
              IF (ch = CHR(10)) THEN
                INC(cl)
              ELSE IF (ch = CHR(13)) THEN
                BEGIN
                  INC(cr);
                  m := TRUE
                END
              ELSE
                INC(cr)
            END
          ELSE IF (ch = CHR(13)) THEN
            m := TRUE
          ELSE IF (ch = CHR(10)) THEN
            INC(lf)
        END;
      CLOSE(f);
      IF m THEN INC(cr);
      WRITELN(ParamStr(i));
      WRITELN('---------------------');
      WRITELN('CRLF: ', cl);
      WRITELN('CR : ', cr);
      WRITELN('LF : ', lf);
      WRITELN
    END
END.
Let me point out that your results are entirely dependent on how the OS and run-time system handle line-ending sequences. Remember that the default mode for files is text, and thus such translations would be enabled. For valid results you have to treat the file as binary, and you are then running into system variations. Probably the most portable method is basically:
TYPE
  byte = char;
  binfile = FILE OF byte;

VAR
  phyle : binfile;

BEGIN
  ....
  reset(phyle);
  WHILE NOT eof(phyle) DO BEGIN
    (* classify on the basis of phyle^ *)
    CASE ord(phyle^) OF
      10: (* depends on char encoding *)
      13:
      OTHERWISE (* ignore *)
    END; (* case *)
    get(phyle);
  END;
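For anyone who wants to try this, the fragment above can be filled out roughly as follows. This is only a sketch under assumptions: the program name rawcount and the counters n10 and n13 are made up for illustration, ASSIGN, ParamStr and HALT are the same extensions countem already relies on, and the byte values 10 and 13 only mean LF and CR where the char encoding is ASCII-compatible.

```pascal
PROGRAM rawcount;
{ Sketch only: counts raw byte values 10 and 13 in one file,
  reading through the buffer variable as in the fragment above. }
TYPE
  byte = char;
  binfile = FILE OF byte;
VAR
  phyle : binfile;
  n10, n13 : INTEGER;
BEGIN
  IF ParamCount < 1 THEN
    BEGIN
      WRITELN('Usage: rawcount <filename>');
      HALT
    END;
  ASSIGN(phyle, ParamStr(1));   { extension, as used in countem }
  RESET(phyle);
  n10 := 0;
  n13 := 0;
  WHILE NOT EOF(phyle) DO
    BEGIN
      CASE ORD(phyle^) OF
        10: n10 := n10 + 1;     { LF, assuming an ASCII-compatible encoding }
        13: n13 := n13 + 1;     { CR, same assumption }
        OTHERWISE               { ignore every other byte }
      END; (* case *)
      GET(phyle)
    END;
  CLOSE(phyle);
  WRITELN('byte 10: ', n10);
  WRITELN('byte 13: ', n13)
END.
```

It deliberately does not try to pair a 13 with a following 10; it only shows the buffer-variable loop. Whether the run-time really hands over untranslated bytes for a FILE OF char is exactly the system variation mentioned above.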
Also you should realize that, in Pascal, there is no reason for any end-of-line char or sequence to exist. When (for a text file) eoln(f) is true, f^ is required to hold a blank. Many systems fail to implement this correctly. The raw file system may delimit lines in any way it pleases, including such things as counts in auxiliary streams, etc.
Your use of "read(f, ch)" above is the equivalent of:
ch := f^; get(f);
in any system that remotely implements any standard. Thus the presence of any so-called <lf> characters in your files is probably an illusion or an implementation failure.
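A small test along these lines may show how a particular system behaves. It is a sketch under assumptions, not anything from the original post: the program name eolndemo and the counters are hypothetical, and ASSIGN, ParamStr and HALT are extensions. It opens the file as TEXT, and at every point where EOLN is true it performs the two-statement equivalent of read(f, ch) and checks whether a blank was delivered.

```pascal
PROGRAM eolndemo;
{ Sketch only: checks what a TEXT file delivers at each end of line. }
VAR
  f : TEXT;
  ch : CHAR;
  ends, blanks : INTEGER;
BEGIN
  IF ParamCount < 1 THEN
    BEGIN
      WRITELN('Usage: eolndemo <filename>');
      HALT
    END;
  ASSIGN(f, ParamStr(1));
  RESET(f);
  ends := 0;     { positions where EOLN(f) was true }
  blanks := 0;   { how many of those were read as a blank }
  WHILE NOT EOF(f) DO
    BEGIN
      IF EOLN(f) THEN
        BEGIN
          ends := ends + 1;
          ch := f^;            { these two statements are what }
          GET(f);              { READ(f, ch) stands for }
          IF ch = ' ' THEN
            blanks := blanks + 1
        END
      ELSE
        GET(f)                 { skip ordinary characters }
    END;
  CLOSE(f);
  WRITELN('line ends seen : ', ends);
  WRITELN('read as blank  : ', blanks)
END.
```

On a system that follows the standard here, the two counts should agree. If the second count is smaller, the run-time is passing some raw end-of-line char through instead of the required blank.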
A further mild criticism is that your detection of cr/lf sequences is in error, even if the non-standard assumptions made are valid.