Re: Line Endings

12 Aug 2003

      "Frank D. Engel, Jr." wrote:
...
Based on a discussion taking place in another thread, I created a
little program to help me test for mixed line endings in my source
files; just thought I'd give it here in case it would help anyone
else.  Consider it public domain, no warranty, etc.
To use it, just compile (with GPC), then give it as command line
arguments the name(s) of the text file(s) to be tested.
For example:
bash-2.05a$ ./countem countem.pas
countem.pas
---------------------
CRLF: 0
CR  : 0
LF  : 59
bash-2.05a$
I just thought someone might find this useful.  No external
dependencies, written and tested under MacOS X, but should
just work in virtually any GPC-supported environment.

                 Name: countem.pas
                 Type: application/x-unknown
             Encoding: base64
          Description: countem.pas
I have quoted your binary attachment to meet normal usenet
standards.  It takes fewer resources (no base 64 encoding) and is
safer as pure text within the message.
...
PROGRAM countem;
VAR
    f             : FILE OF CHAR;
    cr, lf, cl, i : INTEGER;
    m             : BOOLEAN;
    ch            : CHAR;
BEGIN { Main Program }
IF (ParamCount < 1) THEN BEGIN
    WRITELN('Usage: countem <filename>');
    HALT
END;

{ Note that at least on *NIX, wildcards are expanded by the
  shell, so this allows 'countem *.pas' to work, for example }

FOR i := 1 TO ParamCount DO BEGIN
    ASSIGN(f, ParamStr(i));
    RESET(f);

    cr := 0;    { counts Mac-type CR line endings }
    lf := 0;    { counts *NIX-type LF line endings }
    cl := 0;    { counts WinDOS-type CRLF line endings }
    m := FALSE; { helps to distinguish CR, LF from CRLF }

    WHILE NOT EOF(f) DO BEGIN
        READ(f, ch);

        IF m THEN BEGIN
            m := FALSE;
            IF (ch = CHR(10)) THEN
                INC(cl)
            ELSE IF (ch = CHR(13)) THEN BEGIN
                INC(cr);
                m := TRUE
            END ELSE
                INC(cr)
        END ELSE IF (ch = CHR(13)) THEN
            m := TRUE
        ELSE IF (ch = CHR(10)) THEN
            INC(lf)
    END;

    CLOSE(f);

    IF m THEN
        INC(cr);

    WRITELN(ParamStr(i));
    WRITELN('---------------------');
    WRITELN('CRLF: ', cl);
    WRITELN('CR  : ', cr);
    WRITELN('LF  : ', lf);
    WRITELN
END

END.
Let me point out that your results are entirely dependent on how
the OS and run-time handle line ending sequences.  Remember that
the default mode for files is text, and thus such translations
would be enabled.  For valid results you have to treat the file as
binary, and you are then running into system variations.  Probably
the most portable method is basically:
TYPE
   byte    = char;
   binfile = FILE OF byte;
VAR
   phyle   : binfile;
BEGIN
   ....
   reset(phyle);
   WHILE NOT eof(phyle) DO BEGIN
      (* classify on the basis of phyle^ *)
      CASE ord(phyle^) OF
         10: ;  (* <lf>; the value depends on char encoding *)
         13: ;  (* <cr> *)
      OTHERWISE
         (* ignore *)
         END; (* case *)
      get(phyle);
      END;
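For anyone who wants to try the outline above, here is one way it might be
filled in.  This is only a sketch, not the original countem.pas: it assumes
GNU Pascal (Assign, ParamStr, ParamCount, Halt, Inc, Close and OTHERWISE are
extensions, not ISO 7185), it assumes an ASCII-compatible encoding so that
<cr> is 13 and <lf> is 10, and the names CountEndings and pendingcr are made
up for the illustration.

```pascal
PROGRAM CountEndings;
(* Sketch for GNU Pascal; assumes ASCII codes 13 = <cr>, 10 = <lf>. *)
TYPE
   byte    = char;
   binfile = FILE OF byte;
VAR
   phyle        : binfile;
   crlf, cr, lf : integer;
   pendingcr    : boolean;  (* true when the previous byte was a <cr> *)
BEGIN
   IF ParamCount < 1 THEN BEGIN
      writeln('Usage: countendings <filename>');
      Halt
      END;
   Assign(phyle, ParamStr(1));
   reset(phyle);
   crlf := 0; cr := 0; lf := 0;
   pendingcr := false;
   WHILE NOT eof(phyle) DO BEGIN
      (* classify on the basis of phyle^ *)
      CASE ord(phyle^) OF
         13: BEGIN
                IF pendingcr THEN Inc(cr);  (* earlier <cr> stood alone *)
                pendingcr := true
                END;
         10: BEGIN
                IF pendingcr THEN Inc(crlf) ELSE Inc(lf);
                pendingcr := false
                END;
      OTHERWISE
         BEGIN
            IF pendingcr THEN Inc(cr);      (* <cr> followed by data *)
            pendingcr := false
            END
         END; (* case *)
      get(phyle)
      END;
   IF pendingcr THEN Inc(cr);               (* file ended on a bare <cr> *)
   Close(phyle);
   writeln('CRLF: ', crlf);
   writeln('CR  : ', cr);
   writeln('LF  : ', lf)
END.
```

Whether the bytes seen through FILE OF byte are really the raw contents of
the file still depends on the implementation, which is the system variation
mentioned above.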
Also you should realize that, in Pascal, there is no reason for
any end-of-line char or sequence to exist.  When (for a text file)
eoln(f) is true, f^ is required to hold a blank.  Many systems fail to
implement this correctly.  The raw file system may delimit lines
in any way it pleases, including such things as counts in
auxiliary streams, etc.
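To illustrate the point, a program that only needs to know where lines end
can stay entirely within the standard and never look at a terminator
character.  The following is a minimal sketch (the program name CountLines is
invented here); it counts the lines on standard input using nothing but eof
and readln, so it should behave the same however the underlying system marks
line boundaries.

```pascal
PROGRAM CountLines(input, output);
(* Sketch: counts lines without inspecting any end-of-line character. *)
VAR
   lines : integer;
BEGIN
   lines := 0;
   WHILE NOT eof(input) DO BEGIN
      (* readln skips the rest of the current line, however the *)
      (* system chooses to mark its end                         *)
      readln(input);
      lines := lines + 1
      END;
   writeln('Lines: ', lines)
END.
```

Run it with the file redirected to standard input, for example
./countlines < countem.pas on a *NIX shell.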
Your use of "read(f, ch)" above is the equivalent of:
ch := f^; get(f);
in any system that remotely implements any standard.  Thus the
presence of any so-called <lf> characters in your files is
probably an illusion or an implementation failure.
A further mild criticism is that your detection of cr/lf sequences
is in error, even if the non-standard assumptions made are valid.
-- 
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
   http://cbfalconer.home.att.net  USE worldnet address!
