Hello,
My name is Scott A. Moore. I am the person who performed some of the ISO 7185 compliance testing on GPC. I have made a minor update to some of my ISO 7185 compliance testing material and reran it against the current version of GPC (2.95). In the process, I discovered a compliance problem with GPC.
Please reference the program:
==============================================================
program test(output);

var f: text;
    c: char;

begin

   rewrite(f);
   writeln(f, 'how now');
   writeln(f, 'brown cow');
   reset(f);
   write('''');
   while not eof(f) do begin

      if eoln(f) then write('<eoln>');
      read(f, c);
      write(c)

   end;
   write('''');
   writeln(' s/b ''how now<eoln> brown cow<eoln> ''');
   rewrite(f);
   writeln(f, 'too much');
   write(f, 'too soon');
   reset(f);
   write('''');
   while not eof(f) do begin

      if eoln(f) then write('<eoln>');
      read(f, c);
      write(c)

   end;
   write('''');
   writeln(' s/b ''too much<eoln> too soon<eoln> ''');

end.
================================================================
The results from GPC for running this command are as follows:
================================================================
C:\TEST>test
'how now<eoln> brown cow<eoln> ' s/b 'how now<eoln> brown cow<eoln> '
'too much<eoln> too soon' s/b 'too much<eoln> too soon<eoln> '

C:\TEST>
================================================================
The first section of the test verifies that eoln in text file is replaced with space, GPC passes this.
The second section of the test verifies that eoln is automatically inserted at the end of a text file if it was terminated without an eoln. See ISO 7185 section 6.4.3.5:
=====================================================================
There shall be a file-type that is denoted by the required structured-type-identifier text. The structure of the type denoted by text shall define an additional sequence-type whose values shall be designated lines. A line shall be a sequence cs ~ S(end-of-line), where cs is a sequence of components possessing the char-type, and end-of-line shall represent a special component-value. Any assertion in clause 6 that the end-of-line value is attributed to a variable other than a component of a sequence shall be construed as an assertion that the variable has attributed to it the char-type value space. If l is a line, then no component of l other than l.last shall be an end-of-line. There shall be an implementation-defined subset of the set of char-type values, designated characters prohibited from text files; the effect of causing a character in that subset to be attributed to a component of either t.L or t.R for any text file t shall be implementation-dependent.

A line-sequence, ls, shall be either the empty sequence or the sequence l ~ ls' where l is a line and ls' is a line-sequence.

Every value t of the type denoted by text shall satisfy the following two rules:

a) If t.M = Inspection, then t.L ~ t.R shall be a line-sequence.
b) If t.M = Generation, then t.L ~ t.R shall be ls ~ cs, where ls is a line-sequence and cs is a sequence of components possessing the char-type.
====================================================================
Basically, the rules say that a text file shall consist of either whole lines, or be entirely empty. Whole lines are defined as "~S(end of line)", or a series of arbitrary characters terminated by end of line.
In effect, the program processor must arrange for eoln to be true at the end of a file, either by faking it while reading, or by forcing an eoln to be written at the end of a file during generation.
Thank you for your attention.
Scott Moore wrote:
My name is Scott A. Moore. I am the person who performed some of the ISO 7185 compliance testing on GPC. I have made a minor update to some of my ISO 7185 compliance testing material and reran it against the current version of GPC (2.95). In the process, I discovered a compliance problem with GPC.
BTW, 2.95 is the backend (GCC) version number. For Pascal-level reports (as opposed to code-generation issues), usually the GPC version number is more important. `gpc -v' will tell both. But in this case, I think it applies to all recent GPC versions, anyway ...
The second section of the test verifies that eoln is automatically inserted at the end of a text file if it was terminated without an eoln. See ISO 7185 section 6.4.3.5:
Basically, the rules say that a text file shall consist of either whole lines, or be entirely empty. Whole lines are defined as "~S(end of line)", or a series of arbitrary characters terminated by end of line.
In effect, the program processor must arrange for eoln to be true at the end of a file, either by faking it while reading, or by forcing an eoln to be written at the end of a file during generation.
(BTW, I'm not even sure if the latter would be sufficient, since the condition then wouldn't be satisfied for files written in other ways.)
Well, first I must say that this is a point where I consider the standard broken. (E.g., I have a text editor written in Pascal, and I want it to be able to handle files with or without trailing EOLn's, without appending them either when reading or writing the files.)
So I've fixed it now, but only in the standard Pascal modes (`--classic-pascal', `--extended-pascal'). Though usually we try to avoid changing the behaviour based on the dialect options (only restrict extensions), there are some cases where this seems preferable (such as default width for writing Booleans, or no compile-time error for division by zero) ...
I'm not attaching a patch here, since it requires changes spread over several source files which might conflict with other changes I've made meanwhile. But it will be ok in the next GPC release.
If you don't mind, I'll put your test program in the test suite for regression testing (scott1.pas, with `{$classic-pascal}' inserted).
Frank
----- Original Message -----
From: "Frank Heckenbach" ih8mj@fjf.gnu.de
To: gpc@gnu.de
Sent: Tuesday, March 23, 2004 5:20 AM
Subject: Re: ISO 7185 compliance issue for GPC
Scott Moore wrote:
My name is Scott A. Moore. I am the person who performed some of the ISO 7185 compliance testing on GPC. I have made a minor update to some of my ISO 7185 compliance testing material and reran it against the current version of GPC (2.95). In the process, I discovered a compliance problem with GPC.
BTW, 2.95 is the backend (GCC) version number. For Pascal-level reports (as opposed to code-generation issues), usually the GPC version number is more important. `gpc -v' will tell both. But in this case, I think it applies to all recent GPC versions, anyway ...
Appreciate the info.
The second section of the test verifies that eoln is automatically inserted at the end of a text file if it was terminated without an eoln. See ISO 7185 section 6.4.3.5:
Basically, the rules say that a text file shall consist of either whole lines, or be entirely empty. Whole lines are defined as "~S(end of line)", or a series of arbitrary characters terminated by end of line.
In effect, the program processor must arrange for eoln to be true at the end of a file, either by faking it while reading, or by forcing an eoln to be written at the end of a file during generation.
(BTW, I'm not even sure if the latter would be sufficient, since the condition then wouldn't be satisfied for files written in other ways.)
For the example of IP Pascal, the support library corrects input from text files: it detects the lack of a proper trailing eoln and corrects it, and it allows any form of eoln to appear, including cr/lf, lf/cr, cr or lf alone. A state variable is used to perform all of this, i.e., the state of input is tracked, and corrected as needed. The utility of this really shows when I move from Windows to Linux to Mac. All of them use different line endings, but none of the programs compiled with IP Pascal care; they can input from any system at any time. For example, I can move a file from Unix to Windows, and it reads properly without any special options or other arrangements.
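The kind of state tracking described above could be sketched roughly as follows. This is only a hypothetical illustration, not IP Pascal's actual support library: the names `eolfilter`, `emit`, `pending` and `midline` are invented here, ASCII ordinals are assumed for cr and lf, and the processor is assumed to treat a file variable that is not in the program header as a temporary file (as ISO 7185 style processors such as GPC do).

```pascal
program eolfilter(output);

{ Hypothetical sketch: accept cr, lf, cr/lf and lf/cr as line endings and
  supply a missing final end-of-line while reading a raw file of char. }

const
  cr = 13;  { ordinal of carriage return, assuming ASCII }
  lf = 10;  { ordinal of line feed, assuming ASCII }

var
  raw: file of char;
  pending: integer;   { ordinal of the cr or lf that opened a line ending, 0 if none }
  midline: boolean;   { true while the current line has no end-of-line yet }
  c: char;

procedure emit(ch: char);
begin
  write(raw, ch)
end;

begin
  { Build a raw sample with mixed endings and no final end-of-line. }
  rewrite(raw);
  emit('a'); emit(chr(cr)); emit(chr(lf));
  emit('b'); emit(chr(lf));
  emit('c'); emit(chr(cr));
  emit('d');

  reset(raw);
  pending := 0;
  midline := false;
  while not eof(raw) do begin
    read(raw, c);
    if (ord(c) = cr) or (ord(c) = lf) then begin
      if (pending <> 0) and (pending <> ord(c)) then
        pending := 0               { second half of a cr/lf or lf/cr pair }
      else begin
        pending := ord(c);         { first character of a new line ending }
        writeln('<eoln>');
        midline := false
      end
    end
    else begin
      pending := 0;
      write(c);
      midline := true
    end
  end;
  { Correction on read: the input stopped in the middle of a line. }
  if midline then writeln('<eoln>')
end.
```

Each of the four sample lines should be reported with exactly one `<eoln>`, including the last one, which has no line ending in the raw data.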
Well, first I must say that this is a point where I consider the standard broken. (E.g., I have a text editor written in Pascal, and I want it to be able to handle files with or without trailing EOLn's, without appending them either when reading or writing the files.)
I agree, and heartily recommend the method used in IP Pascal for this, which is completely ISO 7185 compatible. The ISO rules only specify the special handling on files of "text". A declaration "file of char" (which text was originally defined to be equivalent to in Wirth's prestandard Pascal) is, as allowed in the standard, truly a linear file of characters. This gives any program wanting 100% control of its output format that ability.
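A minimal sketch of that difference, assuming an ISO 7185 style processor in which a file variable outside the program header is a temporary file. The program name `rawchars` and the helper `emit` are invented for this illustration.

```pascal
program rawchars(output);

{ Hypothetical sketch: a "file of char" holds exactly the components that
  were written, so nothing is appended when the file is read back. }

var
  f: file of char;
  c: char;
  n: integer;

procedure emit(ch: char);
begin
  write(f, ch)
end;

begin
  rewrite(f);
  emit('t'); emit('o'); emit('o');   { three components, no end-of-line }
  reset(f);
  n := 0;
  while not eof(f) do begin
    read(f, c);
    n := n + 1
  end;
  writeln('components read: ', n:1)
end.
```

With `f` declared as `text` instead, a conforming processor would present an extra end-of-line (read as a space) after the last character, which is the behaviour the test program at the start of this thread checks for.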
So I've fixed it now, but only in the standard Pascal modes (`--classic-pascal', `--extended-pascal'). Though usually we try to avoid changing the behaviour based on the dialect options (only restrict extensions), there are some cases where this seems preferable (such as default width for writing Booleans, or no compile-time error for division by zero) ...
I'm not attaching a patch here, since it requires changes spread over several source files which might conflict with other changes I've made meanwhile. But it will be ok in the next GPC release.
Ok, I'll look for that. Thanks for the fast turnaround.
If you don't mind, I'll put your test program in the test suite for regression testing (scott1.pas, with `{$classic-pascal}' inserted).
With my compliments.
Frank
--
Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/, 7977168E
GPC To-Do list, latest features, fixed bugs: http://www.gnu-pascal.de/todo.html
GPC download signing key: 51FF C1F0 1A77 C6C2 4482 4DDC 117A 9773 7F88 1707
Scott Moore wrote:
Well, first I must say that this is a point where I consider the standard broken. (E.g., I have a text editor written in Pascal, and I want it to be able to handle files with or without trailing EOLn's, without appending them either when reading or writing the files.)
I agree, and heartily recommend the method used in IP Pascal for this, which is completely ISO 7185 compatible. The ISO rules only specify the special handling on files of "text". A declaration "file of char" (which text was originally defined to be equivalent to in Wirth's prestandard Pascal) is, as allowed in the standard, truly a linear file of characters. This gives any program wanting 100% control of its output format that ability.
That's true, but it has some drawbacks. First, you have to deal with different line endings yourself then. Second, it may be less comfortable, and also less efficient (reading a char vs. a line at a time).
Frank
Frank Heckenbach wrote:
Scott Moore wrote:
Well, first I must say that this is a point where I consider the standard broken. (E.g., I have a text editor written in Pascal, and I want it to be able to handle files with or without trailing EOLn's, without appending them either when reading or writing the files.)
I agree, and heartily recommend the method used in IP Pascal for this, which is completely ISO 7185 compatible. The ISO rules only specify the special handling on files of "text". A declaration "file of char" (which text was originally defined to be equivalent to in Wirth's prestandard Pascal) is, as allowed in the standard, truly a linear file of characters. This gives any program wanting 100% control of its output format that ability.
That's true, but it has some drawbacks. First, you have to deal with different line endings yourself then. Second, it may be less comfortable, and also less efficient (reading a char vs. a line at a time).
IIRC 'text' is not a reserved word, only a predefined type. Thus the question devolves to "what is that type". If we redefine it, are the text files input and output still available in the affected scopes? Are input and output predefined as text files?
----- Original Message -----
From: "CBFalconer" cbfalconer@yahoo.com
To: gpc@gnu.de
Sent: Wednesday, March 24, 2004 4:22 AM
Subject: Re: ISO 7185 compliance issue for GPC
Frank Heckenbach wrote:
Scott Moore wrote:
Well, first I must say that this is a point where I consider the standard broken. (E.g., I have a text editor written in Pascal, and I want it to be able to handle files with or without trailing EOLn's, without appending them either when reading or writing the files.)
I agree, and heartily recommend the method used in IP Pascal for this, which is completely ISO 7185 compatible. The ISO rules only specify the special handling on files of "text". A declaration "file of char" (which text was originally defined to be equivalent to in Wirth's prestandard Pascal) is, as allowed in the standard, truly a linear file of characters. This gives any program wanting 100% control of its output format that ability.
That's true, but it has some drawbacks. First, you have to deal with different line endings yourself then. Second, it may be less comfortable, and also less efficient (reading a char vs. a line at a time).
IIRC 'text' is not a reserved word, only a predefined type. Thus the question devolves to "what is that type". If we redefine it, are the text files input and output still available in the affected scopes? Are input and output predefined as text files?
Text is indeed a type. In Wirth's original language it was defined to be "file of char", which implied that the compiler be ready to recognize "file of char" as the special file applicable to writeln, readln, page, etc.
The standard simply specified that only the (predefined) type text would get such treatment. For example:
type text = file of char;
Appearing in the program would break the association to the original type, and:
var f: text;
...
writeln(f)
Would no longer be valid.
Input and output, only in regards to the header files, are predefined as text.
On Wed, Mar 24, 2004 at 07:22:35AM -0500, CBFalconer wrote:
Frank Heckenbach wrote:
Scott Moore wrote:
Well, first I must say that this is a point where I consider the standard broken. (E.g., I have a text editor written in Pascal, and I want it to be able to handle files with or without trailing EOLn's, without appending them either when reading or writing the files.)
I agree, and heartily recommend the method used in IP Pascal for this, which is completely ISO 7185 compatible. The ISO rules only specify the special handling on files of "text". A declaration "file of char" (which text was originally defined to be equivalent to in Wirth's prestandard Pascal) is, as allowed in the standard, truly a linear file of characters. This gives any program wanting 100% control of its output format that ability.
That's true, but it has some drawbacks. First, you have to deal with different line endings yourself then. Second, it may be less comfortable, and also less efficient (reading a char vs. a line at a time).
IIRC 'text' is not a reserved word, only a predefined type. Thus the question devolves to "what is that type". If we redefine it, are the text files input and output still available in the affected scopes? Are input and output predefined as text files?
Yes, IMHO. As usual in Pascal, shadowing a type name has no effect on variables possessing the type.
Emil
----- Original Message -----
From: "Frank Heckenbach" ih8mj@fjf.gnu.de
To: gpc@gnu.de
Sent: Tuesday, March 23, 2004 2:40 PM
Subject: Re: ISO 7185 compliance issue for GPC
Scott Moore wrote:
Well, first I must say that this is a point where I consider the standard broken. (E.g., I have a text editor written in Pascal, and I want it to be able to handle files with or without trailing EOLn's, without appending them either when reading or writing the files.)
I agree, and heartily recommend the method used in IP Pascal for this, which is completely ISO 7185 compatible. The ISO rules only specify the special handling on files of "text". A declaration "file of char" (which text was originally defined to be equivalent to in Wirth's prestandard Pascal) is, as allowed in the standard, truly a linear file of characters. This gives any program wanting 100% control of its output format that ability.
That's true, but it has some drawbacks. First, you have to deal with different line endings yourself then. Second, it may be less comfortable, and also less efficient (reading a char vs. a line at a time).
Frank
Well, note that ISO 7185 has no construct for reading a line at a time, so it's all off the standard in any case. To me that means "file of char", and a whole host of special procedures if that's wanted.
I guess I have become biased. I created the idea of "file of char" bypass because I wanted to "have control", but I have since come to *love* the line ending filtering specified by the standard (and it was very much a creature of the standard, and not the original language). It simplifies code and regularizes line endings, even across multiple operating systems. It should be slow, but the common effect I have observed is that I can dump output much faster to a file than to the screen, which means that GUI output overwhelms the actual serial handling by orders of magnitude.
Scott Moore wrote:
That's true, but it has some drawbacks. First, you have to deal with different line endings yourself then. Second, it may be less comfortable, and also less efficient (reading a char vs. a line at a time).
Well, note that ISO 7185 has no construct for reading a line at a time, so it's all off the standard in any case.
As you probably know, Extended Pascal is also a standard (ISO 10206) and it does provide a string type. You might not want to use it, but my objection was that I'd like to read files a line at a time in my programs, while noting the presence or absence of a trailing EOLn in the file, and it still stands. With EP's means I could do it, except for the standard's EOLn semantics ...
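To make the objection concrete, here is a small sketch in Extended Pascal (ISO 10206) syntax that reads a text file a line at a time. The program name `readlines` and the capacity of 80 are arbitrary choices for this illustration, and it assumes a processor that accepts the `string` schema, such as GPC.

```pascal
program readlines(output);

{ Hypothetical sketch: read a text file one line at a time using the
  Extended Pascal string schema. }

var
  f: text;
  s: string(80);

begin
  rewrite(f);
  writeln(f, 'too much');
  write(f, 'too soon');        { last line written without writeln }
  reset(f);
  while not eof(f) do begin
    readln(f, s);              { reads up to the end-of-line and skips it }
    writeln('[', s, ']')
  end
end.
```

Both lines should come back the same way whether or not the file really ended with an end-of-line, so a program written like this has no way to note the missing trailing EOLn.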
I guess I have become biased. I created the idea of "file of char" bypass because I wanted to "have control", but I have since come to *love* the line ending filtering specified by the standard (and it was very much a creature of the standard, and not the original language). It simplifies code and regularizes line endings, even across multiple operating systems. It should be slow, but the common effect I have observed is that I can dump output much faster to a file than to the screen, which means that GUI output overwhelms the actual serial handling by orders of magnitude.
GUI output may well be slower, but I don't necessarily want to dump the whole file at once. In my text editor, I read the file and only dump a screenful until the user starts moving. So the reading speed is relevant.
Frank
Scott Moore wrote:
... snip ...
write(''''); while not eof(f) do begin
if eoln(f) then write('<eoln>'); read(f, c); write(c)
end; write(''''); writeln(' s/b ''too much<eoln> too soon<eoln> ''');
end.
The results from GPC for running this command are as follows:
C:\TEST>test
'how now<eoln> brown cow<eoln> ' s/b 'how now<eoln> brown cow<eoln> '
'too much<eoln> too soon' s/b 'too much<eoln> too soon<eoln> '
... snip ...
In effect, the program processor must arrange for eoln to be true at the end of a file, either by faking it while reading, or by forcing an eoln to be written at the end of a file during generation.
I think I disagree with the test. A file needs to be correctly read by:
WHILE NOT eof(f) DO BEGIN
   WHILE NOT eoln(f) DO BEGIN
      output^ := f^; put(output); get(f);
   END;
   writeln; get(f);
END;
and the shortcuts of using read and write should be correct if they are correctly implemented. FOR MALFORMED files this means that the appearance of eof during reading must set the eoln condition if not already set, and that a get must not be an error if eoln is true and the file is at actual eof. Now we have to reconcile this with the need for error if another get is performed, but not if another eoln test is made. This means an invisible flag somewhere to me, reset by performing a get when eoln is true and set by reaching the actual (invisible) eof.
It should not be possible to generate such a malformed text file within the Pascal system. The act of closing should detect the missing final writeln, and perform it. It doesn't matter whether the close is the result of a reset or rewrite or scope exit, or even an extension standard procedure close. The first action in those cases should be: "If the file is open, close it." There may be gyrations needed to preserve the file name under a reset or rewrite. Note that all this implies programmer invisible automatic initialization of a file variable. It is not trivial to get it all right.
At any rate, if we can't generate such a file, it is hard to test for proper action on reading them without having test files generated outside the system. Thus this business really becomes a matter of QOI rather than standards compliance.
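The closing behaviour described above can be imitated at program level as a rough sketch. Everything here is hypothetical: `putch`, `putln`, `finish` and the `midline` flag are invented names, and `finish` only stands in for what a run-time system might do itself before a reset, rewrite or close.

```pascal
program closefix(output);

{ Hypothetical sketch: track whether the last line is still open and
  complete it before the file is reread, so no malformed file is produced. }

var
  f: text;
  midline: boolean;  { true when characters follow the last writeln }
  c: char;

procedure putch(ch: char);
begin
  write(f, ch);
  midline := true
end;

procedure putln;
begin
  writeln(f);
  midline := false
end;

{ Stand-in for the check a run-time system might make before closing f. }
procedure finish;
begin
  if midline then putln
end;

begin
  rewrite(f);
  midline := false;
  putch('h'); putch('i'); putln;
  putch('y'); putch('o');      { final writeln deliberately missing }
  finish;                      { supplies the missing end-of-line }
  reset(f);
  while not eof(f) do begin
    if eoln(f) then write('<eoln>');
    read(f, c);
    write(c)
  end;
  writeln
end.
```

Because the end-of-line is really written here, the read loop sees it even on a processor that does no correction on read.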
----- Original Message -----
From: "CBFalconer" cbfalconer@yahoo.com
To: gpc@gnu.de
Sent: Wednesday, March 24, 2004 4:14 AM
Subject: Re: ISO 7185 compliance issue for GPC
Scott Moore wrote:
... snip ...
write(''''); while not eof(f) do begin
if eoln(f) then write('<eoln>'); read(f, c); write(c)
end; write(''''); writeln(' s/b ''too much<eoln> too soon<eoln> ''');
end.
The results from GPC for running this command are as follows:
C:\TEST>test
'how now<eoln> brown cow<eoln> ' s/b 'how now<eoln> brown cow<eoln> '
'too much<eoln> too soon' s/b 'too much<eoln> too soon<eoln> '
... snip ...
In effect, the program processor must arrange for eoln to be true at the end of a file, either by faking it while reading, or by forcing an eoln to be written at the end of a file during generation.
I think I disagree with the test. A file needs to be correctly read by:
WHILE NOT eof(f) DO BEGIN
   WHILE NOT eoln(f) DO BEGIN
      output^ := f^; put(output); get(f);
   END;
   writeln; get(f);
END;
and the shortcuts of using read and write should be correct if they are correctly implemented. FOR MALFORMED files this means that the appearance of eof during reading must set the eoln condition if not already set, and that a get must not be an error if eoln is true and the file is at actual eof. Now we have to reconcile this with the need for error if another get is performed, but not if another eoln test is made. This means an invisible flag somewhere to me, reset by performing a get when eoln is true and set by reaching the actual (invisible) eof.
It should not be possible to generate such a malformed text file within the Pascal system. The act of closing should detect the missing final writeln, and perform it. It doesn't matter whether the close is the result of a reset or rewrite or scope exit, or even an extension standard procedure close. The first action in those cases should be: "If the file is open, close it." There may be gyrations needed to preserve the file name under a reset or rewrite. Note that all this implies programmer invisible automatic initialization of a file variable. It is not trivial to get it all right.
At any rate, if we can't generate such a file, it is hard to test for proper action on reading them without having test files generated outside the system. Thus this business really becomes a matter of QOI rather than standards compliance.
Probably, ideally, an implementation should accept badly formed files, and refuse to generate them. I think you have described well why I don't attempt the latter, and "correction on read" (if you will) is a universal solution in that it accepts other processors' badly formed files as well.
And yes, I do test with externally created files myself. However, there are other reasons to do that as well, for example various line endings (like Unix or Windows) are accepted but only one type is generated, etc.
Scott Moore wrote:
----- Original Message -----
From: "CBFalconer" cbfalconer@yahoo.com
To: gpc@gnu.de
Sent: Wednesday, March 24, 2004 4:14 AM
Subject: Re: ISO 7185 compliance issue for GPC
Scott Moore wrote:
... snip ...
write(''''); while not eof(f) do begin
if eoln(f) then write('<eoln>'); read(f, c); write(c)
end; write(''''); writeln(' s/b ''too much<eoln> too soon<eoln> ''');
end.
The results from GPC for running this command are as follows:
C:\TEST>test
'how now<eoln> brown cow<eoln> ' s/b 'how now<eoln> brown cow<eoln> '
'too much<eoln> too soon' s/b 'too much<eoln> too soon<eoln> '
... snip ...
In effect, the program processor must arrange for eoln to be true at the end of a file, either by faking it while reading, or by forcing an eoln to be written at the end of a file during generation.
I think I disagree with the test. A file needs to be correctly read by:
WHILE NOT eof(f) DO BEGIN
   WHILE NOT eoln(f) DO BEGIN
      output^ := f^; put(output); get(f);
   END;
   writeln; get(f);
END;
and the shortcuts of using read and write should be correct if they are correctly implemented. FOR MALFORMED files this means that the appearance of eof during reading must set the eoln condition if not already set, and that a get must not be an error if eoln is true and the file is at actual eof. Now we have to reconcile this with the need for error if another get is performed, but not if another eoln test is made. This means an invisible flag somewhere to me, reset by performing a get when eoln is true and set by reaching the actual (invisible) eof.
It should not be possible to generate such a malformed text file within the Pascal system. The act of closing should detect the missing final writeln, and perform it. It doesn't matter whether the close is the result of a reset or rewrite or scope exit, or even an extension standard procedure close. The first action in those cases should be: "If the file is open, close it." There may be gyrations needed to preserve the file name under a reset or rewrite. Note that all this implies programmer invisible automatic initialization of a file variable. It is not trivial to get it all right.
BTW, we already do automatic initialization of file variables, and we have more than one invisible flag hanging around. So those are not new problems (though there currently is a bug in automatically closing a file local to a routine that's left by a nonlocal goto, ugh).
And, of course, no one ever claimed that anything in our I/O system was trivial, in particular since we try to support both standard and Borland I/O together ...
At any rate, if we can't generate such a file, it is hard to test for proper action on reading them without having test files generated outside the system. Thus this business really becomes a matter of QOI rather than standards compliance.
Probably, ideally, an implementation should accept badly formed files, and refuse to generate them. I think you have described well why I don't attempt the latter, and "correction on read" (if you will) is a universal solution in that it accepts other processors' badly formed files as well.
AFAICS, correction on either reading or writing would be sufficient for standard compliance, and for the same reasons that Scott gave, I prefer to do it on reading. (And on `Extend', but that's an Extended Pascal feature, so I suppose both of you won't care. ;-)
But as I try to explain in the other mail, I also want a way to bypass this EOLn "fixing", without giving up using `Text' files. Currently, I've coupled it to the dialect options in GPC, but if it's generally preferred to have a special flag that can be set per file, I think I could also live with that ...
Frank