Dear Kevan,
you wrote:
You propose a GPC "frontend" written in Pascal. Given a Pascal program as input, this frontend will produce a "Pascal data structure in memory" (I'm quoting you there, but I'm pretty sure I know what you mean).
Correct so far.
The existing GPC frontend is written in C. Given a Pascal program as input, this existing frontend produces... what?
A C data structure in memory, the so-called TREE_NODEs. Some C structs (records in Pascal) containing pointers to other structs, and so on.
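To give a rough picture of "structs containing pointers to other structs", here is a toy sketch in C++. It is not GCC's actual tree definition: the names Node, NodeKind, make_const, make_binary and count_nodes are made up for illustration, and the real TREE_NODEs carry far more information (types, source locations, and so on).

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical node kinds; GCC's real tree codes are far more numerous.
enum class NodeKind { IntConst, Plus, Mult };

// One node of the in-memory tree: a struct holding pointers to other structs.
struct Node {
    NodeKind kind = NodeKind::IntConst;
    long value = 0;                  // meaningful only when kind == IntConst
    std::unique_ptr<Node> left;      // operands; null for leaf nodes
    std::unique_ptr<Node> right;
};

// Build a leaf node for an integer constant.
std::unique_ptr<Node> make_const(long v) {
    auto n = std::make_unique<Node>();
    n->kind = NodeKind::IntConst;
    n->value = v;
    return n;
}

// Build an inner node that points to its two operand nodes.
std::unique_ptr<Node> make_binary(NodeKind k,
                                  std::unique_ptr<Node> l,
                                  std::unique_ptr<Node> r) {
    assert(k != NodeKind::IntConst);  // a binary node must be an operator
    assert(l && r);                   // both operands must exist
    auto n = std::make_unique<Node>();
    n->kind = k;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Count the nodes reachable from n by following the pointers.
int count_nodes(const Node* n) {
    if (n == nullptr) return 0;
    return 1 + count_nodes(n->left.get()) + count_nodes(n->right.get());
}
```

In this sketch the expression 2 + 3 * 4 becomes a Plus node whose right operand is a Mult node, five nodes in total.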
I thought it was GCC IR code.
Yes. The TREE_NODEs are GCC's IR code (in memory).
I'm assuming that Step 1 produces something that works before we complete Step 2,
Well, it works, but it does not produce code.
Step 1 is not a compiler, but just one half of it, the frontend.
We can do a lot of things with the frontend alone. In particular we can check for many sorts of errors, so we know that the Pascal program we want to compile is valid.
To really compile a program, we need the other half of the compiler, the backend. In the current GPC, we hand over the TREE_NODEs to some C functions which optimize them and produce assembler code, which is then processed further to get an executable.
With the proposed new GPC, we must write some Pascal code which hands over the "Pascal data structure in memory" to something which can produce an executable. IMHO the most promising way to do that is to output C++ code and then call g++ to produce an executable. (See Frank's page http://fjf.gnu.de/gpc-future.html for a detailed discussion.) Others on this list seem to prefer LLVM, Ada, Modula2, or whatever.
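As a very rough illustration of "hand over the data structure and output C++ code", here is a minimal sketch. For consistency with the target language it is itself written in C++, whereas the real thing would be Pascal code. The Program record is a hypothetical stand-in for the frontend's output and covers only WriteLn statements with string arguments; escape_cpp and emit_cpp are likewise invented names. Nothing here reflects an existing GPC design.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the frontend's output: a program that
// consists only of WriteLn statements with string arguments.
struct Program {
    std::string name;
    std::vector<std::string> writeln_texts;
};

// Escape a string so it can be placed inside a C++ string literal.
// Only quotes and backslashes are handled; this sketch assumes the
// text contains no line breaks or other control characters.
std::string escape_cpp(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Turn the in-memory program into C++ source text.
std::string emit_cpp(const Program& p) {
    std::ostringstream os;
    os << "// generated from Pascal program " << p.name << "\n";
    os << "#include <iostream>\n";
    os << "int main() {\n";
    for (const std::string& text : p.writeln_texts)
        os << "    std::cout << \"" << escape_cpp(text) << "\" << '\\n';\n";
    os << "    return 0;\n";
    os << "}\n";
    return os.str();
}
```

The returned text would then be written to a file and passed to g++ as a separate step, which this sketch leaves out.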
Rewriting the frontend means reading complicated C code, which is much more difficult than writing it. Realistically this can only be done by the "old guys" - Waldek, Frank, and/or myself. On the other hand, creating the backend means writing a new Pascal program, so everyone on this list can, in principle, do that. This is the reason why I separate the two steps.
Peter