Adriaan van Os wrote:
Frank Heckenbach wrote:
Jonas Maebe wrote:
What's taking so long currently, Adriaan, is probably the GPI imports. That's independent of the target, i.e. a Pascal to C++ converter would have to do it just the same, so its complexity is independent of how the output is structured. As I said, this is a separate issue which would be easier to tackle if the compiler were written in a high-level language (such as C++ or Pascal with templates). It's easier to find and experiment with efficient data structures (e.g., hash tables, trees) when they're readily available than when you have to implement them manually each and every time, as in C (for the current GPC) and, so far, also in Pascal.
I agree that it is a separate issue and that it is easier to tackle in a future compiler. So, I am moving this to a new thread. We need not discuss it any further now, but I am still following up to answer your questions.
The problem is not so much the speed of GPI loading as such, but the fact that unit recompilation is of order 2 in GPC (quadratic in the number of units) instead of order 1 (linear), as in FPC.
Imagine a program P that uses unit1..unitN, where each unitK uses unit1..unitK-1. Currently, compiling program P with GPC
- triggers a compilation of unit1 and writes a unit1.gpi
- triggers a compilation of unit2, which uses unit1, so loads unit1.gpi and writes unit2.gpi
- triggers a compilation of unit3, which uses unit1 and unit2, so loads unit1.gpi and unit2.gpi and writes unit3.gpi
- etcetera
- triggers a compilation of unitK, which uses unit1..unitK-1, so loads unit1.gpi..unitK-1.gpi and writes unitK.gpi
- etcetera
So, unit1 is written once and loaded N-1 times, unit2 is written once and loaded N-2 times, and in general unitK is written once and loaded N-K times. Therefore, N .gpi files are written and (N-1) + (N-2) + ... + 1 + 0 = N * (N-1)/2 .gpi files are loaded.
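To make the shape of that chain concrete, here is a minimal sketch (unit and file names are invented for illustration; the syntax is the usual Borland-style units that GPC accepts):

  { unit3.pas -- stands in for unitK: it uses all earlier units }
  unit Unit3;

  interface

  uses Unit1, Unit2;  { compiling this loads unit1.gpi and unit2.gpi }

  implementation

  end.

  { p.pas -- the main program pulls in the whole chain }
  program P;

  uses Unit1, Unit2, Unit3;

  begin
  end.

Every unitK down the chain repeats the loads of all its predecessors in a fresh compiler process, which is where the triangular number above comes from.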
Ah, I see. I thought the issue was that a unit import/export with N declarations took O(N^2) time. I think we fixed that (or at least some cases of it).
In other words, compilation times increase quadratically with the number of used units. Improving .gpi load times doesn't help much (within a fixed time budget, the number of units you can handle grows only with the square root of the load-time improvement) and the process will still be slow when N is large. The only real solution is to make compilation a linear process, where already-loaded .gpi files are not loaded again. My understanding is that this is difficult to accomplish with the current GPC back-end.
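To spell out the square-root remark (a rough model that counts only .gpi loading, with an assumed constant cost c per load):

  T(N) ~ c * N*(N-1)/2 ~ c * N^2 / 2

so with a loader that is k times faster, a fixed time budget T covers

  N' = sqrt(2*T*k/c) = sqrt(k) * N

units, i.e. a 4-fold loading speedup only doubles the number of units manageable in the same time.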
It's not about the backend, but about the driver/compiler structure (gp (or "gpc --automake") calls gpc, and gpc calls gpc1), and it's not difficult, but impossible in the current structure, because each unit is compiled by a separate process which has to load all the interfaces it uses.
The same problem exists with C/C++ header files, and the common solution there is to use include guards (http://en.wikipedia.org/wiki/Include_guard).
No, that's not the same problem. The problem there is that the compilation of a single source file would be O(N^2) if it includes N headers, and each header in turn includes the previous ones. (And it would lead to duplicate declarations.)
This problem would *also* occur in Pascal (so the whole compilation would be O(N^3)) if GPC (like probably all other Pascal compilers) didn't prevent it by keeping a list of imported interfaces and avoiding duplicate imports.
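A minimal sketch of that bookkeeping (not GPC's actual code; the names, the fixed array size, and the `value' initializer -- an Extended Pascal feature GPC supports -- are all just for illustration):

  const
    MaxInterfaces = 1024;

  var
    Loaded: array [1 .. MaxInterfaces] of String (255);
    LoadedCount: Integer value 0;

  function AlreadyLoaded (const Name: String): Boolean;
  var
    I: Integer;
  begin
    AlreadyLoaded := False;
    for I := 1 to LoadedCount do
      if Loaded[I] = Name then
        AlreadyLoaded := True
  end;

  procedure ImportInterface (const Name: String);
  begin
    if not AlreadyLoaded (Name) then
      begin
        Inc (LoadedCount);
        Loaded[LoadedCount] := Name;
        { ... actually read the .gpi file for Name here ... }
      end
  end;

(With a linear scan each lookup is O(N); this is exactly the kind of place where a readily available hash table, as mentioned at the top of the thread, would pay off.)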
BTW, is it the actual Mac OS X interfaces that are so huge, or your
It is the actual Mac OS X interfaces that are so huge,
So if I understand you correctly now, the OS interfaces span that many units. Now I understand why you suggested outputting them all to a single C++ file (since they probably change rarely, only with upgrades, and then all at once).
Can't you just put them all in a single unit, or would it become so large that you couldn't compile it (for memory reasons or so)?
Another idea (which, if possible, works now, and might also be necessary for a future GPC, if any) is to reduce the number of dependencies (maybe requiring some restructuring). I find it hard to believe that really every unit requires all previous ones. Can't they be grouped in "topics" that are mostly independent of each other, so the "uses" graph would look more like a tree than a linear chain? E.g., a roughly balanced binary tree would give O(N log N), which should be acceptable.
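As a sketch of what that grouping could look like (all names invented; each unit would live in its own file, and whether the topic members are re-exported through the topic unit depends on the dialect, so this only shows the shape of the graph):

  { graphicstopic.pas -- depends only on its own members }
  unit GraphicsTopic;
  interface
  uses QuickDrawUnit, CoreGraphicsUnit;
  implementation
  end.

  { filestopic.pas -- independent of GraphicsTopic }
  unit FilesTopic;
  interface
  uses FileManagerUnit, AliasManagerUnit;
  implementation
  end.

  { p.pas -- sees independent subtrees instead of one linear chain }
  program P;
  uses GraphicsTopic, FilesTopic;
  begin
  end.

Each unit's compilation then loads only the interfaces in its own subtree; summed over all units that is O(N log N) for a balanced tree instead of the O(N^2) of a linear chain.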
Frank