The streaming (essentially a JIT) was actually part of the early architecture from three weeks ago, though I'm glad you've read even the first post. Performance hasn't been a target on the current architecture yet, but since the core hasn't changed, I could build a MIX that streams and it would hit the same benchmark.
And I love the question. A lot of the complexity comes from managing seams: places where we have to go from one representation of information to another. The tooling, diagnostics, and optimization passes are as large as they are precisely because of these seams. Consider a liveness pass in LLVM, which spends a lot of time reconstructing information the compiler threw away so it could emit SSA. In GDSL, a liveness pass is simply handlers in the e_stage. In the example I snuck into GDSL-C, print statements stamp liveness tokens onto their children via qualifiers, and at assignment nodes, those without such tokens are killed. The logic stays straightforward because all the information is there to work with: no seams, no SSA to derive scopes from. That's why a subset of it fits in 80 lines instead of 80,000.
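To make the token-stamping idea concrete, here's a minimal Python sketch of the same shape of logic (the node classes and field names are hypothetical, not the actual GDSL API): walking the program backwards, print nodes stamp a liveness token on the variables they read, and assignment nodes whose target carries no token are killed.

```python
# Hypothetical AST nodes -- a sketch of the handler logic, not real GDSL.
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Assign:
    target: str
    dead: bool = False   # set by the liveness handlers

@dataclass
class Print:
    args: list           # list of Var nodes this print reads

def liveness(stmts):
    live = set()                      # variables with a liveness token stamped
    for node in reversed(stmts):      # walk backwards through the program
        if isinstance(node, Print):
            for v in node.args:       # print stamps tokens onto its children
                live.add(v.name)
        elif isinstance(node, Assign):
            if node.target not in live:
                node.dead = True      # no token reached this def: kill it
            else:
                live.discard(node.target)  # token consumed by this def
    return stmts

prog = [
    Assign("x"),           # live: printed below
    Assign("y"),           # dead: never read
    Print([Var("x")]),
]
liveness(prog)
```

The point isn't the 30 lines themselves; it's that the handlers operate directly on the original nodes, so there's no lowered form to reconstruct scopes from.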