
122 points phsilva | 5 comments
riffraff No.43111815
How does this differ from direct-threaded interpreters?

It seems to solve the same problem (saving the function-call overhead) and to have the same downside (requiring non-standard compiler extensions).

EDIT: it seems the answer is that compilers do not play well with direct-threaded interpreters, and they can perform more and better optimizations when looking at normal-sized functions rather than one massive block:

http://lua-users.org/lists/lua-l/2011-02/msg00742.html
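
(For reference, a direct-threaded interpreter has roughly this shape; a toy sketch using the GCC/Clang computed-goto extension, with illustrative opcodes:)

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_INC, OP_DEC, OP_HALT };

    static int run(const uint8_t *pc) {
        /* GNU C computed goto: one label per opcode, a table of label
           addresses, and the dispatch replicated at the end of every
           opcode body. */
        static void *dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
        int acc = 0;

        goto *dispatch[*pc++];
    op_inc:
        acc++;
        goto *dispatch[*pc++];
    op_dec:
        acc--;
        goto *dispatch[*pc++];
    op_halt:
        return acc;
    }

    int main(void) {
        static const uint8_t prog[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
        printf("%d\n", run(prog));   /* prints 1 */
        return 0;
    }

Every opcode body jumps straight to the next opcode's label, so there is no central dispatch loop and no call/return overhead, but the whole interpreter is one massive function.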

1. haberman No.43116851
This is a great summary. When Mike wrote the message you linked, his conclusion was that you have to drop to assembly to get reasonable code for VM interpreters. Later we developed the "musttail" technique, which was able to match his assembly-language sequences using C. This makes C a viable option for VM interpreters, even when you want the best performance, as long as your compiler supports musttail.

> they are able to perform more/better optimizations when looking at normal-sized functions rather than massive blocks

It's not the size of the function that is the primary problem; it is the fully connected control flow that gums everything up. The register allocator tries to dynamically allocate registers through each opcode's implementation, but from a register-allocation perspective it also has to connect the end of every opcode with the beginning of every opcode.

The compiler doesn't understand that every opcode has basically the same set of "hot" variables, which means we benefit from keeping those hot variables in a fixed set of registers essentially all of the time.

With tail calls, we can communicate a fixed register allocation to the compiler through function arguments, which are passed in registers. When we pass this hot data in function arguments, we force the compiler to respect that fixed register allocation, at least at the beginning and end of each opcode. Given that constraint, the compiler will usually do a pretty good job of maintaining the allocation through the entire function.
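
To make that concrete, here is a minimal sketch of the pattern (toy opcodes, not the code from the article; assumes a compiler that supports Clang's musttail statement attribute):

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_INC, OP_DEC, OP_HALT };

    /* The hot state (pc, accumulator) travels in function arguments,
       so the calling convention pins it to the same registers at
       every opcode boundary. */
    typedef int (*op_fn)(const uint8_t *pc, int acc);

    static int op_inc(const uint8_t *pc, int acc);
    static int op_dec(const uint8_t *pc, int acc);
    static int op_halt(const uint8_t *pc, int acc);

    static const op_fn dispatch[] = { op_inc, op_dec, op_halt };

    /* Each opcode is its own small function ending in a guaranteed
       tail call, so dispatch compiles to an indirect jump rather
       than a call, and the stack does not grow. */
    static int op_inc(const uint8_t *pc, int acc) {
        acc++;
        __attribute__((musttail)) return dispatch[pc[0]](pc + 1, acc);
    }

    static int op_dec(const uint8_t *pc, int acc) {
        acc--;
        __attribute__((musttail)) return dispatch[pc[0]](pc + 1, acc);
    }

    static int op_halt(const uint8_t *pc, int acc) {
        (void)pc;
        return acc;
    }

    int main(void) {
        static const uint8_t prog[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
        printf("%d\n", dispatch[prog[0]](prog + 1, 0));   /* prints 1 */
        return 0;
    }

Because pc and acc arrive in argument registers at every opcode entry and must be back in those registers for the tail call, the compiler sees the same register assignment at every opcode boundary.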

2. riffraff No.43118006
Thanks for the explanation!
3. 10000truths No.43120735
I feel like using calling conventions to massage the compiler's register allocation strategy is a hack. If the problem is manual control over register allocation, then the ideal solution should be... well, exactly that and no more? An annotation for local variables indicating "always spill this" (for cold-path locals) or "never spill this or else trigger a build error" (for hot-path locals). Isn't that literally why the "register" keyword exists in C? Why don't today's C compilers actually use it?
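
(The closest existing mechanism is GCC's explicit register variable extension, sketched below with illustrative names. It is closer to "reserve this register" than "never spill this", which is part of why it sees little use:)

    #include <stdint.h>

    /* GCC extension, x86-64 shown (the register name is
       target-specific). A file-scope "global register variable" pins
       this pointer to a callee-saved register for the whole
       translation unit. Note the caveats: it removes r12 from the
       allocator entirely rather than guiding it, and GCC only
       guarantees *local* register variables when they are used as
       inline-asm operands. */
    register const uint8_t *vm_pc asm("r12");

    int fetch_next_opcode(void) {
        return *vm_pc++;   /* reads and advances %r12 directly */
    }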
4. haberman No.43132117
If the tail-calling pattern made the code ugly, I would be more inclined to agree with this. But putting each opcode in its own function isn't so bad: it reads just as well as, if not better than, a mondo function that implements every opcode.

By contrast, a mondo function that also carries a bunch of register-allocation annotations seems less readable.

5. 10000truths No.43134148
I don't see how a hypothetical __attribute__((never_spill)) annotation on local variables would preclude splitting opcode logic into separate functions. It just means those functions would have to be inlined into the interpreter loop to avoid conflicts with calling-convention constraints.