

Assembling Constants And Calls

Much of the useful work done by a program consists of loading needed constants onto the stack, and then calling functions on them. Let's see how to do that. We'll assume the *fun* and *asm* variables established in the previous section are still with us.

stack:
*asm* reset
stack:
"Hello, world!\n" *asm* assembleConstant
stack:
#', *asm* assembleCall
stack:
nil 0 *fun* *asm* finishAssembly --> *cfn*
stack:
*cfn* call{ -> }
Hello, world!
stack:

Whee -- everything should be so easy!

If we peek at the disassembly of *cfn* we will find:

constants:
0: "Hello, world!
"
code bytes:
00: 34 00 01 3f 2e
code disassembly:
000:      34 00       GETk   0
002:      01 3f       WRITE_OUTPUT_STREAM
004:      2e          RETURN 

Let's cover some fine points not obvious from inspection of the above.

The assembleConstant function tells the assembler to append, to the compiledFunction being built, code which will result in the given constant being loaded onto the stack at runtime.

(We don't know or care exactly how the assembler does this: The particular bytecode instructions used actually vary somewhat depending on the type and value of the constant, as it happens. Future releases of Muq may change the precise set of load-constant bytecodes available to the assembler; since the assembler takes care of this, your compiler is automatically portable across such changes.)

The current Muq assembler isn't terribly smart, but it does do one simple but handy optimization in the assembleConstant routine: If the constant you ask for has already been loaded once by the function (so that it is already available in the COMPILEDFUNCTION constant vector), then the existing constant vector slot will be re-used rather than a new one created. This is worth knowing, if only so that you don't feel obligated to implement the same optimization in your own compilers.
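
For example, if you deposit the same constant twice in one assembly, both loads wind up referencing the same constant slot. Assembling

"twice!" *asm* assembleConstant
"twice!" *asm* assembleConstant

should produce code along the lines of (a hypothetical sketch; the exact bytes and formatting depend on your Muq release):

constants:
0: "twice!"
code disassembly:
000:      34 00       GETk   0
002:      34 00       GETk   0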

(In general, future releases of Muq will attempt to add optimizations to the assembler rather than the compilers: Since there will be many compilers but only one assembler, this saves effort over the long haul.)

The assembleCall function also hides some secrets. First, the value you give to it may be either a compiledFunction or a symbol with a compiledFunction functional value. Both are useful, but they are not equivalent: A call assembled directly to a compiledFunction is bound once and for all at assembly time, and will keep invoking exactly that function even if it is later redefined, whereas a call assembled via a symbol looks up the symbol's functional value at runtime, so already-compiled code automatically picks up later redefinitions, at the price of a small extra indirection on each call.

I think the added flexibility is well worth the small runtime speed cost, and strongly recommend that you generate calls via symbols as a normal matter of course.

To do this in the above example, we would replace the line

#', *asm* assembleCall

by the line

', *asm* assembleCall
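
Put together, the symbol-based version of the hello-world example reads (same session conventions as before; the output should be unchanged):

*asm* reset
"Hello, world!\n" *asm* assembleConstant
', *asm* assembleCall
nil 0 *fun* *asm* finishAssembly --> *cfn*
*cfn* call{ -> }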

Now, let's try something even more ambitious: A function which adds 2+2 to get 4:

stack:
*asm* reset
stack:
2 *asm* assembleConstant
stack:
2 *asm* assembleConstant
stack:
'+ *asm* assembleCall
stack:
nil -1 *fun* *asm* finishAssembly --> *cfn*
stack:
*cfn* call{ -> $ }
stack: 4

If we peek at the disassembly of *cfn* we will now find:

constants:
code bytes:
00: 33 02 33 02 0c 2e
code disassembly:
000:      33 02       GETi   2
002:      33 02       GETi   2
004:      0c          ADD    
005:      2e          RETURN 

As a minor point, note that this time the Muq assembler produced a different load-constant instruction than it did last time: It switched to a special load-immediate integer instruction to avoid allocating a constant slot. This is the sort of minor optimization, mentioned above, which saves you as a compiler writer from having to worry about such issues, and frees the server implementor to tune the bytecode instruction set in the future without breaking existing compilers.

More importantly, note that in both of the previous two examples, the assembleCall function did not in fact assemble a call to the indicated function, but instead emitted a primitive bytecode to do the same thing. This is another optimization done by the assembler, trivial in terms of computation required, but very important because it again decouples the compilers from the bytecode architecture: The Muq compiler writer in general need never know which functions are implemented in-db and which are implemented in-server. This again means that the compiler writer has less to worry about, and that the server maintainer can continue to tune the virtual machine in future releases by moving functionality between the server and the db, without breaking existing compilers.

Bottom-line take-home lesson from this section:

Almost all the useful work done by the functions you compile will be done by code deposited via assembleCall calls. Some of these calls will produce server-implemented bytecode primitives, and some will produce calls to functions in the db: As a compiler writer, you need not know or care which.
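
As one last illustration, here is a sketch combining the two examples above: a function which computes 2+2 and then prints the result. (This assumes the print function , is as happy to print a number as a string; either call may end up as a primitive bytecode or as a true call to a db function, and you need not care which.)

*asm* reset
2 *asm* assembleConstant
2 *asm* assembleConstant
'+ *asm* assembleCall
', *asm* assembleCall
nil 0 *fun* *asm* finishAssembly --> *cfn*
*cfn* call{ -> }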

The next sections explore exceptions to the above rule (grin).

