Slux Rationale

Introduction

This informal paper describes the motivation behind the inclusion of certain features in Slux and the omission of other features.

A driving set of guidelines:

  1. simplicity of the specification is best
  2. allowing as many implementations as possible is best
  3. provide the information necessary for efficient implementations
  4. making the simplest (but non-ideal) implementation simpler is good

Some applications of these guidelines, some of which are discussed further below:

Bytecode

The Slux specifications 1.0.0.1 through 1.0.0.4 specified a bytecode format that used 8-bit bytes. Versions 1.0.0.4 through 1.0.0.9 specified a 16-bit "bytecode" format to allow parameters larger than 8 bits. From version 1.0.0.10 on, the format is back to 8-bit. So there are two things to justify: why 8-bit bytecode, and why have a bytecode at all.

The alternative to a bytecode engine is a message-passing engine. The use of opcode overloading and VM classes is suggestive of how such a system could work; everything could be a message pass. Opcodes that don't directly connect to a particular datatype, such as 'time' or 'output', could be on a 'system object' whose only purpose is to define these messages.

There is probably a strong argument of simplicity in favor of message-passing, but just as there is a basic core of datatypes that are expected to be implemented "primitively", so we want a core set of operations which are understood to be basic and relatively fast. Regardless, bytecode seems necessary for handling branching, and once we have that much bytecode, we might as well go whole hog. Bytecode is also likely to be a more efficient encoding, since, for example, the number of arguments is implicit.

8-bit bytecode limits the VM to 128 parameters/variables and 128 literals per function for a naive implementation; but clever compilers can put additional variables/literals in data structures that are automatically indirected. This raises the complexity of a compiler for a "production quality" language, but naive compilers can simply expose those limits to program authors without significant ill effects.
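
The signed split behind those limits can be sketched in C. The helper names and the exact mapping of negative codes to slots are assumptions for illustration; only the negative-means-literal, non-negative-means-frame-variable split is from the text.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FRAMEVAR, LITERAL } operand_kind;

/* Decode one signed 8-bit operand. Negative codes name literal-pool
   slots, non-negative codes name frame variables. The particular
   -1-based slot mapping here is an assumption. */
static operand_kind decode(int8_t code, int *slot) {
    if (code < 0) {
        *slot = -1 - code;   /* -1..-128 -> literal slots 0..127 */
        return LITERAL;
    }
    *slot = code;            /* 0..127 -> frame variables 0..127 */
    return FRAMEVAR;
}
```

Either way the split is done, one signed byte yields exactly 128 slots of each kind, which is where the naive limits above come from.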

Classes are Objects

In Slux, there is no special class data type. Certain objects "act as classes" if they are set as the parent class of another object.

Languages are free to implement classes as entirely distinct from objects. By blocking access to 'setclass', limiting 'create' to take a known constant class, and in general limiting the use of class objects to restricted functions (not allowing them to be stored anywhere they can be accessed as objects), a language can guarantee, for instance, that messages are only ever sent to objects and never to classes.

Languages are also free to allow loose intermixing of classes and objects, to allow message sends to classes, and the like.

VM implementations can choose to analyze the class hierarchy tree, mark objects which are known to have children as being of type "class", and use internal representations that distinguish between the two types, perhaps falling back on a generic both-class-and-object for cases where a program manipulates something as both.

For example, suppose a VM implements a method-lookup cache. A write to the properties of an object which isn't itself in the cache, but which is part of the inheritance graph of something in the cache, must still invalidate the cache; checking for this case on every object write would be inefficient. If there were a distinct 'class' type that couldn't be written to, the VM could avoid the check entirely. However, a VM can internally encode things-that-look-like-classes and objects distinctly, exactly as it would if the distinction were part of the specification. Since writes to classes should be rare--the specification advises compiler writers and program authors that they may be slow--the fast path through the object-write opcode can just check that the target of the write is a "true" object.
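
A minimal sketch of that fast path, assuming the VM internally tags objects known to act as classes (all names here are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical internal tag: the VM marks objects that act as classes
   when it builds or loads the class hierarchy. */
enum obj_kind { PLAIN_OBJECT, ACTS_AS_CLASS };

struct vmobj { enum obj_kind kind; int prop[8]; };

static bool cache_valid = true;   /* stand-in for a method-lookup cache */

static void write_prop(struct vmobj *o, int slot, int value) {
    o->prop[slot] = value;
    if (o->kind == ACTS_AS_CLASS)   /* rare, per the spec's advice */
        cache_valid = false;        /* conservatively flush the cache */
}
```

The common case, a write to a plain object, costs one predictable comparison and never touches the cache.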

In other words, efficient VMs can get the same performance benefit we'd get by introducing distinct class and object types--without introducing the types into the specification--just by introducing them into the VM internally. Thus we don't introduce them, simplifying the specification and broadening the range of implementations. The cost is that compiler writers and program authors are uncertain about how fast certain operations will be, and writing an efficient VM becomes slightly more difficult (but not too much--rather than the input format telling you which things are classes, you just have to look for which objects are specified as parent classes of other objects, including at run-time).

No Constant Data

A VM implementation is free to consider all program data to be effectively static (unmodifiable), to the extent of never garbage collecting it; however, the VM must allow arrays, sets, and objects to be modified in place, including growing and shrinking them. While it would be possible to allow the compiler to mark certain things as uncollectable, it's more flexible to let VMs choose.

Another alternative would be to have the specification allow objects to be marked as read-only. Compilers could expose this to programs. VM implementations would all be obligated to check on every write whether the write was to a read-only object. While efficient VMs could handle this somewhat efficiently by treating them as distinct types, this might slow down the handling of reads (which would have to check for two types instead of one), and it would complicate the specification and complicate the implementation of the simplest possible VM.

Literal Pool

Each function has its own literal pool. This allows functions to participate in garbage collection correctly and efficiently. Since only the "compile" instructions can produce functions, it might seem like VMs which don't provide "compile" would never need to garbage collect functions. However, not only could a VM choose to garbage collect data out of the program file, including functions, but a VM might load a delta file created by a different VM, one which includes a new function--and such a function should participate in garbage collection.

It is possible to recompile code to let literals refer to a global literal pool, and it is easier to recompile in that direction than it would be to have a global literal pool and recompile into private literal pools.
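
The easy direction--private pools recompiled into a global pool--might look like this in C. The helper names and the -1-based negative indexing convention are assumptions for illustration:

```c
#include <assert.h>

#define MAX_GLOBAL 256
static int global_pool[MAX_GLOBAL];
static int global_count = 0;

/* Append one function's private pool to the global pool; returns the
   bias to apply to that function's literal operands. */
static int append_pool(const int *lits, int n) {
    int base = global_count;
    for (int i = 0; i < n; i++)
        global_pool[global_count++] = lits[i];
    return base;
}

/* Rewrite one operand: literal code -k becomes -(k + base);
   frame-variable codes (non-negative) are untouched. */
static int relink_operand(int code, int base) {
    return code < 0 ? code - base : code;
}
```

Going the other way would require discovering which global literals each function actually uses and renumbering every operand accordingly, which is why the private-pool representation is the one specified.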

Global Object Revectoring

By defining save/restore/markundo/undo as revectoring into the global object, the current execution environment (the stack frame) does not need to be saved. This significantly reduces the size of the specification and frees implementations to handle nested execution with other technology.

The 'safe copying' of the passed-in parameter to restore/undo is specified because it could be complicated to preserve the data as-is. The definition allows for a simple serialization of the parameter, the memory changeover, and then regeneration of the data from the serialization. It also allows the data to be passed through unchanged.

Register-Model operands

Whether a stack is more efficient than registers was once a hotly debated subject. Nowadays, as the cost of mispredicted branches has become much more significant than the cost of memory access, reducing the number of unpredictable branches has become an extremely significant issue for interpreters.

In a stack model machine, every parameter to an opcode must be pushed on the stack by another instruction, which is always an unpredictable branch; the stack machine wins if this was just some other instruction you needed to compute anyway--e.g. when computing (2+3)*5, the result of 2+3 is left on the stack. Still, the entire sequence requires five instructions:

   push #2
   push #3
   add
   push #5
   mul

The register machine can do this in two instructions:

   add r1, #2, #3
   mul r1, r1, #5

This requires decoding six operands, and if that decoding requires branching, there won't be a win at all. However, the output operands can always directly index into the array of frame variables (which act as the registers in the VM), so no branching is required for them; so that leaves only the four input operands to decode.
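
The dispatch-count difference can be made concrete with two toy interpreters. The encodings are invented for illustration; only the counts of opcode dispatches (the unpredictable indirect branches discussed above) matter.

```c
#include <assert.h>

enum { S_PUSH, S_ADD, S_MUL, S_HALT };

/* Stack machine: one dispatch per instruction, five for (2+3)*5. */
static int run_stack(const int *code, int *dispatches) {
    int st[8], sp = 0;
    for (;;) {
        int op = *code++;
        if (op == S_HALT) return st[sp - 1];
        ++*dispatches;
        switch (op) {
        case S_PUSH: st[sp++] = *code++; break;
        case S_ADD:  sp--; st[sp - 1] += st[sp]; break;
        case S_MUL:  sp--; st[sp - 1] *= st[sp]; break;
        }
    }
}

enum { R_ADD, R_MUL, R_HALT };

/* Source operands: negative = register (-1-n), non-negative = immediate,
   echoing the literal/frame-variable split Slux uses. */
static int src(const int *r, int n) { return n < 0 ? r[-1 - n] : n; }

/* Register machine: two dispatches for the same computation. */
static int run_reg(const int *code, int *dispatches) {
    int r[8];
    for (;;) {
        int op = *code++;
        if (op == R_HALT) return r[code[0]];
        ++*dispatches;
        int a = src(r, code[1]), b = src(r, code[2]);
        r[code[0]] = (op == R_ADD) ? a + b : a * b;
        code += 3;
    }
}

static const int stack_prog[] = { S_PUSH, 2, S_PUSH, 3, S_ADD,
                                  S_PUSH, 5, S_MUL, S_HALT };
static const int reg_prog[]   = { R_ADD, 0, 2, 3,
                                  R_MUL, 0, -1, 5, R_HALT, 0 };
```

Both compute 25; the stack version pays five dispatches, the register version two, at the cost of per-operand decoding inside each handler.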

The encoding specified in Slux does, however, intermix literals and frame variables in a single number--negative codes represent indices into the literal pool and positive codes indices into frame variables. A naive implementation (such as the reference Slux implementation) will use a conditional branch. However, it is worth noting that this conditional branch is not the usual "opcode jump" branch, which is always unpredictable (its target location always changes), but a traditional two-way branch, which will predict correctly at least 50% of the time--and in practice much more often, since each opcode has its own copy of the branch.

Moreover, the Slux specification leaves open the possibility that the code will be recompiled into some other format; for example, for all the most popular executed opcodes, the VM could provide multiple opcode variants which explicitly encode the pattern of literals vs. frame variables into the opcode, so that the single opcode dispatch implies that pattern in the vast majority of cases.
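
A sketch of that specialization, with hypothetical variant names: the recompiler inspects the sign pattern of an add's source operands once, then emits a variant whose handler reads straight from the right array with no per-operand branch at execution time.

```c
#include <assert.h>

enum { ADD_VV, ADD_VL, ADD_LV, ADD_LL };   /* V = frame var, L = literal */

static int lits[4] = { 10, 20, 30, 40 };
static int vars[4] = { 1, 2, 3, 4 };

/* One-time choice while recompiling: operand signs pick the variant. */
static int specialize_add(int a, int b) {
    return ((a < 0) << 1) | (b < 0);
}

/* Execution: a and b are already plain slot indices; no sign checks. */
static int exec_add(int op, int a, int b) {
    switch (op) {
    case ADD_VV: return vars[a] + vars[b];
    case ADD_VL: return vars[a] + lits[b];
    case ADD_LV: return lits[a] + vars[b];
    default:     return lits[a] + lits[b];   /* ADD_LL */
    }
}
```

The branch on operand signs is paid once at recompile time instead of on every execution of the opcode.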

Finally, a number of data-based strategies allow branch-free processing. For example, a VM could extend the literal pool area by the frame variable size; then, when executing a given function, copy the frame variables into that area and use the operand values as direct indices within this space, copying the values back out if the routine recurses (or handling recursion with an alternate implementation).

Or it could copy the literal pool into the frame variables.

Or it could create a table of pointers, indexed by the operand value, which point to the appropriate locations. However, this table would need to be updated on every function call, and since each literal pool item and frame var is (typically) the same size as a pointer, this would require writing more data than either of the above approaches.

Finally, a low-expense way of doing this is to create a two-entry table, one item pointing to the literal pool, the other to the frame variables. Then a mathematical operation can be performed to choose one of the two entries in the table, instead of a branch. In other words, you change from:

   return (n < 0) ? literal[n] : framevar[n];

to

   return framevar_or_literal[(n < 0)][n];

to

   return framevar_or_literal[(unsigned)n >> 31][n];

(The unsigned cast ensures the shift yields 0 or 1 rather than 0 or -1.) This potentially induces a lot of latency because of the extra indirection, but since the initial array is only two elements long it should always be in the primary cache.
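
A working C version of this table. The pool sizes and the -128..127 operand range are assumptions; the literal entry is pre-biased to point one past the end of its storage so the negative operand codes index it directly.

```c
#include <assert.h>
#include <stdint.h>

static int32_t literal_storage[128];   /* operand codes -128..-1 */
static int32_t framevar[128];          /* operand codes    0..127 */
static int32_t *framevar_or_literal[2];

static void init_table(void) {
    framevar_or_literal[0] = framevar;               /* n >= 0 */
    framevar_or_literal[1] = literal_storage + 128;  /* base[-1] = last slot */
}

static int32_t fetch(int32_t n) {
    /* (uint32_t)n >> 31 is 0 for non-negative n, 1 for negative n --
       a branch-free version of (n < 0). */
    return framevar_or_literal[(uint32_t)n >> 31][n];
}
```
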

Undefined Garbage Collection

The Slux spec does not specify a method of garbage collection. Because of this, the memory consumption of a program is unpredictable. Technically, a VM which never garbage collected would be a valid Slux VM; it would just be one nobody would use, because it would suck.

Specifying any particular garbage collection algorithm or timing would be too restrictive on implementations. Dynamic memory allocation for interactive programs already has "unpredictability" problems anyway.

The explicit destroy mechanism allows data elements to be predictably deleted, which should prove useful for, for instance, providing handles to I/O resources, or for explicitly minimizing memory consumption.

Error & Exception Handling

Slux does not provide a catch/throw exception mechanism. If one is widely desired, it might be added in the future. However, Slux is unlikely to ever allow 'catch'ing of VM errors. While VMs are required to handle a defined set of errors, Slux takes a step down a slippery slope by accepting that a VM which does not catch all errors is still a useful VM. Since a correct Slux program never produces an error, such a VM will still run correct programs correctly. Introducing an explicit, required ability to intercept VM errors and continue running would allow programs to intentionally produce errors and rely on consistent error production--making exact error behavior part of the contract.

A further problem with allowing VM errors to be caught is that it would reduce backwards compatibility; it would be impossible to change the error produced by a given operation, or make it fail to produce errors in some cases, without breaking valid programs which relied on that error.

This should not be taken as carte blanche for implementors to freely omit error checking; it is more intended to guarantee that a VM which is buggy only in its error checking is still a VM which will run correct programs.

Non-Polymorphic Opcodes

In Sludge, there was a single opcode which could implement both strget and arrget. In Slux, only strget and arrget are provided. A language can provide an apparently overloaded get syntax by outputting either strget or arrget, and using opcode overloading to make the opposite type do the right thing.

This methodology allows a language to choose whether or not this overloading occurs. An alternative approach would be to provide a strget, an arrget, and a strarrget which does both. But if we also consider that a similar 'get' operation should work on objects, there are seven possible combinations (three single-type opcodes, three pairs, and one triple).

A more practical approach is to provide a small number of opcodes, and allow the program to provide "configuration" information to the VM which describes how much polymorphism to provide for each opcode.

Slux actually takes this approach; the "configuration information" is simply the definition of the pseudoclasses. A naive implementation of Slux might pay an interpreted function call for every strget on an array; but a smart implementation of Slux can recognize when a pseudoclass's overloaded opcode definition uses a related opcode in a simple fashion. For example, almost any definition along the lines of

   #array.#strget(array a, int x)
   {
      return arrget(a,x);
   }

would compile to something like

   number of arguments = 2
   number of frame vars = 3
   @arrget 2 0 1
   @return 2

and this can be recognized by a VM and accelerated much as any other method of providing the configuration information would be.

Admittedly, this is significantly more complex for a VM to recognize, but it opens up the implementation space of VMs by allowing them to totally ignore the 'special overload configuration information' and just rely on the naive overloading mechanism.

Note that this only works efficiently because the pseudoclasses and their overloaded opcodes are all cached (as described by the 'cacheglobal' and 'cachepseudo' opcodes). But, honestly, I didn't decide to make them cached to allow the above technique; revision 1.0.0.1 of Slux defined cacheglobal and cachepseudo identically, but also defined a number of overloaded 'get' operations. The caching is necessary for efficient implementation of overloading in general.

The trap instructions provide another mechanism to achieve a similar effect (e.g. if a language wants strget, arrget, and a polymorphic version, it could use an appropriately defined trap4).

Save file

The save file is not mandated as a required part of the VM specification. This allows implementations to choose their own save file formats if that's more convenient; for example, an implementation that doesn't want to sort its data need not. (The program file must always have sorted data, but compilers can afford this complexity.) An implementation that never garbage collects function pointers--on the assumption that all functions in the program file point to static data anyway, and that the implementation never creates functions itself--could leak a fair amount of data if it loaded a save file created by another implementation which included functions in the save file (i.e. the non-static portion). This doesn't seem like a big deal, but such an implementation could certainly just refuse to restore a save file with a function in it.

Pseudoclass methods

The pseudoclasses take the target of the original message as the first parameter, instead of using 'self'. This inconsistent semantic was chosen to provide the guarantee that the value of 'self' is always an object, which allows some typechecking to be skipped, and might allow some VM implementations to provide extra acceleration to operations on 'self' (e.g. getpropself).

Signature

The Slux file signature uses the same 4-byte signature for both program files and save files. The full 16-byte signature allows for quick detection of some common forms of corruption, a la the PNG file format signature: CR/LF mangling, high-bit stripping, and special processing of ASCII 26. The mixed case prevents normal string references to Slux from happening to be read as the signature.
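
For illustration, a PNG-style check in C. The bytes below are hypothetical (and only 8 of them, for brevity), not the actual Slux signature: a high-bit-set lead byte catches 7-bit stripping, the CR/LF pair catches line-ending conversion, and ASCII 26 (^Z) stops DOS-style 'type' listings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical signature bytes, modeled on PNG's; the real Slux
   signature is defined by the spec. */
static const unsigned char sig[8] = {
    0x89, 'S', 'l', 'u', 'x', '\r', '\n', 0x1A
};

static int signature_ok(const unsigned char *data, size_t len) {
    return len >= sizeof sig && memcmp(data, sig, sizeof sig) == 0;
}
```

Any of the corruptions named above changes at least one signature byte, so a single memcmp at load time rejects the damaged file.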