Files
wazero/internal/engine/compiler/RATIONALE.md
2022-12-23 13:38:55 +09:00

86 lines
4.0 KiB
Markdown

# Compiler engine
This package implements the Compiler engine for WebAssembly *purely written in Go*.
In this README, we describe the background, technical difficulties and some design choices.
## General limitations on pure Go Compiler engines
In Go program, each Goroutine manages its own stack, and each item on Goroutine
stack is managed by Go runtime for garbage collection, etc.
These impose some difficulties on compiler engine purely written in Go because
we *cannot* use native push/pop instructions to save/restore temporary
variables spilling from registers. This results in making it impossible for us
to invoke Go functions from compiled native codes with the native `call`
instruction since it involves stack manipulations.
*TODO: maybe it is possible to hack the runtime to make it possible to achieve
function calls with `call`.*
## How to generate native codes
wazero uses its own assembler, implemented from scratch in the
[`internal/asm`](../../asm/) package. The primary rationale are wazero's zero
dependency policy, and to enable concurrent compilation (a feature the
WebAssembly binary format optimizes for).
Before this, wazero used [`twitchyliquid64/golang-asm`](https://github.com/twitchyliquid64/golang-asm).
However, this was not only a dependency (one of our goals is to have zero
dependencies), but also a large one (several megabytes added to the binary).
Moreover, any copy of golang-asm is not thread-safe, so can't be used for
concurrent compilation (See [#233](https://github.com/tetratelabs/wazero/issues/233)).
The assembled native codes are represented as `[]byte` and the slice region is
marked as executable via mmap system call.
## How to enter native codes
Assuming that we have a native code as `[]byte`, it is straightforward to enter
the native code region via Go assembly code. In this package, we have the
function without body called `nativecall`
```go
func nativecall(codeSegment, engine, memory uintptr)
```
where we pass `codeSegment uintptr` as a first argument. This pointer is to the
first instruction to be executed. The pointer can be easily derived from
`[]byte` via `unsafe.Pointer`:
```go
code := []byte{}
/* ...Compilation ...*/
codeSegment := uintptr(unsafe.Pointer(&code[0]))
nativecall(codeSegment, ...)
```
And `nativecall` is actually implemented in [arch_amd64.s](./arch_amd64.s)
as a convenience layer to comply with the Go's official calling convention.
We delegate the task to jump into the code segment to the Go assembler code.
## Why it's safe to execute runtime-generated machine codes against async Goroutine preemption
Goroutine preemption is the mechanism of the Go runtime to switch goroutines contexts on an OS thread.
There are two types of preemption: cooperative preemption and async preemption. The former happens, for example,
when making a function call, and it is not an issue for our runtime-generated functions as they do not make
direct function calls to Go-implemented functions. On the other hand, the latter, async preemption, can be problematic
since it tries to interrupt the execution of Goroutine at any point of function, and manipulates CPU register states.
Fortunately, our runtime-generated machine codes do not need to take the async preemption into account.
All the assembly codes are entered via the trampoline implemented as Go Assembler Function (e.g. [arch_amd64.s](./arch_amd64.s)),
and as of Go 1.20, these assembler functions are considered as _unsafe_ for async preemption:
- https://github.com/golang/go/blob/go1.20rc1/src/runtime/preempt.go#L406-L407
- https://github.com/golang/go/blob/9f0234214473dfb785a5ad84a8fc62a6a395cbc3/src/runtime/traceback.go#L227
From the Go runtime point of view, the execution of runtime-generated machine codes is considered as a part of
that trampoline function. Therefore, runtime-generated machine code is also correctly considered unsafe for async preemption.
## How to achieve function calls
Given that we cannot use `call` instruction at all in native code, here's how
we achieve the function calls back and forth among Go and (compiled) Wasm
native functions.
TODO: