Commit Graph

30 Commits

Author SHA1 Message Date
Nuno Cruces
52b32e4872 compiler: avoid some unsafe.Pointers (#1456)
Signed-off-by: Nuno Cruces <ncruces@users.noreply.github.com>
2023-05-12 08:04:36 +10:00
Takeshi Yoneda
867459d7d5 compiler: mmap per module instead of per function (#1377)
This changes the mmap strategy used in the compiler backend.
Previously, we used mmap syscall once per function and allocated the 
executable pages each time. Basically, mmap can only allocate the 
boundary of the page size of the underlying os. Even if the requested 
executable is smaller than the page size, the entire page is marked as 
executable and won't be reused by Go runtime. Therefore, we wasted 
roughly `(len(body)%osPageSize)*function`.

Even though we still need to align each function on 16 bytes boundary
when mmaping per module, the wasted space is much smaller than before.

The following benchmark results shows that this improves the overall 
compilation performance while showing the heap usage increased. 
However, the increased heap usage is totally offset by the hidden wasted
memory page which is not measured by Go's -benchmem.
Actually, when I did the experiments, I observed that roughly 20~30mb are
wasted on arm64 previously which is larger than the increased heap usage
in this result. More importantly, this increased heap usage is a target of GC
and should be ignorable in the long-running program vs the wasted page 
is persistent until the CompiledModule is closed.

Not only the actual compilation time, the result indicates that this could 
improve the overall Go runtime's performance maybe thanks to not abusing
runtime.Finalizer since you can see this improves the subsequent interpreter 
benchmark results.

```
goos: darwin
goarch: arm64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
                                   │   old.txt   │              new.txt              │
                                   │   sec/op    │   sec/op     vs base              │
Compilation_sqlite3/compiler-10      183.4m ± 0%   175.9m ± 2%  -4.10% (p=0.001 n=7)
Compilation_sqlite3/interpreter-10   61.59m ± 0%   59.57m ± 0%  -3.29% (p=0.001 n=7)
geomean                              106.3m        102.4m       -3.69%

                                   │   old.txt    │               new.txt               │
                                   │     B/op     │     B/op      vs base               │
Compilation_sqlite3/compiler-10      42.93Mi ± 0%   54.33Mi ± 0%  +26.56% (p=0.001 n=7)
Compilation_sqlite3/interpreter-10   51.75Mi ± 0%   51.75Mi ± 0%   -0.01% (p=0.001 n=7)
geomean                              47.13Mi        53.02Mi       +12.49%

                                   │   old.txt   │              new.txt              │
                                   │  allocs/op  │  allocs/op   vs base              │
Compilation_sqlite3/compiler-10      26.07k ± 0%   26.06k ± 0%       ~ (p=0.149 n=7)
Compilation_sqlite3/interpreter-10   13.90k ± 0%   13.90k ± 0%       ~ (p=0.421 n=7)
geomean                              19.03k        19.03k       -0.02%


goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
cpu: AMD Ryzen 9 3950X 16-Core Processor
                                   │   old.txt   │              new.txt               │
                                   │   sec/op    │   sec/op     vs base               │
Compilation_sqlite3/compiler-32      384.4m ± 2%   373.0m ± 4%   -2.97% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32   86.09m ± 4%   65.05m ± 2%  -24.44% (p=0.001 n=7)
geomean                              181.9m        155.8m       -14.38%

                                   │   old.txt    │               new.txt               │
                                   │     B/op     │     B/op      vs base               │
Compilation_sqlite3/compiler-32      49.40Mi ± 0%   59.91Mi ± 0%  +21.29% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32   51.77Mi ± 0%   51.76Mi ± 0%   -0.02% (p=0.001 n=7)
geomean                              50.57Mi        55.69Mi       +10.12%

                                   │   old.txt   │              new.txt              │
                                   │  allocs/op  │  allocs/op   vs base              │
Compilation_sqlite3/compiler-32      28.70k ± 0%   28.70k ± 0%       ~ (p=0.925 n=7)
Compilation_sqlite3/interpreter-32   14.00k ± 0%   14.00k ± 0%  -0.04% (p=0.010 n=7)
geomean                              20.05k        20.04k       -0.02%
```

resolves #1060 

Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-04-26 14:11:37 +09:00
Takeshi Yoneda
b6d19696da compiler: reuses allocated runtimeValueLocation stacks (#1348)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-04-10 14:58:47 +09:00
Takeshi Yoneda
24dbf49c79 compiler: avoid alloc with stack pointer ceil and reuse bytes reader (#1344)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-04-06 16:47:08 +09:00
Takeshi Yoneda
cc28399052 wazeroir: reuses allocated slices for a module (#1342)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-04-05 20:26:44 +09:00
Edoardo Vacchi
0dc152d672 wazeroir: migrate vector, table, branch and all other remaining ops to compact repr (#1334)
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Co-authored-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-04-05 09:38:49 +09:00
Takeshi Yoneda
0857336746 Removes wasm.FunctionInstance type (#1294)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-03-28 14:43:44 +09:00
Takeshi Yoneda
a4226906cf Deletes wasm.CallCtx (#1280)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-03-24 17:00:43 -07:00
Takeshi Yoneda
ad968dc3fe Pass correct api.Module to host functions (#1213)
Fixes #1211

Previously, host functions are getting api.Module for the "originating" module, 
which is the module for api.Function currently invoked, except that the api.Module 
is modified by withMemory with the caller's memory instance, therefore there 
haven't been no problem for most cases. The only issues were the methods 
besides Memory() of api.Module, and this commit fixes them.

Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-03-08 15:05:48 +09:00
Takeshi Yoneda
5eab1a7307 compiler: pass runtimeValueLocationStack by values to reduce allocations (#1170)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-02-27 09:42:45 +09:00
Takeshi Yoneda
0547a8e0ab wazeroir: uses u64 as unique keys for labels (#1142)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-02-21 08:24:59 +09:00
Takeshi Yoneda
6eb1e5d9fb arm64: fixes runtimeValueType for i32.wrap_i64 (#1011)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-01-06 14:28:48 +09:00
Takeshi Yoneda
d63c747d53 asm,compiler: reduce allocations during compilation (#936)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-12-20 12:49:47 +09:00
Takeshi Yoneda
0dde445074 compiler: make moduleContext.functions non-pointer slice (#898)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-12-07 15:20:24 +09:00
Takeshi Yoneda
84d733eb0a compiler: adds support for FunctionListeners (#869)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-11-29 15:15:49 +09:00
Crypt Keeper
329ccca6b1 Switches from gofmt to gofumpt (#848)
This switches to gofumpt and applies changes, as I've noticed working
in dapr (who uses this) that it finds some things that are annoying,
such as inconsistent block formatting in test tables.

Signed-off-by: Adrian Cole <adrian@tetrate.io>
2022-11-09 05:48:24 +01:00
Crypt Keeper
be33572289 Adds HostFunctionBuilder to enable high performance host functions (#828)
This PR follows @hafeidejiangyou advice to not only enable end users to
avoid reflection when calling host functions, but also use that approach
ourselves internally. The performance results are staggering and will be
noticable in high performance applications.

Before
```
BenchmarkHostCall/Call
BenchmarkHostCall/Call-16            	 1000000	      1050 ns/op
Benchmark_EnvironGet/environGet
Benchmark_EnvironGet/environGet-16         	  525492	      2224 ns/op
```

Now
```
BenchmarkHostCall/Call
BenchmarkHostCall/Call-16            	14807203	        83.22 ns/op
Benchmark_EnvironGet/environGet
Benchmark_EnvironGet/environGet-16         	  951690	      1054 ns/op
```

To accomplish this, this PR consolidates code around host function
definition and enables a fast path for functions where the user takes
responsibility for defining its WebAssembly mappings. Existing users
will need to change their code a bit, as signatures have changed.

For example, we are now more strict that all host functions require a
context parameter zero. Also, we've replaced
`HostModuleBuilder.ExportFunction` and `ExportFunctions` with a new type
`HostFunctionBuilder` that consolidates the responsibility and the
documentation.

```diff
 ctx := context.Background()
-hello := func() {
+hello := func(context.Context) {
         fmt.Fprintln(stdout, "hello!")
 }
-_, err := r.NewHostModuleBuilder("env").ExportFunction("hello", hello).Instantiate(ctx, r)
+_, err := r.NewHostModuleBuilder("env").
+        NewFunctionBuilder().WithFunc(hello).Export("hello").
+        Instantiate(ctx, r)
```

Power users can now use `HostFunctionBuilder` to define functions that
won't use reflection. There are two choices of interfaces to use
depending on if that function needs access to the calling module or not:
`api.GoFunction` and `api.GoModuleFunction`. Here's an example defining
one.

```go
builder.WithGoFunction(api.GoFunc(func(ctx context.Context, params []uint64) []uint64 {
	x, y := uint32(params[0]), uint32(params[1])
	sum := x + y
	return []uint64{sum}
}, []api.ValueType{api.ValueTypeI32, api.ValueTypeI32}, []api.ValueType{api.ValueTypeI32})
```
As you'll notice and as documented, this approach is more verbose and
not for everyone. If you aren't making a low-level library, you are
likely able to afford the 1us penalty for the convenience of reflection.
However, we are happy to enable this option for foundational libraries
and those with high performance requirements (like ourselves)!

Fixes #825

Signed-off-by: Adrian Cole <adrian@tetrate.io>
2022-10-28 07:51:08 -07:00
Takeshi Yoneda
9ad8af121a compiler: simplify calling convention (#782)
This simplifies the calling convention and consolidates the call frame stack
and value stack into a single stack.

As a result, the cost of function calls decreases because we now don't need
to check the boundary twice (value and call frame stacks) at each function call.

The following is the result of the benchmark for recursive Fibonacci
function in integration_test/bench/testdata/case.go, and it shows that
this actually improves the performance of function calls.

[amd64]
name                               old time/op  new time/op  delta
Invocation/compiler/fib_for_5-32    109ns ± 3%    81ns ± 1%  -25.86%  (p=0.008 n=5+5)
Invocation/compiler/fib_for_10-32   556ns ± 3%   473ns ± 3%  -14.99%  (p=0.008 n=5+5)
Invocation/compiler/fib_for_20-32  61.4µs ± 2%  55.9µs ± 5%   -8.98%  (p=0.008 n=5+5)
Invocation/compiler/fib_for_30-32  7.41ms ± 3%  6.83ms ± 3%   -7.90%  (p=0.008 n=5+5)


[arm64]
name                               old time/op    new time/op    delta
Invocation/compiler/fib_for_5-10     67.7ns ± 1%    60.2ns ± 1%  -11.12%  (p=0.000 n=9+9)
Invocation/compiler/fib_for_10-10     487ns ± 1%     460ns ± 0%   -5.56%  (p=0.000 n=10+9)
Invocation/compiler/fib_for_20-10    58.0µs ± 1%    54.3µs ± 1%   -6.38%  (p=0.000 n=10+10)
Invocation/compiler/fib_for_30-10    7.12ms ± 1%    6.67ms ± 1%   -6.31%  (p=0.000 n=10+9)

Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-09-06 13:29:56 +09:00
Takeshi Yoneda
15a774e543 compiler: hold stack base pointers in bytes, not index to the stack slice (#781)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-08-30 13:00:55 +09:00
Takeshi Yoneda
0bd2beedac Introduce CallEngine assigned to api.Function implementation. (#761)
This introduces wasm.CallEngine internal type, and assign it to the api.Function
implementations. api.Function.Call now uses that CallEngine assigned to it
to make function calls.

Internally, when creating CallEngine implementation, the compiler engine allocates
call frames and values stack. Previously, we allocate these stacks for each function calls,
which was a severe overhead as we can recognize in the benchmarks. As a result,
this reduces the memory usage (== reduces the GC jobs) as long as we reuse
the same api.Function multiple times.

As a side effect, now api.Function.Call is not goroutine-safe. So this adds the comment
about it on that method.

Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-08-24 16:11:15 +09:00
Vladislav
63e438aa66 Optimize memory bulk operations: copy on x86 (#700)
Signed-off-by: Vladislav Oleshko <vladislav.oleshko@gmail.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
2022-07-26 18:22:47 +09:00
Takeshi Yoneda
6a62b794f5 Fixes compileDropRange when all registers are used up by live values (#711)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-07-25 09:18:04 +09:00
Crypt Keeper
49e5bcb8c7 Top-levels FunctionDefinition to allow access to all function metadata (#686)
This top-levels `api.FunctionDefinition` which was formerly
experimental, and also adds import metadata to it. Now, it holds all
metadata known at compile time.

Here are the public API visible changes:
* api.ExportedFunction - replaced with api.FunctionDefinition as it is
  usable for all types of functions.
* api.Function - `.ParamTypes/ResultTypes()` are replaced with
  `.Definition().
* api.FunctionDefinition - extracted from experimental and adds
  `.Import()` to get the any imported module and function name.
* experimental.FunctionDefinition - replaced with
  api.FunctionDefinition.
* experimental.FunctionListenerFactory - adds first arg of the
  instantiated module name, as it can be different than compiled.
* wazero.CompiledModule - Adds `.ImportedFunctions()` and changes result
  type of `.ExportedFunctions()` to api.FunctionDefinition.

Internally, logic to create function definition are consolidated between
host and wasm-defined functions, notably wasm.Module now includes
`.BuildFunctionDefinitions()` which reduces duplication in
wasm.ModuleInstance `.BuildFunctions()`,

This obviates #681 by deleting the `ExportedFunction` type which
overlaps with this information.

This fixes #637 as it includes more metadata including imports.

Signed-off-by: Adrian Cole <adrian@tetrate.io>
Co-authored-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-07-13 14:16:18 +08:00
Takeshi Yoneda
cbe6170473 compiler: always allocate register to save conditional values (#666)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-06-29 09:26:50 +09:00
Takeshi Yoneda
e6f08a86fd arm64: remove NOP insertion at the beginning (#653)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-06-24 09:39:47 +09:00
Takeshi Yoneda
b0588a98c9 Completes arm64 backend for SIMD instructions (#644)
This completes the implementation of arm64 backend for SIMD instructions.
Notably, now the arm64 compiler passes 100% of WebAssemby 2.0 draft
specification tests.

Combined with the completion of the interpreter and amd64 backend (#624),
this finally resolves #484. Therefore, this also documents that wazero is
100% compatible with WebAssembly 1.0 and 2.0.

Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-06-21 10:51:02 +09:00
Crypt Keeper
507ce79080 Adds sys.Walltime and sys.Nanotime for security and determinism (#616)
This adds two clock interfaces: sys.Walltime and sys.Nanotime to
allow implementations to override readings for purposes of security or
determinism.

The default values of both are a fake timestamp, to avoid the sandbox
break we formerly had by returning the real time. This is similar to how
we don't inherit OS Env values.
2022-06-04 15:14:31 +08:00
Takeshi Yoneda
2e131a1a2c SIMD: implements boolean and bitwise instructions. (#611)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-06-01 16:42:35 +09:00
Takeshi Yoneda
9a9b361ac8 Vector values support in ahead-of-time compiler (#572)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
2022-05-19 11:02:15 -06:00
Crypt Keeper
c815060196 Renames JIT to Compiler and notes it is AOT (#564)
This notably changes NewRuntimeJIT to NewRuntimeCompiler as well renames
packages from jit to compiler.

This clarifies the implementation is AOT, not JIT, at least when
clarified to where it occurs (Runtime.CompileModule). In doing so, we
reduce any concern that compilation will happen during function
execution. We also free ourselves to create a JIT option without
confusion in the future via CompileConfig or otherwise.

Fixes #560

Signed-off-by: Adrian Cole <adrian@tetrate.io>
2022-05-17 08:50:56 +09:00