Commit Graph

31 Commits

Author SHA1 Message Date
Nuno Cruces
90f58bce75 compiler: fix compiledModule leak (#1608)
Signed-off-by: Nuno Cruces <ncruces@users.noreply.github.com>
Co-authored-by: Achille Roussel <achille.roussel@gmail.com>
2023-08-02 09:14:49 +08:00
Nuno Cruces
b4d97e5e69 compiler(amd64): emit smaller instructions (#1513)
Signed-off-by: Nuno Cruces <ncruces@users.noreply.github.com>
2023-06-13 06:50:56 +10:00
Achille
9780f0f4a0 compiler: zero-copy code assembly (#1481)
Signed-off-by: Achille Roussel <achille.roussel@gmail.com>
Co-authored-by: Crypt Keeper <64215+codefromthecrypt@users.noreply.github.com>
2023-05-19 07:06:30 +02:00
Takeshi Yoneda
867459d7d5 compiler: mmap per module instead of per function (#1377)
This changes the mmap strategy used in the compiler backend.
Previously, we used mmap syscall once per function and allocated the 
executable pages each time. Basically, mmap can only allocate the 
boundary of the page size of the underlying os. Even if the requested 
executable is smaller than the page size, the entire page is marked as 
executable and won't be reused by Go runtime. Therefore, we wasted 
roughly `(len(body)%osPageSize)*function`.

Even though we still need to align each function on 16 bytes boundary
when mmaping per module, the wasted space is much smaller than before.

The following benchmark results shows that this improves the overall 
compilation performance while showing the heap usage increased. 
However, the increased heap usage is totally offset by the hidden wasted
memory page which is not measured by Go's -benchmem.
Actually, when I did the experiments, I observed that roughly 20~30mb are
wasted on arm64 previously which is larger than the increased heap usage
in this result. More importantly, this increased heap usage is a target of GC
and should be ignorable in the long-running program vs the wasted page 
is persistent until the CompiledModule is closed.

Not only the actual compilation time, the result indicates that this could 
improve the overall Go runtime's performance maybe thanks to not abusing
runtime.Finalizer since you can see this improves the subsequent interpreter 
benchmark results.

```
goos: darwin
goarch: arm64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
                                   │   old.txt   │              new.txt              │
                                   │   sec/op    │   sec/op     vs base              │
Compilation_sqlite3/compiler-10      183.4m ± 0%   175.9m ± 2%  -4.10% (p=0.001 n=7)
Compilation_sqlite3/interpreter-10   61.59m ± 0%   59.57m ± 0%  -3.29% (p=0.001 n=7)
geomean                              106.3m        102.4m       -3.69%

                                   │   old.txt    │               new.txt               │
                                   │     B/op     │     B/op      vs base               │
Compilation_sqlite3/compiler-10      42.93Mi ± 0%   54.33Mi ± 0%  +26.56% (p=0.001 n=7)
Compilation_sqlite3/interpreter-10   51.75Mi ± 0%   51.75Mi ± 0%   -0.01% (p=0.001 n=7)
geomean                              47.13Mi        53.02Mi       +12.49%

                                   │   old.txt   │              new.txt              │
                                   │  allocs/op  │  allocs/op   vs base              │
Compilation_sqlite3/compiler-10      26.07k ± 0%   26.06k ± 0%       ~ (p=0.149 n=7)
Compilation_sqlite3/interpreter-10   13.90k ± 0%   13.90k ± 0%       ~ (p=0.421 n=7)
geomean                              19.03k        19.03k       -0.02%


goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
cpu: AMD Ryzen 9 3950X 16-Core Processor
                                   │   old.txt   │              new.txt               │
                                   │   sec/op    │   sec/op     vs base               │
Compilation_sqlite3/compiler-32      384.4m ± 2%   373.0m ± 4%   -2.97% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32   86.09m ± 4%   65.05m ± 2%  -24.44% (p=0.001 n=7)
geomean                              181.9m        155.8m       -14.38%

                                   │   old.txt    │               new.txt               │
                                   │     B/op     │     B/op      vs base               │
Compilation_sqlite3/compiler-32      49.40Mi ± 0%   59.91Mi ± 0%  +21.29% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32   51.77Mi ± 0%   51.76Mi ± 0%   -0.02% (p=0.001 n=7)
geomean                              50.57Mi        55.69Mi       +10.12%

                                   │   old.txt   │              new.txt              │
                                   │  allocs/op  │  allocs/op   vs base              │
Compilation_sqlite3/compiler-32      28.70k ± 0%   28.70k ± 0%       ~ (p=0.925 n=7)
Compilation_sqlite3/interpreter-32   14.00k ± 0%   14.00k ± 0%  -0.04% (p=0.010 n=7)
geomean                              20.05k        20.04k       -0.02%
```

resolves #1060 

Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-04-26 14:11:37 +09:00
Takeshi Yoneda
b6d19696da compiler: reuses allocated runtimeValueLocation stacks (#1348)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-04-10 14:58:47 +09:00
Takeshi Yoneda
24dbf49c79 compiler: avoid alloc with stack pointer ceil and reuse bytes reader (#1344)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-04-06 16:47:08 +09:00
Takeshi Yoneda
cc28399052 wazeroir: reuses allocated slices for a module (#1342)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-04-05 20:26:44 +09:00
Edoardo Vacchi
0dc152d672 wazeroir: migrate vector, table, branch and all other remaining ops to compact repr (#1334)
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Co-authored-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-04-05 09:38:49 +09:00
Edoardo Vacchi
f3ef84c9b3 wazeroir: Load Ops, Store Ops, Set, Pick, Select, CallIndirect (#1329)
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
2023-04-01 08:00:27 +09:00
Takeshi Yoneda
ef8e12a575 compiler: bitmask for tracking used registers (#1323)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-03-31 12:28:20 +09:00
Edoardo Vacchi
8887799da7 wazeroir: move unary byte ops to UnionOperator (#1320)
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
2023-03-31 08:07:12 +09:00
Edoardo Vacchi
c5d37877bd wazeroir: migrate unary operations to UnionOperation (#1318)
* refactor: OperationCall, OperationGlobalGet, OperationGlobalSet

* refactor: Constant Operations

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

---------
2023-03-30 12:57:45 +02:00
Takeshi Yoneda
0857336746 Removes wasm.FunctionInstance type (#1294)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-03-28 14:43:44 +09:00
Takeshi Yoneda
350e81e632 Holds function types as values, not ptrs in wasm.Module (#1227)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-03-15 13:45:52 +09:00
Takeshi Yoneda
5eab1a7307 compiler: pass runtimeValueLocationStack by values to reduce allocations (#1170)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-02-27 09:42:45 +09:00
Takeshi Yoneda
a265d41d30 wazeroir: less allocations (#1138)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2023-02-20 13:29:56 +09:00
Edoardo Vacchi
5895873019 Add support for querying amd64 cpuid (#1118)
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
2023-02-11 22:12:18 +09:00
Takeshi Yoneda
d63c747d53 asm,compiler: reduce allocations during compilation (#936)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-12-20 12:49:47 +09:00
Takeshi Yoneda
0dde445074 compiler: make moduleContext.functions non-pointer slice (#898)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-12-07 15:20:24 +09:00
Crypt Keeper
329ccca6b1 Switches from gofmt to gofumpt (#848)
This switches to gofumpt and applies changes, as I've noticed working
in dapr (who uses this) that it finds some things that are annoying,
such as inconsistent block formatting in test tables.

Signed-off-by: Adrian Cole <adrian@tetrate.io>
2022-11-09 05:48:24 +01:00
Takeshi Yoneda
9ad8af121a compiler: simplify calling convention (#782)
This simplifies the calling convention and consolidates the call frame stack
and value stack into a single stack.

As a result, the cost of function calls decreases because we now don't need
to check the boundary twice (value and call frame stacks) at each function call.

The following is the result of the benchmark for recursive Fibonacci
function in integration_test/bench/testdata/case.go, and it shows that
this actually improves the performance of function calls.

[amd64]
name                               old time/op  new time/op  delta
Invocation/compiler/fib_for_5-32    109ns ± 3%    81ns ± 1%  -25.86%  (p=0.008 n=5+5)
Invocation/compiler/fib_for_10-32   556ns ± 3%   473ns ± 3%  -14.99%  (p=0.008 n=5+5)
Invocation/compiler/fib_for_20-32  61.4µs ± 2%  55.9µs ± 5%   -8.98%  (p=0.008 n=5+5)
Invocation/compiler/fib_for_30-32  7.41ms ± 3%  6.83ms ± 3%   -7.90%  (p=0.008 n=5+5)


[arm64]
name                               old time/op    new time/op    delta
Invocation/compiler/fib_for_5-10     67.7ns ± 1%    60.2ns ± 1%  -11.12%  (p=0.000 n=9+9)
Invocation/compiler/fib_for_10-10     487ns ± 1%     460ns ± 0%   -5.56%  (p=0.000 n=10+9)
Invocation/compiler/fib_for_20-10    58.0µs ± 1%    54.3µs ± 1%   -6.38%  (p=0.000 n=10+10)
Invocation/compiler/fib_for_30-10    7.12ms ± 1%    6.67ms ± 1%   -6.31%  (p=0.000 n=10+9)

Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-09-06 13:29:56 +09:00
Takeshi Yoneda
087ab9d9fc amd64: do not load/store higher bits of 32-bit int/float (#742)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-08-12 12:08:20 +08:00
Vladislav
63e438aa66 Optimize memory bulk operations: copy on x86 (#700)
Signed-off-by: Vladislav Oleshko <vladislav.oleshko@gmail.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
2022-07-26 18:22:47 +09:00
Takeshi Yoneda
6a62b794f5 Fixes compileDropRange when all registers are used up by live values (#711)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-07-25 09:18:04 +09:00
Takeshi Yoneda
e6f08a86fd arm64: remove NOP insertion at the beginning (#653)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-06-24 09:39:47 +09:00
Takeshi Yoneda
3b4544ee48 compiler: remove embedding of pointers of jump tables (#650)
This removes the embedding of pointers of jump tables (uintptr of []byte)
used by BrTable operations. That is the last usage of unsafe.Pointer in
compiler implementations.
Alternatively, we treat jump tables as asm.StaticConst and emit them
into the constPool already implemented and used by various places.

Notably, now the native code compiled by compilers can be reusable
across multiple processes, meaning that they are independent of
any runtime pointers.

Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-06-23 13:42:46 +09:00
Anuraag Agrawal
296b5f56b7 Follows go style for amd64 constants. (#586)
Signed-off-by: Anuraag Agrawal <anuraaga@gmail.com>
2022-05-23 12:51:37 +09:00
Takeshi Yoneda
18fdf3eecc compiler: fixes calling convention violation. (#583)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
2022-05-21 16:37:26 +09:00
Anuraag Agrawal
ec3ada35a0 Use correct pattern for table tests everywhere (#582)
Signed-off-by: Anuraag Agrawal <anuraaga@gmail.com>
2022-05-20 16:55:01 +09:00
Takeshi Yoneda
9a9b361ac8 Vector values support in ahead-of-time compiler (#572)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
2022-05-19 11:02:15 -06:00
Crypt Keeper
c815060196 Renames JIT to Compiler and notes it is AOT (#564)
This notably changes NewRuntimeJIT to NewRuntimeCompiler as well renames
packages from jit to compiler.

This clarifies the implementation is AOT, not JIT, at least when
clarified to where it occurs (Runtime.CompileModule). In doing so, we
reduce any concern that compilation will happen during function
execution. We also free ourselves to create a JIT option without
confusion in the future via CompileConfig or otherwise.

Fixes #560

Signed-off-by: Adrian Cole <adrian@tetrate.io>
2022-05-17 08:50:56 +09:00