This changes the mmap strategy used in the compiler backend.
Previously, we issued one mmap syscall per function, allocating
executable pages each time. mmap can only allocate in multiples of the
underlying OS's page size, so even when the requested executable code is
smaller than a page, the entire page is marked as executable and cannot
be reused by the Go runtime. Therefore, we wasted roughly
`(osPageSize - len(body)%osPageSize)` bytes per function.
Even though we still need to align each function on a 16-byte boundary
when mmapping per module, the wasted space is much smaller than before,
as the sketch below illustrates.
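To make the arithmetic concrete, here is a rough, self-contained sketch
(illustrative only, not wazero's actual code) comparing the bytes wasted
by per-function mmaps against a single per-module mmap; the page size and
function body sizes are hypothetical.
```
// Rough illustration of page waste: per-function vs per-module mmap.
package main

import "fmt"

// wastePerFunction: each function gets its own mapping, so every body that
// does not end exactly on a page boundary wastes the tail of its last page.
func wastePerFunction(bodyLens []int, pageSize int) (waste int) {
	for _, n := range bodyLens {
		if r := n % pageSize; r != 0 {
			waste += pageSize - r
		}
	}
	return waste
}

// wastePerModule: all bodies share one mapping, each aligned to a 16-byte
// boundary; only the tail of the final page is wasted.
func wastePerModule(bodyLens []int, pageSize int) int {
	total := 0
	for _, n := range bodyLens {
		total += (n + 15) &^ 15 // round up to the 16-byte alignment
	}
	if r := total % pageSize; r != 0 {
		return pageSize - r
	}
	return 0
}

func main() {
	bodies := []int{100, 4096, 300}              // hypothetical machine-code sizes
	fmt.Println(wastePerFunction(bodies, 16384)) // 44656 bytes wasted
	fmt.Println(wastePerModule(bodies, 16384))   // 11872 bytes wasted
}
```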
The following benchmark results show that this improves the overall
compilation performance while increasing heap usage. However, the
increased heap usage is more than offset by the hidden wasted memory
pages, which are not measured by Go's -benchmem.
In fact, in my experiments I observed that roughly 20~30 MB were
previously wasted on arm64, which is larger than the heap usage increase
in this result. More importantly, the increased heap usage is a target
of GC and should be negligible in a long-running program, whereas the
wasted pages persist until the CompiledModule is closed.
Beyond the compilation time itself, the results indicate that this could
also improve the overall performance of the Go runtime, likely because
we no longer abuse runtime.SetFinalizer: you can see that the subsequent
interpreter benchmark results improve as well.
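As a minimal sketch of that pattern (unix-only, with hypothetical names;
wazero's real code differs), mapping the whole module once means a single
runtime.SetFinalizer registration per module rather than one per function:
```
// Hypothetical sketch: one mmap and one finalizer per compiled module.
package modulesketch

import (
	"runtime"
	"syscall"
)

type compiledModule struct {
	executable []byte // one mapping holding all functions' machine code
}

func mmapModule(code []byte) (*compiledModule, error) {
	buf, err := syscall.Mmap(-1, 0, len(code),
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON)
	if err != nil {
		return nil, err
	}
	copy(buf, code)
	// Flip the mapping to read+execute only after copying the code in.
	if err := syscall.Mprotect(buf, syscall.PROT_READ|syscall.PROT_EXEC); err != nil {
		_ = syscall.Munmap(buf)
		return nil, err
	}
	m := &compiledModule{executable: buf}
	// One finalizer per module, instead of one per function as before.
	runtime.SetFinalizer(m, func(m *compiledModule) {
		_ = syscall.Munmap(m.executable)
	})
	return m, nil
}
```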
```
goos: darwin
goarch: arm64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Compilation_sqlite3/compiler-10 183.4m ± 0% 175.9m ± 2% -4.10% (p=0.001 n=7)
Compilation_sqlite3/interpreter-10 61.59m ± 0% 59.57m ± 0% -3.29% (p=0.001 n=7)
geomean 106.3m 102.4m -3.69%
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
Compilation_sqlite3/compiler-10 42.93Mi ± 0% 54.33Mi ± 0% +26.56% (p=0.001 n=7)
Compilation_sqlite3/interpreter-10 51.75Mi ± 0% 51.75Mi ± 0% -0.01% (p=0.001 n=7)
geomean 47.13Mi 53.02Mi +12.49%
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
Compilation_sqlite3/compiler-10 26.07k ± 0% 26.06k ± 0% ~ (p=0.149 n=7)
Compilation_sqlite3/interpreter-10 13.90k ± 0% 13.90k ± 0% ~ (p=0.421 n=7)
geomean 19.03k 19.03k -0.02%
goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
cpu: AMD Ryzen 9 3950X 16-Core Processor
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Compilation_sqlite3/compiler-32 384.4m ± 2% 373.0m ± 4% -2.97% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32 86.09m ± 4% 65.05m ± 2% -24.44% (p=0.001 n=7)
geomean 181.9m 155.8m -14.38%
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
Compilation_sqlite3/compiler-32 49.40Mi ± 0% 59.91Mi ± 0% +21.29% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32 51.77Mi ± 0% 51.76Mi ± 0% -0.02% (p=0.001 n=7)
geomean 50.57Mi 55.69Mi +10.12%
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
Compilation_sqlite3/compiler-32 28.70k ± 0% 28.70k ± 0% ~ (p=0.925 n=7)
Compilation_sqlite3/interpreter-32 14.00k ± 0% 14.00k ± 0% -0.04% (p=0.010 n=7)
geomean 20.05k 20.04k -0.02%
```
resolves #1060
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
This switches to gofumpt and applies its changes. While working in dapr
(which uses this), I've noticed that gofumpt catches some annoying
things, such as inconsistent block formatting in test tables.
Signed-off-by: Adrian Cole <adrian@tetrate.io>
This simplifies the calling convention and consolidates the call frame
stack and value stack into a single stack.
As a result, the cost of function calls decreases because we no longer
need to check two stack boundaries (one for the value stack, one for the
call frame stack) on each function call.
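Conceptually (a sketch with made-up names, not the actual compiler code),
a unified stack lets the engine guard a call with one bounds check where
two were needed before:
```
// Hypothetical sketch of the bounds check saved by a unified stack.
package enginesketch

type engine struct {
	stack []uint64 // values and call frames now live in one slice
	sp    int      // stack pointer
}

// ensure grows the unified stack when a call needs more room. Previously an
// equivalent check had to run twice per call: once for the value stack and
// once for the call frame stack.
func (e *engine) ensure(needed int) {
	if e.sp+needed > len(e.stack) {
		grown := make([]uint64, 2*(e.sp+needed))
		copy(grown, e.stack[:e.sp])
		e.stack = grown
	}
}
```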
The following benchmark results for the recursive Fibonacci function in
integration_test/bench/testdata/case.go show that this indeed improves
the performance of function calls.
[amd64]
name old time/op new time/op delta
Invocation/compiler/fib_for_5-32 109ns ± 3% 81ns ± 1% -25.86% (p=0.008 n=5+5)
Invocation/compiler/fib_for_10-32 556ns ± 3% 473ns ± 3% -14.99% (p=0.008 n=5+5)
Invocation/compiler/fib_for_20-32 61.4µs ± 2% 55.9µs ± 5% -8.98% (p=0.008 n=5+5)
Invocation/compiler/fib_for_30-32 7.41ms ± 3% 6.83ms ± 3% -7.90% (p=0.008 n=5+5)
[arm64]
name old time/op new time/op delta
Invocation/compiler/fib_for_5-10 67.7ns ± 1% 60.2ns ± 1% -11.12% (p=0.000 n=9+9)
Invocation/compiler/fib_for_10-10 487ns ± 1% 460ns ± 0% -5.56% (p=0.000 n=10+9)
Invocation/compiler/fib_for_20-10 58.0µs ± 1% 54.3µs ± 1% -6.38% (p=0.000 n=10+10)
Invocation/compiler/fib_for_30-10 7.12ms ± 1% 6.67ms ± 1% -6.31% (p=0.000 n=10+9)
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
This removes the embedding of jump table pointers (the uintptr of a
[]byte) used by BrTable operations. That was the last usage of
unsafe.Pointer in the compiler implementations.
Instead, we treat jump tables as asm.StaticConst and emit them into the
constPool, which is already implemented and used in various places.
Notably, the native code produced by the compilers is now reusable
across multiple processes, since it is independent of any runtime
pointers.
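To illustrate the idea (a sketch under assumed names; the real encoding
lives behind asm.StaticConst), a table can be emitted as
position-independent offsets instead of absolute pointers:
```
// Hypothetical sketch: encode br_table targets as offsets relative to the
// table itself, so the emitted bytes contain no process-specific addresses.
package jumptablesketch

import "encoding/binary"

// buildJumpTable returns bytes destined for the constant pool: one 32-bit
// offset per branch target, relative to the table's own position. Generated
// code would add a loaded entry to the table's address at runtime
// (PC-relative), so the same bytes are valid in any process.
func buildJumpTable(targetOffsets []int64, tableOffset int64) []byte {
	table := make([]byte, 4*len(targetOffsets))
	for i, t := range targetOffsets {
		binary.LittleEndian.PutUint32(table[4*i:], uint32(t-tableOffset))
	}
	return table
}
```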
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
This notably changes NewRuntimeJIT to NewRuntimeCompiler, as well as
renaming the packages from jit to compiler.
This clarifies that the implementation is AOT, not JIT, at least with
respect to where compilation occurs (Runtime.CompileModule). In doing
so, we reduce any concern that compilation will happen during function
execution. We also free ourselves to introduce a JIT option in the
future, via CompileConfig or otherwise, without confusion.
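For example (assuming a recent wazero API; exact signatures may differ
across versions, and module.wasm is a placeholder), compilation happens
eagerly at CompileModule, and Call only executes already-compiled code:
```
// Sketch: AOT compilation at CompileModule; no compilation during Call.
package main

import (
	"context"
	"log"
	"os"

	"github.com/tetratelabs/wazero"
)

func main() {
	ctx := context.Background()
	r := wazero.NewRuntimeWithConfig(ctx, wazero.NewRuntimeConfigCompiler())
	defer r.Close(ctx)

	wasmBytes, err := os.ReadFile("module.wasm") // placeholder module path
	if err != nil {
		log.Fatal(err)
	}

	compiled, err := r.CompileModule(ctx, wasmBytes) // AOT happens here
	if err != nil {
		log.Fatal(err)
	}

	mod, err := r.InstantiateModule(ctx, compiled, wazero.NewModuleConfig())
	if err != nil {
		log.Fatal(err)
	}

	// The exported function was compiled ahead of time; Call just runs it.
	if _, err := mod.ExportedFunction("_start").Call(ctx); err != nil {
		log.Fatal(err)
	}
}
```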
Fixes #560
Signed-off-by: Adrian Cole <adrian@tetrate.io>