* experimental: give listener r/o view of globals
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
* experimental: add global support to interpreter
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
* exp: replace globals proxy with module interface
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
* exp: trim down experimental.InternalModule
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
* Replace globals view slice with constantGlobal
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
* Fix tests after merge
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
* Address PR feedback
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
* Rename methods of experimental.Module
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
* Run make check
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
* Add WazeroOnlyType to constantGlobal
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
---------
Signed-off-by: Thomas Pelletier <thomas@pelletier.codes>
This adds a new type `internalapi.WazeroOnly` which should be embedded on types users are likely to accidentally implement despite docs saying otherwise.
Signed-off-by: Nuno Cruces <ncruces@users.noreply.github.com>
This changes the mmap strategy used in the compiler backend.
Previously, we used mmap syscall once per function and allocated the
executable pages each time. Basically, mmap can only allocate the
boundary of the page size of the underlying os. Even if the requested
executable is smaller than the page size, the entire page is marked as
executable and won't be reused by Go runtime. Therefore, we wasted
roughly `(len(body)%osPageSize)*function`.
Even though we still need to align each function on 16 bytes boundary
when mmaping per module, the wasted space is much smaller than before.
The following benchmark results shows that this improves the overall
compilation performance while showing the heap usage increased.
However, the increased heap usage is totally offset by the hidden wasted
memory page which is not measured by Go's -benchmem.
Actually, when I did the experiments, I observed that roughly 20~30mb are
wasted on arm64 previously which is larger than the increased heap usage
in this result. More importantly, this increased heap usage is a target of GC
and should be ignorable in the long-running program vs the wasted page
is persistent until the CompiledModule is closed.
Not only the actual compilation time, the result indicates that this could
improve the overall Go runtime's performance maybe thanks to not abusing
runtime.Finalizer since you can see this improves the subsequent interpreter
benchmark results.
```
goos: darwin
goarch: arm64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Compilation_sqlite3/compiler-10 183.4m ± 0% 175.9m ± 2% -4.10% (p=0.001 n=7)
Compilation_sqlite3/interpreter-10 61.59m ± 0% 59.57m ± 0% -3.29% (p=0.001 n=7)
geomean 106.3m 102.4m -3.69%
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
Compilation_sqlite3/compiler-10 42.93Mi ± 0% 54.33Mi ± 0% +26.56% (p=0.001 n=7)
Compilation_sqlite3/interpreter-10 51.75Mi ± 0% 51.75Mi ± 0% -0.01% (p=0.001 n=7)
geomean 47.13Mi 53.02Mi +12.49%
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
Compilation_sqlite3/compiler-10 26.07k ± 0% 26.06k ± 0% ~ (p=0.149 n=7)
Compilation_sqlite3/interpreter-10 13.90k ± 0% 13.90k ± 0% ~ (p=0.421 n=7)
geomean 19.03k 19.03k -0.02%
goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
cpu: AMD Ryzen 9 3950X 16-Core Processor
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Compilation_sqlite3/compiler-32 384.4m ± 2% 373.0m ± 4% -2.97% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32 86.09m ± 4% 65.05m ± 2% -24.44% (p=0.001 n=7)
geomean 181.9m 155.8m -14.38%
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
Compilation_sqlite3/compiler-32 49.40Mi ± 0% 59.91Mi ± 0% +21.29% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32 51.77Mi ± 0% 51.76Mi ± 0% -0.02% (p=0.001 n=7)
geomean 50.57Mi 55.69Mi +10.12%
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
Compilation_sqlite3/compiler-32 28.70k ± 0% 28.70k ± 0% ~ (p=0.925 n=7)
Compilation_sqlite3/interpreter-32 14.00k ± 0% 14.00k ± 0% -0.04% (p=0.010 n=7)
geomean 20.05k 20.04k -0.02%
```
resolves#1060
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>