This changes the mmap strategy used in the compiler backend: we now mmap
once per module instead of once per function.
Previously, we called the mmap syscall once per function and allocated
executable pages each time. mmap can only allocate in multiples of the
page size of the underlying OS. Even if the requested executable is
smaller than the page size, the entire page is marked as executable and
cannot be reused by the Go runtime. Therefore, for each function we
wasted roughly `osPageSize - len(body)%osPageSize` bytes.
Even though we still need to align each function on a 16-byte boundary
when mmapping per module, the wasted space is much smaller than before.
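To make the difference in wasted space concrete, here is a minimal sketch
comparing the two strategies. The function names and the sample body sizes
are hypothetical and are not taken from the wazero code base; the sketch
only models the rounding arithmetic, not the actual mmap calls.

```go
package main

import "fmt"

// alignUp rounds n up to the next multiple of align.
// align must be a power of two.
func alignUp(n, align int) int {
	return (n + align - 1) &^ (align - 1)
}

// perFunctionMmap models the old strategy: every function body gets its
// own mapping, so each body is rounded up to a whole number of pages.
func perFunctionMmap(bodies []int, pageSize int) int {
	total := 0
	for _, n := range bodies {
		total += alignUp(n, pageSize)
	}
	return total
}

// perModuleMmap models the new strategy: all bodies are packed into one
// mapping, each starting on a 16-byte boundary, and only the final size
// is rounded up to a whole number of pages.
func perModuleMmap(bodies []int, pageSize int) int {
	offset := 0
	for _, n := range bodies {
		offset = alignUp(offset, 16) + n
	}
	return alignUp(offset, pageSize)
}

func main() {
	// Hypothetical machine code sizes in bytes, assuming 4 KiB pages.
	bodies := []int{100, 5000, 33, 4096}
	const pageSize = 4096

	used := 0
	for _, n := range bodies {
		used += n
	}
	fmt.Println("bytes of code:       ", used)
	fmt.Println("mapped per function: ", perFunctionMmap(bodies, pageSize))
	fmt.Println("mapped per module:   ", perModuleMmap(bodies, pageSize))
}
```

With many small functions, the per-function total grows by close to one
page per function, while the per-module total loses at most 15 bytes per
function plus a single partial page at the end.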
The following benchmark results show that this improves the overall
compilation performance, while the measured heap usage increases.
However, the increased heap usage is more than offset by the previously
hidden wasted pages, which are not measured by Go's -benchmem.
In fact, in my experiments I observed that roughly 20-30 MB were wasted
on arm64 previously, which is larger than the increase in heap usage in
this result. More importantly, the increased heap usage is subject to GC
and should be negligible in a long-running program, whereas the wasted
pages persist until the CompiledModule is closed.
Beyond the compilation time itself, the results indicate that this could
improve the overall performance of the Go runtime, possibly thanks to no
longer abusing runtime.SetFinalizer, since the subsequent interpreter
benchmark results improve as well.
```
goos: darwin
goarch: arm64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Compilation_sqlite3/compiler-10 183.4m ± 0% 175.9m ± 2% -4.10% (p=0.001 n=7)
Compilation_sqlite3/interpreter-10 61.59m ± 0% 59.57m ± 0% -3.29% (p=0.001 n=7)
geomean 106.3m 102.4m -3.69%
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
Compilation_sqlite3/compiler-10 42.93Mi ± 0% 54.33Mi ± 0% +26.56% (p=0.001 n=7)
Compilation_sqlite3/interpreter-10 51.75Mi ± 0% 51.75Mi ± 0% -0.01% (p=0.001 n=7)
geomean 47.13Mi 53.02Mi +12.49%
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
Compilation_sqlite3/compiler-10 26.07k ± 0% 26.06k ± 0% ~ (p=0.149 n=7)
Compilation_sqlite3/interpreter-10 13.90k ± 0% 13.90k ± 0% ~ (p=0.421 n=7)
geomean 19.03k 19.03k -0.02%
goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
cpu: AMD Ryzen 9 3950X 16-Core Processor
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Compilation_sqlite3/compiler-32 384.4m ± 2% 373.0m ± 4% -2.97% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32 86.09m ± 4% 65.05m ± 2% -24.44% (p=0.001 n=7)
geomean 181.9m 155.8m -14.38%
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
Compilation_sqlite3/compiler-32 49.40Mi ± 0% 59.91Mi ± 0% +21.29% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32 51.77Mi ± 0% 51.76Mi ± 0% -0.02% (p=0.001 n=7)
geomean 50.57Mi 55.69Mi +10.12%
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
Compilation_sqlite3/compiler-32 28.70k ± 0% 28.70k ± 0% ~ (p=0.925 n=7)
Compilation_sqlite3/interpreter-32 14.00k ± 0% 14.00k ± 0% -0.04% (p=0.010 n=7)
geomean 20.05k 20.04k -0.02%
```
resolves #1060
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
This introduces a new API, the wazero.Cache interface, which can be passed to wazero.RuntimeConfig.
Users can configure this to share the underlying compilation cache across multiple wazero.Runtime instances.
Along the way, this deletes the experimental file cache API, as it is replaced by this new API.
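The idea of sharing one compilation cache across several runtimes can be
sketched with a toy model. Everything below (the Cache and Runtime types,
NewCache, Compile) is hypothetical and only illustrates the concept; it is
not wazero's API or implementation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// Cache is a hypothetical stand-in for a compilation cache that is safe
// to share between runtimes. It is keyed by the hash of the Wasm binary.
type Cache struct {
	mu           sync.Mutex
	compiled     map[[32]byte]string
	compilations int // number of times real compilation work was done
}

// NewCache returns an empty in-memory cache.
func NewCache() *Cache {
	return &Cache{compiled: map[[32]byte]string{}}
}

// Runtime is a toy runtime that only holds a reference to a cache.
type Runtime struct {
	cache *Cache
}

// Compile returns the cached result for wasm if one exists, and
// otherwise "compiles" it (here: formats a placeholder string).
func (r *Runtime) Compile(wasm []byte) string {
	key := sha256.Sum256(wasm)
	r.cache.mu.Lock()
	defer r.cache.mu.Unlock()
	if code, ok := r.cache.compiled[key]; ok {
		return code
	}
	r.cache.compilations++
	code := fmt.Sprintf("machine-code-for-%x", key[:4])
	r.cache.compiled[key] = code
	return code
}

func main() {
	cache := NewCache()
	r1 := &Runtime{cache: cache}
	r2 := &Runtime{cache: cache}

	wasm := []byte("placeholder module bytes")
	r1.Compile(wasm)
	r2.Compile(wasm) // served from the cache shared with r1

	fmt.Println("compilations:", cache.compilations)
}
```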
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
Co-authored-by: Crypt Keeper <64215+codefromthecrypt@users.noreply.github.com>
This switches to gofumpt and applies its changes. While working in dapr
(which uses it), I noticed that it catches some annoying things,
such as inconsistent block formatting in test tables.
Signed-off-by: Adrian Cole <adrian@tetrate.io>
This adds experimental support for a file system compilation cache.
Notably, experimental.WithCompilationCacheDirName allows users to configure
the directory into which the compiler writes the cache.
Versioning and validation of binary compatibility are done via the release tag
(release tags will be created starting at the end of this month). More
specifically, each cache file starts with a header containing the hardcoded
wazero version.
Fixes #618
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
Co-authored-by: Crypt Keeper <64215+codefromthecrypt@users.noreply.github.com>