This adds a new vs benchmark target to measure the cost of host function calls.
Notably, wazero measures roughly 2x to 4x faster than CGO-based
runtimes at crossing the host call boundary. One implication is that
we can focus on native code generation rather than on how Go function
calls are organized. For example, calling Go functions directly from
native code is not a priority.
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>