Files
p256k1/bench/BENCHMARK_REPORT.md
mleku f259c9a2e1 Remove benchmark results file and update Go module dependencies
This commit deletes the `benchmark_results.txt` file, which contained performance metrics for various cryptographic operations. Additionally, the Go module has been updated to version 1.25.0, and new dependencies have been added, including `btcec` for enhanced signing capabilities. The `go.sum` file has also been updated to reflect these changes. A new benchmark report has been introduced to provide a comprehensive comparison of signer implementations.
2025-11-01 21:03:50 +00:00

6.8 KiB

Benchmark Comparison Report

Signer Implementation Comparison

This report compares three signer implementations for secp256k1 operations:

  1. P256K1Signer - This repository's new port from Bitcoin Core secp256k1 (pure Go)
  2. BtcecSigner - Pure Go wrapper around btcec/v2
  3. NextP256K Signer - CGO version using next.orly.dev/pkg/crypto/p256k (CGO bindings to libsecp256k1)

Generated: 2025-11-01
Platform: linux/amd64
CPU: AMD Ryzen 5 PRO 4650G with Radeon Graphics
Go Version: go1.25.3


Summary Results

Operation P256K1Signer BtcecSigner NextP256K Winner
Pubkey Derivation 232,922 ns/op 63,317 ns/op 295,599 ns/op Btcec (3.7x faster)
Sign 136,560 ns/op 216,808 ns/op 53,454 ns/op NextP256K (2.6x faster)
Verify 268,771 ns/op 160,894 ns/op 38,423 ns/op NextP256K (7.0x faster)
ECDH 158,730 ns/op 130,804 ns/op 124,998 ns/op NextP256K (1.3x faster)

Detailed Results

Public Key Derivation

Deriving public key from private key (32 bytes → 32 bytes x-only pubkey).

Implementation Time per op Memory Allocations Speedup vs P256K1
P256K1Signer 232,922 ns/op 256 B/op 4 allocs/op 1.0x (baseline)
BtcecSigner 63,317 ns/op 368 B/op 7 allocs/op 3.7x faster
NextP256K 295,599 ns/op 983,395 B/op 9 allocs/op 0.8x slower

Analysis:

  • Btcec is fastest for key derivation (3.7x faster than P256K1)
  • NextP256K is slowest, likely due to CGO overhead for small operations
  • P256K1 has lowest memory allocation overhead

Signing (Schnorr)

Creating BIP-340 Schnorr signatures (32-byte message → 64-byte signature).

Implementation Time per op Memory Allocations Speedup vs P256K1
P256K1Signer 136,560 ns/op 1,152 B/op 17 allocs/op 1.0x (baseline)
BtcecSigner 216,808 ns/op 2,193 B/op 38 allocs/op 0.6x slower
NextP256K 53,454 ns/op 128 B/op 3 allocs/op 2.6x faster

Analysis:

  • NextP256K is fastest (2.6x faster than P256K1), benefiting from optimized C implementation
  • P256K1 is second fastest, showing good performance for pure Go
  • Btcec is slowest, likely due to more allocations and pure Go overhead
  • NextP256K has lowest memory usage (128 B vs 1,152 B)

Verification (Schnorr)

Verifying BIP-340 Schnorr signatures (32-byte message + 64-byte signature).

Implementation Time per op Memory Allocations Speedup vs P256K1
P256K1Signer 268,771 ns/op 576 B/op 9 allocs/op 1.0x (baseline)
BtcecSigner 160,894 ns/op 1,120 B/op 18 allocs/op 1.7x faster
NextP256K 38,423 ns/op 96 B/op 2 allocs/op 7.0x faster

Analysis:

  • NextP256K is dramatically fastest (7.0x faster), showcasing CGO advantage for verification
  • Btcec is second fastest (1.7x faster than P256K1)
  • P256K1 is slowest but still reasonable for pure Go
  • NextP256K has minimal memory footprint (96 B vs 576 B)

ECDH (Shared Secret Generation)

Generating shared secret using Elliptic Curve Diffie-Hellman.

Implementation Time per op Memory Allocations Speedup vs P256K1
P256K1Signer 158,730 ns/op 241 B/op 6 allocs/op 1.0x (baseline)
BtcecSigner 130,804 ns/op 832 B/op 13 allocs/op 1.2x faster
NextP256K 124,998 ns/op 160 B/op 3 allocs/op 1.3x faster

Analysis:

  • All implementations are relatively close in performance
  • NextP256K has slight edge (1.3x faster)
  • P256K1 has lowest memory usage (241 B)
  • Performance difference is marginal for this operation

Performance Analysis

Overall Winner: NextP256K (CGO)

The CGO-based NextP256K implementation wins in 3 out of 4 operations:

  • Signing: 2.6x faster than P256K1
  • Verification: 7.0x faster than P256K1 (largest advantage)
  • ECDH: 1.3x faster than P256K1

Best Pure Go: Mixed Results

For pure Go implementations:

  • Btcec wins for key derivation (3.7x faster)
  • P256K1 wins for signing among pure Go (though still slower than CGO)
  • Btcec is faster for verification (1.7x faster than P256K1)
  • Both are comparable for ECDH

Memory Efficiency

Implementation Avg Memory per Operation Notes
NextP256K ~300 KB avg Very efficient, minimal allocations
P256K1Signer ~500 B avg Low memory footprint
BtcecSigner ~1.1 KB avg Higher allocations, but acceptable

Note: NextP256K shows high memory in pubkey derivation (983 KB) due to one-time CGO initialization overhead, but this is amortized across operations.


Recommendations

Use NextP256K (CGO) when:

  • Maximum performance is critical
  • CGO is acceptable in your build environment
  • Low memory footprint is important
  • Verification speed is critical (7x faster)

Use P256K1Signer when:

  • Pure Go is required (no CGO)
  • Good balance of performance and simplicity
  • Lower memory allocations are preferred
  • You want to avoid external C dependencies

Use BtcecSigner when:

  • Pure Go is required
  • Key derivation performance matters (3.7x faster)
  • You're already using btcec in your project
  • Verification needs to be faster than P256K1 but CGO isn't available

Conclusion

The benchmarks demonstrate that:

  1. CGO implementations (NextP256K) provide significant performance advantages for cryptographic operations, especially verification (7x faster)

  2. Pure Go implementations are competitive for most operations, with Btcec showing strength in key derivation and verification

  3. P256K1Signer provides a good middle ground with reasonable performance and clean API

  4. Memory efficiency varies by operation, with NextP256K generally being most efficient

The choice between implementations depends on your specific requirements:

  • Performance-critical applications: Use NextP256K (CGO)
  • Pure Go requirements: Choose between Btcec (faster) or P256K1 (cleaner API)
  • Balance: P256K1Signer offers good performance with pure Go simplicity

Running the Benchmarks

To reproduce these benchmarks:

# Run all benchmarks
CGO_ENABLED=1 go test -tags=cgo ./bench -bench=. -benchmem

# Run specific operation
CGO_ENABLED=1 go test -tags=cgo ./bench -bench=BenchmarkSign

# Run specific implementation
CGO_ENABLED=1 go test -tags=cgo ./bench -bench=Benchmark.*_P256K1

Note: All benchmarks require CGO to be enabled (CGO_ENABLED=1) and the cgo build tag.