# Benchmark Comparison Report ## Signer Implementation Comparison This report compares three signer implementations for secp256k1 operations: 1. **P256K1Signer** - This repository's new port from Bitcoin Core secp256k1 (pure Go) 2. **BtcecSigner** - Pure Go wrapper around btcec/v2 3. **NextP256K Signer** - CGO version using next.orly.dev/pkg/crypto/p256k (CGO bindings to libsecp256k1) **Generated:** 2025-11-01 **Platform:** linux/amd64 **CPU:** AMD Ryzen 5 PRO 4650G with Radeon Graphics **Go Version:** go1.25.3 --- ## Summary Results | Operation | P256K1Signer | BtcecSigner | NextP256K | Winner | |-----------|-------------|-------------|-----------|--------| | **Pubkey Derivation** | 232,922 ns/op | 63,317 ns/op | 295,599 ns/op | Btcec (3.7x faster) | | **Sign** | 136,560 ns/op | 216,808 ns/op | 53,454 ns/op | NextP256K (2.6x faster) | | **Verify** | 268,771 ns/op | 160,894 ns/op | 38,423 ns/op | NextP256K (7.0x faster) | | **ECDH** | 158,730 ns/op | 130,804 ns/op | 124,998 ns/op | NextP256K (1.3x faster) | --- ## Detailed Results ### Public Key Derivation Deriving public key from private key (32 bytes → 32 bytes x-only pubkey). | Implementation | Time per op | Memory | Allocations | Speedup vs P256K1 | |----------------|-------------|--------|-------------|-------------------| | **P256K1Signer** | 232,922 ns/op | 256 B/op | 4 allocs/op | 1.0x (baseline) | | **BtcecSigner** | 63,317 ns/op | 368 B/op | 7 allocs/op | **3.7x faster** | | **NextP256K** | 295,599 ns/op | 983,395 B/op | 9 allocs/op | 0.8x slower | **Analysis:** - Btcec is fastest for key derivation (3.7x faster than P256K1) - NextP256K is slowest, likely due to CGO overhead for small operations - P256K1 has lowest memory allocation overhead ### Signing (Schnorr) Creating BIP-340 Schnorr signatures (32-byte message → 64-byte signature). | Implementation | Time per op | Memory | Allocations | Speedup vs P256K1 | |----------------|-------------|--------|-------------|-------------------| | **P256K1Signer** | 136,560 ns/op | 1,152 B/op | 17 allocs/op | 1.0x (baseline) | | **BtcecSigner** | 216,808 ns/op | 2,193 B/op | 38 allocs/op | 0.6x slower | | **NextP256K** | 53,454 ns/op | 128 B/op | 3 allocs/op | **2.6x faster** | **Analysis:** - NextP256K is fastest (2.6x faster than P256K1), benefiting from optimized C implementation - P256K1 is second fastest, showing good performance for pure Go - Btcec is slowest, likely due to more allocations and pure Go overhead - NextP256K has lowest memory usage (128 B vs 1,152 B) ### Verification (Schnorr) Verifying BIP-340 Schnorr signatures (32-byte message + 64-byte signature). | Implementation | Time per op | Memory | Allocations | Speedup vs P256K1 | |----------------|-------------|--------|-------------|-------------------| | **P256K1Signer** | 268,771 ns/op | 576 B/op | 9 allocs/op | 1.0x (baseline) | | **BtcecSigner** | 160,894 ns/op | 1,120 B/op | 18 allocs/op | 1.7x faster | | **NextP256K** | 38,423 ns/op | 96 B/op | 2 allocs/op | **7.0x faster** | **Analysis:** - NextP256K is dramatically fastest (7.0x faster), showcasing CGO advantage for verification - Btcec is second fastest (1.7x faster than P256K1) - P256K1 is slowest but still reasonable for pure Go - NextP256K has minimal memory footprint (96 B vs 576 B) ### ECDH (Shared Secret Generation) Generating shared secret using Elliptic Curve Diffie-Hellman. | Implementation | Time per op | Memory | Allocations | Speedup vs P256K1 | |----------------|-------------|--------|-------------|-------------------| | **P256K1Signer** | 158,730 ns/op | 241 B/op | 6 allocs/op | 1.0x (baseline) | | **BtcecSigner** | 130,804 ns/op | 832 B/op | 13 allocs/op | 1.2x faster | | **NextP256K** | 124,998 ns/op | 160 B/op | 3 allocs/op | **1.3x faster** | **Analysis:** - All implementations are relatively close in performance - NextP256K has slight edge (1.3x faster) - P256K1 has lowest memory usage (241 B) - Performance difference is marginal for this operation --- ## Performance Analysis ### Overall Winner: NextP256K (CGO) The CGO-based NextP256K implementation wins in 3 out of 4 operations: - **Signing:** 2.6x faster than P256K1 - **Verification:** 7.0x faster than P256K1 (largest advantage) - **ECDH:** 1.3x faster than P256K1 ### Best Pure Go: Mixed Results For pure Go implementations: - **Btcec** wins for key derivation (3.7x faster) - **P256K1** wins for signing among pure Go (though still slower than CGO) - **Btcec** is faster for verification (1.7x faster than P256K1) - Both are comparable for ECDH ### Memory Efficiency | Implementation | Avg Memory per Operation | Notes | |----------------|-------------------------|-------| | **NextP256K** | ~300 KB avg | Very efficient, minimal allocations | | **P256K1Signer** | ~500 B avg | Low memory footprint | | **BtcecSigner** | ~1.1 KB avg | Higher allocations, but acceptable | **Note:** NextP256K shows high memory in pubkey derivation (983 KB) due to one-time CGO initialization overhead, but this is amortized across operations. --- ## Recommendations ### Use NextP256K (CGO) when: - Maximum performance is critical - CGO is acceptable in your build environment - Low memory footprint is important - Verification speed is critical (7x faster) ### Use P256K1Signer when: - Pure Go is required (no CGO) - Good balance of performance and simplicity - Lower memory allocations are preferred - You want to avoid external C dependencies ### Use BtcecSigner when: - Pure Go is required - Key derivation performance matters (3.7x faster) - You're already using btcec in your project - Verification needs to be faster than P256K1 but CGO isn't available --- ## Conclusion The benchmarks demonstrate that: 1. **CGO implementations (NextP256K) provide significant performance advantages** for cryptographic operations, especially verification (7x faster) 2. **Pure Go implementations are competitive** for most operations, with Btcec showing strength in key derivation and verification 3. **P256K1Signer** provides a good middle ground with reasonable performance and clean API 4. **Memory efficiency** varies by operation, with NextP256K generally being most efficient The choice between implementations depends on your specific requirements: - **Performance-critical applications:** Use NextP256K (CGO) - **Pure Go requirements:** Choose between Btcec (faster) or P256K1 (cleaner API) - **Balance:** P256K1Signer offers good performance with pure Go simplicity --- ## Running the Benchmarks To reproduce these benchmarks: ```bash # Run all benchmarks CGO_ENABLED=1 go test -tags=cgo ./bench -bench=. -benchmem # Run specific operation CGO_ENABLED=1 go test -tags=cgo ./bench -bench=BenchmarkSign # Run specific implementation CGO_ENABLED=1 go test -tags=cgo ./bench -bench=Benchmark.*_P256K1 ``` **Note:** All benchmarks require CGO to be enabled (`CGO_ENABLED=1`) and the `cgo` build tag.