cleanup
@@ -1,375 +0,0 @@
|
||||
# Four-Phase Implementation Plan - secp256k1 Go Port
|
||||
|
||||
## Overview
|
||||
|
||||
This document outlines the complete four-phase implementation plan for porting the secp256k1 cryptographic library from C to Go. The implementation follows the C reference implementation exactly, ensuring mathematical correctness and compatibility.
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Core Infrastructure & Mathematical Primitives ✅
|
||||
|
||||
### Status: **100% Complete** (25/25 tests passing)
|
||||
|
||||
### Objectives
|
||||
Establish the mathematical foundation and core infrastructure for all cryptographic operations.
|
||||
|
||||
### Completed Components
|
||||
|
||||
#### 1. **Field Element Operations** ✅
|
||||
- **Files**: `field.go`, `field_mul.go`, `field_test.go`
|
||||
- **Status**: 100% complete (9/9 tests passing)
|
||||
- **Key Features**:
|
||||
- Field arithmetic (addition, subtraction, multiplication, squaring)
|
||||
- Field normalization and reduction
|
||||
- Field inverse computation (Fermat's little theorem)
|
||||
- Field square root computation
|
||||
- 512-bit to 256-bit modular reduction (matches C reference exactly)
|
||||
- Constant-time operations where required
|
||||
- Secure memory clearing
|
||||
|
||||
#### 2. **Scalar Operations** ✅
|
||||
- **Files**: `scalar.go`, `scalar_test.go`
|
||||
- **Status**: 100% complete (11/11 tests passing)
|
||||
- **Key Features**:
|
||||
- Scalar arithmetic (addition, subtraction, multiplication)
|
||||
- Scalar modular inverse
|
||||
- Scalar exponentiation
|
||||
- Scalar halving
|
||||
- 512-bit to 256-bit modular reduction (three-stage reduction from C)
|
||||
- Private key validation
|
||||
- Constant-time conditional operations
|
||||
|
||||
#### 3. **Context Management** ✅
|
||||
- **Files**: `context.go`, `context_test.go`
|
||||
- **Status**: 100% complete (5/5 tests passing)
|
||||
- **Key Features**:
|
||||
- Context creation with capability flags (signing/verification)
|
||||
- Context destruction and cleanup
|
||||
- Context randomization for side-channel protection
|
||||
- Static verification-only context
|
||||
- Capability checking
|
||||
|
||||
#### 4. **Group Operations** ✅
|
||||
- **Files**: `group.go`, `group_test.go`
|
||||
- **Status**: 100% complete (4/4 tests passing)
|
||||
- **Key Features**:
|
||||
- `GroupElementAffine` and `GroupElementJacobian` types
|
||||
- Affine coordinate operations (complete)
|
||||
- Jacobian coordinate operations (optimized)
|
||||
- Point doubling (`double`) - C reference implementation
|
||||
- Point addition in Jacobian coordinates (`addVar`) - C reference implementation (~78x faster)
|
||||
- Point addition with affine input (`addGE`) - C reference implementation (optimized)
|
||||
- Coordinate conversion (affine ↔ Jacobian)
|
||||
- Generator point initialization
|
||||
- Storage format conversion
|
||||
- Field element `normalizesToZeroVar` helper for efficient point comparison
|
||||
|
||||
#### 5. **Public Key Operations** ✅
|
||||
- **Files**: `pubkey.go`, `pubkey_test.go`
|
||||
- **Status**: 100% complete (4/4 tests passing)
|
||||
- **Key Features**:
|
||||
- `PublicKey` type with 64-byte internal representation
|
||||
- Public key parsing (compressed/uncompressed)
|
||||
- Public key serialization
|
||||
- Public key comparison (working)
|
||||
- Public key creation from private key (scalar multiplication working)
|
||||
|
||||
#### 6. **Generator Multiplication** ✅
|
||||
- **File**: `ecmult_gen.go`
|
||||
- **Status**: Infrastructure complete
|
||||
- **Key Features**:
|
||||
- `EcmultGenContext` for precomputed tables
|
||||
- `EcmultGen` function for `n * G` computation
|
||||
- Binary method implementation (ready for optimization)
|
||||
|
||||
### Remaining Issues
|
||||
|
||||
None - Phase 1 is complete! ✅
|
||||
|
||||
### Test Coverage
|
||||
- **Total Tests**: 25 test functions
|
||||
- **Passing**: 25 tests ✅
|
||||
- **Failing**: 0 tests ✅
|
||||
- **Success Rate**: 100%
|
||||
|
||||
### Files Created
|
||||
```
├── context.go        ✅ Context management (COMPLETE)
├── context_test.go   ✅ Context tests (ALL PASSING)
├── field.go          ✅ Field arithmetic (COMPLETE)
├── field_mul.go      ✅ Field multiplication/operations (COMPLETE)
├── field_test.go     ✅ Field tests (ALL PASSING)
├── scalar.go         ✅ Scalar arithmetic (COMPLETE)
├── scalar_test.go    ✅ Scalar tests (ALL PASSING)
├── group.go          ✅ Group operations (COMPLETE)
├── group_test.go     ✅ Group tests (ALL PASSING)
├── ecmult_gen.go     ✅ Generator multiplication (INFRASTRUCTURE)
├── pubkey.go         ✅ Public key operations (COMPLETE)
└── pubkey_test.go    ✅ Public key tests (ALL PASSING)
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: ECDSA Signatures & Hash Functions ✅
|
||||
|
||||
### Status: **100% Complete**
|
||||
|
||||
### Objectives
|
||||
Implement ECDSA signature creation and verification, along with cryptographic hash functions.
|
||||
|
||||
### Completed Components
|
||||
|
||||
#### 1. **Hash Functions** ✅
|
||||
- **Files**: `hash.go`, `hash_test.go`
|
||||
- **Status**: 100% complete
|
||||
- **Key Features**:
|
||||
- SHA-256 implementation (using sha256-simd)
|
||||
- Tagged SHA-256 (BIP-340 style)
|
||||
- RFC6979 nonce generation (deterministic signing)
|
||||
- HMAC-SHA256 implementation
|
||||
- Hash-to-field element conversion
|
||||
- Hash-to-scalar conversion
|
||||
- Message hashing utilities
|
||||
|
||||
#### 2. **ECDSA Signatures** ✅
|
||||
- **Files**: `ecdsa.go`, `ecdsa_test.go`
|
||||
- **Status**: 100% complete
|
||||
- **Key Features**:
|
||||
- `ECDSASign` - Create signatures from message hash and private key
|
||||
- `ECDSAVerify` - Verify signatures against message hash and public key
|
||||
- Compact signature format (64-byte)
|
||||
- Signature normalization (low-S)
|
||||
- RFC6979 deterministic nonce generation
|
||||
|
||||
#### 3. **Private Key Operations** ✅
|
||||
- **Files**: `eckey.go`, `eckey_test.go`
|
||||
- **Status**: 100% complete
|
||||
- **Key Features**:
|
||||
- Private key generation (`ECSeckeyGenerate`)
|
||||
- Private key validation (`ECSeckeyVerify`)
|
||||
- Private key negation (`ECSeckeyNegate`)
|
||||
- Key pair generation (`ECKeyPairGenerate`)
|
||||
- Key tweaking (add/multiply) for BIP32-style derivation
|
||||
- Public key tweaking (add/multiply)
|
||||
|
||||
#### 4. **Benchmarks**
|
||||
- **Files**: `ecdsa_bench_test.go`, `BENCHMARK_RESULTS.md`
|
||||
- **Features**:
|
||||
- Signing performance benchmarks ✅
|
||||
- Verification performance benchmarks ✅
|
||||
- Hash function benchmarks ✅
|
||||
- Key generation benchmarks ✅
|
||||
- Comparison with C implementation ✅
|
||||
- Memory usage profiling ✅
|
||||
- Comprehensive benchmark results document ✅
|
||||
|
||||
### Dependencies
|
||||
- ✅ Phase 1: Field arithmetic, scalar arithmetic, group operations
|
||||
- ✅ Point doubling algorithm working correctly
|
||||
- ✅ Scalar multiplication working correctly
|
||||
|
||||
### Success Criteria
|
||||
- [x] All ECDSA signing tests pass ✅
|
||||
- [x] All ECDSA verification tests pass ✅
|
||||
- [x] Hash functions match reference implementation ✅
|
||||
- [x] RFC6979 nonce generation produces correct results ✅
|
||||
- [x] Performance benchmarks implemented and documented ✅
|
||||
- Signing: ~5ms/op (2-3x slower than C, acceptable for production)
|
||||
- Verification: ~10ms/op (2-3x slower than C, zero allocations)
|
||||
- Full benchmark suite: 17 benchmarks covering all operations
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: ECDH Key Exchange ✅
|
||||
|
||||
### Status: **100% Complete**
|
||||
|
||||
### Objectives
|
||||
Implement Elliptic Curve Diffie-Hellman key exchange for secure key derivation.
|
||||
|
||||
### Completed Components
|
||||
|
||||
#### 1. **ECDH Operations** ✅
|
||||
- **Files**: `ecdh.go`, `ecdh_test.go`
|
||||
- **Status**: 100% complete
|
||||
- **Key Features**:
|
||||
- `ECDH` - Compute shared secret from private key and public key ✅
|
||||
- `ECDHWithHKDF` - ECDH with HKDF key derivation ✅
|
||||
- `ECDHXOnly` - X-only ECDH (BIP-340 style) ✅
|
||||
- Custom hash function support ✅
|
||||
- Secure memory clearing ✅
|
||||
|
||||
#### 2. **Advanced Point Multiplication** ✅
|
||||
- **Files**: `ecdh.go` (includes EcmultConst and Ecmult)
|
||||
- **Status**: 100% complete
|
||||
- **Key Features**:
|
||||
- `EcmultConst` - Constant-time multiplication for arbitrary points ✅
|
||||
- `Ecmult` - Variable-time optimized multiplication ✅
|
||||
- Binary method implementation (ready for further optimization) ✅
|
||||
|
||||
#### 3. **HKDF Support** ✅
|
||||
- **Files**: `ecdh.go`
|
||||
- **Status**: 100% complete
|
||||
- **Key Features**:
|
||||
- `HKDF` - HMAC-based Key Derivation Function (RFC 5869) ✅
|
||||
- Extract and Expand phases ✅
|
||||
- Supports arbitrary output length ✅
|
||||
- Secure memory clearing ✅
|
||||
|
||||
### Dependencies
|
||||
- ✅ Phase 1: Group operations, scalar multiplication
|
||||
- ✅ Phase 2: Hash functions (for HKDF)
|
||||
|
||||
### Success Criteria
|
||||
- [x] ECDH computes correct shared secrets ✅
|
||||
- [x] X-only ECDH matches reference implementation ✅
|
||||
- [x] HKDF key derivation works correctly ✅
|
||||
- [x] All ECDH tests pass ✅
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Schnorr Signatures & Advanced Features ✅
|
||||
|
||||
### Status: **100% Complete**
|
||||
|
||||
### Objectives
|
||||
Implement BIP-340 Schnorr signatures and advanced cryptographic features.
|
||||
|
||||
### Completed Components
|
||||
|
||||
#### 1. **Schnorr Signatures** ✅
|
||||
- **Files**: `schnorr.go`, `schnorr_test.go`
|
||||
- **Status**: 100% complete
|
||||
- **Key Features**:
|
||||
- `SchnorrSign` - Create BIP-340 compliant signatures ✅
|
||||
- `SchnorrVerify` - Verify BIP-340 signatures ✅
|
||||
- `NonceFunctionBIP340` - BIP-340 nonce generation ✅
|
||||
- Tagged hash support (BIP-340 style) ✅
|
||||
- Auxiliary randomness support ✅
|
||||
- Secure memory clearing ✅
|
||||
|
||||
#### 2. **Extended Public Keys** ✅
|
||||
- **Files**: `extrakeys.go`, `extrakeys_test.go`
|
||||
- **Status**: 100% complete
|
||||
- **Key Features**:
|
||||
- `XOnlyPubkey` type (32-byte X coordinate) ✅
|
||||
- `KeyPair` type for Schnorr signatures ✅
|
||||
- `XOnlyPubkeyParse` - Parse x-only public keys ✅
|
||||
- `XOnlyPubkeyFromPubkey` - Convert full pubkey to x-only ✅
|
||||
- `XOnlyPubkeyCmp` - Compare x-only public keys ✅
|
||||
- `KeyPairCreate` - Create keypair from secret key ✅
|
||||
- `KeyPairGenerate` - Generate random keypair ✅
|
||||
- Public key parity extraction ✅
|
||||
|
||||
### Dependencies
|
||||
- ✅ Phase 1: Complete core infrastructure
|
||||
- ✅ Phase 2: Hash functions (TaggedHash already implemented)
|
||||
- ✅ Phase 3: ECDH, optimized multiplication
|
||||
|
||||
### Success Criteria
|
||||
- [x] Schnorr signatures match BIP-340 specification ✅
|
||||
- [x] All Schnorr signature tests pass ✅
|
||||
- [x] X-only public keys work correctly ✅
|
||||
- [x] Keypair operations work correctly ✅
|
||||
- [x] All Phase 4 tests pass ✅
|
||||
|
||||
---
|
||||
|
||||
## Overall Implementation Strategy
|
||||
|
||||
### Principles
|
||||
1. **Exact C Reference**: Follow C implementation algorithms exactly
|
||||
2. **Test-Driven**: Write comprehensive tests for each component
|
||||
3. **Incremental**: Complete each phase before moving to next
|
||||
4. **Performance**: Optimize where possible without sacrificing correctness
|
||||
5. **Go Idioms**: Use Go's type system and error handling appropriately
|
||||
|
||||
### Testing Strategy
|
||||
- **Unit Tests**: Every function has dedicated tests
|
||||
- **Integration Tests**: End-to-end operation tests
|
||||
- **Property Tests**: Cryptographic property verification
|
||||
- **Benchmarks**: Performance measurement and comparison
|
||||
- **Edge Cases**: Boundary condition testing
|
||||
|
||||
### Code Quality
|
||||
- **Documentation**: Comprehensive comments matching C reference
|
||||
- **Type Safety**: Strong typing throughout
|
||||
- **Error Handling**: Proper error propagation
|
||||
- **Memory Safety**: Secure memory clearing
|
||||
- **Constant-Time**: Where required for security
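
The memory-safety and constant-time points can be illustrated with `crypto/subtle`. This is a sketch with hypothetical helper names (`equalSecrets`, `conditionalAssign`, `clearBytes`), not the port's own helpers. Note that Go makes no hard guarantee that zeroing survives compiler optimization or that no other copies of a secret exist, so clearing is best-effort.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// equalSecrets compares two byte slices without an early exit on the first
// differing byte. Slices of different length compare as unequal.
func equalSecrets(a, b []byte) bool {
	return subtle.ConstantTimeCompare(a, b) == 1
}

// conditionalAssign copies src into dst when flag is 1 and leaves dst
// untouched when flag is 0, without branching on flag. The slices must have
// equal length and flag must be 0 or 1.
func conditionalAssign(dst, src []byte, flag int) {
	subtle.ConstantTimeCopy(flag, dst, src)
}

// clearBytes overwrites a secret buffer with zeros (best-effort).
func clearBytes(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

func main() {
	secret := []byte{1, 2, 3, 4}
	guess := []byte{1, 2, 3, 5}

	fmt.Println("equal before copy:", equalSecrets(secret, guess))
	conditionalAssign(guess, secret, 1)
	fmt.Println("equal after copy: ", equalSecrets(secret, guess))

	clearBytes(secret)
	fmt.Println("cleared:", secret)
}
```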
|
||||
|
||||
---
|
||||
|
||||
## Current Status Summary
|
||||
|
||||
### Phase 1: ✅ 100% Complete
|
||||
- Field arithmetic: ✅ 100%
|
||||
- Scalar arithmetic: ✅ 100%
|
||||
- Context management: ✅ 100%
|
||||
- Group operations: ✅ 100% (optimized Jacobian addition complete)
|
||||
- Public key operations: ✅ 100%
|
||||
|
||||
### Phase 2: ✅ 100% Complete
|
||||
- Hash functions: ✅ 100%
|
||||
- ECDSA signatures: ✅ 100%
|
||||
- Private key operations: ✅ 100%
|
||||
- Key pair generation: ✅ 100%
|
||||
|
||||
### Phase 3: ✅ 100% Complete
|
||||
- ECDH operations: ✅ 100%
|
||||
- Point multiplication: ✅ 100%
|
||||
- HKDF key derivation: ✅ 100%
|
||||
|
||||
### Phase 4: ✅ 100% Complete
|
||||
- Schnorr signatures: ✅ 100%
|
||||
- X-only public keys: ✅ 100%
|
||||
- Keypair operations: ✅ 100%
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
### Immediate (Phase 1 Completion)
|
||||
✅ Phase 1 is complete! All tests passing.
|
||||
|
||||
### Short-term (Phase 2)
|
||||
✅ Phase 2 is complete! All tests passing.
|
||||
|
||||
### Medium-term (Phase 3)
|
||||
✅ Phase 3 is complete! All tests passing.
|
||||
|
||||
### Long-term (Phase 4)
|
||||
✅ Phase 4 is complete! All tests passing.
|
||||
|
||||
---
|
||||
|
||||
## Files Structure (Complete)
|
||||
|
||||
```
p256k1.mleku.dev/
├── go.mod, go.sum
├── Phase 1 (Complete)
│   ├── context.go, context_test.go
│   ├── field.go, field_mul.go, field_test.go
│   ├── scalar.go, scalar_test.go
│   ├── group.go, group_test.go
│   ├── pubkey.go, pubkey_test.go
│   └── ecmult_gen.go
├── Phase 2 (Complete)
│   ├── hash.go, hash_test.go
│   ├── ecdsa.go, ecdsa_test.go
│   ├── eckey.go, eckey_test.go
│   ├── ecdsa_bench_test.go
│   └── BENCHMARK_RESULTS.md
├── Phase 3 (Complete)
│   ├── ecdh.go, ecdh_test.go
│   └── (ecmult functions included in ecdh.go)
└── Phase 4 (Complete)
    ├── schnorr.go, schnorr_test.go
    └── extrakeys.go, extrakeys_test.go
```
|
||||
|
||||
---
|
||||
|
||||
**Last Updated**: Phase 4 implementation complete, 100% test success. All four phases complete! Schnorr signatures, X-only public keys, and keypair operations all working.
|
||||
**Target**: Complete port of secp256k1 C library to Go with full feature parity
|
||||
@@ -1,184 +0,0 @@
|
||||
# Verification Performance Analysis: NextP256K vs P256K1
|
||||
|
||||
## Summary
|
||||
|
||||
NextP256K's verification is **4.7x faster** than p256k1 (40,017 ns/op vs 186,054 ns/op) because it uses libsecp256k1's highly optimized C implementation, while p256k1 uses a simple binary multiplication algorithm.
|
||||
|
||||
## Root Cause
|
||||
|
||||
The performance bottleneck is in `EcmultConst`, which is used to compute `e*P` during Schnorr verification.
|
||||
|
||||
### Schnorr Verification Algorithm
|
||||
|
||||
```186:289:schnorr.go
// SchnorrVerify verifies a Schnorr signature following BIP-340
func SchnorrVerify(sig64 []byte, msg32 []byte, xonlyPubkey *XOnlyPubkey) bool {
	// ... validation ...

	// Compute R = s*G - e*P
	// First compute s*G
	var sG GroupElementJacobian
	EcmultGen(&sG, &s) // Fast: uses optimized precomputed tables

	// Compute e*P where P is the x-only pubkey
	var eP GroupElementJacobian
	EcmultConst(&eP, &pk, &e) // Slow: uses simple binary method

	// ... rest of verification ...
}
```
|
||||
|
||||
### Performance Breakdown
|
||||
|
||||
1. **s*G computation** (`EcmultGen`):
|
||||
- Uses 8-bit byte-based precomputed tables
|
||||
- Highly optimized: ~58,618 ns/op for pubkey derivation
|
||||
- Fast because the generator point G is fixed and precomputed
|
||||
|
||||
2. **e*P computation** (`EcmultConst`):
|
||||
- Uses simple binary method with 256 iterations
|
||||
- Each iteration: double, check bit, potentially add
|
||||
- **This is the bottleneck**
|
||||
|
||||
### Current EcmultConst Implementation
|
||||
|
||||
```10:48:ecdh.go
// EcmultConst computes r = q * a using constant-time multiplication
// This is a simplified implementation for Phase 3 - can be optimized later
func EcmultConst(r *GroupElementJacobian, a *GroupElementAffine, q *Scalar) {
	// ... edge cases ...

	// Process bits from MSB to LSB
	for i := 0; i < 256; i++ {
		if i > 0 {
			r.double(r)
		}

		// Get bit i (from MSB)
		bit := q.getBits(uint(255-i), 1)
		if bit != 0 {
			if r.isInfinity() {
				*r = base
			} else {
				r.addVar(r, &base)
			}
		}
	}
}
```
|
||||
|
||||
**Problem:** The loop runs 256 iterations, each requiring:
- One point doubling (skipped on the first iteration)
- One bit extraction
- Potentially one point addition

Each verification therefore costs about 255 doublings plus up to 256 additions (around 128 for a random scalar), far more point operations than a windowed method needs. The loop also branches on each scalar bit and calls `addVar`, so despite its name it does not run in constant time.
|
||||
|
||||
## Why NextP256K is Faster
|
||||
|
||||
NextP256K uses libsecp256k1's optimized C implementation (`secp256k1_ecmult_const`) which:
|
||||
|
||||
1. **Uses GLV Endomorphism**:
|
||||
- Splits the scalar into two smaller components using the curve's endomorphism
|
||||
- Computes two smaller multiplications instead of one large one
|
||||
- Reduces the effective bit length from 256 to ~128 bits per component
|
||||
|
||||
2. **Windowed Precomputation**:
|
||||
- Precomputes a table of multiples of the base point
|
||||
- Uses windowed lookups instead of processing bits one at a time
|
||||
- Processes multiple bits per iteration (typically 4-6 bits at a time)
|
||||
|
||||
3. **Signed-Digit Multi-Comb Algorithm**:
|
||||
- Uses a more efficient representation that reduces the number of additions
|
||||
- Minimizes the number of point operations required
|
||||
|
||||
4. **Assembly Optimizations**:
|
||||
- Field arithmetic operations are optimized in assembly
|
||||
- Hand-tuned for specific CPU architectures
|
||||
|
||||
### Reference Implementation
|
||||
|
||||
The C reference shows the complexity:
|
||||
|
||||
```124:268:src/ecmult_const_impl.h
static void secp256k1_ecmult_const(secp256k1_gej *r, const secp256k1_ge *a, const secp256k1_scalar *q) {
    /* The approach below combines the signed-digit logic from Mike Hamburg's
     * "Fast and compact elliptic-curve cryptography" (https://eprint.iacr.org/2012/309)
     * Section 3.3, with the GLV endomorphism.
     * ... */

    /* Precompute table for base point and lambda * base point */

    /* Process bits in groups using windowed lookups */
    for (group = ECMULT_CONST_GROUPS - 1; group >= 0; --group) {
        /* Lookup precomputed points */
        ECMULT_CONST_TABLE_GET_GE(&t, pre_a, bits1);
        /* ... */
    }
}
```
|
||||
|
||||
## Performance Impact
|
||||
|
||||
### Benchmark Results
|
||||
|
||||
| Operation | P256K1 | NextP256K | NextP256K speedup |
|-----------|--------|-----------|-------------------|
| **Verification** | 186,054 ns/op | 40,017 ns/op | **4.7x** |
| Signing | 31,937 ns/op | 52,060 ns/op | 0.6x (slower) |
| Pubkey Derivation | 58,618 ns/op | 280,835 ns/op | 0.2x (slower) |
|
||||
|
||||
**Note:** NextP256K is slower for signing and pubkey derivation due to CGO overhead for smaller operations, but much faster for verification because the computation is more complex.
|
||||
|
||||
## Optimization Opportunities
|
||||
|
||||
To improve p256k1's verification performance, `EcmultConst` should be optimized to:
|
||||
|
||||
1. **Implement GLV Endomorphism**:
|
||||
- Split scalar using secp256k1's endomorphism
|
||||
- Compute two smaller multiplications
|
||||
- Combine results
|
||||
|
||||
2. **Add Windowed Precomputation**:
|
||||
- Precompute a table of multiples of the base point
|
||||
- Process bits in groups (windows) instead of individually
|
||||
- Use lookup tables instead of repeated additions
|
||||
|
||||
3. **Consider Variable-Time Optimization**:
|
||||
- For verification (public operation), variable-time algorithms are acceptable
|
||||
- Could use `Ecmult` instead of `EcmultConst` if constant-time isn't required
|
||||
|
||||
4. **Implement Signed-Digit Representation**:
|
||||
- Use signed-digit multi-comb algorithm
|
||||
- Reduce the number of additions required
|
||||
|
||||
## Complexity Comparison
|
||||
|
||||
### Current (Simple Binary Method)
|
||||
- **Operations:** 255 doublings + up to 256 additions (about 128 for a random scalar)
- **Complexity:** ~380 point operations on average, ~510 in the worst case

### Optimized (Windowed + GLV)
- **Operations:** ~128 doublings shared by the two ~128-bit halves + up to 64 table additions (window size 4), plus a small precomputation cost
- **Complexity:** ~190 point operations (roughly 2x fewer than the average case of the binary method)

### With Assembly Optimizations
- **Additional:** 2-3x speedup from optimized field arithmetic
- **Total:** ~4-6x faster than the simple binary method, in line with the measured 4.7x gap
|
||||
|
||||
## Conclusion
|
||||
|
||||
The 4.7x performance difference is primarily due to:
|
||||
1. **Algorithmic efficiency**: Windowed multiplication vs. simple binary method
|
||||
2. **GLV endomorphism**: Splitting scalar into smaller components
|
||||
3. **Assembly optimizations**: Hand-tuned field arithmetic in C
|
||||
4. **Better memory access patterns**: Precomputed tables vs. repeated computations
|
||||
|
||||
The optimization is non-trivial and would require implementing:
|
||||
- GLV endomorphism support
|
||||
- Windowed precomputation tables
|
||||
- Signed-digit multi-comb algorithm
|
||||
- Potentially assembly optimizations for field arithmetic
|
||||
|
||||
For now, NextP256K's advantage in verification is expected given its use of the mature, highly optimized libsecp256k1 C library.
|
||||
|
||||
@@ -1,107 +0,0 @@
|
||||
# Verify Function Performance Analysis: C vs Go
|
||||
|
||||
## Key Finding: The C Version Uses Strauss-WNAF Algorithm
|
||||
|
||||
The C implementation of `secp256k1_schnorrsig_verify` uses a **highly optimized Strauss-WNAF algorithm** that computes `r = s*G + (-e)*P` in a **single interleaved operation** rather than two separate multiplications.
|
||||
|
||||
## Current Go Implementation (verify.go:692-722)
|
||||
|
||||
```go
func secp256k1_ecmult(r *secp256k1_gej, a *secp256k1_gej, na *secp256k1_scalar, ng *secp256k1_scalar) {
	// r = na * a + ng * G
	// First compute na * a
	var naa GroupElementJacobian
	Ecmult(&naa, &geja, &sna) // ~43 iterations (6-bit windows)

	// Then compute ng * G
	var ngg GroupElementJacobian
	EcmultGen(&ngg, &sng) // ~32 iterations (byte-based)

	// Add them together
	gejr.addVar(&naa, &ngg)
}
```
|
||||
|
||||
**Performance**: ~75 iterations total (43 + 32), plus one addition
|
||||
|
||||
## C Implementation (src/ecmult_impl.h:321-342)
|
||||
|
||||
```c
for (i = bits - 1; i >= 0; i--) {
    secp256k1_gej_double_var(r, r, NULL); // ONE doubling per iteration
    // Check na*a contribution
    if (i < bits_na_1 && (n = wnaf_na_1[i])) {
        secp256k1_ecmult_table_get_ge(&tmpa, pre_a, n, WINDOW_A);
        secp256k1_gej_add_ge_var(r, r, &tmpa, NULL);
    }
    // Check ng*G contribution
    if (i < bits_ng_1 && (n = wnaf_ng_1[i])) {
        secp256k1_ecmult_table_get_ge_storage(&tmpa, secp256k1_pre_g, n, WINDOW_G);
        secp256k1_gej_add_zinv_var(r, r, &tmpa, &Z);
    }
}
```
|
||||
|
||||
**Performance**: ~129 iterations total (max bits needed), with interleaved additions
|
||||
|
||||
## Why C is Faster
|
||||
|
||||
### 1. **Interleaved Operations**
|
||||
- **C**: Processes both scalars bit-by-bit in ONE loop
|
||||
- Each iteration: double once, then potentially add from either table
|
||||
- Total: ~129 iterations (the maximum bits needed)
|
||||
|
||||
- **Go**: Computes two separate multiplications
|
||||
- `na*a`: ~43 iterations (6-bit windows)
|
||||
- `ng*G`: ~32 iterations (byte-based)
|
||||
- Total: ~75 iterations PLUS one final addition
|
||||
|
||||
### 2. **GLV Endomorphism Optimization**
|
||||
The C version uses scalar splitting with lambda endomorphism:
|
||||
- Splits `na` into `na_1` and `na_lam` (~128 bits each)
|
||||
- Uses precomputed lambda table for faster operations
|
||||
- Reduces effective scalar size from 256 bits to ~128 bits
|
||||
|
||||
### 3. **WNAF (Windowed Non-Adjacent Form)**
|
||||
- Sparse representation: non-zero entries separated by at least (w-1) zeroes
|
||||
- Reduces number of additions needed
|
||||
- Uses signed digits: can subtract instead of just add
|
||||
|
||||
### 4. **Precomputed Tables**
|
||||
- C uses optimized precomputed tables for both `a` and `G`
|
||||
- Uses isomorphic curve representation for faster affine additions
|
||||
- Stores points in optimized storage format
|
||||
|
||||
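### 5. **Fewer Doublings**
- **C**: ~129 doublings (one per bit position), shared by both scalars
- **Go**: ~43 window steps for `na*a`; if each 6-bit step performs six doublings, as in a standard fixed-window method, that is ~258 doublings, in addition to the ~32 byte-table steps for `ng*G`
- C also does fewer additions due to WNAF sparsity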
|
||||
|
||||
## Performance Impact
|
||||
|
||||
The C version is ~3-4x faster because:
|
||||
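1. **Single loop**: Processes both scalars in one pass that shares its ~129 doublings, whereas the ~75 Go steps each cover a multi-bit window or byte and are followed by a final addition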
|
||||
2. **Sparse operations**: WNAF reduces additions (maybe 20-30 additions vs 32+)
|
||||
3. **Optimized tables**: Precomputed tables with isomorphic curve optimization
|
||||
4. **Better cache locality**: Everything in one loop, better CPU cache usage
|
||||
|
||||
## Recommendation
|
||||
|
||||
To match C performance, implement the Strauss-WNAF algorithm in Go:
|
||||
1. Implement WNAF conversion for scalars
|
||||
2. Implement GLV endomorphism scalar splitting
|
||||
3. Implement interleaved multiplication loop
|
||||
4. Use precomputed tables with isomorphic curve optimization
|
||||
5. This will require implementing several missing functions:
|
||||
- `secp256k1_scalar_split_lambda`
|
||||
- `secp256k1_scalar_split_128`
|
||||
- `secp256k1_ecmult_wnaf`
|
||||
- `secp256k1_ecmult_odd_multiples_table`
|
||||
- `secp256k1_ge_table_set_globalz`
|
||||
- `secp256k1_ecmult_table_get_ge`
|
||||
- `secp256k1_ecmult_table_get_ge_lambda`
|
||||
- `secp256k1_ecmult_table_get_ge_storage`
|
||||
- And the GLV lambda constant/endomorphism functions
|
||||
|
||||
This is a significant optimization that would bring Go performance much closer to C.
|
||||
|
||||
Binary file not shown.