2025-11-02 03:33:02 +00:00
parent 47244a371b
commit 106349d6eb
5 changed files with 0 additions and 666 deletions


@@ -1,375 +0,0 @@
# Four-Phase Implementation Plan - secp256k1 Go Port
## Overview
This document outlines the complete four-phase implementation plan for porting the secp256k1 cryptographic library from C to Go. The implementation follows the C reference implementation exactly, ensuring mathematical correctness and compatibility.
---
## Phase 1: Core Infrastructure & Mathematical Primitives ✅
### Status: **100% Complete** (25/25 tests passing)
### Objectives
Establish the mathematical foundation and core infrastructure for all cryptographic operations.
### Completed Components
#### 1. **Field Element Operations** ✅
- **Files**: `field.go`, `field_mul.go`, `field_test.go`
- **Status**: 100% complete (9/9 tests passing)
- **Key Features**:
- Field arithmetic (addition, subtraction, multiplication, squaring)
- Field normalization and reduction
- Field inverse computation (Fermat's little theorem)
- Field square root computation
- 512-bit to 256-bit modular reduction (matches C reference exactly)
- Constant-time operations where required
- Secure memory clearing
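The Fermat-based inverse listed above can be illustrated with a minimal sketch. It uses `math/big` rather than the port's own field element type, so it is not constant-time and only shows the arithmetic identity `a^(p-2) ≡ a^-1 (mod p)`; the function names `fieldPrime`, `fermatInverse`, and `inverseRoundTrips` are hypothetical and not taken from `field.go`.

```go
package main

import (
	"fmt"
	"math/big"
)

// fieldPrime returns the secp256k1 field prime p = 2^256 - 2^32 - 977.
func fieldPrime() *big.Int {
	p := new(big.Int).Lsh(big.NewInt(1), 256)
	p.Sub(p, new(big.Int).Lsh(big.NewInt(1), 32))
	p.Sub(p, big.NewInt(977))
	return p
}

// fermatInverse returns a^(p-2) mod p, which equals a^-1 mod p for a != 0
// when p is prime (Fermat's little theorem).
func fermatInverse(a, p *big.Int) *big.Int {
	exp := new(big.Int).Sub(p, big.NewInt(2))
	return new(big.Int).Exp(a, exp, p)
}

// inverseRoundTrips reports whether x * x^-1 == 1 (mod p) for a small x.
func inverseRoundTrips(x int64) bool {
	p := fieldPrime()
	a := big.NewInt(x)
	prod := new(big.Int).Mul(a, fermatInverse(a, p))
	prod.Mod(prod, p)
	return prod.Cmp(big.NewInt(1)) == 0
}

func main() {
	fmt.Println(inverseRoundTrips(7))
}
```

A production implementation would use a fixed addition chain over the limb representation instead of a generic modular exponentiation.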
#### 2. **Scalar Operations** ✅
- **Files**: `scalar.go`, `scalar_test.go`
- **Status**: 100% complete (11/11 tests passing)
- **Key Features**:
- Scalar arithmetic (addition, subtraction, multiplication)
- Scalar modular inverse
- Scalar exponentiation
- Scalar halving
- 512-bit to 256-bit modular reduction (three-stage reduction from C)
- Private key validation
- Constant-time conditional operations
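As an illustration of what a constant-time conditional operation looks like, here is a minimal sketch of a mask-based conditional assignment. The four-limb `[4]uint64` layout and the name `condAssign` are assumptions for the example, not the port's actual `Scalar` representation or API.

```go
package main

import "fmt"

// condAssign copies src into dst when flag is 1 and leaves dst unchanged
// when flag is 0. It selects with a bit mask instead of branching on flag.
func condAssign(dst, src *[4]uint64, flag uint64) {
	mask := uint64(0) - (flag & 1) // all ones if flag == 1, all zeros otherwise
	for i := range dst {
		dst[i] = (dst[i] &^ mask) | (src[i] & mask)
	}
}

func main() {
	a := [4]uint64{1, 2, 3, 4}
	b := [4]uint64{5, 6, 7, 8}
	condAssign(&a, &b, 0)
	fmt.Println(a)
	condAssign(&a, &b, 1)
	fmt.Println(a)
}
```

Both values of `flag` execute the same instructions and touch the same memory, which is the property such helpers are meant to provide.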
#### 3. **Context Management** ✅
- **Files**: `context.go`, `context_test.go`
- **Status**: 100% complete (5/5 tests passing)
- **Key Features**:
- Context creation with capability flags (signing/verification)
- Context destruction and cleanup
- Context randomization for side-channel protection
- Static verification-only context
- Capability checking
#### 4. **Group Operations** ✅
- **Files**: `group.go`, `group_test.go`
- **Status**: 100% complete (4/4 tests passing)
- **Key Features**:
- `GroupElementAffine` and `GroupElementJacobian` types
- Affine coordinate operations (complete)
- Jacobian coordinate operations (optimized)
- Point doubling (`double`) - C reference implementation
- Point addition in Jacobian coordinates (`addVar`) - C reference implementation (~78x faster)
- Point addition with affine input (`addGE`) - C reference implementation (optimized)
- Coordinate conversion (affine ↔ Jacobian)
- Generator point initialization
- Storage format conversion
- Field element `normalizesToZeroVar` helper for efficient point comparison
#### 5. **Public Key Operations** ✅
- **Files**: `pubkey.go`, `pubkey_test.go`
- **Status**: 100% complete (4/4 tests passing)
- **Key Features**:
- `PublicKey` type with 64-byte internal representation
- Public key parsing (compressed/uncompressed)
- Public key serialization
- Public key comparison (working)
- Public key creation from private key (scalar multiplication working)
#### 6. **Generator Multiplication** ✅
- **File**: `ecmult_gen.go`
- **Status**: Infrastructure complete
- **Key Features**:
- `EcmultGenContext` for precomputed tables
- `EcmultGen` function for `n * G` computation
- Binary method implementation (ready for optimization)
### Remaining Issues
None - Phase 1 is complete! ✅
### Test Coverage
- **Total Tests**: 25 test functions
- **Passing**: 25 tests ✅
- **Failing**: 0 tests ✅
- **Success Rate**: 100%
### Files Created
```
├── context.go ✅ Context management (COMPLETE)
├── context_test.go ✅ Context tests (ALL PASSING)
├── field.go ✅ Field arithmetic (COMPLETE)
├── field_mul.go ✅ Field multiplication/operations (COMPLETE)
├── field_test.go ✅ Field tests (ALL PASSING)
├── scalar.go ✅ Scalar arithmetic (COMPLETE)
├── scalar_test.go ✅ Scalar tests (ALL PASSING)
├── group.go ✅ Group operations (COMPLETE)
├── group_test.go ✅ Group tests (ALL PASSING)
├── ecmult_gen.go ✅ Generator multiplication (INFRASTRUCTURE)
├── pubkey.go ✅ Public key operations (COMPLETE)
└── pubkey_test.go ✅ Public key tests (ALL PASSING)
```
---
## Phase 2: ECDSA Signatures & Hash Functions ✅
### Status: **100% Complete**
### Objectives
Implement ECDSA signature creation and verification, along with cryptographic hash functions.
### Completed Components
#### 1. **Hash Functions** ✅
- **Files**: `hash.go`, `hash_test.go`
- **Status**: 100% complete
- **Key Features**:
- SHA-256 implementation (using sha256-simd)
- Tagged SHA-256 (BIP-340 style)
- RFC6979 nonce generation (deterministic signing)
- HMAC-SHA256 implementation
- Hash-to-field element conversion
- Hash-to-scalar conversion
- Message hashing utilities
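The BIP-340 style tagged hash mentioned above is defined as `SHA256(SHA256(tag) || SHA256(tag) || msg)`. The sketch below uses the standard library's `crypto/sha256` rather than sha256-simd, and the names `taggedHash` and `taggedHashHex` are hypothetical rather than the functions in `hash.go`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// taggedHash computes SHA256(SHA256(tag) || SHA256(tag) || msg),
// the tagged hash construction described in BIP-340.
func taggedHash(tag string, msg []byte) [32]byte {
	tagHash := sha256.Sum256([]byte(tag))
	h := sha256.New()
	h.Write(tagHash[:])
	h.Write(tagHash[:])
	h.Write(msg)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// taggedHashHex is a small convenience wrapper returning the digest as hex.
func taggedHashHex(tag string, msg []byte) string {
	digest := taggedHash(tag, msg)
	return hex.EncodeToString(digest[:])
}

func main() {
	fmt.Println(taggedHashHex("BIP0340/challenge", []byte("example message")))
}
```

Because the tag is hashed into the prefix, the same message hashed under two different tags yields unrelated digests, which is what gives each use (nonce, challenge, aux) its own domain.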
#### 2. **ECDSA Signatures** ✅
- **Files**: `ecdsa.go`, `ecdsa_test.go`
- **Status**: 100% complete
- **Key Features**:
- `ECDSASign` - Create signatures from message hash and private key
- `ECDSAVerify` - Verify signatures against message hash and public key
- Compact signature format (64-byte)
- Signature normalization (low-S)
- RFC6979 deterministic nonce generation
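Low-S normalization replaces `s` with `n - s` whenever `s` is greater than half the group order `n`. A minimal sketch with `math/big` follows; it is not constant-time, and `normalizeLowS` and `normalizeHex` are hypothetical names, not the port's API.

```go
package main

import (
	"fmt"
	"math/big"
)

// curveOrder is the secp256k1 group order n.
var curveOrder, _ = new(big.Int).SetString(
	"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16)

// normalizeLowS returns s if s <= n/2 and n - s otherwise.
func normalizeLowS(s *big.Int) *big.Int {
	half := new(big.Int).Rsh(curveOrder, 1)
	if s.Cmp(half) > 0 {
		return new(big.Int).Sub(curveOrder, s)
	}
	return new(big.Int).Set(s)
}

// normalizeHex parses a hex-encoded s and returns the normalized value as hex.
func normalizeHex(sHex string) string {
	s, ok := new(big.Int).SetString(sHex, 16)
	if !ok {
		panic("invalid hex value")
	}
	return normalizeLowS(s).Text(16)
}

func main() {
	// n - 5 is a high S value; it normalizes to 5.
	fmt.Println(normalizeHex("FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD036413C"))
}
```

Both `(r, s)` and `(r, n - s)` verify for the same message and key, so picking the lower one gives each signature a single canonical encoding.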
#### 3. **Private Key Operations** ✅
- **Files**: `eckey.go`, `eckey_test.go`
- **Status**: 100% complete
- **Key Features**:
- Private key generation (`ECSeckeyGenerate`)
- Private key validation (`ECSeckeyVerify`)
- Private key negation (`ECSeckeyNegate`)
- Key pair generation (`ECKeyPairGenerate`)
- Key tweaking (add/multiply) for BIP32-style derivation
- Public key tweaking (add/multiply)
#### 4. **Benchmarks**
- **Files**: `ecdsa_bench_test.go`, `BENCHMARK_RESULTS.md`
- **Features**:
- Signing performance benchmarks ✅
- Verification performance benchmarks ✅
- Hash function benchmarks ✅
- Key generation benchmarks ✅
- Comparison with C implementation ✅
- Memory usage profiling ✅
- Comprehensive benchmark results document ✅
### Dependencies
- ✅ Phase 1: Field arithmetic, scalar arithmetic, group operations
- ✅ Point doubling algorithm working correctly
- ✅ Scalar multiplication working correctly
### Success Criteria
- [x] All ECDSA signing tests pass ✅
- [x] All ECDSA verification tests pass ✅
- [x] Hash functions match reference implementation ✅
- [x] RFC6979 nonce generation produces correct results ✅
- [x] Performance benchmarks implemented and documented ✅
- Signing: ~5ms/op (2-3x slower than C, acceptable for production)
- Verification: ~10ms/op (2-3x slower than C, zero allocations)
- Full benchmark suite: 17 benchmarks covering all operations
---
## Phase 3: ECDH Key Exchange ✅
### Status: **100% Complete**
### Objectives
Implement Elliptic Curve Diffie-Hellman key exchange for secure key derivation.
### Completed Components
#### 1. **ECDH Operations** ✅
- **Files**: `ecdh.go`, `ecdh_test.go`
- **Status**: 100% complete
- **Key Features**:
- `ECDH` - Compute shared secret from private key and public key ✅
- `ECDHWithHKDF` - ECDH with HKDF key derivation ✅
- `ECDHXOnly` - X-only ECDH (BIP-340 style) ✅
- Custom hash function support ✅
- Secure memory clearing ✅
#### 2. **Advanced Point Multiplication** ✅
- **Files**: `ecdh.go` (includes EcmultConst and Ecmult)
- **Status**: 100% complete
- **Key Features**:
- `EcmultConst` - Constant-time multiplication for arbitrary points ✅
- `Ecmult` - Variable-time optimized multiplication ✅
- Binary method implementation (ready for further optimization) ✅
#### 3. **HKDF Support** ✅
- **Files**: `ecdh.go`
- **Status**: 100% complete
- **Key Features**:
- `HKDF` - HMAC-based Key Derivation Function (RFC 5869) ✅
- Extract and Expand phases ✅
- Supports arbitrary output length ✅
- Secure memory clearing ✅
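The Extract and Expand phases of RFC 5869 can be sketched with the standard library as follows. This is an illustration under assumptions, not the code in `ecdh.go`: the names `hkdfExtract`, `hkdfExpand`, and `hkdfHex` are hypothetical, and the sketch enforces the RFC's output limit of 255 hash blocks (8160 bytes for SHA-256).

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// hkdfExtract computes PRK = HMAC-SHA256(salt, ikm). An empty salt is
// replaced by a string of HashLen zero bytes, as RFC 5869 specifies.
func hkdfExtract(salt, ikm []byte) []byte {
	if len(salt) == 0 {
		salt = make([]byte, sha256.Size)
	}
	mac := hmac.New(sha256.New, salt)
	mac.Write(ikm)
	return mac.Sum(nil)
}

// hkdfExpand derives length bytes with T(i) = HMAC(PRK, T(i-1) || info || i).
func hkdfExpand(prk, info []byte, length int) ([]byte, error) {
	if length < 0 || length > 255*sha256.Size {
		return nil, errors.New("hkdf: requested length out of range")
	}
	var okm, prev []byte
	for counter := byte(1); len(okm) < length; counter++ {
		mac := hmac.New(sha256.New, prk)
		mac.Write(prev)
		mac.Write(info)
		mac.Write([]byte{counter})
		prev = mac.Sum(nil)
		okm = append(okm, prev...)
	}
	return okm[:length], nil
}

// hkdfHex runs Extract then Expand and returns hex, or "" on error.
func hkdfHex(ikm, salt, info string, length int) string {
	prk := hkdfExtract([]byte(salt), []byte(ikm))
	okm, err := hkdfExpand(prk, []byte(info), length)
	if err != nil {
		return ""
	}
	return hex.EncodeToString(okm)
}

func main() {
	fmt.Println(hkdfHex("shared secret", "salt", "context", 42))
}
```

In an ECDH setting the input keying material would be the shared secret, with `info` binding the derived key to its intended use.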
### Dependencies
- ✅ Phase 1: Group operations, scalar multiplication
- ✅ Phase 2: Hash functions (for HKDF)
### Success Criteria
- [x] ECDH computes correct shared secrets ✅
- [x] X-only ECDH matches reference implementation ✅
- [x] HKDF key derivation works correctly ✅
- [x] All ECDH tests pass ✅
---
## Phase 4: Schnorr Signatures & Advanced Features ✅
### Status: **100% Complete**
### Objectives
Implement BIP-340 Schnorr signatures and advanced cryptographic features.
### Completed Components
#### 1. **Schnorr Signatures** ✅
- **Files**: `schnorr.go`, `schnorr_test.go`
- **Status**: 100% complete
- **Key Features**:
- `SchnorrSign` - Create BIP-340 compliant signatures ✅
- `SchnorrVerify` - Verify BIP-340 signatures ✅
- `NonceFunctionBIP340` - BIP-340 nonce generation ✅
- Tagged hash support (BIP-340 style) ✅
- Auxiliary randomness support ✅
- Secure memory clearing ✅
#### 2. **Extended Public Keys** ✅
- **Files**: `extrakeys.go`, `extrakeys_test.go`
- **Status**: 100% complete
- **Key Features**:
- `XOnlyPubkey` type (32-byte X coordinate) ✅
- `KeyPair` type for Schnorr signatures ✅
- `XOnlyPubkeyParse` - Parse x-only public keys ✅
- `XOnlyPubkeyFromPubkey` - Convert full pubkey to x-only ✅
- `XOnlyPubkeyCmp` - Compare x-only public keys ✅
- `KeyPairCreate` - Create keypair from secret key ✅
- `KeyPairGenerate` - Generate random keypair ✅
- Public key parity extraction ✅
### Dependencies
- ✅ Phase 1: Complete core infrastructure
- ✅ Phase 2: Hash functions (TaggedHash already implemented)
- ✅ Phase 3: ECDH, optimized multiplication
### Success Criteria
- [x] Schnorr signatures match BIP-340 specification ✅
- [x] All Schnorr signature tests pass ✅
- [x] X-only public keys work correctly ✅
- [x] Keypair operations work correctly ✅
- [x] All Phase 4 tests pass ✅
---
## Overall Implementation Strategy
### Principles
1. **Exact C Reference**: Follow C implementation algorithms exactly
2. **Test-Driven**: Write comprehensive tests for each component
3. **Incremental**: Complete each phase before moving to next
4. **Performance**: Optimize where possible without sacrificing correctness
5. **Go Idioms**: Use Go's type system and error handling appropriately
### Testing Strategy
- **Unit Tests**: Every function has dedicated tests
- **Integration Tests**: End-to-end operation tests
- **Property Tests**: Cryptographic property verification
- **Benchmarks**: Performance measurement and comparison
- **Edge Cases**: Boundary condition testing
### Code Quality
- **Documentation**: Comprehensive comments matching C reference
- **Type Safety**: Strong typing throughout
- **Error Handling**: Proper error propagation
- **Memory Safety**: Secure memory clearing
- **Constant-Time**: Where required for security
---
## Current Status Summary
### Phase 1: ✅ 100% Complete
- Field arithmetic: ✅ 100%
- Scalar arithmetic: ✅ 100%
- Context management: ✅ 100%
- Group operations: ✅ 100% (optimized Jacobian addition complete)
- Public key operations: ✅ 100%
### Phase 2: ✅ 100% Complete
- Hash functions: ✅ 100%
- ECDSA signatures: ✅ 100%
- Private key operations: ✅ 100%
- Key pair generation: ✅ 100%
### Phase 3: ✅ 100% Complete
- ECDH operations: ✅ 100%
- Point multiplication: ✅ 100%
- HKDF key derivation: ✅ 100%
### Phase 4: ✅ 100% Complete
- Schnorr signatures: ✅ 100%
- X-only public keys: ✅ 100%
- Keypair operations: ✅ 100%
---
## Next Steps
### Immediate (Phase 1 Completion)
✅ Phase 1 is complete! All tests passing.
### Short-term (Phase 2)
✅ Phase 2 is complete! All tests passing.
### Medium-term (Phase 3)
✅ Phase 3 is complete! All tests passing.
### Long-term (Phase 4)
✅ Phase 4 is complete! All tests passing.
---
## Files Structure (Complete)
```
p256k1.mleku.dev/
├── go.mod, go.sum
├── Phase 1 (Complete)
│ ├── context.go, context_test.go
│ ├── field.go, field_mul.go, field_test.go
│ ├── scalar.go, scalar_test.go
│ ├── group.go, group_test.go
│ ├── pubkey.go, pubkey_test.go
│ └── ecmult_gen.go
├── Phase 2 (Complete)
│ ├── hash.go, hash_test.go
│ ├── ecdsa.go, ecdsa_test.go
│ ├── eckey.go, eckey_test.go
│ ├── ecdsa_bench_test.go
│ └── BENCHMARK_RESULTS.md
├── Phase 3 (Complete)
│ ├── ecdh.go, ecdh_test.go
│ └── (ecmult functions included in ecdh.go)
└── Phase 4 (Complete)
    ├── schnorr.go, schnorr_test.go
    └── extrakeys.go, extrakeys_test.go
```
---
**Last Updated**: Phase 4 implementation complete, 100% test success. All four phases complete! Schnorr signatures, X-only public keys, and keypair operations all working.
**Target**: Complete port of secp256k1 C library to Go with full feature parity


@@ -1,184 +0,0 @@
# Verification Performance Analysis: NextP256K vs P256K1
## Summary
NextP256K's verification is **4.7x faster** than p256k1 (40,017 ns/op vs 186,054 ns/op) because it calls libsecp256k1's highly optimized C implementation, while p256k1 computes `e*P` with a simple binary (double-and-add) multiplication.
## Root Cause
The performance bottleneck is in `EcmultConst`, which is used to compute `e*P` during Schnorr verification.
### Schnorr Verification Algorithm
```186:289:schnorr.go
// SchnorrVerify verifies a Schnorr signature following BIP-340
func SchnorrVerify(sig64 []byte, msg32 []byte, xonlyPubkey *XOnlyPubkey) bool {
	// ... validation ...
	// Compute R = s*G - e*P
	// First compute s*G
	var sG GroupElementJacobian
	EcmultGen(&sG, &s) // Fast: uses optimized precomputed tables
	// Compute e*P where P is the x-only pubkey
	var eP GroupElementJacobian
	EcmultConst(&eP, &pk, &e) // Slow: uses simple binary method
	// ... rest of verification ...
}
```
### Performance Breakdown
1. **s*G computation** (`EcmultGen`):
- Uses 8-bit byte-based precomputed tables
- Highly optimized: ~58,618 ns/op for pubkey derivation
- Fast because the generator point G is fixed and precomputed
2. **e*P computation** (`EcmultConst`):
- Uses simple binary method with 256 iterations
- Each iteration: double, check bit, potentially add
- **This is the bottleneck**
### Current EcmultConst Implementation
```10:48:ecdh.go
// EcmultConst computes r = q * a using constant-time multiplication
// This is a simplified implementation for Phase 3 - can be optimized later
func EcmultConst(r *GroupElementJacobian, a *GroupElementAffine, q *Scalar) {
	// ... edge cases ...
	// Process bits from MSB to LSB
	for i := 0; i < 256; i++ {
		if i > 0 {
			r.double(r)
		}
		// Get bit i (from MSB)
		bit := q.getBits(uint(255-i), 1)
		if bit != 0 {
			if r.isInfinity() {
				*r = base
			} else {
				r.addVar(r, &base)
			}
		}
	}
}
```
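**Problem:** This performs 256 iterations, each requiring:
- One point doubling (skipped only on the first iteration)
- One bit extraction
- Potentially one point addition
For verification, this means **255 doublings + up to 256 additions** per call, with no windowing or precomputation to cut the number of additions. The loop also branches on each scalar bit and calls `addVar`, so despite its name it does not run in constant time.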
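To make the operation counts concrete, the following sketch runs the same MSB-first double-and-add loop over a toy group (integers modulo a small prime, standing in for curve points) and counts the operations. The toy group, the constant `modulus`, and the function `mulBinary` are assumptions made for illustration; unlike the excerpt above, the first addition onto the identity is counted as an addition.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// modulus defines the toy group: integers mod a small prime under addition.
const modulus = 1000003

// mulBinary computes k*base in the toy group with MSB-first double-and-add
// and returns the result with the number of doublings and additions used.
func mulBinary(kHex string, base uint64, bits int) (result uint64, doublings, additions int) {
	k, ok := new(big.Int).SetString(kHex, 16)
	if !ok {
		panic("invalid hex scalar")
	}
	for i := bits - 1; i >= 0; i-- {
		if i < bits-1 {
			result = (2 * result) % modulus // "point doubling"
			doublings++
		}
		if k.Bit(i) == 1 {
			result = (result + base) % modulus // "point addition"
			additions++
		}
	}
	return
}

func main() {
	allOnes := strings.Repeat("f", 64) // worst case: every bit set
	_, d, a := mulBinary(allOnes, 3, 256)
	fmt.Println("doublings:", d, "additions:", a)
}
```

For a uniformly random 256-bit scalar about half the bits are set, so the expected number of additions is roughly 128 on top of the 255 doublings.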
## Why NextP256K is Faster
NextP256K uses libsecp256k1's optimized C implementation (`secp256k1_ecmult_const`) which:
1. **Uses GLV Endomorphism**:
- Splits the scalar into two smaller components using the curve's endomorphism
- Computes two smaller multiplications instead of one large one
- Reduces the effective bit length from 256 to ~128 bits per component
2. **Windowed Precomputation**:
- Precomputes a table of multiples of the base point
- Uses windowed lookups instead of processing bits one at a time
- Processes multiple bits per iteration (typically 4-6 bits at a time)
3. **Signed-Digit Representation**:
- Recodes the scalar with signed digits, so table entries can be added or subtracted and a smaller table suffices
- Minimizes the number of point operations required
4. **Assembly Optimizations**:
- Field arithmetic operations are optimized in assembly
- Hand-tuned for specific CPU architectures
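The effect of windowed precomputation can be seen in a small sketch of a fixed-window multiplication over a toy group (integers modulo a small prime in place of curve points). The toy group and the function `mulWindowed` are illustrative assumptions; this is a plain unsigned fixed-window method, not libsecp256k1's signed-digit GLV algorithm, and the table precomputation is not included in the counts.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// modulus defines the toy group: integers mod a small prime under addition.
const modulus = 1000003

// mulWindowed computes k*base with a fixed window of w bits. It precomputes
// table[d] = d*base for every window value d, then handles w bits per step:
// w doublings followed by at most one table addition.
func mulWindowed(kHex string, base uint64, bits, w int) (result uint64, doublings, additions int) {
	k, ok := new(big.Int).SetString(kHex, 16)
	if !ok {
		panic("invalid hex scalar")
	}
	table := make([]uint64, 1<<w)
	for i := 1; i < len(table); i++ {
		table[i] = (table[i-1] + base) % modulus
	}
	windows := (bits + w - 1) / w
	for j := windows - 1; j >= 0; j-- {
		if j < windows-1 {
			for t := 0; t < w; t++ {
				result = (2 * result) % modulus
				doublings++
			}
		}
		digit := 0
		for t := w - 1; t >= 0; t-- {
			digit = digit<<1 | int(k.Bit(j*w+t))
		}
		if digit != 0 {
			result = (result + table[digit]) % modulus
			additions++
		}
	}
	return
}

func main() {
	allOnes := strings.Repeat("f", 64)
	_, d, a := mulWindowed(allOnes, 3, 256, 4)
	fmt.Println("doublings:", d, "additions:", a)
}
```

Windowing leaves the number of doublings essentially unchanged and cuts the additions to at most one per window; halving the doubling chain is what the GLV split contributes.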
### Reference Implementation
The C reference shows the complexity:
```124:268:src/ecmult_const_impl.h
static void secp256k1_ecmult_const(secp256k1_gej *r, const secp256k1_ge *a, const secp256k1_scalar *q) {
    /* The approach below combines the signed-digit logic from Mike Hamburg's
     * "Fast and compact elliptic-curve cryptography" (https://eprint.iacr.org/2012/309)
     * Section 3.3, with the GLV endomorphism.
     * ... */
    /* Precompute table for base point and lambda * base point */
    /* Process bits in groups using windowed lookups */
    for (group = ECMULT_CONST_GROUPS - 1; group >= 0; --group) {
        /* Lookup precomputed points */
        ECMULT_CONST_TABLE_GET_GE(&t, pre_a, bits1);
        /* ... */
    }
}
```
## Performance Impact
### Benchmark Results
| Operation | P256K1 | NextP256K | Speedup |
|-----------|--------|-----------|---------|
| **Verification** | 186,054 ns/op | 40,017 ns/op | **4.7x** |
| Signing | 31,937 ns/op | 52,060 ns/op | 0.6x (slower) |
| Pubkey Derivation | 58,618 ns/op | 280,835 ns/op | 0.2x (slower) |
**Note:** NextP256K is slower for signing and pubkey derivation, which this analysis attributes to CGO overhead on these shorter operations, but much faster for verification, where the cost is dominated by the arbitrary-point multiplication `e*P`.
## Optimization Opportunities
To improve p256k1's verification performance, `EcmultConst` should be optimized to:
1. **Implement GLV Endomorphism**:
- Split scalar using secp256k1's endomorphism
- Compute two smaller multiplications
- Combine results
2. **Add Windowed Precomputation**:
- Precompute a table of multiples of the base point
- Process bits in groups (windows) instead of individually
- Use lookup tables instead of repeated additions
3. **Consider Variable-Time Optimization**:
- For verification (public operation), variable-time algorithms are acceptable
- Could use `Ecmult` instead of `EcmultConst` if constant-time isn't required
4. **Implement Signed-Digit Representation**:
- Recode the scalar with signed digits, as the C reference does
- Reduce the table size and the number of additions required
## Complexity Comparison
### Current (Simple Binary Method)
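- **Operations:** 255 doublings + up to 256 additions (about 128 additions on average for a random scalar)
- **Complexity:** ~380 point operations on average, ~510 in the worst case
### Optimized (Windowed + GLV)
- **Operations:** ~128 doublings + ~64 additions (two ~128-bit half-scalars sharing one doubling chain, window size 4), plus a small table precomputation
- **Complexity:** ~190 point operations (a rough estimate, about 2x fewer than the binary method's average)
### With Assembly Optimizations
- **Additional:** 2-3x speedup from optimized field arithmetic
- **Total:** roughly 4-6x faster than the simple binary method, which is in line with the measured 4.7x gap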
## Conclusion
The 4.7x performance difference is primarily due to:
1. **Algorithmic efficiency**: Windowed multiplication vs. simple binary method
2. **GLV endomorphism**: Splitting scalar into smaller components
3. **Assembly optimizations**: Hand-tuned field arithmetic in C
4. **Better memory access patterns**: Precomputed tables vs. repeated computations
The optimization is non-trivial and would require implementing:
- GLV endomorphism support
- Windowed precomputation tables
- Signed-digit scalar recoding
- Potentially assembly optimizations for field arithmetic
For now, NextP256K's advantage in verification is expected given its use of the mature, highly optimized libsecp256k1 C library.


@@ -1,107 +0,0 @@
# Verify Function Performance Analysis: C vs Go
## Key Finding: The C Version Uses Strauss-WNAF Algorithm
The C implementation of `secp256k1_schnorrsig_verify` uses a **highly optimized Strauss-WNAF algorithm** that computes `r = s*G + (-e)*P` in a **single interleaved operation** rather than two separate multiplications.
## Current Go Implementation (verify.go:692-722)
```go
func secp256k1_ecmult(r *secp256k1_gej, a *secp256k1_gej, na *secp256k1_scalar, ng *secp256k1_scalar) {
	// r = na * a + ng * G
	// ... setup of geja, sna, sng and gejr is not shown in this excerpt ...
	// First compute na * a
	var naa GroupElementJacobian
	Ecmult(&naa, &geja, &sna) // ~43 iterations (6-bit windows)
	// Then compute ng * G
	var ngg GroupElementJacobian
	EcmultGen(&ngg, &sng) // ~32 iterations (byte-based)
	// Add them together
	gejr.addVar(&naa, &ngg)
}
```
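**Performance**: ~75 loop iterations in total (43 + 32), plus one final addition. Each `Ecmult` iteration covers a 6-bit window and therefore performs 6 doublings, so the doubling count for `na*a` is roughly 256, not 43.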
## C Implementation (src/ecmult_impl.h:321-342)
```c
for (i = bits - 1; i >= 0; i--) {
    secp256k1_gej_double_var(r, r, NULL); // ONE doubling per iteration
    // Check na*a contribution
    if (i < bits_na_1 && (n = wnaf_na_1[i])) {
        secp256k1_ecmult_table_get_ge(&tmpa, pre_a, n, WINDOW_A);
        secp256k1_gej_add_ge_var(r, r, &tmpa, NULL);
    }
    // Check ng*G contribution
    if (i < bits_ng_1 && (n = wnaf_ng_1[i])) {
        secp256k1_ecmult_table_get_ge_storage(&tmpa, secp256k1_pre_g, n, WINDOW_G);
        secp256k1_gej_add_zinv_var(r, r, &tmpa, &Z);
    }
}
```
**Performance**: ~129 iterations total (max bits needed), with interleaved additions
## Why C is Faster
### 1. **Interleaved Operations**
- **C**: Processes both scalars bit-by-bit in ONE loop
- Each iteration: double once, then potentially add from either table
- Total: ~129 iterations (the maximum bits needed)
- **Go**: Computes two separate multiplications
- `na*a`: ~43 iterations (6-bit windows)
- `ng*G`: ~32 iterations (byte-based)
- Total: ~75 iterations PLUS one final addition
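The benefit of interleaving can be sketched on a toy group (integers modulo a small prime in place of curve points): computing `a*P + b*G` in one loop shares a single doubling chain between both scalars, while two separate bit-by-bit multiplications each need their own. The toy group and the functions `mulSeparate` and `mulInterleaved` are assumptions for illustration; the sketch uses plain binary digits rather than WNAF.

```go
package main

import "fmt"

// modulus defines the toy group: integers mod a small prime under addition.
const modulus = 1000003

// mulSeparate computes a*p + b*g with two independent double-and-add chains
// and one final addition, returning the result and the doublings used.
func mulSeparate(a, b, p, g uint64, bits int) (result uint64, doublings int) {
	var ra, rb uint64
	for i := bits - 1; i >= 0; i-- {
		if i < bits-1 {
			ra = (2 * ra) % modulus
			rb = (2 * rb) % modulus
			doublings += 2
		}
		if (a>>uint(i))&1 == 1 {
			ra = (ra + p) % modulus
		}
		if (b>>uint(i))&1 == 1 {
			rb = (rb + g) % modulus
		}
	}
	return (ra + rb) % modulus, doublings
}

// mulInterleaved computes a*p + b*g in one loop: one doubling per bit
// position, then an optional addition for each scalar.
func mulInterleaved(a, b, p, g uint64, bits int) (result uint64, doublings int) {
	for i := bits - 1; i >= 0; i-- {
		if i < bits-1 {
			result = (2 * result) % modulus
			doublings++
		}
		if (a>>uint(i))&1 == 1 {
			result = (result + p) % modulus
		}
		if (b>>uint(i))&1 == 1 {
			result = (result + g) % modulus
		}
	}
	return
}

func main() {
	r1, d1 := mulSeparate(181, 110, 3, 5, 8)
	r2, d2 := mulInterleaved(181, 110, 3, 5, 8)
	fmt.Println(r1, d1, r2, d2)
}
```

Both functions return the same group element; the interleaved version needs half as many doublings for the same scalar length.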
### 2. **GLV Endomorphism Optimization**
The C version uses scalar splitting with lambda endomorphism:
- Splits `na` into `na_1` and `na_lam` (~128 bits each)
- Uses precomputed lambda table for faster operations
- Reduces effective scalar size from 256 bits to ~128 bits
### 3. **WNAF (Windowed Non-Adjacent Form)**
- Sparse representation: non-zero entries separated by at least (w-1) zeroes
- Reduces number of additions needed
- Uses signed digits: can subtract instead of just add
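A width-`w` NAF can be computed with the textbook algorithm sketched below. This is an illustration using `math/big`, not a port of `secp256k1_ecmult_wnaf`; the names `wnaf` and `checkWNAF` are hypothetical. The checker verifies the properties listed above: non-zero digits are odd, smaller than `2^(w-1)` in absolute value, and separated by at least `w-1` zeroes.

```go
package main

import (
	"fmt"
	"math/big"
)

// wnaf returns the width-w non-adjacent form of k > 0, least significant
// digit first. Non-zero digits are odd and lie in (-2^(w-1), 2^(w-1)).
func wnaf(k *big.Int, w uint) []int {
	n := new(big.Int).Set(k)
	window := int64(1) << w
	var digits []int
	for n.Sign() > 0 {
		d := int64(0)
		if n.Bit(0) == 1 {
			// Take the low w bits as a signed residue and subtract it, which
			// clears those bits and forces the next w-1 digits to zero.
			d = new(big.Int).And(n, big.NewInt(window-1)).Int64()
			if d >= window/2 {
				d -= window
			}
			n.Sub(n, big.NewInt(d))
		}
		digits = append(digits, int(d))
		n.Rsh(n, 1)
	}
	return digits
}

// checkWNAF reports whether wnaf(k, w) reconstructs k and satisfies the
// digit-range, oddness and sparsity properties of a width-w NAF.
func checkWNAF(k int64, w uint) bool {
	digits := wnaf(big.NewInt(k), w)
	sum := new(big.Int)
	last := -1
	for i, d := range digits {
		if d == 0 {
			continue
		}
		if d%2 == 0 || d >= 1<<(w-1) || d <= -(1<<(w-1)) {
			return false
		}
		if last >= 0 && i-last < int(w) {
			return false
		}
		last = i
		term := big.NewInt(int64(d))
		sum.Add(sum, term.Lsh(term, uint(i)))
	}
	return sum.Cmp(big.NewInt(k)) == 0
}

func main() {
	fmt.Println(wnaf(big.NewInt(7), 3), checkWNAF(7, 3))
}
```

On average only about one digit in `w+1` is non-zero, which is where the reduction in point additions comes from.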
### 4. **Precomputed Tables**
- C uses optimized precomputed tables for both `a` and `G`
- Uses isomorphic curve representation for faster affine additions
- Stores points in optimized storage format
### 5. **Fewer Doublings**
- **C**: ~129 doublings (one per bit position), shared by both scalars
- **Go**: ~43 window iterations for `na*a`, each performing 6 doublings (about 256 doublings in total), plus ~32 table-based iterations for `ng*G`
- C also does fewer additions due to WNAF sparsity
## Performance Impact
The C version is ~3-4x faster because:
1. **Single loop**: Processes everything in one pass with one shared chain of ~129 doublings, instead of two separate multiplications plus a final addition
2. **Sparse operations**: WNAF reduces additions (maybe 20-30 additions vs 32+)
3. **Optimized tables**: Precomputed tables with isomorphic curve optimization
4. **Better cache locality**: Everything in one loop, better CPU cache usage
## Recommendation
To match C performance, implement the Strauss-WNAF algorithm in Go:
1. Implement WNAF conversion for scalars
2. Implement GLV endomorphism scalar splitting
3. Implement interleaved multiplication loop
4. Use precomputed tables with isomorphic curve optimization
5. This will require implementing several missing functions:
- `secp256k1_scalar_split_lambda`
- `secp256k1_scalar_split_128`
- `secp256k1_ecmult_wnaf`
- `secp256k1_ecmult_odd_multiples_table`
- `secp256k1_ge_table_set_globalz`
- `secp256k1_ecmult_table_get_ge`
- `secp256k1_ecmult_table_get_ge_lambda`
- `secp256k1_ecmult_table_get_ge_storage`
- And the GLV lambda constant/endomorphism functions
This is a significant optimization that would bring Go performance much closer to C.

BIN
mem.prof

Binary file not shown.

Binary file not shown.