cleanup

2025-11-02 03:33:02 +00:00
parent 47244a371b
commit 106349d6eb
5 changed files with 0 additions and 666 deletions
--- a/FOUR_PHASES_SUMMARY.md
+++ b/FOUR_PHASES_SUMMARY.md
@@ -1,375 +0,0 @@
-# Four-Phase Implementation Plan - secp256k1 Go Port
-
-## Overview
-
-This document outlines the complete four-phase implementation plan for porting the secp256k1 cryptographic library from C to Go. The implementation follows the C reference implementation exactly, ensuring mathematical correctness and compatibility.
-
---
-
-## Phase 1: Core Infrastructure & Mathematical Primitives ✅
-
-### Status: **100% Complete** (25/25 tests passing)
-
-### Objectives
-Establish the mathematical foundation and core infrastructure for all cryptographic operations.
-
-### Completed Components
-
-#### 1. **Field Element Operations** ✅
-- **File**: `field.go`, `field_mul.go`, `field_test.go`
-- **Status**: 100% complete (9/9 tests passing)
-- **Key Features**:
-  - Field arithmetic (addition, subtraction, multiplication, squaring)
-  - Field normalization and reduction
-  - Field inverse computation (Fermat's little theorem)
-  - Field square root computation
-  - 512-bit to 256-bit modular reduction (matches C reference exactly)
-  - Constant-time operations where required
-  - Secure memory clearing
-
-#### 2. **Scalar Operations** ✅
-- **File**: `scalar.go`, `scalar_test.go`
-- **Status**: 100% complete (11/11 tests passing)
-- **Key Features**:
-  - Scalar arithmetic (addition, subtraction, multiplication)
-  - Scalar modular inverse
-  - Scalar exponentiation
-  - Scalar halving
-  - 512-bit to 256-bit modular reduction (three-stage reduction from C)
-  - Private key validation
-  - Constant-time conditional operations
-
-#### 3. **Context Management** ✅
-- **File**: `context.go`, `context_test.go`
-- **Status**: 100% complete (5/5 tests passing)
-- **Key Features**:
-  - Context creation with capability flags (signing/verification)
-  - Context destruction and cleanup
-  - Context randomization for side-channel protection
-  - Static verification-only context
-  - Capability checking
-
-#### 4. **Group Operations** ✅
-- **File**: `group.go`, `group_test.go`
-- **Status**: 100% complete (4/4 tests passing)
-- **Key Features**:
-  - `GroupElementAffine` and `GroupElementJacobian` types
-  - Affine coordinate operations (complete)
-  - Jacobian coordinate operations (optimized)
-  - Point doubling (`double`) - C reference implementation
-  - Point addition in Jacobian coordinates (`addVar`) - C reference implementation (~78x faster)
-  - Point addition with affine input (`addGE`) - C reference implementation (optimized)
-  - Coordinate conversion (affine ↔ Jacobian)
-  - Generator point initialization
-  - Storage format conversion
-  - Field element `normalizesToZeroVar` helper for efficient point comparison
-
-#### 5. **Public Key Operations** ✅
-- **File**: `pubkey.go`, `pubkey_test.go`
-- **Status**: 100% complete (4/4 tests passing)
-- **Key Features**:
-  - `PublicKey` type with 64-byte internal representation
-  - Public key parsing (compressed/uncompressed)
-  - Public key serialization
-  - Public key comparison (working)
-  - Public key creation from private key (scalar multiplication working)
-
-#### 6. **Generator Multiplication** ✅
-- **File**: `ecmult_gen.go`
-- **Status**: Infrastructure complete
-- **Key Features**:
-  - `EcmultGenContext` for precomputed tables
-  - `EcmultGen` function for `n * G` computation
-  - Binary method implementation (ready for optimization)
-
-### Remaining Issues
-
-None - Phase 1 is complete! ✅
-
-### Test Coverage
-- **Total Tests**: 25 test functions
-- **Passing**: 25 tests ✅
-- **Failing**: 0 tests ✅
-- **Success Rate**: 100%
-
-### Files Created
-```
-├── context.go          ✅ Context management (COMPLETE)
-├── context_test.go     ✅ Context tests (ALL PASSING)
-├── field.go            ✅ Field arithmetic (COMPLETE)
-├── field_mul.go        ✅ Field multiplication/operations (COMPLETE)
-├── field_test.go       ✅ Field tests (ALL PASSING)
-├── scalar.go           ✅ Scalar arithmetic (COMPLETE)
-├── scalar_test.go      ✅ Scalar tests (ALL PASSING)
-├── group.go            ✅ Group operations (COMPLETE)
-├── group_test.go       ✅ Group tests (ALL PASSING)
-├── ecmult_gen.go       ✅ Generator multiplication (INFRASTRUCTURE)
-├── pubkey.go           ✅ Public key operations (COMPLETE)
-└── pubkey_test.go      ✅ Public key tests (ALL PASSING)
-```
-
---
-
-## Phase 2: ECDSA Signatures & Hash Functions ✅
-
-### Status: **100% Complete**
-
-### Objectives
-Implement ECDSA signature creation and verification, along with cryptographic hash functions.
-
-### Completed Components
-
-#### 1. **Hash Functions** ✅
-- **Files**: `hash.go`, `hash_test.go`
-- **Status**: 100% complete
-- **Key Features**:
-  - SHA-256 implementation (using sha256-simd)
-  - Tagged SHA-256 (BIP-340 style)
-  - RFC6979 nonce generation (deterministic signing)
-  - HMAC-SHA256 implementation
-  - Hash-to-field element conversion
-  - Hash-to-scalar conversion
-  - Message hashing utilities
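
The tagged SHA-256 construction listed above can be sketched in a few lines. This is an illustrative example, not the code from `hash.go`: the function name `taggedHash` is hypothetical, and it uses the standard library's `crypto/sha256` rather than sha256-simd. The construction itself, SHA256(SHA256(tag) || SHA256(tag) || msg), is the one specified in BIP-340.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// taggedHash computes SHA256(SHA256(tag) || SHA256(tag) || msg),
// the domain-separated hash defined in BIP-340.
// The name and signature are illustrative, not the library's API.
func taggedHash(tag string, msg []byte) [32]byte {
	tagHash := sha256.Sum256([]byte(tag))
	h := sha256.New()
	h.Write(tagHash[:]) // the tag hash is written twice
	h.Write(tagHash[:])
	h.Write(msg)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	digest := taggedHash("BIP0340/challenge", []byte("example message"))
	fmt.Printf("%x\n", digest)
}
```

Because the tag is hashed into the prefix, the same message hashed under two different tags yields unrelated digests, which is the point of the domain separation.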
-
-#### 2. **ECDSA Signatures** ✅
-- **Files**: `ecdsa.go`, `ecdsa_test.go`
-- **Status**: 100% complete
-- **Key Features**:
-  - `ECDSASign` - Create signatures from message hash and private key
-  - `ECDSAVerify` - Verify signatures against message hash and public key
-  - Compact signature format (64-byte)
-  - Signature normalization (low-S)
-  - RFC6979 deterministic nonce generation
-
-#### 3. **Private Key Operations** ✅
-- **Files**: `eckey.go`, `eckey_test.go`
-- **Status**: 100% complete
-- **Key Features**:
-  - Private key generation (`ECSeckeyGenerate`)
-  - Private key validation (`ECSeckeyVerify`)
-  - Private key negation (`ECSeckeyNegate`)
-  - Key pair generation (`ECKeyPairGenerate`)
-  - Key tweaking (add/multiply) for BIP32-style derivation
-  - Public key tweaking (add/multiply)
-
-#### 4. **Benchmarks**
-- **Files**: `ecdsa_bench_test.go`, `BENCHMARK_RESULTS.md`
-- **Features**:
-  - Signing performance benchmarks ✅
-  - Verification performance benchmarks ✅
-  - Hash function benchmarks ✅
-  - Key generation benchmarks ✅
-  - Comparison with C implementation ✅
-  - Memory usage profiling ✅
-  - Comprehensive benchmark results document ✅
-
-### Dependencies
-- ✅ Phase 1: Field arithmetic, scalar arithmetic, group operations
-- ✅ Point doubling algorithm working correctly
-- ✅ Scalar multiplication working correctly
-
-### Success Criteria
-- [x] All ECDSA signing tests pass ✅
-- [x] All ECDSA verification tests pass ✅
-- [x] Hash functions match reference implementation ✅
-- [x] RFC6979 nonce generation produces correct results ✅
-- [x] Performance benchmarks implemented and documented ✅
-  - Signing: ~5ms/op (2-3x slower than C, acceptable for production)
-  - Verification: ~10ms/op (2-3x slower than C, zero allocations)
-  - Full benchmark suite: 17 benchmarks covering all operations
-
---
-
-## Phase 3: ECDH Key Exchange ✅
-
-### Status: **100% Complete**
-
-### Objectives
-Implement Elliptic Curve Diffie-Hellman key exchange for secure key derivation.
-
-### Completed Components
-
-#### 1. **ECDH Operations** ✅
-- **Files**: `ecdh.go`, `ecdh_test.go`
-- **Status**: 100% complete
-- **Key Features**:
-  - `ECDH` - Compute shared secret from private key and public key ✅
-  - `ECDHWithHKDF` - ECDH with HKDF key derivation ✅
-  - `ECDHXOnly` - X-only ECDH (BIP-340 style) ✅
-  - Custom hash function support ✅
-  - Secure memory clearing ✅
-
-#### 2. **Advanced Point Multiplication** ✅
-- **Files**: `ecdh.go` (includes EcmultConst and Ecmult)
-- **Status**: 100% complete
-- **Key Features**:
-  - `EcmultConst` - Constant-time multiplication for arbitrary points ✅
-  - `Ecmult` - Variable-time optimized multiplication ✅
-  - Binary method implementation (ready for further optimization) ✅
-
-#### 3. **HKDF Support** ✅
-- **Files**: `ecdh.go`
-- **Status**: 100% complete
-- **Key Features**:
-  - `HKDF` - HMAC-based Key Derivation Function (RFC 5869) ✅
-  - Extract and Expand phases ✅
-  - Supports arbitrary output length ✅
-  - Secure memory clearing ✅
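
For reference, the Extract and Expand phases of RFC 5869 can be sketched with the Go standard library as follows. This is a minimal illustration under assumptions: the names `hkdfExtract` and `hkdfExpand` are hypothetical and do not reflect the actual `HKDF` signature in `ecdh.go`, and the sketch omits the secure memory clearing mentioned above.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// hkdfExtract is HKDF-Extract (RFC 5869) with HMAC-SHA256:
// PRK = HMAC(salt, IKM). An empty salt defaults to 32 zero bytes.
func hkdfExtract(salt, ikm []byte) []byte {
	if len(salt) == 0 {
		salt = make([]byte, sha256.Size)
	}
	mac := hmac.New(sha256.New, salt)
	mac.Write(ikm)
	return mac.Sum(nil)
}

// hkdfExpand is HKDF-Expand (RFC 5869) with HMAC-SHA256:
// T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated.
// length must not exceed 255*32 bytes.
func hkdfExpand(prk, info []byte, length int) []byte {
	var okm, t []byte
	for counter := byte(1); len(okm) < length; counter++ {
		mac := hmac.New(sha256.New, prk)
		mac.Write(t)
		mac.Write(info)
		mac.Write([]byte{counter})
		t = mac.Sum(nil)
		okm = append(okm, t...)
	}
	return okm[:length]
}

func main() {
	// The input below stands in for an ECDH shared secret.
	prk := hkdfExtract(nil, []byte("shared secret placeholder"))
	key := hkdfExpand(prk, []byte("example context"), 48)
	fmt.Printf("%x\n", key)
}
```

Since each output block depends only on the previous block, `info`, and a counter, a shorter output is always a prefix of a longer one derived from the same inputs.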
-
-### Dependencies
-- ✅ Phase 1: Group operations, scalar multiplication
-- ✅ Phase 2: Hash functions (for HKDF)
-
-### Success Criteria
-- [x] ECDH computes correct shared secrets ✅
-- [x] X-only ECDH matches reference implementation ✅
-- [x] HKDF key derivation works correctly ✅
-- [x] All ECDH tests pass ✅
-
---
-
-## Phase 4: Schnorr Signatures & Advanced Features ✅
-
-### Status: **100% Complete**
-
-### Objectives
-Implement BIP-340 Schnorr signatures and advanced cryptographic features.
-
-### Completed Components
-
-#### 1. **Schnorr Signatures** ✅
-- **Files**: `schnorr.go`, `schnorr_test.go`
-- **Status**: 100% complete
-- **Key Features**:
-  - `SchnorrSign` - Create BIP-340 compliant signatures ✅
-  - `SchnorrVerify` - Verify BIP-340 signatures ✅
-  - `NonceFunctionBIP340` - BIP-340 nonce generation ✅
-  - Tagged hash support (BIP-340 style) ✅
-  - Auxiliary randomness support ✅
-  - Secure memory clearing ✅
-
-#### 2. **Extended Public Keys** ✅
-- **Files**: `extrakeys.go`, `extrakeys_test.go`
-- **Status**: 100% complete
-- **Key Features**:
-  - `XOnlyPubkey` type (32-byte X coordinate) ✅
-  - `KeyPair` type for Schnorr signatures ✅
-  - `XOnlyPubkeyParse` - Parse x-only public keys ✅
-  - `XOnlyPubkeyFromPubkey` - Convert full pubkey to x-only ✅
-  - `XOnlyPubkeyCmp` - Compare x-only public keys ✅
-  - `KeyPairCreate` - Create keypair from secret key ✅
-  - `KeyPairGenerate` - Generate random keypair ✅
-  - Public key parity extraction ✅
-
-### Dependencies
-- ✅ Phase 1: Complete core infrastructure
-- ✅ Phase 2: Hash functions (TaggedHash already implemented)
-- ✅ Phase 3: ECDH, optimized multiplication
-
-### Success Criteria
-- [x] Schnorr signatures match BIP-340 specification ✅
-- [x] All Schnorr signature tests pass ✅
-- [x] X-only public keys work correctly ✅
-- [x] Keypair operations work correctly ✅
-- [x] All Phase 4 tests pass ✅
-
---
-
-## Overall Implementation Strategy
-
-### Principles
-1. **Exact C Reference**: Follow C implementation algorithms exactly
-2. **Test-Driven**: Write comprehensive tests for each component
-3. **Incremental**: Complete each phase before moving to next
-4. **Performance**: Optimize where possible without sacrificing correctness
-5. **Go Idioms**: Use Go's type system and error handling appropriately
-
-### Testing Strategy
-- **Unit Tests**: Every function has dedicated tests
-- **Integration Tests**: End-to-end operation tests
-- **Property Tests**: Cryptographic property verification
-- **Benchmarks**: Performance measurement and comparison
-- **Edge Cases**: Boundary condition testing
-
-### Code Quality
-- **Documentation**: Comprehensive comments matching C reference
-- **Type Safety**: Strong typing throughout
-- **Error Handling**: Proper error propagation
-- **Memory Safety**: Secure memory clearing
-- **Constant-Time**: Where required for security
-
---
-
-## Current Status Summary
-
-### Phase 1: ✅ 100% Complete
-- Field arithmetic: ✅ 100%
-- Scalar arithmetic: ✅ 100%
-- Context management: ✅ 100%
-- Group operations: ✅ 100% (optimized Jacobian addition complete)
-- Public key operations: ✅ 100%
-
-### Phase 2: ✅ 100% Complete
-- Hash functions: ✅ 100%
-- ECDSA signatures: ✅ 100%
-- Private key operations: ✅ 100%
-- Key pair generation: ✅ 100%
-
-### Phase 3: ✅ 100% Complete
-- ECDH operations: ✅ 100%
-- Point multiplication: ✅ 100%
-- HKDF key derivation: ✅ 100%
-
-### Phase 4: ✅ 100% Complete
-- Schnorr signatures: ✅ 100%
-- X-only public keys: ✅ 100%
-- Keypair operations: ✅ 100%
-
---
-
-## Next Steps
-
-### Immediate (Phase 1 Completion)
-✅ Phase 1 is complete! All tests passing.
-
-### Short-term (Phase 2)
-✅ Phase 2 is complete! All tests passing.
-
-### Medium-term (Phase 3)
-✅ Phase 3 is complete! All tests passing.
-
-### Long-term (Phase 4)
-✅ Phase 4 is complete! All tests passing.
-
---
-
-## Files Structure (Complete)
-
-```
-p256k1.mleku.dev/
-├── go.mod, go.sum
-├── Phase 1 (Complete)
-│   ├── context.go, context_test.go
-│   ├── field.go, field_mul.go, field_test.go
-│   ├── scalar.go, scalar_test.go
-│   ├── group.go, group_test.go
-│   ├── pubkey.go, pubkey_test.go
-│   └── ecmult_gen.go
-├── Phase 2 (Complete)
-│   ├── hash.go, hash_test.go
-│   ├── ecdsa.go, ecdsa_test.go
-│   ├── eckey.go, eckey_test.go
-│   ├── ecdsa_bench_test.go
-│   └── BENCHMARK_RESULTS.md
-├── Phase 3 (Complete)
-│   ├── ecdh.go, ecdh_test.go
-│   └── (ecmult functions included in ecdh.go)
-└── Phase 4 (Complete)
-    ├── schnorr.go, schnorr_test.go
-    └── extrakeys.go, extrakeys_test.go
-```
-
---
-
-**Last Updated**: Phase 4 implementation complete, 100% test success. All four phases complete! Schnorr signatures, X-only public keys, and keypair operations all working.
-**Target**: Complete port of secp256k1 C library to Go with full feature parity
--- a/VERIFICATION_PERFORMANCE_ANALYSIS.md
+++ b/VERIFICATION_PERFORMANCE_ANALYSIS.md
@@ -1,184 +0,0 @@
-# Verification Performance Analysis: NextP256K vs P256K1
-
-## Summary
-
-NextP256K's verification is about **4.6x faster** than p256k1 (40,017 ns/op vs 186,054 ns/op). NextP256K calls libsecp256k1's optimized C implementation, while p256k1 computes `e*P` with a simple binary (double-and-add) multiplication.
-
-## Root Cause
-
-The performance bottleneck is in `EcmultConst`, which is used to compute `e*P` during Schnorr verification.
-
-### Schnorr Verification Algorithm
-
-```186:289:schnorr.go
-// SchnorrVerify verifies a Schnorr signature following BIP-340
-func SchnorrVerify(sig64 []byte, msg32 []byte, xonlyPubkey *XOnlyPubkey) bool {
-	// ... validation ...
-	
-	// Compute R = s*G - e*P
-	// First compute s*G
-	var sG GroupElementJacobian
-	EcmultGen(&sG, &s)  // Fast: uses optimized precomputed tables
-
-	// Compute e*P where P is the x-only pubkey
-	var eP GroupElementJacobian
-	EcmultConst(&eP, &pk, &e)  // Slow: uses simple binary method
-	
-	// ... rest of verification ...
-}
-```
-
-### Performance Breakdown
-
-1. **s*G computation** (`EcmultGen`):
-   - Uses 8-bit byte-based precomputed tables
-   - Highly optimized: ~58,618 ns/op for pubkey derivation
-   - Fast because the generator point G is fixed and precomputed
-
-2. **e*P computation** (`EcmultConst`):
-   - Uses simple binary method with 256 iterations
-   - Each iteration: double, check bit, potentially add
-   - **This is the bottleneck**
-
-### Current EcmultConst Implementation
-
-```10:48:ecdh.go
-// EcmultConst computes r = q * a using constant-time multiplication
-// This is a simplified implementation for Phase 3 - can be optimized later
-func EcmultConst(r *GroupElementJacobian, a *GroupElementAffine, q *Scalar) {
-	// ... edge cases ...
-	
-	// Process bits from MSB to LSB
-	for i := 0; i < 256; i++ {
-		if i > 0 {
-			r.double(r)
-		}
-		
-		// Get bit i (from MSB)
-		bit := q.getBits(uint(255-i), 1)
-		if bit != 0 {
-			if r.isInfinity() {
-				*r = base
-			} else {
-				r.addVar(r, &base)
-			}
-		}
-	}
-}
-```
-
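-**Problem:** The loop runs 256 iterations, each requiring:
-- One point doubling (skipped on the first iteration)
-- One bit extraction
-- Potentially one point addition
-
-Each verification therefore costs **255 doublings plus up to 256 additions** (about 128 additions on average for a random scalar), with no precomputation to amortize the work. As excerpted, the loop also branches on each scalar bit, so it is not constant-time despite the function's name.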
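
To make the operation count concrete, here is a small Go sketch of the same MSB-first double-and-add loop. It is a toy model under stated assumptions: integers under addition modulo `m` stand in for curve points, and `mulBinary` and `opCount` are hypothetical names, not part of p256k1. Only the sequence of doublings and additions mirrors the loop above. Unlike the excerpt, the sketch counts the first addition even though `EcmultConst` performs it as a plain assignment.

```go
package main

import "fmt"

// opCount tallies the group operations performed by a multiplication routine.
type opCount struct{ doublings, additions int }

// mulBinary computes k*p mod m with MSB-first double-and-add.
// The value 0 plays the role of the point at infinity.
func mulBinary(k, p, m uint64, bits int) (uint64, opCount) {
	var r uint64
	var c opCount
	for i := bits - 1; i >= 0; i-- {
		if i != bits-1 {
			r = (r + r) % m // stands in for a point doubling
			c.doublings++
		}
		if (k>>uint(i))&1 == 1 {
			r = (r + p) % m // stands in for a point addition
			c.additions++
		}
	}
	return r, c
}

func main() {
	// Worst case for 16 bits: every scalar bit is set.
	r, c := mulBinary(0xFFFF, 7, 1000003, 16)
	fmt.Println(r, c.doublings, c.additions) // prints 458745 15 16
}
```

Scaling the same pattern to a 256-bit scalar gives 255 doublings and up to 256 additions.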
-
-## Why NextP256K is Faster
-
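-NextP256K uses libsecp256k1's optimized C implementation. The C counterpart of `EcmultConst` is `secp256k1_ecmult_const` (libsecp256k1's signature verification path itself goes through the variable-time `secp256k1_ecmult`), which: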
-
-1. **Uses GLV Endomorphism**:
-   - Splits the scalar into two smaller components using the curve's endomorphism
-   - Computes two smaller multiplications instead of one large one
-   - Reduces the effective bit length from 256 to ~128 bits per component
-
-2. **Windowed Precomputation**:
-   - Precomputes a table of multiples of the base point
-   - Uses windowed lookups instead of processing bits one at a time
-   - Processes multiple bits per iteration (typically 4-6 bits at a time)
-
-3. **Signed-Digit Multi-Comb Algorithm**:
-   - Uses a more efficient representation that reduces the number of additions
-   - Minimizes the number of point operations required
-
-4. **Assembly Optimizations**:
-   - Field arithmetic operations are optimized in assembly
-   - Hand-tuned for specific CPU architectures
-
-### Reference Implementation
-
-The C reference shows the complexity:
-
-```124:268:src/ecmult_const_impl.h
-static void secp256k1_ecmult_const(secp256k1_gej *r, const secp256k1_ge *a, const secp256k1_scalar *q) {
-    /* The approach below combines the signed-digit logic from Mike Hamburg's
-     * "Fast and compact elliptic-curve cryptography" (https://eprint.iacr.org/2012/309)
-     * Section 3.3, with the GLV endomorphism.
-     * ... */
-    
-    /* Precompute table for base point and lambda * base point */
-    
-    /* Process bits in groups using windowed lookups */
-    for (group = ECMULT_CONST_GROUPS - 1; group >= 0; --group) {
-        /* Lookup precomputed points */
-        ECMULT_CONST_TABLE_GET_GE(&t, pre_a, bits1);
-        /* ... */
-    }
-}
-```
-
-## Performance Impact
-
-### Benchmark Results
-
-| Operation | P256K1 | NextP256K | NextP256K speedup |
-|-----------|--------|-----------|-------------------|
-| **Verification** | 186,054 ns/op | 40,017 ns/op | **4.6x** |
-| Signing | 31,937 ns/op | 52,060 ns/op | 0.6x (slower) |
-| Pubkey Derivation | 58,618 ns/op | 280,835 ns/op | 0.2x (slower) |
-
-**Note:** NextP256K is slower for signing and pubkey derivation due to CGO overhead for smaller operations, but much faster for verification because the computation is more complex.
-
-## Optimization Opportunities
-
-To improve p256k1's verification performance, `EcmultConst` should be optimized to:
-
-1. **Implement GLV Endomorphism**:
-   - Split scalar using secp256k1's endomorphism
-   - Compute two smaller multiplications
-   - Combine results
-
-2. **Add Windowed Precomputation**:
-   - Precompute a table of multiples of the base point
-   - Process bits in groups (windows) instead of individually
-   - Use lookup tables instead of repeated additions
-
-3. **Consider Variable-Time Optimization**:
-   - For verification (public operation), variable-time algorithms are acceptable
-   - Could use `Ecmult` instead of `EcmultConst` if constant-time isn't required
-
-4. **Implement Signed-Digit Representation**:
-   - Use signed-digit multi-comb algorithm
-   - Reduce the number of additions required
-
-## Complexity Comparison
-
-### Current (Simple Binary Method)
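-- **Operations:** 255 doublings + up to 256 additions (worst case)
-- **Complexity:** up to ~510 point operations, ~383 on average for a random scalar
-
-### Optimized (Windowed + GLV)
-- **Operations:** ~128 doublings + roughly 64 table additions (window size 4; GLV halves the scalar length, while windowing reduces additions rather than doublings)
-- **Complexity:** ~190 point operations (roughly 2-3x fewer than the binary method)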
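
The effect of windowing on the operation count can be checked with a toy model. The sketch below is illustrative only: integers modulo `m` stand in for curve points, `mulWindow` is a hypothetical name, and it uses a plain fixed window without GLV or signed digits. It suggests that a window of `w` bits leaves the number of doublings essentially unchanged and cuts additions to at most one per window.

```go
package main

import "fmt"

// mulWindow computes k*p mod m with a fixed window of w bits, MSB first.
// It returns the result plus the doublings and additions performed in the
// main loop (building the lookup table is not counted).
func mulWindow(k, p, m uint64, bits, w int) (r uint64, doublings, additions int) {
	// table[d] = d*p for every w-bit digit d.
	table := make([]uint64, 1<<uint(w))
	for d := 1; d < len(table); d++ {
		table[d] = (table[d-1] + p) % m
	}
	windows := (bits + w - 1) / w
	for i := windows - 1; i >= 0; i-- {
		if i != windows-1 {
			for j := 0; j < w; j++ {
				r = (r + r) % m // w doublings shift the accumulator by one window
				doublings++
			}
		}
		digit := (k >> uint(i*w)) & uint64(len(table)-1)
		if digit != 0 {
			r = (r + table[digit]) % m // one table addition per non-zero window
			additions++
		}
	}
	return r, doublings, additions
}

func main() {
	r, d, a := mulWindow(0xFFFF, 7, 1000003, 16, 4)
	fmt.Println(r, d, a) // prints 458745 12 4
}
```

For the same 16-bit all-ones scalar, plain double-and-add needs 15 doublings and 16 additions, so the saving here comes almost entirely from additions. Halving the number of doublings requires shortening the scalar, which is what the GLV split provides.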
-
-### With Assembly Optimizations
-- **Additional:** 2-3x speedup from optimized field arithmetic
-- **Total:** ~10-15x faster than simple binary method
-
-## Conclusion
-
-The roughly 4.6x performance difference is primarily due to:
-1. **Algorithmic efficiency**: Windowed multiplication vs. simple binary method
-2. **GLV endomorphism**: Splitting scalar into smaller components
-3. **Assembly optimizations**: Hand-tuned field arithmetic in C
-4. **Better memory access patterns**: Precomputed tables vs. repeated computations
-
-The optimization is non-trivial and would require implementing:
-- GLV endomorphism support
-- Windowed precomputation tables
-- Signed-digit multi-comb algorithm
-- Potentially assembly optimizations for field arithmetic
-
-For now, NextP256K's advantage in verification is expected given its use of the mature, highly optimized libsecp256k1 C library.
-
--- a/VERIFY_OPTIMIZATION_ANALYSIS.md
+++ b/VERIFY_OPTIMIZATION_ANALYSIS.md
@@ -1,107 +0,0 @@
-# Verify Function Performance Analysis: C vs Go
-
-## Key Finding: The C Version Uses Strauss-WNAF Algorithm
-
-The C implementation of `secp256k1_schnorrsig_verify` uses a **highly optimized Strauss-WNAF algorithm** that computes `r = s*G + (-e)*P` in a **single interleaved operation** rather than two separate multiplications.
-
-## Current Go Implementation (verify.go:692-722)
-
-```go
-func secp256k1_ecmult(r *secp256k1_gej, a *secp256k1_gej, na *secp256k1_scalar, ng *secp256k1_scalar) {
-    // r = na * a + ng * G
-    // First compute na * a
-    var naa GroupElementJacobian
-    Ecmult(&naa, &geja, &sna)  // ~43 iterations (6-bit windows)
-    
-    // Then compute ng * G
-    var ngg GroupElementJacobian
-    EcmultGen(&ngg, &sng)  // ~32 iterations (byte-based)
-    
-    // Add them together
-    gejr.addVar(&naa, &ngg)
-}
-```
-
-**Performance**: ~75 iterations total (43 + 32), plus one addition
-
-## C Implementation (src/ecmult_impl.h:321-342)
-
-```c
-for (i = bits - 1; i >= 0; i--) {
-    secp256k1_gej_double_var(r, r, NULL);  // ONE doubling per iteration
-    // Check na*a contribution
-    if (i < bits_na_1 && (n = wnaf_na_1[i])) {
-        secp256k1_ecmult_table_get_ge(&tmpa, pre_a, n, WINDOW_A);
-        secp256k1_gej_add_ge_var(r, r, &tmpa, NULL);
-    }
-    // Check ng*G contribution  
-    if (i < bits_ng_1 && (n = wnaf_ng_1[i])) {
-        secp256k1_ecmult_table_get_ge_storage(&tmpa, secp256k1_pre_g, n, WINDOW_G);
-        secp256k1_gej_add_zinv_var(r, r, &tmpa, &Z);
-    }
-}
-```
-
-**Performance**: ~129 iterations total (max bits needed), with interleaved additions
-
-## Why C is Faster
-
-### 1. **Interleaved Operations**
-- **C**: Processes both scalars bit-by-bit in ONE loop
-  - Each iteration: double once, then potentially add from either table
-  - Total: ~129 iterations (the maximum bits needed)
-
-- **Go**: Computes two separate multiplications
-  - `na*a`: ~43 iterations (6-bit windows)
-  - `ng*G`: ~32 iterations (byte-based)
-  - Total: ~75 iterations PLUS one final addition
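
The interleaving idea can be illustrated with a toy model in Go. This is a sketch under assumptions, not the Strauss-wNAF code from libsecp256k1: integers modulo `m` stand in for curve points, plain binary digits replace wNAF digits, and `mulInterleaved` is a hypothetical name. It shows how two scalar multiplications share a single chain of doublings.

```go
package main

import "fmt"

// mulInterleaved computes (a*p + b*g) mod m in one MSB-first loop,
// so both scalars share a single doubling per bit position.
func mulInterleaved(a, p, b, g, m uint64, bits int) (r uint64, doublings, additions int) {
	for i := bits - 1; i >= 0; i-- {
		if i != bits-1 {
			r = (r + r) % m // one shared doubling
			doublings++
		}
		if (a>>uint(i))&1 == 1 {
			r = (r + p) % m // contribution of a*p
			additions++
		}
		if (b>>uint(i))&1 == 1 {
			r = (r + g) % m // contribution of b*g
			additions++
		}
	}
	return r, doublings, additions
}

func main() {
	r, d, n := mulInterleaved(0xF0F0, 7, 0x0F0F, 11, 1000003, 16)
	fmt.Println(r, d, n) // prints 474165 15 16
}
```

Computing the two products separately with the same binary method would take 30 doublings instead of 15, plus one extra addition to combine the results.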
-
-### 2. **GLV Endomorphism Optimization**
-The C version uses scalar splitting with lambda endomorphism:
-- Splits `na` into `na_1` and `na_lam` (~128 bits each)
-- Uses precomputed lambda table for faster operations
-- Reduces effective scalar size from 256 bits to ~128 bits
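
The endomorphism behind this split maps a point (x, y) to (β·x, y), where β is a non-trivial cube root of unity modulo the field prime. The Go sketch below checks only that this map keeps a point on the curve y² = x³ + 7. It is an illustration with hypothetical helper names, it derives a β at run time instead of using libsecp256k1's hard-coded constant, and it does not perform the scalar split or verify the relation to λ.

```go
package main

import (
	"fmt"
	"math/big"
)

// fieldPrime returns the secp256k1 field prime p = 2^256 - 2^32 - 977.
func fieldPrime() *big.Int {
	p := new(big.Int).Lsh(big.NewInt(1), 256)
	p.Sub(p, new(big.Int).Lsh(big.NewInt(1), 32))
	return p.Sub(p, big.NewInt(977))
}

// cubeRootOfUnity finds some beta != 1 with beta^3 = 1 (mod p),
// using beta = g^((p-1)/3) for the first small g that is not a cube.
func cubeRootOfUnity(p *big.Int) *big.Int {
	exp := new(big.Int).Sub(p, big.NewInt(1))
	exp.Div(exp, big.NewInt(3))
	for g := int64(2); ; g++ {
		beta := new(big.Int).Exp(big.NewInt(g), exp, p)
		if beta.Cmp(big.NewInt(1)) != 0 {
			return beta
		}
	}
}

// onCurve reports whether y^2 = x^3 + 7 (mod p).
func onCurve(x, y, p *big.Int) bool {
	lhs := new(big.Int).Mul(y, y)
	lhs.Mod(lhs, p)
	rhs := new(big.Int).Exp(x, big.NewInt(3), p)
	rhs.Add(rhs, big.NewInt(7))
	rhs.Mod(rhs, p)
	return lhs.Cmp(rhs) == 0
}

// somePoint searches small x values until x^3 + 7 has a square root.
func somePoint(p *big.Int) (x, y *big.Int) {
	sqrtExp := new(big.Int).Add(p, big.NewInt(1))
	sqrtExp.Rsh(sqrtExp, 2) // (p+1)/4 yields a root because p = 3 (mod 4)
	for i := int64(1); ; i++ {
		x = big.NewInt(i)
		rhs := new(big.Int).Exp(x, big.NewInt(3), p)
		rhs.Add(rhs, big.NewInt(7))
		rhs.Mod(rhs, p)
		y = new(big.Int).Exp(rhs, sqrtExp, p)
		if onCurve(x, y, p) {
			return x, y
		}
	}
}

// endomorphismHolds checks that (beta*x, y) is a different point on the curve.
func endomorphismHolds() bool {
	p := fieldPrime()
	beta := cubeRootOfUnity(p)
	x, y := somePoint(p)
	bx := new(big.Int).Mul(beta, x)
	bx.Mod(bx, p)
	return onCurve(x, y, p) && onCurve(bx, y, p) && bx.Cmp(x) != 0
}

func main() {
	fmt.Println(endomorphismHolds()) // prints true
}
```

The check works because (β·x)³ = β³·x³ = x³, so the right-hand side of the curve equation is unchanged. Applying the map costs one field multiplication, which is why multiplying a point by λ is nearly free once the scalar has been split.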
-
-### 3. **WNAF (Windowed Non-Adjacent Form)**
-- Sparse representation: non-zero entries separated by at least (w-1) zeroes
-- Reduces number of additions needed
-- Uses signed digits: can subtract instead of just add
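
A width-w NAF conversion can be sketched as below. This is a minimal illustration on `uint64` values (assumed to be below 2^62), not a port of `secp256k1_ecmult_wnaf`; the helper names `wnaf`, `value`, and `nonZero` are hypothetical.

```go
package main

import "fmt"

// wnaf returns the width-w non-adjacent form of k, least significant digit
// first. Every non-zero digit is odd, lies strictly between -2^(w-1) and
// 2^(w-1), and is followed by at least w-1 zero digits.
// k is assumed to be below 2^62 so the signed arithmetic cannot overflow.
func wnaf(k uint64, w uint) []int {
	var digits []int
	mod := uint64(1) << w
	for k != 0 {
		d := 0
		if k&1 == 1 {
			d = int(k & (mod - 1)) // low w bits
			if d >= int(mod/2) {
				d -= int(mod) // choose the signed representative
			}
			k = uint64(int64(k) - int64(d)) // clears the low w bits of k
		}
		digits = append(digits, d)
		k >>= 1
	}
	return digits
}

// value reconstructs the integer represented by a digit slice.
func value(digits []int) int64 {
	var v int64
	for i := len(digits) - 1; i >= 0; i-- {
		v = 2*v + int64(digits[i])
	}
	return v
}

// nonZero counts the digits that would trigger a point addition.
func nonZero(digits []int) int {
	n := 0
	for _, d := range digits {
		if d != 0 {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(wnaf(7, 2)) // prints [-1 0 0 1], i.e. 7 = 8 - 1
	d := wnaf(0xFFFF, 4)
	fmt.Println(len(d), nonZero(d), value(d)) // prints 17 2 65535
}
```

The scalar 0xFFFF has sixteen set bits, so the binary method would perform sixteen additions; its width-4 NAF has only two non-zero digits, at the cost of one extra digit position.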
-
-### 4. **Precomputed Tables**
-- C uses optimized precomputed tables for both `a` and `G`
-- Uses isomorphic curve representation for faster affine additions
-- Stores points in optimized storage format
-
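-### 5. **Doublings and Additions**
-- **C**: ~129 doublings (one per bit position of the GLV-split scalars)
-- **Go**: ~43 window iterations for `na*a` + ~32 byte lookups for `ng*G` = ~75 iterations. These are iteration counts, not doublings: a 6-bit window step normally performs six doublings, so `na*a` alone needs on the order of 256 doublings
-- C also does fewer additions due to WNAF sparsity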
-
-## Performance Impact
-
-The C version is ~3-4x faster because:
-1. **Single loop**: Processes everything in one pass (~129 shared doublings, instead of two separate multiplications followed by a final addition)
-2. **Sparse operations**: WNAF reduces additions (maybe 20-30 additions vs 32+)
-3. **Optimized tables**: Precomputed tables with isomorphic curve optimization
-4. **Better cache locality**: Everything in one loop, better CPU cache usage
-
-## Recommendation
-
-To match C performance, implement the Strauss-WNAF algorithm in Go:
-1. Implement WNAF conversion for scalars
-2. Implement GLV endomorphism scalar splitting
-3. Implement interleaved multiplication loop
-4. Use precomputed tables with isomorphic curve optimization
-5. This will require implementing several missing functions:
-   - `secp256k1_scalar_split_lambda`
-   - `secp256k1_scalar_split_128`
-   - `secp256k1_ecmult_wnaf`
-   - `secp256k1_ecmult_odd_multiples_table`
-   - `secp256k1_ge_table_set_globalz`
-   - `secp256k1_ecmult_table_get_ge`
-   - `secp256k1_ecmult_table_get_ge_lambda`
-   - `secp256k1_ecmult_table_get_ge_storage`
-   - And the GLV lambda constant/endomorphism functions
-
-This is a significant optimization that would bring Go performance much closer to C.
-
--- a/mem.prof
+++ b/mem.prof
--- a/p256k1.mleku.dev.test
+++ b/p256k1.mleku.dev.test