# secp256k1 Go Implementation This package provides a pure Go implementation of the secp256k1 elliptic curve cryptographic primitives, ported from the libsecp256k1 C library. ## Features Implemented ### ✅ Core Components - **Field Arithmetic** (`field.go`, `field_mul.go`): Complete implementation of field operations modulo the secp256k1 field prime (2^256 - 2^32 - 977) - 5x52-bit limb representation for efficient arithmetic - Addition, multiplication, squaring, inversion operations - Constant-time normalization and magnitude management - **Scalar Arithmetic** (`scalar.go`): Complete implementation of scalar operations modulo the group order - 4x64-bit limb representation - Addition, multiplication, inversion, negation operations - Proper overflow handling and reduction - **Group Operations** (`group.go`): Elliptic curve point operations - Affine and Jacobian coordinate representations - Point addition, doubling, negation - Coordinate conversion between representations - **Context Management** (`context.go`): Context objects for enhanced security - Context creation, cloning, destruction - Randomization for side-channel protection - Callback management for error handling - **Main API** (`secp256k1.go`): Core secp256k1 API functions - Public key parsing, serialization, and comparison - ECDSA signature parsing and serialization - Key generation and verification - Basic ECDSA signing and verification (simplified implementation) - **Utilities** (`util.go`): Helper functions and constants - Memory management utilities - Endianness conversion functions - Bit manipulation utilities - Error handling and callbacks ### ✅ Testing - Comprehensive test suite (`secp256k1_test.go`) covering: - Basic functionality and self-tests - Field element operations - Scalar operations - Key generation - Signature operations - Public key operations - Performance benchmarks ## Usage ```go package main import ( "fmt" "crypto/rand" p256k1 "p256k1.mleku.dev/pkg" ) func main() { // Create context ctx, err := p256k1.ContextCreate(p256k1.ContextNone) if err != nil { panic(err) } defer p256k1.ContextDestroy(ctx) // Generate secret key var seckey [32]byte rand.Read(seckey[:]) // Verify secret key if !p256k1.ECSecKeyVerify(ctx, seckey[:]) { panic("Invalid secret key") } // Create public key var pubkey p256k1.PublicKey if !p256k1.ECPubkeyCreate(ctx, &pubkey, seckey[:]) { panic("Failed to create public key") } fmt.Println("Successfully created secp256k1 key pair!") } ``` ## Architecture The implementation follows the same architectural patterns as libsecp256k1: 1. **Layered Design**: Low-level field/scalar arithmetic → Group operations → High-level API 2. **Constant-Time Operations**: Designed to prevent timing side-channel attacks 3. **Magnitude Tracking**: Field elements track their "magnitude" to optimize operations 4. **Context Objects**: Encapsulate state and provide enhanced security features ## Performance Benchmark results on AMD Ryzen 5 PRO 4650G: - Field Addition: ~2.4 ns/op - Scalar Multiplication: ~9.9 ns/op ## AVX2 Acceleration Opportunities The Scalar and FieldElement types and their operations are designed with data layouts that are amenable to AVX2 SIMD acceleration: ### Scalar Type (`scalar.go`) - **Representation**: 4×64-bit limbs (`[4]uint64`) representing 256-bit scalars - **AVX2-Acceleratable Operations**: - `scalarAdd` / `scalarMul`: 256-bit integer arithmetic using `VPADDD/Q`, `VPMULUDQ` - `mul512`: Full 512-bit product computation - can use AVX2's 256-bit registers to process limb pairs in parallel - `reduce512`: Modular reduction with Montgomery-style operations - `wNAF`: Window Non-Adjacent Form conversion for scalar multiplication - `splitLambda`: GLV endomorphism scalar splitting ### FieldElement Type (`field.go`, `field_mul.go`) - **Representation**: 5×52-bit limbs (`[5]uint64`) in base 2^52 for efficient multiplication - **AVX2-Acceleratable Operations**: - `mul` / `sqr`: Field multiplication/squaring using 128-bit intermediate products - `normalize` / `normalizeWeak`: Carry propagation across limbs - `add` / `negate`: Parallel limb operations ideal for `VPADDQ`, `VPSUBQ` - `inv`: Modular inversion via Fermat's little theorem (chain of sqr/mul) - `sqrt`: Square root computation using addition chains ### Affine/Jacobian Group Operations (`group.go`) - **Types**: `GroupElementAffine` (x, y coordinates), `GroupElementJacobian` (x, y, z coordinates) - **AVX2-Acceleratable Operations**: - `double`: Point doubling - multiple independent field operations - `addVar` / `addGE`: Point addition - parallelizable field multiplications - `setGEJ`: Coordinate conversion with batch field inversions ### Key AVX2 Instructions for Implementation | Operation | Relevant AVX2 Instructions | |-----------|---------------------------| | 128-bit limb add | `VPADDQ` (packed 64-bit add) with carry chain | | Limb multiplication | `VPMULUDQ` (unsigned 32×32→64), `VPCLMULQDQ` (carryless multiply) | | 128-bit arithmetic | `VPMULLD`, `VPMULUDQ` for multi-precision products | | Carry propagation | `VPSRLQ`/`VPSLLQ` (shift), `VPAND` (mask), `VPALIGNR` | | Conditional moves | `VPBLENDVB` (blend based on mask) | | Data movement | `VMOVDQU` (unaligned load/store), `VBROADCASTI128` | ### 128-bit Limb Representation with AVX2 AVX2's 256-bit YMM registers can natively hold two 128-bit limbs, enabling more efficient representations: **Scalar (256-bit) with 2×128-bit limbs:** ``` YMM0 = [scalar.d[1]:scalar.d[0]] | [scalar.d[3]:scalar.d[2]] ├── 128-bit limb 0 ───────┤ ├── 128-bit limb 1 ───────┤ ``` - A single 256-bit scalar fits in one YMM register as two 128-bit limbs - Addition/subtraction can use `VPADDQ` with manual carry handling between 64-bit halves - The 4×64-bit representation naturally maps to 2×128-bit by treating pairs **FieldElement (260-bit effective) with 128-bit limbs:** ``` YMM0 = [fe.n[0]:fe.n[1]] (lower 104 bits used per pair) YMM1 = [fe.n[2]:fe.n[3]] XMM2 = [fe.n[4]:0] (upper 48 bits) ``` - 5×52-bit limbs can be reorganized into 3×128-bit containers - Multiplication benefits from `VPMULUDQ` processing two 64×64→128 products simultaneously **512-bit Intermediate Products:** - Scalar multiplication produces 512-bit intermediates - Two YMM registers hold the full product: `YMM0 = [l[1]:l[0]], YMM1 = [l[3]:l[2]], YMM2 = [l[5]:l[4]], YMM3 = [l[7]:l[6]]` - Reduction can proceed in parallel across register pairs ### Implementation Approach AVX2 acceleration can be added via Go assembly (`.s` files) using the patterns described in `AVX.md`: ```go //go:build amd64 package p256k1 // FieldMulAVX2 multiplies two field elements using AVX2 // Uses 128-bit limb operations for ~2x throughput //go:noescape func FieldMulAVX2(r, a, b *FieldElement) // ScalarMulAVX2 multiplies two scalars using AVX2 // Processes scalar as 2×128-bit limbs in a single YMM register //go:noescape func ScalarMulAVX2(r, a, b *Scalar) // ScalarAdd256AVX2 adds two 256-bit scalars using 128-bit limb arithmetic //go:noescape func ScalarAdd256AVX2(r, a, b *Scalar) bool ``` The key insight is that AVX2's 256-bit registers holding 128-bit limb pairs enable: - **2x parallelism** for addition/subtraction across limb pairs - **Efficient carry chains** using `VPSRLQ` to extract carries and `VPADDQ` to propagate - **Reduced loop iterations** for multi-precision arithmetic (2 iterations for 256-bit instead of 4) ## Implementation Status ### ✅ Completed - Core field and scalar arithmetic - Basic group operations - Context management - Main API structure - Key generation and verification - Basic signature operations - Comprehensive test suite ### 🚧 Simplified/Placeholder - **ECDSA Implementation**: Basic structure in place, but signing/verification uses simplified algorithms - **Field Multiplication**: Uses simplified approach instead of optimized assembly - **Point Validation**: Curve equation checking is simplified - **Nonce Generation**: Uses crypto/rand instead of RFC 6979 ### ❌ Not Yet Implemented - **Hash Functions**: SHA-256 and tagged hash implementations - **Optimized Multiplication**: Full constant-time field multiplication - **Precomputed Tables**: Optimized scalar multiplication with precomputed points - **Optional Modules**: Schnorr signatures, ECDH, extra keys - **Recovery**: Public key recovery from signatures - **Complete ECDSA**: Full constant-time ECDSA implementation ## Security Considerations ⚠️ **This implementation is for educational/development purposes and should not be used in production without further security review and completion of the cryptographic implementations.** Key security features implemented: - Constant-time field operations (basic level) - Magnitude tracking to prevent overflows - Memory clearing for sensitive data - Context randomization support Key security features still needed: - Complete constant-time ECDSA implementation - Proper nonce generation (RFC 6979) - Side-channel resistance verification - Comprehensive security testing ## Building and Testing ```bash cd pkg/ go test -v # Run all tests go test -bench=. # Run benchmarks go build # Build the package ``` ## License This implementation is derived from libsecp256k1 and maintains the same MIT license.