Field Arithmetic (field.go, field_mul.go): Complete implementation of field operations modulo the secp256k1 field prime (2^256 - 2^32 - 977)
- 5x52-bit limb representation for efficient arithmetic
- Addition, multiplication, squaring, inversion operations
- Constant-time normalization and magnitude management
Scalar Arithmetic (scalar.go): Complete implementation of scalar operations modulo the group order
- 4x64-bit limb representation
- Addition, multiplication, inversion, negation operations
- Proper overflow handling and reduction
Group Operations (group.go): Elliptic curve point operations
- Affine and Jacobian coordinate representations
- Point addition, doubling, negation
- Coordinate conversion between representations
Context Management (context.go): Context objects for enhanced security
- Context creation, cloning, destruction
- Randomization for side-channel protection
- Callback management for error handling
Main API (secp256k1.go): Core secp256k1 API functions
- Public key parsing, serialization, and comparison
- ECDSA signature parsing and serialization
- Key generation and verification
- Basic ECDSA signing and verification (simplified implementation)
Utilities (util.go): Helper functions and constants
- Memory management utilities
- Endianness conversion functions
- Bit manipulation utilities
- Error handling and callbacks

✅ Testing

Comprehensive test suite (secp256k1_test.go) covering:
- Basic functionality and self-tests
- Field element operations
- Scalar operations
- Key generation
- Signature operations
- Public key operations
- Performance benchmarks

Usage

package main

import (
    "fmt"
    "crypto/rand"
    p256k1 "p256k1.mleku.dev/pkg"
)

func main() {
    // Create context
    ctx, err := p256k1.ContextCreate(p256k1.ContextNone)
    if err != nil {
        panic(err)
    }
    defer p256k1.ContextDestroy(ctx)
    
    // Generate secret key
    var seckey [32]byte
    rand.Read(seckey[:])
    
    // Verify secret key
    if !p256k1.ECSecKeyVerify(ctx, seckey[:]) {
        panic("Invalid secret key")
    }
    
    // Create public key
    var pubkey p256k1.PublicKey
    if !p256k1.ECPubkeyCreate(ctx, &pubkey, seckey[:]) {
        panic("Failed to create public key")
    }
    
    fmt.Println("Successfully created secp256k1 key pair!")
}

Architecture

The implementation follows the same architectural patterns as libsecp256k1:

Layered Design: Low-level field/scalar arithmetic → Group operations → High-level API
Constant-Time Operations: Designed to prevent timing side-channel attacks
Magnitude Tracking: Field elements track their "magnitude" to optimize operations
Context Objects: Encapsulate state and provide enhanced security features

Performance

Benchmark results on AMD Ryzen 5 PRO 4650G:

Field Addition: ~2.4 ns/op
Scalar Multiplication: ~9.9 ns/op

AVX2 Acceleration Opportunities

The Scalar and FieldElement types and their operations are designed with data layouts that are amenable to AVX2 SIMD acceleration:

Scalar Type (`scalar.go`)

Representation: 4×64-bit limbs ([4]uint64) representing 256-bit scalars
AVX2-Acceleratable Operations:
- scalarAdd / scalarMul: 256-bit integer arithmetic using VPADDD/Q, VPMULUDQ
- mul512: Full 512-bit product computation - can use AVX2's 256-bit registers to process limb pairs in parallel
- reduce512: Modular reduction with Montgomery-style operations
- wNAF: Window Non-Adjacent Form conversion for scalar multiplication
- splitLambda: GLV endomorphism scalar splitting

FieldElement Type (`field.go`, `field_mul.go`)

Representation: 5×52-bit limbs ([5]uint64) in base 2^52 for efficient multiplication
AVX2-Acceleratable Operations:
- mul / sqr: Field multiplication/squaring using 128-bit intermediate products
- normalize / normalizeWeak: Carry propagation across limbs
- add / negate: Parallel limb operations ideal for VPADDQ, VPSUBQ
- inv: Modular inversion via Fermat's little theorem (chain of sqr/mul)
- sqrt: Square root computation using addition chains

Affine/Jacobian Group Operations (`group.go`)

Types: GroupElementAffine (x, y coordinates), GroupElementJacobian (x, y, z coordinates)
AVX2-Acceleratable Operations:
- double: Point doubling - multiple independent field operations
- addVar / addGE: Point addition - parallelizable field multiplications
- setGEJ: Coordinate conversion with batch field inversions

Key AVX2 Instructions for Implementation

Operation	Relevant AVX2 Instructions
128-bit limb add	`VPADDQ` (packed 64-bit add) with carry chain
Limb multiplication	`VPMULUDQ` (unsigned 32×32→64), `VPCLMULQDQ` (carryless multiply)
128-bit arithmetic	`VPMULLD`, `VPMULUDQ` for multi-precision products
Carry propagation	`VPSRLQ`/`VPSLLQ` (shift), `VPAND` (mask), `VPALIGNR`
Conditional moves	`VPBLENDVB` (blend based on mask)
Data movement	`VMOVDQU` (unaligned load/store), `VBROADCASTI128`

128-bit Limb Representation with AVX2

AVX2's 256-bit YMM registers can natively hold two 128-bit limbs, enabling more efficient representations:

Scalar (256-bit) with 2×128-bit limbs:

YMM0 = [scalar.d[1]:scalar.d[0]] | [scalar.d[3]:scalar.d[2]]
       ├── 128-bit limb 0 ───────┤ ├── 128-bit limb 1 ───────┤

A single 256-bit scalar fits in one YMM register as two 128-bit limbs
Addition/subtraction can use VPADDQ with manual carry handling between 64-bit halves
The 4×64-bit representation naturally maps to 2×128-bit by treating pairs

FieldElement (260-bit effective) with 128-bit limbs:

YMM0 = [fe.n[0]:fe.n[1]] (lower 104 bits used per pair)
YMM1 = [fe.n[2]:fe.n[3]]
XMM2 = [fe.n[4]:0]       (upper 48 bits)

5×52-bit limbs can be reorganized into 3×128-bit containers
Multiplication benefits from VPMULUDQ processing two 64×64→128 products simultaneously

512-bit Intermediate Products:

Scalar multiplication produces 512-bit intermediates
Two YMM registers hold the full product: YMM0 = [l[1]:l[0]], YMM1 = [l[3]:l[2]], YMM2 = [l[5]:l[4]], YMM3 = [l[7]:l[6]]
Reduction can proceed in parallel across register pairs

Implementation Approach

AVX2 acceleration can be added via Go assembly (.s files) using the patterns described in AVX.md:

//go:build amd64

package p256k1

// FieldMulAVX2 multiplies two field elements using AVX2
// Uses 128-bit limb operations for ~2x throughput
//go:noescape
func FieldMulAVX2(r, a, b *FieldElement)

// ScalarMulAVX2 multiplies two scalars using AVX2
// Processes scalar as 2×128-bit limbs in a single YMM register
//go:noescape
func ScalarMulAVX2(r, a, b *Scalar)

// ScalarAdd256AVX2 adds two 256-bit scalars using 128-bit limb arithmetic
//go:noescape
func ScalarAdd256AVX2(r, a, b *Scalar) bool

The key insight is that AVX2's 256-bit registers holding 128-bit limb pairs enable:

2x parallelism for addition/subtraction across limb pairs
Efficient carry chains using VPSRLQ to extract carries and VPADDQ to propagate
Reduced loop iterations for multi-precision arithmetic (2 iterations for 256-bit instead of 4)

Implementation Status

✅ Completed

Core field and scalar arithmetic
Basic group operations
Context management
Main API structure
Key generation and verification
Basic signature operations
Comprehensive test suite

🚧 Simplified/Placeholder

ECDSA Implementation: Basic structure in place, but signing/verification uses simplified algorithms
Field Multiplication: Uses simplified approach instead of optimized assembly
Point Validation: Curve equation checking is simplified
Nonce Generation: Uses crypto/rand instead of RFC 6979

❌ Not Yet Implemented

Hash Functions: SHA-256 and tagged hash implementations
Optimized Multiplication: Full constant-time field multiplication
Precomputed Tables: Optimized scalar multiplication with precomputed points
Optional Modules: Schnorr signatures, ECDH, extra keys
Recovery: Public key recovery from signatures
Complete ECDSA: Full constant-time ECDSA implementation

Security Considerations

⚠️ This implementation is for educational/development purposes and should not be used in production without further security review and completion of the cryptographic implementations.

Key security features implemented:

Constant-time field operations (basic level)
Magnitude tracking to prevent overflows
Memory clearing for sensitive data
Context randomization support

Key security features still needed:

Complete constant-time ECDSA implementation
Proper nonce generation (RFC 6979)
Side-channel resistance verification
Comprehensive security testing

Building and Testing

cd pkg/
go test -v          # Run all tests
go test -bench=.    # Run benchmarks
go build            # Build the package

License

This implementation is derived from libsecp256k1 and maintains the same MIT license.

README.md Unescape Escape

secp256k1 Go Implementation

Features Implemented

✅ Core Components