2025-11-28 16:35:08 +00:00
2025-11-01 17:49:38 +00:00
2025-11-28 16:35:08 +00:00
2025-11-01 17:49:38 +00:00
2025-11-02 02:45:59 +00:00
2025-11-02 02:45:59 +00:00
2025-11-01 17:49:38 +00:00
2025-11-02 02:45:59 +00:00
2025-11-28 16:35:08 +00:00
2025-11-02 02:45:59 +00:00

secp256k1 Go Implementation

This package provides a pure Go implementation of the secp256k1 elliptic curve cryptographic primitives, ported from the libsecp256k1 C library.

Features Implemented

Core Components

  • Field Arithmetic (field.go, field_mul.go): Complete implementation of field operations modulo the secp256k1 field prime (2^256 - 2^32 - 977)

    • 5x52-bit limb representation for efficient arithmetic
    • Addition, multiplication, squaring, inversion operations
    • Constant-time normalization and magnitude management
  • Scalar Arithmetic (scalar.go): Complete implementation of scalar operations modulo the group order

    • 4x64-bit limb representation
    • Addition, multiplication, inversion, negation operations
    • Proper overflow handling and reduction
  • Group Operations (group.go): Elliptic curve point operations

    • Affine and Jacobian coordinate representations
    • Point addition, doubling, negation
    • Coordinate conversion between representations
  • Context Management (context.go): Context objects for enhanced security

    • Context creation, cloning, destruction
    • Randomization for side-channel protection
    • Callback management for error handling
  • Main API (secp256k1.go): Core secp256k1 API functions

    • Public key parsing, serialization, and comparison
    • ECDSA signature parsing and serialization
    • Key generation and verification
    • Basic ECDSA signing and verification (simplified implementation)
  • Utilities (util.go): Helper functions and constants

    • Memory management utilities
    • Endianness conversion functions
    • Bit manipulation utilities
    • Error handling and callbacks

Testing

  • Comprehensive test suite (secp256k1_test.go) covering:
    • Basic functionality and self-tests
    • Field element operations
    • Scalar operations
    • Key generation
    • Signature operations
    • Public key operations
    • Performance benchmarks

Usage

package main

import (
    "fmt"
    "crypto/rand"
    p256k1 "p256k1.mleku.dev/pkg"
)

func main() {
    // Create context
    ctx, err := p256k1.ContextCreate(p256k1.ContextNone)
    if err != nil {
        panic(err)
    }
    defer p256k1.ContextDestroy(ctx)
    
    // Generate secret key
    var seckey [32]byte
    rand.Read(seckey[:])
    
    // Verify secret key
    if !p256k1.ECSecKeyVerify(ctx, seckey[:]) {
        panic("Invalid secret key")
    }
    
    // Create public key
    var pubkey p256k1.PublicKey
    if !p256k1.ECPubkeyCreate(ctx, &pubkey, seckey[:]) {
        panic("Failed to create public key")
    }
    
    fmt.Println("Successfully created secp256k1 key pair!")
}

Architecture

The implementation follows the same architectural patterns as libsecp256k1:

  1. Layered Design: Low-level field/scalar arithmetic → Group operations → High-level API
  2. Constant-Time Operations: Designed to prevent timing side-channel attacks
  3. Magnitude Tracking: Field elements track their "magnitude" to optimize operations
  4. Context Objects: Encapsulate state and provide enhanced security features

Performance

Benchmark results on AMD Ryzen 5 PRO 4650G:

  • Field Addition: ~2.4 ns/op
  • Scalar Multiplication: ~9.9 ns/op

AVX2 Acceleration Opportunities

The Scalar and FieldElement types and their operations are designed with data layouts that are amenable to AVX2 SIMD acceleration:

Scalar Type (scalar.go)

  • Representation: 4×64-bit limbs ([4]uint64) representing 256-bit scalars
  • AVX2-Acceleratable Operations:
    • scalarAdd / scalarMul: 256-bit integer arithmetic using VPADDD/Q, VPMULUDQ
    • mul512: Full 512-bit product computation - can use AVX2's 256-bit registers to process limb pairs in parallel
    • reduce512: Modular reduction with Montgomery-style operations
    • wNAF: Window Non-Adjacent Form conversion for scalar multiplication
    • splitLambda: GLV endomorphism scalar splitting

FieldElement Type (field.go, field_mul.go)

  • Representation: 5×52-bit limbs ([5]uint64) in base 2^52 for efficient multiplication
  • AVX2-Acceleratable Operations:
    • mul / sqr: Field multiplication/squaring using 128-bit intermediate products
    • normalize / normalizeWeak: Carry propagation across limbs
    • add / negate: Parallel limb operations ideal for VPADDQ, VPSUBQ
    • inv: Modular inversion via Fermat's little theorem (chain of sqr/mul)
    • sqrt: Square root computation using addition chains

Affine/Jacobian Group Operations (group.go)

  • Types: GroupElementAffine (x, y coordinates), GroupElementJacobian (x, y, z coordinates)
  • AVX2-Acceleratable Operations:
    • double: Point doubling - multiple independent field operations
    • addVar / addGE: Point addition - parallelizable field multiplications
    • setGEJ: Coordinate conversion with batch field inversions

Key AVX2 Instructions for Implementation

Operation Relevant AVX2 Instructions
128-bit limb add VPADDQ (packed 64-bit add) with carry chain
Limb multiplication VPMULUDQ (unsigned 32×32→64), VPCLMULQDQ (carryless multiply)
128-bit arithmetic VPMULLD, VPMULUDQ for multi-precision products
Carry propagation VPSRLQ/VPSLLQ (shift), VPAND (mask), VPALIGNR
Conditional moves VPBLENDVB (blend based on mask)
Data movement VMOVDQU (unaligned load/store), VBROADCASTI128

128-bit Limb Representation with AVX2

AVX2's 256-bit YMM registers can natively hold two 128-bit limbs, enabling more efficient representations:

Scalar (256-bit) with 2×128-bit limbs:

YMM0 = [scalar.d[1]:scalar.d[0]] | [scalar.d[3]:scalar.d[2]]
       ├── 128-bit limb 0 ───────┤ ├── 128-bit limb 1 ───────┤
  • A single 256-bit scalar fits in one YMM register as two 128-bit limbs
  • Addition/subtraction can use VPADDQ with manual carry handling between 64-bit halves
  • The 4×64-bit representation naturally maps to 2×128-bit by treating pairs

FieldElement (260-bit effective) with 128-bit limbs:

YMM0 = [fe.n[0]:fe.n[1]] (lower 104 bits used per pair)
YMM1 = [fe.n[2]:fe.n[3]]
XMM2 = [fe.n[4]:0]       (upper 48 bits)
  • 5×52-bit limbs can be reorganized into 3×128-bit containers
  • Multiplication benefits from VPMULUDQ processing two 64×64→128 products simultaneously

512-bit Intermediate Products:

  • Scalar multiplication produces 512-bit intermediates
  • Two YMM registers hold the full product: YMM0 = [l[1]:l[0]], YMM1 = [l[3]:l[2]], YMM2 = [l[5]:l[4]], YMM3 = [l[7]:l[6]]
  • Reduction can proceed in parallel across register pairs

Implementation Approach

AVX2 acceleration can be added via Go assembly (.s files) using the patterns described in AVX.md:

//go:build amd64

package p256k1

// FieldMulAVX2 multiplies two field elements using AVX2
// Uses 128-bit limb operations for ~2x throughput
//go:noescape
func FieldMulAVX2(r, a, b *FieldElement)

// ScalarMulAVX2 multiplies two scalars using AVX2
// Processes scalar as 2×128-bit limbs in a single YMM register
//go:noescape
func ScalarMulAVX2(r, a, b *Scalar)

// ScalarAdd256AVX2 adds two 256-bit scalars using 128-bit limb arithmetic
//go:noescape
func ScalarAdd256AVX2(r, a, b *Scalar) bool

The key insight is that AVX2's 256-bit registers holding 128-bit limb pairs enable:

  • 2x parallelism for addition/subtraction across limb pairs
  • Efficient carry chains using VPSRLQ to extract carries and VPADDQ to propagate
  • Reduced loop iterations for multi-precision arithmetic (2 iterations for 256-bit instead of 4)

Implementation Status

Completed

  • Core field and scalar arithmetic
  • Basic group operations
  • Context management
  • Main API structure
  • Key generation and verification
  • Basic signature operations
  • Comprehensive test suite

🚧 Simplified/Placeholder

  • ECDSA Implementation: Basic structure in place, but signing/verification uses simplified algorithms
  • Field Multiplication: Uses simplified approach instead of optimized assembly
  • Point Validation: Curve equation checking is simplified
  • Nonce Generation: Uses crypto/rand instead of RFC 6979

Not Yet Implemented

  • Hash Functions: SHA-256 and tagged hash implementations
  • Optimized Multiplication: Full constant-time field multiplication
  • Precomputed Tables: Optimized scalar multiplication with precomputed points
  • Optional Modules: Schnorr signatures, ECDH, extra keys
  • Recovery: Public key recovery from signatures
  • Complete ECDSA: Full constant-time ECDSA implementation

Security Considerations

⚠️ This implementation is for educational/development purposes and should not be used in production without further security review and completion of the cryptographic implementations.

Key security features implemented:

  • Constant-time field operations (basic level)
  • Magnitude tracking to prevent overflows
  • Memory clearing for sensitive data
  • Context randomization support

Key security features still needed:

  • Complete constant-time ECDSA implementation
  • Proper nonce generation (RFC 6979)
  • Side-channel resistance verification
  • Comprehensive security testing

Building and Testing

cd pkg/
go test -v          # Run all tests
go test -bench=.    # Run benchmarks
go build            # Build the package

License

This implementation is derived from libsecp256k1 and maintains the same MIT license.

Description
secp256k1 Go Implementation
Readme 12 MiB
Languages
C 86.3%
Go 12%
Assembly 1.6%