secp256k1 Go Implementation
This package provides a pure Go implementation of the secp256k1 elliptic curve cryptographic primitives, ported from the libsecp256k1 C library.
Features Implemented
✅ Core Components
-
Field Arithmetic (
field.go,field_mul.go): Complete implementation of field operations modulo the secp256k1 field prime (2^256 - 2^32 - 977)- 5x52-bit limb representation for efficient arithmetic
- Addition, multiplication, squaring, inversion operations
- Constant-time normalization and magnitude management
-
Scalar Arithmetic (
scalar.go): Complete implementation of scalar operations modulo the group order- 4x64-bit limb representation
- Addition, multiplication, inversion, negation operations
- Proper overflow handling and reduction
-
Group Operations (
group.go): Elliptic curve point operations- Affine and Jacobian coordinate representations
- Point addition, doubling, negation
- Coordinate conversion between representations
-
Context Management (
context.go): Context objects for enhanced security- Context creation, cloning, destruction
- Randomization for side-channel protection
- Callback management for error handling
-
Main API (
secp256k1.go): Core secp256k1 API functions- Public key parsing, serialization, and comparison
- ECDSA signature parsing and serialization
- Key generation and verification
- Basic ECDSA signing and verification (simplified implementation)
-
Utilities (
util.go): Helper functions and constants- Memory management utilities
- Endianness conversion functions
- Bit manipulation utilities
- Error handling and callbacks
✅ Testing
- Comprehensive test suite (
secp256k1_test.go) covering:- Basic functionality and self-tests
- Field element operations
- Scalar operations
- Key generation
- Signature operations
- Public key operations
- Performance benchmarks
Usage
package main
import (
"fmt"
"crypto/rand"
p256k1 "p256k1.mleku.dev/pkg"
)
func main() {
// Create context
ctx, err := p256k1.ContextCreate(p256k1.ContextNone)
if err != nil {
panic(err)
}
defer p256k1.ContextDestroy(ctx)
// Generate secret key
var seckey [32]byte
rand.Read(seckey[:])
// Verify secret key
if !p256k1.ECSecKeyVerify(ctx, seckey[:]) {
panic("Invalid secret key")
}
// Create public key
var pubkey p256k1.PublicKey
if !p256k1.ECPubkeyCreate(ctx, &pubkey, seckey[:]) {
panic("Failed to create public key")
}
fmt.Println("Successfully created secp256k1 key pair!")
}
Architecture
The implementation follows the same architectural patterns as libsecp256k1:
- Layered Design: Low-level field/scalar arithmetic → Group operations → High-level API
- Constant-Time Operations: Designed to prevent timing side-channel attacks
- Magnitude Tracking: Field elements track their "magnitude" to optimize operations
- Context Objects: Encapsulate state and provide enhanced security features
Performance
Benchmark results on AMD Ryzen 5 PRO 4650G:
- Field Addition: ~2.4 ns/op
- Scalar Multiplication: ~9.9 ns/op
AVX2 Acceleration Opportunities
The Scalar and FieldElement types and their operations are designed with data layouts that are amenable to AVX2 SIMD acceleration:
Scalar Type (scalar.go)
- Representation: 4×64-bit limbs (
[4]uint64) representing 256-bit scalars - AVX2-Acceleratable Operations:
scalarAdd/scalarMul: 256-bit integer arithmetic usingVPADDD/Q,VPMULUDQmul512: Full 512-bit product computation - can use AVX2's 256-bit registers to process limb pairs in parallelreduce512: Modular reduction with Montgomery-style operationswNAF: Window Non-Adjacent Form conversion for scalar multiplicationsplitLambda: GLV endomorphism scalar splitting
FieldElement Type (field.go, field_mul.go)
- Representation: 5×52-bit limbs (
[5]uint64) in base 2^52 for efficient multiplication - AVX2-Acceleratable Operations:
mul/sqr: Field multiplication/squaring using 128-bit intermediate productsnormalize/normalizeWeak: Carry propagation across limbsadd/negate: Parallel limb operations ideal forVPADDQ,VPSUBQinv: Modular inversion via Fermat's little theorem (chain of sqr/mul)sqrt: Square root computation using addition chains
Affine/Jacobian Group Operations (group.go)
- Types:
GroupElementAffine(x, y coordinates),GroupElementJacobian(x, y, z coordinates) - AVX2-Acceleratable Operations:
double: Point doubling - multiple independent field operationsaddVar/addGE: Point addition - parallelizable field multiplicationssetGEJ: Coordinate conversion with batch field inversions
Key AVX2 Instructions for Implementation
| Operation | Relevant AVX2 Instructions |
|---|---|
| 128-bit limb add | VPADDQ (packed 64-bit add) with carry chain |
| Limb multiplication | VPMULUDQ (unsigned 32×32→64), VPCLMULQDQ (carryless multiply) |
| 128-bit arithmetic | VPMULLD, VPMULUDQ for multi-precision products |
| Carry propagation | VPSRLQ/VPSLLQ (shift), VPAND (mask), VPALIGNR |
| Conditional moves | VPBLENDVB (blend based on mask) |
| Data movement | VMOVDQU (unaligned load/store), VBROADCASTI128 |
128-bit Limb Representation with AVX2
AVX2's 256-bit YMM registers can natively hold two 128-bit limbs, enabling more efficient representations:
Scalar (256-bit) with 2×128-bit limbs:
YMM0 = [scalar.d[1]:scalar.d[0]] | [scalar.d[3]:scalar.d[2]]
├── 128-bit limb 0 ───────┤ ├── 128-bit limb 1 ───────┤
- A single 256-bit scalar fits in one YMM register as two 128-bit limbs
- Addition/subtraction can use
VPADDQwith manual carry handling between 64-bit halves - The 4×64-bit representation naturally maps to 2×128-bit by treating pairs
FieldElement (260-bit effective) with 128-bit limbs:
YMM0 = [fe.n[0]:fe.n[1]] (lower 104 bits used per pair)
YMM1 = [fe.n[2]:fe.n[3]]
XMM2 = [fe.n[4]:0] (upper 48 bits)
- 5×52-bit limbs can be reorganized into 3×128-bit containers
- Multiplication benefits from
VPMULUDQprocessing two 64×64→128 products simultaneously
512-bit Intermediate Products:
- Scalar multiplication produces 512-bit intermediates
- Two YMM registers hold the full product:
YMM0 = [l[1]:l[0]], YMM1 = [l[3]:l[2]], YMM2 = [l[5]:l[4]], YMM3 = [l[7]:l[6]] - Reduction can proceed in parallel across register pairs
Implementation Approach
AVX2 acceleration can be added via Go assembly (.s files) using the patterns described in AVX.md:
//go:build amd64
package p256k1
// FieldMulAVX2 multiplies two field elements using AVX2
// Uses 128-bit limb operations for ~2x throughput
//go:noescape
func FieldMulAVX2(r, a, b *FieldElement)
// ScalarMulAVX2 multiplies two scalars using AVX2
// Processes scalar as 2×128-bit limbs in a single YMM register
//go:noescape
func ScalarMulAVX2(r, a, b *Scalar)
// ScalarAdd256AVX2 adds two 256-bit scalars using 128-bit limb arithmetic
//go:noescape
func ScalarAdd256AVX2(r, a, b *Scalar) bool
The key insight is that AVX2's 256-bit registers holding 128-bit limb pairs enable:
- 2x parallelism for addition/subtraction across limb pairs
- Efficient carry chains using
VPSRLQto extract carries andVPADDQto propagate - Reduced loop iterations for multi-precision arithmetic (2 iterations for 256-bit instead of 4)
Implementation Status
✅ Completed
- Core field and scalar arithmetic
- Basic group operations
- Context management
- Main API structure
- Key generation and verification
- Basic signature operations
- Comprehensive test suite
🚧 Simplified/Placeholder
- ECDSA Implementation: Basic structure in place, but signing/verification uses simplified algorithms
- Field Multiplication: Uses simplified approach instead of optimized assembly
- Point Validation: Curve equation checking is simplified
- Nonce Generation: Uses crypto/rand instead of RFC 6979
❌ Not Yet Implemented
- Hash Functions: SHA-256 and tagged hash implementations
- Optimized Multiplication: Full constant-time field multiplication
- Precomputed Tables: Optimized scalar multiplication with precomputed points
- Optional Modules: Schnorr signatures, ECDH, extra keys
- Recovery: Public key recovery from signatures
- Complete ECDSA: Full constant-time ECDSA implementation
Security Considerations
⚠️ This implementation is for educational/development purposes and should not be used in production without further security review and completion of the cryptographic implementations.
Key security features implemented:
- Constant-time field operations (basic level)
- Magnitude tracking to prevent overflows
- Memory clearing for sensitive data
- Context randomization support
Key security features still needed:
- Complete constant-time ECDSA implementation
- Proper nonce generation (RFC 6979)
- Side-channel resistance verification
- Comprehensive security testing
Building and Testing
cd pkg/
go test -v # Run all tests
go test -bench=. # Run benchmarks
go build # Build the package
License
This implementation is derived from libsecp256k1 and maintains the same MIT license.