# secp256k1 Go Implementation

This package provides a pure Go implementation of the secp256k1 elliptic curve cryptographic primitives, ported from the libsecp256k1 C library.

## Features Implemented

### ✅ Core Components

- **Field Arithmetic** (`field.go`, `field_mul.go`): Complete implementation of field operations modulo the secp256k1 field prime (2^256 - 2^32 - 977)
  - 5×52-bit limb representation for efficient arithmetic
  - Addition, multiplication, squaring, inversion operations
  - Constant-time normalization and magnitude management

- **Scalar Arithmetic** (`scalar.go`): Complete implementation of scalar operations modulo the group order
  - 4×64-bit limb representation
  - Addition, multiplication, inversion, negation operations
  - Proper overflow handling and reduction

- **Group Operations** (`group.go`): Elliptic curve point operations
  - Affine and Jacobian coordinate representations
  - Point addition, doubling, negation
  - Coordinate conversion between representations

- **Context Management** (`context.go`): Context objects for enhanced security
  - Context creation, cloning, destruction
  - Randomization for side-channel protection
  - Callback management for error handling

- **Main API** (`secp256k1.go`): Core secp256k1 API functions
  - Public key parsing, serialization, and comparison
  - ECDSA signature parsing and serialization
  - Key generation and verification
  - Basic ECDSA signing and verification (simplified implementation)

- **Utilities** (`util.go`): Helper functions and constants
  - Memory management utilities
  - Endianness conversion functions
  - Bit manipulation utilities
  - Error handling and callbacks
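
To make the 5×52-bit limb representation concrete, the following is a minimal sketch of how a 32-byte big-endian value could be split into five 52-bit limbs and packed back. The names `fe5x52`, `setB32` and `getB32` are hypothetical and chosen for illustration only; they are not taken from this package, whose actual conversion routines may be organized differently (and will be faster than this bit-by-bit loop).

```go
package main

import "fmt"

// fe5x52 is a hypothetical stand-in for a field element: five limbs,
// each holding 52 bits of the value (the top limb holds the last 48 bits).
type fe5x52 [5]uint64

// setB32 unpacks a 32-byte big-endian integer into 5x52-bit limbs.
// It does not reduce the value modulo the field prime.
func setB32(b *[32]byte) fe5x52 {
	var r fe5x52
	for bit := 0; bit < 256; bit++ {
		// Bit 0 is the least significant bit of the last byte.
		v := uint64(b[31-bit/8]>>(uint(bit)%8)) & 1
		r[bit/52] |= v << (uint(bit) % 52)
	}
	return r
}

// getB32 packs 5x52-bit limbs back into 32 big-endian bytes.
// It assumes every limb is fully normalized (no pending carries).
func getB32(a fe5x52) [32]byte {
	var out [32]byte
	for bit := 0; bit < 256; bit++ {
		v := byte(a[bit/52]>>(uint(bit)%52)) & 1
		out[31-bit/8] |= v << (uint(bit) % 8)
	}
	return out
}

func main() {
	var b [32]byte
	b[31] = 0x01 // the integer 1
	limbs := setB32(&b)
	fmt.Println(limbs, getB32(limbs) == b)
}
```

Leaving 12 spare bits in each 64-bit word is what allows several additions to be performed before carries have to be propagated, which is the purpose of the magnitude tracking mentioned above.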

### ✅ Testing

- Comprehensive test suite (`secp256k1_test.go`) covering:
  - Basic functionality and self-tests
  - Field element operations
  - Scalar operations
  - Key generation
  - Signature operations
  - Public key operations
  - Performance benchmarks

## Usage

```go
package main

import (
	"crypto/rand"
	"fmt"

	p256k1 "p256k1.mleku.dev/pkg"
)

func main() {
	// Create context
	ctx, err := p256k1.ContextCreate(p256k1.ContextNone)
	if err != nil {
		panic(err)
	}
	defer p256k1.ContextDestroy(ctx)

	// Generate secret key from the system CSPRNG
	var seckey [32]byte
	if _, err := rand.Read(seckey[:]); err != nil {
		panic(err)
	}

	// Verify secret key (must be non-zero and below the group order)
	if !p256k1.ECSecKeyVerify(ctx, seckey[:]) {
		panic("Invalid secret key")
	}

	// Create public key
	var pubkey p256k1.PublicKey
	if !p256k1.ECPubkeyCreate(ctx, &pubkey, seckey[:]) {
		panic("Failed to create public key")
	}

	fmt.Println("Successfully created secp256k1 key pair!")
}
```

## Architecture

The implementation follows the same architectural patterns as libsecp256k1:

1. **Layered Design**: Low-level field/scalar arithmetic → Group operations → High-level API
2. **Constant-Time Operations**: Designed to prevent timing side-channel attacks
3. **Magnitude Tracking**: Field elements track their "magnitude" to optimize operations
4. **Context Objects**: Encapsulate state and provide enhanced security features
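
As an illustration of what "constant-time" means in practice, here is a minimal sketch of a branch-free conditional move over limbs. The function name `cmov` and the use of a plain `[5]uint64` are assumptions made for this example; the package's own helper, if it has one, may look different.

```go
package main

import "fmt"

// cmov copies src into dst when flag is 1 and leaves dst unchanged when
// flag is 0, without branching on flag. flag must be exactly 0 or 1.
func cmov(dst, src *[5]uint64, flag uint64) {
	mask := -flag // all zero bits when flag == 0, all one bits when flag == 1
	for i := range dst {
		// dst ^ (dst ^ src) == src, so the XOR is applied only under the mask.
		dst[i] ^= mask & (dst[i] ^ src[i])
	}
}

func main() {
	a := [5]uint64{1, 2, 3, 4, 5}
	b := [5]uint64{6, 7, 8, 9, 10}
	cmov(&a, &b, 0)
	fmt.Println(a)
	cmov(&a, &b, 1)
	fmt.Println(a)
}
```

The same memory locations are read and written regardless of `flag`, so the selection does not show up as a data-dependent branch or memory access pattern. Whether the compiled code is actually constant-time still depends on the compiler and target CPU.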

## Performance

Benchmark results on an AMD Ryzen 5 PRO 4650G:

- Field Addition: ~2.4 ns/op
- Scalar Multiplication: ~9.9 ns/op

## AVX2 Acceleration Opportunities

The `Scalar` and `FieldElement` types and their operations are designed with data layouts that are amenable to AVX2 SIMD acceleration:

### Scalar Type (`scalar.go`)

- **Representation**: 4×64-bit limbs (`[4]uint64`) representing 256-bit scalars
- **AVX2-Acceleratable Operations**:
  - `scalarAdd` / `scalarMul`: 256-bit integer arithmetic using `VPADDD/Q`, `VPMULUDQ`
  - `mul512`: Full 512-bit product computation - can use AVX2's 256-bit registers to process limb pairs in parallel
  - `reduce512`: Modular reduction with Montgomery-style operations
  - `wNAF`: Window Non-Adjacent Form conversion for scalar multiplication
  - `splitLambda`: GLV endomorphism scalar splitting
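
For reference, the scalar (non-SIMD) computation that a `mul512`-style routine performs can be sketched with `math/bits`. This is a generic schoolbook multiplication written for this README, not the package's actual `mul512`; the function signature below is an assumption.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mul512 returns the full 512-bit product of two 256-bit integers,
// each given as four little-endian 64-bit limbs.
func mul512(a, b *[4]uint64) [8]uint64 {
	var r [8]uint64
	for i := 0; i < 4; i++ {
		var carry uint64
		for j := 0; j < 4; j++ {
			hi, lo := bits.Mul64(a[i], b[j])
			var c uint64
			// Add the limb already accumulated at this position.
			lo, c = bits.Add64(lo, r[i+j], 0)
			hi += c
			// Add the carry from the previous column.
			lo, c = bits.Add64(lo, carry, 0)
			hi += c // cannot overflow: a*b + r + carry < 2^128
			r[i+j] = lo
			carry = hi
		}
		r[i+4] = carry
	}
	return r
}

func main() {
	x := [4]uint64{2, 0, 0, 0}
	y := [4]uint64{3, 0, 0, 0}
	fmt.Println(mul512(&x, &y))
}
```

Each of the 16 limb products is independent of the others; only the carry accumulation is sequential. That independence is what a SIMD version would try to exploit.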

### FieldElement Type (`field.go`, `field_mul.go`)

- **Representation**: 5×52-bit limbs (`[5]uint64`) in base 2^52 for efficient multiplication
- **AVX2-Acceleratable Operations**:
  - `mul` / `sqr`: Field multiplication/squaring using 128-bit intermediate products
  - `normalize` / `normalizeWeak`: Carry propagation across limbs
  - `add` / `negate`: Parallel limb operations ideal for `VPADDQ`, `VPSUBQ`
  - `inv`: Modular inversion via Fermat's little theorem (chain of sqr/mul)
  - `sqrt`: Square root computation using addition chains

### Affine/Jacobian Group Operations (`group.go`)

- **Types**: `GroupElementAffine` (x, y coordinates), `GroupElementJacobian` (x, y, z coordinates)
- **AVX2-Acceleratable Operations**:
  - `double`: Point doubling - multiple independent field operations
  - `addVar` / `addGE`: Point addition - parallelizable field multiplications
  - `setGEJ`: Coordinate conversion with batch field inversions

### Key AVX2 Instructions for Implementation

| Operation | Relevant AVX2 Instructions |
|-----------|---------------------------|
| 128-bit limb add | `VPADDQ` (packed 64-bit add); carries between 64-bit halves must be computed separately, since vector adds set no carry flag |
| Limb multiplication | `VPMULUDQ` (unsigned 32×32→64 per 64-bit lane); wider products are assembled from several partial products |
| 128-bit arithmetic | `VPMULUDQ` (32×32→64) and `VPMULLD` (low 32 bits of 32×32) as building blocks for multi-precision products |
| Carry propagation | `VPSRLQ`/`VPSLLQ` (shift), `VPAND` (mask), `VPALIGNR` |
| Conditional moves | `VPBLENDVB` (blend based on mask) |
| Data movement | `VMOVDQU` (unaligned load/store), `VBROADCASTI128` |

### 128-bit Limb Representation with AVX2

AVX2's 256-bit YMM registers can hold two 128-bit limbs, one per 128-bit lane. AVX2 has no native 128-bit integer arithmetic, so each limb is still processed as two 64-bit halves, but the layout keeps related limbs together:

**Scalar (256-bit) with 2×128-bit limbs:**
```
YMM0 = [scalar.d[1]:scalar.d[0]] | [scalar.d[3]:scalar.d[2]]
       ├── 128-bit limb 0 ─────┤   ├── 128-bit limb 1 ─────┤
```
- A single 256-bit scalar fits in one YMM register as two 128-bit limbs
- Addition/subtraction can use `VPADDQ` with manual carry handling between 64-bit halves
- The 4×64-bit representation naturally maps to 2×128-bit by treating adjacent limbs as pairs
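
The "manual carry handling" can be illustrated in plain Go. The sketch below shows two equivalent ways to add 256-bit values: a sequential carry chain using `bits.Add64`, and a lane-wise variant that first adds all limbs independently (the step a `VPADDQ` would perform) and then derives the carries by comparison. Both function names are hypothetical, and the lane-wise variant uses ordinary branches for readability, so it is not constant-time as written.

```go
package main

import (
	"fmt"
	"math/bits"
)

// add256 adds two 256-bit integers (four little-endian 64-bit limbs) with a
// sequential carry chain and reports whether the sum overflowed 256 bits.
func add256(a, b *[4]uint64) (sum [4]uint64, overflow bool) {
	var carry uint64
	for i := 0; i < 4; i++ {
		sum[i], carry = bits.Add64(a[i], b[i], carry)
	}
	return sum, carry == 1
}

// add256Lanes mimics a vectorized version: all lanes are added independently,
// each lane's carry is detected by comparison, and carries are then rippled.
func add256Lanes(a, b *[4]uint64) (sum [4]uint64, overflow bool) {
	var carryOut [4]uint64
	for i := 0; i < 4; i++ { // independent per-lane adds
		sum[i] = a[i] + b[i]
		if sum[i] < a[i] { // wrapped around, so this lane produced a carry
			carryOut[i] = 1
		}
	}
	var carry uint64
	for i := 0; i < 4; i++ { // sequential carry propagation
		s := sum[i] + carry
		next := carryOut[i]
		if s < sum[i] { // adding the incoming carry wrapped this lane
			next = 1
		}
		sum[i], carry = s, next
	}
	return sum, carry == 1
}

func main() {
	m := ^uint64(0)
	a := [4]uint64{m, m, 0, 0}
	b := [4]uint64{1, 0, 0, 0}
	s1, o1 := add256(&a, &b)
	s2, o2 := add256Lanes(&a, &b)
	fmt.Println(s1, o1, s1 == s2 && o1 == o2)
}
```

The second loop in `add256Lanes` is the part that limits SIMD speedups for full 64-bit limbs: the per-lane additions are parallel, but the carries still have to travel from limb to limb.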

**FieldElement (260-bit effective) with 128-bit limbs:**
```
YMM0 = [fe.n[0]:fe.n[1]]  (lower 104 bits used per pair)
YMM1 = [fe.n[2]:fe.n[3]]
XMM2 = [fe.n[4]:0]        (upper 48 bits)
```
- 5×52-bit limbs can be reorganized into 3×128-bit containers
- `VPMULUDQ` computes four 32×32→64 products per YMM register, so each 52×52→104-bit limb product has to be assembled from several partial products

**512-bit Intermediate Products:**
- Scalar multiplication produces 512-bit intermediates
- Two YMM registers hold the full product: `YMM0 = [l[1]:l[0]] | [l[3]:l[2]]`, `YMM1 = [l[5]:l[4]] | [l[7]:l[6]]`
- Reduction can proceed in parallel across register pairs

### Implementation Approach

AVX2 acceleration can be added via Go assembly (`.s` files) using the patterns described in `AVX.md`. The Go side would declare the assembly routines as body-less functions:

```go
//go:build amd64

package p256k1

// FieldMulAVX2 multiplies two field elements using AVX2.
// Uses 128-bit limb operations, aiming for roughly 2x throughput.
//
//go:noescape
func FieldMulAVX2(r, a, b *FieldElement)

// ScalarMulAVX2 multiplies two scalars using AVX2.
// Processes each scalar as 2×128-bit limbs in a single YMM register.
//
//go:noescape
func ScalarMulAVX2(r, a, b *Scalar)

// ScalarAdd256AVX2 adds two 256-bit scalars using 128-bit limb arithmetic.
//
//go:noescape
func ScalarAdd256AVX2(r, a, b *Scalar) bool
```

The key idea is that AVX2's 256-bit registers holding 128-bit limb pairs could enable:
- **2x parallelism** for addition/subtraction across limb pairs
- **Efficient carry chains** using `VPSRLQ` to extract carries and `VPADDQ` to propagate them (this applies to the 52-bit field limbs, whose carries sit in the spare upper bits; full 64-bit scalar limbs need a comparison to detect overflow)
- **Reduced loop iterations** for multi-precision arithmetic (2 iterations for 256-bit instead of 4)

## Implementation Status

### ✅ Completed
- Core field and scalar arithmetic
- Basic group operations
- Context management
- Main API structure
- Key generation and verification
- Basic signature operations
- Comprehensive test suite

### 🚧 Simplified/Placeholder
- **ECDSA Implementation**: Basic structure in place, but signing/verification uses simplified algorithms
- **Field Multiplication**: Uses simplified approach instead of optimized assembly
- **Point Validation**: Curve equation checking is simplified
- **Nonce Generation**: Uses `crypto/rand` instead of RFC 6979

### ❌ Not Yet Implemented
- **Hash Functions**: SHA-256 and tagged hash implementations
- **Optimized Multiplication**: Full constant-time field multiplication
- **Precomputed Tables**: Optimized scalar multiplication with precomputed points
- **Optional Modules**: Schnorr signatures, ECDH, extra keys
- **Recovery**: Public key recovery from signatures
- **Complete ECDSA**: Full constant-time ECDSA implementation

## Security Considerations

⚠️ **This implementation is for educational/development purposes and should not be used in production without further security review and completion of the cryptographic implementations.**

Key security features implemented:
- Constant-time field operations (basic level)
- Magnitude tracking to prevent overflows
- Memory clearing for sensitive data
- Context randomization support

Key security features still needed:
- Complete constant-time ECDSA implementation
- Proper nonce generation (RFC 6979)
- Side-channel resistance verification
- Comprehensive security testing

## Building and Testing

```bash
cd pkg/
go test -v        # Run all tests
go test -bench=.  # Run benchmarks
go build          # Build the package
```

## License

This implementation is derived from libsecp256k1 and maintains the same MIT license.