Files
p256k1/README.md
2025-11-28 16:35:08 +00:00

252 lines
9.3 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# secp256k1 Go Implementation
This package provides a pure Go implementation of the secp256k1 elliptic curve cryptographic primitives, ported from the libsecp256k1 C library.
## Features Implemented
### ✅ Core Components
- **Field Arithmetic** (`field.go`, `field_mul.go`): Complete implementation of field operations modulo the secp256k1 field prime (2^256 - 2^32 - 977)
- 5x52-bit limb representation for efficient arithmetic
- Addition, multiplication, squaring, inversion operations
- Constant-time normalization and magnitude management
- **Scalar Arithmetic** (`scalar.go`): Complete implementation of scalar operations modulo the group order
- 4x64-bit limb representation
- Addition, multiplication, inversion, negation operations
- Proper overflow handling and reduction
- **Group Operations** (`group.go`): Elliptic curve point operations
- Affine and Jacobian coordinate representations
- Point addition, doubling, negation
- Coordinate conversion between representations
- **Context Management** (`context.go`): Context objects for enhanced security
- Context creation, cloning, destruction
- Randomization for side-channel protection
- Callback management for error handling
- **Main API** (`secp256k1.go`): Core secp256k1 API functions
- Public key parsing, serialization, and comparison
- ECDSA signature parsing and serialization
- Key generation and verification
- Basic ECDSA signing and verification (simplified implementation)
- **Utilities** (`util.go`): Helper functions and constants
- Memory management utilities
- Endianness conversion functions
- Bit manipulation utilities
- Error handling and callbacks
### ✅ Testing
- Comprehensive test suite (`secp256k1_test.go`) covering:
- Basic functionality and self-tests
- Field element operations
- Scalar operations
- Key generation
- Signature operations
- Public key operations
- Performance benchmarks
## Usage
```go
package main
import (
"fmt"
"crypto/rand"
p256k1 "p256k1.mleku.dev/pkg"
)
func main() {
// Create context
ctx, err := p256k1.ContextCreate(p256k1.ContextNone)
if err != nil {
panic(err)
}
defer p256k1.ContextDestroy(ctx)
// Generate secret key
var seckey [32]byte
rand.Read(seckey[:])
// Verify secret key
if !p256k1.ECSecKeyVerify(ctx, seckey[:]) {
panic("Invalid secret key")
}
// Create public key
var pubkey p256k1.PublicKey
if !p256k1.ECPubkeyCreate(ctx, &pubkey, seckey[:]) {
panic("Failed to create public key")
}
fmt.Println("Successfully created secp256k1 key pair!")
}
```
## Architecture
The implementation follows the same architectural patterns as libsecp256k1:
1. **Layered Design**: Low-level field/scalar arithmetic → Group operations → High-level API
2. **Constant-Time Operations**: Designed to prevent timing side-channel attacks
3. **Magnitude Tracking**: Field elements track their "magnitude" to optimize operations
4. **Context Objects**: Encapsulate state and provide enhanced security features
## Performance
Benchmark results on AMD Ryzen 5 PRO 4650G:
- Field Addition: ~2.4 ns/op
- Scalar Multiplication: ~9.9 ns/op
## AVX2 Acceleration Opportunities
The Scalar and FieldElement types and their operations are designed with data layouts that are amenable to AVX2 SIMD acceleration:
### Scalar Type (`scalar.go`)
- **Representation**: 4×64-bit limbs (`[4]uint64`) representing 256-bit scalars
- **AVX2-Acceleratable Operations**:
- `scalarAdd` / `scalarMul`: 256-bit integer arithmetic using `VPADDD/Q`, `VPMULUDQ`
- `mul512`: Full 512-bit product computation - can use AVX2's 256-bit registers to process limb pairs in parallel
- `reduce512`: Modular reduction with Montgomery-style operations
- `wNAF`: Window Non-Adjacent Form conversion for scalar multiplication
- `splitLambda`: GLV endomorphism scalar splitting
### FieldElement Type (`field.go`, `field_mul.go`)
- **Representation**: 5×52-bit limbs (`[5]uint64`) in base 2^52 for efficient multiplication
- **AVX2-Acceleratable Operations**:
- `mul` / `sqr`: Field multiplication/squaring using 128-bit intermediate products
- `normalize` / `normalizeWeak`: Carry propagation across limbs
- `add` / `negate`: Parallel limb operations ideal for `VPADDQ`, `VPSUBQ`
- `inv`: Modular inversion via Fermat's little theorem (chain of sqr/mul)
- `sqrt`: Square root computation using addition chains
### Affine/Jacobian Group Operations (`group.go`)
- **Types**: `GroupElementAffine` (x, y coordinates), `GroupElementJacobian` (x, y, z coordinates)
- **AVX2-Acceleratable Operations**:
- `double`: Point doubling - multiple independent field operations
- `addVar` / `addGE`: Point addition - parallelizable field multiplications
- `setGEJ`: Coordinate conversion with batch field inversions
### Key AVX2 Instructions for Implementation
| Operation | Relevant AVX2 Instructions |
|-----------|---------------------------|
| 128-bit limb add | `VPADDQ` (packed 64-bit add) with carry chain |
| Limb multiplication | `VPMULUDQ` (unsigned 32×32→64), `VPCLMULQDQ` (carryless multiply) |
| 128-bit arithmetic | `VPMULLD`, `VPMULUDQ` for multi-precision products |
| Carry propagation | `VPSRLQ`/`VPSLLQ` (shift), `VPAND` (mask), `VPALIGNR` |
| Conditional moves | `VPBLENDVB` (blend based on mask) |
| Data movement | `VMOVDQU` (unaligned load/store), `VBROADCASTI128` |
### 128-bit Limb Representation with AVX2
AVX2's 256-bit YMM registers can natively hold two 128-bit limbs, enabling more efficient representations:
**Scalar (256-bit) with 2×128-bit limbs:**
```
YMM0 = [scalar.d[1]:scalar.d[0]] | [scalar.d[3]:scalar.d[2]]
├── 128-bit limb 0 ───────┤ ├── 128-bit limb 1 ───────┤
```
- A single 256-bit scalar fits in one YMM register as two 128-bit limbs
- Addition/subtraction can use `VPADDQ` with manual carry handling between 64-bit halves
- The 4×64-bit representation naturally maps to 2×128-bit by treating pairs
**FieldElement (260-bit effective) with 128-bit limbs:**
```
YMM0 = [fe.n[0]:fe.n[1]] (lower 104 bits used per pair)
YMM1 = [fe.n[2]:fe.n[3]]
XMM2 = [fe.n[4]:0] (upper 48 bits)
```
- 5×52-bit limbs can be reorganized into 3×128-bit containers
- Multiplication benefits from `VPMULUDQ` processing two 64×64→128 products simultaneously
**512-bit Intermediate Products:**
- Scalar multiplication produces 512-bit intermediates
- Two YMM registers hold the full product: `YMM0 = [l[1]:l[0]], YMM1 = [l[3]:l[2]], YMM2 = [l[5]:l[4]], YMM3 = [l[7]:l[6]]`
- Reduction can proceed in parallel across register pairs
### Implementation Approach
AVX2 acceleration can be added via Go assembly (`.s` files) using the patterns described in `AVX.md`:
```go
//go:build amd64
package p256k1
// FieldMulAVX2 multiplies two field elements using AVX2
// Uses 128-bit limb operations for ~2x throughput
//go:noescape
func FieldMulAVX2(r, a, b *FieldElement)
// ScalarMulAVX2 multiplies two scalars using AVX2
// Processes scalar as 2×128-bit limbs in a single YMM register
//go:noescape
func ScalarMulAVX2(r, a, b *Scalar)
// ScalarAdd256AVX2 adds two 256-bit scalars using 128-bit limb arithmetic
//go:noescape
func ScalarAdd256AVX2(r, a, b *Scalar) bool
```
The key insight is that AVX2's 256-bit registers holding 128-bit limb pairs enable:
- **2x parallelism** for addition/subtraction across limb pairs
- **Efficient carry chains** using `VPSRLQ` to extract carries and `VPADDQ` to propagate
- **Reduced loop iterations** for multi-precision arithmetic (2 iterations for 256-bit instead of 4)
## Implementation Status
### ✅ Completed
- Core field and scalar arithmetic
- Basic group operations
- Context management
- Main API structure
- Key generation and verification
- Basic signature operations
- Comprehensive test suite
### 🚧 Simplified/Placeholder
- **ECDSA Implementation**: Basic structure in place, but signing/verification uses simplified algorithms
- **Field Multiplication**: Uses simplified approach instead of optimized assembly
- **Point Validation**: Curve equation checking is simplified
- **Nonce Generation**: Uses crypto/rand instead of RFC 6979
### ❌ Not Yet Implemented
- **Hash Functions**: SHA-256 and tagged hash implementations
- **Optimized Multiplication**: Full constant-time field multiplication
- **Precomputed Tables**: Optimized scalar multiplication with precomputed points
- **Optional Modules**: Schnorr signatures, ECDH, extra keys
- **Recovery**: Public key recovery from signatures
- **Complete ECDSA**: Full constant-time ECDSA implementation
## Security Considerations
⚠️ **This implementation is for educational/development purposes and should not be used in production without further security review and completion of the cryptographic implementations.**
Key security features implemented:
- Constant-time field operations (basic level)
- Magnitude tracking to prevent overflows
- Memory clearing for sensitive data
- Context randomization support
Key security features still needed:
- Complete constant-time ECDSA implementation
- Proper nonce generation (RFC 6979)
- Side-channel resistance verification
- Comprehensive security testing
## Building and Testing
```bash
cd pkg/
go test -v # Run all tests
go test -bench=. # Run benchmarks
go build # Build the package
```
## License
This implementation is derived from libsecp256k1 and maintains the same MIT license.