Decompose handle-event.go into DDD domain services (v0.36.15)

Major refactoring of event handling into clean, testable domain services:

- Add pkg/event/validation: JSON hex validation, signature verification,
  timestamp bounds, NIP-70 protected tag validation
- Add pkg/event/authorization: Policy and ACL authorization decisions,
  auth challenge handling, access level determination
- Add pkg/event/routing: Event router registry with ephemeral and delete
  handlers, kind-based dispatch
- Add pkg/event/processing: Event persistence, delivery to subscribers,
  and post-save hooks (ACL reconfig, sync, relay groups)
- Reduce handle-event.go from 783 to 296 lines (62% reduction)
- Add comprehensive unit tests for all new domain services
- Refactor database tests to use shared TestMain setup
- Fix blossom URL test expectations (missing "/" separator)
- Add go-memory-optimization skill and analysis documentation
- Update DDD_ANALYSIS.md to reflect completed decomposition

Files modified:
- app/handle-event.go: Slim orchestrator using domain services
- app/server.go: Service initialization and interface wrappers
- app/handle-event-types.go: Shared types (OkHelper, result types)
- pkg/event/validation/*: New validation service package
- pkg/event/authorization/*: New authorization service package
- pkg/event/routing/*: New routing service package
- pkg/event/processing/*: New processing service package
- pkg/database/*_test.go: Refactored to shared TestMain
- pkg/blossom/http_test.go: Fixed URL format expectations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 05:30:07 +01:00
parent 3e0a94a053
commit 24383ef1f4
42 changed files with 4791 additions and 2118 deletions


@@ -0,0 +1,366 @@
# ORLY Relay Memory Optimization Analysis
This document analyzes ORLY's current memory optimization patterns against Go best practices for high-performance systems. The analysis covers buffer management, caching strategies, allocation patterns, and identifies optimization opportunities.
## Executive Summary
ORLY implements several sophisticated memory optimization strategies:
- **Compact event storage** achieving ~87% space savings via serial references
- **Two-level caching** for serial lookups and query results
- **ZSTD compression** for query cache with LRU eviction
- **Atomic operations** for lock-free statistics tracking
- **Pre-allocation patterns** for slice capacity management
However, several opportunities exist to further reduce GC pressure:
- Implement `sync.Pool` for frequently allocated buffers
- Use fixed-size arrays for cryptographic values
- Pool `bytes.Buffer` instances in hot paths
- Optimize escape behavior in serialization code
---
## Current Memory Patterns
### 1. Compact Event Storage
**Location**: `pkg/database/compact_event.go`
ORLY's most significant memory optimization is the compact binary format for event storage:
```
Original event: 32 (ID) + 32 (pubkey) + 32*4 (tags) = 192+ bytes
Compact format: 5 (pubkey serial) + 5*4 (tag serials) = 25 bytes
Savings: ~87% compression per event
```
**Key techniques:**
- 5-byte serial references replace 32-byte IDs/pubkeys
- Varint encoding for variable-length integers (CreatedAt, tag counts)
- Type flags for efficient deserialization
- Separate `SerialEventId` index for ID reconstruction
**Assessment**: Excellent storage optimization. This dramatically reduces database size and I/O costs.
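To make the serial-reference and varint techniques above concrete, here is a minimal encoding sketch. It is illustrative only, assuming a big-endian 5-byte serial layout; the helper names are assumptions, not the actual `MarshalCompactEvent` code:
```go
import (
	"bytes"
	"encoding/binary"
)

// writeSerial40 replaces a 32-byte ID/pubkey with a fixed 5-byte serial reference.
func writeSerial40(buf *bytes.Buffer, serial uint64) {
	var b [5]byte
	b[0] = byte(serial >> 32)
	binary.BigEndian.PutUint32(b[1:], uint32(serial))
	buf.Write(b[:])
}

// writeUvarint encodes variable-length integers such as CreatedAt or tag counts.
func writeUvarint(buf *bytes.Buffer, v uint64) {
	var b [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(b[:], v)
	buf.Write(b[:n])
}
```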
### 2. Serial Cache System
**Location**: `pkg/database/serial_cache.go`
Two-way lookup cache for serial ↔ ID/pubkey mappings:
```go
type SerialCache struct {
	pubkeyBySerial      map[uint64][]byte // For decoding
	serialByPubkeyHash  map[string]uint64 // For encoding
	eventIdBySerial     map[uint64][]byte // For decoding
	serialByEventIdHash map[string]uint64 // For encoding
}
```
**Memory footprint:**
- Pubkey cache: 100k entries × 32 bytes ≈ 3.2MB
- Event ID cache: 500k entries × 32 bytes ≈ 16MB
- Total: ~19-20MB overhead
**Strengths:**
- Fine-grained `RWMutex` locking per direction/type
- Configurable cache limits
- Defensive copying prevents external mutations
**Improvement opportunity:** The eviction strategy (clear 50% when full) is simple but not LRU. Consider ring buffers or generational caching for better hit rates.
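One possible shape for a generational scheme, sketched here with assumed names (not tied to the current `SerialCache` API, and assuming `sync` is imported): entries are promoted on access, and eviction drops the old generation wholesale rather than clearing half the map:
```go
type genCache struct {
	mu       sync.Mutex
	young    map[uint64][]byte
	old      map[uint64][]byte
	maxYoung int
}

func (c *genCache) get(serial uint64) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.young[serial]; ok {
		return v, true
	}
	if v, ok := c.old[serial]; ok {
		c.young[serial] = v // promote recently used entries
		return v, true
	}
	return nil, false
}

func (c *genCache) put(serial uint64, v []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.young) >= c.maxYoung {
		c.old = c.young // rotate: the previous old generation is dropped
		c.young = make(map[uint64][]byte, c.maxYoung)
	}
	c.young[serial] = v
}
```
Frequently accessed serials survive rotation because they keep getting promoted into the young generation, which approximates LRU behavior without per-entry bookkeeping.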
### 3. Query Cache with ZSTD Compression
**Location**: `pkg/database/querycache/event_cache.go`
```go
type EventCache struct {
	entries map[string]*EventCacheEntry
	lruList *list.List
	encoder *zstd.Encoder // Reused encoder (level 9)
	decoder *zstd.Decoder // Reused decoder
	maxSize int64         // Default 512MB compressed
}
```
**Strengths:**
- ZSTD level 9 compression (best ratio)
- Encoder/decoder reuse avoids repeated initialization
- LRU eviction with proper size tracking
- Background cleanup of expired entries
- Tracks compression ratio with exponential moving average
**Memory pattern:** Stores compressed data in cache, decompresses on-demand. This trades CPU for memory.
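A minimal sketch of that trade-off using `github.com/klauspost/compress/zstd` (the entry and helper names are assumptions, not the actual `EventCache` methods):
```go
import "github.com/klauspost/compress/zstd"

// Reused codec instances, mirroring the cache's approach:
//   enc, _ := zstd.NewWriter(nil, zstd.WithEncoderLevel(zstd.SpeedBestCompression))
//   dec, _ := zstd.NewReader(nil)

type compressedEntry struct {
	data []byte // only the zstd-compressed payload stays resident
}

func storeEntry(enc *zstd.Encoder, raw []byte) compressedEntry {
	// Compress once on insert.
	return compressedEntry{data: enc.EncodeAll(raw, nil)}
}

func loadEntry(dec *zstd.Decoder, e compressedEntry) ([]byte, error) {
	// Decompress on every read: CPU is spent per cache hit, memory is saved at rest.
	return dec.DecodeAll(e.data, nil)
}
```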
### 4. Buffer Allocation Patterns
**Current approach:** Uses `new(bytes.Buffer)` throughout serialization code:
```go
// pkg/database/save-event.go, compact_event.go, serial_cache.go
buf := new(bytes.Buffer)
// ... encode data
return buf.Bytes()
```
**Assessment:** Each call allocates a new buffer on the heap. For high-throughput scenarios (thousands of events/second), this creates significant GC pressure.
---
## Optimization Opportunities
### 1. Implement sync.Pool for Buffer Reuse
**Priority: High**
Currently, ORLY creates new `bytes.Buffer` instances for every serialization operation. A buffer pool would amortize allocation costs:
```go
// Recommended implementation
var bufferPool = sync.Pool{
	New: func() interface{} {
		return bytes.NewBuffer(make([]byte, 0, 4096))
	},
}

func getBuffer() *bytes.Buffer {
	return bufferPool.Get().(*bytes.Buffer)
}

func putBuffer(buf *bytes.Buffer) {
	buf.Reset()
	bufferPool.Put(buf)
}
```
**Impact areas:**
- `pkg/database/compact_event.go` - MarshalCompactEvent, encodeCompactTag
- `pkg/database/save-event.go` - index key generation
- `pkg/database/serial_cache.go` - GetEventIdBySerial, StoreEventIdSerial
**Expected benefit:** 50-80% reduction in buffer allocations on hot paths.
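A hypothetical call site showing the pattern the pool enables (`encodeInto` stands in for the existing encoding logic); note the copy before the buffer goes back to the pool, since the pool may hand the same backing array to another goroutine:
```go
func marshalWithPool(encodeInto func(*bytes.Buffer) error) ([]byte, error) {
	buf := getBuffer()
	defer putBuffer(buf)
	if err := encodeInto(buf); err != nil {
		return nil, err
	}
	// Copy out before the deferred putBuffer returns buf to the pool.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out, nil
}
```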
### 2. Fixed-Size Array Types for Cryptographic Values
**Priority: Medium**
The external nostr library uses `[]byte` slices for IDs, pubkeys, and signatures. However, these are always fixed sizes:
| Type | Size | Current | Recommended |
|------|------|---------|-------------|
| Event ID | 32 bytes | `[]byte` | `[32]byte` |
| Pubkey | 32 bytes | `[]byte` | `[32]byte` |
| Signature | 64 bytes | `[]byte` | `[64]byte` |
Internal types like `Uint40` already follow this pattern but use struct wrapping:
```go
// Current (pkg/database/indexes/types/uint40.go)
type Uint40 struct{ value uint64 }
// Already efficient - no slice allocation
```
For cryptographic values, consider wrapper types:
```go
type EventID [32]byte
type Pubkey [32]byte
type Signature [64]byte
func (id EventID) IsZero() bool { return id == EventID{} }
func (id EventID) Hex() string { return hex.Enc(id[:]) }
```
**Benefit:** Fixed-size values can be allocated on the stack for local use, and zero-value checks become a cheap array comparison (`id == EventID{}`) rather than nil/length checks.
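At the boundary with the slice-based nostr types, a small conversion helper (hypothetical, assumes `fmt` is imported) keeps the fixed-size representation internal:
```go
// EventIDFromSlice copies a 32-byte slice into the fixed-size type,
// validating length once at the edge so internal code can skip the check.
func EventIDFromSlice(b []byte) (EventID, error) {
	var id EventID
	if len(b) != len(id) {
		return EventID{}, fmt.Errorf("event id: want %d bytes, got %d", len(id), len(b))
	}
	copy(id[:], b)
	return id, nil
}
```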
### 3. Pre-allocated Slice Patterns
**Current usage is good:**
```go
// pkg/database/save-event.go:51-54
sers = make(types.Uint40s, 0, len(idxs)*100) // Estimate 100 serials per index
// pkg/database/compact_event.go:283
ev.Tags = tag.NewSWithCap(int(nTags)) // Pre-allocate tag slice
```
**Improvement:** Apply the same capacity hints consistently to (see the sketch after this list):
- `Uint40s.Union/Intersection/Difference` methods (currently use `append` without capacity hints)
- Query result accumulation in `query-events.go`
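A simplified sketch of the idea over plain `uint64` values (the real methods operate on `types.Uint40s` and may differ); the point is sizing the result slice and the lookup set up front instead of growing them through repeated `append`:
```go
func union(a, b []uint64) []uint64 {
	out := make([]uint64, 0, len(a)+len(b))          // upper bound on result size
	seen := make(map[uint64]struct{}, len(a)+len(b)) // pre-sized lookup set
	for _, xs := range [][]uint64{a, b} {
		for _, v := range xs {
			if _, ok := seen[v]; !ok {
				seen[v] = struct{}{}
				out = append(out, v)
			}
		}
	}
	return out
}
```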
### 4. Escape Analysis Optimization
**Priority: Medium**
Several patterns cause unnecessary heap escapes. Check with:
```bash
go build -gcflags="-m -m" ./pkg/database/...
```
**Common escape causes in codebase:**
```go
// compact_event.go:224 - Small slice escapes
buf := make([]byte, 5) // Could be [5]byte on stack
// compact_event.go:335 - Single-byte slice escapes
typeBuf := make([]byte, 1) // Could be var typeBuf [1]byte
```
**Fix:**
```go
func readUint40(r io.Reader) (value uint64, err error) {
	var buf [5]byte // Stack-allocated
	if _, err = io.ReadFull(r, buf[:]); err != nil {
		return 0, err
	}
	// ...
}
```
### 5. Atomic Bytes Wrapper Optimization
**Location**: `pkg/utils/atomic/bytes.go`
Current implementation copies on both Load and Store:
```go
func (x *Bytes) Load() (b []byte) {
	vb := x.v.Load().([]byte)
	b = make([]byte, len(vb)) // Allocation on every Load
	copy(b, vb)
	return
}
```
This is safe but expensive for high-frequency access. Consider:
- Read-copy-update (RCU) pattern for read-heavy workloads
- `sync.RWMutex` with direct access for controlled use cases (a minimal sketch follows)
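A minimal `sync.RWMutex` variant, sketched with assumed names rather than the existing `atomic.Bytes` API: readers get a zero-copy view, and writers replace the slice wholesale so readers holding the old slice see consistent (if stale) data:
```go
type GuardedBytes struct {
	mu sync.RWMutex
	b  []byte
}

// View returns the current bytes without copying; callers must treat the
// returned slice as read-only.
func (g *GuardedBytes) View() []byte {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return g.b
}

// Store swaps in a new slice rather than mutating the old one in place,
// so previously returned views remain internally consistent.
func (g *GuardedBytes) Store(b []byte) {
	g.mu.Lock()
	g.b = b
	g.mu.Unlock()
}
```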
### 6. Goroutine Management
**Current patterns:**
- Worker goroutines for message processing (`app/listener.go`)
- Background cleanup goroutines (`querycache/event_cache.go`)
- Pinger goroutines per connection (`app/handle-websocket.go`)
**Assessment:** Good use of bounded channels and `sync.WaitGroup` for lifecycle management.
**Improvement:** Consider a worker pool for subscription handlers to limit peak goroutine count:
```go
type WorkerPool struct {
	jobs    chan func()
	workers int
	wg      sync.WaitGroup
}
```
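Illustrative lifecycle methods for the sketch above (assumed, not existing ORLY code): `Submit` applies backpressure through the bounded `jobs` channel instead of letting goroutine counts grow without limit:
```go
func (p *WorkerPool) Start() {
	for i := 0; i < p.workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for job := range p.jobs { // exits when jobs is closed
				job()
			}
		}()
	}
}

// Submit blocks when all workers are busy and the queue is full.
func (p *WorkerPool) Submit(job func()) { p.jobs <- job }

// Stop closes the queue and waits for in-flight jobs to finish.
func (p *WorkerPool) Stop() {
	close(p.jobs)
	p.wg.Wait()
}
```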
---
## Memory Budget Analysis
### Runtime Memory Breakdown
| Component | Estimated Size | Notes |
|-----------|---------------|-------|
| Serial Cache (pubkeys) | 3.2 MB | 100k × 32 bytes |
| Serial Cache (event IDs) | 16 MB | 500k × 32 bytes |
| Query Cache | 512 MB | Configurable, compressed |
| Per-connection state | ~10 KB | Channels, buffers, maps |
| Badger DB caches | Variable | Controlled by Badger config |
### GC Tuning Recommendations
For a relay handling 1000+ events/second:
```go
// main.go or init
import "runtime/debug"

func init() {
	// More aggressive GC to limit heap growth
	debug.SetGCPercent(50) // GC at 50% heap growth (default 100)

	// Set soft memory limit based on available RAM
	debug.SetMemoryLimit(2 << 30) // 2 GiB limit
}
```
Or via environment:
```bash
GOGC=50 GOMEMLIMIT=2GiB ./orly
```
---
## Profiling Commands
### Heap Profile
```bash
# Enable pprof (already supported)
ORLY_PPROF_HTTP=true ./orly
# Capture heap profile
go tool pprof http://localhost:6060/debug/pprof/heap
# Analyze allocations
go tool pprof -alloc_space heap.prof
go tool pprof -inuse_space heap.prof
```
### Escape Analysis
```bash
# Check which variables escape to heap
go build -gcflags="-m -m" ./pkg/database/... 2>&1 | grep "escapes to heap"
```
### Allocation Benchmarks
Add to existing benchmarks:
```go
func BenchmarkCompactMarshal(b *testing.B) {
	b.ReportAllocs()
	ev := createTestEvent()
	resolver := &testResolver{}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		data, _ := MarshalCompactEvent(ev, resolver)
		_ = data
	}
}
```
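Run it with standard `go test` flags (assuming the benchmark lives alongside the database package):
```bash
go test -run='^$' -bench=CompactMarshal -benchmem ./pkg/database/
```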
---
## Implementation Priority
1. **High Priority (Immediate Impact)**
   - Implement `sync.Pool` for `bytes.Buffer` in serialization paths
   - Replace small `make([]byte, n)` with fixed arrays in decode functions
2. **Medium Priority (Significant Improvement)**
   - Add pre-allocation hints to set operation methods
   - Optimize escape behavior in compact event encoding
   - Consider worker pool for subscription handlers
3. **Low Priority (Refinement)**
   - LRU-based serial cache eviction
   - Fixed-size types for cryptographic values (requires nostr library changes)
   - RCU pattern for atomic bytes in high-frequency paths
---
## Conclusion
ORLY demonstrates thoughtful memory optimization in its storage layer, particularly the compact event format achieving 87% space savings. The dual-cache architecture (serial cache + query cache) balances memory usage with lookup performance.
The primary opportunity for improvement is in the serialization hot path, where buffer pooling could significantly reduce GC pressure. The recommended `sync.Pool` implementation would have immediate benefits for high-throughput deployments without requiring architectural changes.
Secondary improvements around escape analysis and fixed-size types would provide incremental gains and should be prioritized based on profiling data from production workloads.