Decompose handle-event.go into DDD domain services (v0.36.15)
Some checks failed
Go / build-and-release (push) Has been cancelled
Major refactoring of event handling into clean, testable domain services:

- Add pkg/event/validation: JSON hex validation, signature verification, timestamp bounds, NIP-70 protected tag validation
- Add pkg/event/authorization: Policy and ACL authorization decisions, auth challenge handling, access level determination
- Add pkg/event/routing: Event router registry with ephemeral and delete handlers, kind-based dispatch
- Add pkg/event/processing: Event persistence, delivery to subscribers, and post-save hooks (ACL reconfig, sync, relay groups)
- Reduce handle-event.go from 783 to 296 lines (62% reduction)
- Add comprehensive unit tests for all new domain services
- Refactor database tests to use shared TestMain setup
- Fix blossom URL test expectations (missing "/" separator)
- Add go-memory-optimization skill and analysis documentation
- Update DDD_ANALYSIS.md to reflect completed decomposition

Files modified:

- app/handle-event.go: Slim orchestrator using domain services
- app/server.go: Service initialization and interface wrappers
- app/handle-event-types.go: Shared types (OkHelper, result types)
- pkg/event/validation/*: New validation service package
- pkg/event/authorization/*: New authorization service package
- pkg/event/routing/*: New routing service package
- pkg/event/processing/*: New processing service package
- pkg/database/*_test.go: Refactored to shared TestMain
- pkg/blossom/http_test.go: Fixed URL format expectations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
docs/MEMORY_OPTIMIZATION_ANALYSIS.md (new file, 366 lines)
@@ -0,0 +1,366 @@
# ORLY Relay Memory Optimization Analysis

This document analyzes ORLY's current memory optimization patterns against Go best practices for high-performance systems. The analysis covers buffer management, caching strategies, and allocation patterns, and identifies optimization opportunities.

## Executive Summary

ORLY implements several sophisticated memory optimization strategies:

- **Compact event storage** achieving ~87% space savings via serial references
- **Two-level caching** for serial lookups and query results
- **ZSTD compression** for the query cache with LRU eviction
- **Atomic operations** for lock-free statistics tracking
- **Pre-allocation patterns** for slice capacity management

However, several opportunities exist to further reduce GC pressure:

- Implement `sync.Pool` for frequently allocated buffers
- Use fixed-size arrays for cryptographic values
- Pool `bytes.Buffer` instances in hot paths
- Optimize escape behavior in serialization code

---

## Current Memory Patterns

### 1. Compact Event Storage

**Location**: `pkg/database/compact_event.go`

ORLY's most significant memory optimization is the compact binary format for event storage:

```
Original event: 32 (ID) + 32 (pubkey) + 32*4 (tags) = 192+ bytes
Compact format: 5 (pubkey serial) + 5*4 (tag serials) = 25 bytes
Savings: ~87% per event
```

**Key techniques:**

- 5-byte serial references replace 32-byte IDs/pubkeys
- Varint encoding for variable-length integers (CreatedAt, tag counts)
- Type flags for efficient deserialization
- Separate `SerialEventId` index for ID reconstruction

**Assessment**: Excellent storage optimization. This dramatically reduces database size and I/O costs.
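To make the serial-plus-varint idea concrete, here is a minimal sketch. The `writeSerial40` and `encodeHeader` helpers and the field order are assumptions for illustration only, not ORLY's actual compact wire format:

```go
// Illustrative only - requires "bytes" and "encoding/binary".
// writeSerial40 writes the low 40 bits of a serial as 5 big-endian bytes.
func writeSerial40(buf *bytes.Buffer, serial uint64) {
    var b [5]byte
    b[0] = byte(serial >> 32)
    binary.BigEndian.PutUint32(b[1:], uint32(serial))
    buf.Write(b[:])
}

// encodeHeader sketches how a compact record might combine a pubkey serial
// with a varint-encoded CreatedAt timestamp.
func encodeHeader(pubkeySerial uint64, createdAt int64) []byte {
    buf := new(bytes.Buffer)
    writeSerial40(buf, pubkeySerial) // 5 bytes instead of a 32-byte pubkey
    var tmp [binary.MaxVarintLen64]byte
    n := binary.PutUvarint(tmp[:], uint64(createdAt)) // varints keep small values small
    buf.Write(tmp[:n])
    return buf.Bytes()
}
```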
### 2. Serial Cache System

**Location**: `pkg/database/serial_cache.go`

Two-way lookup cache for serial ↔ ID/pubkey mappings:

```go
type SerialCache struct {
    pubkeyBySerial      map[uint64][]byte // For decoding
    serialByPubkeyHash  map[string]uint64 // For encoding
    eventIdBySerial     map[uint64][]byte // For decoding
    serialByEventIdHash map[string]uint64 // For encoding
}
```

**Memory footprint:**

- Pubkey cache: 100k entries × 32 bytes ≈ 3.2 MB
- Event ID cache: 500k entries × 32 bytes ≈ 16 MB
- Total: ~19-20 MB overhead

**Strengths:**

- Fine-grained `RWMutex` locking per direction/type
- Configurable cache limits
- Defensive copying prevents external mutations

**Improvement opportunity:** The eviction strategy (clear 50% when full) is simple but not LRU. Consider ring buffers or generational caching for better hit rates.
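One possible shape for a generational variant is sketched below; the `genCache` type and its fields are illustrative, not existing ORLY code. The idea is to keep two maps, promote entries on hit, and drop the old generation wholesale when the young one fills, which preferentially evicts cold entries without per-entry LRU bookkeeping:

```go
// Sketch of generational eviction for a serial lookup cache (requires "sync").
type genCache struct {
    mu       sync.Mutex
    young    map[uint64][]byte // recently used entries
    old      map[uint64][]byte // previous generation, discarded wholesale
    maxYoung int
}

func (c *genCache) Get(serial uint64) ([]byte, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if v, ok := c.young[serial]; ok {
        return v, true
    }
    if v, ok := c.old[serial]; ok {
        c.put(serial, v) // promote on hit so hot entries survive rotation
        return v, true
    }
    return nil, false
}

func (c *genCache) Put(serial uint64, v []byte) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.put(serial, v)
}

func (c *genCache) put(serial uint64, v []byte) {
    if len(c.young) >= c.maxYoung {
        c.old = c.young // rotate: the old generation is dropped
        c.young = make(map[uint64][]byte, c.maxYoung)
    }
    c.young[serial] = v
}
```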
### 3. Query Cache with ZSTD Compression

**Location**: `pkg/database/querycache/event_cache.go`

```go
type EventCache struct {
    entries map[string]*EventCacheEntry
    lruList *list.List
    encoder *zstd.Encoder // Reused encoder (level 9)
    decoder *zstd.Decoder // Reused decoder
    maxSize int64         // Default 512MB compressed
}
```

**Strengths:**

- ZSTD level 9 compression (best ratio)
- Encoder/decoder reuse avoids repeated initialization
- LRU eviction with proper size tracking
- Background cleanup of expired entries
- Tracks compression ratio with exponential moving average

**Memory pattern:** Stores compressed data in cache, decompresses on-demand. This trades CPU for memory.
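The compress-on-store, decompress-on-hit round trip with reused coders then reduces to a pair of thin wrappers. This is a sketch assuming the `github.com/klauspost/compress/zstd` package; the `compressEntry`/`decompressEntry` wrappers, the level option, and the error handling are illustrative, not the cache's actual code:

```go
// Sketch: one shared encoder/decoder pair, reused for all cache entries.
var (
    enc, _ = zstd.NewWriter(nil, zstd.WithEncoderLevel(zstd.SpeedBestCompression))
    dec, _ = zstd.NewReader(nil)
)

// compressEntry runs once when a query result is cached.
func compressEntry(serialized []byte) []byte {
    return enc.EncodeAll(serialized, nil)
}

// decompressEntry runs on each cache hit; DecodeAll can be called concurrently.
func decompressEntry(compressed []byte) ([]byte, error) {
    return dec.DecodeAll(compressed, nil)
}
```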
### 4. Buffer Allocation Patterns

**Current approach:** Uses `new(bytes.Buffer)` throughout serialization code:

```go
// pkg/database/save-event.go, compact_event.go, serial_cache.go
buf := new(bytes.Buffer)
// ... encode data
return buf.Bytes()
```

**Assessment:** Each call allocates a new buffer on the heap. For high-throughput scenarios (thousands of events/second), this creates significant GC pressure.

---

## Optimization Opportunities

### 1. Implement sync.Pool for Buffer Reuse

**Priority: High**

Currently, ORLY creates new `bytes.Buffer` instances for every serialization operation. A buffer pool would amortize allocation costs:

```go
// Recommended implementation
var bufferPool = sync.Pool{
    New: func() interface{} {
        return bytes.NewBuffer(make([]byte, 0, 4096))
    },
}

func getBuffer() *bytes.Buffer {
    return bufferPool.Get().(*bytes.Buffer)
}

func putBuffer(buf *bytes.Buffer) {
    buf.Reset()
    bufferPool.Put(buf)
}
```

**Impact areas:**

- `pkg/database/compact_event.go` - MarshalCompactEvent, encodeCompactTag
- `pkg/database/save-event.go` - index key generation
- `pkg/database/serial_cache.go` - GetEventIdBySerial, StoreEventIdSerial

**Expected benefit:** 50-80% reduction in buffer allocations on hot paths.
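Call sites would borrow a buffer, encode into it, and copy the result out before returning the buffer to the pool. The sketch below uses the `getBuffer`/`putBuffer` helpers above; the `marshalWithPool` wrapper is illustrative, not existing code, and the copy before return matters because the pooled buffer's backing array is reused:

```go
func marshalWithPool(encode func(*bytes.Buffer)) []byte {
    buf := getBuffer()
    defer putBuffer(buf)

    encode(buf)

    // Copy out: the pooled buffer is reset and reused after putBuffer,
    // so callers must not retain buf.Bytes() directly.
    out := make([]byte, buf.Len())
    copy(out, buf.Bytes())
    return out
}
```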
### 2. Fixed-Size Array Types for Cryptographic Values

**Priority: Medium**

The external nostr library uses `[]byte` slices for IDs, pubkeys, and signatures. However, these values always have fixed sizes:

| Type | Size | Current | Recommended |
|------|------|---------|-------------|
| Event ID | 32 bytes | `[]byte` | `[32]byte` |
| Pubkey | 32 bytes | `[]byte` | `[32]byte` |
| Signature | 64 bytes | `[]byte` | `[64]byte` |

Internal types like `Uint40` already follow this pattern, using a struct wrapper:

```go
// Current (pkg/database/indexes/types/uint40.go)
type Uint40 struct{ value uint64 }

// Already efficient - no slice allocation
```

For cryptographic values, consider wrapper types:

```go
type EventID [32]byte
type Pubkey [32]byte
type Signature [64]byte

func (id EventID) IsZero() bool { return id == EventID{} }
func (id EventID) Hex() string  { return hex.Enc(id[:]) }
```

**Benefit:** Stack allocation for local variables and efficient zero-value comparison.

### 3. Pre-allocated Slice Patterns

**Current usage is good:**

```go
// pkg/database/save-event.go:51-54
sers = make(types.Uint40s, 0, len(idxs)*100) // Estimate 100 serials per index

// pkg/database/compact_event.go:283
ev.Tags = tag.NewSWithCap(int(nTags)) // Pre-allocate tag slice
```

**Improvement:** Apply the same pattern consistently to:

- `Uint40s.Union/Intersection/Difference` methods (currently use `append` without capacity hints; see the sketch below)
- Query result accumulation in `query-events.go`
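As an illustration of the capacity-hint idea for set operations, here is a sketch over plain `uint64` serials. ORLY's `types.Uint40s` wraps a struct, so the real method would differ; the point is sizing the result to the smaller input up front:

```go
// Sketch with plain uint64 serials; illustrative only.
func intersect(a, b []uint64) []uint64 {
    small, large := a, b
    if len(b) < len(a) {
        small, large = b, a
    }
    seen := make(map[uint64]struct{}, len(small))
    for _, s := range small {
        seen[s] = struct{}{}
    }
    out := make([]uint64, 0, len(small)) // capacity hint: result cannot exceed the smaller set
    for _, s := range large {
        if _, ok := seen[s]; ok {
            out = append(out, s)
            delete(seen, s) // avoid duplicate output values
        }
    }
    return out
}
```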
### 4. Escape Analysis Optimization

**Priority: Medium**

Several patterns cause unnecessary heap escapes. Check with:

```bash
go build -gcflags="-m -m" ./pkg/database/...
```

**Common escape causes in the codebase:**

```go
// compact_event.go:224 - Small slice escapes
buf := make([]byte, 5) // Could be [5]byte on stack

// compact_event.go:335 - Single-byte slice escapes
typeBuf := make([]byte, 1) // Could be var typeBuf [1]byte
```

**Fix:**

```go
func readUint40(r io.Reader) (value uint64, err error) {
    var buf [5]byte // Stack-allocated
    if _, err = io.ReadFull(r, buf[:]); err != nil {
        return 0, err
    }
    // ...
}
```

### 5. Atomic Bytes Wrapper Optimization

**Location**: `pkg/utils/atomic/bytes.go`

The current implementation copies on both Load and Store:

```go
func (x *Bytes) Load() (b []byte) {
    vb := x.v.Load().([]byte)
    b = make([]byte, len(vb)) // Allocation on every Load
    copy(b, vb)
    return
}
```

This is safe but expensive for high-frequency access. Consider:

- A read-copy-update (RCU) pattern for read-heavy workloads (see the sketch below)
- `sync.RWMutex` with direct access for controlled use cases
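For the read-heavy case, one RCU-style shape is to publish immutable snapshots behind `atomic.Pointer` (Go 1.19+, `sync/atomic`) so readers skip the copy entirely. This is a sketch of the pattern, not the existing `Bytes` API; the `readMostlyBytes` name is illustrative:

```go
// readMostlyBytes publishes immutable snapshots: writers replace, readers never copy.
type readMostlyBytes struct {
    p atomic.Pointer[[]byte]
}

func (x *readMostlyBytes) Load() []byte {
    if b := x.p.Load(); b != nil {
        return *b // callers must treat the returned slice as read-only
    }
    return nil
}

func (x *readMostlyBytes) Store(b []byte) {
    // Copy once on Store so the published slice can never be mutated by the caller.
    snapshot := make([]byte, len(b))
    copy(snapshot, b)
    x.p.Store(&snapshot)
}
```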
### 6. Goroutine Management

**Current patterns:**

- Worker goroutines for message processing (`app/listener.go`)
- Background cleanup goroutines (`querycache/event_cache.go`)
- Pinger goroutines per connection (`app/handle-websocket.go`)

**Assessment:** Good use of bounded channels and `sync.WaitGroup` for lifecycle management.

**Improvement:** Consider a worker pool for subscription handlers to limit peak goroutine count:

```go
type WorkerPool struct {
    jobs    chan func()
    workers int
    wg      sync.WaitGroup
}
```
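A minimal completion of that shape might look like the following sketch; the constructor and method names, and how subscription handlers enqueue jobs, are assumptions:

```go
func NewWorkerPool(workers, queueSize int) *WorkerPool {
    p := &WorkerPool{jobs: make(chan func(), queueSize), workers: workers}
    p.wg.Add(workers)
    for i := 0; i < workers; i++ {
        go func() {
            defer p.wg.Done()
            for job := range p.jobs { // drain until Close
                job()
            }
        }()
    }
    return p
}

// Submit blocks when the queue is full, bounding peak memory and goroutine count.
func (p *WorkerPool) Submit(job func()) { p.jobs <- job }

func (p *WorkerPool) Close() {
    close(p.jobs)
    p.wg.Wait()
}
```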
---

## Memory Budget Analysis

### Runtime Memory Breakdown

| Component | Estimated Size | Notes |
|-----------|----------------|-------|
| Serial Cache (pubkeys) | 3.2 MB | 100k × 32 bytes |
| Serial Cache (event IDs) | 16 MB | 500k × 32 bytes |
| Query Cache | 512 MB | Configurable, compressed |
| Per-connection state | ~10 KB | Channels, buffers, maps |
| Badger DB caches | Variable | Controlled by Badger config |

### GC Tuning Recommendations

For a relay handling 1000+ events/second:

```go
// main.go or init
import "runtime/debug"

func init() {
    // More aggressive GC to limit heap growth
    debug.SetGCPercent(50) // GC at 50% heap growth (default 100)

    // Set soft memory limit based on available RAM
    debug.SetMemoryLimit(2 << 30) // 2GB limit
}
```

Or via environment:

```bash
GOGC=50 GOMEMLIMIT=2GiB ./orly
```

---

## Profiling Commands

### Heap Profile

```bash
# Enable pprof (already supported)
ORLY_PPROF_HTTP=true ./orly

# Capture heap profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Analyze allocations
go tool pprof -alloc_space heap.prof
go tool pprof -inuse_space heap.prof
```

### Escape Analysis

```bash
# Check which variables escape to heap
go build -gcflags="-m -m" ./pkg/database/... 2>&1 | grep "escapes to heap"
```

### Allocation Benchmarks

Add to existing benchmarks:

```go
func BenchmarkCompactMarshal(b *testing.B) {
    b.ReportAllocs()
    ev := createTestEvent()
    resolver := &testResolver{}

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        data, _ := MarshalCompactEvent(ev, resolver)
        _ = data
    }
}
```

---

## Implementation Priority

1. **High Priority (Immediate Impact)**
   - Implement `sync.Pool` for `bytes.Buffer` in serialization paths
   - Replace small `make([]byte, n)` with fixed arrays in decode functions

2. **Medium Priority (Significant Improvement)**
   - Add pre-allocation hints to set operation methods
   - Optimize escape behavior in compact event encoding
   - Consider a worker pool for subscription handlers

3. **Low Priority (Refinement)**
   - LRU-based serial cache eviction
   - Fixed-size types for cryptographic values (requires nostr library changes)
   - RCU pattern for atomic bytes in high-frequency paths

---

## Conclusion

ORLY demonstrates thoughtful memory optimization in its storage layer, particularly the compact event format achieving ~87% space savings. The dual-cache architecture (serial cache + query cache) balances memory usage with lookup performance.

The primary opportunity for improvement is in the serialization hot path, where buffer pooling could significantly reduce GC pressure. The recommended `sync.Pool` implementation would have immediate benefits for high-throughput deployments without requiring architectural changes.

Secondary improvements around escape analysis and fixed-size types would provide incremental gains and should be prioritized based on profiling data from production workloads.