3.7 KiB
ORLY Performance Analysis
Benchmark Results Summary
Performance with 90s warmup:
- Peak Throughput: 10,452 events/sec
- Avg Latency: 1.63ms
- P95 Latency: 2.27ms
- Success Rate: 100%
Key Findings
1. Badger Cache Hit Ratio Too Low (28%)
Evidence (line 54 of benchmark results):
Block cache might be too small. Metrics: hit: 128456 miss: 332127 ... hit-ratio: 0.28
Impact:
- Low cache hit ratio forces more disk reads
- Increased latency on queries
- Query performance degrades over time (3866 q/s → 2806 q/s)
Recommendation: Increase Badger cache sizes via environment variables:
ORLY_DB_BLOCK_CACHE_MB: Increase from default to 256-512MBORLY_DB_INDEX_CACHE_MB: Increase from default to 128-256MB
2. CPU Profile Analysis
Total CPU time: 3.65s over 510s runtime (0.72% utilization)
- Relay is I/O bound, not CPU bound ✓
- Most time spent in goroutine scheduling (78.63%)
- Badger compaction uses 12.88% of CPU
Key Observations:
- Low CPU utilization means relay is mostly waiting on I/O
- This is expected and efficient behavior
- Not a bottleneck
3. Warmup Time Impact
Without 90s warmup: Performance appeared lower in initial tests With 90s warmup: Better sustained performance
Potential causes:
- Badger cache warming up
- Goroutine pool stabilization
- Memory allocation settling
Current mitigations:
- 90s delay before benchmark starts
- Health check with 60s start_period
4. Query Performance Degradation
Round 1: 3,866 queries/sec Round 2: 2,806 queries/sec (27% decrease)
Likely causes:
- Cache pressure from accumulated data
- Badger compaction interference
- LSM tree depth increasing
Recommendations:
- Increase cache sizes (primary fix)
- Tune Badger compaction settings
- Consider periodic cache warming
Recommended Configuration Changes
1. Increase Badger Cache Sizes
Add to cmd/benchmark/Dockerfile.next-orly:
ENV ORLY_DB_BLOCK_CACHE_MB=512
ENV ORLY_DB_INDEX_CACHE_MB=256
2. Tune Badger Options
Consider adjusting in pkg/database/database.go:
// Increase value log file size for better write performance
ValueLogFileSize: 256 << 20, // 256MB (currently defaults to 1GB)
// Increase number of compactors
NumCompactors: 4, // Default is 4, could go to 8
// Increase number of level zero tables before compaction
NumLevelZeroTables: 8, // Default is 5
// Increase number of level zero tables before stalling writes
NumLevelZeroTablesStall: 16, // Default is 15
3. Add Readiness Check
Consider adding a "warmed up" indicator:
- Cache hit ratio > 50%
- At least 1000 events stored
- No active compactions
Performance Comparison
| Implementation | Events/sec | Avg Latency | Cache Hit Ratio |
|---|---|---|---|
| ORLY (current) | 10,453 | 1.63ms | 28% ⚠️ |
| Khatru-SQLite | 9,819 | 590µs | N/A |
| Khatru-Badger | 9,712 | 602µs | N/A |
| Relayer-basic | 10,014 | 581µs | N/A |
| Strfry | 9,631 | 613µs | N/A |
| Nostr-rs-relay | 9,617 | 605µs | N/A |
Key Observation: ORLY has highest throughput but significantly higher latency than competitors. The low cache hit ratio explains this discrepancy.
Next Steps
- Immediate: Test with increased cache sizes
- Short-term: Optimize Badger configuration
- Medium-term: Investigate query path optimizations
- Long-term: Consider query result caching layer
Files Modified
cmd/benchmark/docker-compose.profile.yml- Profile-enabled ORLY setupcmd/benchmark/run-profile.sh- Script to run profiled benchmarks- This analysis document
Profile Data
CPU profile available at: cmd/benchmark/profiles/cpu.pprof
Analyze with:
go tool pprof -http=:8080 profiles/cpu.pprof