ORLY Performance Analysis

Benchmark Results Summary

Performance with 90s warmup:

  • Peak Throughput: 10,452 events/sec
  • Avg Latency: 1.63ms
  • P95 Latency: 2.27ms
  • Success Rate: 100%

Key Findings

1. Badger Cache Hit Ratio Too Low (28%)

Evidence (line 54 of benchmark results):

Block cache might be too small. Metrics: hit: 128456 miss: 332127 ... hit-ratio: 0.28
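The reported ratio follows directly from the hit and miss counters in that log line. A minimal Go check, where `hitRatio` is an illustrative helper and not part of ORLY or Badger:

```go
package main

import "fmt"

// hitRatio returns hits / (hits + misses), or 0 when there were no lookups.
func hitRatio(hits, misses uint64) float64 {
	total := hits + misses
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

func main() {
	// Counters copied from the benchmark log line.
	fmt.Printf("hit-ratio: %.2f\n", hitRatio(128456, 332127)) // prints "hit-ratio: 0.28"
}
```

In other words, roughly 72 of every 100 block lookups missed the cache during the run.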

Impact:

  • Low cache hit ratio forces more disk reads
  • Increased latency on queries
  • Query performance degrades over time (3866 q/s → 2806 q/s)

Recommendation: Increase Badger cache sizes via environment variables:

  • ORLY_DB_BLOCK_CACHE_MB: Increase from default to 256-512MB
  • ORLY_DB_INDEX_CACHE_MB: Increase from default to 128-256MB

2. CPU Profile Analysis

Total CPU time: 3.65s over 510s runtime (0.72% utilization)

  • Relay is I/O bound, not CPU bound ✓
  • Most of the sampled CPU time is spent in goroutine scheduling (78.63%)
  • Badger compaction accounts for 12.88% of the sampled CPU time

Key Observations:

  • Low CPU utilization means relay is mostly waiting on I/O
  • This is expected and efficient behavior
  • Not a bottleneck

3. Warmup Time Impact

  • Without 90s warmup: performance appeared lower in initial tests
  • With 90s warmup: better sustained performance

Potential causes:

  • Badger cache warming up
  • Goroutine pool stabilization
  • Memory allocation settling

Current mitigations:

  • 90s delay before benchmark starts
  • Health check with 60s start_period

4. Query Performance Degradation

  • Round 1: 3,866 queries/sec
  • Round 2: 2,806 queries/sec (27% decrease)
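The size of the drop can be checked from the two round figures. A short Go sketch, with `percentDecrease` as an illustrative helper:

```go
package main

import "fmt"

// percentDecrease returns how far current has dropped below baseline, in percent.
func percentDecrease(baseline, current float64) float64 {
	if baseline == 0 {
		return 0
	}
	return 100 * (baseline - current) / baseline
}

func main() {
	// Query throughput for round 1 and round 2.
	fmt.Printf("%.1f%%\n", percentDecrease(3866, 2806)) // prints "27.4%"
}
```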

Likely causes:

  1. Cache pressure from accumulated data
  2. Badger compaction interference
  3. LSM tree depth increasing

Recommendations:

  1. Increase cache sizes (primary fix)
  2. Tune Badger compaction settings
  3. Consider periodic cache warming

Recommended Optimizations

1. Increase Badger Cache Sizes

Add to cmd/benchmark/Dockerfile.next-orly:

ENV ORLY_DB_BLOCK_CACHE_MB=512
ENV ORLY_DB_INDEX_CACHE_MB=256

2. Tune Badger Options

Consider adjusting in pkg/database/database.go:

// Use a smaller value log file size
ValueLogFileSize: 256 << 20, // 256MB (default is about 1GB)

// Number of concurrent compactors
NumCompactors: 4, // Default is already 4; could go to 8

// Increase number of level zero tables before compaction
NumLevelZeroTables: 8, // Default is 5

// Increase number of level zero tables before stalling writes
NumLevelZeroTablesStall: 16, // Default is 15

3. Add Readiness Check

Consider adding a "warmed up" indicator:

  • Cache hit ratio > 50%
  • At least 1000 events stored
  • No active compactions
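The three conditions above can be combined into a single predicate. The sketch below is one possible shape, not existing ORLY code: the `WarmupStats` type, its fields, and the sample values are assumptions, and how the relay would count active compactions is left open. The hit and miss counters could, for instance, come from Badger's block cache metrics.

```go
package main

import "fmt"

// WarmupStats is a hypothetical snapshot of the signals listed above.
type WarmupStats struct {
	CacheHits         uint64
	CacheMisses       uint64
	EventsStored      uint64
	ActiveCompactions int
}

const (
	minHitRatio = 0.50 // cache hit ratio must exceed 50%
	minEvents   = 1000 // at least 1000 events stored
)

// WarmedUp reports whether all three readiness conditions hold.
func (s WarmupStats) WarmedUp() bool {
	lookups := s.CacheHits + s.CacheMisses
	if lookups == 0 {
		return false // no cache traffic yet, so nothing to judge
	}
	ratio := float64(s.CacheHits) / float64(lookups)
	return ratio > minHitRatio &&
		s.EventsStored >= minEvents &&
		s.ActiveCompactions == 0
}

func main() {
	// Cache counters from the benchmark log (28% hit ratio); the event
	// count is an illustrative value.
	cold := WarmupStats{CacheHits: 128456, CacheMisses: 332127, EventsStored: 50000}
	// Entirely illustrative values: 70% hit ratio, exactly 1000 events.
	warm := WarmupStats{CacheHits: 700, CacheMisses: 300, EventsStored: 1000}

	fmt.Println(cold.WarmedUp(), warm.WarmedUp()) // prints "false true"
}
```

With the counters from this benchmark run the relay would not have reported itself as warmed up, which is consistent with the cache finding above.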

Performance Comparison

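| Implementation | Events/sec | Avg Latency | Cache Hit Ratio |
|----------------|------------|-------------|-----------------|
| ORLY (current) | 10,453     | 1.63ms      | 28% ⚠️          |
| Khatru-SQLite  | 9,819      | 590µs       | N/A             |
| Khatru-Badger  | 9,712      | 602µs       | N/A             |
| Relayer-basic  | 10,014     | 581µs       | N/A             |
| Strfry         | 9,631      | 613µs       | N/A             |
| Nostr-rs-relay | 9,617      | 605µs       | N/A             |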

Key Observation: ORLY has the highest throughput, but its average latency (1.63ms) is roughly 2.7 times that of the other relays (581-613µs). The low cache hit ratio is the most likely explanation for this discrepancy.

Next Steps

  1. Immediate: Test with increased cache sizes
  2. Short-term: Optimize Badger configuration
  3. Medium-term: Investigate query path optimizations
  4. Long-term: Consider query result caching layer

Files Modified

  • cmd/benchmark/docker-compose.profile.yml - Profile-enabled ORLY setup
  • cmd/benchmark/run-profile.sh - Script to run profiled benchmarks
  • This analysis document

Profile Data

CPU profile available at: cmd/benchmark/profiles/cpu.pprof

Analyze with (run from cmd/benchmark):

go tool pprof -http=:8080 profiles/cpu.pprof