mleku/next.orly.dev

mleku 95bcf85ad7

optimizing badger cache, won a 10-15% improvement in most benchmarks

2025-11-16 15:07:36 +00:00

ORLY Performance Analysis

Benchmark Results Summary

Performance with 90s warmup:

Peak Throughput: 10,452 events/sec
Avg Latency: 1.63ms
P95 Latency: 2.27ms
Success Rate: 100%

Key Findings

1. Badger Cache Hit Ratio Too Low (28%)

Evidence (line 54 of benchmark results):

Block cache might be too small. Metrics: hit: 128456 miss: 332127 ... hit-ratio: 0.28

Impact:

Low cache hit ratio forces more disk reads
Increased latency on queries
Query performance degrades over time (3866 q/s → 2806 q/s)

Recommendation: Increase Badger cache sizes via environment variables:

ORLY_DB_BLOCK_CACHE_MB: Increase from default to 256-512MB
ORLY_DB_INDEX_CACHE_MB: Increase from default to 128-256MB
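The 0.28 figure in the log line quoted above is hits divided by total lookups. A minimal sketch of that arithmetic, using only the two counters from the log line (the function name `hitRatio` is illustrative, not part of ORLY or Badger):

```go
package main

import "fmt"

// hitRatio returns hits / (hits + misses), or 0 when no lookups were recorded.
func hitRatio(hits, misses uint64) float64 {
	total := hits + misses
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

func main() {
	// Counters copied from the Badger log line in the benchmark results.
	r := hitRatio(128456, 332127)
	fmt.Printf("hit ratio: %.2f\n", r) // prints "hit ratio: 0.28"
}
```

Roughly 72 of every 100 block lookups miss the cache in this run, and each miss has to be served from outside the block cache.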

2. CPU Profile Analysis

Total CPU time: 3.65s over 510s runtime (0.72% utilization)

Relay is I/O bound, not CPU bound ✓
Most of that CPU time is spent in goroutine scheduling (78.63%)
Badger compaction uses 12.88% of CPU

Key Observations:

Low CPU utilization means relay is mostly waiting on I/O
This is expected and efficient behavior
Not a bottleneck
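The utilization figure follows directly from the two numbers in the profile summary. A small sketch of the calculation, assuming utilization is measured against a single core's wall-clock time (the helper name `utilization` is hypothetical):

```go
package main

import "fmt"

// utilization returns CPU time as a percentage of wall-clock time,
// measured against one core. Both arguments are in seconds.
func utilization(cpuSeconds, wallSeconds float64) float64 {
	if wallSeconds <= 0 {
		return 0
	}
	return 100 * cpuSeconds / wallSeconds
}

func main() {
	// 3.65s of CPU time over a 510s run, as reported by the profile.
	fmt.Printf("%.2f%%\n", utilization(3.65, 510)) // prints "0.72%"
}
```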

3. Warmup Time Impact

Without 90s warmup: Performance appeared lower in initial tests
With 90s warmup: Better sustained performance

Potential causes:

Badger cache warming up
Goroutine pool stabilization
Memory allocation settling

Current mitigations:

90s delay before benchmark starts
Health check with 60s start_period

4. Query Performance Degradation

Round 1: 3,866 queries/sec
Round 2: 2,806 queries/sec (27% decrease)

Likely causes:

Cache pressure from accumulated data
Badger compaction interference
LSM tree depth increasing

Recommendations:

Increase cache sizes (primary fix)
Tune Badger compaction settings
Consider periodic cache warming

Recommended Configuration Changes

1. Increase Badger Cache Sizes

Add to cmd/benchmark/Dockerfile.next-orly:

ENV ORLY_DB_BLOCK_CACHE_MB=512
ENV ORLY_DB_INDEX_CACHE_MB=256
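These variables are given in megabytes, while Badger's cache options (`WithBlockCacheSize`, `WithIndexCacheSize`) take a size in bytes. The sketch below shows one way the conversion could be done. It is not ORLY's actual configuration code: the helper `cacheBytes` is hypothetical, and the fallback values of 64 and 32 are placeholders, since this document does not state the real defaults.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// cacheBytes parses a size in MB and returns it in bytes.
// It falls back to defMB when the value is empty, not a number, or not positive.
func cacheBytes(value string, defMB int64) int64 {
	mb := defMB
	if n, err := strconv.ParseInt(value, 10, 64); err == nil && n > 0 {
		mb = n
	}
	return mb << 20 // MB to bytes
}

func main() {
	// Fallbacks of 64 and 32 are placeholders, not ORLY's documented defaults.
	block := cacheBytes(os.Getenv("ORLY_DB_BLOCK_CACHE_MB"), 64)
	index := cacheBytes(os.Getenv("ORLY_DB_INDEX_CACHE_MB"), 32)
	fmt.Println("block cache bytes:", block)
	fmt.Println("index cache bytes:", index)
}
```

With the Dockerfile values above, the block cache comes out at 536870912 bytes and the index cache at 268435456 bytes.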

2. Tune Badger Options

Consider adjusting in pkg/database/database.go:

// Reduce value log file size below the default
ValueLogFileSize: 256 << 20, // 256MB (default is 1GB)

// Number of compactors (already at the default; raising it is an option)
NumCompactors: 4, // Default is 4, could go to 8

// Increase number of level zero tables before compaction
NumLevelZeroTables: 8, // Default is 5

// Increase number of level zero tables before stalling writes
NumLevelZeroTablesStall: 16, // Default is 15

3. Add Readiness Check

Consider adding a "warmed up" indicator:

Cache hit ratio > 50%
At least 1000 events stored
No active compactions
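The three conditions above could be combined into a single predicate. The following is only a sketch: the `warmupStats` type and its fields are hypothetical, and how ORLY would actually collect these counters (for example from Badger's cache metrics) is left open.

```go
package main

import "fmt"

// warmupStats is a hypothetical snapshot of the values the readiness check needs.
type warmupStats struct {
	CacheHits         uint64
	CacheMisses       uint64
	EventsStored      uint64
	ActiveCompactions int
}

// warmedUp reports whether all three proposed conditions hold:
// hit ratio above 50%, at least 1000 events stored, no active compactions.
func (s warmupStats) warmedUp() bool {
	total := s.CacheHits + s.CacheMisses
	if total == 0 {
		return false // no lookups yet, so the cache cannot be considered warm
	}
	ratio := float64(s.CacheHits) / float64(total)
	return ratio > 0.50 && s.EventsStored >= 1000 && s.ActiveCompactions == 0
}

func main() {
	// Cache counters from the benchmark log; the event count is illustrative.
	s := warmupStats{CacheHits: 128456, CacheMisses: 332127, EventsStored: 50000}
	fmt.Println(s.warmedUp()) // prints "false" because the hit ratio is 0.28
}
```

A predicate like this could back the container health check in place of the fixed 90s delay, so the benchmark starts on observed state instead of elapsed time.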

Performance Comparison

Implementation	Events/sec	Avg Latency	Cache Hit Ratio
ORLY (current)	10,453	1.63ms	28% ⚠️
Khatru-SQLite	9,819	590µs	N/A
Khatru-Badger	9,712	602µs	N/A
Relayer-basic	10,014	581µs	N/A
Strfry	9,631	613µs	N/A
Nostr-rs-relay	9,617	605µs	N/A

Key Observation: ORLY has the highest throughput but roughly 2.7x the average latency of the other relays (1.63ms versus about 0.6ms). The low cache hit ratio is the most likely explanation, pending a re-test with larger caches.

Next Steps

Immediate: Test with increased cache sizes
Short-term: Optimize Badger configuration
Medium-term: Investigate query path optimizations
Long-term: Consider query result caching layer

Files Modified

cmd/benchmark/docker-compose.profile.yml - Profile-enabled ORLY setup
cmd/benchmark/run-profile.sh - Script to run profiled benchmarks
This analysis document

Profile Data

CPU profile available at: cmd/benchmark/profiles/cpu.pprof

Analyze with (run from cmd/benchmark):

go tool pprof -http=:8080 profiles/cpu.pprof
