optimizing badger cache, won a 10-15% improvement in most benchmarks
97
cmd/benchmark/CACHE_TUNING_ANALYSIS.md
Normal file
@@ -0,0 +1,97 @@
# Badger Cache Tuning Analysis

## Problem Identified

From benchmark run `run_20251116_092759`, the Badger block cache showed critical performance issues:

### Cache Metrics (Round 1):
```
Block cache might be too small. Metrics:
- hit: 151,469
- miss: 307,989
- hit-ratio: 0.33 (33%)
- keys-added: 226,912
- keys-evicted: 226,893 (99.99% eviction rate!)
- Cache life expectancy: 2 seconds (90th percentile)
```
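
The ratios in the log follow directly from the raw counters. A minimal Go sketch of that arithmetic, using the Round 1 numbers (the helper name `ratio` is illustrative and not part of Badger or the relay):

```go
package main

import "fmt"

// ratio returns part/total as a fraction, or 0 when total is 0.
func ratio(part, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(part) / float64(total)
}

func main() {
	// Counters copied from the Round 1 log output.
	const (
		hits    = 151_469
		misses  = 307_989
		added   = 226_912
		evicted = 226_893
	)

	hitRatio := ratio(hits, hits+misses)  // share of lookups served from cache
	evictionRate := ratio(evicted, added) // share of inserted keys already evicted

	fmt.Printf("hit ratio:     %.2f\n", hitRatio)     // prints 0.33
	fmt.Printf("eviction rate: %.4f\n", evictionRate) // prints 0.9999
}
```

An eviction rate this close to 1 means almost every block added to the cache was pushed out again before the run ended, which matches the 2-second life expectancy in the log.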

### Performance Impact:
- **Burst Pattern Latency**: 9.35ms avg (vs 3.61ms for khatru-sqlite)
- **P95 Latency**: 34.48ms (vs 8.59ms for khatru-sqlite)
- **Cache hit ratio**: Only 33% - causing constant disk I/O

## Root Cause

The benchmark container was running with **default Badger cache sizes**, which are much smaller than the defaults in the relay code:
- Block cache: ~64 MB (Badger default)
- Index cache: ~32 MB (Badger default)

The code has better defaults (1024 MB block cache / 512 MB index cache), but these weren't set in the Docker container.

## Cache Size Calculation

Based on benchmark workload analysis:

### Block Cache Requirements:
- Total cost added: 12.44 TB during test
- With 226K keys and immediate evictions, we need to hold ~100-200K blocks in memory
- At ~10-20 KB per block on average, that is roughly 1-4 GB; sizing for the upper half of that range: **2-4 GB needed**

### Index Cache Requirements:
- Must hold index entries for 200K+ keys with metadata
- Keeps index lookups in memory during queries
- **1-2 GB needed**
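
To make the block cache estimate concrete, the sketch below multiplies out the corners of the two ranges quoted under Block Cache Requirements. The helper names are illustrative, and it assumes binary units (1 KB = 1024 bytes). The results run from just under 1 GB to just under 4 GB, so the 2-4 GB figure corresponds to the upper part of that span.

```go
package main

import "fmt"

// cacheBytes estimates the memory needed to keep `blocks` blocks resident,
// given an assumed average block size in KiB.
func cacheBytes(blocks, avgBlockKiB int64) int64 {
	return blocks * avgBlockKiB * 1024
}

// gib converts a byte count to GiB for display.
func gib(b int64) float64 {
	return float64(b) / (1 << 30)
}

func main() {
	// Corners of the ranges: 100-200K blocks, 10-20 KiB per block.
	for _, blocks := range []int64{100_000, 200_000} {
		for _, kib := range []int64{10, 20} {
			fmt.Printf("%d blocks x %d KiB = %.2f GiB\n",
				blocks, kib, gib(cacheBytes(blocks, kib)))
		}
	}
}
```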

## Solution

Updated `Dockerfile.next-orly` with optimized cache settings:

```dockerfile
# 2 GB block cache (Dockerfile comments must be on their own line)
ENV ORLY_DB_BLOCK_CACHE_MB=2048
# 1 GB index cache
ENV ORLY_DB_INDEX_CACHE_MB=1024
```
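
One way the relay could turn these variables into byte counts is sketched below. This is an assumption about the wiring, not the relay's actual code: `cacheSizeBytes` and its injected `lookup` parameter are hypothetical, and the 1024/512 fallbacks are the code defaults mentioned under Root Cause.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// cacheSizeBytes reads a cache size in MB via lookup(name) and returns it in
// bytes. It falls back to defaultMB when the variable is unset, is not a
// number, or is not positive. (Hypothetical helper, for illustration only.)
func cacheSizeBytes(lookup func(string) (string, bool), name string, defaultMB int64) int64 {
	mb := defaultMB
	if raw, ok := lookup(name); ok {
		if v, err := strconv.ParseInt(raw, 10, 64); err == nil && v > 0 {
			mb = v
		}
	}
	return mb << 20 // MB -> bytes
}

func main() {
	block := cacheSizeBytes(os.LookupEnv, "ORLY_DB_BLOCK_CACHE_MB", 1024)
	index := cacheSizeBytes(os.LookupEnv, "ORLY_DB_INDEX_CACHE_MB", 512)

	// Logging the effective values at startup makes a misconfigured
	// container visible before any benchmark runs.
	fmt.Printf("block cache: %d bytes, index cache: %d bytes\n", block, index)
}
```

Badger's `Options.WithBlockCacheSize` and `Options.WithIndexCacheSize` both take a size in bytes, which is why the sketch converts from MB before handing the values on.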

### Expected Improvements:
- **Cache hit ratio**: Target 85-95% (up from 33%)
- **Burst pattern latency**: Target <5ms avg (down from 9.35ms)
- **P95 latency**: Target <15ms (down from 34.48ms)
- **Query latency**: Significant reduction due to cached index lookups

## Testing Strategy

1. Rebuild Docker image with new cache settings
2. Run full benchmark suite
3. Compare metrics:
   - Cache hit ratio
   - Average/P95/P99 latencies
   - Throughput under burst patterns
   - Memory usage
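
The comparison in step 3 can be automated with a small pass/fail check. The sketch below encodes the numeric targets from Expected Improvements. The `Result` type and `MeetsTargets` method are hypothetical, and the check uses the lower bound (85%) of the hit-ratio target. P99, throughput, and memory usage are left out because the document sets no numeric targets for them.

```go
package main

import "fmt"

// Result holds the metrics compared between benchmark runs.
type Result struct {
	HitRatio   float64 // block cache hit ratio, 0..1
	BurstAvgMs float64 // average latency under the burst pattern, in ms
	P95Ms      float64 // 95th percentile latency, in ms
}

// MeetsTargets reports whether r satisfies the documented targets:
// hit ratio >= 0.85, burst average < 5 ms, P95 < 15 ms.
func (r Result) MeetsTargets() bool {
	return r.HitRatio >= 0.85 && r.BurstAvgMs < 5 && r.P95Ms < 15
}

func main() {
	// Baseline numbers from run_20251116_092759.
	baseline := Result{HitRatio: 0.33, BurstAvgMs: 9.35, P95Ms: 34.48}
	fmt.Println("baseline meets targets:", baseline.MeetsTargets()) // prints false
}
```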

## Memory Budget

With these settings, the relay will use approximately:
- Block cache: 2 GB
- Index cache: 1 GB
- Badger internal structures: ~200 MB
- Go runtime: ~200 MB
- **Total**: ~3.4 GB

This is reasonable for a high-performance relay and well within modern server capabilities.
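
The budget is a plain sum, sketched below so it can be re-run for other cache sizes. The function and constant names are illustrative, and the two 200 MB overhead figures are the estimates from the list above, not measurements. For the settings in this document the sum is 3,472 MB, or roughly 3.4 GB.

```go
package main

import "fmt"

// Fixed overhead assumed in the memory budget, in MB.
const (
	badgerInternalMB = 200
	goRuntimeMB      = 200
)

// totalMB returns the estimated resident memory for the given cache sizes.
func totalMB(blockCacheMB, indexCacheMB int) int {
	return blockCacheMB + indexCacheMB + badgerInternalMB + goRuntimeMB
}

func main() {
	mb := totalMB(2048, 1024)
	fmt.Printf("estimated total: %d MB (%.2f GB)\n", mb, float64(mb)/1024)
}
```

The totals in the Medium and Minimal headings under Alternative Configurations count the two caches only. With the same overhead added, `totalMB(1024, 512)` gives 1,936 MB and `totalMB(384, 128)` gives 912 MB.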

## Alternative Configurations

For constrained environments:

### Medium (1.5 GB total):
```
ORLY_DB_BLOCK_CACHE_MB=1024
ORLY_DB_INDEX_CACHE_MB=512
```

### Minimal (512 MB total):
```
ORLY_DB_BLOCK_CACHE_MB=384
ORLY_DB_INDEX_CACHE_MB=128
```

Note: Smaller caches will result in lower hit ratios and higher latencies.