optimizing badger cache, won a 10-15% improvement in most benchmarks

This commit is contained in:
2025-11-16 15:07:36 +00:00
parent 9bb3a7e057
commit 95bcf85ad7
72 changed files with 8158 additions and 4048 deletions

View File

@@ -0,0 +1,137 @@
# ORLY Performance Analysis
## Benchmark Results Summary
### Performance with 90s warmup:
- **Peak Throughput**: 10,452 events/sec
- **Avg Latency**: 1.63ms
- **P95 Latency**: 2.27ms
- **Success Rate**: 100%
### Key Findings
#### 1. Badger Cache Hit Ratio Too Low (28%)
**Evidence** (line 54 of benchmark results):
```
Block cache might be too small. Metrics: hit: 128456 miss: 332127 ... hit-ratio: 0.28
```
**Impact**:
- Low cache hit ratio forces more disk reads
- Increased latency on queries
- Query performance degrades over time (3866 q/s → 2806 q/s)
**Recommendation**:
Increase Badger cache sizes via environment variables:
- `ORLY_DB_BLOCK_CACHE_MB`: Increase from default to 256-512MB
- `ORLY_DB_INDEX_CACHE_MB`: Increase from default to 128-256MB
#### 2. CPU Profile Analysis
**Total CPU time**: 3.65s over 510s runtime (0.72% utilization)
- Relay is I/O bound, not CPU bound ✓
- Most time spent in goroutine scheduling (78.63%)
- Badger compaction uses 12.88% of CPU
**Key Observations**:
- Low CPU utilization means relay is mostly waiting on I/O
- This is expected and efficient behavior
- Not a bottleneck
#### 3. Warmup Time Impact
**Without 90s warmup**: Performance appeared lower in initial tests
**With 90s warmup**: Better sustained performance
**Potential causes**:
- Badger cache warming up
- Goroutine pool stabilization
- Memory allocation settling
**Current mitigations**:
- 90s delay before benchmark starts
- Health check with 60s start_period
#### 4. Query Performance Degradation
**Round 1**: 3,866 queries/sec
**Round 2**: 2,806 queries/sec (27% decrease)
**Likely causes**:
1. Cache pressure from accumulated data
2. Badger compaction interference
3. LSM tree depth increasing
**Recommendations**:
1. Increase cache sizes (primary fix)
2. Tune Badger compaction settings
3. Consider periodic cache warming
## Recommended Configuration Changes
### 1. Increase Badger Cache Sizes
Add to `cmd/benchmark/Dockerfile.next-orly`:
```dockerfile
ENV ORLY_DB_BLOCK_CACHE_MB=512
ENV ORLY_DB_INDEX_CACHE_MB=256
```
### 2. Tune Badger Options
Consider adjusting in `pkg/database/database.go`:
```go
// Increase value log file size for better write performance
ValueLogFileSize: 256 << 20, // 256MB (currently defaults to 1GB)
// Increase number of compactors
NumCompactors: 4, // Default is 4, could go to 8
// Increase number of level zero tables before compaction
NumLevelZeroTables: 8, // Default is 5
// Increase number of level zero tables before stalling writes
NumLevelZeroTablesStall: 16, // Default is 15
```
### 3. Add Readiness Check
Consider adding a "warmed up" indicator:
- Cache hit ratio > 50%
- At least 1000 events stored
- No active compactions
## Performance Comparison
| Implementation | Events/sec | Avg Latency | Cache Hit Ratio |
|---------------|------------|-------------|-----------------|
| ORLY (current) | 10,453 | 1.63ms | 28% ⚠️ |
| Khatru-SQLite | 9,819 | 590µs | N/A |
| Khatru-Badger | 9,712 | 602µs | N/A |
| Relayer-basic | 10,014 | 581µs | N/A |
| Strfry | 9,631 | 613µs | N/A |
| Nostr-rs-relay | 9,617 | 605µs | N/A |
**Key Observation**: ORLY has highest throughput but significantly higher latency than competitors. The low cache hit ratio explains this discrepancy.
## Next Steps
1. **Immediate**: Test with increased cache sizes
2. **Short-term**: Optimize Badger configuration
3. **Medium-term**: Investigate query path optimizations
4. **Long-term**: Consider query result caching layer
## Files Modified
- `cmd/benchmark/docker-compose.profile.yml` - Profile-enabled ORLY setup
- `cmd/benchmark/run-profile.sh` - Script to run profiled benchmarks
- This analysis document
## Profile Data
CPU profile available at: `cmd/benchmark/profiles/cpu.pprof`
Analyze with:
```bash
go tool pprof -http=:8080 profiles/cpu.pprof
```