Badger Database Migration Guide

Overview

This guide covers migrating your ORLY relay database when changing Badger configuration parameters, specifically for the VLogPercentile and table size optimizations.

When Migration is Needed

Based on research of Badger v4 source code and documentation:

Configuration Changes That DON'T Require Migration

The following options can be changed without migration:

  • BlockCacheSize - Only affects in-memory cache
  • IndexCacheSize - Only affects in-memory cache
  • NumCompactors - Runtime setting
  • NumLevelZeroTables - Affects compaction timing
  • NumMemtables - Affects write buffering
  • DetectConflicts - Runtime conflict detection
  • Compression - New data uses new compression, old data remains as-is
  • BlockSize - Explicitly stated in Badger source: "Changing BlockSize across DB runs will not break badger"

Configuration Changes That BENEFIT from Migration

The following options apply to new writes only - existing data gradually adopts new settings through compaction:

  • VLogPercentile - Affects where new values are stored (LSM vs vlog)
  • BaseTableSize - New SST files use new size
  • MemTableSize - Affects new write buffering
  • BaseLevelSize - Affects new LSM tree structure
  • ValueLogFileSize - New vlog files use new size

Migration Impact: Without migration, existing data remains in its current location (LSM tree or value log). The database will gradually adapt through normal compaction, which may take days or weeks depending on write volume.
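The two lists above can be encoded as a small lookup, which is handy when reviewing a configuration diff. This is a minimal sketch: the type `need`, the map `optionNeeds`, and the function `anyBenefits` are hypothetical names invented for illustration, and the table is a hand-written copy of this guide's lists, not something Badger exposes.

```go
package main

import "fmt"

// need describes how a changed Badger option relates to migration,
// following the two lists in this guide.
type need int

const (
	unknown  need = iota // option not covered by this guide
	none                 // takes effect on restart, no migration needed
	benefits             // applies to new writes only, benefits from migration
)

// optionNeeds is a hand-written copy of the lists above, not Badger metadata.
var optionNeeds = map[string]need{
	"BlockCacheSize":     none,
	"IndexCacheSize":     none,
	"NumCompactors":      none,
	"NumLevelZeroTables": none,
	"NumMemtables":       none,
	"DetectConflicts":    none,
	"Compression":        none,
	"BlockSize":          none,
	"VLogPercentile":     benefits,
	"BaseTableSize":      benefits,
	"MemTableSize":       benefits,
	"BaseLevelSize":      benefits,
	"ValueLogFileSize":   benefits,
}

// anyBenefits reports whether at least one changed option is in the
// "benefits from migration" list.
func anyBenefits(changed []string) bool {
	for _, name := range changed {
		if optionNeeds[name] == benefits {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(anyBenefits([]string{"BlockCacheSize", "NumCompactors"}))
	fmt.Println(anyBenefits([]string{"BlockCacheSize", "VLogPercentile"}))
}
```

A change set that touches only the first list can simply be deployed with a restart; one that touches the second list is a candidate for the options described next.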

Migration Options

Option 1: No Migration (Let Natural Compaction Handle It)

Best for: Low-traffic relays, testing environments

Pros:

  • No downtime required
  • No manual intervention
  • No export/import step, so no migration-related risk of data loss

Cons:

  • Benefits take time to materialize (days/weeks)
  • Old data layout persists until natural compaction
  • Benefits that depend on data layout are delayed (cache size changes themselves still apply at restart)

Steps:

  1. Update Badger configuration in pkg/database/database.go
  2. Restart ORLY relay
  3. Monitor performance over several days
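  4. Optionally run value log GC periodically by calling db.RunValueLogGC(0.5) from code that holds the database handle (ORLY does not currently expose this as an operator command)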
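Since a single RunValueLogGC call rewrites at most one value log file, the usual pattern is to call it repeatedly until it returns an error. The sketch below shows that loop against a small interface so it runs without Badger. The names `gcRunner`, `runValueLogGC`, `fakeDB`, and `errNothingToRewrite` are hypothetical; with a real database you would pass the *badger.DB handle and expect badger.ErrNoRewrite as the normal stopping error.

```go
package main

import (
	"errors"
	"fmt"
)

// gcRunner is the single method this sketch needs. *badger.DB provides
// RunValueLogGC with this signature.
type gcRunner interface {
	RunValueLogGC(discardRatio float64) error
}

// runValueLogGC calls RunValueLogGC until it returns an error or maxPasses
// is reached. It returns the number of successful passes and the error that
// stopped the loop (nil if the pass limit was hit first).
func runValueLogGC(db gcRunner, discardRatio float64, maxPasses int) (int, error) {
	for pass := 0; pass < maxPasses; pass++ {
		if err := db.RunValueLogGC(discardRatio); err != nil {
			return pass, err
		}
	}
	return maxPasses, nil
}

// errNothingToRewrite stands in for badger.ErrNoRewrite in this demo.
var errNothingToRewrite = errors.New("nothing to rewrite")

// fakeDB simulates a database with a fixed number of collectable files.
type fakeDB struct{ rewritable int }

func (f *fakeDB) RunValueLogGC(float64) error {
	if f.rewritable == 0 {
		return errNothingToRewrite
	}
	f.rewritable--
	return nil
}

func main() {
	passes, err := runValueLogGC(&fakeDB{rewritable: 3}, 0.5, 10)
	fmt.Println(passes, err)
}
```

The pass limit is a design choice for this sketch: it keeps a single GC run from monopolising disk I/O on a busy relay.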

Option 2: Manual Value Log Garbage Collection

Best for: Medium-traffic relays wanting faster optimization

Pros:

  • Faster than natural compaction
  • Still safe (no export/import)
  • Can run while relay is online

Cons:

  • Still gradual (hours instead of days)
  • CPU/disk intensive during GC
  • Partial benefit until GC completes

Steps:

  1. Update Badger configuration
  2. Restart ORLY relay
  3. Monitor logs for compaction activity
  4. Trigger value log GC manually if needed (planned feature; ORLY does not currently expose it)

Option 3: Full Export/Import Migration

Best for: Production relays, large databases, maximum performance

Pros:

  • Immediate full benefit of new configuration
  • Clean database structure
  • Predictable migration time
  • Reclaims all disk space

Cons:

  • Requires relay downtime (several hours for large DBs)
  • Temporarily requires about 2.5x the database size in total disk space (see Prerequisites)
  • More complex procedure

Steps: See detailed procedure below

Full Migration Procedure (Option 3)

Prerequisites

  1. Disk space: at least 2.5x the current database size in total, i.e. about 1.5x free on top of the existing database

    • 1x for current database
    • 1x for JSONL export
    • 0.5x for new database (will be smaller with compression)
  2. Time estimate:

    • Export: ~100-500 MB/s depending on disk speed
    • Import: ~50-200 MB/s with indexing overhead
    • Example: 10 GB database = ~1-5 minutes of raw transfer at these rates; allow ~10-30 minutes total in practice
  3. Backup: Ensure you have a recent backup before proceeding
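The prerequisites above reduce to simple arithmetic. The following sketch applies the guide's multipliers and throughput ranges to a database size. The struct `estimate` and function `estimateMigration` are hypothetical names, and the result covers raw export and import transfer only, so treat the time range as a lower bound on elapsed time.

```go
package main

import "fmt"

// estimate holds rough migration requirements for one database size.
type estimate struct {
	TotalDiskGB float64 // existing DB + JSONL export + new DB
	FreeDiskGB  float64 // space needed on top of the existing DB
	MinSeconds  float64 // export + import at the fastest quoted rates
	MaxSeconds  float64 // export + import at the slowest quoted rates
}

// estimateMigration applies the multipliers and MB/s ranges from this guide.
func estimateMigration(dbGB float64) estimate {
	const (
		exportSlowMBps, exportFastMBps = 100.0, 500.0
		importSlowMBps, importFastMBps = 50.0, 200.0
	)
	sizeMB := dbGB * 1024
	return estimate{
		TotalDiskGB: 2.5 * dbGB, // 1x current + 1x export + 0.5x new
		FreeDiskGB:  1.5 * dbGB, // 1x export + 0.5x new
		MinSeconds:  sizeMB/exportFastMBps + sizeMB/importFastMBps,
		MaxSeconds:  sizeMB/exportSlowMBps + sizeMB/importSlowMBps,
	}
}

func main() {
	e := estimateMigration(10)
	fmt.Printf("disk: %.1f GB total, %.1f GB free\n", e.TotalDiskGB, e.FreeDiskGB)
	fmt.Printf("raw transfer time: %.0f-%.0f s\n", e.MinSeconds, e.MaxSeconds)
}
```

For a 10 GB database this gives 25 GB total (15 GB free) and roughly one to five minutes of raw transfer time; verification, index rebuilding, and slower disks add to that.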

Step-by-Step Migration

1. Prepare Migration Script

Use the provided scripts/migrate-badger-config.sh script (see below).

2. Stop the Relay

# If using systemd
sudo systemctl stop orly

# If running manually
pkill orly

3. Run Migration

cd ~/src/next.orly.dev
chmod +x scripts/migrate-badger-config.sh
./scripts/migrate-badger-config.sh

The script will:

  • Export all events to JSONL format
  • Move old database to backup location
  • Create new database with updated configuration
  • Import all events (rebuilds indexes automatically)
  • Verify event count matches

4. Verify Migration

# Check that events were migrated
echo "Old event count:"
grep "exported.*events" ~/.local/share/ORLY-backup-*/migration.log

echo "New event count:"
grep "saved.*events" ~/.local/share/ORLY/migration.log
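As an independent cross-check on the logged counts, the events in the JSONL export can be counted directly. This is a minimal sketch that assumes the export holds one JSON event per line and treats blank lines as padding. `countEvents` and `countInString` are hypothetical helpers, not part of ORLY. For a real file, open it with os.Open and pass the file to `countEvents`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countEvents counts non-blank lines, assuming one JSON event per line.
// It uses ReadString rather than bufio.Scanner so that very long lines
// (large events) are not rejected by a token size limit.
func countEvents(r io.Reader) (int, error) {
	br := bufio.NewReader(r)
	count := 0
	for {
		line, err := br.ReadString('\n')
		if strings.TrimSpace(line) != "" {
			count++
		}
		if err == io.EOF {
			return count, nil
		}
		if err != nil {
			return count, err
		}
	}
}

// countInString is a convenience wrapper for in-memory input.
func countInString(s string) (int, error) {
	return countEvents(strings.NewReader(s))
}

func main() {
	sample := "{\"id\":\"a\"}\n\n{\"id\":\"b\"}\n{\"id\":\"c\"}"
	n, err := countInString(sample)
	fmt.Println(n, err)
}
```

If this count differs from the "saved" count in the new database's log, duplicates in the export are one possible explanation; see Troubleshooting.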

5. Restart Relay

# If using systemd
sudo systemctl start orly
sudo journalctl -u orly -f

# If running manually
./orly

6. Monitor Performance

Watch for improvements in:

  • Cache hit ratio (should be >85% with new config)
  • Average query latency (should be <3ms for cached events)
  • No "Block cache too small" warnings in logs

7. Clean Up (After Verification)

# Once you confirm everything works (wait 24-48 hours)
rm -rf ~/.local/share/ORLY-backup-*
rm ~/.local/share/ORLY/events-export.jsonl

Migration Script

The migration script is located at scripts/migrate-badger-config.sh and handles:

  • Automatic export of all events to JSONL
  • Safe backup of existing database
  • Creation of new database with updated config
  • Import and indexing of all events
  • Verification of event counts

Rollback Procedure

If migration fails or performance degrades:

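# Stop the relay
sudo systemctl stop orly  # or pkill orly

# Pick the most recent backup; its date suffix is the migration date,
# which is not necessarily today's date
BACKUP=$(ls -d ~/.local/share/ORLY-backup-* | tail -n 1)

# Restore old database (only if a backup directory was found)
[ -d "$BACKUP" ] && rm -rf ~/.local/share/ORLY && mv "$BACKUP" ~/.local/share/ORLY

# Restart with old configuration (revert pkg/database/database.go first)
sudo systemctl start orly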

Configuration Changes Summary

Changes Applied in pkg/database/database.go

// Cache sizes (can change without migration)
opts.BlockCacheSize = 16384 << 20 // 16384 MB (was 512 MB)
opts.IndexCacheSize = 4096 << 20  // 4096 MB (was 256 MB)

// Table sizes (benefits from migration)
opts.BaseTableSize = 8 << 20      // 8 MB (was 64 MB)
opts.MemTableSize = 16 << 20      // 16 MB (was 64 MB)
opts.ValueLogFileSize = 128 << 20 // 128 MB (was 256 MB)

// Inline event optimization (CRITICAL - benefits from migration)
opts.VLogPercentile = 0.99 // was 0.0 (default)

// LSM structure (benefits from migration)
opts.BaseLevelSize = 64 << 20 // 64 MB (was 10 MB, default)

// Performance settings (no migration needed)
opts.DetectConflicts = false    // was true
opts.Compression = options.ZSTD // was options.None
opts.NumCompactors = 8          // was 4
opts.NumMemtables = 8           // was 5

Expected Improvements

Before Migration

  • Cache hit ratio: 33%
  • Average latency: 9.35ms
  • P95 latency: 34.48ms
  • Block cache warnings: Yes

After Migration

  • Cache hit ratio: 85-95%
  • Average latency: <3ms
  • P95 latency: <8ms
  • Block cache warnings: No
  • Inline events: 3-5x faster reads

Troubleshooting

Migration Script Fails

Error: "Not enough disk space"

  • Free up space or use Option 1 (natural compaction)
  • Ensure total capacity of 2.5x the current DB size, i.e. about 1.5x free on top of the existing database

Error: "Export failed"

  • Check database is not corrupted
  • Ensure ORLY is stopped
  • Check file permissions

Error: "Import count mismatch"

  • This is usually informational: the export may contain duplicate events, so the imported count can be lower
  • Check logs for specific errors
  • Verify core events are present via relay queries

Performance Not Improved

If performance is unchanged after migration:

  1. Verify configuration was actually applied:

    # Check running relay logs for config output
    sudo journalctl -u orly | grep -i "block.*cache\|vlog"
    
  2. Wait for cache to warm up (2-5 minutes after start)

  3. Check if workload changed (different query patterns)

  4. Verify disk I/O is not the bottleneck:

    iostat -x 5
    

High CPU During Migration

  • This is normal - import rebuilds all indexes
  • Migration is single-threaded by design (data consistency)
  • Expect 30-60% CPU usage on one core

Additional Notes

Compression Impact

The Compression = options.ZSTD setting:

  • Only compresses new data
  • Old data remains uncompressed until rewritten by compaction
  • Migration forces all data to be rewritten → immediate compression benefit
  • Expect 2-3x compression ratio for event data

VLogPercentile Behavior

With VLogPercentile = 0.99:

  • About 99% of values are stored in the LSM tree (fast access)
  • The largest ~1% of values are stored in the value log (for example, large events >100 KB)
  • The size threshold is adjusted dynamically based on the observed value size distribution
  • Well suited to ORLY's inline event optimization
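To make the percentile idea concrete, the sketch below derives a size threshold from a sample of value sizes and counts how many values fall on each side. It illustrates the concept only: Badger's own bookkeeping and threshold bounds differ, the nearest-rank percentile and the "at or below the threshold stays inline" rule are simplifying assumptions, and `percentileThreshold` and `split` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// percentileThreshold returns the value size at the given percentile
// (1-100) using the nearest-rank method on a sorted copy of sizes.
func percentileThreshold(sizes []int, percent int) int {
	if len(sizes) == 0 {
		return 0
	}
	sorted := append([]int(nil), sizes...)
	sort.Ints(sorted)
	rank := (percent*len(sorted) + 99) / 100 // integer ceiling
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// split counts values at or below the threshold (kept inline in the LSM
// tree in this simplified model) and values above it (sent to the value log).
func split(sizes []int, threshold int) (inline, vlog int) {
	for _, s := range sizes {
		if s <= threshold {
			inline++
		} else {
			vlog++
		}
	}
	return inline, vlog
}

func main() {
	// 99 small events of 500 bytes and one large event of 200000 bytes.
	sizes := make([]int, 0, 100)
	for i := 0; i < 99; i++ {
		sizes = append(sizes, 500)
	}
	sizes = append(sizes, 200000)

	threshold := percentileThreshold(sizes, 99)
	inline, vlog := split(sizes, threshold)
	fmt.Println(threshold, inline, vlog)
}
```

With this sample the threshold lands at 500 bytes, so the 99 small events stay inline and the single large event goes to the value log.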

Production Considerations

For production relays:

  1. Schedule migration during low-traffic period
  2. Notify users of maintenance window
  3. Have rollback plan ready
  4. Monitor closely for 24-48 hours after migration
  5. Keep backup for at least 1 week

References

  • Badger v4 Documentation: https://pkg.go.dev/github.com/dgraph-io/badger/v4
  • ORLY Database Package: pkg/database/database.go
  • Export/Import Implementation: pkg/database/{export,import}.go
  • Cache Optimization Analysis: cmd/benchmark/CACHE_OPTIMIZATION_STRATEGY.md
  • Inline Event Optimization: cmd/benchmark/INLINE_EVENT_OPTIMIZATION.md