next.orly.dev/MESSAGE_QUEUE_FIX.md
mleku 581e0ec588
Implement comprehensive WebSocket subscription stability fixes
- Resolved critical issues causing subscriptions to drop after 30-60 seconds due to unconsumed receiver channels.
- Introduced per-subscription consumer goroutines to ensure continuous event delivery and prevent channel overflow.
- Enhanced REQ parsing to handle both wrapped and unwrapped filter arrays, eliminating EOF errors.
- Updated publisher logic to correctly send events to receiver channels, ensuring proper event delivery to subscribers.
- Added extensive documentation and testing tools to verify subscription stability and performance.
- Bumped version to v0.26.2 to reflect these significant improvements.
2025-11-06 18:21:00 +00:00


# Message Queue Fix
## Issue Discovered
When running the subscription test, the relay logs showed:
```
⚠️ ws->10.0.0.2 message queue full, dropping message (capacity=100)
```
## Root Cause
The `messageProcessor` goroutine was processing messages **synchronously**, one at a time:
```go
// BEFORE (blocking)
func (l *Listener) messageProcessor() {
	for {
		select {
		case req := <-l.messageQueue:
			l.HandleMessage(req.data, req.remote) // blocks until done
		}
	}
}
```
**Problem:**
- `HandleMessage` (e.g. when it dispatches to `HandleReq`) can take several seconds (database queries, event delivery)
- While one message is being processed, new messages pile up in the queue
- Queue fills up (100 message capacity)
- New messages get dropped
## Solution
Process messages **concurrently** by launching each in its own goroutine (khatru pattern):
```go
// AFTER (concurrent)
func (l *Listener) messageProcessor() {
	for {
		select {
		case req := <-l.messageQueue:
			go l.HandleMessage(req.data, req.remote) // non-blocking
		}
	}
}
```
**Benefits:**
- Multiple messages can be processed simultaneously
- Fast operations (CLOSE, AUTH) don't wait behind slow operations (REQ)
- Queue rarely fills up
- No message drops
## khatru Pattern
This matches how khatru handles messages:
1. **Sequential parsing** (in read loop) - Parser state can't be shared
2. **Concurrent handling** (separate goroutines) - Each message independent
From khatru:
```go
// Parse message (sequential, in read loop)
envelope, err := smp.ParseMessage(message)

// Handle message (concurrent, in goroutine)
go func(message string) {
	switch env := envelope.(type) {
	case *nostr.EventEnvelope:
		handleEvent(ctx, ws, env, rl)
	case *nostr.ReqEnvelope:
		handleReq(ctx, ws, env, rl)
	// ...
	}
}(message)
```
## Files Changed
- `app/listener.go:199` - Added `go` keyword before `l.HandleMessage()`
## Impact
**Before:**
- Message queue filled up quickly
- Messages dropped under load
- Slow operations blocked everything
**After:**
- Messages processed concurrently
- Queue rarely fills up
- Each message type processed at its own pace
## Testing
```bash
# Build with fix
go build -o orly
# Run relay
./orly
# Run subscription test (should not see queue warnings)
./subscription-test-simple -duration 120
```
## Performance Notes
**Goroutine overhead:** Minimal (~2KB per goroutine)
- Modern Go runtime handles thousands of goroutines efficiently
- Typical connection: 1-5 concurrent goroutines at a time
- Under load: Goroutines naturally throttle based on CPU/IO capacity
**Message ordering:** No longer guaranteed within a connection
- This is fine for Nostr protocol (messages are independent)
- Each message type can complete at its own pace
- Matches khatru behavior
## Summary
The message queue was filling up because messages were processed synchronously. By processing them concurrently (one goroutine per message), we match khatru's proven architecture and eliminate message drops.
**Status:** ✅ Fixed in app/listener.go:199