Implement comprehensive WebSocket subscription stability fixes
- Resolved critical issues causing subscriptions to drop after 30-60 seconds due to unconsumed receiver channels.
- Introduced per-subscription consumer goroutines to ensure continuous event delivery and prevent channel overflow.
- Enhanced REQ parsing to handle both wrapped and unwrapped filter arrays, eliminating EOF errors.
- Updated publisher logic to correctly send events to receiver channels, ensuring proper event delivery to subscribers.
- Added extensive documentation and testing tools to verify subscription stability and performance.
- Bumped version to v0.26.2 to reflect these significant improvements.

MESSAGE_QUEUE_FIX.md (new file, 119 lines):

# Message Queue Fix

## Issue Discovered

When running the subscription test, the relay logs showed:

```
⚠️ ws->10.0.0.2 message queue full, dropping message (capacity=100)
```

## Root Cause

The `messageProcessor` goroutine was processing messages **synchronously**, one at a time:

```go
// BEFORE (blocking)
func (l *Listener) messageProcessor() {
	for {
		select {
		case req := <-l.messageQueue:
			l.HandleMessage(req.data, req.remote) // BLOCKS until done
		}
	}
}
```

**Problem:**
- `HandleMessage` → `HandleReq` can take several seconds (database queries, event delivery)
- While one message is being processed, new messages pile up in the queue
- The queue fills up (100-message capacity)
- New messages get dropped (the enqueue path is sketched below)
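
For context, here is a minimal sketch of the drop-on-full enqueue that produces the warning above, assuming a bounded channel. The names `queuedMsg` and `enqueue` and the field layout are illustrative stand-ins, not the relay's actual code:

```go
package main

import "fmt"

// queuedMsg is a hypothetical stand-in for whatever the listener queues.
type queuedMsg struct {
	data   []byte
	remote string
}

// enqueue attempts a non-blocking send to a bounded channel and drops the
// message when the channel is full, which is what the relay's
// "message queue full, dropping message" warning indicates.
func enqueue(queue chan queuedMsg, msg queuedMsg) bool {
	select {
	case queue <- msg:
		return true
	default:
		fmt.Printf("ws->%s message queue full, dropping message (capacity=%d)\n",
			msg.remote, cap(queue))
		return false
	}
}

func main() {
	q := make(chan queuedMsg, 100) // same capacity as the relay's queue
	enqueue(q, queuedMsg{data: []byte("REQ"), remote: "10.0.0.2"})
}
```
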
## Solution

Process messages **concurrently** by launching each in its own goroutine (the khatru pattern):

```go
// AFTER (concurrent)
func (l *Listener) messageProcessor() {
	for {
		select {
		case req := <-l.messageQueue:
			go l.HandleMessage(req.data, req.remote) // NON-BLOCKING
		}
	}
}
```

**Benefits:**
- Multiple messages can be processed simultaneously
- Fast operations (CLOSE, AUTH) don't wait behind slow operations (REQ); a toy demo follows this list
- Queue rarely fills up
- No message drops
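
To make the CLOSE/AUTH point concrete, here is a self-contained toy demo; `handleReq` and `handleClose` are invented stand-ins for a slow and a fast handler and assume nothing about the relay's code:

```go
package main

import (
	"fmt"
	"time"
)

// handleReq simulates a slow REQ: database queries plus event delivery.
func handleReq(id string) {
	time.Sleep(2 * time.Second)
	fmt.Println("REQ done:", id)
}

// handleClose simulates a fast CLOSE that should not wait behind a REQ.
func handleClose(id string) {
	fmt.Println("CLOSE done:", id)
}

func main() {
	done := make(chan struct{}, 2)
	// Launched concurrently, the CLOSE completes immediately instead of
	// queueing behind the 2-second REQ.
	go func() { handleReq("sub1"); done <- struct{}{} }()
	go func() { handleClose("sub1"); done <- struct{}{} }()
	<-done
	<-done
}
```
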
## khatru Pattern

This matches how khatru handles messages:

1. **Sequential parsing** (in the read loop) - parser state can't be shared
2. **Concurrent handling** (separate goroutines) - each message is independent

From khatru:

```go
// Parse message (sequential, in read loop)
envelope, err := smp.ParseMessage(message)

// Handle message (concurrent, in goroutine)
go func(message string) {
	switch env := envelope.(type) {
	case *nostr.EventEnvelope:
		handleEvent(ctx, ws, env, rl)
	case *nostr.ReqEnvelope:
		handleReq(ctx, ws, env, rl)
	// ...
	}
}(message)
```

## Files Changed

- `app/listener.go:199` - Added the `go` keyword before `l.HandleMessage()`

## Impact

**Before:**
- Message queue filled up quickly
- Messages dropped under load
- Slow operations blocked everything

**After:**
- Messages processed concurrently
- Queue rarely fills up
- Each message type processed at its own pace

## Testing

```bash
# Build with fix
go build -o orly

# Run relay
./orly

# Run subscription test (should not see queue warnings)
./subscription-test-simple -duration 120
```

## Performance Notes

**Goroutine overhead:** minimal (~2 KB of initial stack per goroutine)
- Modern Go runtime handles thousands of goroutines efficiently
- Typical connection: 1-5 concurrent goroutines at a time
- Under load: goroutines naturally throttle based on CPU/IO capacity (an explicit cap is sketched below, should one ever be needed)
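
If explicit back-pressure were ever wanted instead of relying on natural throttling, a buffered channel works as a semaphore. This is a hypothetical hardening, not part of the fix; `processBounded` and its parameters are invented for the sketch:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// processBounded drains a queue with at most maxInFlight concurrent
// handlers; the buffered channel sem acts as a counting semaphore.
func processBounded(queue <-chan string, maxInFlight int) {
	sem := make(chan struct{}, maxInFlight)
	var wg sync.WaitGroup
	for msg := range queue {
		sem <- struct{}{} // acquire; blocks once maxInFlight handlers run
		wg.Add(1)
		go func(m string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			time.Sleep(10 * time.Millisecond) // stand-in for HandleMessage
			fmt.Println("handled:", m)
		}(msg)
	}
	wg.Wait()
}

func main() {
	q := make(chan string, 100)
	for i := 0; i < 5; i++ {
		q <- fmt.Sprintf("msg-%d", i)
	}
	close(q)
	processBounded(q, 2)
}
```
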
**Message ordering:** no longer guaranteed within a connection
- This is fine for the Nostr protocol (messages are independent)
- Each message type can complete at its own pace
- Matches khatru behavior

## Summary

The message queue was filling up because messages were processed synchronously. By processing them concurrently (one goroutine per message), we match khatru's proven architecture and eliminate message drops.

**Status:** ✅ Fixed in app/listener.go:199