- Resolved critical issues causing subscriptions to drop after 30-60 seconds due to unconsumed receiver channels.
- Introduced per-subscription consumer goroutines to ensure continuous event delivery and prevent channel overflow.
- Enhanced REQ parsing to handle both wrapped and unwrapped filter arrays, eliminating EOF errors.
- Updated publisher logic to correctly send events to receiver channels, ensuring proper event delivery to subscribers.
- Added extensive documentation and testing tools to verify subscription stability and performance.
- Bumped version to v0.26.2 to reflect these improvements.
# Message Queue Fix
## Issue Discovered
When running the subscription test, the relay logs showed:
```
⚠️ ws->10.0.0.2 message queue full, dropping message (capacity=100)
```
## Root Cause
The `messageProcessor` goroutine was processing messages **synchronously**, one at a time:
```go
// BEFORE (blocking)
func (l *Listener) messageProcessor() {
	for {
		select {
		case req := <-l.messageQueue:
			l.HandleMessage(req.data, req.remote) // BLOCKS until done
		}
	}
}
```

**Problem:**

- `HandleMessage` → `HandleReq` can take several seconds (database queries, event delivery)
- While one message is being processed, new messages pile up in the queue
- The queue fills to its 100-message capacity
- New messages are dropped
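The overflow path above can be reproduced in miniature. The sketch below uses hypothetical names (not the relay's actual types): a bounded channel with the same 100-message capacity, a producer that drops on overflow, and a consumer that never drains because it is stuck in a slow handler:

```go
// Toy reproduction of the failure mode (hypothetical names, not the relay's
// actual code): a bounded queue whose consumer is blocked inside a slow
// handler while the producer keeps enqueueing -- the same pattern behind
// the "message queue full, dropping message" warning.
package main

import "fmt"

func main() {
	queue := make(chan string, 100) // same capacity as the relay's queue

	// The consumer is assumed to be stuck in a slow HandleMessage call,
	// so nothing is drained while 300 messages arrive.
	dropped := 0
	for i := 0; i < 300; i++ {
		select {
		case queue <- "msg": // room in the queue
		default: // queue full: drop, as the relay's writer does
			dropped++
		}
	}
	fmt.Println("dropped:", dropped)
}
```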
## Solution
Process messages **concurrently** by launching each in its own goroutine (khatru pattern):

```go
// AFTER (concurrent)
func (l *Listener) messageProcessor() {
	for {
		select {
		case req := <-l.messageQueue:
			go l.HandleMessage(req.data, req.remote) // NON-BLOCKING
		}
	}
}
```

**Benefits:**

- Multiple messages can be processed simultaneously
- Fast operations (CLOSE, AUTH) don't wait behind slow operations (REQ)
- The queue rarely fills up
- No message drops
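If unbounded goroutine growth ever became a concern, a common variant of this pattern caps concurrency with a semaphore channel. This is only a sketch of that alternative under assumed names, not what the fix implements:

```go
// Sketch of a bounded variant (hypothetical; the actual fix uses a plain
// `go l.HandleMessage(...)` with no cap): a semaphore channel limits how
// many handlers run at once while still keeping the queue drained.
package main

import (
	"fmt"
	"sync"
)

func main() {
	queue := make(chan int, 100)
	sem := make(chan struct{}, 8) // at most 8 concurrent handlers
	var wg sync.WaitGroup

	for i := 0; i < 20; i++ {
		queue <- i
	}
	close(queue)

	for msg := range queue {
		sem <- struct{}{} // acquire a slot (blocks only when 8 are busy)
		wg.Add(1)
		go func(m int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			_ = m                    // stand-in for handling the message
		}(msg)
	}
	wg.Wait()
	fmt.Println("all handled")
}
```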
## khatru Pattern
This matches how khatru handles messages:

1. **Sequential parsing** (in the read loop) - parser state can't be shared
2. **Concurrent handling** (separate goroutines) - each message is independent
From khatru:
```go
// Parse message (sequential, in read loop)
envelope, err := smp.ParseMessage(message)

// Handle message (concurrent, in goroutine)
go func(message string) {
	switch env := envelope.(type) {
	case *nostr.EventEnvelope:
		handleEvent(ctx, ws, env, rl)
	case *nostr.ReqEnvelope:
		handleReq(ctx, ws, env, rl)
	// ...
	}
}(message)
```
## Files Changed
- `app/listener.go:199` - Added `go` keyword before `l.HandleMessage()`
## Impact
**Before:**

- Message queue filled up quickly
- Messages dropped under load
- Slow operations blocked everything

**After:**

- Messages processed concurrently
- Queue rarely fills up
- Each message type completes at its own pace
## Testing
```bash
# Build with fix
go build -o orly

# Run relay
./orly

# Run subscription test (should not see queue warnings)
./subscription-test-simple -duration 120
```
## Performance Notes
**Goroutine overhead:** Minimal (~2 KB initial stack per goroutine)

- The modern Go runtime handles thousands of goroutines efficiently
- A typical connection has 1-5 concurrent goroutines at a time
- Under load, goroutines naturally throttle based on CPU/IO capacity

**Message ordering:** No longer guaranteed within a connection

- This is fine for the Nostr protocol (messages are independent)
- Each message type can complete at its own pace
- Matches khatru's behavior
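The ordering trade-off can be seen directly: with one goroutine per message, a slow handler dispatched first finishes after a fast one dispatched second. A minimal sketch (hypothetical handler names and durations):

```go
// Sketch showing why per-message goroutines give up ordering: a slow
// handler dispatched first completes after a fast one dispatched second.
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	var done []string
	var wg sync.WaitGroup

	handle := func(name string, d time.Duration) {
		defer wg.Done()
		time.Sleep(d) // stand-in for handler work
		mu.Lock()
		done = append(done, name)
		mu.Unlock()
	}

	wg.Add(2)
	go handle("REQ (slow)", 50*time.Millisecond)  // dispatched first
	go handle("CLOSE (fast)", 1*time.Millisecond) // dispatched second
	wg.Wait()

	fmt.Println(done) // completion order, not dispatch order
}
```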
## Summary
The message queue was filling up because messages were processed synchronously. By processing them concurrently (one goroutine per message), we match khatru's proven architecture and eliminate message drops.
**Status:** ✅ Fixed in app/listener.go:199