Files
next.orly.dev/MESSAGE_QUEUE_FIX.md
mleku 581e0ec588
Some checks failed
Go / build (push) Has been cancelled
Go / release (push) Has been cancelled
Implement comprehensive WebSocket subscription stability fixes
- Resolved critical issues causing subscriptions to drop after 30-60 seconds due to unconsumed receiver channels.
- Introduced per-subscription consumer goroutines to ensure continuous event delivery and prevent channel overflow.
- Enhanced REQ parsing to handle both wrapped and unwrapped filter arrays, eliminating EOF errors.
- Updated publisher logic to correctly send events to receiver channels, ensuring proper event delivery to subscribers.
- Added extensive documentation and testing tools to verify subscription stability and performance.
- Bumped version to v0.26.2 to reflect these significant improvements.
2025-11-06 18:21:00 +00:00

3.0 KiB

Message Queue Fix

Issue Discovered

When running the subscription test, the relay logs showed:

⚠️ ws->10.0.0.2 message queue full, dropping message (capacity=100)

Root Cause

The messageProcessor goroutine was processing messages synchronously, one at a time:

// BEFORE (blocking)
func (l *Listener) messageProcessor() {
    for {
        case req := <-l.messageQueue:
            l.HandleMessage(req.data, req.remote)  // BLOCKS until done
    }
}

Problem:

  • HandleMessageHandleReq can take several seconds (database queries, event delivery)
  • While one message is being processed, new messages pile up in the queue
  • Queue fills up (100 message capacity)
  • New messages get dropped

Solution

Process messages concurrently by launching each in its own goroutine (khatru pattern):

// AFTER (concurrent)
func (l *Listener) messageProcessor() {
    for {
        case req := <-l.messageQueue:
            go l.HandleMessage(req.data, req.remote)  // NON-BLOCKING
    }
}

Benefits:

  • Multiple messages can be processed simultaneously
  • Fast operations (CLOSE, AUTH) don't wait behind slow operations (REQ)
  • Queue rarely fills up
  • No message drops

khatru Pattern

This matches how khatru handles messages:

  1. Sequential parsing (in read loop) - Parser state can't be shared
  2. Concurrent handling (separate goroutines) - Each message independent

From khatru:

// Parse message (sequential, in read loop)
envelope, err := smp.ParseMessage(message)

// Handle message (concurrent, in goroutine)
go func(message string) {
    switch env := envelope.(type) {
    case *nostr.EventEnvelope:
        handleEvent(ctx, ws, env, rl)
    case *nostr.ReqEnvelope:
        handleReq(ctx, ws, env, rl)
    // ...
    }
}(message)

Files Changed

  • app/listener.go:199 - Added go keyword before l.HandleMessage()

Impact

Before:

  • Message queue filled up quickly
  • Messages dropped under load
  • Slow operations blocked everything

After:

  • Messages processed concurrently
  • Queue rarely fills up
  • Each message type processed at its own pace

Testing

# Build with fix
go build -o orly

# Run relay
./orly

# Run subscription test (should not see queue warnings)
./subscription-test-simple -duration 120

Performance Notes

Goroutine overhead: Minimal (~2KB per goroutine)

  • Modern Go runtime handles thousands of goroutines efficiently
  • Typical connection: 1-5 concurrent goroutines at a time
  • Under load: Goroutines naturally throttle based on CPU/IO capacity

Message ordering: No longer guaranteed within a connection

  • This is fine for Nostr protocol (messages are independent)
  • Each message type can complete at its own pace
  • Matches khatru behavior

Summary

The message queue was filling up because messages were processed synchronously. By processing them concurrently (one goroutine per message), we match khatru's proven architecture and eliminate message drops.

Status: Fixed in app/listener.go:199