Neo4j Database Backend for ORLY Relay

Overview

The Neo4j database backend provides a graph-native storage solution for the ORLY Nostr relay. Unlike traditional key-value or document stores, Neo4j is optimized for relationship-heavy queries, making it an ideal fit for Nostr's social graph and event reference patterns.

Architecture

Core Components

  1. Main Database File (pkg/neo4j/neo4j.go)

    • Implements the database.Database interface
    • Manages Neo4j driver connection and lifecycle
    • Uses Badger for metadata storage (markers, identity, subscriptions)
    • Registers with the database factory via init()
  2. Schema Management (pkg/neo4j/schema.go)

    • Defines Neo4j constraints and indexes using Cypher
    • Creates unique constraints on Event IDs and Author pubkeys
    • Creates indexes on kind, created_at, and tags for query performance
  3. Query Engine (pkg/neo4j/query-events.go)

    • Translates Nostr REQ filters to Cypher queries
    • Leverages graph traversal for tag relationships
    • Supports prefix matching for IDs and pubkeys
    • Uses parameterized queries, which prevents Cypher injection and lets Neo4j reuse cached query plans
  4. Event Storage (pkg/neo4j/save-event.go)

    • Stores events as nodes with properties
    • Creates graph relationships:
      • AUTHORED_BY: Event → Author
      • REFERENCES: Event → Event (e-tags)
      • MENTIONS: Event → Author (p-tags)
      • TAGGED_WITH: Event → Tag

Graph Schema

Node Types

Event Node

(:Event {
  id: string,           // Hex-encoded event ID (32 bytes, 64 hex characters)
  serial: int,          // Sequential serial number
  kind: int,            // Event kind
  created_at: int,      // Unix timestamp
  content: string,      // Event content
  sig: string,          // Hex-encoded signature
  pubkey: string,       // Hex-encoded author pubkey
  tags: string          // JSON-encoded tags array
})
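
As an illustration of how an event could be flattened into these node properties, here is a minimal Go sketch. The `Event` type and `NodeProps` function are hypothetical and not taken from the ORLY codebase; the only detail carried over from the schema above is that tags are stored as one JSON-encoded string.

```go
package main

import "encoding/json"

// Event mirrors the Event node properties listed above. The type and its
// field names are hypothetical; the relay's own event type may differ.
type Event struct {
	ID        string
	Serial    int64
	Kind      int64
	CreatedAt int64
	Content   string
	Sig       string
	Pubkey    string
	Tags      [][]string
}

// NodeProps flattens an Event into a Cypher parameter map. Tags are
// JSON-encoded because the schema stores them in a single string property.
func NodeProps(ev Event) (map[string]any, error) {
	tags := ev.Tags
	if tags == nil {
		tags = [][]string{} // encode as "[]" instead of "null"
	}
	raw, err := json.Marshal(tags)
	if err != nil {
		return nil, err
	}
	return map[string]any{
		"id":         ev.ID,
		"serial":     ev.Serial,
		"kind":       ev.Kind,
		"created_at": ev.CreatedAt,
		"content":    ev.Content,
		"sig":        ev.Sig,
		"pubkey":     ev.Pubkey,
		"tags":       string(raw),
	}, nil
}
```

The resulting map could be passed as a single parameter to a statement such as `CREATE (e:Event) SET e = $props`.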

Author Node

(:Author {
  pubkey: string        // Hex-encoded pubkey (unique)
})

Tag Node

(:Tag {
  type: string,         // Tag type (e.g., "t", "d")
  value: string         // Tag value
})

Marker Node (for metadata)

(:Marker {
  key: string,          // Unique key
  value: string         // Hex-encoded value
})

Relationships

  • (:Event)-[:AUTHORED_BY]->(:Author) - Event authorship
  • (:Event)-[:REFERENCES]->(:Event) - Event references (e-tags)
  • (:Event)-[:MENTIONS]->(:Author) - Author mentions (p-tags)
  • (:Event)-[:TAGGED_WITH]->(:Tag) - Generic tag associations
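
The mapping from an event's tags to these relationships can be sketched as below. The `Rel` type and `RelationshipsFor` function are hypothetical names used only for illustration. The sketch assumes that e-tags and p-tags produce only REFERENCES and MENTIONS edges and that every other tag becomes a Tag node; whether the real implementation also creates Tag nodes for e and p tags is not stated here.

```go
package main

// Rel describes one outgoing relationship to create for an event.
// This type is illustrative and not part of the ORLY codebase.
type Rel struct {
	Type    string // AUTHORED_BY, REFERENCES, MENTIONS, or TAGGED_WITH
	Label   string // label of the target node: Author, Event, or Tag
	TagType string // tag type for Tag targets, empty otherwise
	Value   string // pubkey, event ID, or tag value
}

// RelationshipsFor lists the relationships implied by an event's author
// and tags. Tags with fewer than two elements carry no value and are skipped.
func RelationshipsFor(pubkey string, tags [][]string) []Rel {
	rels := []Rel{{Type: "AUTHORED_BY", Label: "Author", Value: pubkey}}
	for _, tag := range tags {
		if len(tag) < 2 {
			continue
		}
		switch tag[0] {
		case "e":
			rels = append(rels, Rel{Type: "REFERENCES", Label: "Event", Value: tag[1]})
		case "p":
			rels = append(rels, Rel{Type: "MENTIONS", Label: "Author", Value: tag[1]})
		default:
			rels = append(rels, Rel{Type: "TAGGED_WITH", Label: "Tag", TagType: tag[0], Value: tag[1]})
		}
	}
	return rels
}
```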

How Nostr REQ Messages Are Implemented

Filter to Cypher Translation

The query engine in query-events.go translates Nostr filters to Cypher queries:

1. ID Filters

{"ids": ["abc123..."]}

Becomes:

MATCH (e:Event)
WHERE e.id = $id_0

For prefix matching (partial IDs):

WHERE e.id STARTS WITH $id_0
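
One way to choose between the exact and prefix forms is to compare the length of each ID against the full 64-character hex length. The Go sketch below is an assumption about how that choice might look. `IDConditions` is a hypothetical helper, and joining several IDs with OR is a design choice of this example, not something confirmed by the source.

```go
package main

import (
	"fmt"
	"strings"
)

// fullIDLen is the length of a complete hex-encoded 32-byte event ID.
const fullIDLen = 64

// IDConditions builds a parenthesized WHERE fragment and its parameters.
// Full-length IDs use equality; shorter ones use STARTS WITH.
// It returns an empty fragment when no IDs are given.
func IDConditions(ids []string) (string, map[string]any) {
	params := make(map[string]any, len(ids))
	if len(ids) == 0 {
		return "", params
	}
	conds := make([]string, 0, len(ids))
	for i, id := range ids {
		name := fmt.Sprintf("id_%d", i)
		op := "="
		if len(id) < fullIDLen {
			op = "STARTS WITH"
		}
		conds = append(conds, fmt.Sprintf("e.id %s $%s", op, name))
		params[name] = id
	}
	return "(" + strings.Join(conds, " OR ") + ")", params
}
```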

2. Author Filters

{"authors": ["pubkey1...", "pubkey2..."]}

Becomes:

MATCH (e:Event)
WHERE e.pubkey IN $authors

3. Kind Filters

{"kinds": [1, 7]}

Becomes:

MATCH (e:Event)
WHERE e.kind IN $kinds

4. Time Range Filters

{"since": 1234567890, "until": 1234567900}

Becomes:

MATCH (e:Event)
WHERE e.created_at >= $since AND e.created_at <= $until

5. Tag Filters (Graph Advantage!)

{"#t": ["bitcoin", "nostr"]}

Becomes:

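MATCH (e:Event)
MATCH (e)-[:TAGGED_WITH]->(t0:Tag)
WHERE t0.type = $tagType_0 AND t0.value IN $tagValues_0

This leverages Neo4j's native graph traversal for efficient tag queries. The tag pattern uses a plain MATCH because a WHERE clause attached to an OPTIONAL MATCH only constrains the optional pattern and would leave events without a matching tag in the result.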

6. Combined Filters

{
  "kinds": [1],
  "authors": ["abc..."],
  "#p": ["xyz..."],
  "limit": 50
}

Becomes:

MATCH (e:Event)
MATCH (e)-[:TAGGED_WITH]->(t0:Tag)
WHERE e.kind IN $kinds
  AND e.pubkey IN $authors
  AND t0.type = $tagType_0
  AND t0.value IN $tagValues_0
RETURN e.id, e.kind, e.created_at, e.content, e.sig, e.pubkey, e.tags
ORDER BY e.created_at DESC
LIMIT $limit
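
The translation of a combined filter can be sketched in Go as follows. The `Filter` type and `BuildCypher` function are simplified, hypothetical stand-ins for whatever query-events.go actually defines. The sketch uses MATCH rather than OPTIONAL MATCH for tag patterns so that the WHERE clause filters out events that lack the tag, and it sorts tag keys only to make the generated text deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Filter is a simplified stand-in for a Nostr REQ filter (hypothetical type).
type Filter struct {
	Authors []string
	Kinds   []int64
	Since   *int64
	Until   *int64
	Tags    map[string][]string // key is the tag letter without '#'
	Limit   int64
}

// BuildCypher returns a parameterized query and its parameter map.
func BuildCypher(f Filter) (string, map[string]any) {
	params := map[string]any{}
	matches := []string{"MATCH (e:Event)"}
	var conds []string

	if len(f.Kinds) > 0 {
		conds = append(conds, "e.kind IN $kinds")
		params["kinds"] = f.Kinds
	}
	if len(f.Authors) > 0 {
		conds = append(conds, "e.pubkey IN $authors")
		params["authors"] = f.Authors
	}
	if f.Since != nil {
		conds = append(conds, "e.created_at >= $since")
		params["since"] = *f.Since
	}
	if f.Until != nil {
		conds = append(conds, "e.created_at <= $until")
		params["until"] = *f.Until
	}

	keys := make([]string, 0, len(f.Tags))
	for k := range f.Tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for i, k := range keys {
		matches = append(matches, fmt.Sprintf("MATCH (e)-[:TAGGED_WITH]->(t%d:Tag)", i))
		conds = append(conds,
			fmt.Sprintf("t%d.type = $tagType_%d", i, i),
			fmt.Sprintf("t%d.value IN $tagValues_%d", i, i))
		params[fmt.Sprintf("tagType_%d", i)] = k
		params[fmt.Sprintf("tagValues_%d", i)] = f.Tags[k]
	}

	var b strings.Builder
	b.WriteString(strings.Join(matches, "\n"))
	if len(conds) > 0 {
		b.WriteString("\nWHERE " + strings.Join(conds, "\n  AND "))
	}
	b.WriteString("\nRETURN e.id, e.kind, e.created_at, e.content, e.sig, e.pubkey, e.tags")
	b.WriteString("\nORDER BY e.created_at DESC")
	if f.Limit > 0 {
		b.WriteString("\nLIMIT $limit")
		params["limit"] = f.Limit
	}
	return b.String(), params
}
```

Because every user-supplied value travels in the parameter map, none of it is ever concatenated into the query text.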

Query Execution Flow

  1. Parse Filter: Extract IDs, authors, kinds, times, tags
  2. Build Cypher: Construct parameterized query with MATCH/WHERE clauses
  3. Execute: Run via ExecuteRead() with read-only session
  4. Parse Results: Convert Neo4j records to Nostr events
  5. Return: Send events back to client
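
Step 4 might look roughly like the Go sketch below. It takes a plain map keyed by the RETURN column names as a stand-in for a driver record, so it runs without a database. `Event` and `EventFromRecord` are hypothetical names. The sketch relies on the Neo4j Go driver returning Cypher integers as `int64`, and it decodes the JSON-encoded tags string back into a nested slice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a minimal, hypothetical event type for this example.
type Event struct {
	ID, Content, Sig, Pubkey string
	Kind, CreatedAt          int64
	Tags                     [][]string
}

// EventFromRecord converts one result row, keyed by column name, to an Event.
// Missing columns and unexpected types are reported as errors.
func EventFromRecord(rec map[string]any) (Event, error) {
	var ev Event
	for key, dst := range map[string]*string{
		"e.id": &ev.ID, "e.content": &ev.Content, "e.sig": &ev.Sig, "e.pubkey": &ev.Pubkey,
	} {
		s, ok := rec[key].(string)
		if !ok {
			return Event{}, fmt.Errorf("column %q: missing or not a string", key)
		}
		*dst = s
	}
	for key, dst := range map[string]*int64{
		"e.kind": &ev.Kind, "e.created_at": &ev.CreatedAt,
	} {
		n, ok := rec[key].(int64)
		if !ok {
			return Event{}, fmt.Errorf("column %q: missing or not an integer", key)
		}
		*dst = n
	}
	tagsJSON, ok := rec["e.tags"].(string)
	if !ok {
		return Event{}, fmt.Errorf("column %q: missing or not a string", "e.tags")
	}
	if err := json.Unmarshal([]byte(tagsJSON), &ev.Tags); err != nil {
		return Event{}, fmt.Errorf("column %q: %w", "e.tags", err)
	}
	return ev, nil
}
```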

Configuration

Environment Variables

# Neo4j Connection
ORLY_NEO4J_URI="bolt://localhost:7687"
ORLY_NEO4J_USER="neo4j"
ORLY_NEO4J_PASSWORD="password"

# Database Type Selection
ORLY_DB_TYPE="neo4j"

# Data Directory (for Badger metadata storage)
ORLY_DATA_DIR="$HOME/.local/share/ORLY"
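
A relay process could read these variables along the lines of the Go sketch below. `Config`, `LoadConfig`, and `expandHome` are hypothetical names, and treating the example values above as fallback defaults is an assumption of this sketch, not documented behaviour. The lookup function is injected so the logic can be exercised without touching the real environment.

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
)

// Config holds the Neo4j-related settings (hypothetical type).
type Config struct {
	URI, User, Password, DataDir string
}

// expandHome replaces a leading "~" with the user's home directory, since
// a "~" inside a quoted environment value is not expanded by the shell.
func expandHome(path string) string {
	if path != "~" && !strings.HasPrefix(path, "~/") {
		return path
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return path // leave unchanged if the home directory is unknown
	}
	return filepath.Join(home, strings.TrimPrefix(path, "~"))
}

// LoadConfig reads settings through lookup, which has the same signature
// as os.LookupEnv. Unset or empty variables fall back to example defaults.
func LoadConfig(lookup func(string) (string, bool)) Config {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}
	return Config{
		URI:      get("ORLY_NEO4J_URI", "bolt://localhost:7687"),
		User:     get("ORLY_NEO4J_USER", "neo4j"),
		Password: get("ORLY_NEO4J_PASSWORD", "password"),
		DataDir:  expandHome(get("ORLY_DATA_DIR", "~/.local/share/ORLY")),
	}
}
```

In a real program the call would be `LoadConfig(os.LookupEnv)`.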

Example Docker Compose Setup

version: '3.8'
services:
  neo4j:
    image: neo4j:5.15
    ports:
      - "7474:7474"  # HTTP
      - "7687:7687"  # Bolt
    environment:
      - NEO4J_AUTH=neo4j/password
      - NEO4J_PLUGINS=["apoc"]
    volumes:
      - neo4j_data:/data
      - neo4j_logs:/logs

  orly:
    build: .
    ports:
      - "3334:3334"
    environment:
      - ORLY_DB_TYPE=neo4j
      - ORLY_NEO4J_URI=bolt://neo4j:7687
      - ORLY_NEO4J_USER=neo4j
      - ORLY_NEO4J_PASSWORD=password
    depends_on:
      - neo4j

volumes:
  neo4j_data:
  neo4j_logs:

Performance Considerations

Advantages Over Badger/DGraph

  1. Native Graph Queries: Tag relationships and social graph traversals are native operations
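  2. Optimized Indexes: Uniqueness constraints are backed by indexes, and the query planner uses available indexes automatically
  3. Efficient Joins: Following a relationship from a node costs the same regardless of total graph size (index-free adjacency), so traversals avoid a per-hop index lookup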
  4. Query Planner: Neo4j's query planner optimizes complex multi-filter queries

Tuning Recommendations

  1. Indexes: The schema creates indexes for:

    • Event ID (unique constraint + index)
    • Event kind
    • Event created_at
    • Composite: kind + created_at
    • Tag type + value
  2. Cache Configuration: Configure Neo4j's page cache and heap size:

# neo4j.conf (Neo4j 5.x setting names; Neo4j 4.x used the dbms.memory.* prefix)
server.memory.heap.initial_size=2G
server.memory.heap.max_size=4G
server.memory.pagecache.size=4G
  3. Query Limits: Always use LIMIT in queries to prevent memory exhaustion
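
One simple way to guarantee that every query carries a bounded LIMIT is to clamp the client-supplied value before binding it as `$limit`. The helper below is a hypothetical sketch; the name `clampLimit` and the idea of a relay-wide maximum are assumptions, not documented ORLY settings.

```go
package main

// clampLimit returns the value to bind as $limit. A missing limit or one
// above max falls back to max; a negative limit is treated as zero.
func clampLimit(requested *int64, max int64) int64 {
	if requested == nil || *requested > max {
		return max
	}
	if *requested < 0 {
		return 0
	}
	return *requested
}
```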

Implementation Details

Replaceable Events

Replaceable events (kinds 0, 3, 10000-19999) are handled in WouldReplaceEvent():

MATCH (e:Event {kind: $kind, pubkey: $pubkey})
WHERE e.created_at < $createdAt
RETURN e.serial, e.created_at

Older events are deleted before saving the new one.
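
The kind check and the timestamp comparison from the query above can be expressed in Go as a small sketch. The function names are hypothetical and this is not the body of WouldReplaceEvent(); it only restates the kind ranges and the `e.created_at < $createdAt` condition given in this section.

```go
package main

// IsReplaceable reports whether a kind falls in the replaceable ranges
// named above: 0, 3, and 10000-19999.
func IsReplaceable(kind int) bool {
	return kind == 0 || kind == 3 || (kind >= 10000 && kind <= 19999)
}

// Supersedes mirrors the `e.created_at < $createdAt` condition: a stored
// event is replaced only when the incoming event is strictly newer.
func Supersedes(incomingCreatedAt, storedCreatedAt int64) bool {
	return storedCreatedAt < incomingCreatedAt
}
```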

Parameterized Replaceable Events

For kinds 30000-39999, we also match on the d-tag:

MATCH (e:Event {kind: $kind, pubkey: $pubkey})-[:TAGGED_WITH]->(t:Tag {type: 'd', value: $dValue})
WHERE e.created_at < $createdAt
RETURN e.serial
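
Preparing the parameters for this query requires the value of the event's d-tag. The Go sketch below shows one way to extract it. The function names are hypothetical, and treating a missing d-tag as the empty string is an assumption of this example.

```go
package main

// IsParameterizedReplaceable reports whether a kind is in 30000-39999.
func IsParameterizedReplaceable(kind int) bool {
	return kind >= 30000 && kind <= 39999
}

// DTagValue returns the value of the first "d" tag, or "" when the tag is
// absent or has no value element.
func DTagValue(tags [][]string) string {
	for _, tag := range tags {
		if len(tag) > 0 && tag[0] == "d" {
			if len(tag) > 1 {
				return tag[1]
			}
			return ""
		}
	}
	return ""
}

// ReplaceParams builds the parameter map for the query shown above.
func ReplaceParams(kind int, pubkey string, createdAt int64, tags [][]string) map[string]any {
	return map[string]any{
		"kind":      int64(kind),
		"pubkey":    pubkey,
		"createdAt": createdAt,
		"dValue":    DTagValue(tags),
	}
}
```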

Event Deletion (NIP-09)

Delete events (kind 5) are processed via graph traversal:

MATCH (target:Event {id: $targetId})
MATCH (delete:Event {kind: 5})-[:REFERENCES]->(target)
WHERE delete.pubkey = $pubkey OR delete.pubkey IN $admins
RETURN delete.id

Only same-author or admin deletions are allowed.
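
The authorization rule expressed by the WHERE clause can also be stated as a plain Go predicate. `CanDelete` is a hypothetical name used for illustration; it simply mirrors the "same author or admin" condition.

```go
package main

// CanDelete reports whether a deletion request signed by deleterPubkey may
// remove an event authored by targetPubkey.
func CanDelete(deleterPubkey, targetPubkey string, admins []string) bool {
	if deleterPubkey == targetPubkey {
		return true // authors may delete their own events
	}
	for _, admin := range admins {
		if admin == deleterPubkey {
			return true // admins may delete any event
		}
	}
	return false
}
```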

Comparison with Other Backends

| Feature          | Badger         | DGraph              | Neo4j                  |
|------------------|----------------|---------------------|------------------------|
| Storage Type     | Key-value      | Graph (distributed) | Graph (native)         |
| Query Language   | Custom indexes | DQL                 | Cypher                 |
| Tag Queries      | Index lookups  | Graph traversal     | Native relationships   |
| Scaling          | Single-node    | Distributed         | Cluster/Causal cluster |
| Memory Usage     | Low            | Medium              | High                   |
| Setup Complexity | Minimal        | Medium              | Medium                 |
| Best For         | Small relays   | Large distributed   | Relationship-heavy     |

Development Guide

Adding New Indexes

  1. Update schema.go with new index definition
  2. Add to applySchema() function
  3. Restart relay to apply schema changes

Example:

CREATE FULLTEXT INDEX event_content_fulltext IF NOT EXISTS
FOR (e:Event) ON EACH [e.content]
OPTIONS {indexConfig: {`fulltext.analyzer`: 'english'}}
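
Because each definition uses IF NOT EXISTS, the full list of schema statements can be re-run safely on every startup. The Go sketch below illustrates that pattern under stated assumptions: `schemaStatements` and `applyStatements` are hypothetical names, the two index names are invented for the example, and the `run` callback stands in for whatever executes a write query against Neo4j.

```go
package main

import "fmt"

// schemaStatements lists idempotent schema definitions. Only the constraint
// name event_id_unique appears in this document; the index names are
// made up for this sketch.
var schemaStatements = []string{
	"CREATE CONSTRAINT event_id_unique IF NOT EXISTS FOR (e:Event) REQUIRE e.id IS UNIQUE",
	"CREATE INDEX event_kind IF NOT EXISTS FOR (e:Event) ON (e.kind)",
	"CREATE INDEX event_kind_created_at IF NOT EXISTS FOR (e:Event) ON (e.kind, e.created_at)",
}

// applyStatements runs each statement in order and stops at the first
// failure, wrapping the error with the statement that caused it.
func applyStatements(run func(cypher string) error, stmts []string) error {
	for _, stmt := range stmts {
		if err := run(stmt); err != nil {
			return fmt.Errorf("schema statement %q: %w", stmt, err)
		}
	}
	return nil
}
```

Adding a new index then amounts to appending one more string to the list.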

Custom Queries

To add custom query methods:

  1. Add method to query-events.go
  2. Build Cypher query with parameterization
  3. Use ExecuteRead() or ExecuteWrite() as appropriate
  4. Parse results with parseEventsFromResult()
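
As an example of step 2, a custom query that finds replies to an event could be built like this. `RepliesQuery` is a hypothetical function, not part of query-events.go. It returns the same columns as the filter queries above on the assumption that parseEventsFromResult() expects that shape.

```go
package main

// RepliesQuery builds a parameterized query for events that reference the
// given event ID through a REFERENCES relationship (an e-tag).
func RepliesQuery(eventID string, limit int64) (string, map[string]any) {
	const cypher = `MATCH (e:Event)-[:REFERENCES]->(:Event {id: $id})
RETURN e.id, e.kind, e.created_at, e.content, e.sig, e.pubkey, e.tags
ORDER BY e.created_at DESC
LIMIT $limit`
	return cypher, map[string]any{"id": eventID, "limit": limit}
}
```

The returned query and parameters would then be executed through ExecuteRead(), since the query does not modify the graph.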

Testing

Because the tests depend on Neo4j, they require a running Neo4j instance:

# Start Neo4j via Docker
docker run -d --name neo4j-test \
  -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/test \
  neo4j:5.15

# Run tests
ORLY_NEO4J_URI="bolt://localhost:7687" \
ORLY_NEO4J_USER="neo4j" \
ORLY_NEO4J_PASSWORD="test" \
go test ./pkg/neo4j/...

# Cleanup
docker rm -f neo4j-test

Future Enhancements

  1. Full-text Search: Leverage Neo4j's full-text indexes for content search
  2. Graph Analytics: Implement social graph metrics (centrality, communities)
  3. Advanced Queries: Support NIP-50 search via Cypher full-text capabilities
  4. Clustering: Deploy Neo4j cluster for high availability
  5. APOC Procedures: Utilize APOC library for advanced graph algorithms
  6. Caching Layer: Implement query result caching similar to Badger backend

Troubleshooting

Connection Issues

# Test connectivity
cypher-shell -a bolt://localhost:7687 -u neo4j -p password

# Check Neo4j logs (for the Docker Compose setup above)
docker compose logs neo4j

Performance Issues

// View query execution plan
EXPLAIN MATCH (e:Event) WHERE e.kind = 1 RETURN e LIMIT 10

// Profile query performance
PROFILE MATCH (e:Event)-[:AUTHORED_BY]->(a:Author) RETURN e, a LIMIT 10

Schema Issues

// List all constraints
SHOW CONSTRAINTS

// List all indexes
SHOW INDEXES

// Drop and recreate a constraint (two separate statements)
DROP CONSTRAINT event_id_unique IF EXISTS;
CREATE CONSTRAINT event_id_unique IF NOT EXISTS FOR (e:Event) REQUIRE e.id IS UNIQUE;

License

This Neo4j backend implementation follows the same license as the ORLY relay project.