# Neo4j Database Backend for ORLY Relay

## Overview

The Neo4j database backend provides a graph-native storage solution for the ORLY Nostr relay. Unlike traditional key-value or document stores, Neo4j is optimized for relationship-heavy queries, making it an ideal fit for Nostr's social graph and event reference patterns.

## Architecture

### Core Components

1. **Main Database File** ([pkg/neo4j/neo4j.go](../pkg/neo4j/neo4j.go))
   - Implements the `database.Database` interface
   - Manages Neo4j driver connection and lifecycle
   - Uses Badger for metadata storage (markers, identity, subscriptions)
   - Registers with the database factory via `init()`

2. **Schema Management** ([pkg/neo4j/schema.go](../pkg/neo4j/schema.go))
   - Defines Neo4j constraints and indexes using Cypher
   - Creates unique constraints on Event IDs and Author pubkeys
   - Indexes for optimal query performance (kind, created_at, tags)

3. **Query Engine** ([pkg/neo4j/query-events.go](../pkg/neo4j/query-events.go))
   - Translates Nostr REQ filters to Cypher queries
   - Leverages graph traversal for tag relationships
   - Supports prefix matching for IDs and pubkeys
   - Parameterized queries for security and performance

4. **Event Storage** ([pkg/neo4j/save-event.go](../pkg/neo4j/save-event.go))
   - Stores events as nodes with properties
   - Creates graph relationships:
     - `AUTHORED_BY`: Event → Author
     - `REFERENCES`: Event → Event (e-tags)
     - `MENTIONS`: Event → Author (p-tags)
     - `TAGGED_WITH`: Event → Tag

## Graph Schema

### Node Types

**Event Node**

```cypher
(:Event {
  id: string,         // Hex-encoded event ID (32 bytes)
  serial: int,        // Sequential serial number
  kind: int,          // Event kind
  created_at: int,    // Unix timestamp
  content: string,    // Event content
  sig: string,        // Hex-encoded signature
  pubkey: string,     // Hex-encoded author pubkey
  tags: string        // JSON-encoded tags array
})
```

**Author Node**

```cypher
(:Author {
  pubkey: string      // Hex-encoded pubkey (unique)
})
```

**Tag Node**

```cypher
(:Tag {
  type: string,       // Tag type (e.g., "t", "d")
  value: string       // Tag value
})
```

**Marker Node** (for metadata)

```cypher
(:Marker {
  key: string,        // Unique key
  value: string       // Hex-encoded value
})
```

### Relationships

- `(:Event)-[:AUTHORED_BY]->(:Author)` - Event authorship
- `(:Event)-[:REFERENCES]->(:Event)` - Event references (e-tags)
- `(:Event)-[:MENTIONS]->(:Author)` - Author mentions (p-tags)
- `(:Event)-[:TAGGED_WITH]->(:Tag)` - Generic tag associations

## How Nostr REQ Messages Are Implemented

### Filter to Cypher Translation

The query engine in [query-events.go](../pkg/neo4j/query-events.go) translates Nostr filters to Cypher queries:

#### 1. ID Filters

```json
{"ids": ["abc123..."]}
```

Becomes:

```cypher
MATCH (e:Event)
WHERE e.id = $id_0
```

For prefix matching (partial IDs):

```cypher
WHERE e.id STARTS WITH $id_0
```

#### 2. Author Filters

```json
{"authors": ["pubkey1...", "pubkey2..."]}
```

Becomes:

```cypher
MATCH (e:Event)
WHERE e.pubkey IN $authors
```

#### 3. Kind Filters

```json
{"kinds": [1, 7]}
```

Becomes:

```cypher
MATCH (e:Event)
WHERE e.kind IN $kinds
```

#### 4. Time Range Filters

```json
{"since": 1234567890, "until": 1234567900}
```

Becomes:

```cypher
MATCH (e:Event)
WHERE e.created_at >= $since AND e.created_at <= $until
```

#### 5. Tag Filters (Graph Advantage!)

```json
{"#t": ["bitcoin", "nostr"]}
```

Becomes:

```cypher
MATCH (e:Event)
OPTIONAL MATCH (e)-[:TAGGED_WITH]->(t0:Tag)
WHERE t0.type = $tagType_0 AND t0.value IN $tagValues_0
```

This leverages Neo4j's native graph traversal for efficient tag queries!

#### 6. Combined Filters

```json
{
  "kinds": [1],
  "authors": ["abc..."],
  "#p": ["xyz..."],
  "limit": 50
}
```

Becomes:

```cypher
MATCH (e:Event)
OPTIONAL MATCH (e)-[:TAGGED_WITH]->(t0:Tag)
WHERE e.kind IN $kinds
  AND e.pubkey IN $authors
  AND t0.type = $tagType_0
  AND t0.value IN $tagValues_0
RETURN e.id, e.kind, e.created_at, e.content, e.sig, e.pubkey, e.tags
ORDER BY e.created_at DESC
LIMIT $limit
```

### Query Execution Flow

1. **Parse Filter**: Extract IDs, authors, kinds, times, tags
2. **Build Cypher**: Construct parameterized query with MATCH/WHERE clauses
3. **Execute**: Run via `ExecuteRead()` with read-only session
4. **Parse Results**: Convert Neo4j records to Nostr events
5. **Return**: Send events back to client

## Configuration

All configuration is centralized in `app/config/config.go` and visible via `./orly help`.

> **Important:** All environment variables must be defined in `app/config/config.go`. Do not use `os.Getenv()` directly in package code. Database backends receive configuration via the `database.DatabaseConfig` struct.
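
As a rough illustration of the rule above, the sketch below shows a backend constructor that takes its connection settings as a struct instead of reading the environment. `Neo4jSettings`, `Backend`, and `NewBackend` are hypothetical names introduced for this example; the actual fields of `database.DatabaseConfig` are not shown in this document and may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Neo4jSettings is a hypothetical stand-in for the Neo4j-related part of
// database.DatabaseConfig. The real field names may differ.
type Neo4jSettings struct {
	URI      string
	User     string
	Password string
}

// Backend is a hypothetical backend that keeps the settings it was given.
type Backend struct {
	cfg Neo4jSettings
}

// NewBackend validates the injected settings. It never calls os.Getenv;
// reading the environment is left to the central config package.
func NewBackend(cfg Neo4jSettings) (*Backend, error) {
	if cfg.URI == "" {
		return nil, errors.New("neo4j: URI must not be empty")
	}
	if cfg.User == "" {
		return nil, errors.New("neo4j: user must not be empty")
	}
	return &Backend{cfg: cfg}, nil
}

func main() {
	// In the relay these values would come from app/config/config.go.
	b, err := NewBackend(Neo4jSettings{
		URI:      "bolt://localhost:7687",
		User:     "neo4j",
		Password: "password",
	})
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("backend configured for", b.cfg.URI)
}
```

Passing the settings in this way keeps the backend testable without touching process-wide environment state.
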

### Environment Variables

```bash
# Neo4j Connection
ORLY_NEO4J_URI="bolt://localhost:7687"
ORLY_NEO4J_USER="neo4j"
ORLY_NEO4J_PASSWORD="password"

# Database Type Selection
ORLY_DB_TYPE="neo4j"

# Data Directory (for Badger metadata storage)
ORLY_DATA_DIR="~/.local/share/ORLY"
```

### Example Docker Compose Setup

```yaml
version: '3.8'

services:
  neo4j:
    image: neo4j:5.15
    ports:
      - "7474:7474"  # HTTP
      - "7687:7687"  # Bolt
    environment:
      - NEO4J_AUTH=neo4j/password
      - NEO4J_PLUGINS=["apoc"]
    volumes:
      - neo4j_data:/data
      - neo4j_logs:/logs

  orly:
    build: .
    ports:
      - "3334:3334"
    environment:
      - ORLY_DB_TYPE=neo4j
      - ORLY_NEO4J_URI=bolt://neo4j:7687
      - ORLY_NEO4J_USER=neo4j
      - ORLY_NEO4J_PASSWORD=password
    depends_on:
      - neo4j

volumes:
  neo4j_data:
  neo4j_logs:
```

## Performance Considerations

### Advantages Over Badger/DGraph

1. **Native Graph Queries**: Tag relationships and social graph traversals are native operations
2. **Optimized Indexes**: Automatic index usage for constrained properties
3. **Efficient Joins**: Following a relationship from a node is a direct pointer hop (index-free adjacency), so the cost per hop does not grow with the total size of the graph
4. **Query Planner**: Neo4j's query planner optimizes complex multi-filter queries

### Tuning Recommendations

1. **Indexes**: The schema creates indexes for:
   - Event ID (unique constraint + index)
   - Event kind
   - Event created_at
   - Composite: kind + created_at
   - Tag type + value

2. **Cache Configuration**: Configure Neo4j's page cache and heap size. The names below are the Neo4j 5.x settings, matching the `neo4j:5.15` image used in this document; Neo4j 4.x uses the `dbms.memory.*` prefix instead:

   ```conf
   # neo4j.conf
   server.memory.heap.initial_size=2G
   server.memory.heap.max_size=4G
   server.memory.pagecache.size=4G
   ```

3. **Query Limits**: Always use LIMIT in queries to prevent memory exhaustion

## Implementation Details

### Replaceable Events

Replaceable events (kinds 0, 3, 10000-19999) are handled in `WouldReplaceEvent()`:

```cypher
MATCH (e:Event {kind: $kind, pubkey: $pubkey})
WHERE e.created_at < $createdAt
RETURN e.serial, e.created_at
```

Older events are deleted before saving the new one.
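
The replacement rule above can be sketched in Go as follows. This is a minimal illustration, not the relay's code: `isReplaceable`, `storedEvent`, and `serialsToDelete` are hypothetical helpers, and the in-memory slice stands in for Event nodes that already match the incoming event's kind and pubkey.

```go
package main

import "fmt"

// isReplaceable reports whether kind is one of the replaceable kinds
// listed above: 0, 3, or 10000-19999.
func isReplaceable(kind int) bool {
	return kind == 0 || kind == 3 || (kind >= 10000 && kind <= 19999)
}

// storedEvent is a hypothetical in-memory stand-in for an Event node that
// already matches the incoming event's kind and pubkey.
type storedEvent struct {
	Serial    int64
	CreatedAt int64
}

// serialsToDelete mirrors the `e.created_at < $createdAt` predicate: it
// returns the serials of stored events strictly older than the new event.
func serialsToDelete(stored []storedEvent, createdAt int64) []int64 {
	var out []int64
	for _, ev := range stored {
		if ev.CreatedAt < createdAt {
			out = append(out, ev.Serial)
		}
	}
	return out
}

func main() {
	stored := []storedEvent{
		{Serial: 1, CreatedAt: 100},
		{Serial: 2, CreatedAt: 300},
	}
	if isReplaceable(0) {
		fmt.Println(serialsToDelete(stored, 200)) // prints [1]
	}
}
```

Because the comparison is strict, a stored event with the same timestamp as the incoming one is not selected for deletion in this sketch.
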

### Parameterized Replaceable Events

For kinds 30000-39999, we also match on the d-tag:

```cypher
MATCH (e:Event {kind: $kind, pubkey: $pubkey})-[:TAGGED_WITH]->(t:Tag {type: 'd', value: $dValue})
WHERE e.created_at < $createdAt
RETURN e.serial
```

### Event Deletion (NIP-09)

Delete events (kind 5) are processed via graph traversal:

```cypher
MATCH (target:Event {id: $targetId})
MATCH (delete:Event {kind: 5})-[:REFERENCES]->(target)
WHERE delete.pubkey = $pubkey OR delete.pubkey IN $admins
RETURN delete.id
```

Only same-author or admin deletions are allowed.

## Comparison with Other Backends

| Feature | Badger | DGraph | Neo4j |
|---------|--------|--------|-------|
| **Storage Type** | Key-value | Graph (distributed) | Graph (native) |
| **Query Language** | Custom indexes | DQL | Cypher |
| **Tag Queries** | Index lookups | Graph traversal | Native relationships |
| **Scaling** | Single-node | Distributed | Cluster/Causal cluster |
| **Memory Usage** | Low | Medium | High |
| **Setup Complexity** | Minimal | Medium | Medium |
| **Best For** | Small relays | Large distributed | Relationship-heavy |

## Development Guide

### Adding New Indexes

1. Update [schema.go](../pkg/neo4j/schema.go) with new index definition
2. Add to `applySchema()` function
3. Restart relay to apply schema changes

Example (a full-text index uses `CREATE FULLTEXT INDEX` with `ON EACH`):

```cypher
CREATE FULLTEXT INDEX event_content_fulltext IF NOT EXISTS
FOR (e:Event) ON EACH [e.content]
OPTIONS {indexConfig: {`fulltext.analyzer`: 'english'}}
```

### Custom Queries

To add custom query methods:

1. Add method to [query-events.go](../pkg/neo4j/query-events.go)
2. Build Cypher query with parameterization
3. Use `ExecuteRead()` or `ExecuteWrite()` as appropriate
4. Parse results with `parseEventsFromResult()`

### Testing

Due to Neo4j dependency, tests require a running Neo4j instance:

```bash
# Start Neo4j via Docker
docker run -d --name neo4j-test \
  -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/test \
  neo4j:5.15

# Run tests
ORLY_NEO4J_URI="bolt://localhost:7687" \
ORLY_NEO4J_USER="neo4j" \
ORLY_NEO4J_PASSWORD="test" \
go test ./pkg/neo4j/...

# Cleanup
docker rm -f neo4j-test
```

## Future Enhancements

1. **Full-text Search**: Leverage Neo4j's full-text indexes for content search
2. **Graph Analytics**: Implement social graph metrics (centrality, communities)
3. **Advanced Queries**: Support NIP-50 search via Cypher full-text capabilities
4. **Clustering**: Deploy Neo4j cluster for high availability
5. **APOC Procedures**: Utilize APOC library for advanced graph algorithms
6. **Caching Layer**: Implement query result caching similar to Badger backend

## Troubleshooting

### Connection Issues

```bash
# Test connectivity
cypher-shell -a bolt://localhost:7687 -u neo4j -p password

# Check Neo4j logs
docker logs neo4j
```

### Performance Issues

```cypher
// View query execution plan
EXPLAIN MATCH (e:Event)
WHERE e.kind = 1
RETURN e LIMIT 10;

// Profile query performance
PROFILE MATCH (e:Event)-[:AUTHORED_BY]->(a:Author)
RETURN e, a LIMIT 10;
```

### Schema Issues

```cypher
// List all constraints
SHOW CONSTRAINTS;

// List all indexes
SHOW INDEXES;

// Drop and recreate schema
DROP CONSTRAINT event_id_unique IF EXISTS;
CREATE CONSTRAINT event_id_unique FOR (e:Event) REQUIRE e.id IS UNIQUE;
```

## References

- [Neo4j Documentation](https://neo4j.com/docs/)
- [Cypher Query Language](https://neo4j.com/docs/cypher-manual/current/)
- [Neo4j Go Driver](https://neo4j.com/docs/go-manual/current/)
- [Graph Database Patterns](https://neo4j.com/developer/graph-db-vs-rdbms/)
- [Nostr Protocol (NIP-01)](https://github.com/nostr-protocol/nips/blob/master/01.md)

## License

This Neo4j backend implementation follows the same license as the ORLY relay project.