initial draft of neo4j database driver

2025-11-17 08:19:44 +00:00
parent beed174e83
commit 86481a42e8
22 changed files with 3259 additions and 14 deletions
--- a/docs/NEO4J_BACKEND.md
+++ b/docs/NEO4J_BACKEND.md
@@ -0,0 +1,406 @@
+# Neo4j Database Backend for ORLY Relay
+
+## Overview
+
+The Neo4j database backend provides a graph-native storage solution for the ORLY Nostr relay. Unlike traditional key-value or document stores, Neo4j is optimized for relationship-heavy queries, making it an ideal fit for Nostr's social graph and event reference patterns.
+
+## Architecture
+
+### Core Components
+
+1. **Main Database File** ([pkg/neo4j/neo4j.go](../pkg/neo4j/neo4j.go))
+   - Implements the `database.Database` interface
+   - Manages Neo4j driver connection and lifecycle
+   - Uses Badger for metadata storage (markers, identity, subscriptions)
+   - Registers with the database factory via `init()`
+
+2. **Schema Management** ([pkg/neo4j/schema.go](../pkg/neo4j/schema.go))
+   - Defines Neo4j constraints and indexes using Cypher
+   - Creates unique constraints on Event IDs and Author pubkeys
+   - Indexes for optimal query performance (kind, created_at, tags)
+
+3. **Query Engine** ([pkg/neo4j/query-events.go](../pkg/neo4j/query-events.go))
+   - Translates Nostr REQ filters to Cypher queries
+   - Leverages graph traversal for tag relationships
+   - Supports prefix matching for IDs and pubkeys
+   - Parameterized queries for security and performance
+
+4. **Event Storage** ([pkg/neo4j/save-event.go](../pkg/neo4j/save-event.go))
+   - Stores events as nodes with properties
+   - Creates graph relationships:
+     - `AUTHORED_BY`: Event → Author
+     - `REFERENCES`: Event → Event (e-tags)
+     - `MENTIONS`: Event → Author (p-tags)
+     - `TAGGED_WITH`: Event → Tag
+
+## Graph Schema
+
+### Node Types
+
+**Event Node**
+```cypher
+(:Event {
+  id: string,           // Hex-encoded event ID (32 bytes, 64 hex characters)
+  serial: int,          // Sequential serial number
+  kind: int,            // Event kind
+  created_at: int,      // Unix timestamp
+  content: string,      // Event content
+  sig: string,          // Hex-encoded signature
+  pubkey: string,       // Hex-encoded author pubkey
+  tags: string          // JSON-encoded tags array
+})
+```
+
+**Author Node**
+```cypher
+(:Author {
+  pubkey: string        // Hex-encoded pubkey (unique)
+})
+```
+
+**Tag Node**
+```cypher
+(:Tag {
+  type: string,         // Tag type (e.g., "t", "d")
+  value: string         // Tag value
+})
+```
+
+**Marker Node** (for metadata)
+```cypher
+(:Marker {
+  key: string,          // Unique key
+  value: string         // Hex-encoded value
+})
+```
+
+### Relationships
+
+- `(:Event)-[:AUTHORED_BY]->(:Author)` - Event authorship
+- `(:Event)-[:REFERENCES]->(:Event)` - Event references (e-tags)
+- `(:Event)-[:MENTIONS]->(:Author)` - Author mentions (p-tags)
+- `(:Event)-[:TAGGED_WITH]->(:Tag)` - Generic tag associations
+
+## How Nostr REQ Messages Are Implemented
+
+### Filter to Cypher Translation
+
+The query engine in [query-events.go](../pkg/neo4j/query-events.go) translates Nostr filters to Cypher queries:
+
+#### 1. ID Filters
+```json
+{"ids": ["abc123..."]}
+```
+Becomes:
+```cypher
+MATCH (e:Event)
+WHERE e.id = $id_0
+```
+
+For prefix matching (partial IDs):
+```cypher
+WHERE e.id STARTS WITH $id_0
+```
+
+#### 2. Author Filters
+```json
+{"authors": ["pubkey1...", "pubkey2..."]}
+```
+Becomes:
+```cypher
+MATCH (e:Event)
+WHERE e.pubkey IN $authors
+```
+
+#### 3. Kind Filters
+```json
+{"kinds": [1, 7]}
+```
+Becomes:
+```cypher
+MATCH (e:Event)
+WHERE e.kind IN $kinds
+```
+
+#### 4. Time Range Filters
+```json
+{"since": 1234567890, "until": 1234567900}
+```
+Becomes:
+```cypher
+MATCH (e:Event)
+WHERE e.created_at >= $since AND e.created_at <= $until
+```
+
+#### 5. Tag Filters (Graph Advantage!)
+```json
+{"#t": ["bitcoin", "nostr"]}
+```
+Becomes:
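+```cypher
+MATCH (e:Event)-[:TAGGED_WITH]->(t0:Tag)
+WHERE t0.type = $tagType_0 AND t0.value IN $tagValues_0
+```
+
+The pattern uses a plain `MATCH` rather than `OPTIONAL MATCH`: a `WHERE` attached to `OPTIONAL MATCH` only restricts the optional pattern, so events without a matching tag would still be returned. Tag queries then run as relationship traversals to `Tag` nodes instead of scans over the JSON-encoded `tags` property.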
+
+#### 6. Combined Filters
+```json
+{
+  "kinds": [1],
+  "authors": ["abc..."],
+  "#p": ["xyz..."],
+  "limit": 50
+}
+```
+Becomes:
+```cypher
+MATCH (e:Event)-[:TAGGED_WITH]->(t0:Tag)
+WHERE e.kind IN $kinds
+  AND e.pubkey IN $authors
+  AND t0.type = $tagType_0
+  AND t0.value IN $tagValues_0
+RETURN e.id, e.kind, e.created_at, e.content, e.sig, e.pubkey, e.tags
+ORDER BY e.created_at DESC
+LIMIT $limit
+```
+
+### Query Execution Flow
+
+1. **Parse Filter**: Extract IDs, authors, kinds, times, tags
+2. **Build Cypher**: Construct parameterized query with MATCH/WHERE clauses
+3. **Execute**: Run via `ExecuteRead()` with read-only session
+4. **Parse Results**: Convert Neo4j records to Nostr events
+5. **Return**: Send events back to client
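Steps 1 and 2 of this flow can be sketched as a small query builder. This is an illustrative sketch under stated assumptions, not the code in `query-events.go`: the `Filter` struct, `buildCypher`, and the `defaultLimit` value are hypothetical, IDs shorter than 64 hex characters are treated as prefixes, and author prefix matching is left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Filter is a simplified, hypothetical stand-in for a Nostr REQ filter.
type Filter struct {
	IDs, Authors []string
	Kinds        []int
	Since, Until *int64
	Tags         map[string][]string // tag letter -> values, e.g. "t" -> {"nostr"}
	Limit        int
}

// defaultLimit is an assumed cap applied when the filter sets no limit.
const defaultLimit = 500

// buildCypher returns a parameterized query and its parameter map.
// User-supplied values only ever appear in the map, never in the query text.
func buildCypher(f Filter) (string, map[string]any) {
	params := map[string]any{}
	matches := []string{"MATCH (e:Event)"}
	var where []string

	if len(f.IDs) > 0 {
		var alts []string
		for i, id := range f.IDs {
			name := fmt.Sprintf("id_%d", i)
			params[name] = id
			op := "=" // full 64-character hex ID
			if len(id) < 64 {
				op = "STARTS WITH" // partial ID
			}
			alts = append(alts, fmt.Sprintf("e.id %s $%s", op, name))
		}
		where = append(where, "("+strings.Join(alts, " OR ")+")")
	}
	if len(f.Authors) > 0 {
		params["authors"] = f.Authors
		where = append(where, "e.pubkey IN $authors")
	}
	if len(f.Kinds) > 0 {
		params["kinds"] = f.Kinds
		where = append(where, "e.kind IN $kinds")
	}
	if f.Since != nil {
		params["since"] = *f.Since
		where = append(where, "e.created_at >= $since")
	}
	if f.Until != nil {
		params["until"] = *f.Until
		where = append(where, "e.created_at <= $until")
	}

	// Sort tag letters so the generated query text is deterministic.
	letters := make([]string, 0, len(f.Tags))
	for l := range f.Tags {
		letters = append(letters, l)
	}
	sort.Strings(letters)
	for i, l := range letters {
		matches = append(matches, fmt.Sprintf("MATCH (e)-[:TAGGED_WITH]->(t%d:Tag)", i))
		params[fmt.Sprintf("tagType_%d", i)] = l
		params[fmt.Sprintf("tagValues_%d", i)] = f.Tags[l]
		where = append(where, fmt.Sprintf("t%d.type = $tagType_%d AND t%d.value IN $tagValues_%d", i, i, i, i))
	}

	q := strings.Join(matches, "\n")
	if len(where) > 0 {
		q += "\nWHERE " + strings.Join(where, "\n  AND ")
	}
	limit := f.Limit
	if limit <= 0 {
		limit = defaultLimit
	}
	params["limit"] = limit
	q += "\nRETURN e.id, e.kind, e.created_at, e.content, e.sig, e.pubkey, e.tags" +
		"\nORDER BY e.created_at DESC\nLIMIT $limit"
	return q, params
}

func main() {
	q, params := buildCypher(Filter{Kinds: []int{1}, Tags: map[string][]string{"t": {"nostr"}}, Limit: 50})
	fmt.Println(q)
	fmt.Println(params)
}
```

One caveat of this shape: an event that matches several values of the same tag filter appears once per matching `Tag` node, so a real implementation would need to deduplicate, for example with `RETURN DISTINCT`.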
+
+## Configuration
+
+### Environment Variables
+
+```bash
+# Neo4j Connection
+ORLY_NEO4J_URI="bolt://localhost:7687"
+ORLY_NEO4J_USER="neo4j"
+ORLY_NEO4J_PASSWORD="password"
+
+# Database Type Selection
+ORLY_DB_TYPE="neo4j"
+
+# Data Directory (for Badger metadata storage)
+ORLY_DATA_DIR="$HOME/.local/share/ORLY"  # "~" is not expanded inside double quotes
+```
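Reading these variables on the Go side might look like the sketch below. The `Config` struct, the `loadConfig` function, and the fallback values (taken from the example values above) are assumptions for illustration, as is the choice to reject an empty password; the relay's actual configuration code may differ.

```go
package main

import (
	"fmt"
	"os"
)

// Config holds the settings read from the environment.
// Hypothetical type for this sketch.
type Config struct {
	DBType, URI, User, Password, DataDir string
}

// loadConfig reads the variables listed above through getenv, which lets
// callers inject a fake environment in tests. The fallbacks are assumptions.
func loadConfig(getenv func(string) string) (Config, error) {
	get := func(key, fallback string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return fallback
	}
	cfg := Config{
		DBType:   getenv("ORLY_DB_TYPE"),
		URI:      get("ORLY_NEO4J_URI", "bolt://localhost:7687"),
		User:     get("ORLY_NEO4J_USER", "neo4j"),
		Password: getenv("ORLY_NEO4J_PASSWORD"),
		DataDir:  getenv("ORLY_DATA_DIR"),
	}
	if cfg.Password == "" {
		return Config{}, fmt.Errorf("ORLY_NEO4J_PASSWORD is not set")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("connecting to %s as %s\n", cfg.URI, cfg.User)
}
```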
+
+### Example Docker Compose Setup
+
+```yaml
+version: '3.8'
+services:
+  neo4j:
+    image: neo4j:5.15
+    ports:
+      - "7474:7474"  # HTTP
+      - "7687:7687"  # Bolt
+    environment:
+      - NEO4J_AUTH=neo4j/password
+      - NEO4J_PLUGINS=["apoc"]
+    volumes:
+      - neo4j_data:/data
+      - neo4j_logs:/logs
+
+  orly:
+    build: .
+    ports:
+      - "3334:3334"
+    environment:
+      - ORLY_DB_TYPE=neo4j
+      - ORLY_NEO4J_URI=bolt://neo4j:7687
+      - ORLY_NEO4J_USER=neo4j
+      - ORLY_NEO4J_PASSWORD=password
+    depends_on:
+      - neo4j
+
+volumes:
+  neo4j_data:
+  neo4j_logs:
+```
+
+## Performance Considerations
+
+### Advantages Over Badger/DGraph
+
+1. **Native Graph Queries**: Tag relationships and social graph traversals are native operations
+2. **Optimized Indexes**: Automatic index usage for constrained properties
+3. **Efficient Joins**: Following a relationship from a node takes roughly constant time per hop (index-free adjacency), regardless of total graph size
+4. **Query Planner**: Neo4j's query planner optimizes complex multi-filter queries
+
+### Tuning Recommendations
+
+1. **Indexes**: The schema creates indexes for:
+   - Event ID (unique constraint + index)
+   - Event kind
+   - Event created_at
+   - Composite: kind + created_at
+   - Tag type + value
+
+2. **Cache Configuration**: Configure Neo4j's page cache and heap size:
+```conf
+# neo4j.conf (Neo4j 5.x setting names; 4.x used the dbms.memory.* prefix)
+server.memory.heap.initial_size=2G
+server.memory.heap.max_size=4G
+server.memory.pagecache.size=4G
+```
+
+3. **Query Limits**: Always use LIMIT in queries to prevent memory exhaustion
+
+## Implementation Details
+
+### Replaceable Events
+
+Replaceable events (kinds 0, 3, 10000-19999) are handled in `WouldReplaceEvent()`:
+
+```cypher
+MATCH (e:Event {kind: $kind, pubkey: $pubkey})
+WHERE e.created_at < $createdAt
+RETURN e.serial, e.created_at
+```
+
+Older events are deleted before saving the new one.
+
+### Parameterized Replaceable Events
+
+For kinds 30000-39999, we also match on the d-tag:
+
+```cypher
+MATCH (e:Event {kind: $kind, pubkey: $pubkey})-[:TAGGED_WITH]->(t:Tag {type: 'd', value: $dValue})
+WHERE e.created_at < $createdAt
+RETURN e.serial
+```
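The kind ranges that select between the two lookup queries above can be sketched as follows. The function names are hypothetical and this is not the code behind `WouldReplaceEvent()`; the sketch also assumes that a missing `d` tag is treated as an empty value.

```go
package main

import "fmt"

// isReplaceable reports whether only the newest event per (pubkey, kind) is kept.
func isReplaceable(kind int) bool {
	return kind == 0 || kind == 3 || (kind >= 10000 && kind <= 19999)
}

// isParameterizedReplaceable reports whether only the newest event per
// (pubkey, kind, d-tag value) is kept.
func isParameterizedReplaceable(kind int) bool {
	return kind >= 30000 && kind <= 39999
}

// dTagValue returns the value of the first "d" tag, or "" when there is none
// (treating a missing d-tag as empty is an assumption of this sketch).
func dTagValue(tags [][]string) string {
	for _, tag := range tags {
		if len(tag) >= 2 && tag[0] == "d" {
			return tag[1]
		}
	}
	return ""
}

// lookupParams returns the parameters for the applicable lookup query above,
// or ok=false for regular events, which never replace anything.
func lookupParams(pubkey string, kind int, createdAt int64, tags [][]string) (map[string]any, bool) {
	if !isReplaceable(kind) && !isParameterizedReplaceable(kind) {
		return nil, false
	}
	p := map[string]any{"kind": kind, "pubkey": pubkey, "createdAt": createdAt}
	if isParameterizedReplaceable(kind) {
		p["dValue"] = dTagValue(tags)
	}
	return p, true
}

func main() {
	p, ok := lookupParams("pk", 30023, 1700000000, [][]string{{"d", "my-post"}})
	fmt.Println(ok, p)
}
```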
+
+### Event Deletion (NIP-09)
+
+Delete events (kind 5) are processed via graph traversal:
+
+```cypher
+MATCH (target:Event {id: $targetId})
+MATCH (delete:Event {kind: 5})-[:REFERENCES]->(target)
+WHERE delete.pubkey = $pubkey OR delete.pubkey IN $admins
+RETURN delete.id
+```
+
+Only same-author or admin deletions are allowed.
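The authorization rule itself is small enough to state as a Go predicate. The function `mayDelete` is a hypothetical name for this sketch; in the backend the same check is expressed in the `WHERE` clause of the query above.

```go
package main

import "fmt"

// mayDelete reports whether a kind 5 event signed by deleterPubkey may
// remove an event authored by targetPubkey: the deleter must be the
// original author or one of the configured admins.
func mayDelete(deleterPubkey, targetPubkey string, admins []string) bool {
	if deleterPubkey == targetPubkey {
		return true
	}
	for _, admin := range admins {
		if admin == deleterPubkey {
			return true
		}
	}
	return false
}

func main() {
	admins := []string{"admin1"}
	fmt.Println(mayDelete("alice", "alice", admins))  // true: same author
	fmt.Println(mayDelete("admin1", "alice", admins)) // true: admin
	fmt.Println(mayDelete("bob", "alice", admins))    // false: neither
}
```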
+
+## Comparison with Other Backends
+
+| Feature | Badger | DGraph | Neo4j |
+|---------|--------|--------|-------|
+| **Storage Type** | Key-value | Graph (distributed) | Graph (native) |
+| **Query Language** | Custom indexes | DQL | Cypher |
+| **Tag Queries** | Index lookups | Graph traversal | Native relationships |
+| **Scaling** | Single-node | Distributed | Clustering (Enterprise Edition) |
+| **Memory Usage** | Low | Medium | High |
+| **Setup Complexity** | Minimal | Medium | Medium |
+| **Best For** | Small relays | Large distributed | Relationship-heavy |
+
+## Development Guide
+
+### Adding New Indexes
+
+1. Update [schema.go](../pkg/neo4j/schema.go) with new index definition
+2. Add to `applySchema()` function
+3. Restart relay to apply schema changes
+
+Example:
+```cypher
+// Full-text indexes need the FULLTEXT keyword and the ON EACH [...] form
+CREATE FULLTEXT INDEX event_content_fulltext IF NOT EXISTS
+FOR (e:Event) ON EACH [e.content]
+OPTIONS {indexConfig: {`fulltext.analyzer`: 'english'}}
+```
+
+### Custom Queries
+
+To add custom query methods:
+
+1. Add method to [query-events.go](../pkg/neo4j/query-events.go)
+2. Build Cypher query with parameterization
+3. Use `ExecuteRead()` or `ExecuteWrite()` as appropriate
+4. Parse results with `parseEventsFromResult()`
+
+### Testing
+
+Because the tests depend on Neo4j, they require a running instance:
+
+```bash
+# Start Neo4j via Docker (Neo4j 5 rejects passwords shorter than 8 characters)
+docker run -d --name neo4j-test \
+  -p 7687:7687 \
+  -e NEO4J_AUTH=neo4j/testpassword \
+  neo4j:5.15
+
+# Run tests
+ORLY_NEO4J_URI="bolt://localhost:7687" \
+ORLY_NEO4J_USER="neo4j" \
+ORLY_NEO4J_PASSWORD="testpassword" \
+go test ./pkg/neo4j/...
+
+# Cleanup
+docker rm -f neo4j-test
+```
+
+## Future Enhancements
+
+1. **Full-text Search**: Leverage Neo4j's full-text indexes for content search
+2. **Graph Analytics**: Implement social graph metrics (centrality, communities)
+3. **Advanced Queries**: Support NIP-50 search via Cypher full-text capabilities
+4. **Clustering**: Deploy Neo4j cluster for high availability
+5. **APOC Procedures**: Utilize APOC library for advanced graph algorithms
+6. **Caching Layer**: Implement query result caching similar to the Badger backend
+
+## Troubleshooting
+
+### Connection Issues
+
+```bash
+# Test connectivity
+cypher-shell -a bolt://localhost:7687 -u neo4j -p password
+
+# Check Neo4j logs
+docker logs neo4j
+```
+
+### Performance Issues
+
+```cypher
+// View query execution plan
+EXPLAIN MATCH (e:Event) WHERE e.kind = 1 RETURN e LIMIT 10
+
+// Profile query performance
+PROFILE MATCH (e:Event)-[:AUTHORED_BY]->(a:Author) RETURN e, a LIMIT 10
+```
+
+### Schema Issues
+
+```cypher
+// List all constraints
+SHOW CONSTRAINTS;
+
+// List all indexes
+SHOW INDEXES;
+
+// Drop and recreate a constraint (here: event ID uniqueness)
+DROP CONSTRAINT event_id_unique IF EXISTS;
+CREATE CONSTRAINT event_id_unique IF NOT EXISTS FOR (e:Event) REQUIRE e.id IS UNIQUE;
+```
+
+## References
+
+- [Neo4j Documentation](https://neo4j.com/docs/)
+- [Cypher Query Language](https://neo4j.com/docs/cypher-manual/current/)
+- [Neo4j Go Driver](https://neo4j.com/docs/go-manual/current/)
+- [Graph Database Patterns](https://neo4j.com/developer/graph-db-vs-rdbms/)
+- [Nostr Protocol (NIP-01)](https://github.com/nostr-protocol/nips/blob/master/01.md)
+
+## License
+
+This Neo4j backend implementation follows the same license as the ORLY relay project.