# Event-Driven Vertex Management Implementation Summary

## What Was Implemented

This document summarizes the event-driven vertex management system for maintaining NostrUser nodes and social graph relationships in the ORLY Neo4j backend.

## Architecture Overview

### Dual Node System

The implementation uses two separate node types:

1. **Event and Author nodes** (NIP-01 Support)
   - Created by `SaveEvent()` for ALL events
   - Used for standard Nostr REQ filter queries
   - Supports kinds, authors, tags, time ranges, etc.
   - **Always maintained** - this is the core relay functionality

2. **NostrUser nodes** (Social Graph / WoT)
   - Created by `SocialEventProcessor` for kinds 0, 3, 1984, 10000
   - Used for social graph queries and Web of Trust metrics
   - Connected by FOLLOWS, MUTES, REPORTS relationships
   - **Event traceable** - every relationship links back to the event that created it

This dual approach ensures **NIP-01 queries continue to work** while adding social graph capabilities.

## Files Created/Modified

### New Files

1. **[EVENT_PROCESSING_SPEC.md](./EVENT_PROCESSING_SPEC.md)** (120 KB)
   - Complete specification for event-driven vertex management
   - Processing workflows for each event kind
   - Diff computation algorithms
   - Cypher query patterns
   - Implementation architecture with Go code examples
   - Testing strategy and performance considerations

2. **[social-event-processor.go](./social-event-processor.go)** (18 KB)
   - `SocialEventProcessor` struct and methods
   - `processContactList()` - handles kind 3 (follows)
   - `processMuteList()` - handles kind 10000 (mutes)
   - `processReport()` - handles kind 1984 (reports)
   - `processProfileMetadata()` - handles kind 0 (profiles)
   - Helper functions for diff computation and p-tag extraction
   - Batch processing support

3. **[WOT_SPEC.md](./WOT_SPEC.md)** (40 KB)
   - Web of Trust data model specification
   - Trust metrics definitions (GrapeRank, PageRank, etc.)
   - Deployment modes (lean vs. full)
   - Example Cypher queries

4. **[ADDITIONAL_REQUIREMENTS.md](./ADDITIONAL_REQUIREMENTS.md)** (30 KB)
   - 50+ missing implementation details identified
   - Organized into 12 categories
   - Phased implementation roadmap
   - Research needed items flagged

### Modified Files

1. **[schema.go](./schema.go)**
   - Added `ProcessedSocialEvent` node constraint
   - Added indexes for social event processing
   - Maintained all existing NIP-01 schema (Event, Author, Tag, Marker)
   - Added WoT schema (NostrUser, WoT metrics nodes)
   - Updated `dropAll()` to handle new schema elements

2. **[save-event.go](./save-event.go)**
   - Integrated `SocialEventProcessor` call after base event save
   - Social processing is **non-blocking** (errors logged but don't fail save)
   - Processes kinds 0, 3, 1984, 10000 automatically
   - NIP-01 functionality preserved

3. **[README.md](./README.md)**
   - Added WoT features to feature list
   - Documented new specification files
   - Added file structure section

## Data Model

### Node Types

#### ProcessedSocialEvent (Tracking Node)

```cypher
(:ProcessedSocialEvent {
  event_id: string,           // Hex event ID
  event_kind: int,            // 0, 3, 1984, or 10000
  pubkey: string,             // Author pubkey
  created_at: timestamp,      // Event timestamp
  processed_at: timestamp,    // When relay processed it
  relationship_count: int,    // How many relationships created
  superseded_by: string|null  // If replaced by newer event
})
```

#### NostrUser (Social Graph Vertex)

```cypher
(:NostrUser {
  pubkey: string,    // Hex pubkey (unique)
  npub: string,      // Bech32 npub
  name: string,      // Profile name
  about: string,     // Profile about
  picture: string,   // Profile picture URL
  nip05: string,     // NIP-05 identifier
  // ... other profile fields
  // ... trust metrics (future)
})
```

### Relationship Types (With Event Traceability)

#### FOLLOWS

```cypher
(:NostrUser)-[:FOLLOWS {
  created_by_event: string,     // Event ID that created this
  created_at: timestamp,        // Event timestamp
  relay_received_at: timestamp
}]->(:NostrUser)
```

#### MUTES

```cypher
(:NostrUser)-[:MUTES {
  created_by_event: string,
  created_at: timestamp,
  relay_received_at: timestamp
}]->(:NostrUser)
```

#### REPORTS

```cypher
(:NostrUser)-[:REPORTS {
  created_by_event: string,
  created_at: timestamp,
  relay_received_at: timestamp,
  report_type: string           // NIP-56 type (spam, illegal, etc.)
}]->(:NostrUser)
```

## Event Processing Flow

### Kind 3 (Contact List) - Follow Relationships

```
1. Receive kind 3 event
2. Check if event already exists (base Event node)
   └─ If exists: return (already processed)
3. Save base Event + Author nodes (NIP-01)
4. Social processing:
   a. Check for existing kind 3 from this pubkey
   b. If new event is older: skip (don't replace newer with older)
   c. Extract p-tags from new event → new_follows[]
   d. Query old FOLLOWS relationships → old_follows[]
   e. Compute diff:
      - added_follows = new_follows - old_follows
      - removed_follows = old_follows - new_follows
   f. Transaction:
      - Mark old ProcessedSocialEvent as superseded
      - Create new ProcessedSocialEvent
      - DELETE removed FOLLOWS relationships
      - CREATE added FOLLOWS relationships
   g. Log: "processed contact list: added=X, removed=Y"
```

**Example:**

```
Event 1: Alice follows [Bob, Charlie]
  → Creates FOLLOWS to Bob and Charlie

Event 2: Alice follows [Bob, Dave] (newer)
  → Diff: added=[Dave], removed=[Charlie]
  → Marks Event 1 as superseded
  → Deletes FOLLOWS to Charlie
  → Creates FOLLOWS to Dave
  → Result: Alice follows [Bob, Dave]
```

### Kind 10000 (Mute List) - Mute Relationships

Same pattern as kind 3, but creates/updates MUTES relationships.

### Kind 1984 (Reporting) - Report Relationships

**Different:** Kind 1984 is NOT replaceable.

```
1. Receive kind 1984 event
2. Save base Event + Author nodes
3. Social processing:
   a. Extract p-tag (reported user) and report type
   b. Create ProcessedSocialEvent node
   c. Create REPORTS relationship (new, not replacing old)
```

**Multiple Reports:** Same user can report same target multiple times. Each creates a separate REPORTS relationship.

### Kind 0 (Profile Metadata) - User Properties

```
1. Receive kind 0 event
2. Save base Event + Author nodes
3. Social processing:
   a. Parse JSON content
   b. MERGE NostrUser (create if not exists)
   c. SET profile properties (name, about, picture, etc.)
```

**Note:** Kind 0 is replaceable, but we don't diff - just update all profile fields.

## Key Design Decisions

### 1. Event Traceability

**All relationships include `created_by_event` property.**

This enables:

- Diff-based updates (know which relationships to remove when event is replaced)
- Event deletion (remove associated relationships)
- Auditing (track provenance of all data)
- Temporal queries (state of graph at specific time)

### 2. Separate Event Tracking Node (ProcessedSocialEvent)

Instead of marking Event nodes as processed, we use a separate ProcessedSocialEvent node.

**Why?**

- Event node represents the raw Nostr event (immutable)
- ProcessedSocialEvent tracks our processing state (mutable)
- Can have multiple processing states for same event (future: different algorithms)
- Clean separation of concerns

### 3. Superseded Chain

When a replaceable event is updated:

- Old ProcessedSocialEvent.superseded_by = new_event_id
- New ProcessedSocialEvent.superseded_by = null

This creates a chain: Event1 → Event2 → Event3 (current)

**Benefits:**

- Can query history of user's follows over time
- Can reconstruct graph at any point in time
- Can debug issues ("why was this relationship removed?")

### 4. Non-Blocking Social Processing

Social event processing errors are logged but don't fail the base event save.


**Rationale:**

- NIP-01 queries are the core relay function
- Social graph is supplementary (for WoT, filtering, etc.)
- Relay should continue operating even if social processing fails
- Events can be reprocessed later, once the social processing problem is fixed

### 5. Dual Node System (Event/Author vs. NostrUser)

**Why not merge Author and NostrUser?**

- Event and Author are for NIP-01 queries (fast, simple)
- NostrUser is for social graph (complex relationships)
- Different query patterns and optimization strategies
- Keeps social graph overhead separate from core relay performance
- Future: May merge, but keep separate for now (pre-alpha)

## Query Examples

### Get Current Follows for User

```cypher
MATCH (user:NostrUser {pubkey: $pubkey})-[f:FOLLOWS]->(followed:NostrUser)
WHERE NOT EXISTS {
  MATCH (old:ProcessedSocialEvent {event_id: f.created_by_event})
  WHERE old.superseded_by IS NOT NULL
}
RETURN followed.pubkey, followed.name
```

### Get Mutual Follows (Friends)

```cypher
MATCH (user:NostrUser {pubkey: $pubkey})-[:FOLLOWS]->(friend:NostrUser)
      -[:FOLLOWS]->(user)
RETURN friend.pubkey, friend.name
```

### Count Reports by Type

```cypher
MATCH (reported:NostrUser {pubkey: $pubkey})<-[r:REPORTS]-()
RETURN r.report_type, count(*) as count
ORDER BY count DESC
```

### Follow Graph History

```cypher
MATCH (evt:ProcessedSocialEvent {pubkey: $pubkey, event_kind: 3})
RETURN evt.event_id, evt.created_at, evt.relationship_count, evt.superseded_by
ORDER BY evt.created_at ASC
```

## Testing Plan

### Unit Tests

1. **Diff computation**
   - Empty lists
   - All new, all removed
   - Mixed additions and removals
   - Duplicate handling

2. **Event ordering**
   - Newer replaces older
   - Older rejected
   - Same timestamp

3. **P-tag extraction**
   - Valid tags
   - Invalid pubkeys
   - Empty lists

### Integration Tests

1. **Contact list updates**
   - Initial list
   - Add follows
   - Remove follows
   - Replace entire list
   - Empty list (unfollow all)

2. **Multiple users**
   - Alice follows Bob
   - Bob follows Charlie
   - Verify independent updates

3. **Replaceable event semantics**
   - Older event arrives after newer
   - Verify rejection

4. **Report accumulation**
   - Multiple reports from same user
   - Multiple users reporting same target
   - Different report types

### Performance Tests

1. **Large contact lists**
   - 1000+ follows
   - Diff computation time
   - Transaction time

2. **High volume**
   - 1000 events/sec
   - Graph write throughput
   - Query performance

## Next Steps

### Phase 1: Basic Testing (Current)

- [ ] Unit tests for social-event-processor.go
- [ ] Integration tests with live Neo4j instance
- [ ] Test with real Nostr events

### Phase 2: Optimization

- [ ] Batch processing for initial sync
- [ ] Index tuning based on query patterns
- [ ] Transaction optimization

### Phase 3: Trust Metrics (Future)

- [ ] Implement hops calculation (shortest path)
- [ ] Research/implement GrapeRank algorithm
- [ ] Implement Personalized PageRank
- [ ] Compute and store trust metrics

### Phase 4: Query Extensions (Future)

- [ ] REQ filter extensions (max_hops, min_influence)
- [ ] Trust metrics query API
- [ ] Multi-tenant support

## Configuration

Currently **no configuration needed** - social event processing is automatic for kinds 0, 3, 1984, 10000.

Future configuration options:

```bash
# Enable/disable social graph processing
ORLY_SOCIAL_GRAPH_ENABLED=true

# Which event kinds to process
ORLY_SOCIAL_GRAPH_KINDS=0,3,1984,10000

# Batch size for initial sync
ORLY_SOCIAL_GRAPH_BATCH_SIZE=1000

# Enable WoT metrics computation (future)
ORLY_WOT_ENABLED=false
```

## Monitoring

### Metrics to Track

- Events processed per second (by kind)
- Average diff size (added/removed counts)
- Transaction durations
- Error rates by error type
- Graph size (node/relationship counts)

### Logs to Monitor

```
INFO: processed contact list: author=abc123, added=5, removed=2, total=50
INFO: processed mute list: author=abc123, added=1, removed=0
INFO: processed report: reporter=abc123, reported=def456, type=spam
ERROR: failed to process social event kind 3:
```

## Known Limitations

1. **No event deletion support yet**
   - Kind 5 (event deletion) not implemented
   - Relationships persist even if event deleted

2. **No encrypted tag support**
   - Kind 10000 can have encrypted tags (private mutes)
   - Currently only processes public tags

3. **No temporal queries yet**
   - Can't query "who did Alice follow on 2024-01-01?"
   - Superseded chain exists but query API not implemented

4. **No batch import tool**
   - Processing events one at a time
   - Need tool for initial sync from existing relay

5. **No trust metrics computation**
   - NostrUser nodes created but metrics not calculated
   - Requires Phase 3 implementation

## Performance Characteristics

### Expected Performance

- **Small contact lists** (< 100 follows): < 100ms per event
- **Medium contact lists** (100-500 follows): 100-500ms per event
- **Large contact lists** (500-1000 follows): 500-1000ms per event
- **Very large contact lists** (> 1000 follows): May need optimization

### Bottlenecks

1. **Diff computation** - O(n) where n = max(old_size, new_size)
2. **Graph writes** - Neo4j transaction overhead
3. **Multiple network round trips** - Could batch queries

### Optimization Opportunities

1. Use APOC procedures for batch operations
2. Cache frequently accessed user data
3. Parallelize independent social event processing
4. Use Neo4j Graph Data Science library for trust metrics

## Summary

This implementation provides a **solid foundation** for social graph management in the ORLY Neo4j backend:

- ✅ **Event traceability** - All relationships link to source events
- ✅ **Diff-based updates** - Efficient replaceable event handling
- ✅ **NIP-01 compatible** - Standard queries still work
- ✅ **Extensible** - Ready for trust metrics computation
- ✅ **Well-documented** - Comprehensive specs and code comments

The system is **ready for testing and deployment** in a pre-alpha environment.
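
As a closing illustration of the diff-based update this summary describes for kinds 3 and 10000, here is a minimal Go sketch of the set difference. The `diffFollows` name and signature are assumptions for this example and may not match the helper in `social-event-processor.go`. It treats both lists as sets, so duplicate p-tags are ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// diffFollows returns the pubkeys present only in newFollows (added) and
// only in oldFollows (removed). Duplicates in either input are ignored.
func diffFollows(oldFollows, newFollows []string) (added, removed []string) {
	oldSet := make(map[string]struct{}, len(oldFollows))
	for _, pk := range oldFollows {
		oldSet[pk] = struct{}{}
	}
	newSet := make(map[string]struct{}, len(newFollows))
	for _, pk := range newFollows {
		newSet[pk] = struct{}{}
	}
	for pk := range newSet {
		if _, ok := oldSet[pk]; !ok {
			added = append(added, pk)
		}
	}
	for pk := range oldSet {
		if _, ok := newSet[pk]; !ok {
			removed = append(removed, pk)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	// Mirrors the Alice example: [Bob, Charlie] replaced by [Bob, Dave].
	added, removed := diffFollows(
		[]string{"bob", "charlie"},
		[]string{"bob", "dave"},
	)
	fmt.Println("added:", added, "removed:", removed)
	// prints: added: [dave] removed: [charlie]
}
```

Building both sets costs time linear in the combined list sizes, which is consistent with the O(n) estimate given under Bottlenecks.
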