# Pubkey Graph System

## Overview

The pubkey graph system supports efficient social graph queries by creating bidirectional, direction-aware edges between events and pubkeys in the ORLY relay.

## Architecture

### 1. Pubkey Serial Assignment

**Purpose**: Compress 32-byte pubkeys to 5-byte serials for space efficiency.

**Tables**:
- `pks|pubkey_hash(8)|serial(5)` - Hash-to-serial lookup (16-byte key)
- `spk|serial(5)` → 32-byte pubkey (value) - Serial-to-pubkey reverse lookup

**Space Savings**: Each pubkey reference in a graph edge takes 5 bytes instead of 32, saving 27 bytes.

### 2. Graph Edge Storage

**Bidirectional edges with metadata**:

#### EventPubkeyGraph (Forward)

```
epg|event_serial(5)|pubkey_serial(5)|kind(2)|direction(1) = 16 bytes
```

#### PubkeyEventGraph (Reverse)

```
peg|pubkey_serial(5)|kind(2)|direction(1)|event_serial(5) = 16 bytes
```

### 3. Direction Byte

The direction byte distinguishes relationship types:

| Value | Direction | From Event Perspective | From Pubkey Perspective |
|-------|-----------|------------------------|-------------------------|
| `0` | Author | This pubkey is the event author | I am the author of this event |
| `1` | P-Tag Out | Event references this pubkey | *(not used in reverse)* |
| `2` | P-Tag In | *(not used in forward)* | I am referenced by this event |

**Location in keys** (zero-based byte offset):
- **EventPubkeyGraph**: Byte 15 (after 3+5+5+2)
- **PubkeyEventGraph**: Byte 10 (after 3+5+2)

## Graph Edge Creation

When an event is saved:

1. **Extract pubkeys**:
   - Event author: `ev.Pubkey`
   - P-tags: All `["p", "<hex pubkey>", ...]` tags

2. **Get or create serials**: Each unique pubkey gets a monotonic 5-byte serial

3. **Create bidirectional edges**:

   For **author** (pubkey = event author):
   ```
   epg|event_serial|author_serial|kind|0 (author edge)
   peg|author_serial|kind|0|event_serial (is-author edge)
   ```

   For each **p-tag** (referenced pubkey):
   ```
   epg|event_serial|ptag_serial|kind|1 (outbound reference)
   peg|ptag_serial|kind|2|event_serial (inbound reference)
   ```

## Query Patterns

In the patterns below, `*` marks a key component that is not fixed. Only the components before the first `*` can form the scan prefix; any fixed component after a wildcard is checked as a filter on each key returned by the scan.

### Find all events authored by a pubkey

```
Prefix scan: peg|pubkey_serial|*|0|*
Filter: direction == 0 (author)
```

### Find all events mentioning a pubkey (inbound p-tags)

```
Prefix scan: peg|pubkey_serial|*|2|*
Filter: direction == 2 (p-tag inbound)
```

### Find all kind-1 events mentioning a pubkey

```
Prefix scan: peg|pubkey_serial|0x0001|2|*
Exact match: kind == 1, direction == 2
```

### Find all pubkeys referenced by an event (outbound p-tags)

```
Prefix scan: epg|event_serial|*|*|1
Filter: direction == 1 (p-tag outbound)
```

### Find the author of an event

```
Prefix scan: epg|event_serial|*|*|0
Filter: direction == 0 (author)
```

## Implementation Details

### Thread Safety

The `GetOrCreatePubkeySerial` function proceeds as follows:

1. Opens a read transaction to check for an existing serial
2. If none is found, gets the next sequence number
3. Opens a write transaction and checks again, to handle race conditions
4. Returns the existing serial if another goroutine created it concurrently

### Deduplication

The save-event function deduplicates pubkeys before creating serials:
- Uses a map keyed by hex-encoded pubkey
- Prevents duplicate edges when the author also appears in the p-tags

### Edge Cases

1. **Author in p-tags**: Only the author edge (direction=0) is created; the duplicate p-tag edge is skipped
2. **Invalid p-tags**: Silently skipped if hex decoding fails or the decoded length != 32 bytes
3. **No p-tags**: Only the author edge is created

## Performance Characteristics

### Space Efficiency

Per event with N unique pubkeys:
- **Old approach** (storing full pubkeys): N × 32 bytes = 32N bytes
- **New approach** (using serials): N × 5 bytes = 5N bytes
- **Savings**: 27N bytes per event (84% reduction)

Example: Event with author + 10 p-tags:
- Old: 11 × 32 = 352 bytes
- New: 11 × 5 = 55 bytes
- **Saved: 297 bytes (84%)**

### Query Performance

1. **Pubkey lookup**: O(1) hash lookup via 8-byte truncated hash
2. **Serial generation**: O(1) atomic increment
3. **Graph queries**: Sequential scan with prefix optimization
4. **Kind filtering**: Built into key ordering, no event decoding needed

## Testing

Comprehensive tests verify:
- ✅ Serial assignment and deduplication
- ✅ Bidirectional graph edge creation
- ✅ Multiple events sharing pubkeys
- ✅ Direction byte correctness
- ✅ Edge cases (invalid pubkeys, non-existent keys)

## Future Query APIs

The graph structure supports efficient queries for:

1. **Social Graph Queries**:
   - Who does Alice follow? (p-tags authored by Alice)
   - Who follows Bob? (p-tags referencing Bob)
   - Common connections between Alice and Bob

2. **Event Discovery**:
   - All replies to Alice's events (kind-1 events with p-tag to Alice)
   - All events Alice has replied to (kind-1 events by Alice with p-tags)
   - Quote reposts, mentions, reactions by event kind

3. **Analytics**:
   - Most-mentioned pubkeys (count p-tag-in edges)
   - Most active authors (count author edges)
   - Interaction patterns by kind

## Migration Notes

This is a **new index** that:
- Runs alongside existing event indexes
- Is populated automatically for all new events
- Does NOT require reindexing existing events (yet)
- Can be backfilled via a migration if needed

To backfill existing events, run a migration that:
1. Iterates all events
2. Extracts pubkeys and creates serials
3. Creates graph edges for each event