create new index that records the links between pubkeys, events, kinds, and inbound/outbound/author

This commit is contained in:
2025-11-20 05:13:56 +00:00
parent d4e2f48b7e
commit b7417ab5eb
7 changed files with 932 additions and 2 deletions

View File

@@ -0,0 +1,185 @@
# Pubkey Graph System
## Overview
The pubkey graph system provides efficient social graph queries by creating bidirectional, direction-aware edges between events and pubkeys in the ORLY relay.
## Architecture
### 1. Pubkey Serial Assignment
**Purpose**: Compress 32-byte pubkeys to 5-byte serials for space efficiency.
**Tables**:
- `pks|pubkey_hash(8)|serial(5)` - Hash-to-serial lookup (16 bytes)
- `spk|serial(5)` → 32-byte pubkey (value) - Serial-to-pubkey reverse lookup
**Space Savings**: Each graph edge saves 27 bytes per pubkey reference (32 → 5 bytes).
### 2. Graph Edge Storage
**Bidirectional edges with metadata**:
#### EventPubkeyGraph (Forward)
```
epg|event_serial(5)|pubkey_serial(5)|kind(2)|direction(1) = 16 bytes
```
#### PubkeyEventGraph (Reverse)
```
peg|pubkey_serial(5)|kind(2)|direction(1)|event_serial(5) = 16 bytes
```
### 3. Direction Byte
The direction byte distinguishes relationship types:
| Value | Direction | From Event Perspective | From Pubkey Perspective |
|-------|-----------|------------------------|-------------------------|
| `0` | Author | This pubkey is the event author | I am the author of this event |
| `1` | P-Tag Out | Event references this pubkey | *(not used in reverse)* |
| `2` | P-Tag In | *(not used in forward)* | I am referenced by this event |
**Location in keys**:
- **EventPubkeyGraph**: Byte 13 (after 3+5+5)
- **PubkeyEventGraph**: Byte 10 (after 3+5+2)
## Graph Edge Creation
When an event is saved:
1. **Extract pubkeys**:
- Event author: `ev.Pubkey`
- P-tags: All `["p", "<hex-pubkey>", ...]` tags
2. **Get or create serials**: Each unique pubkey gets a monotonic 5-byte serial
3. **Create bidirectional edges**:
For **author** (pubkey = event author):
```
epg|event_serial|author_serial|kind|0 (author edge)
peg|author_serial|kind|0|event_serial (is-author edge)
```
For each **p-tag** (referenced pubkey):
```
epg|event_serial|ptag_serial|kind|1 (outbound reference)
peg|ptag_serial|kind|2|event_serial (inbound reference)
```
## Query Patterns
### Find all events authored by a pubkey
```
Prefix scan: peg|pubkey_serial|*|0|*
Filter: direction == 0 (author)
```
### Find all events mentioning a pubkey (inbound p-tags)
```
Prefix scan: peg|pubkey_serial|*|2|*
Filter: direction == 2 (p-tag inbound)
```
### Find all kind-1 events mentioning a pubkey
```
Prefix scan: peg|pubkey_serial|0x0001|2|*
Exact match: kind == 1, direction == 2
```
### Find all pubkeys referenced by an event (outbound p-tags)
```
Prefix scan: epg|event_serial|*|*|1
Filter: direction == 1 (p-tag outbound)
```
### Find the author of an event
```
Prefix scan: epg|event_serial|*|*|0
Filter: direction == 0 (author)
```
## Implementation Details
### Thread Safety
The `GetOrCreatePubkeySerial` function uses:
1. Read transaction to check for existing serial
2. If not found, get next sequence number
3. Write transaction with double-check to handle race conditions
4. Returns existing serial if another goroutine created it concurrently
### Deduplication
The save-event function deduplicates pubkeys before creating serials:
- Map keyed by hex-encoded pubkey
- Prevents duplicate edges when author is also in p-tags
### Edge Cases
1. **Author in p-tags**: Only creates author edge (direction=0), skips duplicate p-tag edge
2. **Invalid p-tags**: Silently skipped if hex decode fails or length != 32 bytes
3. **No p-tags**: Only author edge is created
## Performance Characteristics
### Space Efficiency
Per event with N unique pubkeys:
- **Old approach** (storing full pubkeys): N × 32 bytes = 32N bytes
- **New approach** (using serials): N × 5 bytes = 5N bytes
- **Savings**: 27N bytes per event (84% reduction)
Example: Event with author + 10 p-tags:
- Old: 11 × 32 = 352 bytes
- New: 11 × 5 = 55 bytes
- **Saved: 297 bytes (84%)**
### Query Performance
1. **Pubkey lookup**: O(1) hash lookup via 8-byte truncated hash
2. **Serial generation**: O(1) atomic increment
3. **Graph queries**: Sequential scan with prefix optimization
4. **Kind filtering**: Built into key ordering, no event decoding needed
## Testing
Comprehensive tests verify:
- ✅ Serial assignment and deduplication
- ✅ Bidirectional graph edge creation
- ✅ Multiple events sharing pubkeys
- ✅ Direction byte correctness
- ✅ Edge cases (invalid pubkeys, non-existent keys)
## Future Query APIs
The graph structure supports efficient queries for:
1. **Social Graph Queries**:
- Who does Alice follow? (p-tags authored by Alice)
- Who follows Bob? (p-tags referencing Bob)
- Common connections between Alice and Bob
2. **Event Discovery**:
- All replies to Alice's events (kind-1 events with p-tag to Alice)
- All events Alice has replied to (kind-1 events by Alice with p-tags)
- Quote reposts, mentions, reactions by event kind
3. **Analytics**:
- Most-mentioned pubkeys (count p-tag-in edges)
- Most active authors (count author edges)
- Interaction patterns by kind
## Migration Notes
This is a **new index** that:
- Runs alongside existing event indexes
- Populated automatically for all new events
- Does NOT require reindexing existing events (yet)
- Can be backfilled via a migration if needed
To backfill existing events, run a migration that:
1. Iterates all events
2. Extracts pubkeys and creates serials
3. Creates graph edges for each event