# Pubkey Graph System
## Overview
The pubkey graph system provides efficient social graph queries by creating bidirectional, direction-aware edges between events and pubkeys in the ORLY relay.
## Architecture
### 1. Pubkey Serial Assignment
**Purpose**: Compress 32-byte pubkeys to 5-byte serials for space efficiency.
**Tables**:
- `pks|pubkey_hash(8)|serial(5)` - Hash-to-serial lookup (16 bytes)
- `spk|serial(5)` → 32-byte pubkey (value) - Serial-to-pubkey reverse lookup
**Space Savings**: Each graph edge saves 27 bytes per pubkey reference (32 → 5 bytes).
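
The two key layouts above can be sketched in Go as follows. This is a minimal illustration, not the relay's actual code: the helper names (`putUint40`, `pksKey`, `spkKey`) are hypothetical, big-endian serial encoding is an assumption, and so is the choice of the first 8 bytes of SHA-256 as the truncated hash, since the document does not name the hash function.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// putUint40 writes the low 40 bits of v into dst[0:5], big-endian.
// Big-endian is an assumption; it keeps byte order equal to numeric order.
func putUint40(dst []byte, v uint64) {
	dst[0] = byte(v >> 32)
	dst[1] = byte(v >> 24)
	dst[2] = byte(v >> 16)
	dst[3] = byte(v >> 8)
	dst[4] = byte(v)
}

// pksKey builds pks|pubkey_hash(8)|serial(5) = 16 bytes.
// ASSUMPTION: the 8-byte hash is the first 8 bytes of SHA-256(pubkey).
func pksKey(pubkey []byte, serial uint64) []byte {
	h := sha256.Sum256(pubkey)
	key := make([]byte, 0, 16)
	key = append(key, "pks"...)
	key = append(key, h[:8]...)
	var s [5]byte
	putUint40(s[:], serial)
	return append(key, s[:]...)
}

// spkKey builds spk|serial(5) = 8 bytes; the stored value is the 32-byte pubkey.
func spkKey(serial uint64) []byte {
	key := make([]byte, 8)
	copy(key, "spk")
	putUint40(key[3:], serial)
	return key
}

func main() {
	pubkey := make([]byte, 32) // placeholder all-zero pubkey
	pks := pksKey(pubkey, 1)
	spk := spkKey(1)
	fmt.Printf("pks key: %x (%d bytes)\n", pks, len(pks))
	fmt.Printf("spk key: %x (%d bytes)\n", spk, len(spk))
}
```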
### 2. Graph Edge Storage
**Bidirectional edges with metadata**:
#### EventPubkeyGraph (Forward)
```
epg|event_serial(5)|pubkey_serial(5)|kind(2)|direction(1) = 16 bytes
```
#### PubkeyEventGraph (Reverse)
```
peg|pubkey_serial(5)|kind(2)|direction(1)|event_serial(5) = 16 bytes
```
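
As a sketch of how these two 16-byte keys could be assembled, assuming big-endian field encoding (an assumption; the document only gives field widths) and using hypothetical helper names `epgKey` and `pegKey`:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putUint40 writes the low 40 bits of v into dst[0:5], big-endian.
func putUint40(dst []byte, v uint64) {
	dst[0] = byte(v >> 32)
	dst[1] = byte(v >> 24)
	dst[2] = byte(v >> 16)
	dst[3] = byte(v >> 8)
	dst[4] = byte(v)
}

// epgKey builds epg|event_serial(5)|pubkey_serial(5)|kind(2)|direction(1).
func epgKey(eventSerial, pubkeySerial uint64, kind uint16, direction byte) []byte {
	key := make([]byte, 16)
	copy(key[0:3], "epg")
	putUint40(key[3:8], eventSerial)
	putUint40(key[8:13], pubkeySerial)
	binary.BigEndian.PutUint16(key[13:15], kind)
	key[15] = direction
	return key
}

// pegKey builds peg|pubkey_serial(5)|kind(2)|direction(1)|event_serial(5).
func pegKey(pubkeySerial uint64, kind uint16, direction byte, eventSerial uint64) []byte {
	key := make([]byte, 16)
	copy(key[0:3], "peg")
	putUint40(key[3:8], pubkeySerial)
	binary.BigEndian.PutUint16(key[8:10], kind)
	key[10] = direction
	putUint40(key[11:16], eventSerial)
	return key
}

func main() {
	// A kind-1 event (serial 100) that p-tags the pubkey with serial 7.
	fmt.Printf("forward: %x\n", epgKey(100, 7, 1, 1))
	fmt.Printf("reverse: %x\n", pegKey(7, 1, 2, 100))
}
```

Placing kind and direction before the event serial in the reverse key is what lets a single prefix select one pubkey's edges of a given kind and direction.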
### 3. Direction Byte
The direction byte distinguishes relationship types:
| Value | Direction | From Event Perspective | From Pubkey Perspective |
|-------|-----------|------------------------|-------------------------|
| `0` | Author | This pubkey is the event author | I am the author of this event |
| `1` | P-Tag Out | Event references this pubkey | *(not used in reverse)* |
| `2` | P-Tag In | *(not used in forward)* | I am referenced by this event |
**Location in keys**:
- **EventPubkeyGraph**: Byte offset 15, zero-indexed (after 3+5+5+2)
- **PubkeyEventGraph**: Byte offset 10, zero-indexed (after 3+5+2)
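
A small sketch of reading the direction byte back out of a raw key. The offsets are derived from the key layouts under Graph Edge Storage (a 3-byte table prefix followed by the fields in the listed order, zero-indexed); the constant and function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Direction values from the table above.
const (
	DirAuthor  byte = 0
	DirPTagOut byte = 1
	DirPTagIn  byte = 2
)

// Zero-indexed offsets of the direction byte, derived from the key layouts.
const (
	epgDirectionOffset = 3 + 5 + 5 + 2 // prefix, event serial, pubkey serial, kind
	pegDirectionOffset = 3 + 5 + 2     // prefix, pubkey serial, kind
)

// directionOf returns the direction byte of an epg or peg key.
func directionOf(key []byte) (byte, error) {
	if len(key) != 16 {
		return 0, errors.New("graph edge keys are 16 bytes")
	}
	switch string(key[:3]) {
	case "epg":
		return key[epgDirectionOffset], nil
	case "peg":
		return key[pegDirectionOffset], nil
	}
	return 0, errors.New("not a graph edge key")
}

func main() {
	key := make([]byte, 16)
	copy(key, "peg")
	key[pegDirectionOffset] = DirPTagIn
	d, err := directionOf(key)
	fmt.Println(d, err)
}
```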
## Graph Edge Creation
When an event is saved:
1. **Extract pubkeys**:
   - Event author: `ev.Pubkey`
   - P-tags: All `["p", "<hex-pubkey>", ...]` tags
2. **Get or create serials**: Each unique pubkey gets a monotonic 5-byte serial
3. **Create bidirectional edges**:
   For **author** (pubkey = event author):
   ```
   epg|event_serial|author_serial|kind|0 (author edge)
   peg|author_serial|kind|0|event_serial (is-author edge)
   ```
   For each **p-tag** (referenced pubkey):
   ```
   epg|event_serial|ptag_serial|kind|1 (outbound reference)
   peg|ptag_serial|kind|2|event_serial (inbound reference)
   ```
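
The edge-creation step can be sketched as a pure function from serials to edges. This is an illustration under stated assumptions: the `Edge` struct and `edgesForEvent` are hypothetical names, serials are assumed to be already assigned, and edges are kept as structs rather than encoded keys for readability.

```go
package main

import "fmt"

const (
	DirAuthor  byte = 0
	DirPTagOut byte = 1
	DirPTagIn  byte = 2
)

// Edge is a decoded graph edge; Table is "epg" (forward) or "peg" (reverse).
type Edge struct {
	Table        string
	EventSerial  uint64
	PubkeySerial uint64
	Kind         uint16
	Direction    byte
}

// edgesForEvent returns the forward and reverse edges for one event.
// P-tags that repeat, or that equal the author, produce no extra edges.
func edgesForEvent(eventSerial, authorSerial uint64, kind uint16, ptagSerials []uint64) []Edge {
	edges := []Edge{
		{"epg", eventSerial, authorSerial, kind, DirAuthor},
		{"peg", eventSerial, authorSerial, kind, DirAuthor},
	}
	seen := map[uint64]bool{authorSerial: true}
	for _, s := range ptagSerials {
		if seen[s] {
			continue
		}
		seen[s] = true
		edges = append(edges,
			Edge{"epg", eventSerial, s, kind, DirPTagOut},
			Edge{"peg", eventSerial, s, kind, DirPTagIn},
		)
	}
	return edges
}

func main() {
	// Event 100 of kind 1, authored by serial 7, p-tagging serials 8, 7 (self), 8 (repeat), 9.
	for _, e := range edgesForEvent(100, 7, 1, []uint64{8, 7, 8, 9}) {
		fmt.Printf("%+v\n", e)
	}
}
```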
## Query Patterns
### Find all events authored by a pubkey
```
Prefix scan: peg|pubkey_serial|*|0|*
Filter: direction == 0 (author)
```
### Find all events mentioning a pubkey (inbound p-tags)
```
Prefix scan: peg|pubkey_serial|*|2|*
Filter: direction == 2 (p-tag inbound)
```
### Find all kind-1 events mentioning a pubkey
```
Prefix scan: peg|pubkey_serial|0x0001|2|*
Exact match: kind == 1, direction == 2
```
### Find all pubkeys referenced by an event (outbound p-tags)
```
Prefix scan: epg|event_serial|*|*|1
Filter: direction == 1 (p-tag outbound)
```
### Find the author of an event
```
Prefix scan: epg|event_serial|*|*|0
Filter: direction == 0 (author)
```
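
Two of the reverse-index patterns above, sketched against an in-memory slice of keys standing in for the database. The function names are hypothetical and big-endian encoding is assumed. A real key-value store would seek directly to the prefix instead of walking every key.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// putUint40 writes the low 40 bits of v into dst[0:5], big-endian.
func putUint40(dst []byte, v uint64) {
	for i := 0; i < 5; i++ {
		dst[i] = byte(v >> (8 * (4 - i)))
	}
}

// pegKey builds peg|pubkey_serial(5)|kind(2)|direction(1)|event_serial(5).
func pegKey(pubkeySerial uint64, kind uint16, direction byte, eventSerial uint64) []byte {
	key := make([]byte, 16)
	copy(key[0:3], "peg")
	putUint40(key[3:8], pubkeySerial)
	binary.BigEndian.PutUint16(key[8:10], kind)
	key[10] = direction
	putUint40(key[11:16], eventSerial)
	return key
}

// authoredBy scans the peg|pubkey_serial prefix and filters direction == 0,
// because the kind field sits between the serial and the direction byte.
func authoredBy(keys [][]byte, pubkeySerial uint64) [][]byte {
	prefix := pegKey(pubkeySerial, 0, 0, 0)[:8]
	var out [][]byte
	for _, k := range keys {
		if len(k) == 16 && bytes.HasPrefix(k, prefix) && k[10] == 0 {
			out = append(out, k)
		}
	}
	return out
}

// mentionsOfKind needs no filter: kind and direction are part of the prefix.
func mentionsOfKind(keys [][]byte, pubkeySerial uint64, kind uint16) [][]byte {
	prefix := pegKey(pubkeySerial, kind, 2, 0)[:11]
	var out [][]byte
	for _, k := range keys {
		if bytes.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := [][]byte{
		pegKey(7, 1, 0, 100), // serial 7 authored event 100
		pegKey(7, 1, 2, 101), // kind-1 event 101 mentions serial 7
		pegKey(7, 6, 2, 102), // kind-6 event 102 mentions serial 7
		pegKey(8, 1, 2, 103), // unrelated pubkey
	}
	fmt.Println(len(authoredBy(keys, 7)), len(mentionsOfKind(keys, 7, 1)))
}
```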
## Implementation Details
### Thread Safety
`GetOrCreatePubkeySerial` handles concurrent callers as follows:
1. Opens a read transaction to check for an existing serial
2. If none is found, obtains the next sequence number
3. Opens a write transaction and checks again, to handle the race where another goroutine created the serial in the meantime
4. Returns the existing serial if another goroutine created it concurrently
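
The double-check pattern can be illustrated with an in-memory analogue. This is not the database-backed `GetOrCreatePubkeySerial`: a `sync.RWMutex` stands in for read and write transactions, the serial is allocated inside the write lock, and the type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// serialStore is an in-memory stand-in for the pubkey-to-serial tables.
type serialStore struct {
	mu      sync.RWMutex
	serials map[[32]byte]uint64
	next    uint64
}

func newSerialStore() *serialStore {
	return &serialStore{serials: make(map[[32]byte]uint64)}
}

// getOrCreate returns the serial for pubkey, allocating one if needed.
func (s *serialStore) getOrCreate(pubkey [32]byte) uint64 {
	// Fast path: shared lock, analogous to the read transaction.
	s.mu.RLock()
	serial, ok := s.serials[pubkey]
	s.mu.RUnlock()
	if ok {
		return serial
	}

	// Slow path: exclusive lock, analogous to the write transaction.
	s.mu.Lock()
	defer s.mu.Unlock()
	// Double-check: another goroutine may have created it in between.
	if serial, ok := s.serials[pubkey]; ok {
		return serial
	}
	s.next++
	s.serials[pubkey] = s.next
	return s.next
}

func main() {
	store := newSerialStore()
	var pk [32]byte
	pk[0] = 0xAB

	results := make([]uint64, 8)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = store.getOrCreate(pk)
		}(i)
	}
	wg.Wait()
	fmt.Println(results) // every goroutine observes the same serial
}
```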
### Deduplication
The save-event function deduplicates pubkeys before creating serials:
- Map keyed by hex-encoded pubkey
- Prevents duplicate edges when author is also in p-tags
### Edge Cases
1. **Author in p-tags**: Only creates author edge (direction=0), skips duplicate p-tag edge
2. **Invalid p-tags**: Silently skipped if hex decode fails or length != 32 bytes
3. **No p-tags**: Only author edge is created
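
The deduplication and edge-case rules above can be sketched as a single extraction function. It assumes tags arrive as `[][]string` (the shape of a decoded Nostr event's tag array) and that the author's pubkey is given as lowercase hex; `uniquePTagPubkeys` is a hypothetical name.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// uniquePTagPubkeys returns the distinct, valid p-tag pubkeys of an event,
// excluding the author, who already gets an author edge.
func uniquePTagPubkeys(authorHex string, tags [][]string) [][]byte {
	seen := map[string]bool{authorHex: true}
	var out [][]byte
	for _, tag := range tags {
		if len(tag) < 2 || tag[0] != "p" {
			continue // not a p-tag, or no value
		}
		pk, err := hex.DecodeString(tag[1])
		if err != nil || len(pk) != 32 {
			continue // invalid p-tags are silently skipped
		}
		keyHex := hex.EncodeToString(pk) // normalise to lowercase hex
		if seen[keyHex] {
			continue // duplicate, or the author tagging themselves
		}
		seen[keyHex] = true
		out = append(out, pk)
	}
	return out
}

func main() {
	author := "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
	other := "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
	tags := [][]string{
		{"p", author},     // author in p-tags: skipped
		{"p", other},      // kept
		{"p", other},      // duplicate: skipped
		{"p", "not-hex"},  // invalid hex: skipped
		{"p", "abcd"},     // wrong length: skipped
		{"e", other},      // not a p-tag: skipped
	}
	fmt.Println(len(uniquePTagPubkeys(author, tags)))
}
```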
## Performance Characteristics
### Space Efficiency
Per event with N unique pubkeys:
- **Old approach** (storing full pubkeys): N × 32 bytes = 32N bytes
- **New approach** (using serials): N × 5 bytes = 5N bytes
- **Savings**: 27N bytes per event (84% reduction)
Example: Event with author + 10 p-tags:
- Old: 11 × 32 = 352 bytes
- New: 11 × 5 = 55 bytes
- **Saved: 297 bytes (84%)**
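
The arithmetic can be checked with a few lines of Go. The function name `pubkeyRefBytes` is hypothetical, and the figures cover only the pubkey references themselves, not the rest of each key.

```go
package main

import "fmt"

// pubkeyRefBytes compares the bytes needed to reference n pubkeys
// as full 32-byte keys versus 5-byte serials.
func pubkeyRefBytes(n int) (full, compact, saved int) {
	full = n * 32
	compact = n * 5
	saved = full - compact
	return full, compact, saved
}

func main() {
	// Author plus 10 p-tags.
	full, compact, saved := pubkeyRefBytes(11)
	pct := float64(saved) / float64(full) * 100
	fmt.Printf("old=%d new=%d saved=%d (%.0f%%)\n", full, compact, saved, pct)
}
```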
### Query Performance
1. **Pubkey lookup**: O(1) hash lookup via 8-byte truncated hash
2. **Serial generation**: O(1) atomic increment
3. **Graph queries**: Seek to the key prefix, then scan the matching keys sequentially
4. **Kind filtering**: Built into key ordering, no event decoding needed
## Testing
Comprehensive tests verify:
- ✅ Serial assignment and deduplication
- ✅ Bidirectional graph edge creation
- ✅ Multiple events sharing pubkeys
- ✅ Direction byte correctness
- ✅ Edge cases (invalid pubkeys, non-existent keys)
## Future Query APIs
The graph structure supports efficient queries for:
1. **Social Graph Queries**:
   - Who does Alice follow? (p-tags authored by Alice)
   - Who follows Bob? (p-tags referencing Bob)
   - Common connections between Alice and Bob
2. **Event Discovery**:
   - All replies to Alice's events (kind-1 events with p-tag to Alice)
   - All events Alice has replied to (kind-1 events by Alice with p-tags)
   - Quote reposts, mentions, reactions by event kind
3. **Analytics**:
   - Most-mentioned pubkeys (count p-tag-in edges)
   - Most active authors (count author edges)
   - Interaction patterns by kind
## Migration Notes
This is a **new index** that:
- Runs alongside existing event indexes
- Is populated automatically for all new events
- Does NOT require reindexing existing events (yet)
- Can be backfilled via a migration if needed
To backfill existing events, run a migration that:
1. Iterates all events
2. Extracts pubkeys and creates serials
3. Creates graph edges for each event