Enhance aggregator functionality for Nostr event collection

- Updated the aggregator to support both public (npub) and private (nsec) key inputs for event searching, enabling authentication for relays that require it.
- Implemented bloom filter loading and appending capabilities for efficient incremental data collection.
- Added timeout parameters for maximum runtime and stuck progress detection to improve reliability.
- Enhanced README with detailed usage instructions, authentication behavior, and examples for incremental collection.
- Bumped version to v0.17.16.

A comprehensive program that searches for all events related to a specific npub.

## Usage

```bash
go run main.go -key <nsec|npub> [-since <timestamp>] [-until <timestamp>] [-filter <file>] [-output <file>]
```

Where:

- `<nsec|npub>` is either a bech32-encoded Nostr private key (nsec1...) or public key (npub1...)
- `<timestamp>` is a Unix timestamp (seconds since epoch) - optional
- `<file>` is a file path for bloom filter input/output - optional

### Parameters

- **`-key`**: Required. The bech32-encoded Nostr key to search for events
  - **nsec**: Private key (enables authentication to relays that require it)
  - **npub**: Public key (authentication disabled)
- **`-since`**: Optional. Start timestamp (Unix seconds). Only events after this time are included
- **`-until`**: Optional. End timestamp (Unix seconds). Only events before this time are included
- **`-filter`**: Optional. File containing a base64-encoded bloom filter from previous runs
- **`-output`**: Optional. Output file for events (default: stdout)

### Authentication

When using an **nsec** (private key), the aggregator will:

- Derive the public key from the private key for event searching
- Attempt to authenticate to relays that require it (NIP-42)
- Continue working even if authentication fails on some relays
- Log authentication success/failure for each relay

When using an **npub** (public key), the aggregator will:

- Search for events using the provided public key
- Skip authentication (no private key available)
- Work with public relays that don't require authentication
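
A minimal sketch of the key handling described above, assuming the go-nostr library (`github.com/nbd-wtf/go-nostr`) — the README does not name the actual dependency, so treat the package paths and calls as illustrative:

```go
package main

import (
	"fmt"
	"log"

	"github.com/nbd-wtf/go-nostr"
	"github.com/nbd-wtf/go-nostr/nip19"
)

// resolveKey accepts either an nsec or an npub. It returns the hex public key
// to search for, plus the hex private key (empty for npub input) that would be
// used for NIP-42 authentication.
func resolveKey(key string) (pubkey, privkey string, err error) {
	prefix, value, err := nip19.Decode(key)
	if err != nil {
		return "", "", fmt.Errorf("invalid bech32 key: %w", err)
	}
	switch prefix {
	case "nsec": // private key: derive the pubkey, keep the secret for auth
		privkey = value.(string)
		pubkey, err = nostr.GetPublicKey(privkey)
		return pubkey, privkey, err
	case "npub": // public key only: authentication stays disabled
		return value.(string), "", nil
	default:
		return "", "", fmt.Errorf("expected nsec1... or npub1..., got %q", prefix)
	}
}

func main() {
	// Hypothetical input; a real run would pass the -key flag value here.
	pub, priv, err := resolveKey("npub1examplekeyonly")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("search pubkey:", pub, "auth enabled:", priv != "")
}
```

Keeping the private key optional lets one code path serve both authenticated and unauthenticated runs, which matches the graceful-fallback behaviour described above.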

### Behavior

- **Without `-filter`**: Creates a new bloom filter and outputs to stdout or truncates the output file
- **With `-filter`**: Loads the existing bloom filter and automatically appends to the output file (see the sketch below)
- **Bloom filter output**: Always written to stderr with timestamp information and base64 data
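
One plausible way the truncate-versus-append behaviour could be implemented; `openOutput` is a hypothetical helper for illustration, not a function taken from the aggregator's source:

```go
package main

import (
	"fmt"
	"os"
)

// openOutput opens the -output file: truncate on a fresh run, append when an
// existing bloom filter was loaded via -filter.
func openOutput(path string, appendMode bool) (*os.File, error) {
	flags := os.O_CREATE | os.O_WRONLY
	if appendMode {
		flags |= os.O_APPEND
	} else {
		flags |= os.O_TRUNC
	}
	return os.OpenFile(path, flags, 0o644)
}

func main() {
	f, err := openOutput("events.jsonl", true) // append mode, as when -filter is given
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fmt.Fprintln(f, `{"id":"..."}`) // each event is written as one JSON line
}
```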

## Examples

### Basic Usage

```bash
# Get all events related to a user using public key (no authentication)
go run main.go -key npub1234567890abcdef...

# Get all events related to a user using private key (with authentication)
go run main.go -key nsec1234567890abcdef...

# Get events related to a user since January 1, 2022
go run main.go -key npub1234567890abcdef... -since 1640995200

# Get events related to a user between two dates
go run main.go -key npub1234567890abcdef... -since 1640995200 -until 1672531200

# Get events related to a user until December 31, 2022
go run main.go -key npub1234567890abcdef... -until 1672531200
```

### Incremental Collection with Bloom Filter

```bash
# First run: Collect initial events and save bloom filter (using npub)
go run main.go -key npub1234567890abcdef... -since 1640995200 -until 1672531200 -output events.jsonl 2>bloom_filter.txt

# Second run: Continue from where we left off, append new events (using nsec for auth)
go run main.go -key nsec1234567890abcdef... -since 1672531200 -until 1704067200 -filter bloom_filter.txt -output events.jsonl 2>bloom_filter_updated.txt

# Third run: Collect even more recent events
go run main.go -key nsec1234567890abcdef... -since 1704067200 -filter bloom_filter_updated.txt -output events.jsonl 2>bloom_filter_final.txt
```

### Output Redirection

```bash
# Events to file, bloom filter to stderr (visible in terminal)
go run main.go -key npub1... -output events.jsonl

# Events to file, bloom filter to separate file
go run main.go -key npub1... -output events.jsonl 2>bloom_filter.txt

# Events to stdout, bloom filter to file (useful for piping events)
go run main.go -key npub1... 2>bloom_filter.txt | jq .

# Using nsec for authentication to access private relays
go run main.go -key nsec1... -output events.jsonl 2>bloom_filter.txt
```

## Features

### Core Functionality

- **Comprehensive event discovery**: Finds both events authored by the user and events that mention the user
- **Dynamic relay discovery**: Automatically discovers and connects to new relays from relay list events (kind 10002)
- **Progressive backward fetching**: Systematically collects historical data in time-based batches
- **Triple filter approach**: Uses separate filters for authored events, p-tag mentions, and relay list events (see the sketch after this list)
- **Intelligent time management**: Works backwards from the current time (or the until timestamp) to the since timestamp
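
The three filters described above might look like the following. This is a sketch assuming the go-nostr library's `Filter` type; the aggregator's real filter construction is not shown in this excerpt:

```go
package main

import (
	"fmt"

	"github.com/nbd-wtf/go-nostr"
)

func buildFilters(pubkey string, since, until *nostr.Timestamp) []nostr.Filter {
	return []nostr.Filter{
		// 1. Events authored by the user.
		{Authors: []string{pubkey}, Since: since, Until: until},
		// 2. Events that mention the user via a p-tag.
		{Tags: nostr.TagMap{"p": []string{pubkey}}, Since: since, Until: until},
		// 3. Relay list events (kind 10002) used for dynamic relay discovery.
		{Kinds: []int{10002}, Authors: []string{pubkey}},
	}
}

func main() {
	since := nostr.Timestamp(1640995200)
	until := nostr.Timestamp(1672531200)
	filters := buildFilters("deadbeef...", &since, &until) // hex pubkey placeholder
	fmt.Println(len(filters), "filters")
}
```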

### Authentication & Access

- **Private key support**: Use nsec keys to authenticate to relays that require it (NIP-42)
- **Public key compatibility**: Continue to work with npub keys for public relay access
- **Graceful fallback**: Continue operation even if authentication fails on some relays
- **Auth-required relay access**: Access private notes and restricted content on authenticated relays
- **Flexible key input**: Automatically detects and handles both nsec and npub key formats

### Memory Management

- **Memory-efficient deduplication**: Uses a bloom filter with a ~0.1% false positive rate instead of unbounded maps
- **Fixed memory footprint**: The bloom filter uses only ~1.75MB for 1M events with controlled memory growth (see the calculation after this list)
- **Memory monitoring**: Real-time memory usage tracking and automatic garbage collection
- **Persistent deduplication**: The bloom filter can be saved and reused across multiple runs
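
The ~1.75 MB figure and the 10 hash functions reported in the bloom filter summary follow from the standard bloom filter sizing formulas; the short program below reproduces the arithmetic (the exact numbers depend on how the implementation rounds):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	n := 1_000_000.0 // expected unique event IDs
	p := 0.001       // target false positive rate (~0.1%)

	// Standard bloom filter sizing:
	//   m = -n*ln(p) / (ln 2)^2   bits
	//   k = (m/n) * ln 2          hash functions
	m := -n * math.Log(p) / (math.Ln2 * math.Ln2)
	k := m / n * math.Ln2

	fmt.Printf("bits: %.0f (~%.2f MB)\n", m, m/8/1e6)      // ≈ 14.4M bits ≈ 1.8 MB
	fmt.Printf("hash functions: %d\n", int(math.Round(k))) // ≈ 10
}
```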

### Incremental Collection

- **Bloom filter persistence**: Save deduplication state between runs for efficient incremental collection
- **Automatic append mode**: When loading existing bloom filter, automatically appends to output file
- **Timestamp tracking**: Records actual time range of processed events in bloom filter output
- **Seamless continuation**: Resume collection from where previous run left off without duplicates

### Reliability & Performance

- Connects to multiple relays simultaneously with dynamic expansion
- Outputs events in JSONL format (one JSON object per line)
- Handles connection failures gracefully
- Continues running until all relay connections are closed
- Time-based filtering with Unix timestamps (since/until parameters)
- Input validation for timestamp ranges
- Rate limiting and backoff for relay connection management (see the sketch after this list)
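
A sketch of what exponential backoff around relay connections could look like. `dialRelay` is a hypothetical stand-in for the real connection code, and the delays are illustrative rather than the aggregator's actual tuning:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// dialRelay is a placeholder for the real websocket connection logic.
func dialRelay(url string) error { return errors.New("connection refused") }

// connectWithBackoff retries a relay connection, doubling the delay between
// attempts so a flaky relay does not get hammered.
func connectWithBackoff(url string, maxAttempts int) error {
	delay := time.Second
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err := dialRelay(url); err == nil {
			return nil
		}
		fmt.Printf("attempt %d to %s failed; retrying in %s\n", attempt, url, delay)
		time.Sleep(delay)
		delay *= 2
	}
	return fmt.Errorf("giving up on %s after %d attempts", url, maxAttempts)
}

func main() {
	_ = connectWithBackoff("wss://relay.example.com", 3)
}
```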

## Event Discovery

The aggregator uses an intelligent progressive backward fetching strategy:

4. **Efficient processing**: Processes each time batch completely before moving to the next
5. **Boundary respect**: Stops when reaching the since timestamp or beginning of available data
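
A minimal sketch of the backward-walking loop this strategy implies: start at `until` (or the current time) and step toward `since` in fixed windows. The one-week window size is an assumption for illustration; the real batch size is not documented here:

```go
package main

import (
	"fmt"
	"time"
)

// fetchBackwards walks from `until` down to `since` in fixed-size windows,
// processing each window completely before moving further into the past.
func fetchBackwards(since, until int64, window time.Duration) {
	step := int64(window.Seconds())
	for end := until; end > since; end -= step {
		start := end - step
		if start < since {
			start = since // respect the -since boundary
		}
		// A real implementation would query the relays with a
		// {since: start, until: end} filter here and process the results.
		fmt.Printf("fetching window %d .. %d\n", start, end)
	}
}

func main() {
	fetchBackwards(1640995200, 1643673600, 7*24*time.Hour) // January 2022 in weekly windows
}
```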

## Incremental Collection Workflow

The aggregator supports efficient incremental data collection using persistent bloom filters. This allows you to build comprehensive event archives over time without re-processing duplicate events.

### How It Works

1. **First Run**: Creates a new bloom filter and collects events for the specified time range
2. **Bloom Filter Output**: At completion, outputs a bloom filter summary to stderr with:
   - Event statistics (processed count, estimated unique events)
   - Time range covered (actual timestamps of collected events)
   - Base64-encoded bloom filter data for reuse
3. **Subsequent Runs**: Load the saved bloom filter to skip already-seen events
4. **Automatic Append**: When using an existing filter, new events are appended to the output file
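
A minimal sketch of the dedup-and-persist cycle behind this workflow, assuming a generic bloom filter library such as `github.com/bits-and-blooms/bloom/v3` — the README does not say which implementation the aggregator actually uses:

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"

	"github.com/bits-and-blooms/bloom/v3"
)

func main() {
	// Sized for ~1M event IDs at a ~0.1% false positive rate.
	filter := bloom.NewWithEstimates(1_000_000, 0.001)

	// TestAndAdd reports whether the ID was (probably) seen before and
	// records it either way, so each new event is emitted at most once.
	for _, id := range []string{"event-a", "event-b", "event-a"} {
		if !filter.TestAndAdd([]byte(id)) {
			fmt.Println("new event:", id)
		}
	}

	// Serialize the filter so a later run can reload it (the aggregator
	// emits this as base64 between its stderr markers).
	var buf bytes.Buffer
	if _, err := filter.WriteTo(&buf); err != nil {
		panic(err)
	}
	encoded := base64.StdEncoding.EncodeToString(buf.Bytes())
	fmt.Println("filter bytes (base64 length):", len(encoded))
}
```

With that library, reloading is the mirror image: base64-decode the saved blob and feed it back through the filter's `ReadFrom`.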

### Bloom Filter Output Format

The bloom filter output includes comprehensive metadata:

```
=== BLOOM FILTER SUMMARY ===
Events processed: 1247
Estimated unique events: 1247
Bloom filter size: 1.75 MB
False positive rate: ~0.1%
Hash functions: 10
Time range covered: 1640995200 to 1672531200
Time range (human): 2022-01-01T00:00:00Z to 2023-01-01T00:00:00Z

Bloom filter (base64):
[base64-encoded binary data]
=== END BLOOM FILTER ===
```

### Best Practices

- **Save bloom filters**: Always redirect stderr to a file to preserve the bloom filter
- **Sequential time ranges**: Use non-overlapping time ranges for optimal efficiency
- **Regular updates**: Update your bloom filter file after each run for the latest state
- **Backup filters**: Keep copies of bloom filter files for different time periods

### Example Workflow

```bash
# Month 1: January 2022 (using npub for public relays)
go run main.go -key npub1... -since 1640995200 -until 1643673600 -output all_events.jsonl 2>filter_jan.txt

# Month 2: February 2022 (using nsec for auth-required relays, append to same file)
go run main.go -key nsec1... -since 1643673600 -until 1646092800 -filter filter_jan.txt -output all_events.jsonl 2>filter_feb.txt

# Month 3: March 2022 (continue with authentication for complete coverage)
go run main.go -key nsec1... -since 1646092800 -until 1648771200 -filter filter_feb.txt -output all_events.jsonl 2>filter_mar.txt

# Result: all_events.jsonl contains deduplicated events from all three months, including private relay content
```

## Memory Management

The aggregator uses advanced memory management techniques to handle large-scale data collection.

The program starts from a predefined set of initial relays.

## Output Format

### Event Output (stdout or -output file)

Each line of output is a JSON object representing a Nostr event with the following fields:

- `id`: Event ID (hex)
- `tags`: Array of tag arrays
- `content`: Event content string
- `sig`: Event signature (hex)
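
For consuming the JSONL output from Go, a struct along these lines works. Note that `pubkey`, `created_at`, and `kind` are the standard NIP-01 event fields; they are included as an assumption, since this excerpt only lists `id`, `tags`, `content`, and `sig` explicitly:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// Event mirrors one line of the aggregator's JSONL output.
type Event struct {
	ID        string     `json:"id"`
	PubKey    string     `json:"pubkey"`     // assumed standard NIP-01 field
	CreatedAt int64      `json:"created_at"` // assumed standard NIP-01 field
	Kind      int        `json:"kind"`       // assumed standard NIP-01 field
	Tags      [][]string `json:"tags"`
	Content   string     `json:"content"`
	Sig       string     `json:"sig"`
}

func main() {
	// Read events line by line, e.g. `go run reader.go < events.jsonl`.
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // allow long events
	for scanner.Scan() {
		var ev Event
		if err := json.Unmarshal(scanner.Bytes(), &ev); err != nil {
			continue // skip malformed lines
		}
		fmt.Printf("kind %d event %s\n", ev.Kind, ev.ID)
	}
}
```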

### Bloom Filter Output (stderr)

At program completion, a comprehensive bloom filter summary is written to stderr containing:

- **Statistics**: Event counts, memory usage, performance metrics
- **Time Range**: Actual timestamp range of collected events (both Unix and human-readable)
- **Configuration**: Bloom filter parameters (size, hash functions, false positive rate)
- **Binary Data**: Base64-encoded bloom filter for reuse in subsequent runs

The bloom filter output is structured with clear markers (`=== BLOOM FILTER SUMMARY ===` and `=== END BLOOM FILTER ===`) making it easy to parse and extract the base64 data programmatically.
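
For example, the base64 block can be pulled out of a captured stderr file by scanning between the documented markers. The file name and the per-line layout of the base64 data are assumptions; only the marker strings come from the format shown above:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("bloom_filter.txt") // stderr captured from a previous run
	if err != nil {
		panic(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // base64 data may be one very long line

	var b64 strings.Builder
	collecting := false
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case line == "Bloom filter (base64):":
			collecting = true // base64 data starts on the next line
		case line == "=== END BLOOM FILTER ===":
			collecting = false
		case collecting && line != "":
			b64.WriteString(line)
		}
	}
	fmt.Println("extracted base64 characters:", b64.Len())
}
```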

### Output Separation

- **Events**: Always go to stdout (default) or the file specified by `-output`
- **Bloom Filter**: Always goes to stderr, allowing separate redirection
- **Logs**: Runtime information and progress updates go to stderr

This separation allows flexible output handling:

```bash
# Events to file, bloom filter visible in terminal
./aggregator -key npub1... -output events.jsonl

# Both events and bloom filter to separate files
./aggregator -key npub1... -output events.jsonl 2>bloom_filter.txt

# Events piped to another program, bloom filter saved
./aggregator -key npub1... 2>bloom_filter.txt | jq '.content'
```