
= REALY protocol event/query specification

JSON is awful: it is space inefficient, complex to parse due to its intolerance of trailing commas, and annoying to work with because of its multiple, inconsistent standards of string escaping.

Line structured documents are much more readily amenable to human reading and editing, and `\n`/`;`/`:` are more efficient item separators than `","`. Data structures can also be expressed much more simply, in a way similar to how they are written in programming languages.

One of the guiding principles of the Unix philosophy is to keep data in a plain text, human readable format wherever possible. Forcing the interposition of a parser just so humans can read the data adds extra brittleness to a protocol.

The REALY protocol format is extremely simple and should be trivial to parse in any programming language with basic string slicing operators.

---

== Base64 Encoding

To save space and eliminate the need for ugly `=` padding characters, we invoke link:https://datatracker.ietf.org/doc/html/rfc4648#section-3.2[RFC 4648 section 3.2], which allows padding to be omitted when the data length is known, and use base64 URL encoding without padding. This is used for IDs and pubkeys (32 bytes of payload each, 43 characters raw base64 URL encoded) and signatures (64 bytes of payload, 86 characters raw base64 URL encoded). A further benefit is that the exact same string can be used in HTTP GET parameters (`?key=value&...`), a usage the standard `=` padding would break.

For ease of human use, it is also recommended that when such a value is printed in plain text it be placed on its own line, so that a triple click selects all of it, including the `-` hyphen/minus character that normally interrupts word-wise selection, as follows:

    CF4I5dXYPZ_lu2pYRjey1QMDmgNJEyT-MM8Vvj6EnZM

For those who can't find a "raw" codec for base64, a 32 byte value is encoded with one `=` pad character and a 64 byte value with two (`==`); these can be trimmed off after encoding and added back before decoding to conform to this requirement. Since there can potentially be hundreds if not thousands of these values in event content and tag fields, the savings can be considerable, in addition to the benefit of being able to use them directly as URL parameter values.
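A minimal Python sketch of that workaround, using only the standard `base64` module; `encode_raw` and `decode_raw` are hypothetical helper names. Padding is stripped after encoding and restored before decoding, so a padded codec behaves like a raw one.

```python
import base64

def encode_raw(data: bytes) -> str:
    """URL-safe base64 with the trailing '=' padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def decode_raw(text: str) -> bytes:
    """Re-append the elided '=' padding, then decode with the padded codec."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

if __name__ == "__main__":
    pubkey = bytes(range(32))   # stand-in for a 32-byte ID or pubkey
    signature = bytes(64)       # stand-in for a 64-byte signature
    print(len(encode_raw(pubkey)), len(encode_raw(signature)))  # prints: 43 86
```

The padding length is recovered purely from the string length, which is why the fixed 32 and 64 byte sizes make dropping it safe.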

== Sockets and HTTP

Only subscriptions require a server push messaging pattern, so all other queries in REALY can be done with simple HTTP POST requests.

A relay should respond to a `subscribe` request by upgrading the connection from HTTP to a websocket.

Using websockets for queries that fit the HTTP request/response pattern costs unnecessary messages and work; by only requiring sockets for APIs that actually need server initiated messaging, the complexity of the relay is greatly reduced.

There can also be separate subscription types: one that delivers only the event IDs, and one that forwards the whole event.

=== HTTP Authentication

In general, all queries and submissions must be authenticated in order for a REALY relay to allow access.

To enable this, an authentication suffix is appended to each message, giving the following format:

`<message payload>\n` // all messages must be terminated with a newline

`<request URL>\n`

`<unix timestamp in decimal ascii>\n`

`<public key of signer>\n`

`<signature>\n`

For simplicity, the signature is on a separate line, just as it is in the event format. This avoids needing a separate codec, and the timestamp and public key are on their own lines for the same reason.
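Producing the suffix takes only string concatenation. In this illustrative Python sketch, `sign_request` is a hypothetical name, and `sign` stands in for any Ed25519 signer (for example the `sign` method of an `Ed25519PrivateKey` from the `cryptography` package). The 32-byte Blake2b digest size is an assumption, since the digest size is not specified here.

```python
import base64
import hashlib
import time
from typing import Callable, Optional

def b64raw(data: bytes) -> str:
    # URL-safe base64 with the '=' padding elided
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def sign_request(payload: str, url: str, pubkey: bytes,
                 sign: Callable[[bytes], bytes],
                 timestamp: Optional[int] = None) -> str:
    """Return payload followed by URL, timestamp, pubkey and signature lines."""
    if not payload.endswith("\n"):
        payload += "\n"  # all messages must be terminated with a newline
    if timestamp is None:
        timestamp = int(time.time())
    # Everything before the signature line is hashed (assumed 32-byte digest).
    signed = f"{payload}{url}\n{timestamp}\n{b64raw(pubkey)}\n"
    digest = hashlib.blake2b(signed.encode("utf-8"), digest_size=32).digest()
    return signed + b64raw(sign(digest)) + "\n"
```

Keeping each field on its own line means the relay can split the suffix off with a few `rsplit` calls rather than a dedicated codec.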

For reasons of security, a relay should not allow a time skew in the timestamp of more than 15 seconds.

The signature is on the Blake2b hash of everything before the signature line, and covers only the HTTP POST payload, not the HTTP headers.

Even subscription messages should be signed the same way, to avoid needing a secondary protocol. "Open" relays that have no access control (which is ill-advised, but included for completeness) must still require this authentication message; the client can simply sign with one-shot keys. The signature then still serves, much like a MAC, to validate the integrity of the request data, since it is computed over the hash.
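On the relay side, the checks above (newline termination, the 15 second skew limit, and the signature over the hash) might look like the following sketch. `verify_request` and its `verify(pubkey, signature, digest)` callback are hypothetical; with the `cryptography` package the callback could wrap `Ed25519PublicKey.from_public_bytes(pubkey).verify(signature, digest)`, which raises `InvalidSignature` instead of returning False. Comparing the signed URL against the URL the request arrived on, and the 32-byte digest size, are assumptions of this sketch.

```python
import base64
import hashlib
import time
from typing import Callable, Optional

MAX_SKEW_SECONDS = 15

def b64raw_decode(text: str) -> bytes:
    # Re-add the elided '=' padding before decoding URL-safe base64.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def verify_request(message: str, request_url: str,
                   verify: Callable[[bytes, bytes, bytes], bool],
                   now: Optional[float] = None) -> str:
    """Validate the authentication suffix; return the bare payload or raise ValueError."""
    if not message.endswith("\n"):
        raise ValueError("message must end with a newline")
    # Split off the signature line; everything before it is what was signed.
    body, sig_text = message[:-1].rsplit("\n", 1)
    signed = body + "\n"
    lines = body.split("\n")
    if len(lines) < 4:
        raise ValueError("missing authentication suffix")
    url, ts_text, pub_text = lines[-3:]
    payload = "\n".join(lines[:-3]) + "\n"
    if url != request_url:
        raise ValueError("signed URL does not match request URL")
    if not (ts_text.isascii() and ts_text.isdigit()):
        raise ValueError("timestamp must be decimal ASCII")
    now = time.time() if now is None else now
    if abs(now - int(ts_text)) > MAX_SKEW_SECONDS:
        raise ValueError("timestamp outside allowed skew")
    pubkey, signature = b64raw_decode(pub_text), b64raw_decode(sig_text)
    if len(pubkey) != 32 or len(signature) != 64:
        raise ValueError("bad pubkey or signature length")
    digest = hashlib.blake2b(signed.encode("utf-8"), digest_size=32).digest()
    if not verify(pubkey, signature, digest):
        raise ValueError("signature check failed")
    return payload
```

Because the hash covers the payload, URL and timestamp together, a one-shot key on an open relay still catches any modification of the request in transit.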

== Events

The format of events is as follows. The monospace segments are the exact text, including the necessary linebreak characters; the rest is descriptive.

---

`<type name>\n` // can be anything; hierarchic names like note/html or note/md are possible, as are type.subtype or similar schemes

`<pubkey>\n` // encoded in URL-base64 with the padding `=` elided

`<unix second precision timestamp in decimal ascii>\n`

`tags:\n`

`key:value;extra;...\n` // zero or more lines; fields cannot contain a semicolon, and the last field ends with a newline instead of a semicolon; the key is lowercase alphanumeric, starting with a letter, with no whitespace or symbols; only the key and the following `:` are mandatory

`\n` // tags end with a double linebreak

`content:\n` // literally this word on one line *directly* after the newline of the previous

`<content>\n` // any number of further line breaks, last line is signature, everything before signature line is part of the canonical hash

-> The canonical form is everything above; its Blake2b hash is the message hash <-

---

`<ed25519 signature encoded in URL-base64>\n` // the encoded signature would have two padding characters `==`; these should be elided

---

The binary data (Event Ids, Pubkeys and Signatures) is encoded in raw base64 URL encoding, without padding. Signatures are 86 characters long, with the two padding characters `==` elided; Ids and Pubkeys are 43 characters long, with a single padding character `=` elided.
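Building the canonical form and signing it needs nothing beyond string concatenation. In this hedged Python sketch, `canonical_event` and `sign_event` are hypothetical names; the 32-byte Blake2b digest size and the treatment of that digest as the Event Id are assumptions (the size matches the 32-byte Event Id), and `sign` stands in for any Ed25519 signer.

```python
import base64
import hashlib
from typing import Callable, List, Tuple

def b64raw(data: bytes) -> str:
    # URL-safe base64 with the '=' padding elided
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def canonical_event(event_type: str, pubkey: bytes, timestamp: int,
                    tags: List[str], content: str) -> str:
    """Serialise everything above the signature line."""
    for tag in tags:
        if "\n" in tag:
            raise ValueError("a tag must fit on one line")
    # type, pubkey, timestamp, "tags:", tag lines, blank line, "content:"
    lines = [event_type, b64raw(pubkey), str(timestamp), "tags:", *tags, "", "content:"]
    body = content if content.endswith("\n") else content + "\n"
    return "\n".join(lines) + "\n" + body

def sign_event(canonical: str, sign: Callable[[bytes], bytes]) -> Tuple[str, str]:
    """Return (assumed Event Id, full event text ending in the signature line)."""
    digest = hashlib.blake2b(canonical.encode("utf-8"), digest_size=32).digest()
    return b64raw(digest), canonical + b64raw(sign(digest)) + "\n"
```

Since the signature is always the last line, a reader can recover the canonical text by cutting the event at its final line break before the signature, without parsing the content.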

The database stored form of this event should map each event ID hash to a monotonic serial ID number, and use that serial as the key for associating the event with its filter indexes in the event store.

Event ID hashes are encoded in URL-base64 and prefixed with `e:` where used in tags or mentioned in content. Public keys must be prefixed with `p:`. Tag keys should be intelligible words, and a specification for their structure should be defined by their users and shared with other REALY devs.
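Since a tag line is just a key, a `:`, and semicolon separated fields, parsing it is a few lines of slicing. This Python sketch uses hypothetical `parse_tag` and `reference` helpers: it applies the key rules from the event format and recognises 43-character `e:` and `p:` values, while anything else is left for tag specifications to define.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

KEY = re.compile(r"[a-z][a-z0-9]*")  # lowercase alphanumeric, first char a letter

class Tag(NamedTuple):
    key: str
    fields: List[str]  # value plus any extra fields; may be empty

def parse_tag(line: str) -> Tag:
    key, sep, rest = line.rstrip("\n").partition(":")
    if not sep or not KEY.fullmatch(key):
        raise ValueError(f"invalid tag line: {line!r}")
    return Tag(key, rest.split(";") if rest else [])

def reference(field: str) -> Optional[Tuple[str, str]]:
    """Classify an e: (event) or p: (pubkey) prefixed base64 value, else None."""
    for prefix, kind in (("e:", "event"), ("p:", "pubkey")):
        if field.startswith(prefix) and len(field) == len(prefix) + 43:
            return kind, field[len(prefix):]
    return None
```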

In the event store, tag keys should be indexed with a Blake2b hash truncated to 8 bytes. Keys should be short, and with a 64 bit hash the chance of a collision between them is practically zero.
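A toy in-memory model of the two ideas above, the serial number mapping and the 8-byte truncated tag key hash, is sketched below. `EventIndex` is hypothetical and not a storage design; a real event store would presumably keep these mappings in an on-disk key-value database.

```python
import hashlib
from collections import defaultdict
from itertools import count
from typing import Dict, List, Tuple

def tag_key_hash(key: str) -> bytes:
    # Blake2b digest truncated to its first 8 bytes
    return hashlib.blake2b(key.encode("utf-8")).digest()[:8]

class EventIndex:
    """Event Id -> monotonic serial, and (tag key hash, value) -> serials."""

    def __init__(self) -> None:
        self._serials = count(1)
        self.serial_by_id: Dict[str, int] = {}
        self.tags: Dict[Tuple[bytes, str], List[int]] = defaultdict(list)

    def add(self, event_id: str, tags: List[Tuple[str, str]]) -> int:
        if event_id in self.serial_by_id:      # already indexed
            return self.serial_by_id[event_id]
        serial = next(self._serials)
        self.serial_by_id[event_id] = serial
        for key, value in tags:
            self.tags[(tag_key_hash(key), value)].append(serial)
        return serial

    def lookup(self, key: str, value: str) -> List[int]:
        return list(self.tags.get((tag_key_hash(key), value), []))
```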

== Publishing

Submitting an event to be stored uses the same format as a result returned from an Event Id query, except that the first line states the type of operation intended: `store\n` to store an event, `replace:<Event Id>\n` to replace an existing event, and `relay\n` to not store the event but send it to subscribers with open matching filters. A replace will not be accepted if the message type and pubkey differ from those of the original event it specifies.

The use of distinct types of store requests eliminates the complexity of defining event types as replaceable, by making this intent explicit. A relay can also allow only one kind; for example, a pure relay accepts only `relay` requests, and neither `store` nor `replace`.

An event is then acknowledged as stored, or rejected, with a message `ok:<true/false>;<Event Id>;<reason type>:<human readable part>`, where the reason type is one of a set of common types indicating the reason for a `false` result.
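The operation dispatch and the replace rule can be sketched as follows. All names here are hypothetical: `known` stands in for the event store, mapping stored Event Ids to their (type name, pubkey), and the reason types `invalid` and `blocked` are placeholders, since the common set of reason types is not listed here. Leaving the reason fields empty on success is likewise an assumption.

```python
from typing import Dict, Optional, Tuple

ID_LEN = 43  # raw URL-base64 length of a 32-byte Event Id

def parse_operation(line: str) -> Tuple[str, Optional[str]]:
    line = line.rstrip("\n")
    if line in ("store", "relay"):
        return line, None
    if line.startswith("replace:") and len(line) == len("replace:") + ID_LEN:
        return "replace", line[len("replace:"):]
    raise ValueError("unrecognised operation line")

def ok_message(accepted: bool, event_id: str, reason_type: str = "", reason: str = "") -> str:
    return f"ok:{'true' if accepted else 'false'};{event_id};{reason_type}:{reason}\n"

def handle_submission(op_line: str, event_id: str, event_type: str, pubkey: str,
                      known: Dict[str, Tuple[str, str]],
                      allowed: Tuple[str, ...] = ("store", "replace", "relay")) -> str:
    try:
        op, target = parse_operation(op_line)
    except ValueError:
        return ok_message(False, event_id, "invalid", "unrecognised operation")
    if op not in allowed:  # e.g. a pure relay allows only "relay"
        return ok_message(False, event_id, "blocked", f"{op} requests are not accepted")
    if op == "replace":
        # type name and pubkey must both match the original event
        if known.get(target) != (event_type, pubkey):
            return ok_message(False, event_id, "invalid", "no matching original event")
        del known[target]
    if op != "relay":
        known[event_id] = (event_type, pubkey)
    # a relay-only event would be forwarded to matching subscriptions, not stored
    return ok_message(True, event_id)
```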

Events that are returned have the `<subscription Id>:<Event Id>\n` as the first line, and then the event in the format described above afterwards.

== Queries

There are three types of queries in REALY:

=== Filter

A filter is headed with `filter:<subscription Id>` and has one or more of the fields listed below:

----
filter:<subscription Id>\n
pubkeys:<one>;<two>;...\n // these match as OR
timestamp:<since>;<until>\n // either can be empty but not both; omit the line for no time limit; both bounds are inclusive
tags:\n
<key>:<value>\n // indexes are not required or used for more than the key and value
... // several matches can be present, they will act as OR
----
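A relay-side parser for this format can be sketched in Python as follows; `Filter` and `parse_filter` are hypothetical names. The sketch also accepts the `subscribe` and `subscribe_full` headers described below, and treats every line after `tags:` as a `key:value` pair.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Filter:
    sub_id: str
    pubkeys: List[str] = field(default_factory=list)            # matched as OR
    since: Optional[int] = None                                 # inclusive
    until: Optional[int] = None                                 # inclusive
    tags: List[Tuple[str, str]] = field(default_factory=list)   # matched as OR

def parse_filter(text: str) -> Filter:
    lines = text.rstrip("\n").split("\n")
    head, _, sub_id = lines[0].partition(":")
    if head not in ("filter", "subscribe", "subscribe_full") or not sub_id:
        raise ValueError("expected filter:<subscription Id>")
    result = Filter(sub_id)
    in_tags = False
    for line in lines[1:]:
        if in_tags:
            key, sep, value = line.partition(":")
            if not sep or not key:
                raise ValueError(f"bad tag line {line!r}")
            result.tags.append((key, value))
        elif line.startswith("pubkeys:"):
            result.pubkeys = [p for p in line[len("pubkeys:"):].split(";") if p]
        elif line.startswith("timestamp:"):
            since, sep, until = line[len("timestamp:"):].partition(";")
            if not sep or not (since or until):
                raise ValueError("timestamp needs since and/or until")
            result.since = int(since) if since else None
            result.until = int(until) if until else None
        elif line == "tags:":
            in_tags = True
        else:
            raise ValueError(f"unexpected line {line!r}")
    if not (result.pubkeys or result.tags or result.since is not None
            or result.until is not None):
        raise ValueError("a filter needs at least one field")
    return result
```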

The result returned from this is a newline separated list of event ID hashes encoded in raw URL-base64; a following Event Id query is required to retrieve the events. This obviates the need for pagination, as the 44 bytes per result (43 characters plus a newline) are far less than sending the whole event, and the client is then free to paginate as it likes, without an onerous implementation requirement or a nebulous result limit specification.

The results must be in reverse chronological order so the client knows it can paginate them from newest to oldest as required by the user interface.
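Producing the reverse chronological ID list on the relay, and paging through it on the client, is sketched below. `filter_response` and `page` are hypothetical names, and the relay is assumed to already know the timestamp of each matching event.

```python
from typing import Iterable, List, Tuple

def filter_response(matches: Iterable[Tuple[str, int]]) -> str:
    """Relay side: (Event Id, timestamp) pairs -> newline separated IDs, newest first."""
    ordered = sorted(matches, key=lambda m: m[1], reverse=True)
    return "".join(f"{event_id}\n" for event_id, _ in ordered)

def page(response: str, offset: int, limit: int) -> List[str]:
    """Client side: take one page of IDs; paging is entirely up to the client."""
    return response.splitlines()[offset:offset + limit]
```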

If instead of `filter:<subscription Id>\n` at the top there is `subscribe:<subscription Id>\n`, the relay should return the Ids of any matching events it finds, and will subsequently forward the Event Id of each new matching event that comes in, until the client sends a `close:<subscription Id>\n` message.

Once all stored events have been returned, the relay sends `end:<subscription Id>\n` to notify the client that from then on it will only receive events that have just arrived.
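The sequence of lines a relay sends for a `subscribe` request might be modelled as a generator, as in this hypothetical sketch. Two details are assumptions, since the format of forwarded ID lines is not specified here: each ID line is prefixed with the subscription Id (borrowing the `<subscription Id>:<Event Id>` form used for returned events), and a `None` placed on the `live` queue stands in for the client's `close:<subscription Id>` message.

```python
import queue
from typing import Iterable, Iterator, Optional

def subscription_stream(sub_id: str, stored_ids: Iterable[str],
                        live: "queue.Queue[Optional[str]]") -> Iterator[str]:
    """Yield the lines sent for subscribe:<sub_id> until the client closes it."""
    for event_id in stored_ids:          # matches already in the store
        yield f"{sub_id}:{event_id}\n"
    yield f"end:{sub_id}\n"              # everything after this is newly arrived
    while True:
        event_id = live.get()            # blocks until a new match or a close
        if event_id is None:
            return
        yield f"{sub_id}:{event_id}\n"
```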

`subscribe_full:<subscription Id>` should be used to request that the events themselves be delivered directly, instead of just the event IDs matching the subscription filter.

Events published via the `relay` command are not stored, so there must be one or more "chanserv" style relays also connected to the relay, from which clients know they can request such events. A "nickserv" type specialized relay would also need to exist for creating access whitelists, by compiling individual edits to these lists and using a subscription mechanism to notify clients of the need to update their ACL.

=== Text

A text search is just `search:<subscription Id>:` followed by a series of space separated tokens, terminated with a newline. It is only available if the event store has a full text index.
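Building and parsing such a request takes one `partition` and one `split`, as in this hypothetical Python sketch. It assumes, since neither is stated above, that the subscription Id contains no `:` and that tokens contain no whitespace.

```python
from typing import List, Tuple

def build_search(sub_id: str, tokens: List[str]) -> str:
    # Assumed constraints: no ':' in the subscription Id, no whitespace in tokens.
    if ":" in sub_id or any(not t or any(c.isspace() for c in t) for t in tokens):
        raise ValueError("invalid subscription Id or token")
    return f"search:{sub_id}:{' '.join(tokens)}\n"

def parse_search(line: str) -> Tuple[str, List[str]]:
    head, _, rest = line.rstrip("\n").partition(":")
    sub_id, sep, query = rest.partition(":")
    if head != "search" or not sep or not sub_id:
        raise ValueError("not a search request")
    return sub_id, query.split()
```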

=== Event Id

Event requests are as follows:

----
events:<subscription Id>\n
<event ID one>\n
...
----

Unlike in event tags and content, the `e:` prefix is unnecessary here. The previous two query types return only lists of event IDs, so to fetch the events themselves a client must then send an `events` request.

Normally clients will gather a potentially long list of event IDs and then send Event Id queries in segments, according to the requirements of the user interface.
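Splitting a gathered ID list into segments might look like this hypothetical sketch. The default segment size of 50 is an arbitrary choice, and each request would still need the authentication suffix from the HTTP Authentication section appended before sending.

```python
from typing import Iterator, List

def events_requests(sub_id: str, event_ids: List[str], segment: int = 50) -> Iterator[str]:
    """Yield one events:<sub_id> request per segment of at most `segment` IDs."""
    if segment < 1:
        raise ValueError("segment size must be positive")
    for start in range(0, len(event_ids), segment):
        chunk = event_ids[start:start + segment]
        # IDs are sent bare: no e: prefix in this request type
        yield f"events:{sub_id}\n" + "".join(f"{eid}\n" for eid in chunk)
```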

The results are returned as a series of items, each in the following form:

----
event:<subscription Id>:<Event Id>\n
<event>\n
...
----