////
WireProto Specification © 2024 by Brent Saner is licensed under Creative Commons Attribution-ShareAlike 4.0 International. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/
////
= WireProto Specification
Brent Saner <bts@square-r00t.net>
Last rendered {localdatetime}
:doctype: book
:docinfo: shared
:data-uri:
:imagesdir: images
:sectlinks:
:sectnums:
:sectnumlevels: 7
:toc: preamble
:toc2: left
:idprefix:
:toclevels: 7
:source-highlighter: rouge
:docinfo: shared
:this_protover: 1
:this_protover_hex: 0x00000001
//:lib_ver: master
//:lib_ver_ref: branch
:lib_ver: 1.0.0
:lib_ver_ref: tag
[id="license"]
== License
++++
include::LICENSE.html[]
++++
In a nutshell, this license means you can:
* Use it in commercial/proprietary/internal works...
* Expand upon/change the specification...
** (As long as it is released under the same Creative Commons license)
All of this is permitted as long as you attribute the original (this document). Attribution can be as simple as something like:
====
Based on WireProto version <protocol version> as found at https://wireproto.io/.
====
More detail certainly helps, though; you may want to mention the exact date you "forked" it, etc.
Please see the full text as collapsed above or https://creativecommons.org/licenses/by-sa/4.0/legalcode.en[the online version^] of the license for the full legal text.
NOTE: In the event of the embedded text in this document differing from the online version, the online version is assumed to take precedence as the valid license applicable to this work.
[id="proto"]
== Protocol
The WireProto data packing API is a custom wire protocol/message format designed for incredibly performant, unambiguous, predictable, platform-agnostic, implementation-agnostic communication. It is based heavily on the https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.key[OpenSSH "v1" key format^] https://git.r00t2.io/r00t2/go_sshkeys/src/branch/master/_ref/KEY_GUIDE.html#v1_plain_2[(example/details)] packing method.
It supports arbitrary binary values, so a value can be anything the implementation defines; a common practice is to encode ("marshal") a Go struct to JSON bytes and set that as a WireProto field's value.
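
The JSON-marshaling practice mentioned above can be sketched as follows. This is only an illustration using Go's standard `encoding/json`; the `Query` type and the `FieldValueFromStruct` helper are hypothetical names and are not part of WireProto or its reference library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Query is a hypothetical application type; it is not part of WireProto.
type Query struct {
	Name string `json:"name"`
	ID   int    `json:"id"`
}

// FieldValueFromStruct marshals v to JSON bytes, which can then be used
// verbatim as the value of a field/value pair.
func FieldValueFromStruct(v any) ([]byte, error) {
	return json.Marshal(v)
}

func main() {
	val, err := FieldValueFromStruct(Query{Name: "alice", ID: 7})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s (%d bytes)\n", val, len(val))
}
```

The receiving end would reverse this with `json.Unmarshal` once it has extracted the value bytes.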
It supports both static construction/parsing/dissection and stream approaches in a single format, as well as multiple commands per request message/multiple answers per response message.
*All* packed uint32 (_unsigned 32-bit integer_) values are a https://en.wikipedia.org/wiki/Endianness[big-endian^] 4-byte sequence (e.g. `3712599402` == `0xdd49c56a`, or [`0xdd`, `0x49`, `0xc5`, `0x6a`]).
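
As a minimal sketch of the packing rule above, the standard `encoding/binary` package can produce and consume big-endian uint32 values. The helper names `PackUint32` and `UnpackUint32` are illustrative only and are not taken from the reference library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// PackUint32 returns v as a big-endian 4-byte sequence.
func PackUint32(v uint32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, v)
	return b
}

// UnpackUint32 reads a big-endian uint32 from the first 4 bytes of b.
func UnpackUint32(b []byte) (uint32, error) {
	if len(b) < 4 {
		return 0, fmt.Errorf("need 4 bytes, got %d", len(b))
	}
	return binary.BigEndian.Uint32(b[:4]), nil
}

func main() {
	packed := PackUint32(3712599402)
	fmt.Printf("% x\n", packed) // prints "dd 49 c5 6a"

	v, err := UnpackUint32(packed)
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // prints 3712599402
}
```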
This specification's <<proto_ver>> is `{this_protover}` (`{this_protover_hex}`).
For other releases/finalized versions of this specification, see https://git.r00t2.io/r00t2/WireProto/tags[here^].
For in-development versions, drafts, etc. of this specification, see https://git.r00t2.io/r00t2/WireProto/branches[here^].
[id="proto_reqresp"]
=== Requests/Responses
WireProto distinguishes two types of Messages/communication ends: a _Requester_ (_Requesting End_) and a _Responder_ (_Responding End_).
This terminology is intentionally implementation-agnostic. A _Requester_ is any end of a communication that is *requesting data*, and the _Responder_ is any end of a communication that is *providing that data*. A Responder may not always be present (e.g. in the case of using WireProto for local disk serialization/caching, etc.), and a "client" may be a Requester, Responder, or both -- likewise for a "server".
[id="lib"]
=== Reference Library
The WireProto specification is accompanied by a reference library for Golang, https://git.r00t2.io/r00t2/go_wireproto["WireProto"^] (https://git.r00t2.io/r00t2/wireproto[_source_^]):
++++
<a href="https://pkg.go.dev/go.pkg.dev/r00t2.io/wireproto">
<img src="https://pkg.go.dev/badge/go.pkg.dev/r00t2.io/wireproto.svg"
alt="Go Reference">
</a>
<br />
<br />
++++
Additional reference libraries may be available in the future.
[id="ytho"]
=== Why Yet Another Message Format?
Because existing methods of serializing data in a structured way (e.g. JSON, XML, YAML) are slow/bloaty, inaccurate, and/or inflexible. They struggle with binary or arbitrary data (or, in e.g. XML's case, require intermediate conditional encoding/decoding).
If it can be represented as bytes (which all digital data can), WireProto can send and receive it.
Additionally:
* https://protobuf.dev/[*Protobuf*^] has performance issues (yes, really; protobufs have large overhead compared to WireProto) and is restrictive on data types for future-proofing.
* https://go.dev/blog/gob[*Gob*^] is very language-limiting and does not support e.g. nil pointers and cyclical values.
* https://capnproto.org/[*Cap'n Proto*^] has wide language support and excellent performance but is terribly non-idiomatic, requiring the code to be generated from the schema and not vice versa (which is only ideal if you have only one communication interface and is, in the author's opinion, the entirely incorrect approach).
* https://en.wikipedia.org/wiki/JSON_streaming[*JSON streams*^] define no delimiters, which is inconvenient when a parser cannot tell where a message ends/is complete, or when it expects a standalone JSON object (e.g. native vanilla Golang JSON parsing).
[TIP]
====
WireProto is only used for binary packing/unpacking; this means it can be used with any e.g. https://pkg.go.dev/net#Conn[`net.Conn`^] (and even has helper functions explicitly to facilitate this), storage on-disk, etc.
As such it is transport/storage-agnostic, and can be used with a https://pkg.go.dev/net#Dial[TCP socket, UDP socket, IPC (InterProcess Communication)/UDS (UNIX Domain Socket) handle,^] https://pkg.go.dev/crypto/tls#Dial[TLS-tunneled TCP socket^], etc.
See the <<lib>> for details.
====
[id="msg"]
== Message Format
[TIP]
====
Throughout this document, you may see references to things like `LF`, `SOH`, and so forth.
These refer to _ASCII control characters_. You will also see many values represented in hex.
You can find more details about this (along with a full ASCII reference) https://square-r00t.net/ascii.html[here^]. Note that the specification fully supports UTF-8 (or any other arbitrary encoding) -- just be sure that your <<alloc_size, size allocators>> are aligned to the *byte count* and not *character count* (as these may not be equal depending on encoding).
====
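
To illustrate the byte-count versus character-count caveat in the tip above, here is a small sketch. `SizeForAllocator` is a hypothetical helper name; the point is only that Go's `len` on a string yields bytes, which is what a size allocator needs.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// SizeForAllocator returns the byte length of s, which is the number a
// size allocator should carry (not the number of characters).
func SizeForAllocator(s string) uint32 {
	return uint32(len(s))
}

func main() {
	s := "na\u00efve" // "naïve": 5 characters, 6 bytes when encoded as UTF-8

	fmt.Println(utf8.RuneCountInString(s)) // prints 5
	fmt.Println(SizeForAllocator(s))       // prints 6
}
```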
Each *message* is composed of:
* The <<msg_respstatus>>footnote:responly[Response messages only.]
* A <<cksum, Checksum>>footnote:optreq[Optional for Request.]footnote:reqresp[Required for Response.]
* A <<hdrs_msgstart>>
* A <<proto_ver>>
* A <<hdrs_bodystart>>
* A <<msg_grp>> <<alloc_cnt>>
* A <<msg_grp>> <<alloc_size>>
* One (or more) <<msg_grp>>(s), each of which contains:
** One (or more) <<msg_grp_rec>>(s), each of which contains:
*** One (or more) <<msg_grp_rec_kv, Field/Value pair>>(s), each of which contains:
**** A <<msg_grp_rec_kv_nm>>
**** A <<msg_grp_rec_kv_val>>
*** A <<msg_grp_recresp, copy of the original record>>footnote:responly[]
* A <<hdrs_bodyend>>
* A <<hdrs_msgend>>
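
The outer framing listed above can be sketched for a request that carries no checksum. This is an illustration, not the reference library's API: `FrameRequest` and the constant names are hypothetical, and `groups` is assumed to already hold the packed record group count allocator, size allocator, and record groups.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const (
	msgStart  byte = 0x01 // SOH
	bodyStart byte = 0x02 // STX
	bodyEnd   byte = 0x03 // ETX
	msgEnd    byte = 0x04 // EOT

	protoVersion uint32 = 1
)

// FrameRequest wraps already-packed record group bytes in the message
// framing: MSGSTART, protocol version, BODYSTART, groups, BODYEND, MSGEND.
// No checksum is emitted, which the specification permits for requests.
func FrameRequest(groups []byte) []byte {
	var buf bytes.Buffer

	buf.WriteByte(msgStart)

	ver := make([]byte, 4)
	binary.BigEndian.PutUint32(ver, protoVersion)
	buf.Write(ver)

	buf.WriteByte(bodyStart)
	buf.Write(groups)
	buf.WriteByte(bodyEnd)
	buf.WriteByte(msgEnd)

	return buf.Bytes()
}

func main() {
	// Placeholder payload bytes; a real message carries packed record groups.
	fmt.Printf("% x\n", FrameRequest([]byte{0xaa, 0xbb}))
}
```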
[id="msg_respstatus"]
=== Response Status
For response messages, a special "summary byte" is prepended as a status indicator.
This allows requesting ends to quickly bail in the case of an error if no further parsing is desired.
The status will be indicated by one of <<hdrs_respstart, two values>>: an ASCII `ACK` (`0x06`) for all requests being returned successfully or an ASCII `NAK` (`0x15`) if one or more errors were encountered across all records.
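
A requesting end could use the status byte to bail early, roughly as sketched here. `ResponseOK` and the constant names are hypothetical; treating any other leading byte as an error is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	respACK byte = 0x06 // all requests returned successfully
	respNAK byte = 0x15 // one or more errors were encountered
)

// ResponseOK inspects only the leading status byte of a response message.
// It returns true for ACK, false for NAK, and an error for anything else.
func ResponseOK(msg []byte) (bool, error) {
	if len(msg) == 0 {
		return false, errors.New("empty response")
	}
	switch msg[0] {
	case respACK:
		return true, nil
	case respNAK:
		return false, nil
	default:
		return false, fmt.Errorf("unexpected status byte 0x%02x", msg[0])
	}
}

func main() {
	ok, err := ResponseOK([]byte{0x15, 0x1b})
	fmt.Println(ok, err) // prints "false <nil>"
}
```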
[id="proto_ver"]
=== Protocol Version
The protocol version is a packed uint32 that denotes which version of this protocol specification is being used.
It is maintained separately from the *library* version/repo tags.
The current protocol version (as demonstrated in this document) is `{this_protover}` (`{this_protover_hex}`).
NOTE: Version `0` is reserved for current `HEAD` of the `master` branch of this specification and should be considered experimental, not conforming to any specific protocol message format version.
[id="msg_grp"]
=== Record Group
A record group contains multiple related <<msg_grp_rec, Records>>. It is common to only have a single Record Group.
Its structure is:
. <<msg_grp_rec>> <<alloc_cnt>>
. <<msg_grp_rec>> <<alloc_size>>
. One (or more) <<msg_grp_rec, Records>>
[id="msg_grp_rec"]
==== Record
A record contains multiple related <<msg_grp_rec_kv, Field/Value Pairs (FVP)>> and, if a Response Record, a copy of the original reference Request Record it is responding to.
Its structure is:
. <<msg_grp_rec_kv>> <<alloc_cnt>>
. <<msg_grp_rec_kv>> <<alloc_size>>
.. One (or more) <<msg_grp_rec_kv, Field/Value Pairs>>
. <<msg_grp_recresp>> <<alloc_size>>footnote:responly[]
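
The record structure above might be packed for a *request* record roughly as follows. This is a sketch under assumptions: the `Field` type and `PackRequestRecord` are hypothetical names, and the size allocator is assumed to cover every field/value pair including each pair's own two size allocators. A response record would additionally append the size allocator and bytes of the copied original record, which is not shown here.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Field is a hypothetical name/value holder used only for this sketch.
type Field struct {
	Name  []byte
	Value []byte
}

// u32 packs n as a big-endian uint32.
func u32(n int) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, uint32(n))
	return b
}

// PackRequestRecord lays out: FVP count allocator, FVP size allocator,
// then each field/value pair (name size, value size, name, value).
func PackRequestRecord(fields []Field) []byte {
	var fvps []byte
	for _, f := range fields {
		fvps = append(fvps, u32(len(f.Name))...)
		fvps = append(fvps, u32(len(f.Value))...)
		fvps = append(fvps, f.Name...)
		fvps = append(fvps, f.Value...)
	}

	out := append(u32(len(fields)), u32(len(fvps))...)
	return append(out, fvps...)
}

func main() {
	rec := PackRequestRecord([]Field{
		{Name: []byte("key"), Value: []byte("hello")},
	})
	fmt.Printf("% x\n", rec)
}
```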
[id="msg_grp_rec_kv"]
===== Field/Value Pair (Key/Value Pair)
A field/value pair (also referred to as a key/value pair) contains a matched <<msg_grp_rec_kv_nm>> and its <<msg_grp_rec_kv_val>>.
Its structure is:
. <<msg_grp_rec_kv_nm>> <<alloc_size>>
. <<msg_grp_rec_kv_val>> <<alloc_size>>
. A single <<msg_grp_rec_kv_nm>>
. A single matching <<msg_grp_rec_kv_val>>
[IMPORTANT]
====
Unlike most/all other <<alloc>> for other sections/levels, the field name and value allocators are consecutive <<alloc_size, Size Allocators>>! This is because there is *only one* field name and *only one* value per <<msg_grp_rec_kv>>.
====
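
The field/value pair layout described above (two consecutive size allocators, then the name, then the value) can be sketched as a pack/unpack pair. `PackFVP` and `UnpackFVP` are illustrative names and not the reference library's API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// PackFVP emits: name size, value size, name bytes, value bytes.
func PackFVP(name, value []byte) []byte {
	out := make([]byte, 8, 8+len(name)+len(value))
	binary.BigEndian.PutUint32(out[0:4], uint32(len(name)))
	binary.BigEndian.PutUint32(out[4:8], uint32(len(value)))
	out = append(out, name...)
	return append(out, value...)
}

// UnpackFVP reverses PackFVP and also returns any bytes that follow the pair.
func UnpackFVP(b []byte) (name, value, rest []byte, err error) {
	if len(b) < 8 {
		return nil, nil, nil, errors.New("short buffer: missing size allocators")
	}
	n := uint64(binary.BigEndian.Uint32(b[0:4]))
	v := uint64(binary.BigEndian.Uint32(b[4:8]))
	if uint64(len(b)-8) < n+v {
		return nil, nil, nil, errors.New("short buffer: truncated name/value")
	}
	name = b[8 : 8+n]
	value = b[8+n : 8+n+v]
	return name, value, b[8+n+v:], nil
}

func main() {
	packed := PackFVP([]byte("key"), []byte("hello"))
	fmt.Printf("% x\n", packed)

	name, value, _, err := UnpackFVP(packed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s=%s\n", name, value) // prints "key=hello"
}
```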
[id="msg_grp_rec_kv_nm"]
====== Field Name
The field name is usually from a finite set of allowed names. The <<msg_grp_rec_kv_val>>, while written as bytes, often contains data defined by the field name. (That is, the parsing of <<msg_grp_rec_kv_val>> often depends on its Field Name.) It is recommended that the field name be a UTF-8-compatible string for simplified serializing and https://www.wireshark.org/[on-the-wire debugging^].
While there is no technical requirement that a field name be unique per-<<msg_grp_rec>>, it is generally recommended (unless emulating/encoding arrays of data in separate <<msg_grp_rec_kv, field/value pairs>>).
Its structure is:
. A name/identifier in bytes
[id="msg_grp_rec_kv_val"]
====== Field Value
A field's value is, on the wire, just a series of bytes. The actual content of those bytes, including any structure or encoding, likely depends on the paired <<msg_grp_rec_kv_nm>>.
Its structure is:
. A value in bytes
[id="msg_grp_recresp"]
===== Copy of Original Record
This contains a "copy" of the original/request's <<msg_grp_rec>> that this record is in response to. It is only present in Response messages and must not be included in Request messages.

It is a complete <<msg_grp_rec>> from the request embedded inside the responding Record.

For example, if a record contains multiple <<msg_grp_rec_kv, field/value pairs>> specifying a query of some data then the response record will contain a copy of that record's query data.

[NOTE]
====
While *not recommended*, it *is* within specification/permissible to "alias" a request record via a session-unique identifier (e.g. https://datatracker.ietf.org/doc/html/rfc4122[UUIDv4^]), *provided* the requesting end is guaranteed to retain an identifiable copy of (or can look up or associate) its original record based on that identifying alias.

For example, a requesting end may specify _its own_ provided identifier as a <<msg_grp_rec_kv, field/value pair>> (e.g. `identifier:f18231973d08417e877dd1a2f8e8ab74`) along with additional data. The returning Response Record may then include *only* an original/request record with an FVP of `identifier:f18231973d08417e877dd1a2f8e8ab74` along with the requested data.

As another example, a responding end may return a Response Record with an original/request record of a single FVP such as `ref_id:46823da27f8749df9dee8f0bded8cce9` or the like. The requesting end *must* then be able to retrieve the full copy of the original request record as a standalone Response Record based on that `ref_id`. Responding ends *may* enforce lifetimes for request record lookup in this case, but any such lifetime must be guaranteed.
====
[id="cksum"]
== Checksums
Checksums are optional for the requesting end but the responding end *must* send them. *If present* in the request, the responder *must* validate that the checksum matches the message body (<<hdrs_bodystart>> to <<hdrs_bodyend>>, inclusive). If the checksum does not match, an error *must* be returned.
They are represented as a big-endian-packed uint32.
The checksum must be prefixed with a <<hdrs_cksum>>. If no checksum is provided in a request, this prefix *must not* be included in the sequence.
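
A sketch of producing and validating the prefixed checksum, using Go's standard `hash/crc32` (whose `ChecksumIEEE` uses the IEEE polynomial). `ChecksumHeader` and `VerifyChecksum` are hypothetical helper names; `body` is assumed to be the exact byte range from `BODYSTART` through `BODYEND`, inclusive.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

const cksumPrefix byte = 0x1b // ESC

// ChecksumHeader returns the CKSUM prefix followed by the big-endian
// CRC-32 of body.
func ChecksumHeader(body []byte) []byte {
	out := make([]byte, 5)
	out[0] = cksumPrefix
	binary.BigEndian.PutUint32(out[1:], crc32.ChecksumIEEE(body))
	return out
}

// VerifyChecksum reports whether header is a well-formed checksum header
// whose value matches body.
func VerifyChecksum(header, body []byte) bool {
	if len(header) != 5 || header[0] != cksumPrefix {
		return false
	}
	return binary.BigEndian.Uint32(header[1:]) == crc32.ChecksumIEEE(body)
}

func main() {
	// Placeholder body: BODYSTART, two payload bytes, BODYEND.
	body := []byte{0x02, 0xaa, 0xbb, 0x03}
	hdr := ChecksumHeader(body)

	fmt.Printf("% x\n", hdr)
	fmt.Println(VerifyChecksum(hdr, body)) // prints true
}
```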
[TIP]
====
A responder can quickly check if a checksum is present by checking the first byte in requests. If it is <<hdrs_cksum, `CKSUM`>>, a checksum is provided. If it is <<hdrs_msgstart, `MSGSTART`>>, one was *not* provided.
====
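
The first-byte check from the tip above might look like this on the responding end. `RequestHasChecksum` is a hypothetical name, and rejecting any other leading byte is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	cksumPrefix byte = 0x1b // ESC: CKSUM header prefix
	msgStart    byte = 0x01 // SOH: MSGSTART header prefix
)

// RequestHasChecksum inspects the first byte of a request message to decide
// whether a checksum precedes MSGSTART.
func RequestHasChecksum(msg []byte) (bool, error) {
	if len(msg) == 0 {
		return false, errors.New("empty request")
	}
	switch msg[0] {
	case cksumPrefix:
		return true, nil
	case msgStart:
		return false, nil
	default:
		return false, fmt.Errorf("unexpected leading byte 0x%02x", msg[0])
	}
}

func main() {
	has, err := RequestHasChecksum([]byte{0x01, 0x00, 0x00, 0x00, 0x01})
	fmt.Println(has, err) // prints "false <nil>"
}
```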
The checksum method used is the https://users.ece.cmu.edu/~koopman/crc/crc32.html[IEEE 802.3 CRC-32^], which should be natively available for all/most implementations/languages as it is perhaps the most ubiquitous of CRC-32 variants (e.g. https://docs.python.org/3/library/zlib.html#zlib.crc32[Python^], https://pkg.go.dev/hash/crc32[Golang^], https://github.com/gcc-mirror/gcc/blob/master/libiberty/crc32.c[GNU C/glibc^](?), https://crates.io/keywords/crc32[Rust^], etc.). (Polynomial `0x04c11db7`, reversed polynomial `0xedb88320`.)
If one needs to write the appropriate CRC-32 implementation from scratch, there is extensive detail at the https://en.wikipedia.org/wiki/Cyclic_redundancy_check[CRC Wikipedia article^].
To confirm the correct CRC32 implementation is being used (as there are *many* "CRC-32" algorithms/methods/functions/libraries), the following validations may be used:
.CRC-32 Validations
[cols="^.^2m,3m,^.^1m,^.^2m,^.^2m",options="header"]
|===
| String ^.^| Bytes | Checksum (integer) | Checksum (bytes, little-endian) | Checksum (bytes, big-endian)
| WireProto | 0x5769726550726f746f | 815806352 | 0x9037a030 | 0x30a03790
| FooBarBazQuux | 0x466f6f42617242617a51757578 | 983022564 | 0xe4bb973a | 0x3a97bbe4
| 0123456789abcdef | 0x30313233343536373839616263646566 | 1757737011 | 0x33f0c468 | 0x68c4f033
|===
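
One way to run the validations above in Go is with the standard `hash/crc32` package. The `Describe` helper is a hypothetical name. The sketch also checks the widely published CRC-32 test input `123456789` (expected `0xcbf43926`), which is not part of the table above but is a common sanity check for this variant.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// Describe returns the IEEE CRC-32 of s as an integer and as the 4-byte
// big-endian sequence that would appear on the wire.
func Describe(s string) (uint32, []byte) {
	sum := crc32.ChecksumIEEE([]byte(s))
	be := make([]byte, 4)
	binary.BigEndian.PutUint32(be, sum)
	return sum, be
}

func main() {
	inputs := []string{"123456789", "WireProto", "FooBarBazQuux", "0123456789abcdef"}
	for _, s := range inputs {
		sum, be := Describe(s)
		// Columns: input string, checksum (integer), checksum (big-endian bytes).
		fmt.Printf("%-16s %10d 0x%x\n", s, sum, be)
	}
}
```

If the printed integers differ from the table, a different "CRC-32" variant is most likely in use.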
[id="hdrs"]
== Headers
Certain sections are wrapped with an identifying header. Those headers are included below for reference.
[id="hdrs_respstart"]
=== `RESPSTART` Indicator
Responses have a <<msg_respstatus>>.footnote:responly[]
It is either an `ACK` (`0x06`) or `NAK` (`0x15`).
[id="hdrs_cksum"]
=== `CKSUM` Header Prefix
A <<cksum, checksum>>, if providedfootnote:optreq[]footnote:reqresp[], will have a prefix header of `ESC` (`0x1b`).
[id="hdrs_msgstart"]
=== `MSGSTART` Header Prefix
The message start header indicates a start of a "message". It is used to delineate operational headers from specification information (e.g. <<proto_ver>>) and data.
It is an `SOH` (`0x01`).
[id="hdrs_bodystart"]
=== `BODYSTART` Header Prefix
The body start header indicates that data/records follow. All bytes between `BODYSTART` and <<hdrs_bodyend, `BODYEND`>> are to be assumed to be directly pertinent to the request/response rather than operational.
It is an `STX` (`0x02`).
[id="hdrs_bodyend"]
=== `BODYEND` Sequence
The body end sequence indicates the end of data/records. All bytes between <<hdrs_bodystart, `BODYSTART`>> and `BODYEND` are to be assumed to be directly pertinent to the request/response rather than operational.
It is an `ETX` (`0x03`).
[id="hdrs_msgend"]
=== `MSGEND` Sequence
The message end sequence indicates that a message in its entirety has ended; if no further communication is necessary per implementation, the connection may be closed.
It is an `EOT` (`0x04`).
[id="alloc"]
== Allocators
There are two types of allocators included for each following sequence of bytes: `count allocators` and `size allocators`.
<<alloc_size, Size allocators>> can be used by receiving ends to efficiently pre-allocate buffers and for sending ends to indicate the amount of remaining data expected.
They are usually preceded with a <<alloc_cnt, count allocator>> to allow for pre-allocating e.g. slice/array sizes, but not always (e.g. <<msg_grp_rec_kv, field/value pairs>> have two <<alloc_size, size allocators>>).
All allocators are unsigned 32-bit integers, big-endian-packed.
[id="alloc_cnt"]
=== Count Allocator
Count allocators indicate *how many* child objects are contained.
[id="alloc_size"]
=== Size Allocator
Size allocators indicate *how much* space (in bytes) all child objects occupy when combined as one block. This includes the child objects' own allocators, etc. as well.
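
On the receiving end, a count allocator followed by a size allocator allows exact pre-allocation before reading the children. The following is a sketch only: `ReadBlock`, `ReadBlockFromBytes`, and the `maxBlock` cap are hypothetical and not part of the specification; the cap is added here so an untrusted size allocator cannot force a huge allocation.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxBlock is a hypothetical safety cap (1 MiB) on a single block.
const maxBlock = 1 << 20

// ReadBlock reads a count allocator and a size allocator, then reads
// exactly the announced number of bytes into a pre-allocated buffer.
func ReadBlock(r io.Reader) (count uint32, block []byte, err error) {
	var hdr [8]byte
	if _, err = io.ReadFull(r, hdr[:]); err != nil {
		return 0, nil, err
	}
	count = binary.BigEndian.Uint32(hdr[0:4])
	size := binary.BigEndian.Uint32(hdr[4:8])
	if size > maxBlock {
		return 0, nil, fmt.Errorf("size allocator %d exceeds cap %d", size, maxBlock)
	}

	block = make([]byte, size)
	if _, err = io.ReadFull(r, block); err != nil {
		return 0, nil, err
	}
	return count, block, nil
}

// ReadBlockFromBytes is a convenience wrapper for in-memory data.
func ReadBlockFromBytes(b []byte) (uint32, []byte, error) {
	return ReadBlock(bytes.NewReader(b))
}

func main() {
	in := []byte{0, 0, 0, 2, 0, 0, 0, 3, 'a', 'b', 'c'}
	count, block, err := ReadBlockFromBytes(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(count, len(block)) // prints "2 3"
}
```

Because `ReadBlock` takes an `io.Reader`, the same function would work with a `net.Conn` or a file.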
[id="ref"]
== Reference Model and Examples
For a more visual explanation, the following examples use Golang structs from the <<lib>> (`wireproto.Request{}` and `wireproto.Response{}`).
[id="ref_single"]
=== Single/Simple
[id="ref_single_req"]
==== Single/Simple Request
[%collapsible]
.Example Message Structure (Simple Request)
====
[source,go]
----
include::https://git.r00t2.io/r00t2/go_wireproto/raw/{lib_ver_ref}/{lib_ver}/test_obj_simple_req.go[]
----
====
Would then serialize as (in hex):
[%collapsible]
.Annotated Hex
====
[source,text]
----
include::docs/data/request.simple.txt[]
----
====
Or, non-annotated:
[source,text]
----
include::docs/data/request.simple.hex[]
----
[id="ref_single_resp"]
==== Single/Simple Response
[%collapsible]
.Example Message Structure (Simple Response)
====
[source,go]
----
include::https://git.r00t2.io/r00t2/go_wireproto/raw/{lib_ver_ref}/{lib_ver}/test_obj_simple_resp.go[]
----
====
Would then serialize as (in hex):
[%collapsible]
.Annotated Hex
====
[source,text]
----
include::docs/data/response.simple.txt[]
----
====
Or, non-annotated:
[source,text]
----
include::docs/data/response.simple.hex[]
----
[id="ref_multi"]
=== Multiple/Many/Complex
Multiple records, record groups, etc. can be specified in one message.
[id="ref_multi_req"]
==== Complex Request
[%collapsible]
.Example Message Structure (Multiple/Many Requests, Single Message)
====
[source,go]
----
include::https://git.r00t2.io/r00t2/go_wireproto/raw/{lib_ver_ref}/{lib_ver}/test_obj_multi_req.go[]
----
====
Would then serialize as (in hex):
[%collapsible]
.Annotated Hex
====
[source,text]
----
include::docs/data/request.multi.txt[]
----
====
Or, non-annotated:
[source,text]
----
include::docs/data/request.multi.hex[]
----
[id="ref_multi_resp"]
==== Complex Response
[%collapsible]
.Example Message Structure (Response to Multiple/Many Requests, Single Message)
====
[source,go]
----
include::https://git.r00t2.io/r00t2/go_wireproto/raw/{lib_ver_ref}/{lib_ver}/test_obj_multi_resp.go[]
----
====
Would then serialize as (in hex):
[%collapsible]
.Annotated Hex
====
[source,text]
----
include::docs/data/response.multi.txt[]
----
====
Or, non-annotated:
[source,text]
----
include::docs/data/response.multi.hex[]
----