Decide on encoding of circulex messages
Open, To Do, Public


Currently, the encoding is a mixture of artisanal hand-crafted byte-strings and JSON. The former is necessary because it's very hard to define a canonical JSON encoding of an object that you want to hash or sign. And yet, we've still ended up with the situation where a bilateral agreement contains the hashes of invitation messages, which contain JSON, thus forcing instances to keep the exact JSON encodings of objects, rather than simply storing the objects in any convenient form that allows reconstruction of the encoding that was hashed.
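To make the hashing problem concrete, here is a minimal sketch in Python. The `invitation` object and its field names are invented for illustration and are not taken from the circulex message format. It shows two valid JSON encodings of the same object producing different hashes, which is why the exact bytes have to be kept around.

```python
import hashlib
import json

# Hypothetical invitation object; the field names are assumptions for this sketch.
invitation = {"to": "alice", "amount": 10}

# Two equally valid JSON encodings of the same object.
compact = json.dumps(invitation, separators=(",", ":")).encode("utf-8")
pretty = json.dumps(invitation, sort_keys=True, indent=2).encode("utf-8")

# Both decode back to the same object...
same_object = json.loads(compact) == json.loads(pretty)

# ...but the bytes differ, so a hash over one does not match the other.
same_hash = hashlib.sha256(compact).digest() == hashlib.sha256(pretty).digest()

print(same_object, same_hash)
```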

To avoid this problem, we could choose a different (easily canonicalizable) encoding and use it for everything, instead of mixing it with bespoke encodings. But which encoding scheme?

Bencode has the property that every object has exactly one valid encoding, so you get canonicity for free. But it encodes integers in decimal, which seems less than ideal (though JSON does the same).
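As a rough illustration of why Bencode gives canonicity for free, here is a small encoder sketch in Python. The function name `bencode` and the sample object are assumptions for this example, and it handles only the four Bencode types (integers, byte strings, lists, and dictionaries with byte-string keys). Dictionary keys are always emitted in sorted order, so the in-memory ordering of the object never affects the output.

```python
def bencode(value):
    """Encode ints, bytes, lists, and dicts with bytes keys as Bencode."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but Bencode has no boolean type.
        raise TypeError("Bencode has no boolean type")
    if isinstance(value, int):
        # Integers are written in decimal between 'i' and 'e'.
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, bytes):
        # Byte strings are length-prefixed: <decimal length>:<bytes>
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        for key in value:
            if not isinstance(key, bytes):
                raise TypeError("Bencode dictionary keys must be byte strings")
        parts = [b"d"]
        # Keys are sorted as raw byte strings, which fixes the encoding.
        for key in sorted(value):
            parts.append(bencode(key))
            parts.append(bencode(value[key]))
        parts.append(b"e")
        return b"".join(parts)
    raise TypeError("cannot bencode %r" % type(value))


# The same object built in two different key orders encodes identically.
first = bencode({b"to": b"alice", b"amount": 10})
second = bencode({b"amount": 10, b"to": b"alice"})
print(first, first == second)
```

The decimal integer encoding mentioned above is visible in the `i10e` part of the output.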

CBOR seems more thoroughly thought-through, and by design is very compact and easy for computers to encode and decode. It lacks canonicity-by-default, but the RFC gives advice on defining a canonical form where necessary, which seems much easier than it would be for JSON. There's even what appears to be a near-standard way of specifying CBOR data structures.
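For comparison, here is a hand-rolled sketch of a canonical CBOR encoder in Python, assuming the canonical rules suggested in RFC 7049 section 3.9: shortest-form integers and lengths, definite-length items only, and map keys sorted by encoded length first and then bytewise. The names `_head` and `cbor_canonical` are made up for this example. It covers only integers, byte strings, text strings, arrays, and maps, and is not a substitute for a real CBOR library.

```python
import struct


def _head(major, n):
    """Encode a major type and its unsigned argument in the shortest form."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 1 << 8:
        return bytes([(major << 5) | 24, n])
    if n < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    if n < 1 << 64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", n)
    raise ValueError("integer too large for this sketch")


def cbor_canonical(value):
    """Encode a small subset of CBOR using the RFC 7049 canonical rules."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        # Major type 0 is unsigned; major type 1 stores -1 - n.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor_canonical(v) for v in value)
    if isinstance(value, dict):
        pairs = [(cbor_canonical(k), cbor_canonical(v)) for k, v in value.items()]
        # Shorter encoded keys sort first; ties are broken bytewise.
        pairs.sort(key=lambda pair: (len(pair[0]), pair[0]))
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError("cannot encode %r" % type(value))


# Hypothetical message; key order in memory does not affect the bytes.
encoded = cbor_canonical({"to": "alice", "amount": 10})
print(encoded.hex())
```

Unlike Bencode, the integer `10` here is a single byte rather than a decimal string, and the whole message is 18 bytes.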

Event Timeline

tim created this task. Dec 20 2018, 5:54 PM
tim triaged this task as To Do priority.