I’m a huge fan of using existing standards whenever possible, even in a private project where I’ll have to write a partial implementation myself, to ease future unforeseen integrations. That said, the simplicity and other immediate benefits of S-expressions are too good to pass up.
I have to design a public and a private API, for browsers and future internal micro-services, mostly for code we’ll write ourselves but leaving open the possibility of occasional third-party public API users. JSON will be central to this API, for its native browser support and because of its type-safe server-side options (with Google Protocol Buffers for example).
I have specific objectives to meet with the API request design:
- Express nested horizontal filtering (like GraphQL)
- Lightweight for browsers to create
- Easy for programmers to create from scratch, to transpose into SQL
- Efficient for servers to parse, ready-made parsers available
- Human readable and writable (eliminates binary Protobuf)
- Compatible with a packet stream like WebSockets
For this thought experiment, I have considered and excluded GraphQL because it is not parsed by standard JSON parsers, which puts it on equal footing with other non-JSON possibilities. It is also non-trivial to implement server-side and has its own schemas which would duplicate my existing efforts with Google Protocol Buffers and OCaml/Yojson. If the experiment goes nowhere, I can always fall back to GraphQL as both a query language and data exchange format, but it really seems overkill for what I’m trying to achieve.
Let’s use two examples to explore the possibilities:
1: “Give me the qty, price, product ID and French product description of the first 10 debit-type transactions for which the user is 123, 234 or 345, ordered by descending id and ascending user.”
2: “Give me the id, e-mail address and name of users whose e-mail address contains “@example.com”.”
REST
My first thought was for the REST style: meaningful URLs, no body for read-only requests and JSON results. The two requests, liberally formatted, could look like this:
GET /transaction
?select=qty,price,product(id,desc(fr))
&f[]=user&o[]=in&v[]=[123,234,345]
&f[]=type&o[]=is&v[]=debit
&sort_f[]=id&sort_o[]=desc
&sort_f[]=user&sort_o[]=asc
&limit=10
&page=2
GET /user
?select=id,email,name
&f[]=email&o[]=like&v[]=@example.com
Notice select=product(id,desc(fr)) (inspired by PostgREST) to represent the horizontal filtering.
Pros:
- Lightweight? 178 and 61 characters (compacted), some redundancy but not a ton
- Easy to create? yes, a simple HTML FORM could do it
- Efficient? yes, query strings are common and easy to parse
- Human? tolerable, definitely not great
Cons:
- The structure of the select field strays from pure REST
- The field f[x], operator o[] and value v[] parameters are inter-dependent and thus error-prone
- Limited to adding filtering criteria, no possibility of an OR or nested structures
- WebSockets? no, at best HTTP/2 which we currently have no server-side support for
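To make the inter-dependence of the filter parameters concrete, here is a minimal Python sketch of how a server might read them back. The function name parse_filters is made up for this illustration, and it assumes the parameters arrive exactly as in the examples above; only the standard library's urllib.parse.parse_qs is used.

```python
from urllib.parse import parse_qs

def parse_filters(query_string):
    """Zip the parallel f[], o[] and v[] lists into (field, operator, value) triples."""
    params = parse_qs(query_string)
    fields = params.get("f[]", [])
    operators = params.get("o[]", [])
    values = params.get("v[]", [])
    # The three lists only make sense together: one missing o[] or v[]
    # would shift every later criterion, so mismatched lengths are rejected.
    if not (len(fields) == len(operators) == len(values)):
        raise ValueError("f[], o[] and v[] must have the same number of entries")
    return list(zip(fields, operators, values))

# Second example request from above, without the path.
query = "select=id,email,name&f[]=email&o[]=like&v[]=@example.com"
filters = parse_filters(query)
print(filters)
```

The length check is the only protection available here: nothing in the query string itself ties a given v[] to its f[], which is what makes this style fragile.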
S-Expressions
Next, I experimented with S-Expressions, which could be packed in an HTTP query string or a request body, or even in a string of a greater JSON or Protobuf payload. It started out SQL-like but evolved a little bit. Our requests could look like this:
get transactions
(only qty price (product id (desc fr)))
(if
(user is 123 234 345) and (type is debit)
)
(sort id desc user asc)
(limit 10 page 2)
get users (only id email name) (if email has "@example.com")
It almost reads like plain English!
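As a rough illustration of how little code reading such a request takes, here is a minimal Python sketch of an S-expression reader. It is not any particular library's parser: the name parse_sexp is hypothetical, atoms are kept as plain strings, and quoted strings are assumed to contain no escaped quotes.

```python
import re

# A token is a parenthesis, a double-quoted string, or a run of other characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def parse_sexp(text):
    """Parse text into nested lists; atoms stay strings, quotes are stripped."""
    stack = [[]]                       # stack[0] collects the top-level items
    for token in TOKEN.findall(text):
        if token == "(":
            stack.append([])           # open a new sub-list
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)     # attach the finished sub-list to its parent
        elif token.startswith('"'):
            stack[-1].append(token[1:-1])
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]

request = parse_sexp('get users (only id email name) (if email has "@example.com")')
print(request)
```

The result is a list of nested lists, with the leading get and users left as top-level atoms, so the outermost parentheses can stay optional as in the examples.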
Pros:
- Lightweight? yes: 145 and 60 characters
- Easy to create? yes, parentheses match the intent (though some are made optional for readability)
- Efficient? yes, S-expressions are among the easiest syntaxes to parse
- Human? much easier to read than REST filter arguments, rivals SQL, tiny grammar
- Ready-made parsers abound
- Structure allows for (if) to contain arbitrary boolean expressions
- WebSockets? yes, it’s not tied to HTTP
Cons:
- DIY grammar, S-expressions are not as well-known as HTTP query strings, SQL or JSON
- Grammar for only mixes field names and sub-lists for clarity at the expense of parsing simplicity
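One of the objectives was easy transposition into SQL. The following Python sketch shows one way the body of an (if ...) clause could become a parameterized WHERE clause. It assumes the clause has already been parsed into nested lists of strings, that "is" maps to IN and "has" to a substring LIKE, and that ? is the placeholder style; all function names are hypothetical and none of this is a settled grammar.

```python
CONNECTIVES = {"and": "AND", "or": "OR"}

def quote_ident(name):
    """Accept only simple names and quote them (user is a reserved word in some databases)."""
    if not name.isidentifier():
        raise ValueError(f"bad column name: {name!r}")
    return '"' + name + '"'

def condition(field, op, values):
    """Translate one condition such as user is 123 234 345."""
    column = quote_ident(field)
    if op == "is" and values:
        marks = ", ".join("?" for _ in values)
        return f"{column} IN ({marks})", list(values)
    if op == "has" and len(values) == 1:
        # Assumption: % and _ inside the value are not escaped in this sketch.
        return f"{column} LIKE ?", ["%" + values[0] + "%"]
    raise ValueError(f"unsupported condition: {field} {op}")

def where_clause(expr):
    """Return (sql, params) for one flat condition or sub-lists joined by and/or."""
    if expr and all(isinstance(item, str) for item in expr):
        field, op, *values = expr
        return condition(field, op, values)
    parts, params = [], []
    for item in expr:
        if isinstance(item, str):
            if item not in CONNECTIVES:
                raise ValueError(f"unknown connective: {item!r}")
            parts.append(CONNECTIVES[item])
        else:
            sql, sub_params = where_clause(item)
            parts.append("(" + sql + ")")      # keep the nesting explicit
            params.extend(sub_params)
    return " ".join(parts), params

sql, params = where_clause([["user", "is", "123", "234", "345"], "and", ["type", "is", "debit"]])
print(sql, params)
```

Values are passed as parameters rather than spliced into the SQL text, and they remain strings here; converting them to the column's type is left to the caller or the database driver. The sketch also does not check that conditions and connectives alternate.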
Custom JSON
Based on the above, I experimented with making a JSON structure which would provide similar versatility. The intent is to stay as simple as possible and to be expressed in regular JSON, unlike GraphQL, to benefit from existing parsers.
{
  "get": "transactions",
  "only": ["qty", "price", {"product": ["id", {"desc": ["fr"]}]}],
  "if": [
    ["user", "is", [123, 234, 345]],
    "and", ["type", "is", "debit"]
  ],
  "sort": [ "id", "desc", "user", "asc" ],
  "limit": 10,
  "page": 2
}
{
  "get": "users",
  "only": ["id", "email", "name"],
  "if": [
    ["email", "like", "@example.com"]
  ]
}
Pros:
- Efficient? yes, JSON is easy to parse and create
- Easy to create? yes, in most languages
- Ready-made parsers abound
- Structure can be made to allow for arbitrary boolean expressions
- WebSockets? yes, it’s not tied to HTTP
Cons:
- Lightweight? at 197 and 83 characters (compacted), not especially
- Not type-safe (requires branching parsing based on input types)
- Still a DIY grammar, in JSON packaging
- Human? A bit harder to read than S-expressions with its quotes and more error-prone to write with its use of [] and {}
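The type-safety concern shows up as soon as the server walks the only list. In this Python sketch (the helper name field_paths and the dotted-path output are choices made for the illustration), every item has to be inspected at run time to decide whether it is a column name or a nested relation.

```python
import json

def field_paths(only, prefix=""):
    """Flatten the "only" list into dotted paths, branching on each item's type."""
    paths = []
    for item in only:
        if isinstance(item, str):              # plain column name
            paths.append(prefix + item)
        elif isinstance(item, dict):           # {"relation": [nested fields]}
            for relation, nested in item.items():
                if not isinstance(nested, list):
                    raise ValueError(f"expected a list under {relation!r}")
                paths.extend(field_paths(nested, prefix + relation + "."))
        else:
            raise ValueError(f"unexpected item in 'only': {item!r}")
    return paths

request = json.loads(
    '{"get": "transactions",'
    ' "only": ["qty", "price", {"product": ["id", {"desc": ["fr"]}]}]}'
)
paths = field_paths(request["only"])
print(paths)
```

In a dynamically typed language this is only mildly annoying; with a statically typed decoder, each of these branches becomes a variant that must be modelled and matched by hand.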
Type-safe JSON
The JSON format above is tricky to parse: at several points, a value can have one of several types, and its interpretation must branch on that difference (for example, in the possible nesting of only). Let’s create a small Protobuf 3 IDL to describe something explicit:
message QueryExpr {
  repeated QueryExpr and = 1;
  repeated QueryExpr or = 2;
  repeated string is = 3;
  repeated string like = 4;
  // Other operators...
}
message FieldSet {
  repeated string col = 1;
  map<string,FieldSet> obj = 2;
}
message SortOrder {
  oneof _SortOrder {
    string asc = 1;
    string desc = 2;
  }
}
message Get {
  string from = 1;
  FieldSet only = 2;
  repeated QueryExpr if = 3;
  repeated SortOrder sort = 4;
  uint32 limit = 5;
  uint32 page = 6;
}
message Query {
  oneof Mode {
    Get get = 1;
  }
}
Query 1, liberally formatted, would look like:
{
  "get": {
    "from": "transactions",
    "only": {
      "col": [ "qty", "price" ],
      "obj": {
        "product": {
          "col": [ "id" ],
          "obj": { "desc": { "col": [ "fr" ]}}
        }
      }
    },
    "if": [
      {
        "and": [
          { "is": [ "user", "123", "234", "345" ] },
          { "is": [ "type", "debit" ] }
        ]
      }
    ],
    "sort": [ { "desc": "id" }, { "asc": "user" } ],
    "limit": 10,
    "page": 2
  }
}
Unsurprisingly, we’ve grown to 258 characters (compacted) and strayed a bit further away from human readability. It can still represent impossible things like specifying an integer comparison to a non-integer field. I would seriously consider this if we were using binary Protobuf messages, but for JSON this is tedious to work with and requires much more parsing overhead.
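What the explicit shape buys is uniform traversal. In this Python sketch, which assumes the JSON has been decoded into plain dicts rather than generated Protobuf classes (the helper name field_set_paths is hypothetical), every FieldSet level has the same two optional keys, so no run-time type inspection is needed.

```python
def field_set_paths(field_set, prefix=""):
    """Flatten a FieldSet dict into dotted paths using its fixed col/obj shape."""
    # Both keys may be absent when empty, hence the defaults.
    paths = [prefix + col for col in field_set.get("col", [])]
    for name, nested in field_set.get("obj", {}).items():
        paths.extend(field_set_paths(nested, prefix + name + "."))
    return paths

only = {
    "col": ["qty", "price"],
    "obj": {
        "product": {
            "col": ["id"],
            "obj": {"desc": {"col": ["fr"]}},
        }
    },
}
paths = field_set_paths(only)
print(paths)
```

The walker is shorter than a type-branching one would be, but the request it consumes is noticeably longer, which is the trade-off described above.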
Conclusion
The S-expression experiment turned out to be human-friendly while remaining very efficient to parse and create, and it is also the lightest in size. Mature parsers abound and the format is trivial to implement if necessary. Those few extra parentheses are easy to forgive when we consider the rest.
I will use S-expressions to formulate queries, instead of the much more complex Protobuf IDL I had started to design or the simpler one above, or even GraphQL which would be overkill for my simple project where performance and ease of development ultimately trump portability.