On relational tables

← Back to Kevin's homepagePublished: 2018 Aug 25

I’ve been exploring Alloy for the past few months and it’s raised the interesting problem of how to best display relational data within tables.

There are plenty of data visualization resources on designing charts (scatter plots, bar charts, line charts, etc.) and tables (summary statistics, pivot tables) of quantitative data, but I haven’t been able to find much about tables of relational, mostly non-quantitative data.

I’m interested in exploring something akin to the grammar of graphics, but for relational data tables.

This should make more sense after an example.

The jug puzzle

How do you measure 4 quarts using just a 5 quart jug and 3 quart jug?

We can find the solution by encoding this problem in Alloy, which will find a solution. The latest version, Alloy 5, will also visualize the solution using tables:

Alloy table output

(I’ve truncated the full output since it doesn’t really matter — it’s easy to download and run Alloy if you want to see the full example or other visualizations of the solution.)

Here we have three tables: One with our two jugs, one with an instance of a Pour operation (from jug0 to jug1), and one with the list of steps required to solve the puzzle (start with the jugs empty, then fill the 5-quart jug, pour as much as you can into the 3-quart jug, etc.).

The tables are shown in a completely normal form: There’s one table for every signature, with rows corresponding to atoms and columns corresponding to fields (if a field has multiple values, the values are stacked; if a value is itself a relation, subcolumns are drawn; the jugAmounts field of State demonstrates both situations).

Together these tables show the full solution, but it’s not particularly easy to read or understand.

(That is, if I were running some kind of puzzle solution factory, I’d want my employees to have clearer instructions.)

I can think of three kinds of transformations to this normalized view that would be helpful for specifying clearer presentational tables.

Specifying joins

The first transformation is joining; combining fields from multiple signatures into a single table.

In the jug example, joins must be performed manually by looking between tables. E.g., the State2 atom in the State table is associated with the pour0 atom, but one must refer to the Pour table to see what jug you’re supposed to pour into what.

Joining would not only allow us to avoid a manual lookup, it would also allow us to hide entirely the pour0 atom. This would definitely be an improvement, since a Pour atom is synthetic. That is, unlike a jug atom (which represents a physical jug), the Pour atoms (and all other atoms derived from op) exist only to encode instructions that the puzzle-solver needs to perform.

Joining these instructions directly onto the State table as new columns would make things clearer:

Joined table values

Then one could look at one table and read out everything they need, row-by-row: Fill jug0, pour from jug0 into jug1, empty jug0, etc.

A grammar would potentially need to handle the following questions around joins:

which columns to join; in what order; renaming
should joined columns show grouping / inheritance somehow (e.g., all these columns are from signatures that derive from the abstract op signature)
joining with multiple atoms (should this be handled in the grammar, or via introduction of synthetic type in underlying Alloy model?)
specify whether joined tuples should be consumed (as in the above example) or remain visible

Render functions

The second kind of transformation to improve presentation is transformation of the instance data itself (the atoms, tuples, and relations).

Rather than an explicit join to introduce new columns, one could specify a render function that maps, e.g., the pour0 atom to the string “Pour from jug0 to jug1”. The resulting “table” would then be a simple list of steps.

This kind of functionality would be particularly useful for rendering out complex presentational markup. E.g., instead of displaying an atom as Person7, the function could examine the appropriate fields (or other relations) to generate markup containing a profile picture, full name, etc.

Open questions:

Some kind of mechanism to specify render functions to a context (like CSS selectors or other aspect-oriented programming approach). I.e., we want the transformation only in the context of the State table’s op column; it shouldn’t transform atoms that appear in the Pour table.
Can render functions operate at higher levels (i.e., taking a relation or set of relations as argument, rather than just atoms)? Is this render function extension mechanism open, in the sense that atom transformation functions are automatically called on the results of relation transformation functions? (Presumably, this would constrain the return types of the higher level to something more “interpretable” than strings/markup.)
Can the underlying Alloy types be used to enforce, e.g., exhaustive pattern matches within render functions?
How much power do the higher-order render functions need to be useful for presentation? Can they filter or synthesize new data, or should such operations be handled within the underlying Alloy model?

Untidying

Alloy’s table visualization is “tidy” in Hadley Wickham’s tidy data sense: columns correspond to fields and rows to atoms and their field values.

However, an “untidy” presentation may be an improvement. Our example problem only has two jugs, and it’d be clearer to follow what’s going on if their amounts were displayed as columns:

Untidy table

Here, we’ve essentially “transposed” State‘s jugAmounts field (a relation of type Jug one -> Int) from nested rows (Alloy’s normalized form) into top-level columns. (See section 3.1 in Hadley’s paper for more on this situation.)

Open questions:

How to specify whether the original field name (jugAmounts) remains as header that spans the new columns, or whether it can be dropped entirely (as it is in the drawing above)
How are missing values indicated, in case a row doesn’t contain the relevant tuple?
What happens when tuples share a first value? I.e., when a single Jug maps to multiple integers? (this is avoided in our example because Jug one -> Int ensures each jug maps to exactly one Int; but in the general case there may be multiple values.)
How does this generalize to tuples of arity greater than 2? Is the first tuple value promoted to a column, and the rest drawn as inlined subcolumns?
If tuples with arity greater than 2 are allowed, should the grammar allow one to specify “inner” tuple values to be lifted into columns? (Alloy tuples have unnamed positions and joins use only the first or last tuple value; but that doesn’t mean our presentational grammar must have the same restrictions.)

Further notes

Software Abstractions, by Daniel Jackson. A great intro book on formal modeling / logic and Alloy.
Tidy Data, by Hadley Wickham. A paper outlining how to get reshape data into a form suitable for analysis. (Thanks to Andy Chu for reminding me to re-read this awesome paper!)
Alloy 5 release
Jug puzzle in Alloy
Sudoku in Alloy

Email me if you have any questions, relevant personal experiences, or want to collaborate.

On relational tables

The jug puzzle

Specifying joins

Render functions

Untidying

Other questions: Essence vs. Accident

Other questions: Ordering

Other questions: Grammar embedding

Further notes