On relational tables← Back to Kevin's homepagePublished: 2018 Aug 25
I’ve been exploring Alloy for the past few months and it’s raised the interesting problem of how to best display relational data within tables.
There are plenty of data visualization resources on designing charts (scatter plots, bar charts, line charts, etc.) and tables (summary statistics, pivot tables) of quantitative data, but I haven’t been able to find much about tables of relational, mostly non-quantitative data.
I’m interested in exploring something akin to the grammar of graphics, but for relational data tables.
This should make more sense after an example.
The jug puzzle
How do you measure 4 quarts using just a 5 quart jug and 3 quart jug?
We can find the solution by encoding this problem in Alloy, which will find a solution. The latest version, Alloy 5, will also visualize the solution using tables:
(I’ve truncated the full output since it doesn’t really matter — it’s easy to download and run Alloy if you want to see the full example or other visualizations of the solution.)
Here we have three tables: One with our two jugs, one with an instance of a
Pour operation (from
jug1), and one with the list of steps required to solve the puzzle (start with the jugs empty, then fill the 5-quart jug, pour as much as you can into the 3-quart jug, etc.).
The tables are shown in a completely normal form: There’s one table for every signature, with rows corresponding to atoms and columns corresponding to fields (if a field has multiple values, the values are stacked; if a value is itself a relation, subcolumns are drawn; the
jugAmounts field of
State demonstrates both situations).
Together these tables show the full solution, but it’s not particularly easy to read or understand.
(That is, if I were running some kind of puzzle solution factory, I’d want my employees to have clearer instructions.)
I can think of three kinds of transformations to this normalized view that would be helpful for specifying clearer presentational tables.
The first transformation is joining; combining fields from multiple signatures into a single table.
In the jug example, joins must be performed manually by looking between tables.
State2 atom in the
State table is associated with the
pour0 atom, but one must refer to the
Pour table to see what jug you’re supposed to pour into what.
Joining would not only allow us to avoid a manual lookup, it would also allow us to hide entirely the
This would definitely be an improvement, since a
Pour atom is synthetic.
That is, unlike a
jug atom (which represents a physical jug), the
Pour atoms (and all other atoms derived from
op) exist only to encode instructions that the puzzle-solver needs to perform.
Joining these instructions directly onto the
State table as new columns would make things clearer:
Then one could look at one table and read out everything they need, row-by-row: Fill
jug0, pour from
A grammar would potentially need to handle the following questions around joins:
- which columns to join; in what order; renaming
- should joined columns show grouping / inheritance somehow (e.g., all these columns are from signatures that derive from the abstract
- joining with multiple atoms (should this be handled in the grammar, or via introduction of synthetic type in underlying Alloy model?)
- specify whether joined tuples should be consumed (as in the above example) or remain visible
The second kind of transformation to improve presentation is transformation of the instance data itself (the atoms, tuples, and relations).
Rather than an explicit join to introduce new columns, one could specify a render function that maps, e.g., the
pour0 atom to the string “Pour from jug0 to jug1”.
The resulting “table” would then be a simple list of steps.
This kind of functionality would be particularly useful for rendering out complex presentational markup.
E.g., instead of displaying an atom as
Person7, the function could examine the appropriate fields (or other relations) to generate markup containing a profile picture, full name, etc.
Some kind of mechanism to specify render functions to a context (like CSS selectors or other aspect-oriented programming approach). I.e., we want the transformation only in the context of the
opcolumn; it shouldn’t transform atoms that appear in the
Can render functions operate at higher levels (i.e., taking a relation or set of relations as argument, rather than just atoms)? Is this render function extension mechanism open, in the sense that atom transformation functions are automatically called on the results of relation transformation functions? (Presumably, this would constrain the return types of the higher level to something more “interpretable” than strings/markup.)
Can the underlying Alloy types be used to enforce, e.g., exhaustive pattern matches within render functions?
How much power do the higher-order render functions need to be useful for presentation? Can they filter or synthesize new data, or should such operations be handled within the underlying Alloy model?
Alloy’s table visualization is “tidy” in Hadley Wickham’s tidy data sense: columns correspond to fields and rows to atoms and their field values.
However, an “untidy” presentation may be an improvement. Our example problem only has two jugs, and it’d be clearer to follow what’s going on if their amounts were displayed as columns:
Here, we’ve essentially “transposed”
jugAmounts field (a relation of type
Jug one -> Int) from nested rows (Alloy’s normalized form) into top-level columns.
(See section 3.1 in Hadley’s paper for more on this situation.)
How to specify whether the original field name (
jugAmounts) remains as header that spans the new columns, or whether it can be dropped entirely (as it is in the drawing above)
How are missing values indicated, in case a row doesn’t contain the relevant tuple?
What happens when tuples share a first value? I.e., when a single Jug maps to multiple integers? (this is avoided in our example because
Jug one -> Intensures each jug maps to exactly one Int; but in the general case there may be multiple values.)
How does this generalize to tuples of arity greater than 2? Is the first tuple value promoted to a column, and the rest drawn as inlined subcolumns?
If tuples with arity greater than 2 are allowed, should the grammar allow one to specify “inner” tuple values to be lifted into columns? (Alloy tuples have unnamed positions and joins use only the first or last tuple value; but that doesn’t mean our presentational grammar must have the same restrictions.)
Other questions: Essence vs. Accident
Alloy finds instances that satisfy the constraints of the provided model, but an individual instance can’t always reflect the full nature of the model.
E.g., this table of the
| Person | Name | Snack | |---------+-------+--------| | Person0 | Kevin | |
Doesn’t indicate the arity of the
Name relation (Can I have multiple names? Must I have at least one name?).
Is the empty
Snack an accident of this instance, or does the model forbid me from having any snacks in any instance? (I hope not.)
Such constraints could be visualized using header icons, cell shading, placeholder values, etc.
However, other constraints may be more difficult to show.
E.g., Alloy subtypes are disjoint, so in the joined table example above all, exactly one set of columns from the
Op-derived signatures (
Fill) may be filled for any given row.
How could this be indicated visually?
Of course, not all model constraints are going to be possible to draw visually (if they could be, you probably would’ve already solved your problem on paper rather than turning to a formal modeling tool + SAT-solvers). Would visualizing simple constraints but omitting the impossible-to-visualize complex constraints cause people to misinterpret the underlying model by conveying a false sense of complete understanding?
Other questions: Ordering
(Arguably this question is a special case of essence/accident, but it’s worth calling out on its own.)
Rendering a collection as a list/table requires ordering the elements. But since Alloy relations are sets, their order is arbitrary.
This will probably only become an issue when you actually do want an ordering between some atoms.
(Like in the jug puzzle, where we want to see the
State atoms in order, otherwise we don’t really have a solution.)
Fields (columns) can be ordered according to source order (i.e., based on where they are defined in the textual Alloy code), though there may be presentational situations where you’d want to specify a different order.
For example, should the grammar be sufficiently expressive to allow one to render a sudoku puzzle as a 9x9 grid? (Of course you can solve sudoku with Alloy! It’s not the most machine-efficient way to do it, but since I don’t have a CS degree or know how to C++ safely, it’s the most efficient way for me to do it.)
Other questions: Grammar embedding
It’s not clear to me whether a presentational grammar should be “embedded” within Alloy, whether it should be implemented as a standalone program that consumes relational data from any source, or some hybrid approach
Embedding in Alloy:
- Likely easier to learn and use for people already familiar with Alloy
- Existing typechecker catches issues related to typo’d signatures, fields, etc.
- Alloy’s existing semantics would make it easy to specify/implement things like partial ordering (show this field before that one)
Standalone program / library:
- Allows for user-defined data transformations (via, uh, programming)
- Allows usage of data structures besides sets/tuples/relations
- Enables complex markup, exporting PDFs/graphics, etc.
- A relational data presentational table generator would be useful independent of Alloy
- Software Abstractions, by Daniel Jackson. A great intro book on formal modeling / logic and Alloy.
- Tidy Data, by Hadley Wickham. A paper outlining how to get reshape data into a form suitable for analysis. (Thanks to Andy Chu for reminding me to re-read this awesome paper!)
- Alloy 5 release
- Jug puzzle in Alloy
- Sudoku in Alloy
Email me if you have any questions, relevant personal experiences, or want to collaborate.