Thoughts on direct manipulation
Published: 2018 March
Note: I’m sketching out thoughts on direct manipulation and how concepts are represented in computer interfaces. It’s not yet a coherent argument, more like rough observations and conjectures.
Physical manipulations must map unambiguously to the underlying conceptual manipulations
Direct manipulation of shapes in, e.g., PowerPoint or Illustrator works well because there are no hidden concepts: Shapes just have size and color.
But direct manipulation falls apart for, e.g., a bar in a bar chart, because there are other, hidden concepts:
- underlying data, which the bars somehow represent
- a scaling function that maps an aspect of each datum to a bar’s height
- some kind of iteration (with sort order) that positions the bars horizontally, etc.
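To make these three hidden concepts explicit, here is a minimal Python sketch. The names `Bar`, `render_bars`, `value_key`, `max_value`, `chart_height`, `bar_width`, and `gap` are hypothetical, and the linear scale and descending sort order are assumptions chosen for illustration, not how any particular charting tool works.

```python
from dataclasses import dataclass

# Hypothetical sketch: names are illustrative, not from any charting library.


@dataclass(frozen=True)
class Bar:
    """What the user sees and drags: just a position and a size."""
    x: float
    width: float
    height: float


def render_bars(data: list[dict], value_key: str, max_value: float,
                chart_height: float, bar_width: float, gap: float) -> list[Bar]:
    # Hidden concept 1: the underlying data, a list of records.
    # Hidden concept 3: iteration with a sort order (assumed descending here)
    # decides where each bar lands horizontally.
    ordered = sorted(data, key=lambda d: d[value_key], reverse=True)
    bars = []
    for i, datum in enumerate(ordered):
        # Hidden concept 2: an (assumed linear) scaling function from a
        # datum's value to a bar height in pixels.
        height = datum[value_key] / max_value * chart_height
        bars.append(Bar(x=i * (bar_width + gap), width=bar_width, height=height))
    return bars
```

For example, `render_bars([{"name": "a", "sales": 20}, {"name": "b", "sales": 40}], "sales", 40, 200, 10, 5)` yields a 200px bar at `x=0` followed by a 100px bar at `x=15`. Note that each resulting `Bar` keeps no record of which datum, scale, or sort order produced it.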
If you drag the top of a bar upward, how should the underlying concepts change?
- Does that specific datum change? What if the mapping isn’t invertible (say, bar height = datum.foo + datum.bar, or sin(datum))?
- Does the scaling function change? For all bars, or just the one you touched? Linearly?
- Or is the chart a monolithic entity, with all bars, axes, and labels moving or scaling upward?
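The reversibility question can be made concrete with a small Python sketch. `LinearScale`, `drag_bar_linear`, `drag_bar_sum`, `drag_bar_sin`, and `max_turns` are hypothetical names, and the sketch only covers the "change the datum" option: with a linear scale, a drag to a new height determines exactly one new datum, but with `datum.foo + datum.bar` or `sin(datum)` the same drag has many valid answers, and the interface would have to pick one on the user’s behalf.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch: names are illustrative, not from any charting library.


@dataclass
class LinearScale:
    """Maps a data value in [d0, d1] linearly onto a pixel range [r0, r1]."""
    d0: float
    d1: float
    r0: float
    r1: float

    def __call__(self, value: float) -> float:
        t = (value - self.d0) / (self.d1 - self.d0)
        return self.r0 + t * (self.r1 - self.r0)

    def invert(self, pixels: float) -> float:
        t = (pixels - self.r0) / (self.r1 - self.r0)
        return self.d0 + t * (self.d1 - self.d0)


def drag_bar_linear(scale: LinearScale, new_height: float) -> float:
    # Invertible mapping: a drag determines exactly one new datum.
    return scale.invert(new_height)


def drag_bar_sum(foo: float, bar: float, new_height: float) -> list[tuple[float, float]]:
    # height = foo + bar: any pair summing to new_height is valid.
    # Two of infinitely many candidates: change only foo, or change only bar.
    return [(new_height - bar, bar), (foo, new_height - foo)]


def drag_bar_sin(new_height: float, max_turns: int = 2) -> list[float]:
    # height = sin(datum): sin is periodic, so each height in [-1, 1] has
    # infinitely many preimages; list a few of them.
    if not -1.0 <= new_height <= 1.0:
        return []  # no datum produces this height at all
    base = math.asin(new_height)
    candidates = []
    for k in range(max_turns):
        candidates.append(base + 2 * math.pi * k)
        candidates.append(math.pi - base + 2 * math.pi * k)
    return candidates
```

With `LinearScale(0, 100, 0, 300)`, dragging a bar to 150px maps back to a datum of exactly `50.0`, whereas `drag_bar_sum(2, 3, 10)` already offers two equally reasonable answers, `(7, 3)` and `(2, 8)`.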
Some of these issues can be resolved by physically reifying the hidden concepts. For example, if the chart is a monolithic entity, it can be given visual “drag handles” that translate, scale, or rotate the entire chart when manipulated.
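Once such a handle exists, a drag has exactly one meaning. The minimal Python sketch below assumes a hypothetical bottom-right handle that scales the whole chart uniformly about an origin (`Rect`, `scale_chart`, and `handle_drag_to_factor` are made-up names; translation and rotation are omitted). The data and scaling function are never consulted: every element is transformed together.

```python
from dataclasses import dataclass

# Hypothetical sketch: names are illustrative, not from any drawing tool.


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def scale_chart(elements: list[Rect], origin: tuple[float, float], factor: float) -> list[Rect]:
    """Scale every chart element about `origin`, treating the chart as one entity."""
    ox, oy = origin
    return [
        Rect(
            x=ox + (r.x - ox) * factor,
            y=oy + (r.y - oy) * factor,
            width=r.width * factor,
            height=r.height * factor,
        )
        for r in elements
    ]


def handle_drag_to_factor(bbox: Rect, handle_dx: float) -> float:
    """Assumed handle semantics: a horizontal drag of handle_dx pixels on the
    bottom-right handle becomes a uniform scale factor for the whole chart."""
    return (bbox.width + handle_dx) / bbox.width
```

For a 30px-wide chart, dragging the handle 30px to the right gives a factor of `2.0`, so bars at `Rect(0, 0, 10, 50)` and `Rect(20, 0, 10, 80)` become `Rect(0, 0, 20, 100)` and `Rect(40, 0, 20, 160)`.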
However, taking this approach to its limit (reifying concepts like iteration, conditionals, and functions) yields a visual programming language. That’s fine, but it’s not what people usually mean when they say they want “direct manipulation”.
Other examples of physical/conceptual mismatch:
- Visual website builders, which are missing fundamental website concepts like CSS, data binding, conditional breakpoint logic, etc. 
- Shared network filesystems mounted as if they were “local” folders, which don’t expose concepts like the presence of other users, an unreliable network, timeouts, retries, etc. 
Partitioning is fundamental
So what do people actually want when they say they want “direct manipulation”? I suspect that what they really want is a representation that matches how they think about the task at hand.
The tricky part is when the “same” task must go through many different representations.
When wearing a data analyst hat, one might want to think of a bar chart in terms of data mapping, scaling functions, etc.
That same person might then forget all that, put on their graphic design hat, and think only of shadows, pixel adjustments, overall proportions, how the colors contrast, etc.
In that light, the common practice of moving work along a pipeline of distinct representations:
- pulling data from a database
- munging that data in code
- mapping the munged data to vector graphics
- rasterizing and touching up manually in Photoshop
makes perfect sense: Each step involves a set of concepts which cannot exist elsewhere.
There’s often a desire to combine steps: “It’s too much manual work to style charts in Photoshop, we should do that directly in code so that when the data updates the final chart automatically updates too!”
This desire is completely understandable, but one should keep in mind that discarding a representation necessarily means throwing away useful concepts.
Some visual concepts can easily exist in both code and Photoshop: It’s easy to specify that the chart should be 600px wide or use Helvetica labels.
But other concepts cannot be understood in code, such as whether the title looks better above or to the right of the graphic; whether some whitespace should be 20px or 24px; whether the bar fill should be #000000 or #333333; or whether certain text boxes need to be aligned visually rather than according to exact font metrics.
Removing the manual process of Photoshop touch-ups essentially removes the ability to even conceive of these manipulations.
It’s worth distinguishing between representations with distinct concepts and different interfaces to the same underlying concepts.
An example of the latter would be manipulating the position of shapes graphically vs manipulating them by editing coordinates in a JSON file. The interfaces are different, but the underlying conceptual representation is the same: pixel coordinates.
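The shapes example can be illustrated with a minimal Python sketch, assuming a hypothetical file format in which each shape is stored as pixel coordinates (the fields `id`, `x`, `y`, `w`, `h` and the function `drag_shape` are made up for illustration). A graphical drag and a hand edit of the JSON text are two different interfaces, yet both arrive at the same model.

```python
import json

# Hypothetical file format: each shape is just pixel coordinates plus a size.
original_text = '{"shapes": [{"id": "a", "x": 10, "y": 20, "w": 50, "h": 30}]}'


def drag_shape(model: dict, shape_id: str, dx: int, dy: int) -> dict:
    """Graphical interface: a mouse drag by (dx, dy) pixels moves one shape."""
    shapes = [
        {**s, "x": s["x"] + dx, "y": s["y"] + dy} if s["id"] == shape_id else s
        for s in model["shapes"]
    ]
    return {"shapes": shapes}


# Interface 1: drag shape "a" 5px right and 4px up on the canvas.
via_drag = drag_shape(json.loads(original_text), "a", dx=5, dy=-4)

# Interface 2: the user retypes the coordinates in the JSON file by hand.
edited_text = '{"shapes": [{"id": "a", "x": 15, "y": 16, "w": 50, "h": 30}]}'
via_text = json.loads(edited_text)

print(via_drag == via_text)  # both interfaces manipulate the same pixel coordinates
```

Because the conceptual representation is shared, nothing is lost when switching between the two interfaces, unlike moving between code and Photoshop.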