Deliberately Unimplemented
One of the most fundamentally usable programs I have ever encountered is Cargo, the build system for the Rust programming language. I often turn to it for inspiration on the subject of tool architecture. It is notable not only for what it does, but also for what it doesn’t do. It occupies a particular role in the tool stack and knows it; it leaves certain problems unsolved so as not to become the tool for solving them. It solves others in ways that constrain the user, but which permit other parties to rely on those restrictions for their own relative freedom. The way its features are architected reflects two complementary design principles I call tool bounding and tool lensing, which I believe are crucial for making well-designed command-line programs.
Note: No prior knowledge whatsoever of Rust or Cargo is required to understand this article.
Tool Bounding
Tool bounding is an evolution of the old-school Unix philosophy, “do one thing and do it well”. A program with a good sense of bounding knows that it is, first and foremost, a tool: it does not have to reach the user’s goal all on its own, but only needs to fill some role in the user’s own goal-seeking process. A screwdriver is used for driving screws. It is not used for hammering nails, but (diverging from Unix philosophy) this is not a statement about how many features it should have; a high-tech screwdriver would be all the more useful with a laser sight, and with a pressure sensor that tells you you’re likely to strip a Phillips head.
The command `cargo build` renders source code into machine code at the package level, the goal of any build tool. It has many features that enable this process to be more intricate; an example is a build script. A build script is a member of a package; it receives information about the compilation environment, and can generate sources, linkage directives, and compile-time flags. Building the package necessitates first building and running the build script, then interpreting its outputs and adjusting the build process based on these directives.
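As a sketch of the shape this takes: a build script is an ordinary Rust program named `build.rs` whose stdout lines beginning with `cargo:` are interpreted as directives. The directive names below are real; the `win_backend` cfg flag and the target check are invented for illustration.

```rust
// build.rs (sketch): a hypothetical build script. The `cargo:` directive
// names are real Cargo ones; `win_backend` is an invented cfg flag.
use std::env;

// Compute the directives separately from printing them, so the logic
// is easy to follow (and to test).
fn directives(target: &str) -> Vec<String> {
    let mut out = Vec::new();
    if target.contains("windows") {
        // Tells Cargo to pass --cfg win_backend to the compiler.
        out.push("cargo:rustc-cfg=win_backend".to_string());
    }
    // Re-run this script only when build.rs itself changes.
    out.push("cargo:rerun-if-changed=build.rs".to_string());
    out
}

fn main() {
    // Cargo provides the target triple in the TARGET environment variable.
    let target = env::var("TARGET").unwrap_or_default();
    for d in directives(&target) {
        println!("{d}");
    }
}
```

Cargo runs this program before compiling the package proper, then reads each printed directive and adjusts the build accordingly.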
There is not, however, a post-build script. No user code will be run after the artifact has been emitted, nor even will a user-specified file path for the artifact be accepted. This means that there is no way within Cargo itself to generate an MSI installer with WiX. Nor does there need to be. WiX works the same on a Cargo-generated artifact as on an MSBuild-generated one; the user can author their own WiX script as usual and run the tool on it normally. If this is something the user wants to append to a `cargo build` invocation, there are many dependent task runners to choose from, from the lowly PowerShell script all the way up to MSBuild.
Implementing a post-build task could be phrased as work that the Cargo team does not need to do because it has already been done for them. But I feel it is more important to phrase it instead as a role Cargo does not occupy. When using a dependent task runner like MSBuild or Maven, it is common wisdom that you should not mix more than one; that is, if your project is principally built with MSBuild but contains a Java component, the component should not be a Maven project with `mvn` commands, but rather its sources should be integrated similarly to your C++ sources and built with `javac` commands. Cargo is one of the exceptions to this rule; it is universally treated as the compiler for Rust, and direct interaction with `rustc` is rare. Its intentional self-limitation enables this: it does not contain the capabilities, and therefore the responsibilities, that should not be mixed, performing no¹ tasks that another system would wish to.
In general, there is usually just one place where a particular problem gets solved. If some particular environment variable must control conditional compilation, there is no question whether that is part of Cargo or not: it is; build scripts are responsible for setting conditional compilation flags. You are discouraged from doing it differently by the fact that setting a variable externally invalidates the whole build cache instead of just the affected compilation units. Similarly, if some additional file besides the machine code must be generated from your source code, there is no question whether that is part of Cargo or not: it isn’t; Cargo is bounded to not support that. When there is a single definite place for everything, an intricate build process stays maintainable.
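The cache point can be made concrete: a build script can declare exactly which environment variable it reads, so that a change to that variable re-runs only the script and its consumers. The `cargo:` directives below are real; the `MY_BACKEND` variable and the `backend_*` cfg flags are hypothetical.

```rust
// build.rs (sketch): MY_BACKEND is a hypothetical environment variable
// selecting a compilation backend; the cargo: directives are real.
use std::env;

// Map the variable's value, if any, to a cfg directive.
fn cfg_line(value: Option<&str>) -> Option<String> {
    value.map(|v| format!("cargo:rustc-cfg=backend_{v}"))
}

fn main() {
    // Declare the dependency: a change to MY_BACKEND re-runs this script,
    // rather than an externally-set variable invalidating the whole cache.
    println!("cargo:rerun-if-env-changed=MY_BACKEND");
    if let Some(line) = cfg_line(env::var("MY_BACKEND").ok().as_deref()) {
        println!("{line}");
    }
}
```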
Tool Lensing
The real world is messy. Features besides the core task are often unavoidable, or simply too convenient to pass up. When adding features outside the tool’s bounding, lensing ensures they do not get in the way of other parts of the system. Lensing is the set of rules by which the tool carves reality into the categories its internals operate on. Tools invent jargon with names like “target”, “step”, “crate”, “shortcode”, “class” - each defined by fiat around a use case or a collection of use cases, and then given rules and behavioral restrictions so that many different parts of the tool can treat them the same.
Cargo is frequently the project manager in addition to the package builder, with its workspace feature allowing multiple packages to share build configuration and artifacts, and to be auto-dispatched to from project-level commands via the `-p` flag. These are two different roles, but ones inconvenient to locate in two different tools. So the question that bounding is all about avoiding comes up: What is a package, and what is a workspace? When does a feature go in one versus the other? If a tool author were to evaluate this on a command-by-command basis - “this makes more sense here” - they would step back to discover a tool with an annoying number of quirks and edge cases. Cargo solves this by defining rules which bucket reality into use cases.
A package is the unit of dependency management. When you depend on something, you are a package, and that something is also a package. Packages have dependencies, and workspaces do not. This is an entirely separate statement from which file the actual dependency declarations appear in. When multiple packages have the same dependency, it is convenient to coordinate the version in the workspace manifest. But because packages are the unit of dependency management, that dependency is then duplicated in the package manifest, with a `workspace = true` modifier instead of a version. The lensing rule resolves the question of whether dependencies can be created in more than one place: no, there is a single master list of dependencies in a package manifest, though they may appear in more than one place for convenience reasons.
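In manifest terms, the coordination looks something like the following (the package name `app` and the choice of `serde` as a dependency are invented; the `[workspace.dependencies]` table and the `workspace = true` form are real Cargo syntax). Both manifests are shown in one listing for brevity:

```toml
# Workspace manifest (Cargo.toml at the root): the shared version
# lives here purely for coordination.
[workspace]
members = ["app"]

[workspace.dependencies]
serde = "1.0"

# Package manifest (app/Cargo.toml): the dependency is still declared
# here, because the package is the unit of dependency management.
[package]
name = "app"
version = "0.1.0"

[dependencies]
serde = { workspace = true }
```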
More importantly, this means the system is predictable in certain ways. If the user knows that a package is the unit of dependency management and a workspace is just for putting several in a folder, they know that (say) a package can be moved from one workspace to another with very little fuss, without having to hear from a senior developer the story of how hard or easy it was that one time to know for sure. The user does not have to know how workspace dependencies are implemented to know this: they know it if they know that the rule exists, and that the rule is followed.
Another example, not related to workspaces, is the two forms of test Cargo understands: unit tests are for testing functions internal to the software, while integration tests are for testing the software as its user experiences it. For a library, this means that an integration test cannot access private items. Cargo users request this feature every so often, and are rejected according to the rule. If the author wants to expose a testing-only interface, they must do so in a way available to downstream dependents; if testing from the perspective of the user is infeasible without hacks, that sounds like a test failure.
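The distinction shows up directly in where test code lives. A unit test module inside the crate can see private items; an integration test in the `tests/` directory links against the crate like any downstream user and cannot. A minimal sketch, with invented function names:

```rust
// lib.rs (sketch): a public function built on a private helper.
pub fn normalize(s: &str) -> String {
    strip_ws(s).to_lowercase()
}

// Private: invisible to integration tests in tests/, and to dependents.
fn strip_ws(s: &str) -> &str {
    s.trim()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn unit_test_sees_private_items() {
        // Allowed here: this test module lives inside the crate itself.
        assert_eq!(strip_ws("  hi "), "hi");
    }
}
```

An integration test could call `normalize` but not `strip_ws`; the compiler enforces the lensing rule.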
More interesting is what it looks like to integration-test executables. The lensing rule about what integration tests are provides guidance in answering whether a tested executable should be compiled in library form, with its (previously untouchable) public functions made available: no. The user experiences an executable as a program, invoked in a textual or graphical shell. Therefore an integration test has no library access to this code at all, and instead receives a path to the executable in an environment variable. This interface would be unintuitive if you thought of integration tests as a way of testing code, because code is plainly made of functions. Thinking of integration tests as tests from the perspective of the end user, however, makes things clearer to reason about: this is what the end user sees; now work backwards from there to design the testing system for it.
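A sketch of what that looks like in practice. The binary name `mytool` is hypothetical; the `CARGO_BIN_EXE_<name>` variable is the real mechanism Cargo uses, normally read at compile time with the `env!` macro. Here it is read at runtime so the sketch stands alone.

```rust
// tests/cli.rs (sketch): integration-testing a hypothetical binary
// `mytool` the way the user experiences it: by running the executable.
use std::env;
use std::process::Command;

// Cargo sets CARGO_BIN_EXE_<name> for tests of binaries; guard against
// an unset or empty value so the sketch runs outside `cargo test` too.
fn exe_path(raw: Option<String>) -> Option<String> {
    raw.filter(|p| !p.is_empty())
}

fn main() {
    // In a real test file this would be env!("CARGO_BIN_EXE_mytool").
    match exe_path(env::var("CARGO_BIN_EXE_mytool").ok()) {
        Some(exe) => {
            let out = Command::new(&exe).arg("--version").output().expect("spawn");
            // No library access: we assert only on observable behavior.
            assert!(out.status.success());
        }
        None => eprintln!("not running under `cargo test`; nothing to invoke"),
    }
}
```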
Putting the two together
These are philosophical approaches to tool architecture design. They do not answer any particular technical question, nor even dictate what the code should look like on the inside. But it is my experience that having bounding and lensing clear in your head when writing a tool, or clearly laid down in a project document when collaborating with others, is key to making a maintainable and growable tool interface. They answer for you whether a particular feature is appropriate in your tool and, if so, where it most appropriately fits. This is not a question that you would be unable to answer, but answering it a hundred independent times results in subtly different decision-making each time; and developing a tool according to philosophical rules is much easier than altering it to fit philosophical rules after the fact.
They also instruct users, if they are documented well enough. A user who knows the way build scripts work, searching for a way to set an environment variable for a package’s compilation, will likely search first for a build script directive, rather than trying (and failing) with the conventional function for setting an environment variable. A user who knows the difference Cargo imposes between build information and package information knows whether to first search the documentation for Cargo’s configuration or its manifests, when trying to find a particular setting. Most importantly, a user who understands package separation and the role of workspaces can lay out and design their own code in a way that neatly splits into packages. Comprehension of the rules can substitute for experience.
They even instruct user feature requests and bug reports. The way users treat problems is very different depending on which framing they understand those problems through. Consider a tool which must run a user-defined command located in a configuration file. If this field can be multi-line, users will naturally treat it as a script, pushing for environment variable substitution and command output substitution and other shell syntax until an explicit shell runner is added. However, if the author imposes restrictions even further than “no shell syntax”, such as “only one command”, and then forces the issue by requiring the command to interpret an environment variable for some reason, the most convenient option for the user will be an external shell script, run by filename - and then all future complaints will be in the frame of external shell scripts, and nobody will bother the author about shell syntax in their configuration.
A final note: Neither the rules implied by bounding, nor the rules the author sets for themself in lensing, need be absolute. The real world is messy, and there may be places in which you need to cheat. Knowing that a particular feature breaks the design rules will lead the author to naturally cauterize its interaction with adjacent features, possibly connecting it to where it logically should go instead, and in general design in a circle around it. The rule will improve the project even in the places where it is not precisely followed, by characterizing how other code should reflect the fact that it is not followed.
¹ With the exception of locating foreign libraries, which complex C-language projects often wish to control. These scripts are explicitly marked as such, and have a convenient setting to disable them.