Typeclasses are Interfaces 2.0: From Java to Rust

You may have heard the term ’typeclass’ thrown around with the increasing popularity of Rust, because Rust has typeclasses (spelled trait). You may have heard it during an explanation of what ‘monads’ are, because monads are important in Haskell, and Haskell has typeclasses (spelled class). You may have been mystified, after looking this term up, and not quite being able to tell the difference between them and interfaces.

This article will attempt to explain them as a series of transformations from a concept you almost certainly do already know, interfaces.

Java will be the base language here. If you do not know Java, interfaces look like this:

interface Runnable {
    void run();
}
class SomeTask implements Runnable {
    public void run() {
        /* do something */
    }
}
<T extends Runnable> void runTask(T t) {
    t.run();
}

Interfaces have a list of method stubs, and types implement interfaces by providing concrete methods with the same signature as the stub.

The goal

In Rust, there is a library called serde, short for ‘serialization/deserialization’. It provides, among other things, Serialize and Deserialize typeclasses (traits) which you can implement for your own types, and which are already implemented for basic types like String and HashSet.

Deserialize defines how to decode the type from a serde deserialization format, agnostic to the format, but take JSON for example. The deserialize function receives a deserializer object, tells the deserializer it is expecting a string, and gives it a visitor; and then the deserializer uses the visitor to visit the next JSON value, which is hopefully a string.

In Rust syntax (but not the real trait; this has some unnecessary Rust particularities removed, namely lifetimes and error handling):

trait Deserialize {
    fn deserialize<D>(deserializer: D) -> Self
       where D: Deserializer;
}

trait Deserializer {
    ...
    fn deserialize_string<V>(self, visitor: V) -> V::Value
       where V: Visitor;
}

trait Visitor {
    type Value;
    ...
    fn visit_string(self, v: String) -> Self::Value;
}
impl Deserialize for String { ... }
impl<T: Deserialize> Deserialize for HashSet<T> { ... }

And every single component of this is impossible to represent in Java’s type system. Deserialization in Java usually requires ‘registries’ of separate deserializing-facilitation objects, where missing the ability to deserialize a particular type is a runtime logic error rather than a compile-time type error.

The gap between the two can be described in terms of five upgrades to interfaces. Each should be easy to understand on its own.

#1: Typeclasses can talk about the type that implements them

A function for deserializing a type returns that type. If you have a deserializing function for HashSet, it should return HashSet. Java does not provide a way to say this in an interface, though.

An isolated example of this is an interface declaring that a type can be deep-copied to a new value of the same type. For example, a mutable string can be copied to a new mutable string, so you can mutate it without affecting the original.

That ‘same type’ part is important for the semantics of the interface. There’s two usual patterns for this. The first is to just declare a general type, then force the user to downcast and promise that it’s the right type at runtime.

interface Copyable {
    Object copy();
}

<T extends Copyable> T makeCopy(T instance) {
    return (T)instance.copy();
}

Those looking for slightly more type safety will turn to what C++ calls the Curiously Recurring Template Pattern. A type parameter is introduced, which is not truly a parameter, but exists only so the implementor can be talked about.

interface Copyable<T extends Copyable<T>> {
    T copy();
}

<T extends Copyable<T>> T makeCopy(T instance) {
    return instance.copy();
}

Both of these are unsatisfactory, especially when we start getting into collections. When your language has typeclasses, the concrete type of the implementor can be talked about. In Java you would phrase it as ’the most specific type of this’, so the natural syntax is This.

interface Copyable {
    This copy();
}

<T extends Copyable> T makeCopy(T instance) {
    return instance.copy(); // requires no casts
}

Here, the requirement that all implementors of the Copyable interface can be copied to a new instance of the same type is exactly encoded, with no loss of type safety or dummy parameters. When Copyable is implemented for Integer, copy’s declared return type is Integer.

Note: In languages like Java with virtual inheritance, this interacts badly with it. If class Foo says new Foo(), that is not a value of type This, because if class Bar extends Foo, then within Bar, This would be Bar. Only final classes can assume they are their own This.

#2: Typeclasses contain function stubs, not method stubs

A function for deserializing a type from a stream cannot be a method for an instance of that type: the whole point is you don’t have an instance yet, and are trying to get one. In Java parlance, it needs to be a ‘static method’.

An isolated example of this is an interface for creating a ‘default’ value of a type. For example, the default value of Integer is zero.

In Java, this is done with the ‘factory pattern’. There is a separate object that exists just to construct values of the target object.

interface DefaultFactory<T> {
    T newDefault();
}

<T, F extends DefaultFactory<T>> T makeDefault(F factory) {
    return factory.newDefault();
}

This requires you to cart a separate object around instead of the static type information being sufficient on its own. It provides type safety, unlike the hacks in #1, but it’s still not good enough. Typeclasses need to be able to have stubs for functions that are not methods, provided entirely by the static type.

interface Default {
    static Self newDefault();
}

<T extends Default> T makeDefault() {
    return T.newDefault();
}

Here no additional method parameter is required in order to access the function. T provides the implementation just the same as it would for methods.

Note: This actually overlaps with existing valid syntax in Java (concretely implemented static interface members), though it can for the most part be a smooth upgrade.

Note: This means that generics must affect compilation (‘reified’ generics). Languages that released without static typing and had it bolted on later generally cannot do this, nor can Java which decided that its own late addition of generics in particular should be runtime-erased.

#3: Typeclasses can be implemented for foreign types

A third-party serialization framework wants its deserialization interface to be implemented for all the types it supports. This includes stdlib types like String and HashSet.

In Java, String implements only the interfaces that the stdlib authors saw fit to implement for it. It isn’t going to implement any more in the future, except via pull request.

Factory-like patterns recur again in a workaround for some parts of this shortcoming. Ordered collections like TreeSet will take an extra parameter in case you’re using them with a class that forgot to implement Comparable.

interface Comparator<T> {
    int compare(T left, T right);
}

<T> void sort(List<T> elements, Comparator<T> orderingFunction) { ... }

This actually does not need replacement. Multiple libraries defining competing Comparable implementations would be bad, and Rust doesn’t let you implement std::cmp::Ord for other people’s types for this reason.

The real illustrating use-case is a function being able to define its own new abstractions. Take for example an HTTP client, and the method that sets the body of the request.

Obviously the body of an HTTP request is bytes, but both of these are unsatisfactory:

void setBody(InputStream bodyStream) { ... }
// or:
void setBody(byte[] body) { ... }

The latter is obvious, you don’t want a 2GiB video buffered into memory. But even the former will result in unnecessary buffering, because the client is dealing with an OutputStream and must copy all the data from one to the other, one buffer-width at a time. Not to mention it doesn’t support range requests.

What you really want is the following interface:

interface HttpBody {
    void writeTo(OutputStream stream, Range requestedRange);
}

But many users do have a byte[] or InputStream, which do not implement HttpBody, and the ideal API accepts all the things users have to provide. So your method will make use of overloading:

// all of these at the same time:
<T extends HttpBody> void setBody(T body) { ... }
void setBody(InputStream bodyStream) { ... }
void setBody(byte[] body) { ... }
void setBody(String body) { ... }

And this works as long as it’s just a method. But once you have to use this set of types in any other context, like a new method addMultipartBody, it gets complicated very fast.

Interfaces are often viewed as a property of an object, but typeclasses are a property of the function generic over them. If you have a function that accepts byte[], String, and InputStream, and treats them all according to a capability expressible by an interface, you do not need overloading to express this. Java would need new syntax to express that you are adding an interface to an existing type:

interface HttpBody {
    void writeTo(OutputStream stream, Range requestedRange);
}
// existing types implement it
adapt byte[] implements HttpBody {
    void writeTo(OutputStream stream, Range requestedRange) {
        ...
    }
}
<T extends HttpBody> void setBody(T body) { ... }
<T extends HttpBody> void addMultipartBody(T body) { ... }

The bound extends HttpBody is now the first and last fact that you need to know about a type to know that it can be used as an HTTP body.

#4: Typeclasses can have type alias stubs

Now it’s time to make a bit of a leap.

First, type aliases as a feature. Nothing complicated, just another name for an existing type. The only important quirk is that it’s different from renaming an import because it can interact with generics.

typealias Fishbowl = Bowl<Fish>;
void addFishToBowl(Fishbowl bowl, Fish fish) { ... }

Anyway, the deserialization function needs to make use of a stateful visitor. In case of a list, for example, it will step through and deserialize each element of the list. This visitor will end up returning the object it constructs, and the deserializer requires that the deserializable object and the visitor’s returned type are the same type.

To extract that down to a simpler form, let’s talk about iterators. Iterator and Iterable are two separate interfaces because iterators contain the iteration state and List, which can be iterated over, should not contain iteration state. So the collection has a method for creating an iterator object, and then the iterator object has a method for stepping to the next element.

Both of these interfaces talk about another type: for the iterable, the iterator; and for the iterator, the element. But this is not semantically a parameter in the way that setBody has a type parameter for the body. It is an output of the implementation rather than an input: the implementation decides what type goes in that slot, not the caller, and only one type can ever be in that slot per implementor.

These two interfaces express this fact two different ways. Iterable does not name the type at all, but just uses the Iterator interface as the type, while Iterator uses a type parameter:

interface Iterable<T> {
    Iterator<T> iterator();
}
interface Iterator<T> {
    T next();
    boolean hasNext();
}
<E, T extends Iterable<E>> List<E> iterableToList(T iterable) { ... }

Each would be unsatisfactory but for some Java-specific quirks. Java does not require that the interface stub signature exactly match the implementation signature: it can return a more specific type. But absent this feature, you would not be able to reference e.g. a particular type’s iterator being Copyable, if it was an erased interface type (you can copy a list iterator, but not an InputStream converted to a byte iterator). Java also does not allow the same class to implement both Foo<A> and Foo<B> (due to generics being erased). But in C#, which does, and is almost exactly like Java in almost every way, IEnumerator<T> does not encode the property that an iterable collection has exactly one kind of element, and so the aforementioned adapt block could not adapt HttpBody to both IEnumerator<byte> and IEnumerator<char>, because the same type can implement both and it wouldn’t be clear which adapt block to go with.

Typeclasses have another kind of stub besides method stubs: type alias stubs. A type alias stub encodes the aforementioned property that there is exactly one type, and you can talk about this type, but the implementor decides what type it actually is.

interface Iterable {
    typealias Iter extends Iterator;
    Iter iterator();
}
interface Iterable {
    typealias Element;
    Element next();
    boolean hasNext();
}
// able to refer to Element directly
<T extends Iterable> List<T.Element> iterableToList(T iterable) { ... }

If you have a type parameter bounded by an interface, and that interface defines type alias stubs, you can use these aliases anywhere types are accepted: return types, other generics, etc.

#5: Typeclass implementations can be conditional

This is the crux. Everything else has just been some convenient features on top of interfaces. We are not yet at interfaces 2.0. To get there, we must be able to say one specific thing:

List<T> implements Deserialize if T implements Deserialize. But List<T> is otherwise perfectly usable without T needing to implement Deserialize.

Any type math can go in this spot. InclusiveRange<T> implements Iterable if T implements Stepwise. List<Character>, but no other List, implements Formattable.

The list of hacks in Java for the inability to express this is pretty large.

  • Object, which every type inherits from, has some random methods like hashCode and toString that not every type can usefully implement, just because List<T> does want to usefully implement them in the case that its element type does.
  • One reason the vision of statically checked exception handling failed is because you can’t split abstractions into fallible and infallible versions usable with the same generic class.
  • The real-world Copyable, Cloneable, is almost useless, so everyone just shares mutable objects around and causes bugs with them and eventually converts to functional programming to get away from mutability entirely.

With this final feature, we can represent serde::Deserialize in Java:

interface Deserializable {
    static <D extends Deserializer> This deserialize(Deserializer d);
}
interface Deserializer {
    ...
    <V extends Visitor> V.Value deserializeString(V visitor);
}
interface Visitor {
    typealias Value;
    ...
    Value visitString(String v);
}
adapt String implements Deserializable { ... }
adapt <T extends Deserializable> HashSet<T> implements Deserializable { ... }

Conclusion

A typeclass represents the practical ability for an API to define the abstract interface it needs from the types it works with, regardless of the shape of the abstraction or the provenance of the types. Many of its features can be imperfectly emulated with reflection-based systems at runtime, but the strong typing provided by typeclasses can eliminate 99% of instanceof checks and 99% of type-based overloading. You can find them in Rust traits, Haskell classes, Swift protocols, Scala 3 givens, OCaml modules, and (if you really squint) C++20 concepts.