C++ doesn’t know how to do customization points that aren’t operators

If you’re a C++ programmer and you want to customize the behavior of your class for a certain operation, you’ll have one of two experiences:

If the behavior you’re trying to customize is spelled as punctuation: Easy peasy. You might even be able to =default it.
If the behavior you’re trying to customize is spelled as an English word: Horrible.

Examples of the first category: Copy construction. Assignment (operator=). Addition/concatenation (operator+). Equality-comparison (operator==). Destruction.

Examples of the second category: Swapping. Hashing. Providing a “default order” for the benefit of std::map, without exposing it to the world via operator<. Three-way comparison. Stringifying. Relocating.

C++ just cannot figure out how to deal with the operations in category 2, except to move them into category 1. Sometimes the efforts to move an operation into category 1 are quite nice; sometimes they’re a mess. Here are some examples:

Three-way comparison

This is the obvious one for C++2a geeks. We’ve known for a long time that three-way comparison is a fundamental operation. I gave a talk on it at CppCon 2014. Stepanov (or somebody) made sure to give string a method a.compare(b).

But vector doesn’t have a three-way compare! Quick quiz: Why not?

Well, vector<T>::compare should return the result of lexicographically comparing the first elements of the two vectors, then the second elements, and so on, until it finds a difference. In C++17 we’d spell this something like

template<class T>
int vector<T>::compare(const vector<T>& rhs) const {
    for (size_t i = 0; i < size(); ++i) {
        if (i >= rhs.size())
            return +1;
        int c = (*this)[i].compare(rhs[i]);
        if (c != 0) return c;
    }
    return (size() < rhs.size()) ? -1 : 0;
}

But this won’t work unless T itself has a compare member function! Most class types don’t have such a method (for historical reasons), and certainly no primitive type does. If you want this approach to work for int, you’ll have to lift a.compare(b) into a free function and provide overloads for int. Maybe do some ADL. (Or maybe not. I highly recommend my previous blog post on customization point design for functions.)

template<class T>
int vector<T>::compare(const vector<T>& rhs) const {
    for (size_t i = 0; i < size(); ++i) {
        if (i >= rhs.size())
            return +1;
        int c = std::compare((*this)[i], rhs[i]);
        if (c != 0) return c;
    }
    return (size() < rhs.size()) ? -1 : 0;
}

And then notice that if std::compare is implemented as a function, it will interact with ADL, which might break existing code that uses ADL calls to compare(x, y). So std::compare probably ought to be a CPO (customization point object), as explained in the blog post above.
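To make the "CPO" idea concrete, here is a minimal C++17 sketch of what such an object might look like. Everything in namespace my is hypothetical (my::compare, compare_fn, has_member_compare); it is not the standard library's design, just one way to prefer a.compare(b) and fall back to a built-in comparison for arithmetic types.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

namespace my {

// Detects whether `a.compare(b)` is a valid expression for const T&.
template<class T, class = void>
struct has_member_compare : std::false_type {};

template<class T>
struct has_member_compare<T, std::void_t<
    decltype(std::declval<const T&>().compare(std::declval<const T&>()))
>> : std::true_type {};

// Hypothetical function-object type. Because my::compare is an object and
// not a function, ADL never finds it for an unqualified call `compare(x, y)`.
struct compare_fn {
    template<class T>
    int operator()(const T& a, const T& b) const {
        if constexpr (has_member_compare<T>::value) {
            return a.compare(b);                     // types that opted in
        } else {
            static_assert(std::is_arithmetic_v<T>,
                "T needs a compare() member or must be an arithmetic type");
            return (a < b) ? -1 : (b < a) ? +1 : 0;  // int, double, ...
        }
    }
};

inline constexpr compare_fn compare{};

} // namespace my

// Small self-check exercising both paths.
inline void compare_demo() {
    assert(my::compare(1, 2) == -1);
    assert(my::compare(std::string("b"), std::string("a")) > 0);
}
```

With something like this in place, the loop body in vector<T>::compare could call my::compare((*this)[i], rhs[i]) for both int and std::string elements. Note how much machinery it took just to say "compare these two things."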

And… just yuck. This is so much code (and design work). So what’s the solution in C++2a? Simply extend the C++ parser to support operator<=>! Now you can write

template<class T>
auto vector<T>::operator<=>(const vector<T>& rhs) const
    -> decltype(front() <=> front())
{
    for (size_t i = 0; i < size(); ++i) {
        if (i >= rhs.size())
            return std::strong_ordering::greater;
        auto c = (*this)[i] <=> rhs[i];
        if (!std::is_eq(c)) return c;
    }
    return size() <=> rhs.size();
}

Except that the actual STL probably can’t write that, because most user types won’t have implemented operator<=>, and we’d probably like it to fall back to operator< in that case… but if it does fall back, then how do we decide which ordering to use for our return type? (UPDATE: We punt that problem onto the library function std::compare_3way, whose answer is that anything with operator< and no operator<=> must be strongly ordered by definition.)
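The fallback question is easier to see in a tiny sketch. The helper below (my::compare_via_less, a hypothetical name) synthesizes a three-way result from operator< alone and returns a plain int so that it compiles as C++17. I use double purely as a convenient stand-in for "a type whose operator< is not a total order"; the point is that the fallback cannot tell "equal" from "unordered," so it is quietly assuming a strong ordering.

```cpp
#include <cassert>
#include <limits>

namespace my {

// Hypothetical helper: builds a three-way result out of operator< only.
// It returns 0 whenever neither a < b nor b < a holds, so "unordered"
// and "equivalent" are both reported as "equal".
template<class T>
int compare_via_less(const T& a, const T& b) {
    if (a < b) return -1;
    if (b < a) return +1;
    return 0;
}

} // namespace my

inline void compare_via_less_demo() {
    assert(my::compare_via_less(1, 2) == -1);
    assert(my::compare_via_less(2, 1) == +1);

    // NaN is unordered with respect to every double,
    // yet the fallback reports "equal".
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(my::compare_via_less(nan, 1.0) == 0);
}
```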

So switching from a named function compare to an infix operator <=> has solved some of our usability problems and design decisions, but not all of them.

Swapping

Swapping is probably the best-known named operation in C++. It uses basically the model that I outlined above for compare: you need to provide a named member function a.swap(b) and then also an ADL free function swap(a, b) that just calls the member function. (If the STL had followed my customization point design for functions, they would have made std::swap a CPO, and made it look first for a.swap(b) before falling back to the ADL free function. Then you’d only ever have to write the member function, not both versions.)
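For reference, here is roughly what that two-part pattern looks like today. This is a sketch with made-up names (lib::Widget, generic_exchange), not code from any particular library: the member function does the work, the free function forwards to it, and generic callers use the familiar "using std::swap; swap(a, b);" two-step.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace lib {

class Widget {
    std::string name_;
    int id_;
public:
    Widget(std::string name, int id) : name_(std::move(name)), id_(id) {}

    const std::string& name() const { return name_; }
    int id() const { return id_; }

    // The member function does the real work.
    void swap(Widget& other) noexcept {
        using std::swap;  // also hides Widget::swap for the calls below
        swap(name_, other.name_);
        swap(id_, other.id_);
    }

    // The ADL free function just forwards to the member.
    friend void swap(Widget& a, Widget& b) noexcept { a.swap(b); }
};

} // namespace lib

// Generic code: make std::swap visible as a fallback, then call
// unqualified so that ADL can find lib::swap when it exists.
template<class T>
void generic_exchange(T& a, T& b) {
    using std::swap;
    swap(a, b);
}
```

Calling generic_exchange on two lib::Widget objects ends up in Widget::swap via the friend; calling it on two ints ends up in std::swap.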

swap is so fundamental, and so kinda messed up in its current named-function state, that there have been proposals to get rid of its name as well. See Walter Brown’s N3746 “Proposing a C++1Y Swap Operator, v2” (August 2013), where the suggested spelling was operator:=:. However, these proposals have not taken flight the way operator<=> has.

Hashing

You might think that given the pattern set by a.compare(b) and a.swap(b), and the copious precedent for a.hash() in third-party libraries, the STL would use a.hash() as the spelling for the hashing operation. Nope! It uses std::hash<T>{}(a).

Hashing is intimately related to equality-comparison: it must preserve the invariant that a == b implies a.hash() == b.hash().

As of C++2a, you’ll be able to =default your equality-comparison by jumping through a few hoops — hoops which might get a bit easier to jump through, in the at-least-two-years between August 2018 and whenever C++2a is released. But it is highly unlikely that you’ll be able to =default your hashing operation! So you’ll be writing something like this:

class Widget {
    int a, b, c;
public:
    auto operator<=>(const Widget&) const = default;
    bool operator<(const Widget&) const = delete;
    bool operator<=(const Widget&) const = delete;
    bool operator>(const Widget&) const = delete;
    bool operator>=(const Widget&) const = delete;

    size_t hash() const {
        return size_t(a)+b+c; // whatever
    }
};

template<>
struct std::hash<Widget> {
    size_t operator()(const Widget& w) const {
        return w.hash();
    }
};

The infelicity here is that if you add a new field d to your class, it will instantly show up in the equality operation operator== with no further code changes… but it will not show up in the hashing operation! You’ll have to remember to go add it in there by hand.

This isn’t a huge breaking problem, though, because even if the hash function forgets to hash in field d, it still preserves the important invariant that a == b implies a.hash() == b.hash(). Our code doesn’t break because of the oversight; it just gains a little bit of technical debt.
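Here is a small C++17 sketch of that scenario, using a hypothetical Gizmo type with a hand-written operator== so that it doesn't depend on any C++2a features. The hash deliberately ignores the new field d: equal objects still hash equally, and the only cost is extra collisions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

struct Gizmo {
    int a, b, c;
    int d;  // the newly added field

    // Equality includes d ...
    friend bool operator==(const Gizmo& x, const Gizmo& y) {
        return x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d;
    }

    // ... but the hand-written hash still only looks at a, b, c.
    std::size_t hash() const {
        return std::size_t(a) + b + c;  // whatever
    }
};

template<>
struct std::hash<Gizmo> {
    std::size_t operator()(const Gizmo& g) const { return g.hash(); }
};

inline void gizmo_demo() {
    Gizmo x{1, 2, 3, 4};
    Gizmo y{1, 2, 3, 4};
    Gizmo z{1, 2, 3, 5};  // differs from x only in d

    assert(x == y && x.hash() == y.hash());     // the invariant holds
    assert(!(x == z) && x.hash() == z.hash());  // a collision, not a bug

    std::unordered_set<Gizmo> s{x, y, z};
    assert(s.size() == 2);                      // x and z stay distinct
}
```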

Stringifying

Stringifying is a dubious “success story” of operator-ification. Java and JavaScript and so on do stringification like this:

class Widget {
    Gadget a, b;
public:
    std::string toString() const {
        return a.toString() + " " + b.toString();
    }
};

This doesn’t work in C++ for the usual reason: primitive types like int don’t have member functions. So we could lift stringification into a named function… which was finally kinda-sorta done in C++11.

class Widget {
    int a, b;
public:
    std::string toString() const {
        using std::to_string;
        return to_string(a) + " " + to_string(b);
    }
};

std::string to_string(const Widget& w) {
    return w.toString();
}

Except that std::to_string isn’t really advertised as a customization point, and the STL doesn’t bother to provide it for any type other than the primitive numeric types. Heck, we don’t even have to_string(char) or to_string(std::string), let alone to_string(bool)!
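To see what "doesn't bother" means in practice: std::to_string(true) and std::to_string('a') do compile, but only because bool and char promote to int. The sketch below shows that, plus some hypothetical my::to_string overloads of the kind you end up writing yourself. The my namespace and its overloads are my own illustration, and the "97" result assumes an ASCII execution character set.

```cpp
#include <cassert>
#include <string>

namespace my {

using std::to_string;  // keep the standard numeric overloads visible

// Hypothetical overloads filling in the gaps left by the standard library.
inline std::string to_string(bool b)               { return b ? "true" : "false"; }
inline std::string to_string(char c)               { return std::string(1, c); }
inline std::string to_string(const char* s)        { return s; }
inline std::string to_string(const std::string& s) { return s; }

} // namespace my

inline void to_string_demo() {
    // The standard overloads silently promote bool and char to int.
    assert(std::to_string(true) == "1");
    assert(std::to_string('a') == "97");  // assumes ASCII

    // The hand-written overloads give the answers you probably wanted.
    assert(my::to_string(true) == "true");
    assert(my::to_string('a') == "a");
    assert(my::to_string(42) == "42");    // still the std overload
}
```

The const char* overload is there on purpose: without it, my::to_string("hi") would pick the bool overload, because a pointer-to-bool conversion beats the user-defined conversion to std::string.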

So what’s the accepted way to implement to_string, since C++98? Turn it into an operator, of course!

class Widget {
    int a, b;
public:
    // Piece A
    friend std::ostream& operator<<(std::ostream& os, const Widget& w) {
        return os << w.a << " " << w.b;
    }
};

If we trust that everybody in our codebase plays along with iostreams and implements operator<<, then we can implement generic stringification as

// Piece B
namespace my {
    template<class T, class = void> struct has_tostring : std::false_type {};
    template<class T> struct has_tostring<T, decltype(void(std::declval<const T&>().to_string()))> : std::true_type {};

    template<class T>
    std::string to_string(const T& t) {
        if constexpr (has_tostring<T>::value) {
            return t.to_string();
        } else {
            std::ostringstream oss;
            oss << t;
            return std::move(oss).str();
        }
    }
} // namespace my

The comments // Piece A and // Piece B refer to my previous blog post on customization points for functions.

Conclusion?

I don’t have a strong conclusion here, except to observe that C++ keeps swinging at the “customized operation” ball and missing, over and over and over. Each customization point we add — swap, hash, to_string?? — ends up with its own idiosyncratic design, and the only ones that really stick comfortably are the ones where we move them from category 2 (named functions) into category 1 (nameless operators). C++’s idea of “fixing” the problems with an operation is invariably to turn it into an operator (see: <<, <=>, :=:).

P1063 extends this thesis to “fix” some of the problems with the Coroutines TS. (To be fair, I like P1063 a lot.)

We propose replacing the [Coroutines TS] co_await keyword with an operator-like token, which we tentatively suggest spelling [<-]…

Posted 2018-08-23
