WG21: Avoid creating ADL situations on functions with semi-common names

This is a sequel and corollary to my previous blog post “Avoid ADL calls to functions with common names” (2018-06-17).

Recall that the problem in that case was some client code that did

template<class T> int size(const T& x) {
    return x.size();
}

...
for (int i=0; i < size(cl.lst); ++i)
...

This worked fine in C++14, but gave compiler errors in C++17 —

./rug.cpp:585:18: error: call to 'size' is ambiguous
  for(int i=0; i<size(cl.lst); i++)
                 ^~~~
/usr/include/c++/v1/iterator:1584:16: note: candidate
      function [with _Cont = std::__1::vector<hr::cell *, std::__1::allocator<hr::cell *> >]
constexpr auto size(const _Cont& __c) -> decltype(__c.size()) { return __c.size(); }
               ^
./hyper.h:400:23: note: candidate function [with T = std::__1::vector<hr::cell *, std::__1::allocator<hr::cell *> >]
template<class T> int size(const T& x) {return x.size(); }
                      ^

— because the C++17 standard library added a new ADL-able function named std::size. This means that any client code using size as the name of a free function will have to be refactored for C++17. In my blog post I wrote:

Unfortunately, C++ doesn’t have a really great solution here. Future standards can add names into namespace std, which can then break your existing code’s unqualified calls by causing unwanted ADL.

The only answer I can really think of is — Don’t do that, then. Just like in my zip example above: in our C++ programs, we need to maintain a good intuitive sense for when some unqualified name (like begin, or size, or zip) is likely to get stepped on in a future standard, and avoid using that name today, even if it has not yet been stepped on. This is arguably a very unfortunate state of affairs… but it’s the state of affairs we’ve got.

Avoid making ADL calls to functions with common names.

The flip side

However, for C++2a, I’d like the C++ Standard Committee to consider that every bargain has two sides. In exchange for client programmers’ avoiding common names such as begin, end, size…, I’d like the Committee to give some consideration to avoiding uncommon names. Or rather, avoiding names that are in that nebulous gray zone between “common and obvious” (say, size) and “rare and unique” (say, do_bind_nth_regex_parameter). For names in this gray area (say, hash_value), the Committee should please try to avoid creating greedy function templates with these names.

In San Diego this coming week, the Committee will be discussing Walter Brown’s proposal P0549R4 “Adjuncts for std::hash,” which is based on work by Lisa Lippincott. P0549 proposes to add a whole zoo of helpers to the already unwieldy customization point that is std::hash. In particular:

template<class T> using hash_for = hash<remove_cvref_t<T>>;
template<class T> using is_hashable = is_enabled_hash<hash_for<T>>;

template<class T> requires is_hashable_v<T>
size_t hash_value(T&& t)
  noexcept(noexcept(hash_for<T>{}(std::forward<T>(t))))
{
    return hash_for<T>{}(std::forward<T>(t));
}

As written, this creates an ADL situation for client code using the function name hash_value. Here’s a piece of code whose behavior silently changes as a result:

#include <string>
#include <type_traits>
#include <utility>

#ifdef P0549
namespace std {
    template<class T>
    size_t hash_value(T&& t)
    {
        hash<remove_cv_t<remove_reference_t<T>>> H;
        return H(std::forward<T>(t));
    }
}
#endif

namespace my {
    using Str = std::string;

    size_t hash_value(const Str& s) {
        return s[0];
    }

    int r() {
        Str s = "hello world";
        return hash_value(s);
    }
}

Without -DP0549, the line return hash_value(s) is a call to my::hash_value(const Str&), which is exactly what the average programmer would expect (since the call is coming from inside namespace my).

With -DP0549, the same line generates a call to std::hash_value<std::string&>(std::string&), which is likely to be very surprising, and also doesn’t do the same thing. (In fact, under -DP0549, this program has the surprising behavior that hash_value(s) != hash_value(std::as_const(s))!)

The problem here is that the proposed std::hash_value function template is a greedy function template. (Prior to C++2a Concepts, I might have called it an unconstrained template; but in C++2a that word has a technical meaning, and technically speaking, the proposed std::hash_value is constrained.) But it’s still extremely promiscuous in what it accepts, and because it accepts by forwarding reference, it’s generally able to form a better match (in the overload-resolution sense) than whatever non-template function the client programmer might have written.

(Incidentally, here is the same example using Walter’s proposed requires-clause, with several typos fixed, compiled using a Concepts-enabled compiler, just to prove that the constrainedness of the function template is not relevant to the issue I’m talking about here.)

A general solution for good libraries

Libraries that are trying to be “good” might consider that our ADL problem here comes from the fact that we put the generic algorithm std::hash_value into namespace std right alongside the concrete container type std::string. Our generic algorithm works on any type, by design; it is not specially related to any particular concrete type. Therefore it has no reason to live in the same namespace as any concrete type.

So for these libraries, we might make a general rule that generic algorithms should live in their own namespace, and types should live in their own separate namespace(s). Then it would be more difficult for ADL to confuse a new generic algorithm taking T and a user-defined function taking a concrete type.

A specific solution for `hash_value`

The right answer for P0549R4 is actually to step back and consider the purpose of the proposed std::hash_value. It’s supposed to be a heterogeneous entry point that performs std::hash<T>()(t) for any T. Where else in the STL do we see that pattern?— Since C++14, we see it in std::less.

template<class = void>
struct less;

template<>
struct less<void> {
    template<class T, class U>
    bool operator()(T&& t, U&& u) const {
        return std::forward<T>(t) < std::forward<U>(u);
    }
};

This means that we can write for example std::less<>(1, 2.0) and get the right answer. (Or write std::less<>(-1, 0u) and get the wrong answer… but let that pass.)

So the obvious API for a heterogeneous std::hash in C++2a would be std::hash<>:

template<class = void>
struct hash;

template<>
struct hash<void> {
    template<class T>
    size_t operator()(T&& t) const {
        return std::hash<std::remove_cvref_t<T>>{}(std::forward<T>(t));
    }
};

(In both of these snippets, the noexcept and requires clauses have been left as an exercise for the reader. They aren’t relevant to the point.)

So we don’t need `hash_value`?

Yeah. Once you’re able to hash any hashable object at all by writing e.g.

template<class T>
size_t hash_my_kv_pair(const std::pair<std::string, T>& kv) {
    return std::hash<>{}(kv.first) + std::hash<>{}(kv.second);
}

you don’t need any function named hash_value at all.

We could technically still provide one, just as a pure convenience function for people who would rather write the six characters _value than the four characters <>{}. But it’s important to make it non-ADL-able, which means making it a callable object (a.k.a. “customization point object” (CPO), even though it’s not actually a customization point).

namespace std {
    constexpr inline auto hash_value = [](auto&& t) {
        return std::hash<>{}(std::forward<T>(t));
    };
}

CPOs, like all non-functions, do not suffer from ADL; so this would be an acceptable definition of std::hash_value if we really wanted it for some reason. But in practice, I wouldn’t expect anyone to be unsatisfied with a heterogeneous std::hash<>.

Posted 2018-11-04

argument-dependent-lookup customization-points pitfalls proposal san-diego-2018 wg21

WG21: Avoid creating ADL situations on functions with semi-common names

The flip side

A general solution for good libraries

A specific solution for hash_value

So we don’t need hash_value?

A specific solution for `hash_value`

So we don’t need `hash_value`?