C++ idiom of the day: arrow_proxy
arrow_proxyI’m currently working on a reference implementation of
P0429R6 flat_map.
Here’s a C++ metaprogramming idiom I learned in the process.
Consider a zip_range that refers to two other containers and lets you
iterate them in parallel:
std::vector<int> keys;
std::vector<int> values;
for (auto&& kv : make_zip_range(keys, values)) {
std::cout << kv.first << ": " << kv.second << "\n";
kv.second += 1;
}
Notice that in C++17 we can use CTAD and structured bindings to rewrite this as
for (auto&& [k, v] : zip_range(keys, values)) { std::cout << k << ": " << v << "\n"; }I don’t recommend it, but you can do it.
We can implement most of zip_range easily:
template<class Keys, class Values>
class zip_range {
using KeyIterator = typename Keys::iterator;
using ValueIterator = typename Values::iterator;
Keys *keys_;
Values *values_;
public:
using iterator = zip_iterator<KeyIterator, ValueIterator>;
iterator begin();
iterator end();
// boilerplate and const versions omitted
};
template<class KeyIterator, class ValueIterator>
class zip_iterator {
KeyIterator kit_;
ValueIterator vit_;
public:
using KeyType = typename KeyIterator::value_type;
using ValueType = typename ValueIterator::value_type;
using iterator_category = std::common_type_t<
nonstd::iterator_category_t<KeyIterator>,
nonstd::iterator_category_t<ValueIterator>
>;
using difference_type = ptrdiff_t;
using value_type = std::pair<KeyType, ValueType>;
…But now we hit a stumbling block! What should our reference type be?
This is the type returned by operator*, which is also the type of auto&& kv
in our sample code above. So it needs to be something with .first and .second
members, but somehow kv.second must refer to a particular element in values!
This gives us the right idea:
using reference = std::pair<KeyType&, ValueType&>;
reference operator*() const {
return {*kit_, *vit_};
}
In fact, to deal correctly with vector<bool>, we should use the
member typedefs provided by KeyIterator and ValueIterator:
using KeyReference = typename KeyIterator::reference;
using ValueReference = typename ValueIterator::reference;
using reference = std::pair<KeyReference, ValueReference>;
reference operator*() const {
return {*kit_, *vit_};
}
So far so good.
The real stumbling block: operator->
In the preceding paragraph, we succeeded at making this snippet work:
std::vector<int> keys = ...;
std::vector<int> values = ...;
auto zit = make_zip_range(keys, values).begin();
int& x = (*zit).first;
assert(&x == &keys.front());
Now we hit the real issue, which is what I really want to talk about!
int& x = zit->first;
assert(&x == &keys.front());
Recall the way an overloaded operator-> works in C++. For a standard smart pointer,
it would look like this:
T *operator->() const {
return rawptr_;
}
That is, int& x = zit->first will be evaluated the same as if we’d written
decltype(auto) temp = zit.operator->();
int& x = temp->first;
So we need our operator-> to return a pointer to a type with a first member.
Basically, we need to return a pair<KeyReference, ValueReference>*. Okay, but
where does that pair object live? Who controls its lifetime?
The crazy unsafe way to solve our problem is:
using reference = std::pair<KeyReference, ValueReference>;
using pointer = reference*;
reference operator*() const {
return {*kit_, *vit_};
}
pointer operator->() const {
static reference r;
r = {*kit_, *vit_};
return &r;
}
This is unsafe because if we wrote something like use(zit->first, (zit+1)->first)
we’d have undefined behavior: two modifications of r without an intervening sequence point.
Note that the above snippet actually won’t compile because
referenceis not default-constructible, but we can fix that by usingstd::optional<reference> r.
C++ weirdness to the rescue!
Notice that our expansion of zit->first is expressed
in terms of temp->first. That is, operator-> is in some sense “recursive”: if our
zip_iterator’s overloaded operator-> returns an object with an overloaded operator->,
then the compiler will call that operator->, and so on, all the way down, until
it (hopefully) bottoms out at a call to the built-in -> that works on native pointers.
Therefore a safe, but still crazy, way to solve our problem is:
using reference = std::pair<KeyReference, ValueReference>;
using pointer = std::unique_ptr<reference>;
reference operator*() const {
return {*kit_, *vit_};
}
pointer operator->() const {
return std::make_unique<reference>(*kit_, *vit_);
}
This works
because zit->first evaluates zit.operator->() to produce a unique_ptr temp1,
and then evaluates temp1.operator->() to produce a raw pointer reference *temp2, and finally
evaluates temp2->first to produce a KeyReference. However, if we do it this way,
every single use of zit->first causes a heap allocation! This is certainly not okay.
(Godbolt.)
So what we do is a little trick called an arrow proxy. It looks like this:
using reference = std::pair<KeyReference, ValueReference>;
using pointer = arrow_proxy<reference>;
reference operator*() const {
return {*kit_, *vit_};
}
pointer operator->() const {
return pointer{{*kit_, *vit_}};
}
where arrow_proxy is defined as this seven-line helper struct:
template<class Reference>
struct arrow_proxy {
Reference r;
Reference *operator->() {
return &r;
}
};
This works because zit->first
evaluates zit.operator->() to produce an arrow_proxy<reference> temp1, and then
evaluates temp1.operator->() to produce a raw pointer to the pair stored inside
the arrow_proxy object itself. The arrow_proxy, just like the unique_ptr in
our previous example, lives just long enough to make this code work — it lives
until the end of the full-expression containing the -> operator, and then
is destroyed, taking the pair<KeyReference, ValueReference> with it.
(Godbolt shows perfect codegen.)
Non-const operator->()
UPDATE, 2023-04-30: As of C++23, proxy reference types such as
vector<bool>::reference,pair<K&, V&>, andtuple<Ts&...>are required to support a const-qualifiedoperator=. So this whole section is obsolete; you can mark yourarrow_proxy::operator->() constwithout fear, as long as you use C++23 or later (Godbolt).
Notice that in our finished arrow_proxy, the operator->() is a non-const member function.
This goes against everything we’ve been taught! (See
“const is a contract” (2019-01-03).)
Normally, operator-> represents the “dereferencing” operation, and you don’t
need to modify an iterator in order to dereference it. That’s why
zip_iterator::operator->() const is declared const.
Yet Godbolt shows that if we add const to arrow_proxy::operator->(),
we get weird compiler errors. This is because on all three major library implementations,
vector<bool>::reference::operator=(bool) is a non-const member function! [UPDATE: Until C++20,
that is.]
We could eliminate the compiler error by making
ramutablemember ofarrow_proxy, but in my opinion that cure would be worse than the disease.
My intuition here is that zip_iterator::operator->() const is properly declared const
because it represents the “dereferencing” operation; but arrow_proxy::operator->()
arguably can be declared non-const because it doesn’t “represent” any semantic operation. An
arrow_proxy is not a pointer-like thing that is semantically “dereferenced.”
We needn’t bother trying to make it “const-correct” according to semantic rules,
because it has no semantics; it’s nothing but an invisible implementation detail of zip_iterator.
So there you are. When implementing zip_iterator, you’re going to need an arrow_proxy.
Various implementations of the arrow_proxy idiom — not all 100% conforming to my
recommendations here — can be found in
pybind11,
Paul Fultz’s hmr,
Ryan Haining’s cppitertools,
Boost iterator_facade
(which calls it operator_arrow_dispatch::proxy),
Boost iterator_archetypes,
and
SG14 flat_map.
