When ranges::for_each behaves differently from for
This week I learned an interesting and dismaying fact:
C++11’s range-based for loop and C++20’s ranges::begin/end
use different protocols to find the “beginning” and “end” of a range!
For the range-based for loop ([stmt.ranged]),
the bounds of the loop are determined together, as a pair:
- If rg is an array, then we use rg and rg+N as our bounds.
- Otherwise, if rg.begin() and rg.end() are both present, then we use rg.begin() and rg.end().
- Otherwise, we use begin(rg) and end(rg), looked up with ADL-only lookup.
For C++20’s ranges::begin and ranges::end, the bounds are determined
by [range.access.begin] and [range.access.end]
separately, individually, without any cross-consultation:
- If rg is an array, we use rg and rg+N as our bounds.
- If rg.begin() is well-formed and models input_or_output_iterator, we use rg.begin() as our lower bound.
- Otherwise, if ADL-only begin(rg) is well-formed and models input_or_output_iterator, we use begin(rg) as our lower bound.
- Meanwhile, if rg.end() is well-formed and models sentinel_for<iterator_t<R>>, we use rg.end() as our upper bound.
- Otherwise, if ADL-only end(rg) is well-formed and models sentinel_for<iterator_t<R>>, we use end(rg) as our upper bound.
“Present” versus “valid”
The first — less important — difference between the two protocols is that
the core-language protocol says “present” where the Ranges library protocol says “well-formed”
(actually, “valid,” but I think those are synonyms in this context). So the core-language
protocol is more conservative in cases where rg.begin() is present but unusable for some reason;
for example, if it’s inaccessible, or deleted, or ambiguous, or returns something that’s not an
iterator (like int). The C++20 ranges::begin will happily ignore that troublesome result and
fall back on begin(rg) in that case.
This Godbolt shows one way this difference could matter:
class Secret {
    auto begin() { return std::counted_iterator("Core\n", 5); }
    auto end() { return std::default_sentinel; }
    friend void friendly(Secret&);
};
auto begin(Secret&) { return std::counted_iterator("Library\n", 8); }
auto end(Secret&) { return std::default_sentinel; }
Inside the friendly function,
Secret rg;
for (char c : rg) { putchar(c); }
std::ranges::for_each(rg, [](char c) { putchar(c); });
have different behaviors: the former accesses the private member functions and prints “Core”;
those members are inaccessible from within for_each, so the latter falls back on the global
functions and prints “Library”.
Outside of friendly — say, in main — the former finds the member functions inaccessible
and gives a hard compiler error; the latter falls back on the global functions and prints “Library”.
Together versus separate
The more important difference between the two protocols is that
the core-language protocol requires that both bounds be advertised by
the same mechanism, which is a proxy for “implemented by the same person.”
If class C has an end method but no begin method, that’s probably a
deliberate (strange) choice by the type-author. The core language doesn’t allow
you to “override” that choice by providing your own free function begin(C&)
at namespace scope, unless you also provide your own free function end to match it.
But the Ranges library does! There’s no cross-talk between ranges::begin and ranges::end;
they’re determined independently.
(Actually, ranges::end does need to recompute decltype(ranges::begin(rg))
in order to check sentinel_for<iterator_t<R>>; but it doesn’t care which variety of begin
was found by ranges::begin, it just cares about the iterator type that resulted.)
This means that ranges::begin could find a member function and ranges::end a free function,
or vice versa.
This is a more exciting difference, because it means you can write a class where both the
core-language for loop and the library facility ranges::for_each are well-formed, but
they find different begins and thus do different things.
This Godbolt shows one way this difference could matter:
struct Evil {
    auto begin() { return std::counted_iterator("Library", 7); }
    friend auto begin(Evil&) { return std::counted_iterator("Core", 4); }
    friend auto end(Evil&) { return std::default_sentinel; }
};
Evil rg;
for (char c : rg) { putchar(c); }
std::ranges::for_each(rg, [](char c) { putchar(c); });
Now, the core-language for loop finds a member rg.begin() but no rg.end(), so it falls
back to the ADL hidden friends begin(rg) and end(rg). But ranges::for_each determines
ranges::begin(rg) and ranges::end(rg) independently: rg.begin() for the former and
end(rg) for the latter. So the core-language loop starts at begin(rg) while the library
for_each starts at rg.begin().
If you’re a working programmer, this is just a weird bit of trivia that should never matter to
you: if it does, you’re surely doing something wrong! But if you’re a library implementor, this is
a real pain, because it means that there are range types (std::ranges::range<Evil> is true!)
where you aren’t allowed to use the core-language for loop to iterate them, because the
core-language for loop might iterate over different elements from those that C++20 Ranges sees.
This matters in places like C++23’s new from_range_t constructors and .insert_range methods,
as seen in the Godbolt above: my understanding is that a
conforming Ranges implementation must print “Library” in all those cases, never print “Core.”
This is awful, for both library vendors and users, because instantiating ranges::for_each is vastly
slower than just using the core-language control-flow construct. (Merely including the headers to get
a working version of for_each is pretty painful!) It would be a better world if library vendors
were somehow permitted to use the core-language for loop in their algorithms — although I have
no great ideas how to get there from here.
Hat tips to Tomasz Kamiński and Tim Song for first alerting me to this quirk; and to
Hewill Kang for explaining the Evil example on StackOverflow
and then immediately filing libc++ bug #119133
to report and discuss all the places libc++ currently prints “Core” when it should print “Library.”