When ranges::for_each behaves differently from for
This week I learned an interesting and dismaying fact: C++11’s range-based for loop and C++20’s ranges::begin/end use different protocols to find the “beginning” and “end” of a range!
For the range-based for loop ([stmt.ranged]), the bounds of the loop are determined together, as a pair:
- If rg is an array, then we use rg and rg+N as our bounds.
- Otherwise, if rg.begin() and rg.end() are both present, then we use rg.begin() and rg.end().
- Otherwise, we use begin(rg) and end(rg), looked up with ADL-only lookup.
For C++20’s ranges::begin and ranges::end, the bounds are determined by [range.access.begin] and [range.access.end] separately, individually, without any cross-consultation:
- If rg is an array, we use rg and rg+N as our bounds.
- If rg.begin() is well-formed and models input_or_output_iterator, we use rg.begin() as our lower bound.
- Otherwise, if ADL-only begin(rg) is well-formed and models input_or_output_iterator, we use begin(rg) as our lower bound.
- Meanwhile, if rg.end() is well-formed and models sentinel_for<iterator_t<R>> (where R is the type of rg), we use rg.end() as our upper bound.
- Otherwise, if ADL-only end(rg) is well-formed and models sentinel_for<iterator_t<R>>, we use end(rg) as our upper bound.
“Present” versus “valid”
The first (less important) difference between the two protocols is that the core-language protocol says “present” where the Ranges library protocol says “well-formed” (actually, “valid,” but I think those are synonyms in this context). So the core-language protocol is more conservative in cases where rg.begin() is present but unusable for some reason; for example, if it’s inaccessible, or deleted, or ambiguous, or returns something that’s not an iterator (like int). The C++20 ranges::begin will happily ignore that troublesome result and fall back on begin(rg) in that case.
This Godbolt shows one way this difference could matter:
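```cpp
#include <algorithm>  // std::ranges::for_each
#include <cstdio>     // putchar
#include <iterator>   // std::counted_iterator, std::default_sentinel

class Secret {
  auto begin() { return std::counted_iterator("Core\n", 5); }
  auto end() { return std::default_sentinel; }
  friend void friendly(Secret&);
};
auto begin(Secret&) { return std::counted_iterator("Library\n", 8); }
auto end(Secret&) { return std::default_sentinel; }
```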
Inside the friendly function, this range-based for loop and this call to ranges::for_each behave differently:

```cpp
Secret rg;
for (char c : rg) { putchar(c); }
std::ranges::for_each(rg, [](char c) { putchar(c); });
```

The former accesses the private member functions and prints “Core”; those members are inaccessible from within for_each, so the latter falls back on the global functions and prints “Library”.
Outside of friendly (say, in main), the former finds the member functions inaccessible and gives a hard compiler error; the latter falls back on the global functions and prints “Library”.
Together versus separate
The more important difference between the two protocols is that the core-language protocol requires that both bounds be advertised by the same mechanism, which is a proxy for “implemented by the same person.” If class C has an end method but no begin method, that’s probably a deliberate (strange) choice by the type-author. The core language doesn’t allow you to “override” that choice by providing your own free function begin(C&) at namespace scope, unless you also provide your own free function end to match it.
But the Ranges library does! There’s no cross-talk between ranges::begin and ranges::end; they’re determined independently. (Actually, ranges::end does need to recompute decltype(ranges::begin(rg)) in order to check sentinel_for<iterator_t<R>>; but it doesn’t care which variety of begin was found by ranges::begin. It cares only about the iterator type that resulted.)
This means that ranges::begin could find a member function and ranges::end a free function, or vice versa. This is a more exciting difference, because it means you can write a class where both the core-language for loop and the library facility ranges::for_each are well-formed, but they find different begins and thus do different things.
This Godbolt shows one way this difference could matter:
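```cpp
#include <algorithm>  // std::ranges::for_each
#include <cstdio>     // putchar
#include <iterator>   // std::counted_iterator, std::default_sentinel

struct Evil {
  auto begin() { return std::counted_iterator("Library", 7); }
  friend auto begin(Evil&) { return std::counted_iterator("Core", 4); }
  friend auto end(Evil&) { return std::default_sentinel; }
};

int main() {
  Evil rg;
  for (char c : rg) { putchar(c); }                       // prints "Core"
  std::ranges::for_each(rg, [](char c) { putchar(c); });  // prints "Library"
}
```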
Now, the core-language for loop finds a member rg.begin() but no rg.end(), so it falls back to the ADL hidden friends begin(rg) and end(rg). But ranges::for_each determines ranges::begin(rg) and ranges::end(rg) independently: rg.begin() for the former and end(rg) for the latter. So the core-language loop starts at begin(rg), while the library for_each starts at rg.begin().
If you’re a working programmer, this is just a weird bit of trivia that should never matter to you: if it does, you’re surely doing something wrong! But if you’re a library implementor, this is a real pain, because it means that there are range types (std::ranges::range<Evil> is true!) where you aren’t allowed to use the core-language for loop to iterate them, because the core-language for loop might iterate over different elements from those that C++20 Ranges sees.
This matters in places like C++23’s new from_range_t constructors and .insert_range methods, as seen in the Godbolt above: my understanding is that a conforming Ranges implementation must print “Library” in all those cases, never “Core.”
This is awful, for both library vendors and users, because instantiating ranges::for_each is vastly slower (at compile time) than just using the core-language control-flow construct. (Merely including the headers to get a working version of for_each is pretty painful!) It would be a better world if library vendors were somehow permitted to use the core-language for loop in their algorithms, although I have no great ideas how to get there from here.
Hat tips to Tomasz Kamiński and Tim Song for first alerting me to this quirk; and to Hewill Kang for explaining the Evil example on StackOverflow and then immediately filing libc++ bug #119133 to report and discuss all the places libc++ currently prints “Core” when it should print “Library.”
See also: