Socrates on why there’s no erase_if
for span
erase_if
for span
With apologies to Plato; but not to Socrates, who’s already got one.
A month ago, I was passing through the marketplace
when I happened to meet with Eryximachus. He had an idea about std::erase_if
he wanted to share, and as I had written on the subject
only a few years earlier, we struck up a conversation.
You know, said Eryximachus, I think there should be a version of std::erase_if
that applies to std::span<T>
. It would be easy to implement; look here.
template<class T, class Pred>
requires !is_const<T>
size_t erase_if(span<T>& sp, Pred p) {
auto it = std::remove_if(sp.begin(), sp.end(), p);
size_t count = sp.end() - it;
sp = span<T>(sp.begin(), it);
return count;
}
He continued: This function is clever, if a bit unintuitive at first. You see, when
we erase from a vector
, we actually destroy elements right then and there,
because the vector owns its elements. A span
doesn’t own its elements; it merely
views them. But a span<T>
(for non-const T
) is allowed to modify those elements.
So this version of erase_if
simply shuffles the
filtered elements to the end of the existing range, and then snips off its own tail
so that those elements aren’t viewed by the span
any longer.
Those elements won’t be destroyed until the underlying container goes out of scope,
but they’ll be eliminated from the span, which is what we asked for.
I thought this was an interesting idea, but after some brief reflection I said: No, I don’t think that would be a good addition to the library at all. I think the library’s authors were right not to create such a function.
Why do you say that, Socrates?
I believe it would harm the ability to use span
as a drop-in replacement for
vector&
.
I don’t see how adding a new function can harm any existing ability of the library; can’t it only add functionality? I think you’d better explain what you mean. I know you’ve talked about “drop-in replacements” before, in the context of what you call “parameter-only types.”
That’s right. For example, if I have a function taking a parameter const string& s
,
it’s generally (although not always) possible for me to replace that parameter with
string_view s
. Most of the operations that worked on const string& s
will continue
to work just fine with string_view s
: printing, indexing, getting iterators into
the string, .substr()
, comparison, and so on. Operations that a string_view
can’t provide,
such as .c_str()
, .capacity()
, and +
,
will give me a compiler error.
Right; if my function’s body uses s.c_str()
or s.capacity()
, I won’t be able to change
its parameter const string& s
into string_view s
, because the code would no
longer compile. That’s what you meant when you said “(although not always).”
That’s right. But here’s the important point: Many operations continue
to work just fine with string_view
, and many operations rightly give
compiler errors, but there are very few operations that will continue to compile
yet produce different behavior. Quietly producing different behavior is one of the
worst things that can happen in a refactoring.
You say “very few,” but not “none.” Could you give an example of a string
operation
that quietly produces different behavior with string_view
?
Certainly. Suppose we make a copy of the parameter s
, and then try to modify only the copy:
void f(const std::string& s, int myInt) {
auto t = s;
if (myInt != -1) {
t = "the answer is " + std::to_string(myInt);
}
std::cout << t << '\n';
}
When t
is string
, this works as intended. When t
is string_view
, we end up
binding our string_view
to a temporary object, which dangles immediately, and so
the final line of the function prints garbage (Θειοκεραυνός).
That doesn’t look good, Socrates!
I agree. To make string_view
really fit for its purpose as a “parameter-only type”
and a drop-in replacement for const string&
, the library author should have deleted
its assignment operator. But even so, I really did have to go far out of my way to craft
that example. I couldn’t just assign directly into s
, because in the code we started with,
s
would have been a const string&
, immutable. So I had to make the example fairly
complicated and unnatural. Besides, the ability to assign string_view
objects, and use them as
regular types, is useful in its own way. We’ll go into that some other time.
All right; I’ll hold you to that! But for now I’d like to know how this philosophy
applies to std::span
.
My notion of drop-in replacement applies to span
in pretty much the same way as
it applied to string_view
. In fact, for spans of const types, it’s so similar that
we needn’t go over it again; I’ll just remark that span<const T>
is a drop-in replacement
for const vector<T>&
, and say no more. But there’s a wrinkle with span
that
didn’t occur with string_view
.
I suppose you’re referring to the fact that span<T>
allows you to modify its elements,
while string_view
doesn’t.
That’s right. We’ll say that span<T>
, without the const
, is a drop-in replacement
for a mutable vector<T>&
parameter.
How does that lack of const
change things?
Well, it means we must be much more careful about operations that quietly produce different
behavior! Consider all the things one can do with a string
that one can’t do with a
string_view
: there’s +=
, and resize
, and insert
and push_back
, and erase
and pop_back
,
and clear
, and many other operations.
Yes, but nobody would ever try to do those operations on a const string&
parameter, because
they simply wouldn’t compile.
Right; and they won’t compile if you try to do them with a string_view
either.
But you can do operations like resize
and clear
on a mutable vector<T>&
parameter,
so we must be vigilant in order to preserve our “drop-in replacement”–ness: each operation must
either do the same thing with both vector
and span
, or else it must not compile.
For example:
void g(std::vector<int>& v) {
std::ranges::sort(v);
v.clear();
}
void g(std::span<int> v) {
std::ranges::sort(v); // OK, does the same thing
v.clear(); // OK, does not compile
}
At this, Eryximachus furrowed his brow. Socrates, he said, I’m thinking of your earlier
example. You called it unnatural and contrived because you couldn’t assign straight into
s
. But with a mutable reference parameter vector<T>& v
, you could assign straight
into v
, like this:
void h(std::vector<int>& v) {
std::vector<int> v2 = {1,2,3};
v = v2;
}
This obviously does something different when v
is a span<int>
than when it’s a
vector<int>&
.
That’s true, I said, and again it shows that std::span
doesn’t quite live up to
the Platonic ideal of a parameter-only type. Just as with string_view
, the library authors
decided to make it a regular type, so that you can put spans inside containers and so on;
and in exchange, it loses some of this property we’re trying to give it.
On the other hand, I can’t resist pointing out that we do have some safety here:
v = {1,2,3};
compiles when v
is vector<int>&
(and incidentally also when v
is span<const int>
),
but safely fails to compile when v
is span<int>
. That’s because span<int>
, like
reference_wrapper<int>
, binds only to mutable lvalues; it can’t bind to an rvalue like that.
UPDATE, 2023-12-28: Here Socrates originally used “archaic” double-braces
{{1,2,3}}
for C++23 compatibility; see “std::span
should have a converting constructor…” (2021-10-03). In C++26span<const int>
no longer needs double braces.
To resume — said Eryximachus — we’re talking about the operations provided by vector<int>&
, and
asking for each operation whether it does the same thing on span<int>
, or a different thing.
In the latter case it should refuse to compile.
Right.
But how do you define “to do the same thing”?
Very simply, Eryximachus: We consider the effect on the caller’s vector. Anyone who calls
our original function void h(vector<int>&)
must have an lvalue vector to pass in. h
will
have some effect on that vector: It might sort the vector, or add 2 to every element of the vector,
or increase the length of the vector. When we change the signature of h
to void h(span<int>)
,
we expect that h
will continue to have the same effect — to “do the same thing” — or else
refuse to compile.
Well, said Eryximachus, if the original function h
increased the length of the vector, then
h(span<int>)
must certainly refuse to compile, because there’s no way the span
could increase
the length of the original vector without access to it!
That’s right, I said. So span
must not have a push_back
method. Now what do you think about
pop_back
?
Hm, said Eryximachus, I suppose the same logic applies. If the original function h(vector<int>&)
performed v.pop_back()
to decrease the length of the caller’s actual vector, then h(span<int>)
had better refuse to compile.
And likewise clear
, resize
, insert
, and erase
?
Yes, span
had better not provide any of those.
Now, what about non-member functions? Herb Sutter says a non-member function can be just as much “part of” a class as its member functions are.
Yes, Socrates. For example, swap
, or operators like <<
, can’t be member functions,
but would certainly be considered “part of” a class like vector
or span
. Even operator==
is frequently a hidden friend, not a member.
And what about non-members that aren’t called via ADL, such as std::erase_if
?
Yes, I suppose each overload of std::erase_if
is also “part of” the specific class
it was designed to support.
So we can say that vector
“provides std::erase_if
”?
Yes, Socrates.
Should span
, then, also provide std::erase_if
? Can we ensure that it “does the same thing”
as vector
’s std::erase_if
?
No, Socrates. If the original function h(vector<int>& v)
performed std::erase_if(v, pred)
,
it would decrease the length of the caller’s actual vector. But h(span<int> v)
can’t do that.
So it had better refuse to compile.
So, you no longer think there should be a version of std::erase_if
that applies to std::span<T>
?
Indeed, I’ve modified my views on that subject. But for the sake of argument, please, let’s imagine I knew of someone you had not yet convinced; do you think there’s another argument you could make that would convince even the strongest holdout?
Well, I don’t know about the strongest, I said. But if you’ll indulge me a moment longer, there is one more tack I could try.
By all means.
This holdout I’m trying to convince — do you think he’d
agree that span
is correct not to provide .clear()
? I mean, wouldn’t it be disastrous if
void k(vector<int>& v) { v.clear(); }
int main() { std::vector<int> v = {1,2,3}; k(v); }
were refactored into void k(span<int> v)
, compiled silently, and then cleared only the
span, instead of clearing main
’s actual vector?
I think he’d agree that that would be disastrous, and that span
quite properly
doesn’t provide a clear
method.
Then could he possibly disagree with the same example reformulated to use std::erase_if
,
like this?
bool True() { return true; }
void k(vector<int>& v) { std::erase_if(v, True); }
int main() { std::vector<int> v = {1,2,3}; k(v); }
No, Socrates, I don’t think even my imaginary holdout could possibly disagree with that example. Your logic seems to work on airy folk like him just as thoroughly as it works on flesh-and-blood people like me; it’s really quite marvelous.
Then I will bid you both good day.
Good day, Socrates.