How to erase from an STL container
C++20 introduces new library functions std::erase
and std::erase_if
.
Notice that I said “functions,” not “algorithms”: these are not implemented as
generic function templates, but rather as a closed set of function overloads,
scattered across the STL’s many container-related headers.
There’s a std::erase
for deque
,
and one for forward_list
,
and one for list
, and one for
vector
.
There’s a std::erase_if
for all of the above, and also
for map
and
multimap
,
and for set
and
multiset
,
and for unordered_map
and unordered_multimap
,
and for unordered_set
and unordered_multiset
.
Why are these new C++20 functions implemented as a massive overload set, instead of as a single “generic programming”–style function template?
Well, generic programming works only when the underlying algorithm is the
same for all possible input types. That is, it relies on the programmer to
create consistent public APIs for all their data types; once you have
consistent APIs, it’s easy to write generic algorithms in terms of those
APIs. The STL container types’ APIs are for the most part remarkably consistent,
which is why the STL is able to use so much generic programming;
but erase
and erase_if
are pushing into new territory
where the types’ APIs become noticeably inconsistent.
To erase several items from a vector, you have to do something
fundamentally different from erasing several items from a list or a set.
Erase all instances of 1
from a list<int>
Erasing a single node from a linked list is an O(1) operation that doesn’t involve
moving any data around; we simply repoint some prev/next pointers. Naturally,
we have to have access to the prev/next pointers in order to do this operation,
so this operation must be a member function of the list
container.
The “classic STL” member function std::list::erase(const_iterator pos)
encapsulates this pointer-twiddling operation: you give it an iterator to the
element you want to remove, and it deallocates that node and repoints the prev/next
pointers around it.
Of course, once that node has been deallocated, your original iterator pos
will
point into deallocated memory; so std::list::erase
helpfully returns you an
iterator to the “next” element of the list so that you can pick it up and keep iterating.
Here’s the “classic STL” erase-all-1s-in-the-container loop:
std::list<int> container = {3, 1, 4, 1, 5, 9};
for (auto it = container.begin(); it != container.end(); ) {
if (*it == 1) {
it = container.erase(it);
} else {
++it;
}
}
You may encounter minor variations on this theme, such as code that does container.erase(it++)
instead of it = container.erase(it)
, but I strongly recommend sticking to this exact formulation.
As a bonus, this exact formulation will work for all kinds of STL containers —
list
, set
, map
, vector
, deque
… The only container for which it won’t compile is forward_list
.
Removing manually from a forward_list
std::forward_list
puts a wacky twist on much of the classic STL API, because it lacks prev pointers.
The equivalent loop for forward_list
would look like this:
std::forward_list<int> container = {3, 1, 4, 1, 5, 9};
for (auto it = container.before_begin(); it != container.end(); ) {
auto jt = std::next(it);
if (jt != container.end() && *jt == 1) {
container.erase_after(it);
} else {
it = jt;
}
}
But don’t write this awful loop if you can help it, because…
[forward_]list::remove
exists
If you know you’ve got some kind of linked list, but aren’t sure if it’s doubly or singly linked,
then the STL provides another verb for you. The remove
member function — not to be confused with the remove
algorithm! — erases all instances of a value from the list.
There’s also a remove_if
member function.
container.remove(1);
auto isOdd = [](int x) { return x % 2; };
container.remove_if(isOdd);
This is an example of what I meant above about how if your types provide a consistent API, then your
client code can use generic programming. Because both std::list
and std::forward_list
provide the
same remove
API, I can write a single function template that operates correctly on either type.
But that function template would fail to compile if I passed it, say, a std::vector
, because vector
doesn’t support the remove
API.
Erase all instances of 1
from a vector<int>
Consider our first snippet again, but now using vector
.
std::vector<int> container = {3, 1, 4, 1, 5, 9};
for (auto it = container.begin(); it != container.end(); ) {
if (*it == 1) {
it = container.erase(it);
} else {
++it;
}
}
This code will compile successfully and run correctly! However, for vector
, erasing a single element out
of the middle of the vector is not an O(1) pointer-twiddling operation; it’s an O(n) operation because we
have to shift the whole “tail” of the vector down by one element to keep everything contiguous.
Repeat that O(n) operation for each of the up-to-n 1
s in the container, and we’ve got a quadratic algorithm.
That’s no good! So for vector
specifically (and also for deque
), we want to combine the removal and
the shifting-down of our elements in a single-pass algorithm. “Remove and shift down” is the purpose of
the std::remove
algorithm, and so we invent the “erase-remove idiom,” which also comes in a remove_if
flavor:
std::vector<int> container = {3, 1, 4, 1, 5, 9};
auto it = std::remove(container.begin(), container.end(), 1);
container.erase(it, container.end());
auto isOdd = [](int x) { return x % 2; };
container.erase(
std::remove_if(container.begin(), container.end(), isOdd),
container.end()
);
This is the idiomatic way to erase from a vector
or deque
, because it’s O(n) instead of
O(n2).
The erase-remove idiom also works for list
and forward_list
… but for those containers it’s likely not as
efficient as our “erase
-in-a-loop” algorithm, or as just calling container.remove
. Because it is written
generically in terms of iterators, and involves the container itself only at the end of the whole process,
the erase-remove idiom cannot take advantage of pointer-twiddling tricks; it must do actual move-constructions
of container elements.
The erase-remove idiom (specifically, the std::remove
part) relies on being able to overwrite the elements
of the container. If your container elements are immutable, then you can’t use the erase-remove idiom.
This might happen if you have a container of lock_guard
s; but more realistically, it’ll happen if your
container is a std::set
or std::map
. The elements in a set
are immutable, because a set
is a
binary search tree; if you were able to overwrite
the value of some specific element (say, the leftmost element in the tree),
then you could overwrite it with a key that “belonged” elsewhere (say, all the way to the right of the tree),
breaking the tree invariant.
Unstable removal and the erase-partition idiom
If we must preserve our sequence’s order (say, if we’re keeping it sorted by some interesting property), then we should use the erase-remove idiom. But if the sequence is just a bag of values whose order we don’t care about at all, then we might consider moving single elements from the end of the sequence to fill each new gap as it’s created:
std::vector<int> container = {3, 1, 4, 1, 5, 9};
for (auto it = container.begin(); it != container.end(); ) {
if (*it == 1) {
*it = std::move(container.back());
container.pop_back();
} else {
++it;
}
}
Replace move-assignment with swap, and squash all of the container-resizing operations to the end of the operation, and you get this classic-looking snippet:
auto isOne = [](int x) { return x == 1; };
container.erase(
std::partition(container.begin(), container.end(), std::not_fn(isOne)),
container.end()
);
This idiom, which we might call the “erase-partition idiom,” is rare in my experience but may be useful in some situations. See also P0041 “Unstable remove algorithms” (September 2015).
Just like the erase-remove idiom, the erase-partition idiom works great for vector
and deque
,
works inefficiently for list
and forward_list
, and fails to compile for associative containers
such as set
whose elements are not mutable.
Erase all “instances” of 1
from a std::set
or std::multiset
Consider our first snippet again, but now using multiset
.
std::multiset<int> container = {3, 1, 4, 1, 5, 9};
for (auto it = container.begin(); it != container.end(); ) {
if (*it == 1) {
it = container.erase(it);
} else {
++it;
}
}
This code will compile successfully and run correctly! However, for set<int>
, you might realize that this
loop is doing much more work than it needs to. Erasing an element from a binary search tree shouldn’t be
an O(n) pointer-twiddling operation; it should be an O(lg n) pointer-twiddling operation! All we need to
do is locate the element(s) to remove within the tree, and then erase them:
// Beware!
auto [first, last] = container.equal_range(1);
container.erase(first, last);
In fact, the associative and unordered containers provide an even shorter shorthand for this operation:
// Beware!
container.erase(1);
But! When we do this kind of O(lg n) search operation on a binary search tree, by definition we are using the tree’s comparator, which may not match the “ordinary” behavior of our element type. Suppose our set stored books shelved by author’s surname:
struct ByAuthor {
bool operator()(const Book& a, const Book& b) const {
return a.author() < b.author();
}
};
std::multiset<Book, ByAuthor> container = {
Book("Moby-Dick", "Melville"),
Book("Hawaii", "Michener"),
Book("Chesapeake", "Michener"),
Book("Paradise Lost", "Milton"),
};
auto target = Book("Hawaii", "Michener");
container.erase(target); // Beware!
When we tell the container to erase Hawaii, what the container understands us to mean is “Erase every
element which is equivalent to Hawaii under my comparator.” That is, the code above will remove from
container
every book b
for which container.key_comp()(target, b) == false
&& container.key_comp()(b, target) == false
— i.e., every book with author "Michener"
, including both
Hawaii and Chesapeake. Sometimes this is exactly what we want. But it is a significantly different
operation from std::list::remove
or std::forward_list::remove
, and so it makes sense that it gets
a significantly different name (erase
rather than remove
).
When we use the erase
member function of an associative or unordered container, we are telling the
container to use its own notion of “equivalence” to avoid searching whole swaths of the container
for matches. This, incidentally, explains why there is a set::erase(Key)
member function but no
set::erase_if(Predicate)
: The latter has no relation to the container’s own comparator,
and therefore the container itself has nothing to contribute. It’s the same reason there’s a
list::remove(Key)
but no vector::remove(Key)
.
Which brings us to std::erase
and std::erase_if
Suppose your generic code has access to
a container your aunt gave you which you don’t know what it is,
and you just want to remove all the copies of Hawaii from it (but without removing also
the copies of Chesapeake if it turns out to be a multiset<Book, ByAuthor>
). Prior to C++20,
the way to do this would be something like
template<class Container>
void aloha_oe(Container& container)
{
static_assert(std::is_same_v<typename Container::value_type, Book>);
auto hawaii = Book("Hawaii", "Michener");
if constexpr (is_list_or_forward_list<Container>) {
container.remove(hawaii);
} else if constexpr (is_vector_or_deque<Container>) {
container.erase(std::remove(container.begin(), container.end(), hawaii), container.end());
} else if constexpr (is_associative_with_default_comparator<Container>) {
container.erase(hawaii);
} else if constexpr (is_associative<Container>) {
for (auto it = container.begin(); it != container.end(); ) {
if (*it == 1) {
it = container.erase(it);
} else {
++it;
}
}
} else {
static_assert(sizeof(Container) < 0, "I don't know how to erase from this container");
}
}
In C++20, the overload set of std::erase
and std::erase_if
permits us to stop open-coding
the it = container.erase(it)
idiom and to collapse some of those branches together.
template<class Container>
void aloha_oe(Container& container)
{
static_assert(std::same_as<typename Container::value_type, Book>);
auto hawaii = Book("Hawaii", "Michener");
if constexpr (is_associative_with_default_comparator<Container>) {
container.erase(hawaii);
} else if constexpr (is_sequence_or_associative<Container>) {
auto is_hawaii = [&](const Book& b) { return b == hawaii; };
std::erase_if(container, is_hawaii);
} else {
static_assert(sizeof(Container) < 0, "I don't know how to erase from this container");
}
}
Notice that it’s still faster to use container.erase(hawaii)
— with its O(lg n) tree lookup —
if you happen to know that the container’s comparator imposes exactly the same equivalence classes
as the ordinary meaning of ==
does.
Notice also that for the associative containers such as std::set
, there is no free function
std::erase(set, hawaii)
. The rationale for that decision in C++20 was that it would be
utterly confusing if set.erase(hawaii)
did one thing (erase all Michener books)
and erase(set, hawaii)
did something else (erase Hawaii specifically). Of course, for
the sequence containers, std::erase
and std::erase_if
end up having exactly the same
performance characteristics, and so there is no reason for std::erase
to exist for the
sequence containers either. I tentatively recommend pretending that std::erase
(without the _if
suffix) does not exist.
In a perfect world, C++20 would simply have added container::erase_if
member functions to
every STL container type and that’s it. My impression is that the only reason they didn’t do
that is that WG21 has a long-standing grudge against member functions. The 1990s-era STL
uses member function APIs liberally for things like list::remove_if
, but anything that wasn’t
forced through in the STL’s initial rush has been held off for decades.
(C++20 finally added set::contains
,
but still doesn’t have vector::contains
.) The primary rationale given in P1209R0
is even sillier than “grudge against member functions”: it’s “we didn’t want to complicate the
layout of the table with which the [then-current] Standard specifies container members.”
(Non-member APIs don’t have to fit into that table.)
Erasing all instances of 1
from a container of your own design
By now it should be pretty clear that “erasing from a container” is a very idiosyncratic API.
Different STL containers have radically different APIs for erasure,
and even C++20’s std::erase
/std::erase_if
is a thin shim that fails to unify
anything beyond the STL’s closed set of container types and also fails to deliver
O(lg n) performance on the most common associative containers.
So it continues to be important for C++ programmers to know the two classic STL idioms
to erase from containers: the erase-remove idiom (for non-list sequence containers)
and the it = container.erase(it)
idiom (for lists and associative containers).
You’ll need these idioms in C++17 for sure; and you’ll need them in C++20 whenever you deal with
containers other than the STL ones. There is no std::erase_if
for the Boost containers,
for example.
Ah, so should I maybe treat
erase_if
as an ADL customization point?
The papers that proposed std::erase_if
(N4009,
P1209)
don’t say anything about ADL, neither pro nor con. My own take is
“no, you should never treat anything as an ADL customization point if you can help it —
with the grandfathered-in exception of swap
.” If you want your own type to have an erase_if
API, you should make it a proper public member function of the type.
When writing generic code, you should not assume that any other type author
acts differently from how you would act; so, you shouldn’t assume that
any other type author is going to implement a non-member erase_if
.
(But, if you go against my advice and do make ADL calls to erase_if
, at least notice
that you should not use the “std::swap
two-step.”
With swap
, and also with
begin
,
size
,
data
,
and the rest of C++17’s ill-advised customization points, it’s reasonable
to fall back on the generic template in namespace std
when ADL finds nothing better.
For erase_if
, there is no such generic template on which to fall back.)
Verdicts on std::erase_if
std::erase
and std::erase_if
are clearly beneficial if you
are writing generic code to handle an unknown sequence container — i.e., you know it’s
vector
, deque
, or list
, but you don’t know which. In that case, std::erase
will never be slower than the erase-remove idiom, and (for list
specifically)
may be an improvement.
std::erase
does not exist for associative containers, because it would be confusing
if erase(container, hawaii)
and container.erase(hawaii)
had different behaviors.
For the associative containers, you might prefer using std::erase_if
over
open-coding the it = container.erase(it)
idiom. On the other hand, it = container.erase(it)
is generic code that will work for a wider variety of containers, including your own;
std::erase_if
is a convenience function provided only for the STL’s closed collection
of containers.