How to erase from an STL container

C++20 introduces new library functions std::erase and std::erase_if. Notice that I said “functions,” not “algorithms”: these are not implemented as generic function templates, but rather as a closed set of function overloads, scattered across the STL’s many container-related headers.

There’s a std::erase for deque, and one for forward_list, and one for list, and one for vector.

There’s a std::erase_if for all of the above, and also for map and multimap, and for set and multiset, and for unordered_map and unordered_multimap, and for unordered_set and unordered_multiset.

Why are these new C++20 functions implemented as a massive overload set, instead of as a single “generic programming”–style function template?

Well, generic programming works only when the underlying algorithm is the same for all possible input types. That is, it relies on the programmer to create consistent public APIs for all their data types; once you have consistent APIs, it’s easy to write generic algorithms in terms of those APIs. The STL container types’ APIs are for the most part remarkably consistent, which is why the STL is able to use so much generic programming; but erase and erase_if are pushing into new territory where the types’ APIs become noticeably inconsistent. To erase several items from a vector, you have to do something fundamentally different from erasing several items from a list or a set.

Erase all instances of 1 from a list<int>

Erasing a single node from a linked list is an O(1) operation that doesn’t involve moving any data around; we simply repoint some prev/next pointers. Naturally, we have to have access to the prev/next pointers in order to do this operation, so this operation must be a member function of the list container.

The “classic STL” member function std::list::erase(const_iterator pos) encapsulates this pointer-twiddling operation: you give it an iterator to the element you want to remove, and it deallocates that node and repoints the prev/next pointers around it.

Of course, once that node has been deallocated, your original iterator pos will point into deallocated memory; so std::list::erase helpfully returns you an iterator to the “next” element of the list so that you can pick it up and keep iterating. Here’s the “classic STL” erase-all-1s-in-the-container loop:

std::list<int> container = {3, 1, 4, 1, 5, 9};
for (auto it = container.begin(); it != container.end(); ) {
    if (*it == 1) {
        it = container.erase(it);
    } else {
        ++it;
    }
}

You may encounter minor variations on this theme, such as code that does container.erase(it++) instead of it = container.erase(it), but I strongly recommend sticking to this exact formulation. As a bonus, this exact formulation will work for all kinds of STL containers — list, set, map, vector, deque… The only container for which it won’t compile is forward_list.

Removing manually from a forward_list

std::forward_list puts a wacky twist on much of the classic STL API, because it lacks prev pointers. The equivalent loop for forward_list would look like this:

std::forward_list<int> container = {3, 1, 4, 1, 5, 9};
for (auto it = container.before_begin(); it != container.end(); ) {
    auto jt = std::next(it);
    if (jt != container.end() && *jt == 1) {
        container.erase_after(it);
    } else {
        it = jt;
    }
}

But don’t write this awful loop if you can help it, because…

[forward_]list::remove exists

If you know you’ve got some kind of linked list, but aren’t sure if it’s doubly or singly linked, then the STL provides another verb for you. The remove member function — not to be confused with the remove algorithm! — erases all instances of a value from the list. There’s also a remove_if member function.

container.remove(1);

auto isOdd = [](int x) { return x % 2; };
container.remove_if(isOdd);

This is an example of what I meant above about how if your types provide a consistent API, then your client code can use generic programming. Because both std::list and std::forward_list provide the same remove API, I can write a single function template that operates correctly on either type. But that function template would fail to compile if I passed it, say, a std::vector, because vector doesn’t support the remove API.

Erase all instances of 1 from a vector<int>

Consider our first snippet again, but now using vector.

std::vector<int> container = {3, 1, 4, 1, 5, 9};
for (auto it = container.begin(); it != container.end(); ) {
    if (*it == 1) {
        it = container.erase(it);
    } else {
        ++it;
    }
}

This code will compile successfully and run correctly! However, for vector, erasing a single element out of the middle of the vector is not an O(1) pointer-twiddling operation; it’s an O(n) operation because we have to shift the whole “tail” of the vector down by one element to keep everything contiguous.

Repeat that O(n) operation for each of the up-to-n 1s in the container, and we’ve got a quadratic algorithm. That’s no good! So for vector specifically (and also for deque), we want to combine the removal and the shifting-down of our elements in a single-pass algorithm. “Remove and shift down” is the purpose of the std::remove algorithm, and so we invent the “erase-remove idiom,” which also comes in a remove_if flavor:

std::vector<int> container = {3, 1, 4, 1, 5, 9};
auto it = std::remove(container.begin(), container.end(), 1);
container.erase(it, container.end());

auto isOdd = [](int x) { return x % 2; };
container.erase(
    std::remove_if(container.begin(), container.end(), isOdd),
    container.end()
);

This is the idiomatic way to erase from a vector or deque, because it’s O(n) instead of O(n2).

The erase-remove idiom also works for list and forward_list… but for those containers it’s likely not as efficient as our “erase-in-a-loop” algorithm, or as just calling container.remove. Because it is written generically in terms of iterators, and involves the container itself only at the end of the whole process, the erase-remove idiom cannot take advantage of pointer-twiddling tricks; it must do actual move-constructions of container elements.

The erase-remove idiom (specifically, the std::remove part) relies on being able to overwrite the elements of the container. If your container elements are immutable, then you can’t use the erase-remove idiom. This might happen if you have a container of lock_guards; but more realistically, it’ll happen if your container is a std::set or std::map. The elements in a set are immutable, because a set is a binary search tree; if you were able to overwrite the value of some specific element (say, the leftmost element in the tree), then you could overwrite it with a key that “belonged” elsewhere (say, all the way to the right of the tree), breaking the tree invariant.

Unstable removal and the erase-partition idiom

If we must preserve our sequence’s order (say, if we’re keeping it sorted by some interesting property), then we should use the erase-remove idiom. But if the sequence is just a bag of values whose order we don’t care about at all, then we might consider moving single elements from the end of the sequence to fill each new gap as it’s created:

std::vector<int> container = {3, 1, 4, 1, 5, 9};
for (auto it = container.begin(); it != container.end(); ) {
    if (*it == 1) {
        *it = std::move(container.back());
        container.pop_back();
    } else {
        ++it;
    }
}

Replace move-assignment with swap, and squash all of the container-resizing operations to the end of the operation, and you get this classic-looking snippet:

auto isOne = [](int x) { return x == 1; };
container.erase(
    std::partition(container.begin(), container.end(), std::not_fn(isOne)),
    container.end()
);

This idiom, which we might call the “erase-partition idiom,” is rare in my experience but may be useful in some situations. See also P0041 “Unstable remove algorithms” (September 2015).

Just like the erase-remove idiom, the erase-partition idiom works great for vector and deque, works inefficiently for list and forward_list, and fails to compile for associative containers such as set whose elements are not mutable.

Erase all “instances” of 1 from a std::set or std::multiset

Consider our first snippet again, but now using multiset.

std::multiset<int> container = {3, 1, 4, 1, 5, 9};
for (auto it = container.begin(); it != container.end(); ) {
    if (*it == 1) {
        it = container.erase(it);
    } else {
        ++it;
    }
}

This code will compile successfully and run correctly! However, for set<int>, you might realize that this loop is doing much more work than it needs to. Erasing an element from a binary search tree shouldn’t be an O(n) pointer-twiddling operation; it should be an O(lg n) pointer-twiddling operation! All we need to do is locate the element(s) to remove within the tree, and then erase them:

// Beware!
auto [first, last] = container.equal_range(1);
container.erase(first, last);

In fact, the associative and unordered containers provide an even shorter shorthand for this operation:

// Beware!
container.erase(1);

But! When we do this kind of O(lg n) search operation on a binary search tree, by definition we are using the tree’s comparator, which may not match the “ordinary” behavior of our element type. Suppose our set stored books shelved by author’s surname:

struct ByAuthor {
    bool operator()(const Book& a, const Book& b) const {
        return a.author() < b.author();
    }
};

std::multiset<Book, ByAuthor> container = {
    Book("Moby-Dick", "Melville"),
    Book("Hawaii", "Michener"),
    Book("Chesapeake", "Michener"),
    Book("Paradise Lost", "Milton"),
};

auto target = Book("Hawaii", "Michener");
container.erase(target);  // Beware!

When we tell the container to erase Hawaii, what the container understands us to mean is “Erase every element which is equivalent to Hawaii under my comparator.” That is, the code above will remove from container every book b for which container.key_comp()(target, b) == false && container.key_comp()(b, target) == false — i.e., every book with author "Michener", including both Hawaii and Chesapeake. Sometimes this is exactly what we want. But it is a significantly different operation from std::list::remove or std::forward_list::remove, and so it makes sense that it gets a significantly different name (erase rather than remove).

When we use the erase member function of an associative or unordered container, we are telling the container to use its own notion of “equivalence” to avoid searching whole swaths of the container for matches. This, incidentally, explains why there is a set::erase(Key) member function but no set::erase_if(Predicate): The latter has no relation to the container’s own comparator, and therefore the container itself has nothing to contribute. It’s the same reason there’s a list::remove(Key) but no vector::remove(Key).

Which brings us to std::erase and std::erase_if

Suppose your generic code has access to a container your aunt gave you which you don’t know what it is, and you just want to remove all the copies of Hawaii from it (but without removing also the copies of Chesapeake if it turns out to be a multiset<Book, ByAuthor>). Prior to C++20, the way to do this would be something like

template<class Container>
void aloha_oe(Container& container)
{
    static_assert(std::is_same_v<typename Container::value_type, Book>);
    auto hawaii = Book("Hawaii", "Michener");

    if constexpr (is_list_or_forward_list<Container>) {
        container.remove(hawaii);
    } else if constexpr (is_vector_or_deque<Container>) {
        container.erase(std::remove(container.begin(), container.end(), hawaii), container.end());
    } else if constexpr (is_associative_with_default_comparator<Container>) {
        container.erase(hawaii);
    } else if constexpr (is_associative<Container>) {
        for (auto it = container.begin(); it != container.end(); ) {
            if (*it == 1) {
                it = container.erase(it);
            } else {
                ++it;
            }
        }
    } else {
        static_assert(sizeof(Container) < 0, "I don't know how to erase from this container");
    }
}

In C++20, the overload set of std::erase and std::erase_if permits us to stop open-coding the it = container.erase(it) idiom and to collapse some of those branches together.

template<class Container>
void aloha_oe(Container& container)
{
    static_assert(std::same_as<typename Container::value_type, Book>);
    auto hawaii = Book("Hawaii", "Michener");

    if constexpr (is_associative_with_default_comparator<Container>) {
        container.erase(hawaii);
    } else if constexpr (is_sequence_or_associative<Container>) {
        auto is_hawaii = [&](const Book& b) { return b == hawaii; };
        std::erase_if(container, is_hawaii);
    } else {
        static_assert(sizeof(Container) < 0, "I don't know how to erase from this container");
    }
}

Notice that it’s still faster to use container.erase(hawaii) — with its O(lg n) tree lookup — if you happen to know that the container’s comparator imposes exactly the same equivalence classes as the ordinary meaning of == does.

Notice also that for the associative containers such as std::set, there is no free function std::erase(set, hawaii). The rationale for that decision in C++20 was that it would be utterly confusing if set.erase(hawaii) did one thing (erase all Michener books) and erase(set, hawaii) did something else (erase Hawaii specifically). Of course, for the sequence containers, std::erase and std::erase_if end up having exactly the same performance characteristics, and so there is no reason for std::erase to exist for the sequence containers either. I tentatively recommend pretending that std::erase (without the _if suffix) does not exist.

In a perfect world, C++20 would simply have added container::erase_if member functions to every STL container type and that’s it. My impression is that the only reason they didn’t do that is that WG21 has a long-standing grudge against member functions. The 1990s-era STL uses member function APIs liberally for things like list::remove_if, but anything that wasn’t forced through in the STL’s initial rush has been held off for decades. (C++20 finally added set::contains, but still doesn’t have vector::contains.) The primary rationale given in P1209R0 is even sillier than “grudge against member functions”: it’s “we didn’t want to complicate the layout of the table with which the [then-current] Standard specifies container members.” (Non-member APIs don’t have to fit into that table.)

Erasing all instances of 1 from a container of your own design

By now it should be pretty clear that “erasing from a container” is a very idiosyncratic API. Different STL containers have radically different APIs for erasure, and even C++20’s std::erase/std::erase_if is a thin shim that fails to unify anything beyond the STL’s closed set of container types and also fails to deliver O(lg n) performance on the most common associative containers.

So it continues to be important for C++ programmers to know the two classic STL idioms to erase from containers: the erase-remove idiom (for non-list sequence containers) and the it = container.erase(it) idiom (for lists and associative containers). You’ll need these idioms in C++17 for sure; and you’ll need them in C++20 whenever you deal with containers other than the STL ones. There is no std::erase_if for the Boost containers, for example.

Ah, so should I maybe treat erase_if as an ADL customization point?

The papers that proposed std::erase_if (N4009, P1209) don’t say anything about ADL, neither pro nor con. My own take is “no, you should never treat anything as an ADL customization point if you can help it — with the grandfathered-in exception of swap.” If you want your own type to have an erase_if API, you should make it a proper public member function of the type. When writing generic code, you should not assume that any other type author acts differently from how you would act; so, you shouldn’t assume that any other type author is going to implement a non-member erase_if.

(But, if you go against my advice and do make ADL calls to erase_if, at least notice that you should not use the “std::swap two-step.” With swap, and also with begin, size, data, and the rest of C++17’s ill-advised customization points, it’s reasonable to fall back on the generic template in namespace std when ADL finds nothing better. For erase_if, there is no such generic template on which to fall back.)

Verdicts on std::erase_if

std::erase and std::erase_if are clearly beneficial if you are writing generic code to handle an unknown sequence container — i.e., you know it’s vector, deque, or list, but you don’t know which. In that case, std::erase will never be slower than the erase-remove idiom, and (for list specifically) may be an improvement.

std::erase does not exist for associative containers, because it would be confusing if erase(container, hawaii) and container.erase(hawaii) had different behaviors.

For the associative containers, you might prefer using std::erase_if over open-coding the it = container.erase(it) idiom. On the other hand, it = container.erase(it) is generic code that will work for a wider variety of containers, including your own; std::erase_if is a convenience function provided only for the STL’s closed collection of containers.

Posted 2020-07-08