Fetishizing class invariants

What should be the behavior of the following code?

bool danger = true;
struct RexDangerLess {
    bool operator()(int a, int b) const {
        if (danger && (a == 1 || b == 1)) throw "oops";
        return (a > b);
    }
};

int main() {
    std::priority_queue<int, std::vector<int>, RexDangerLess> pq;
    for (int x : {2,3,4,5,6,7,1,3,4,5,6,7}) {
        try { pq.push(x); } catch (...) {}
    }
    danger = false;
    while (!pq.empty()) {
        printf("%d ", pq.top());
        pq.pop();
    }
}

If you’re familiar with priority_queue, you’ll know that it keeps its elements in max-heap order. RexDangerLess behaves mostly like std::greater<int>, so we’d naturally expect to see the priority queue’s elements print out in sorted order:

1 2 3 3 4 4 5 5 6 6 7 7

But what we actually see is

2 3 3 4 4 1 5 5 6 6 7 7

Or consider this code:

struct DJDangerFunc : std::function<bool(int, int)> {
    using std::function<bool(int, int)>::function;
    DJDangerFunc(const DJDangerFunc&) = default;
    DJDangerFunc& operator=(const DJDangerFunc&) { throw "oops"; }
};

int main() {
    std::priority_queue<int, std::vector<int>, DJDangerFunc> pq(std::less<>{});
    std::priority_queue<int, std::vector<int>, DJDangerFunc> pq2(std::greater<>{});
    for (int x : {1,2,3,4,5}) {
        pq2.push(x);
    }
    try { pq = pq2; } catch (...) {}
    pq.push(6);
    while (!pq.empty()) {
        printf("%d ", pq.top());
        pq.pop();
    }
}

On libstdc++, this prints

6 2 5 4 3 1

On libc++, it prints

6 3 5 4 2 1

What happened? Well, when certain user-supplied components of priority_queue throw exceptions, priority_queue rightly abandons its class invariant. It is not one hundred percent true that a priority_queue is always kept in max-heap order. It preserves max-heap order only when its user-supplied pieces are well enough behaved.


This week in Kona, LWG spent a fair bit of time discussing P0429R6 “A Standard flat_map, and kind of went down a rabbit-hole on flat_map’s class invariants. See, flat_map has a lot of invariants. It stores two vectors (one of keys and one of values), and a user-supplied comparator similar to priority_queue’s. Its class invariants include:

  • The two vectors are always of the same size.

  • The “keys” vector is always sorted in terms of the comparator.

  • Each key’s position always matches the position of its corresponding value.

How do we perform an insert(k, v) operation on a flat_map? Well, if we start by emplacing the key at the end of the keys vector, then we temporarily break the first two invariants — so, if emplacing at the end of the values vector throws an exception, we must do extra work to restore those invariants. On the other hand, if we start by emplacing the key at its proper place in the keys vector (so as to avoid breaking the second invariant), then we temporarily break the first and third invariants, and again must do extra work to restore them if an exception is thrown.

Worse, if we emplace at the end and then sort the vector to get it back into sorted order, then a throwing comparator such as RexDangerLess can really mess us up! If the comparator is allowed to throw, then reliably sorting the vector becomes impossible — if an exception is thrown from sort, then we know we’ve broken the second invariant, and likely the third as well.

Similar problems crop up in flat_map::swap, move-assignment, copy-assignment (as we saw with priority_queue in the DJDangerFunc example), and even more places. The insert(initializer_list) method is a particularly difficult case to make “exception-safe,” if you care about preserving flat_map’s class invariants. And that’s even after LWG decided to require is_nothrow_swappable_v<KeyContainer> and is_nothrow_swappable_v<MappedContainer> (which I think is maybe a bit burdensome on the programmer-in-the-street who rarely uses the noexcept specifier in practice, even if swap is the number two place you’d want to use it). There was even discussion in LWG of the idea that if an exception is thrown at a sufficiently inopportune time, the flat_map should clear() itself — dropping all your data on the floor — in order to restore its class invariants!

But should the implementor of flat_map care about preserving these invariants? I say no. We’ve got precedent (in the form of priority_queue) for a standard container adaptor that lives up to its promise of being a thin wrapper around a simple algorithm. Users who supply throwing operations to their flat_map should expect to get broken just as badly as they’re broken today with priority_queue. “Play stupid games, win stupid prizes,” as they say.


How do we codify this design principle?

I would say that if any container adaptor encounters an exception from a user-provided operation, then the container adaptor should promise nothing more but to propagate that exception and enter a “mostly invalid” state. An adaptor in a “mostly invalid” state should support the following operations: destroy, assign-into, clear(), and (in flat_map’s case) replace() and extract(). Nothing else should be guaranteed — not even size().

In flat_map’s case, size() returns the size of the keys vector which is invariably the size of the values vector as well. But in the “mostly invalid” state, that invariant could have been broken, and so whatever number we return even for size() can’t be trusted.

Inserting or finding in a flat_map delegates to std::lower_bound (or in practice, std::partition_point). If the keys vector has become unsorted, then those functions will have undefined behavior. So inserting in a “mostly invalid” flat_map should be just as undefined as pushing or popping in a “mostly invalid” priority_queue.

One open question on which I expect experts may disagree: Should flat_map’s extract() operation (which moves-from its underlying vectors) put the map into a “mostly invalid” state, or should it ensure the postcondition that after the containers have been extracted the map is guaranteed empty()? Earlier this week I thought the latter (and suggested that the postcondition should be added to flat_map’s spec), but I think I’ve almost entirely come around to the former.

I think wider recognition and adoption of the “mostly invalid state” principle will make it easier for people (including WG21!) to write their own container adaptors. It will certainly make the specification of flat_map shorter!

Posted 2019-02-24