P1144 PMR koans
This is the fourth in a series of blog posts
(I,
II,
III,
IV)
that I started after the Issaquah WG21 meeting, in order to
(1) resolve the technical differences between
P1144 “std::is_trivially_relocatable”
and the now-multiple Bloomberg papers on the topic
(P2786R1,
P2839R0,
and the comparative analysis P2814R0),
and (2) convince the rest of the C++ Committee
that these resolutions are actually okay to ship.
This post’s topic is the reason (as I understand it) that Bloomberg is suddenly re-interested in trivial relocatability: the polymorphic allocator model.
If the C++ allocator model and/or PMR are unfamiliar to you, you should check out yesterday’s “A not-so-quick introduction to the C++ allocator model” (2023-06-02).
If the unfamiliar part is trivial relocation, check out the wealth of information in my “Trivially Relocatable” talk from C++Now 2019, and/or the #relocatability tag on this blog.
Things we’d like to have
Let’s start with some “friendly” examples. We want the implementation to optimize each of these examples in the “obvious” way. This section contains no tricks or gotchas. Those will come in the next section.
We’ll use the following types in the examples:
struct TC { int i; };
struct TR { std::shared_ptr<int> p; };
static_assert(std::is_trivially_copyable_v<TC>);
static_assert(std::is_trivially_relocatable_v<TR>);
1. Trivial relocatability should propagate in the natural way
We already relied on this premise when we said that TR was trivially relocatable.
Rule-of-Zero types, all of whose members are trivially relocatable, must be
considered trivially relocatable themselves.
2. Vector reallocation should use memcpy
std::vector<TC> v(100);
v.reserve(v.capacity() + 1);
Today, this will use TC’s move-constructor and destructor in a loop;
but we’d like it to use a single 400-byte memcpy instead. Right?
(P1144 and P2786 agree on this.)
There are several subcategories here, each with their own ramifications:
2b. Vector reallocation of PMR types should use memcpy
std::vector<std::pmr::vector<int>> v(100);
v.reserve(v.capacity() + 1);
2c. Reallocation in pmr::vector should use memcpy
std::pmr::vector<TC> v(100);
v.reserve(v.capacity() + 1);
2d. Reallocation of PMR types in pmr::vector should use memcpy
std::pmr::vector<std::pmr::vector<int>> v(100);
v.reserve(v.capacity() + 1);
P2786 §4.3 and §6.4 strongly imply that P2786 would like to support this use-case
in particular. Also notice that (2b) and (2d) imply is_trivially_relocatable_v<pmr::vector<int>>;
this will become important later. I don’t think there’s any way to achieve (2b)
or (2d) without violating (3)…
3. PMR shouldn’t be a special case
namespace My {
template<class T>
struct polymorphic_allocator { ~~~~ };
template<class T>
using vector = std::vector<T, My::polymorphic_allocator<T>>;
};
My::vector<My::vector<int>> v(100);
~~~~
PMR types have many non-trivial layers of indirection (e.g.
polymorphic_allocator::construct and the
soon-to-be-undeprecated
polymorphic_allocator::destroy. We’ll be tempted to say
“The implementation doesn’t actually need to look through all these layers;
it can just look for exactly pmr::foo and special-case it somehow.”
Here are three counterarguments:
-
C++ should always permit reinventing the wheel.
My::polymorphic_allocatorshould be able to work exactly as well aspmr::polymorphic_allocator. If Joe Coder can’t implement PMR without dabbling in secret vendor-specific magic, that’s bad. -
Vendors themselves think special cases are yucky. If you manage your own implementation (cough Bloomberg), feel free to add all the special cases you want; but if WG21 standardizes a proposal that permits “bad performance, or good performance after a bunch of ad-hoc special-case code,” well, that’s just a verbose way of saying “bad performance.” (Vendors already don’t like PMR. You want them to cruft up their code with special cases just to rescue PMR’s performance? That’s not going to happen.) In other words, if the vendor can’t implement PMR without dabbling in vendor-specific magic, that’s bad.
-
Most importantly, if you really believe the vendor should just special-case PMR, then you don’t need P1144 or P2786 at all! Look at libstdc++, for example; it’s recognized
dequeas a special case for vector reallocation since late 2018. Vendors don’t need the Standard’s permission to add special cases, when they want to (which is rare). A proposal must handle arbitrary user-defined types likeMy::vectorandMy::allocator, or it has no point at all.
4. Vector insert and erase should use memmove
std::vector<TC> v(100);
v.erase(v.begin());
Today, this will call TC’s move-assignment operator 99 times
and then destroy the rightmost TC. But surely we’d like it to destroy the
leftmost TC and then memmove 99*sizeof(TC) bytes leftward.
Otherwise we’re leaving performance on the table.
P2786 doesn’t mention this case, but Bloomberg’s bsl::vector does
this exact optimization; it would be silly to forbid std::vector from
doing it. See “Should assignment affect is_trivially_relocatable?” (2024-01-02).
Again there are subcategories here, each with their own ramifications. The analogous cases to (2b) and (2c) are actually a little tricky:
std::vector<std::pmr::vector<int>> v(100);
v.erase(v.begin());
std::pmr::vector<TC> v(100);
v.erase(v.begin());
4d. Insert and erase of PMR types in pmr::vector should use memmove
std::pmr::vector<std::pmr::vector<int>> v(100);
v.erase(v.begin());
Today, this will call pmr::vector<int>’s move-assignment operator 99 times
and then destroy the rightmost pmr::vector. But since all of the elements in v have
the same allocator, this is equivalent to destroying the leftmost pmr::vector
and then memmoving 99*sizeof(v[0]) bytes leftward.
We’d really like to make this work, if we can.
5. swap_ranges of trivially copyable types should use memswap
TC a[100];
TC b[100];
std::swap_ranges(a, a+100, b);
Today, this will call std::swap(a[i], b[i]) in a loop;
but we’d like it to swap a’s bytes with b’s, in whatever size chunks
are most performant. Right?
5b. rotate of trivially copyable types should use memswap
TC a[100];
std::rotate(a, a+50, a+100);
In practice this is equivalent to std::swap_ranges(a, a+50, a+50),
and will automatically benefit from whatever performance enhancements
we give swap_ranges.
Problematic koans
The memory-stomping deallocator
template<class T>
struct A : std::allocator<T> {
template<class U>
void destroy(U *p) {
std::destroy_at(p);
memset(p, 0xDE, sizeof(U));
}
};
int main() {
std::vector<int, A<int>> v = {1,2,3,4,5};
v.erase(v.begin());
assert(v.data()[4] == 0xDEDEDEDE); // UB
}
Allocator A’s intent is to mitigate the damage of dangling pointer dereferences
by overwriting every destroyed object with 0xDEDEDEDE. But if v.erase ends
the lifetime of v[4] by relocating out of it (instead of calling A::destroy
on it), then v[4] won’t get overwritten. This defeats A’s intention.
The effect in this case is theoretically unobservable: the line reading
v.data()[4] has UB. (vector would be within its rights to go back after
the destruction of v[4] and write into that memory an arbitrary value; say, “5”.)
The natural solution in this case is to say that vector::erase isn’t allowed
to use relocation if A::destroy is non-trivial. The core language refuses to
make a type “naturally” trivially relocatable unless its value-semantic operations
are defaulted; vector likewise should refuse to trivially relocate a type
unless the allocator’s construct and destroy are defaulted.
Allocators already nerf the vector pessimization: a nothrow-move-constructible type can still throw from
A::construct. This is LWG 2461.
The PMR aggregate
struct Agg {
std::pmr::vector<int> pv_;
};
int main() {
std::vector<Agg> v;
v.push_back({ std::pmr::vector<int>({1,2}, &mr1) });
v.push_back({ std::pmr::vector<int>({3,4}, &mr2) });
v.erase(v.begin());
assert(v[0].pv_.get_allocator().resource() == &mr1);
}
If you accept (3) plus any of (2b,2d,4d), then you ought to accept
is_trivially_relocatable_v<pmr::vector<int>>. Then (1) implies
that Agg is also trivially relocatable; then (4) implies that this v.erase
should use memmove. But then we have a problem!
There are several natural solutions in this case:
-
Reject (2b,2d,4d) or (3):
pmr::vector<int>simply can’t be trivially relocatable. This means it gets none of the optimizations we’re talking about, unless you reject (3) and write special-case optimizations for it. -
Reject (4,4d):
insertanderasesimply can’t use memmove. -
Make the “strong claim” (as I historically have) that
vviolatesvector’s preconditions. We’ve put intova type which satisfiesstd::relocatablebut the values we inserted, in this context, fail to modelstd::relocatable. This is UB, just like if we tried tostd::sortan array offloatwhere some of the values were NaN. The entire STL has an invisible precondition that types that look value-semantic (statically at compile time) must also act value-semantic (dynamically at runtime), and if they don’t, that’s simply “library UB” — all bets are off. -
Make the weaker claim that
Aggdoes modelstd::relocatable, but only by treatingpv_.get_allocator()as non-salient. Then our mistake wasn’t in theerase, but only that theassertexpects something it has no right to expect. We may reasonably expect a certain value inv[0], but we mustn’t expect anything in particular of its non-salient attributes. It would be equally groundless of us to base the correctness of our C++ program on the assumption that, say,v[0].pv_.capacity() == 2.
The evil allocator-aware type
This is why (4c) was tricky. Godbolt:
struct Widget {
using allocator_type = std::pmr::polymorphic_allocator<Widget>;
Widget(int i) : i_(i) {}
explicit Widget(int i, allocator_type) : i_(i) {}
explicit Widget(const Widget& rhs, allocator_type) : i_(rhs.i_ + 100) {}
int i_;
};
static_assert(std::is_trivially_copyable_v<Widget>);
int main() {
std::pmr::vector<Widget> v = {1,2,3};
assert(v[0].i_ == 101);
v.reserve(v.capacity() + 1);
assert(v[0].i_ == 201);
}
If you accept (2d), then you should accept that v’s reallocation should be able
to use memcpy. After all, the element type Widget is trivially copyable, just like int!
But Widget’s constructor overload set has been carefully crafted to count the number
of times it’s called via polymorphic_allocator::construct.
This snippet is by far the most pathological, but it’s also the most clearly well-defined.
We’re not inspecting deallocated memory, nor non-salient state. Widget clearly
models all the semantic requirements of std::relocatable — heck, it’s trivially copyable!
So there are relatively few ways we can deal with this.
-
Reject (2c,2d): As in “The memory-stomping deallocator,” say
vector<T,A>isn’t allowed to reallocate in terms of relocation whenA::constructorA::destroyis non-trivial. This implies thatpmr::vector<int>can’t use memcpy to reallocate its buffer. -
Reject (3), e.g. by using our intimate knowledge of
polymorphic_allocator::constructto letpmr::vector<T>use memcpy as long as!uses_allocator<T, allocator_type>. Even then, (2d) remains out of reach.
Playing around with this example, you’ll find that GCC has trouble with the very first
assert: GCC 13+ ingeniously constructsWidget(int, allocator_type)directly intov’s storage, instead of constructingWidget(int)into aninitializer_listand thenWidget(const Widget&, allocator_type)intov. That’s now bug #110102.
The rotate
std::pmr::vector<int> vv[] = {
std::pmr::vector<int>({1,2,3}, &mr1),
std::pmr::vector<int>({4,5,6}, &mr2),
};
std::rotate(vv, vv+1, vv+2);
// In practice, equivalent to swap(vv[0], vv[1])
If you accept (3) plus any of (2b,2d,4d), then you accept is_trivially_relocatable_v<pmr::vector<int>>.
If you then accept (5,5b), you’ll expect this rotate to use memswap.
But that would swap the arenas of vv[0] and vv[1]!
Swapping the arenas would actually be preferable to what this code does today, i.e.,
UB and heap corruption.
But there is a third possible outcome, preferred by Bloomberg:
to copy vv[0]’s data into mr2 and copy vv[1]’s data into mr1, just as if we
had done vv[0] = std::exchange(vv[1], vv[0]) manually.
There are a few ways to deal with this:
-
Reject
is_trivially_relocatable_v<pmr::vector<int>>; but nobody wants to do that. -
Make the strong claim that the range
vvviolates the preconditions ofrotate, as if you tried tosorta range of NaNs. -
Make the weaker claim that the
rotateis well-defined butget_allocator()isn’t salient, so it’s actually okay thatvv[0]switched arenas as a result of the rotate. -
Claim that
rotate(vv, vv+1, vv+2)should actually be well-defined, e.g. by adopting P0178 “Allocators and swap” (Meredith, February 2016). As a corollary, reject (5,5b): generic algorithms likeswap,swap_ranges,rotate,sort, etc. simply aren’t allowed to use trivial relocation, even onTC, because they must act as-if they’re using ADLswapand they’re no longer allowed to assume that ADLswapswaps only the values of its arguments. (See LWG 2153 for more details.)
I think this is the biggest disagreement between P1144 and Bloomberg: On the one hand I’m
completely unwilling to give up the performance benefits of (5,5b), and on the other hand
the Bloomberg authors seem unwilling yet to agree either that allocators are non-salient or
that it’s okay for rotate(vv, vv+1, vv+2) to remain undefined.
Here are a couple of pie-in-the-sky ideas that might lead to answers of their own:
-
Our last claim above complained that the library couldn’t tell whether ADL
swapwas “defaulted.” We could teach the core language aboutswapand allow it to be defaulted, the same way C++20 allows an ADLoperator==to be defaulted. A defaulted operation can be queried for triviality, exactly as I proposed in “The Best Type Traits C++ Doesn’t Have” (C++Now 2018). N3746 “A C++1y Swap Operator” (Brown, August 2013) was an early attempt to bringswapinto the core language. We could try again, without the operator syntax (which I think N3746 proposed only because C++11 couldn’t=defaultanything that wasn’t a special member function). -
C++’s object model physically ensures that a contiguous range of polymorphic objects (e.g.
Cat[5],vector<Cat>) always have the same dynamic type. (If you can legally use pointer arithmetic to get from one to the next, they must have the same type.) Could we logically mandate somehow that a contiguous range of PMR-style objects must all be associated with the same arena? P2685 “Language Support for Scoped Objects” (Meredith & Berne, May 2023) is related, but I haven’t read more than the introduction yet. (Obvious pitfall: Piecewise contiguous ranges, likedeque<pmr::string>. Contiguity is not cut-and-dried; in the limit this leads to the strong claim.)
This concludes the set of problematic koans… for now. If I stumble across any more problematic snippets, I’ll update this post.
