P1144 PMR koans
This is the fourth in a series of blog posts
(I,
II,
III,
IV)
that I started after the Issaquah WG21 meeting, in order to
(1) resolve the technical differences between
P1144 “std::is_trivially_relocatable
”
and the now-multiple Bloomberg papers on the topic
(P2786R1,
P2839R0,
and the comparative analysis P2814R0),
and (2) convince the rest of the C++ Committee
that these resolutions are actually okay to ship.
This post’s topic is the reason (as I understand it) that Bloomberg is suddenly re-interested in trivial relocatability: the polymorphic allocator model.
If the C++ allocator model and/or PMR are unfamiliar to you, you should check out yesterday’s “A not-so-quick introduction to the C++ allocator model” (2023-06-02).
If the unfamiliar part is trivial relocation, check out the wealth of information in my “Trivially Relocatable” talk from C++Now 2019, and/or the #relocatability tag on this blog.
Things we’d like to have
Let’s start with some “friendly” examples. We want the implementation to optimize each of these examples in the “obvious” way. This section contains no tricks or gotchas. Those will come in the next section.
We’ll use the following types in the examples:
struct TC { int i; };
struct TR { std::shared_ptr<int> p; };
static_assert(std::is_trivially_copyable_v<TC>);
static_assert(std::is_trivially_relocatable_v<TR>);
1. Trivial relocatability should propagate in the natural way
We already relied on this premise when we said that TR
was trivially relocatable.
Rule-of-Zero types, all of whose members are trivially relocatable, must be
considered trivially relocatable themselves.
2. Vector reallocation should use memcpy
std::vector<TC> v(100);
v.reserve(v.capacity() + 1);
Today, this will use TC
’s move-constructor and destructor in a loop;
but we’d like it to use a single 400-byte memcpy
instead. Right?
(P1144 and P2786 agree on this.)
There are several subcategories here, each with their own ramifications:
2b. Vector reallocation of PMR types should use memcpy
std::vector<std::pmr::vector<int>> v(100);
v.reserve(v.capacity() + 1);
2c. Reallocation in pmr::vector
should use memcpy
std::pmr::vector<TC> v(100);
v.reserve(v.capacity() + 1);
2d. Reallocation of PMR types in pmr::vector
should use memcpy
std::pmr::vector<std::pmr::vector<int>> v(100);
v.reserve(v.capacity() + 1);
P2786 §4.3 and §6.4 strongly imply that P2786 would like to support this use-case
in particular. Also notice that (2b) and (2d) imply is_trivially_relocatable_v<pmr::vector<int>>
;
this will become important later. I don’t think there’s any way to achieve (2b)
or (2d) without violating (3)…
3. PMR shouldn’t be a special case
namespace My {
template<class T>
struct polymorphic_allocator { ~~~~ };
template<class T>
using vector = std::vector<T, My::polymorphic_allocator<T>>;
};
My::vector<My::vector<int>> v(100);
~~~~
PMR types have many non-trivial layers of indirection (e.g.
polymorphic_allocator::construct
and the
soon-to-be-undeprecated
polymorphic_allocator::destroy
. We’ll be tempted to say
“The implementation doesn’t actually need to look through all these layers;
it can just look for exactly pmr::foo
and special-case it somehow.”
Here are three counterarguments:
-
C++ should always permit reinventing the wheel.
My::polymorphic_allocator
should be able to work exactly as well aspmr::polymorphic_allocator
. If Joe Coder can’t implement PMR without dabbling in secret vendor-specific magic, that’s bad. -
Vendors themselves think special cases are yucky. If you manage your own implementation (cough Bloomberg), feel free to add all the special cases you want; but if WG21 standardizes a proposal that permits “bad performance, or good performance after a bunch of ad-hoc special-case code,” well, that’s just a verbose way of saying “bad performance.” (Vendors already don’t like PMR. You want them to cruft up their code with special cases just to rescue PMR’s performance? That’s not going to happen.) In other words, if the vendor can’t implement PMR without dabbling in vendor-specific magic, that’s bad.
-
Most importantly, if you really believe the vendor should just special-case PMR, then you don’t need P1144 or P2786 at all! Look at libstdc++, for example; it’s recognized
deque
as a special case for vector reallocation since late 2018. Vendors don’t need the Standard’s permission to add special cases, when they want to (which is rare). A proposal must handle arbitrary user-defined types likeMy::vector
andMy::allocator
, or it has no point at all.
4. Vector insert and erase should use memmove
std::vector<TC> v(100);
v.erase(v.begin());
Today, this will call TC
’s move-assignment operator 99 times
and then destroy the rightmost TC
. But surely we’d like it to destroy the
leftmost TC
and then memmove 99*sizeof(TC)
bytes leftward.
Otherwise we’re leaving performance on the table.
P2786 doesn’t mention this case, but Bloomberg’s bsl::vector
does
this exact optimization; it would be silly to forbid std::vector
from
doing it. See “Should assignment affect is_trivially_relocatable
?” (2024-01-02).
Again there are subcategories here, each with their own ramifications. The analogous cases to (2b) and (2c) are actually a little tricky:
std::vector<std::pmr::vector<int>> v(100);
v.erase(v.begin());
std::pmr::vector<TC> v(100);
v.erase(v.begin());
4d. Insert and erase of PMR types in pmr::vector
should use memmove
std::pmr::vector<std::pmr::vector<int>> v(100);
v.erase(v.begin());
Today, this will call pmr::vector<int>
’s move-assignment operator 99 times
and then destroy the rightmost pmr::vector
. But since all of the elements in v
have
the same allocator, this is equivalent to destroying the leftmost pmr::vector
and then memmoving 99*sizeof(v[0])
bytes leftward.
We’d really like to make this work, if we can.
5. swap_ranges
of trivially copyable types should use memswap
TC a[100];
TC b[100];
std::swap_ranges(a, a+100, b);
Today, this will call std::swap(a[i], b[i])
in a loop;
but we’d like it to swap a
’s bytes with b
’s, in whatever size chunks
are most performant. Right?
5b. rotate
of trivially copyable types should use memswap
TC a[100];
std::rotate(a, a+50, a+100);
In practice this is equivalent to std::swap_ranges(a, a+50, a+50)
,
and will automatically benefit from whatever performance enhancements
we give swap_ranges
.
Problematic koans
The memory-stomping deallocator
template<class T>
struct A : std::allocator<T> {
template<class U>
void destroy(U *p) {
std::destroy_at(p);
memset(p, 0xDE, sizeof(U));
}
};
int main() {
std::vector<int, A<int>> v = {1,2,3,4,5};
v.erase(v.begin());
assert(v.data()[4] == 0xDEDEDEDE); // UB
}
Allocator A
’s intent is to mitigate the damage of dangling pointer dereferences
by overwriting every destroyed object with 0xDEDEDEDE
. But if v.erase
ends
the lifetime of v[4]
by relocating out of it (instead of calling A::destroy
on it), then v[4]
won’t get overwritten. This defeats A
’s intention.
The effect in this case is theoretically unobservable: the line reading
v.data()[4]
has UB. (vector
would be within its rights to go back after
the destruction of v[4]
and write into that memory an arbitrary value; say, “5”.)
The natural solution in this case is to say that vector::erase
isn’t allowed
to use relocation if A::destroy
is non-trivial. The core language refuses to
make a type “naturally” trivially relocatable unless its value-semantic operations
are defaulted; vector
likewise should refuse to trivially relocate a type
unless the allocator’s construct
and destroy
are defaulted.
Allocators already nerf the vector pessimization: a nothrow-move-constructible type can still throw from
A::construct
. This is LWG 2461.
The PMR aggregate
struct Agg {
std::pmr::vector<int> pv_;
};
int main() {
std::vector<Agg> v;
v.push_back({ std::pmr::vector<int>({1,2}, &mr1) });
v.push_back({ std::pmr::vector<int>({3,4}, &mr2) });
v.erase(v.begin());
assert(v[0].pv_.get_allocator().resource() == &mr1);
}
If you accept (3) plus any of (2b,2d,4d), then you ought to accept
is_trivially_relocatable_v<pmr::vector<int>>
. Then (1) implies
that Agg
is also trivially relocatable; then (4) implies that this v.erase
should use memmove. But then we have a problem!
There are several natural solutions in this case:
-
Reject (2b,2d,4d) or (3):
pmr::vector<int>
simply can’t be trivially relocatable. This means it gets none of the optimizations we’re talking about, unless you reject (3) and write special-case optimizations for it. -
Reject (4,4d):
insert
anderase
simply can’t use memmove. -
Make the “strong claim” (as I historically have) that
v
violatesvector
’s preconditions. We’ve put intov
a type which satisfiesstd::relocatable
but the values we inserted, in this context, fail to modelstd::relocatable
. This is UB, just like if we tried tostd::sort
an array offloat
where some of the values were NaN. The entire STL has an invisible precondition that types that look value-semantic (statically at compile time) must also act value-semantic (dynamically at runtime), and if they don’t, that’s simply “library UB” — all bets are off. -
Make the weaker claim that
Agg
does modelstd::relocatable
, but only by treatingpv_.get_allocator()
as non-salient. Then our mistake wasn’t in theerase
, but only that theassert
expects something it has no right to expect. We may reasonably expect a certain value inv[0]
, but we mustn’t expect anything in particular of its non-salient attributes. It would be equally groundless of us to base the correctness of our C++ program on the assumption that, say,v[0].pv_.capacity() == 2
.
The evil allocator-aware type
This is why (4c) was tricky. Godbolt:
struct Widget {
using allocator_type = std::pmr::polymorphic_allocator<Widget>;
Widget(int i) : i_(i) {}
explicit Widget(int i, allocator_type) : i_(i) {}
explicit Widget(const Widget& rhs, allocator_type) : i_(rhs.i_ + 100) {}
int i_;
};
static_assert(std::is_trivially_copyable_v<Widget>);
int main() {
std::pmr::vector<Widget> v = {1,2,3};
assert(v[0].i_ == 101);
v.reserve(v.capacity() + 1);
assert(v[0].i_ == 201);
}
If you accept (2d), then you should accept that v
’s reallocation should be able
to use memcpy. After all, the element type Widget
is trivially copyable, just like int
!
But Widget
’s constructor overload set has been carefully crafted to count the number
of times it’s called via polymorphic_allocator::construct
.
This snippet is by far the most pathological, but it’s also the most clearly well-defined.
We’re not inspecting deallocated memory, nor non-salient state. Widget
clearly
models all the semantic requirements of std::relocatable
— heck, it’s trivially copyable!
So there are relatively few ways we can deal with this.
-
Reject (2c,2d): As in “The memory-stomping deallocator,” say
vector<T,A>
isn’t allowed to reallocate in terms of relocation whenA::construct
orA::destroy
is non-trivial. This implies thatpmr::vector<int>
can’t use memcpy to reallocate its buffer. -
Reject (3), e.g. by using our intimate knowledge of
polymorphic_allocator::construct
to letpmr::vector<T>
use memcpy as long as!uses_allocator<T, allocator_type>
. Even then, (2d) remains out of reach.
Playing around with this example, you’ll find that GCC has trouble with the very first
assert
: GCC 13+ ingeniously constructsWidget(int, allocator_type)
directly intov
’s storage, instead of constructingWidget(int)
into aninitializer_list
and thenWidget(const Widget&, allocator_type)
intov
. That’s now bug #110102.
The rotate
std::pmr::vector<int> vv[] = {
std::pmr::vector<int>({1,2,3}, &mr1),
std::pmr::vector<int>({4,5,6}, &mr2),
};
std::rotate(vv, vv+1, vv+2);
// In practice, equivalent to swap(vv[0], vv[1])
If you accept (3) plus any of (2b,2d,4d), then you accept is_trivially_relocatable_v<pmr::vector<int>>
.
If you then accept (5,5b), you’ll expect this rotate
to use memswap.
But that would swap the arenas of vv[0]
and vv[1]
!
Swapping the arenas would actually be preferable to what this code does today, i.e.,
UB and heap corruption.
But there is a third possible outcome, preferred by Bloomberg:
to copy vv[0]
’s data into mr2
and copy vv[1]
’s data into mr1
, just as if we
had done vv[0] = std::exchange(vv[1], vv[0])
manually.
There are a few ways to deal with this:
-
Reject
is_trivially_relocatable_v<pmr::vector<int>>
; but nobody wants to do that. -
Make the strong claim that the range
vv
violates the preconditions ofrotate
, as if you tried tosort
a range of NaNs. -
Make the weaker claim that the
rotate
is well-defined butget_allocator()
isn’t salient, so it’s actually okay thatvv[0]
switched arenas as a result of the rotate. -
Claim that
rotate(vv, vv+1, vv+2)
should actually be well-defined, e.g. by adopting P0178 “Allocators and swap” (Meredith, February 2016). As a corollary, reject (5,5b): generic algorithms likeswap
,swap_ranges
,rotate
,sort
, etc. simply aren’t allowed to use trivial relocation, even onTC
, because they must act as-if they’re using ADLswap
and they’re no longer allowed to assume that ADLswap
swaps only the values of its arguments. (See LWG 2153 for more details.)
I think this is the biggest disagreement between P1144 and Bloomberg: On the one hand I’m
completely unwilling to give up the performance benefits of (5,5b), and on the other hand
the Bloomberg authors seem unwilling yet to agree either that allocators are non-salient or
that it’s okay for rotate(vv, vv+1, vv+2)
to remain undefined.
Here are a couple of pie-in-the-sky ideas that might lead to answers of their own:
-
Our last claim above complained that the library couldn’t tell whether ADL
swap
was “defaulted.” We could teach the core language aboutswap
and allow it to be defaulted, the same way C++20 allows an ADLoperator==
to be defaulted. A defaulted operation can be queried for triviality, exactly as I proposed in “The Best Type Traits C++ Doesn’t Have” (C++Now 2018). N3746 “A C++1y Swap Operator” (Brown, August 2013) was an early attempt to bringswap
into the core language. We could try again, without the operator syntax (which I think N3746 proposed only because C++11 couldn’t=default
anything that wasn’t a special member function). -
C++’s object model physically ensures that a contiguous range of polymorphic objects (e.g.
Cat[5]
,vector<Cat>
) always have the same dynamic type. (If you can legally use pointer arithmetic to get from one to the next, they must have the same type.) Could we logically mandate somehow that a contiguous range of PMR-style objects must all be associated with the same arena? P2685 “Language Support for Scoped Objects” (Meredith & Berne, May 2023) is related, but I haven’t read more than the introduction yet. (Obvious pitfall: Piecewise contiguous ranges, likedeque<pmr::string>
. Contiguity is not cut-and-dried; in the limit this leads to the strong claim.)
This concludes the set of problematic koans… for now. If I stumble across any more problematic snippets, I’ll update this post.