A case study in not-quite-move-semantic library design
Building on yesterday’s post “Name lookup in multiple base classes” (2024-04-17), Seth Bromberger and I recently talked about the following library-design problem involving (or at least seeming to involve) exactly that issue. The shape of the problem is:
-
We have a class
Graph
whose constructor takes a collection of data and puts the topk
values into a nice graph. Its constructor will be templated to work with any collectionc
, as long asc.topk(10)
yields a result of typevector<Datum>
. -
We have, let’s say, four different collection types that are usable with
Graph
; let’s call themA
,B
,C
, andD
. Let’s say thatA
andB
provide not only.topk(int) const
but also a “consuming” or “pilfering” version oftopk
, which erases the returned elements from the collection.C
provides only the non-mutatingtopk
.D
provides only the pilfering version! -
We would certainly like
Graph
to be able to use the “pilfering” version oftopk
in certain circumstances; but equally surely, constructing aGraph
from some collectionx
shouldn’t mutatex
by default — not unless we opt in somehow. -
We want to write the bulk of
topk
’s implementation just once, e.g. via a base class, instead of having to implement it in each ofA
,B
,C
,D
. And likewise for the pilfering version.
Approach 1: Move semantics
Notice that these requirements have almost exactly the same shape as C++11 move semantics! So one way to implement this would be:
struct Datum {};
struct PlainTopkable {
std::vector<Datum> topk(int k) const; // non-mutating
};
struct PilferingTopkable {
std::vector<Datum> topk(int k) &&; // pilfering
};
struct BothTopkable {
std::vector<Datum> topk(int k) const&; // non-mutating
std::vector<Datum> topk(int k) &&; // pilfering
};
struct A : BothTopkable {};
struct B : BothTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};
struct Graph {
template<class Coll>
explicit Graph(int k, Coll&& c) {
data_ = std::forward<Coll>(c).topk(k);
}
private:
std::vector<Datum> data_;
};
A a;
C c;
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, std::move(a)); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, std::move(c)); // non-mutating
However, this has two potential downsides. One: The way g2
opts into pilfering the top k
elements
of a
is by doing std::move(a)
, which looks like you’re “giving up” the entire object a
. In fact
you’re merely pilfering its top 10 elements; the others remain available; and the resulting object a
is still in a usable state. This feels like a misuse of std::move
terminology.
I have the same ambivalence about the STL’s
set::merge
, which is callable as eithera.merge(b)
ora.merge(std::move(b))
. Both calls have the same predictable effect onb
, but connote different things to the reader, and neither of them is quite what the programmer means. The right signature would have treatedb
as an inout parameter (“Pass out-parameters by pointer”) and been called asa.merge(&b)
; but the C++ STL consistently eschews that rule, so it’s forced into awkwardnesses like this.
Two: g4
“opted in” to pilfering from c
, but in fact c
doesn’t provide the pilfering version of topk
,
so it quietly fell back to the non-mutating version. That’s the right thing for move semantics, but it’s not
what we want here. (Suppose the caller assumed that c
would now be ten elements smaller, and wrote code
based on that assumption!) We want a noisy error if Graph
asks to pilfer from a non-pilferable collection type.
Approach 2: pilfering_topk
The answer to the title question of my C++Now 2021 talk “When Should You Give Two Things the Same Name?” is “only when it’s needed for a technical reason,” i.e., basically never. So let’s apply that guideline here.
struct Datum {};
struct PlainTopkable {
std::vector<Datum> topk(int k) const; // non-mutating
};
struct PilferingTopkable {
std::vector<Datum> pilfering_topk(int k); // pilfering
};
struct A : PlainTopkable, PilferingTopkable {};
struct B : PlainTopkable, PilferingTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};
struct Graph {
template<class Coll>
static Graph from_collection(int k, const Coll& c) {
return Graph(c.topk(k));
}
template<class Coll>
static Graph pilfered_from_collection(int k, Coll& c) {
return Graph(c.pilfering_topk(k));
}
private:
explicit Graph(std::vector<Datum> v) : data_(std::move(v)) {}
std::vector<Datum> data_;
};
A a;
C c;
Graph g1 = Graph::from_collection(10, a);
Graph g2 = Graph::pilfered_from_collection(10, a);
Graph g3 = Graph::from_collection(10, c);
Graph g4 = Graph::pilfered_from_collection(10, c); // error
I think it’s “more convenient than you think” to provide named factory functions like
pilfered_from_collection
instead of a gigantic constructor overload set. However, I admit
it can be unwieldy sometimes — or impossible, if your Graph
needs to be immovable for some
reason.
Re factory functions, see “Is your constructor an object-factory or a type-conversion?” (2018-06-21). Re dealing with immovable types, see “The Superconstructing Super Elider” (2018-05-17).
So we might say that we have a “good technical reason” to give the two ways
of constructing Graph
s the same name (that is, to put them both into the constructor
overload set). Then we’d have to use a tag type to differentiate them (cf. the STL’s
use of std::allocator_arg
and std::in_place
tags):
struct PilferTag { explicit PilferTag() = default; };
struct Graph {
template<class Coll>
explicit Graph(int k, const Coll& c) {
data_ = c.topk(k);
}
template<class Coll>
explicit Graph(int k, const Coll& c, PilferTag) {
data_ = c.pilfering_topk(k);
}
private:
std::vector<Datum> data_;
};
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, a, PilferTag()); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, c, PilferTag()); // error
Approach 3: Tags all the way down?
We might look at our last snippet and claim that we have a “good technical reason” to use
the same name all the way down our stack! We’d be wrong, but let’s suppose we claim it
anyway. Then we might try for a solution like the following. Notice the simpler Graph
class. But we also run into exactly the problem from
“Name lookup in multiple base classes” (2024-04-17),
and have to solve it with using
-declarations.
struct Datum {};
struct PilferTag { explicit PilferTag() = default; };
struct PlainTopkable {
std::vector<Datum> topk(int k) const; // non-mutating
};
struct PilferingTopkable {
std::vector<Datum> topk(int k, PilferTag); // pilfering
};
struct BothTopkable : PlainTopkable, PilferingTopkable {
using PlainTopkable::topk;
using PilferingTopkable::topk;
};
struct A : BothTopkable {};
struct B : BothTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};
struct Graph {
template<class Fwd, class... Tags>
explicit Graph(int k, Fwd&& c, Tags... tags) {
data_ = c.topk(k, tags...);
}
private:
std::vector<Datum> data_;
};
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, a, PilferTag()); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, c, PilferTag()); // error
Here I’m using Fwd&&
to bind to either A&
or const A&
, whatever the
caller happens to pass to me. Here const Coll&
would be fine if I never wanted
to modify the collection; but in fact I might want to modify it, depending on
what Tags...
are. Arguably this is a lazy and unstylish use of a forwarding
reference.
See “Universal reference or forwarding reference?” (2022-02-02) and “Don’t
forward
things that aren’t forwarding references” (2023-05-07).
Approach 4: Preprocessor tricks
I don’t recommend this approach personally, but it can be attractive in real codebases.
We could dispense with inheritance and the using
-declarations that come with it,
and simply reimplement topk
in each of A
, B
, C
, D
.
“But we wanted to write topk
only once!” Sure — we’ll just let the compiler cut-and-paste
our implementation into each class’s body, using either a macro (as here) or an
#include
directive.
#define DEFINE_TOPK \
std::vector<Datum> topk(int k) const { ~~~ }
#define DEFINE_TOPK_PILFERING \
std::vector<Datum> topk(int k, PilferTag) { ~~~ }
struct A {
DEFINE_TOPK;
DEFINE_TOPK_PILFERING;
};
struct B {
DEFINE_TOPK;
DEFINE_TOPK_PILFERING;
};
struct C {
DEFINE_TOPK;
};
struct D {
DEFINE_TOPK_PILFERING;
};
struct Graph {
template<class Fwd, class... Tags>
explicit Graph(int k, Fwd&& c, Tags... tags) {
data_ = c.topk(k, tags...);
}
private:
std::vector<Datum> data_;
};
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, a, PilferTag()); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, c, PilferTag()); // error
My personal feeling is that preprocessor tricks are fine in moderation
(see the #preprocessor category on this blog),
but designing a single “thing” (topk
) to rely on preprocessor macros
and templates and perfect-forwarding is asking too much of the reader.
Obviously I prefer Approach 2, that being the only one that never gives two things the same name.
See also:
- “Techniques for post-facto multiple inheritance” (2022-12-01)