A case study in not-quite-move-semantic library design
Building on yesterday’s post “Name lookup in multiple base classes” (2024-04-17), Seth Bromberger and I recently talked about the following library-design problem involving (or at least seeming to involve) exactly that issue. The shape of the problem is:
-
We have a class
Graphwhose constructor takes a collection of data and puts the topkvalues into a nice graph. Its constructor will be templated to work with any collectionc, as long asc.topk(10)yields a result of typevector<Datum>. -
We have, let’s say, four different collection types that are usable with
Graph; let’s call themA,B,C, andD. Let’s say thatAandBprovide not only.topk(int) constbut also a “consuming” or “pilfering” version oftopk, which erases the returned elements from the collection.Cprovides only the non-mutatingtopk.Dprovides only the pilfering version! -
We would certainly like
Graphto be able to use the “pilfering” version oftopkin certain circumstances; but equally surely, constructing aGraphfrom some collectionxshouldn’t mutatexby default — not unless we opt in somehow. -
We want to write the bulk of
topk’s implementation just once, e.g. via a base class, instead of having to implement it in each ofA,B,C,D. And likewise for the pilfering version.
Approach 1: Move semantics
Notice that these requirements have almost exactly the same shape as C++11 move semantics! So one way to implement this would be:
struct Datum {};
struct PlainTopkable {
std::vector<Datum> topk(int k) const; // non-mutating
};
struct PilferingTopkable {
std::vector<Datum> topk(int k) &&; // pilfering
};
struct BothTopkable {
std::vector<Datum> topk(int k) const&; // non-mutating
std::vector<Datum> topk(int k) &&; // pilfering
};
struct A : BothTopkable {};
struct B : BothTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};
struct Graph {
template<class Coll>
explicit Graph(int k, Coll&& c) {
data_ = std::forward<Coll>(c).topk(k);
}
private:
std::vector<Datum> data_;
};
A a;
C c;
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, std::move(a)); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, std::move(c)); // non-mutating
However, this has two potential downsides. One: The way g2 opts into pilfering the top k elements
of a is by doing std::move(a), which looks like you’re “giving up” the entire object a. In fact
you’re merely pilfering its top 10 elements; the others remain available; and the resulting object a
is still in a usable state. This feels like a misuse of std::move terminology.
I have the same ambivalence about the STL’s
set::merge, which is callable as eithera.merge(b)ora.merge(std::move(b)). Both calls have the same predictable effect onb, but connote different things to the reader, and neither of them is quite what the programmer means. The right signature would have treatedbas an inout parameter (“Pass out-parameters by pointer”) and been called asa.merge(&b); but the C++ STL consistently eschews that rule, so it’s forced into awkwardnesses like this.
Two: g4 “opted in” to pilfering from c, but in fact c doesn’t provide the pilfering version of topk,
so it quietly fell back to the non-mutating version. That’s the right thing for move semantics, but it’s not
what we want here. (Suppose the caller assumed that c would now be ten elements smaller, and wrote code
based on that assumption!) We want a noisy error if Graph asks to pilfer from a non-pilferable collection type.
Approach 2: pilfering_topk
The answer to the title question of my C++Now 2021 talk “When Should You Give Two Things the Same Name?” is “only when it’s needed for a technical reason,” i.e., basically never. So let’s apply that guideline here.
struct Datum {};
struct PlainTopkable {
std::vector<Datum> topk(int k) const; // non-mutating
};
struct PilferingTopkable {
std::vector<Datum> pilfering_topk(int k); // pilfering
};
struct A : PlainTopkable, PilferingTopkable {};
struct B : PlainTopkable, PilferingTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};
struct Graph {
template<class Coll>
static Graph from_collection(int k, const Coll& c) {
return Graph(c.topk(k));
}
template<class Coll>
static Graph pilfered_from_collection(int k, Coll& c) {
return Graph(c.pilfering_topk(k));
}
private:
explicit Graph(std::vector<Datum> v) : data_(std::move(v)) {}
std::vector<Datum> data_;
};
A a;
C c;
Graph g1 = Graph::from_collection(10, a);
Graph g2 = Graph::pilfered_from_collection(10, a);
Graph g3 = Graph::from_collection(10, c);
Graph g4 = Graph::pilfered_from_collection(10, c); // error
I think it’s “more convenient than you think” to provide named factory functions like
pilfered_from_collection instead of a gigantic constructor overload set. However, I admit
it can be unwieldy sometimes — or impossible, if your Graph needs to be immovable for some
reason.
Re factory functions, see “Is your constructor an object-factory or a type-conversion?” (2018-06-21). Re dealing with immovable types, see “The Superconstructing Super Elider” (2018-05-17).
So we might say that we have a “good technical reason” to give the two ways
of constructing Graphs the same name (that is, to put them both into the constructor
overload set). Then we’d have to use a tag type to differentiate them (cf. the STL’s
use of std::allocator_arg and std::in_place tags):
struct PilferTag { explicit PilferTag() = default; };
struct Graph {
template<class Coll>
explicit Graph(int k, const Coll& c) {
data_ = c.topk(k);
}
template<class Coll>
explicit Graph(int k, const Coll& c, PilferTag) {
data_ = c.pilfering_topk(k);
}
private:
std::vector<Datum> data_;
};
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, a, PilferTag()); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, c, PilferTag()); // error
Approach 3: Tags all the way down?
We might look at our last snippet and claim that we have a “good technical reason” to use
the same name all the way down our stack! We’d be wrong, but let’s suppose we claim it
anyway. Then we might try for a solution like the following. Notice the simpler Graph
class. But we also run into exactly the problem from
“Name lookup in multiple base classes” (2024-04-17),
and have to solve it with using-declarations.
struct Datum {};
struct PilferTag { explicit PilferTag() = default; };
struct PlainTopkable {
std::vector<Datum> topk(int k) const; // non-mutating
};
struct PilferingTopkable {
std::vector<Datum> topk(int k, PilferTag); // pilfering
};
struct BothTopkable : PlainTopkable, PilferingTopkable {
using PlainTopkable::topk;
using PilferingTopkable::topk;
};
struct A : BothTopkable {};
struct B : BothTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};
struct Graph {
template<class Fwd, class... Tags>
explicit Graph(int k, Fwd&& c, Tags... tags) {
data_ = c.topk(k, tags...);
}
private:
std::vector<Datum> data_;
};
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, a, PilferTag()); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, c, PilferTag()); // error
Here I’m using Fwd&& to bind to either A& or const A&, whatever the
caller happens to pass to me. Here const Coll& would be fine if I never wanted
to modify the collection; but in fact I might want to modify it, depending on
what Tags... are. Arguably this is a lazy and unstylish use of a forwarding
reference.
See “Universal reference or forwarding reference?” (2022-02-02) and “Don’t
forwardthings that aren’t forwarding references” (2023-05-07).
Approach 4: Preprocessor tricks
I don’t recommend this approach personally, but it can be attractive in real codebases.
We could dispense with inheritance and the using-declarations that come with it,
and simply reimplement topk in each of A, B, C, D.
“But we wanted to write topk only once!” Sure — we’ll just let the compiler cut-and-paste
our implementation into each class’s body, using either a macro (as here) or an
#include directive.
#define DEFINE_TOPK \
std::vector<Datum> topk(int k) const { ~~~ }
#define DEFINE_TOPK_PILFERING \
std::vector<Datum> topk(int k, PilferTag) { ~~~ }
struct A {
DEFINE_TOPK;
DEFINE_TOPK_PILFERING;
};
struct B {
DEFINE_TOPK;
DEFINE_TOPK_PILFERING;
};
struct C {
DEFINE_TOPK;
};
struct D {
DEFINE_TOPK_PILFERING;
};
struct Graph {
template<class Fwd, class... Tags>
explicit Graph(int k, Fwd&& c, Tags... tags) {
data_ = c.topk(k, tags...);
}
private:
std::vector<Datum> data_;
};
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, a, PilferTag()); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, c, PilferTag()); // error
My personal feeling is that preprocessor tricks are fine in moderation
(see the #preprocessor category on this blog),
but designing a single “thing” (topk) to rely on preprocessor macros
and templates and perfect-forwarding is asking too much of the reader.
Obviously I prefer Approach 2, that being the only one that never gives two things the same name.
See also:
- “Techniques for post-facto multiple inheritance” (2022-12-01)
