A case study in not-quite-move-semantic library design

Building on yesterday’s post “Name lookup in multiple base classes” (2024-04-17), Seth Bromberger and I recently talked about the following library-design problem involving (or at least seeming to involve) exactly that issue. The shape of the problem is:

We have a class Graph whose constructor takes a collection of data and puts the top k values into a nice graph. Its constructor will be templated to work with any collection c, as long as c.topk(10) yields a result of type vector<Datum>.
We have, let’s say, four different collection types that are usable with Graph; let’s call them A, B, C, and D. Let’s say that A and B provide not only .topk(int) const but also a “consuming” or “pilfering” version of topk, which erases the returned elements from the collection. C provides only the non-mutating topk. D provides only the pilfering version!
We would certainly like Graph to be able to use the “pilfering” version of topk in certain circumstances; but equally surely, constructing a Graph from some collection x shouldn’t mutate x by default — not unless we opt in somehow.
We want to write the bulk of topk’s implementation just once, e.g. via a base class, instead of having to implement it in each of A, B, C, D. And likewise for the pilfering version.

Approach 1: Move semantics

Notice that these requirements have almost exactly the same shape as C++11 move semantics! So one way to implement this would be:

struct Datum {};

struct PlainTopkable {
  std::vector<Datum> topk(int k) const;  // non-mutating
};

struct PilferingTopkable {
  std::vector<Datum> topk(int k) &&;     // pilfering
};

struct BothTopkable {
  std::vector<Datum> topk(int k) const&; // non-mutating
  std::vector<Datum> topk(int k) &&;     // pilfering
};

struct A : BothTopkable {};
struct B : BothTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};

struct Graph {
  template<class Coll>
  explicit Graph(int k, Coll&& c) {
    data_ = std::forward<Coll>(c).topk(k);
  }

private:
  std::vector<Datum> data_;
};

A a;
C c;
Graph g1 = Graph(10, a);            // non-mutating
Graph g2 = Graph(10, std::move(a)); // pilfer the top 10
Graph g3 = Graph(10, c);            // non-mutating
Graph g4 = Graph(10, std::move(c)); // non-mutating

However, this has two potential downsides. One: The way g2 opts into pilfering the top k elements of a is by doing std::move(a), which looks like you’re “giving up” the entire object a. In fact you’re merely pilfering its top 10 elements; the others remain available; and the resulting object a is still in a usable state. This feels like a misuse of std::move terminology.

I have the same ambivalence about the STL’s set::merge, which is callable as either a.merge(b) or a.merge(std::move(b)). Both calls have the same predictable effect on b, but connote different things to the reader, and neither of them is quite what the programmer means. The right signature would have treated b as an inout parameter (“Pass out-parameters by pointer”) and been called as a.merge(&b); but the C++ STL consistently eschews that rule, so it’s forced into awkwardnesses like this.

Two: g4 “opted in” to pilfering from c, but in fact c doesn’t provide the pilfering version of topk, so it quietly fell back to the non-mutating version. That’s the right thing for move semantics, but it’s not what we want here. (Suppose the caller assumed that c would now be ten elements smaller, and wrote code based on that assumption!) We want a noisy error if Graph asks to pilfer from a non-pilferable collection type.

Approach 2: `pilfering_topk`

The answer to the title question of my C++Now 2021 talk “When Should You Give Two Things the Same Name?” is “only when it’s needed for a technical reason,” i.e., basically never. So let’s apply that guideline here.

struct Datum {};

struct PlainTopkable {
  std::vector<Datum> topk(int k) const;     // non-mutating
};

struct PilferingTopkable {
  std::vector<Datum> pilfering_topk(int k); // pilfering
};

struct A : PlainTopkable, PilferingTopkable {};
struct B : PlainTopkable, PilferingTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};

struct Graph {
  template<class Coll>
  static Graph from_collection(int k, const Coll& c) {
    return Graph(c.topk(k));
  }

  template<class Coll>
  static Graph pilfered_from_collection(int k, Coll& c) {
    return Graph(c.pilfering_topk(k));
  }

private:
  explicit Graph(std::vector<Datum> v) : data_(std::move(v)) {}
  std::vector<Datum> data_;
};

A a;
C c;
Graph g1 = Graph::from_collection(10, a);
Graph g2 = Graph::pilfered_from_collection(10, a);
Graph g3 = Graph::from_collection(10, c);
Graph g4 = Graph::pilfered_from_collection(10, c); // error

I think it’s “more convenient than you think” to provide named factory functions like pilfered_from_collection instead of a gigantic constructor overload set. However, I admit it can be unwieldy sometimes — or impossible, if your Graph needs to be immovable for some reason.

Re factory functions, see “Is your constructor an object-factory or a type-conversion?” (2018-06-21). Re dealing with immovable types, see “The Superconstructing Super Elider” (2018-05-17).

So we might say that we have a “good technical reason” to give the two ways of constructing Graphs the same name (that is, to put them both into the constructor overload set). Then we’d have to use a tag type to differentiate them (cf. the STL’s use of std::allocator_arg and std::in_place tags):

struct PilferTag { explicit PilferTag() = default; };

struct Graph {
  template<class Coll>
  explicit Graph(int k, const Coll& c) {
    data_ = c.topk(k);
  }

  template<class Coll>
  explicit Graph(int k, const Coll& c, PilferTag) {
    data_ = c.pilfering_topk(k);
  }

private:
  std::vector<Datum> data_;
};

Graph g1 = Graph(10, a);               // non-mutating
Graph g2 = Graph(10, a, PilferTag());  // pilfer the top 10
Graph g3 = Graph(10, c);               // non-mutating
Graph g4 = Graph(10, c, PilferTag());  // error

Approach 3: Tags all the way down?

We might look at our last snippet and claim that we have a “good technical reason” to use the same name all the way down our stack! We’d be wrong, but let’s suppose we claim it anyway. Then we might try for a solution like the following. Notice the simpler Graph class. But we also run into exactly the problem from “Name lookup in multiple base classes” (2024-04-17), and have to solve it with using-declarations.

struct Datum {};
struct PilferTag { explicit PilferTag() = default; };

struct PlainTopkable {
  std::vector<Datum> topk(int k) const;      // non-mutating
};

struct PilferingTopkable {
  std::vector<Datum> topk(int k, PilferTag); // pilfering
};

struct BothTopkable : PlainTopkable, PilferingTopkable {
  using PlainTopkable::topk;
  using PilferingTopkable::topk;
};

struct A : BothTopkable {};
struct B : BothTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};

struct Graph {
  template<class Fwd, class... Tags>
  explicit Graph(int k, Fwd&& c, Tags... tags) {
    data_ = c.topk(k, tags...);
  }

private:
  std::vector<Datum> data_;
};

Graph g1 = Graph(10, a);               // non-mutating
Graph g2 = Graph(10, a, PilferTag());  // pilfer the top 10
Graph g3 = Graph(10, c);               // non-mutating
Graph g4 = Graph(10, c, PilferTag());  // error

Here I’m using Fwd&& to bind to either A& or const A&, whatever the caller happens to pass to me. Here const Coll& would be fine if I never wanted to modify the collection; but in fact I might want to modify it, depending on what Tags... are. Arguably this is a lazy and unstylish use of a forwarding reference.

See “Universal reference or forwarding reference?” (2022-02-02) and “Don’t forward things that aren’t forwarding references” (2023-05-07).

Approach 4: Preprocessor tricks

I don’t recommend this approach personally, but it can be attractive in real codebases. We could dispense with inheritance and the using-declarations that come with it, and simply reimplement topk in each of A, B, C, D. “But we wanted to write topk only once!” Sure — we’ll just let the compiler cut-and-paste our implementation into each class’s body, using either a macro (as here) or an #include directive.

#define DEFINE_TOPK \
  std::vector<Datum> topk(int k) const { ~~~ }

#define DEFINE_TOPK_PILFERING \
  std::vector<Datum> topk(int k, PilferTag) { ~~~ }

struct A {
  DEFINE_TOPK;
  DEFINE_TOPK_PILFERING;
};
struct B {
  DEFINE_TOPK;
  DEFINE_TOPK_PILFERING;
};
struct C {
  DEFINE_TOPK;
};
struct D {
  DEFINE_TOPK_PILFERING;
};

struct Graph {
  template<class Fwd, class... Tags>
  explicit Graph(int k, Fwd&& c, Tags... tags) {
    data_ = c.topk(k, tags...);
  }

private:
  std::vector<Datum> data_;
};

Graph g1 = Graph(10, a);               // non-mutating
Graph g2 = Graph(10, a, PilferTag());  // pilfer the top 10
Graph g3 = Graph(10, c);               // non-mutating
Graph g4 = Graph(10, c, PilferTag());  // error

My personal feeling is that preprocessor tricks are fine in moderation (see the #preprocessor category on this blog), but designing a single “thing” (topk) to rely on preprocessor macros and templates and perfect-forwarding is asking too much of the reader.

Obviously I prefer Approach 2, that being the only one that never gives two things the same name.

A case study in not-quite-move-semantic library design

Approach 1: Move semantics

Approach 2: pilfering_topk

Approach 3: Tags all the way down?

Approach 4: Preprocessor tricks

Approach 2: `pilfering_topk`