Default-constructibility is overrated

The Ranges Technical Specification includes a great many concept definitions (based on the Concepts TS), including, for example, Integral and Predicate. It also provides a concept named Regular, which implements a variation on the “Regular” concept described by Alexander Stepanov in his 1998 paper “Fundamentals of Generic Programming.”

Stepanov writes (pp. 2–3):

The critical insight which produced generic programming is that highly reusable components must be programmed assuming a minimal collection of […] concepts […] Successful production of a generic component is not simply a matter of identifying the minimal requirements of an arbitrary type or algorithm – it requires identifying the common requirements of a broad collection of similar components. The final requirement is that we accomplish this without sacrificing performance relative to programming with concrete structures. A good generic library becomes a repository of highly efficient data structures and algorithms, based on a small number of broadly useful concepts […]

Under the Stepanov definition, a type is “Regular” if and only if it provides all of the following operations with appropriate (that is, non-surprising and self-consistent) semantics:

• Default constructor
• Copy constructor
• Destructor
• Assignment
• Equality (a == b)
• Inequality (a != b)
• Ordering (a < b)

Remember that this (pioneering!) definition was written in 1998. Twenty years later, a programmer looking at this list through “modern C++” glasses should be puzzled by the inclusion of at least two or three of these primitive operations, and by the omission of a couple more. First of all, in C++11 and later, we expect to see these operations added:

• Move constructor
• Move assignment
• Swap (swap(a, b))

Note that if we have efficient and nonthrowing move construction, move assignment, and destruction, then we could use the “sensible default” of std::swap instead of requiring our type to provide swap itself; but then again, if we have efficient equality, then we could just as well use the “sensible default” of !(a == b) instead of requiring our type to provide a != b itself. So absolute minimalism in the choice of primitives is clearly not one of Stepanov’s design goals here.
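To make the “sensible defaults” concrete, here is a minimal sketch of both: a swap built only from move operations (which is essentially what std::swap does), and an inequality built only from equality. The names default_swap and default_neq are invented for illustration.

```cpp
#include <utility>

// A "sensible default" swap, built from move construction and move
// assignment only -- essentially what std::swap does.
template<class T>
void default_swap(T& a, T& b) {
    T tmp = std::move(a);  // move-construct from a
    a = std::move(b);      // move-assign b into a
    b = std::move(tmp);    // move-assign the old a into b
}

// A "sensible default" inequality, built from equality only.
template<class T>
bool default_neq(const T& a, const T& b) {
    return !(a == b);
}
```

A type supplying only moves and operator== gets both operations for free; requiring them as separate primitives is a convenience, not a necessity.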

There are two weirder primitives that Stepanov does include in “Regular.” First:

• Ordering (a < b)

This is so weird and out of place that Stepanov himself calls it out on page 4:

The ordering case is interesting. C++ does not define total ordering operations on pointer types […] But this subject is beyond the scope of the current paper.

It is so weird, in fact, that the Ranges TS concept Regular does not require ordering. This is the single solitary place — as far as I know — where Ranges’ Regular deliberately diverges from Stepanov’s “Regular.” I think the authors of the TS made the right call here; being totally ordered is not so intrinsically important or useful that it ought to be part of a Regular type.

In particular, I can imagine making a value-semantic class type representing a playing_card, where we can make a very strong and intuitive case for playing_card(Queen, Spades) != playing_card(Queen, Hearts) but we have no intuitive basis for claiming that there is an intrinsic “ordering” between playing_card(Queen, Spades) and playing_card(Queen, Hearts).

Stepanov’s Regular includes default-constructibility

But I want to talk about the second weird primitive that Stepanov includes in “Regular,” which is:

• Default constructor

Stepanov does not include any caveat in his 1998 paper about default-constructibility, even though it suffers from much the same problem with built-in types:

int i, j;
int *p = &i, *q = &j;

bool b = (p < q);  // This comparison yields an unspecified result: not a good primitive!
// Stepanov called this one out explicitly as problematic.

j = i;  // This assignment yields undefined behavior: not a good primitive!?!?
// Stepanov doesn't call this one out at all.


The undefined behavior above isn’t the fault of the assignment operator! It comes from reading the value of the default-initialized (a.k.a. uninitialized) int variable i. If we want to eliminate undefined behavior from our code, we should make rules that discourage default-initialization of int variables; and if we want to eliminate undefined behavior from generic code, we should make rules that discourage the default-initialization of T variables.
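One concrete rule along these lines: prefer value-initialization (T t{};) over default-initialization (T t;). For built-in types the former produces a well-defined zero value rather than an indeterminate one. A tiny sketch (make_value is an invented name):

```cpp
// Value-initialization gives every T a well-defined starting value;
// for built-in types like int, that value is zero.
template<class T>
T make_value() {
    T t{};   // value-initialization, NOT default-initialization
    return t;
}
// By contrast, "int i;" leaves i indeterminate, and reading it is UB.
```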

“Modern” user-defined types should not provide zero-argument construction

Let’s look at that playing_card example again:

struct playing_card {
    enum face_kind { Jack=11, Queen=12, King=13 };
    enum suit_kind { Clubs, Diamonds, Hearts, Spades } suit;
    int rank;
    explicit playing_card() = default;
    explicit playing_card(int v, suit_kind s) : suit(s), rank(v) {}
    friend bool operator==(playing_card a, playing_card b) {
        return a.suit == b.suit && a.rank == b.rank;
    }
    friend bool operator!=(playing_card a, playing_card b) {
        return !(a == b);
    }
};


This seems like a very clean and “Regular” playing_card (minus the ordering requirement, which we ditched). But this playing_card has subtle traps laid for the unwary!

bool guess(const playing_card& card) {
    return card == playing_card(7, playing_card::Diamonds);
}


This guess function looks nice and safe, right? You give it a card, and it tells you whether your card is the seven of diamonds. What are some ways we could “break” this function and cause it to do unexpected things?

Obviously we could pass in something that is not actually a playing card…

int garbage = 42;
guess(reinterpret_cast<const playing_card&>(garbage));


But this is clearly violating the “rules” of C++. The language’s type system will prevent us from doing this by accident. Okay, well, we could pass in something of the right type but which has bad behavior at runtime…

playing_card *p = nullptr;
guess(*p);


This is even more obviously undefined behavior. The language’s rules won’t save us at compile time, but we’ve used C++ long enough to know that this code is not quite kosher, either.

playing_card pc;
guess(pc);


This code also has undefined behavior! And this time, it’s not because the programmer is doing something sneaky. It’s the fault of the playing_card class, which is “doing as the ints do” by providing a default constructor that leaves all its fields uninitialized.

We could eliminate this runtime undefined behavior by giving playing_card a “safer” default constructor:

- explicit playing_card() = default;
+ explicit playing_card() : suit(Clubs), rank(2) {}


Now our default-constructed pc will get the value “2 of Clubs”, which means that guess(pc) will have well-defined behavior at runtime.

However, I claim that we should do better than that. We should omit the zero-argument constructor from this class! The zero-argument constructor gives us something like “a playing card with the default value,” but that’s not a thing that makes sense in our domain. Playing cards always have specific suits and ranks. There’s no such thing as a “default” playing card — we can’t pull a card out of thin air! So, just get rid of that constructor. Then we don’t have to worry about avoiding unwanted behavior at runtime; our troublesome code will be rejected by the compiler automatically!

error: no matching constructor for initialization of 'playing_card'
playing_card pc;
^


Zero-argument constructors in the STL

In the standard library, most types do have zero-argument constructors. Even a lot of types you would naturally expect not to be default-constructible:

• std::regex — Can’t make a regex object without a regular expression, right?
• std::unique_lock — Has to be associated with a mutex at construction time, right?
• std::chrono::seconds — Has to be an actual number of seconds, right?

Nope! All implicitly constructible from zero arguments!

std::chrono::seconds default-constructs to garbage, “as the ints do.”
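If you do need a well-defined “zero seconds,” the library already provides it explicitly via the static member duration::zero(), with no default construction involved. A small sketch (the helper names are invented here):

```cpp
#include <chrono>

// Default-constructing std::chrono::seconds leaves its count
// indeterminate, "as the ints do." These helpers avoid that:
inline std::chrono::seconds zero_seconds() {
    return std::chrono::seconds::zero();  // explicit, well-defined zero
}

inline std::chrono::seconds five_seconds() {
    return std::chrono::seconds{5};       // an actual number of seconds
}
```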

unique_lock and regex are examples of a common pattern. Both are move-enabled types where, for efficiency, the move constructor needs to steal the guts of the right-hand operand. Which means the right-hand operand will be left “gutless”. Which means we need to be able to deal with this “gutless” state; which means we might as well create a default constructor that puts the object into the “gutless” state. I mean we’re not losing anything by doing so (except perhaps the chance to catch some bugs).

class S {
    std::mutex m;
    int value;
public:
    void increment() { std::unique_lock<std::mutex> lk; ++value; }
    int get_value() const { std::unique_lock<std::mutex> lk; return value; }
};

// Did you spot the bug?
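The bug: each lk is default-constructed, so it is never associated with m and locks nothing; the methods run completely unsynchronized, and the compiler happily accepts it. A corrected sketch (note that m also needs to be mutable, since get_value() locks it from a const member):

```cpp
#include <mutex>

class S {
    mutable std::mutex m;  // mutable: get_value() locks it in a const member
    int value = 0;
public:
    void increment() {
        std::unique_lock<std::mutex> lk(m);  // pass the mutex: actually locks
        ++value;
    }
    int get_value() const {
        std::unique_lock<std::mutex> lk(m);
        return value;
    }
};
```

If unique_lock had no zero-argument constructor, the original buggy version would have failed to compile.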


On the other side of the fence, some types with no zero-argument constructor include:

• lock_guard
• unsynchronized_pool_resource
• regex_error
• filesystem_error

I wish I could think of better examples. lock_guard and unsynchronized_pool_resource are immobile; regex_error and filesystem_error are exception types which means they can’t really be used as examples of “modern C++” style.

However, considering our original playing_card example, and our intuition about types like regex, I’d like to stick with my claim that modern C++ value types generally shouldn’t provide zero-argument constructors.

The STL doesn’t ever need default-constructibility for anything. We don’t even default-construct iterator or pointer types, which are probably the cases where default-constructibility makes the most intuitive sense. Nobody writes… well… heck, I can’t even come up with a plausible strawman here.

template<class InputIt, class T>
InputIt find(InputIt first, InputIt last, const T& value)
{
    InputIt result;  // zero-argument constructor called!
    for ( ; first != last; ++first) {
        if (*first == value) {
            result = first;
            break;
        }
    }
    return (first == last) ? last : result;
}


No STL algorithm looks like that! It’s just ridiculous. We don’t default-construct things in generic code. And we don’t default-construct things in performant code; notice that the above strawman is basically using a default-constructed iterator as a very poor man’s optional (with a redundant test at the bottom of the function). What we meant was

template<class InputIt, class T>
InputIt find(InputIt first, InputIt last, const T& value)
{
    InputIt result = last;
    for ( ; first != last; ++first) {
        if (*first == value) {
            result = first;
            break;
        }
    }
    return result;
}


or just, you know,

template<class InputIt, class T>
InputIt find(InputIt first, InputIt last, const T& value)
{
    for ( ; first != last; ++first)
        if (*first == value) return first;
    return last;
}


Not a default-construction in sight.

So why did Stepanov include zero-argument-constructibility in “Regular”?

Many thanks to John Lakos for pointing me to his February 2015 interview with Alexander Stepanov and Daniel Rose, where Stepanov answers this exact question:

The role of a default constructor is that it constructs the object so it can be assigned to or destroyed. Nothing more. EoP calls these “partially formed” objects. Default construction does not guarantee that the initial value is meaningful, or even the same across invocations. Of course, if you write your own default constructor for your own class, you may choose to initialize it with a useful value, but this is not part of the requirement.

In particular, I want a programmer to be able to write

T a;  // value of a is undefined
if (test) a = T(1);
else a = T(2);


This is a very common case, and I don’t want a programmer to struggle with conditional expressions in initialization. Moreover, the idea of partially formed objects is an important one. One could, for example, have an array of partially formed objects generated by a default constructor if one writes

T a[5];  // values of a are undefined


So far, I never encountered a case where it caused difficulty, so I do not see a need for a special concept where default constructor is not available.

Notice that this text is from the year 2015, but it’s describing a distinctly “pre-modern” view of C++. In this decade, I hope that working C++ programmers do not struggle to write

T a(test ? 1 : 2);  // value of a is never remotely questionable


The array case is more thought-provoking. We have to use our imagination to come up with a reason for wanting five T objects in a row like that. That is, we need to exclude possible solutions such as

std::vector<T> a;
for (int i=0; i < 5; ++i)
    a.push_back(some_actual_value());


or

T a[5] = { 10, 20, 30, 40, 50 };


Well, maybe we’re simultaneously kicking off five numbered processes that each return a T, and we need places to store their results as they come in, potentially out of order.

T a[5];  // values of a are undefined
for (int i=0; i < 5; ++i) {
    async_launch(
        some_process, i,
        [&a, i](auto result) { a[i] = result; }
    );
}
wait_for_all();
return reduce(a, a+5);


We could of course write

std::optional<T> a[5];  // all default-constructed to "nullopt"


but that will cost us an extra something-like-40 bytes. And we could of course write

alignas(T) char a[5][sizeof(T)];
for (int i=0; i < 5; ++i) {
    async_launch(
        some_process, i,
        [&a, i](auto result) { ::new ((void*)&a[i]) T(result); }
    );
}
wait_for_all();
return reduce((T*)a, (T*)(a+5));


except that that’s ridiculous. (Note in passing that we cannot use aligned_union_t<1, T> here; while C++17 guarantees that sizeof(aligned_union_t<1, T>) is greater than or equal to sizeof(T), it may not be exactly equal.)
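For completeness, here is a runnable sketch of the optional-based alternative, with std::thread standing in for the hypothetical async_launch and a trivial computation standing in for some_process:

```cpp
#include <optional>
#include <thread>
#include <vector>

// Each slot starts as nullopt -- a well-defined "no result yet" state,
// bought at the price of optional's extra storage.
int sum_of_five_results() {
    std::optional<int> a[5];  // all default-constructed to "nullopt"
    std::vector<std::thread> workers;
    for (int i = 0; i < 5; ++i) {
        // std::thread stands in for async_launch; each worker fills its
        // own slot, possibly out of order.
        workers.emplace_back([&a, i] { a[i] = (i + 1) * 10; });
    }
    for (auto& t : workers) t.join();

    int sum = 0;
    for (auto& slot : a) sum += *slot;  // every slot is filled after join()
    return sum;
}
```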

My alternative solutions here are not very good. Does this mean that Stepanov was right — that most types should be default-constructible, so as to permit arrays like this?

I don’t think so. I think the more likely conclusion is that generic code does not, and should not, create arrays like this. By avoiding such operations, we enable our generic algorithms to work even with types (like playing_card and regex_error) which are value-semantic but deliberately not default-constructible.

One more old-school rationale for default-constructibility

Before I read Stepanov’s interview quoted above, my wild guess as to what he was thinking was: hey, it’s 1993! Look at the 1990s design of iostreams: idiomatic use of std::cin requires that you construct “uninitialized” objects all over the place.

std::string get_user_input() {
    std::string input;  // zero-argument construction!
    std::cin >> input;
    return input;
}

int main() {
    std::string name = get_user_input();
    std::cout << "Hello, " << name << "!" << std::endl;
}


Since it’s 1993, we don’t get move semantics on that return. In fact, our compiler probably doesn’t even implement copy elision, which became widespread only around 2003 according to my extremely cursory research (namely, a blog post by Stan Lippman from 2004-02-03). So what we really write is:

void get_user_input(std::string& input) {
    std::cin >> input;
}

int main() {
    std::string name;  // zero-argument construction!
    get_user_input(name);
    std::cout << "Hello, " << name << "!" << std::endl;
}


In this primordial-soup environment, full of uninitialized variables and two-phase initializations, it might really have seemed “natural” to make sure that every type had a cheap default constructor.

But these days, “modern” days, we don’t write code like that anymore. So I think modern programmers can deal with the idea of types that aren’t default-constructible.

Should Regular not imply default constructibility?

Casey Carter’s P0898R1 “Standard Library Concepts” proposes the following hierarchy of concepts for C++2a:

template<class T>
concept Regular = Semiregular<T> && EqualityComparable<T>;

template<class T>
concept Semiregular = Copyable<T> && DefaultConstructible<T>;

template<class T>
concept Copyable = CopyConstructible<T> && Movable<T> && Assignable<T&, const T&>;

template<class T>
concept Movable = is_object_v<T> && MoveConstructible<T> && Assignable<T&, T> && Swappable<T>;

// ...

template <class T, class... Args>
concept Constructible = Destructible<T> && is_constructible_v<T, Args...>;


This is fine so far (well, did you notice that Constructible implies Destructible?), but I am mildly irked that the nicest name — Regular — is occupied by the 1990s-era concept that I consider undesirable, and that the most intuitively desirable concept for building other value-semantic concepts — that is, Copyable<T> && EqualityComparable<T> — doesn’t get a special name at all.
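In C++17 terms, one could approximate that unnamed concept with a trait; the name value_semantic_v below is invented here, not anything from the TS. The point of the sketch is that a deliberately non-default-constructible value type still qualifies:

```cpp
#include <type_traits>
#include <utility>

// Detect "a == b" for const T (a C++17 stand-in for EqualityComparable).
template<class T, class = void>
struct is_equality_comparable : std::false_type {};
template<class T>
struct is_equality_comparable<T,
    std::void_t<decltype(std::declval<const T&>() == std::declval<const T&>())>>
    : std::true_type {};

// Hypothetical "Copyable && EqualityComparable" trait.
template<class T>
inline constexpr bool value_semantic_v =
    std::is_copy_constructible_v<T> &&
    std::is_copy_assignable_v<T> &&
    is_equality_comparable<T>::value;

// A value type with no zero-argument constructor still qualifies:
struct card {
    int rank;
    explicit card(int r) : rank(r) {}
    friend bool operator==(card a, card b) { return a.rank == b.rank; }
};
static_assert(value_semantic_v<int>);
static_assert(value_semantic_v<card>);
static_assert(!std::is_default_constructible_v<card>);
```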

In practice this might not end up mattering much. It does occur to me that most STL algorithms won’t constrain on Regular; they’ll constrain on finer-grained concepts like EqualityComparableWith<U::value_type> and Predicate<T, U>. But as the algorithms get “higher-level,” the risk increases of a pesky DefaultConstructible sneaking in where it’s unwanted. For example, in N3351 “A Concept Design for the STL” (2012), it was proposed that std::sort ought only to sort Sortable objects, where Sortable implied Semiregular, which implied DefaultConstructible. However, it is obviously possible to sort an array without ever creating an uninitialized object of the array’s element type! It would be very sad if C++ evolved to disallow this:

struct boxed_integer {
    int value;
    constexpr boxed_integer(int v) : value(v) {}
    constexpr auto operator<=>(const boxed_integer&) const = default;
};

int main() {
    std::vector<boxed_integer> vec = { 3, 1, 4, 2 };
    conceptified::sort(vec);
}


Fortunately, the range-v3 library seems to get this example right. range-v3’s Sortable concept does not require that the element type be DefaultConstructible, or in fact even Copyable; Movable is enough.
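In fact, plain std::sort already demonstrates the point today: since C++11 it requires only move construction, move assignment, and swappability of the element type, so even a move-only, non-copyable type sorts fine (sorted_boxes is an invented helper name):

```cpp
#include <algorithm>
#include <initializer_list>
#include <memory>
#include <vector>

// unique_ptr is movable but not copyable, yet std::sort is happy:
// it needs only move construction, move assignment, and swap.
inline std::vector<std::unique_ptr<int>> sorted_boxes() {
    std::vector<std::unique_ptr<int>> v;
    for (int x : {3, 1, 4, 2}) v.push_back(std::make_unique<int>(x));
    std::sort(v.begin(), v.end(),
              [](const auto& a, const auto& b) { return *a < *b; });
    return v;
}
```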

TLDR

To the extent that Regular is intended as “what we expect from a well-behaved value type”, it should not imply DefaultConstructible. Or, vice versa, if “what we expect from a well-behaved value type” is essentially Copyable && EqualityComparable, it would be really nice to have a short clean name for that concept. I weakly propose the short clean name “Regular”!

Posted 2018-05-10