Is your constructor an object-factory or a type-conversion?

I’ve been meaning to write this one for a while now, but I keep putting it off because it’s not quite fully baked. However, if I never write it down, it’ll never get baked. So here it is.

The C++ language takes the very dissimilar notions of object factories and type conversions and conflates them into a single (and quite revolutionary) notion, which it calls the constructor. Consider the following examples, for which we will pick on std::vector because it’s an easy target:

std::list<char> lst = ...;

std::vector<char> vec1;
std::vector<char> vec2(5);
std::vector<char> vec3(5, 'a');
std::vector<char> vec4(lst.begin(), lst.end());
std::vector<char> vec5 { 'a', 'b', 'c' };
std::vector<char> vec6(lst);
std::vector<char> vec7(vec1);

(Okay, vec6 won’t compile — you can’t convert a list to a vector like that — but let’s pretend it does, for the sake of this blog post.)

Because of the rules of C++, each of these examples can be rewritten like this:

auto vec1 = std::vector<char>();
auto vec2 = std::vector<char>(5);
auto vec3 = std::vector<char>(5, 'a');
auto vec4 = std::vector<char>(lst.begin(), lst.end());
auto vec5 = std::vector<char>{ 'a', 'b', 'c' };
auto vec6 = std::vector<char>(lst);
auto vec7 = std::vector<char>(vec1);

And four of them — the three that take only a single argument, plus the one that secretly takes an initializer_list — can be rewritten like this:

auto vec2 = static_cast<std::vector<char>>(5);
auto vec5 = static_cast<std::vector<char>>(std::initializer_list<char>{ 'a', 'b', 'c' });
auto vec6 = static_cast<std::vector<char>>(lst);
auto vec7 = static_cast<std::vector<char>>(vec1);
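The type traits offer a quick way to check that these casts really do invoke constructors, including explicit ones. This is a minimal sketch that assumes C++17. five_chars is a hypothetical helper whose only job is to give the vec2 cast something to return from.

```cpp
#include <cassert>
#include <list>
#include <type_traits>
#include <vector>

// vector's size constructor is explicit, so int does not *implicitly*
// convert to std::vector<char>...
static_assert(!std::is_convertible_v<int, std::vector<char>>);
// ...but vector<char> is constructible from int. static_cast<T>(e) performs
// direct-initialization, as if by `T tmp(e);`, so it may call explicit constructors.
static_assert(std::is_constructible_v<std::vector<char>, int>);

// The copy constructor (vec7), by contrast, is also an implicit conversion.
static_assert(std::is_convertible_v<const std::vector<char>&, std::vector<char>>);

// No constructor accepts a std::list, which is why vec6 doesn't really compile.
static_assert(!std::is_constructible_v<std::vector<char>, std::list<char>>);

// Hypothetical helper: the vec2 cast from the text.
inline std::vector<char> five_chars() {
    return static_cast<std::vector<char>>(5);  // five value-initialized chars
}
```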

This apparent uniformity belies the fact that we have several very different abstract ideas that we’re trying to express here!

vec1 expresses the idea of “object construction”: “Just give me a new instance of this type so that I can mutate its state.” (I’m not a fan of this one.)

vec2, vec3, and vec4 express varying kinds of “object factory”: “Take these inputs and apply some not-necessarily-simple procedure to produce an output of type vector<char>.” vec5 expresses a particularly useful kind of factory, specific to containers, that takes an explicit sequence of values and builds a container to hold them. (One could argue that vec1 expresses the “object factory” idea as well, but my impression is that when we use the zero-argument constructor, we generally don’t care about the value that we get out; we’re going to overwrite or mutate it pretty quickly. So for now I think the zero-argument constructor is a singular special case.)

Finally, vec6 expresses a new idea: type conversion. We have a value (such as “the sequence a, b, c”) represented in one C++ data type, and we’re saying, “Take this operand and change its type without changing its value. Give me the same value, but represented as a vector<char>.” vec7 expresses the idea of copying, which I think can be viewed as a special case of type conversion; it just happens that the source type and the destination type are exactly the same. Notice also that vec5 can be viewed either as a factory (“Take this sequence of inputs and put them into a container”) or as a type conversion (“Take the value represented by this initializer_list and give me the same value but as a container instead.”)
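Since vec6 doesn’t compile in today’s C++, here is a minimal sketch of one way to spell that conversion by hand. to_char_vector is a hypothetical name, not a standard function. It copies the elements of any range, so the resulting vector<char> holds the same sequence of values as its operand, only in a different type.

```cpp
#include <cassert>
#include <initializer_list>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical conversion helper: same sequence of values, new type.
// Accepts anything std::begin/std::end accept: containers, arrays,
// and std::initializer_list.
template <class Range>
std::vector<char> to_char_vector(const Range& r) {
    return std::vector<char>(std::begin(r), std::end(r));
}
```

Applied to a vector<char>, it produces an equal vector<char>, which is the copy (vec7) viewed as the degenerate case of a conversion.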

I think that a perfect programming language would have different syntax for these different notions. C++ smushes both notions together into the weird beast it calls a “constructor.” Each C++ constructor is either an object factory (the sadly usual case) or a type conversion (that is, a “converting constructor”).

When we use a C++ library API, we should strive to make our own code reflect these notions accurately. Consider these obviously “bad” lines of code:

auto v = static_cast<std::vector<char>>(5);  // A
auto p = std::unique_ptr<Widget>(new Widget);  // B
return std::string();  // C

Line A uses “type conversion” notation to invoke a one-argument “object factory” constructor. Line B uses “type conversion” notation when a zero-argument “object factory” function would be more appropriate. Line C uses the singular zero-argument constructor when a one-argument “type conversion” would be more appropriate. Rewritten as “good” lines of code:

std::vector<char> v(5);  // A
auto p = std::make_unique<Widget>();  // B
return "";  // C
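All three “good” lines compile as written. Below, they are wrapped in small functions so they can be exercised. Widget is a hypothetical stand-in struct, since the post never defines it, and the function names good_a, good_b, and good_c are likewise invented for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the post's Widget.
struct Widget {
    int value = 42;
};

// A: a factory constructor, invoked with declaration syntax rather than a cast.
inline std::vector<char> good_a() {
    std::vector<char> v(5);
    return v;
}

// B: a zero-argument named factory instead of wrapping a raw `new`.
inline std::unique_ptr<Widget> good_b() {
    return std::make_unique<Widget>();
}

// C: a genuine type conversion, from the empty string literal to std::string.
inline std::string good_c() {
    return "";
}
```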

If I had control over the STL’s API, I wouldn’t allow line A to compile. In modern C++, there’s no reason for a function with factory semantics to be expressed as an overload of the constructor. We should just write a factory function instead!

auto v = std::vector<char>::with_size(5);

But since I don’t control vector’s API, I’ll settle for making sure my code always uses the “good” version of line A, and never the “bad” version.

We can imagine rewriting all of the problematic constructors above into named factory functions:

std::vector<char> vec1;
auto vec2 = std::vector<char>::with_size(5);
auto vec3 = std::vector<char>::with_repetitions_of(5, 'a');
auto vec4 = std::vector<char>::from_range(lst.begin(), lst.end());
auto vec5 = std::vector<char>::from_sequence('a', 'b', 'c');
auto vec6 = static_cast<std::vector<char>>(lst);
auto vec7 = static_cast<std::vector<char>>(vec1);

Now each line expresses its intent pretty clearly. Type-conversions are expressed with static_cast. Object factories are expressed with named functions. As a bonus, the factories’ names clearly describe what they do. vector<int>::with_repetitions_of(1, 2) can no longer be accidentally confused with vector<int>::from_sequence(1, 2). vector<char>::from_sequence("hello", "world") fails to compile, and vector<char>::from_range("hello", "world") stands out very clearly as a bug. (Compare today’s std::vector<char>{ "hello", "world" }, which compiles and silently treats the two unrelated pointers as an iterator range.) All’s right with the world.
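We can’t add static members to std::vector, but the named factories can be approximated today as free function templates. This sketch is hypothetical throughout: the namespace factories and the four function names come from the imagined API above, not from any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical named factories, written as free function templates because
// we cannot add static member functions to std::vector itself.
namespace factories {

template <class T>
std::vector<T> with_size(std::size_t n) {
    return std::vector<T>(n);  // n value-initialized elements
}

template <class T>
std::vector<T> with_repetitions_of(std::size_t n, const T& value) {
    return std::vector<T>(n, value);  // n copies of value
}

template <class T, class It>
std::vector<T> from_range(It first, It last) {
    return std::vector<T>(first, last);  // copies the elements of [first, last)
}

template <class T, class... Args>
std::vector<T> from_sequence(Args&&... args) {
    std::vector<T> v;
    v.reserve(sizeof...(args));
    // push_back needs each argument to convert implicitly to T, so
    // from_sequence<char>("hello", "world") is rejected at compile time.
    (v.push_back(std::forward<Args>(args)), ...);
    return v;
}

}  // namespace factories
```

With this spelling, factories::with_repetitions_of<int>(1, 2) yields {2}, while factories::from_sequence<int>(1, 2) yields {1, 2}.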

Why doesn’t the STL do this?

Back in C++98, when the STL was designed, you couldn’t make a named function that returned an object without risking an expensive copy: compilers were permitted to elide that copy, but not required to. In the years since then we’ve gotten move semantics (in C++11) and guaranteed copy elision (in C++17) for cases such as the ones above. So there are no longer any good reasons for people to be writing factory functions as constructor overloads.
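Guaranteed copy elision is what makes a named factory viable even for a type that can be neither copied nor moved. In this C++17 sketch, Counter and starting_at are hypothetical names. The std::atomic member is there only to make Counter immovable without declaring any special member functions by hand.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Hypothetical immovable type: std::atomic<int> is neither copyable nor
// movable, so Counter isn't either.
class Counter {
public:
    // Named factory. Returning a prvalue needs no copy or move constructor
    // in C++17, because the caller's object is initialized directly.
    static Counter starting_at(int n) { return Counter(n); }

    void increment() { ++count_; }
    int get() const { return count_.load(); }

private:
    explicit Counter(int n) : count_(n) {}
    std::atomic<int> count_;
};

static_assert(!std::is_move_constructible_v<Counter>);
```

Under C++14 rules, the return statement in starting_at would be ill-formed, because it would need an accessible copy or move constructor even if the copy were elided.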

There is still at least one bad reason, though, as explained in my previous posts “The Superconstructing Super Elider” and “The Superconstructing Super Elider, Round 2”. In C++, constructors are privileged above all other functions, because the C++ standard library provides a lot of functionality that works with constructors but not with named functions. Consider:

vec.emplace_back(x, y, z);

This inserts a new element at the back of vec, and it does so by calling one of the element type’s constructors. Not a factory function, not a named member function, not any other kind of function: it specifically wants to call some overload of the constructor. So if you write

vec.emplace_back(Widget::from_inputs(x, y, z));

you’ll first create a Widget object from the inputs x, y, z, and then the library will call a constructor anyway (namely Widget’s move constructor) to insert that Widget into the vector. “The Superconstructing Super Elider, Round 2” provides a hacky way to work around this inefficiency. In a perfect language, there would be no inefficiency to work around.
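The extra move is easy to observe. In this sketch, Widget and from_inputs are hypothetical, and Widget counts its own move-constructions in an inline static data member (a C++17 feature). Each test function reserves capacity first, so that reallocation doesn’t contribute moves of its own. This only measures the cost; it does not implement the workaround from the earlier post.

```cpp
#include <cassert>
#include <vector>

// Hypothetical Widget that counts how many times it is move-constructed.
struct Widget {
    static inline int moves = 0;
    int x, y, z;
    Widget(int a, int b, int c) : x(a), y(b), z(c) {}
    Widget(Widget&& other) noexcept : x(other.x), y(other.y), z(other.z) { ++moves; }
    // Named factory: guaranteed elision, so no move happens here.
    static Widget from_inputs(int a, int b, int c) { return Widget(a, b, c); }
};

// emplace_back with constructor arguments builds the element in place.
inline int moves_for_constructor_args() {
    std::vector<Widget> vec;
    vec.reserve(1);
    Widget::moves = 0;
    vec.emplace_back(1, 2, 3);
    return Widget::moves;
}

// emplace_back with a factory result builds a temporary, then moves it in.
inline int moves_for_factory() {
    std::vector<Widget> vec;
    vec.reserve(1);
    Widget::moves = 0;
    vec.emplace_back(Widget::from_inputs(1, 2, 3));
    return Widget::moves;
}
```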

I don’t think the C++ standard library will ever do much better; it’s pretty fundamentally based on the constructor as its spiritual center, and I can’t imagine that changing. As they say, “If I wanted to get there, I wouldn’t start from here!” Besides, C++ puts a high value on backwards compatibility, which means that even if the Committee suddenly wanted to abandon the “everything is a constructor” model, they wouldn’t practically be able to do so. So the designers of the standard library will continue wrangling for each new library type — over which constructors should be provided, which ones should be explicit or conditionally explicit, which ones should take tag parameters, whether it’s ever okay to =delete constructors (hint: it’s not), and so on and so forth.

But if you’re writing a new API in C++, one not targeted at getting into the ISO Standard, I can give you no better advice than:

Write constructors only for type-conversions. For object-factories, prefer named functions.
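To make the advice concrete, here is a small hypothetical type written that way. Timeout, from_seconds, and from_milliseconds are invented for illustration. Leaving the converting constructor implicit is a design choice of this sketch, on the grounds that converting a std::chrono::milliseconds to a Timeout loses no information.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical type following the guideline.
class Timeout {
public:
    // Type conversion: a chrono duration already *is* a length of time,
    // so turning it into a Timeout changes the type but not the value.
    Timeout(std::chrono::milliseconds d) : ms_(d) {}

    // Object factories: a bare integer isn't a length of time until we
    // say which unit it is in, so each interpretation gets its own name.
    static Timeout from_seconds(long long s) {
        return Timeout(std::chrono::seconds(s));
    }
    static Timeout from_milliseconds(long long ms) {
        return Timeout(std::chrono::milliseconds(ms));
    }

    std::chrono::milliseconds as_milliseconds() const { return ms_; }

private:
    std::chrono::milliseconds ms_;
};
```

A caller can then write Timeout t = std::chrono::milliseconds(1500); for the conversion, but must say Timeout::from_seconds(2) to get a timeout from a plain number.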

Posted 2018-06-21