Is your constructor an object-factory or a type-conversion?
I’ve been meaning to write this one for a while now, but I keep putting it off because it’s not quite fully baked. However, if I never write it down, it’ll never get baked. So here it is.
The C++ language takes the very dissimilar notions of object factories and type conversions and conflates them into a single notion, which it calls the constructor. Consider the following examples, for which we will pick on std::vector because it’s an easy target:
std::list<char> lst = ...;
std::vector<char> vec1;
std::vector<char> vec2(5);
std::vector<char> vec3(5, 'a');
std::vector<char> vec4(lst.begin(), lst.end());
std::vector<char> vec5 { 'a', 'b', 'c' };
std::vector<char> vec6(lst);
std::vector<char> vec7(vec1);
(Okay, vec6 won’t compile, because you can’t convert a list to a vector like that, but let’s pretend it does, for the sake of this blog post.)
Because of the rules of C++, each of these examples can be rewritten like this:
auto vec1 = std::vector<char>();
auto vec2 = std::vector<char>(5);
auto vec3 = std::vector<char>(5, 'a');
auto vec4 = std::vector<char>(lst.begin(), lst.end());
auto vec5 = std::vector<char>{ 'a', 'b', 'c' };
auto vec6 = std::vector<char>(lst);
auto vec7 = std::vector<char>(vec1);
And four of them (the three that take only a single argument, plus the one that secretly takes an initializer_list) can be rewritten like this:
auto vec2 = static_cast<std::vector<char>>(5);
auto vec5 = static_cast<std::vector<char>>(std::initializer_list<char>{ 'a', 'b', 'c' });
auto vec6 = static_cast<std::vector<char>>(lst);
auto vec7 = static_cast<std::vector<char>>(vec1);
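As a sanity check on the rewrites above, here is a small sketch that builds the same vector three ways and compares the results. The function name spellings_agree is my own invention, and it covers only the vec2, vec5, and vec7 cases; everything else is plain std::vector.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Returns true if the declaration, "auto", and static_cast spellings
// all produce equal vectors for the vec2, vec5, and vec7 cases.
bool spellings_agree() {
    // vec2: the size constructor
    std::vector<char> a2(5);
    auto b2 = std::vector<char>(5);
    auto c2 = static_cast<std::vector<char>>(5);

    // vec5: the initializer_list constructor
    std::vector<char> a5 { 'a', 'b', 'c' };
    auto b5 = std::vector<char>{ 'a', 'b', 'c' };
    auto c5 = static_cast<std::vector<char>>(
        std::initializer_list<char>{ 'a', 'b', 'c' });

    // vec7: the copy constructor
    std::vector<char> a7(a5);
    auto b7 = std::vector<char>(a5);
    auto c7 = static_cast<std::vector<char>>(a5);

    return a2 == b2 && b2 == c2 && c2.size() == 5
        && a5 == b5 && b5 == c5 && c5.size() == 3
        && a7 == b7 && b7 == c7 && c7 == a5;
}
```

All nine lines compile, and each group of three ends up calling the same constructor overload.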
This apparent uniformity belies the fact that we have several very different abstract ideas that we’re trying to express here!
vec1 expresses the idea of “object construction”: “Just give me a new instance of this type so that I can mutate its state.” (I’m not a fan of this one.)
vec2, vec3, and vec4 express varying kinds of “object factory”: “Take these inputs and apply some not-necessarily-simple procedure to produce an output of type vector<char>.”
vec5 expresses a particularly useful kind of factory, specific to containers, that takes an explicit sequence of values and builds a container to hold them. (One could argue that vec1 expresses the “object factory” idea as well, but my impression is that when we use the zero-argument constructor, we generally don’t care about the value that we get out; we’re going to overwrite or mutate it pretty quickly. So for now I think the zero-argument constructor is a singular special case.)
Finally, vec6 expresses a new idea: type conversion. We have a value (such as “the sequence a, b, c”) represented in one C++ data type, and we’re saying, “Take this operand and change its type without changing its value. Give me the same value, but represented as a vector<char>.”
vec7 expresses the idea of copying, which I think can be viewed as a special case of type conversion; it just happens that the source type and the destination type are exactly the same.
Notice also that vec5 can be viewed either as a factory (“Take this sequence of inputs and put them into a container”) or as a type conversion (“Take the value represented by this initializer_list and give me the same value but as a container instead.”)
I think that a perfect programming language would have different syntax for these different notions. C++ smushes both notions together into the weird beast it calls a “constructor.” Each C++ constructor is either an object factory (the sadly usual case) or a type conversion (that is, a “converting constructor”).
When we use a C++ library API, we should strive to make our own code reflect these notions accurately. Consider these obviously “bad” lines of code:
auto v = static_cast<std::vector<char>>(5); // A
auto p = std::unique_ptr<Widget>(new Widget); // B
return std::string(); // C
Line A uses “type conversion” notation to invoke a one-argument “object factory” constructor.
Line B uses “type conversion” notation when a zero-argument “object factory” function would be more appropriate.
Line C uses the singular zero-argument constructor when a one-argument “type conversion” would be more appropriate.
Rewritten as “good” lines of code:
std::vector<char> v(5); // A
auto p = std::make_unique<Widget>(); // B
return ""; // C
If I had control over the STL’s API, I wouldn’t allow line A to compile. In modern C++, there’s no reason for a function with factory semantics to be expressed as an overload of the constructor. We should just write a factory function instead!
auto v = std::vector<char>::with_size(5);
But since I don’t control vector’s API, I’ll settle for making sure my code always uses the “good” version of line A, and never the “bad” version.
We can imagine rewriting all of the problematic constructors above into named factory functions:
std::vector<char> vec1;
auto vec2 = std::vector<char>::with_size(5);
auto vec3 = std::vector<char>::with_repetitions_of(5, 'a');
auto vec4 = std::vector<char>::from_range(lst.begin(), lst.end());
auto vec5 = std::vector<char>::from_sequence('a', 'b', 'c');
auto vec6 = static_cast<std::vector<char>>(lst);
auto vec7 = static_cast<std::vector<char>>(vec1);
Now each line expresses its intent pretty clearly. Type-conversions are expressed with static_cast. Object factories are expressed with named functions. As a bonus, the factories’ names clearly describe what they do. vector<int>::with_repetitions_of(1, 2) can no longer be accidentally confused with vector<int>::from_sequence(1, 2). vector<char>::from_sequence("hello", "world") fails to compile, and vector<char>::from_range("hello", "world") stands out very clearly as a bug.
All’s right with the world.
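Since I can’t add static members to std::vector, the closest I can get today is a set of free functions. Here is a minimal sketch of what that might look like. The vec_factory namespace and all four function names are hypothetical (nothing like them exists in the standard library), and each one just forwards to the constructor overload it is meant to replace.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <list>
#include <vector>

// Hypothetical named factories. None of these exist in the standard library;
// each one simply forwards to the std::vector constructor it would replace.
namespace vec_factory {

template <class T>
std::vector<T> with_size(std::size_t n) {
    return std::vector<T>(n);            // n value-initialized elements
}

template <class T>
std::vector<T> with_repetitions_of(std::size_t n, const T& value) {
    return std::vector<T>(n, value);     // n copies of value
}

template <class T, class It>
std::vector<T> from_range(It first, It last) {
    return std::vector<T>(first, last);  // copy of the range [first, last)
}

template <class T, class... Args>
std::vector<T> from_sequence(Args... args) {
    // Naming initializer_list<T> explicitly forces every argument to convert
    // to T, so from_sequence<char>("hello", "world") is rejected at compile
    // time instead of silently falling back to the iterator-pair constructor.
    return std::vector<T>(std::initializer_list<T>{ args... });
}

}  // namespace vec_factory
```

With these, vec_factory::with_repetitions_of<int>(1, 2) yields one element and vec_factory::from_sequence<int>(1, 2) yields two, which is the same distinction that std::vector<int>(1, 2) and std::vector<int>{1, 2} make today with nothing but the shape of the brackets.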
Why doesn’t the STL do this?
Back in C++98, when the STL was designed, you couldn’t make a named function that returned an object without risking expensive copy operations: copy elision was permitted but never guaranteed, and there were no move semantics to fall back on. Since then we’ve gotten move semantics (in C++11) and guaranteed copy elision (in C++17) in cases such as the above. So there are no longer any good reasons for people to be writing factory functions as constructor overloads.
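One way to see how far things have come: under C++17’s guaranteed copy elision, a named factory can return a type that is neither copyable nor movable. The sketch below assumes a C++17 compiler; the Pinned class and the pinned_size_demo function are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type that can be neither copied nor moved.
class Pinned {
    std::size_t size_;
    explicit Pinned(std::size_t n) : size_(n) {}  // private: use the factory
public:
    Pinned(const Pinned&) = delete;               // also suppresses the move constructor
    Pinned& operator=(const Pinned&) = delete;

    // Named factory. In C++17 the returned prvalue initializes the caller's
    // object directly, so no copy or move constructor is needed.
    static Pinned with_size(std::size_t n) { return Pinned(n); }

    std::size_t size() const { return size_; }
};

std::size_t pinned_size_demo() {
    auto p = Pinned::with_size(5);  // ill-formed before C++17, fine since
    return p.size();
}
```

Before C++17 this would not compile, because the return statement formally required an accessible copy or move constructor even when the compiler elided the call.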
There is still at least one bad reason, though, as explained in my previous posts “The Superconstructing Super Elider” and “The Superconstructing Super Elider, Round 2”. In C++, constructors are privileged above all other functions, because the C++ standard library provides a lot of functionality that works with constructors but not with named functions. Consider:
vec.emplace_back(x, y, z);
This inserts a new element at the back of vec; and it does so by calling one of the element type’s constructors. Not a factory function, or a named member function, or any other kind of function: it specifically wants to call some overload of the constructor. So if you write
vec.emplace_back(Widget::from_inputs(x, y, z));
you’ll first create a Widget object from the inputs x, y, z, and then the library will call some constructor (even though it’s just Widget’s move-constructor) to insert that Widget into the vector.
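To make that extra constructor call visible, here is a rough sketch with a hypothetical Widget that counts its own move-constructions. The Widget definition, the moves counter, and the two helper functions are all invented for this illustration; the only library calls are reserve and emplace_back.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how often it is move-constructed.
struct Widget {
    static int moves;
    int x, y, z;

    Widget(int x, int y, int z) : x(x), y(y), z(z) {}
    Widget(const Widget&) = default;
    Widget(Widget&& other) noexcept : x(other.x), y(other.y), z(other.z) { ++moves; }

    // Named factory with the same inputs as the three-argument constructor.
    static Widget from_inputs(int x, int y, int z) { return Widget(x, y, z); }
};
int Widget::moves = 0;

// emplace_back forwards its arguments straight to Widget's constructor.
int moves_via_constructor() {
    std::vector<Widget> vec;
    vec.reserve(1);                  // rule out moves caused by reallocation
    Widget::moves = 0;
    vec.emplace_back(1, 2, 3);
    return Widget::moves;
}

// emplace_back receives a finished Widget and must move it into place.
int moves_via_factory() {
    std::vector<Widget> vec;
    vec.reserve(1);
    Widget::moves = 0;
    vec.emplace_back(Widget::from_inputs(1, 2, 3));
    return Widget::moves;
}
```

The first helper reports zero moves and the second reports one. For a Widget that is cheap to move, that one move is harmless; for a type that is expensive to move, or not movable at all, it is exactly the inefficiency described above.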
“The Superconstructing Super Elider, Round 2” provides a hacky way to work around this inefficiency. In a perfect language, there would be no inefficiency to be worked around.
I don’t think the C++ standard library will ever do much better; it’s pretty fundamentally based
on the constructor as its spiritual center, and I can’t imagine that changing.
As they say, “If I wanted to get there, I wouldn’t start from here!”
Besides, C++ puts a high value on backwards compatibility, which means that even if the Committee suddenly wanted to abandon the “everything is a constructor” model, they wouldn’t practically be able to do so. So the designers of the standard library will continue wrangling, for each new library type, over which constructors should be provided, which ones should be explicit or conditionally explicit, which ones should take tag parameters, whether it’s ever okay to =delete constructors (hint: it’s not), and so on and so forth.
But if you’re writing a new API in C++, one not targeted at getting into the ISO Standard, I can give you no better advice than:
Write constructors only for type-conversions. For object-factories, prefer named functions.
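For concreteness, here is one way that advice might play out in a small class of your own. ByteBuffer and every member name in it are hypothetical, made up purely for this sketch; the single public constructor is a type conversion, and everything with factory semantics gets a name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class, invented for illustration.
class ByteBuffer {
    std::vector<char> bytes_;
    ByteBuffer() = default;  // private: only the factories below use it

public:
    // Type conversion: the same sequence of chars, in a different type.
    // Left implicit here; whether to mark it explicit is a separate decision.
    ByteBuffer(const std::string& s) : bytes_(s.begin(), s.end()) {}

    // Object factories: named functions, not constructor overloads.
    static ByteBuffer with_size(std::size_t n) {
        ByteBuffer b;
        b.bytes_.resize(n);      // n zero bytes
        return b;
    }
    static ByteBuffer with_repetitions_of(std::size_t n, char c) {
        ByteBuffer b;
        b.bytes_.assign(n, c);   // n copies of c
        return b;
    }

    std::size_t size() const { return bytes_.size(); }
    char operator[](std::size_t i) const { return bytes_[i]; }
};
```

A caller then writes static_cast<ByteBuffer>(some_string) for the conversion and ByteBuffer::with_size(5) for the factory, and there is no ByteBuffer(5) to misread.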