The “unsigned for value range” antipattern

Background: Signed Integers Are (Not Yet) Two’s Complement
At the WG21 committee meeting currently underway in Jacksonville, JF Bastien will be presenting a proposal to make C++’s int data type wrap around on overflow. That is, where today the expression INT_MAX + 1 has undefined behavior, JF would like to see that expression formally defined to come out equal to INT_MIN.
I have written, but not submitted, a “conservative” fork of JF’s proposal, in which I eliminate the INT_MAX + 1 == INT_MIN part but leave some of the good stuff (such as -1 << 1 == -2). I’m not going to talk about the good stuff right now. (If you’re not a C++ compiler writer or committee member, you probably assume you’re getting all of the good stuff already, and would be surprised to learn how much of it is still undefined.)
Anyway. On a mailing list for WG21 Study Group 12, Lawrence Crowl writes in defense of INT_MAX + 1 == undefined, and I agree with him:
If integer overflow is undefined behavior, then it is wrong. Tools can detect wrong programs and report them.
If integer overflow is wrapping, then one never knows whether or not the programmer is relying on wrapping or would be surprised at wrapping. No diagnostic is possible.
Another commenter in the same thread, Myriachan, gave the example of
```cpp
uint16_t x = 0xFFFF; // 65535
x = (x * x);
```
In today’s C++, on “typical modern platforms” where int is 32 bits, the expression (x * x) has undefined behavior. This is because integral promotion promotes uint16_t to int, so the expression is equivalent to (int(65535) * int(65535)), and the product of 65535 with itself (that is, 4294836225) is not representable in a signed int. So we have signed integer overflow and undefined behavior.
I can think of three ways to fix this:

- Eliminate the integral promotions entirely. x * x becomes simply uint16_t(4294836225), i.e., uint16_t(1).
- Tweak the integral promotions so that they preserve signedness. x * x becomes unsigned(x) * unsigned(x), i.e., 4294836225u.
- Adopt something like JF Bastien’s proposal to make integer overflow well-defined. x * x becomes well-defined and equal to int(-131071).
The “unsigned for value range” antipattern
Lawrence wrote back:

So the application intended modular arithmetic? I was concerned about the normal case where unsigned is used to constrain the value range, not to get modular arithmetic.
Now, in my not-so-humble opinion, if anyone is using unsigned types “to constrain the value range,” they are doing computers wrong. That is not what signed versus unsigned types are for.
As Lawrence himself wrote:
If integer overflow is undefined behavior, then it is wrong. Tools can detect wrong programs and report them.
The contrapositive is: “If the programmer is using a type where integer overflow is well-defined to wrap, then we can assume that the program relies on that wrapping behavior” — because there would otherwise be a strong incentive for the programmer to use a type that detects and reports unintended overflow.
The original design for the STL contained the “unsigned for value range” antipattern. Consequently, they ran into trouble immediately: for example, std::string::find returns an index into the string, naturally of type std::string::size_type. But size_type is unsigned! So instead of returning “negative 1” to indicate the “not found” case, they had to make it return size_type(-1), a.k.a. std::string::npos, which is a positive value! This means that callers have to write cumbersome things such as

```cpp
if (s.find('k') != std::string::npos)
```

where it would be more natural to write

```cpp
if (s.find('k') >= 0)
```
This is sort of parallel to my quotation of Lawrence above: If every possible value in the domain of a given type is a valid output (e.g. from find), then there is no value left over with which the function can signal failure at runtime. And if every possible value in the domain is a valid input (e.g. to malloc), then there is no way for the function to detect incorrect input at runtime.
If it weren’t for the STL’s size_type snafu continually muddying the waters for new learners, I doubt people would be falling into the “unsigned for value range” antipattern anymore.
For more information on the undesirability of “unsigned for value range” and the general desirability of “signed size_type” going forward in C++, see: