1. Introduction
[C11] Integer types allows three representations for signed integral types:
-
Sign-magnitude
-
Ones' complement
-
Two’s complement
See §4 C Signed Integer Wording for full wording.
C++ inherits these three signed integer representations from C. To the author’s knowledge no modern machine uses both C++ and a signed integer representation other than two’s complement (see §5 Survey of Signed Integer Representations). None of [MSVC], [GCC], and [LLVM] support other representations. This means that the C++ that is taught is effectively two’s complement, and the C++ that is written is two’s complement. It is extremely unlikely that there exist any significant code base developed for two’s complement machines that would actually work when run on a non-two’s complement machine.
The C++ that is spec’d, however, is not two’s complement. Signed integers currently allow for trap representations, extra padding bits, integral negative zero, and introduce undefined behavior and implementation-defined behavior for the sake of this extremely abstract machine.
Specifically, the current wording has the following effects.
Arthur proposes to fix the problems which are not struck out.
-
Associativity and commutativity of integers is needlessly obtuse. -
Naïve overflow checks, which are often security-critical, often get eliminated by compilers. This leads to exploitable code when the intent was clearly not to and the code, while naïve, was correctly performing security checks for two’s complement integers. Correct overflow checks are difficult to write and equally difficult to read, exponentially so in generic code. -
Conversions between signed and unsigned types are implementation-defined or worse.
-
There is no portable way to generate an arithmetic right-shift, or to sign-extend an integer, which every modern CPU supports.
-
constexpr
is further restrained by this extraneous undefined behavior.
Notice that atomic integral types are already two’s complement and have no undefined results; therefore even freestanding implementations must already support two’s complement somehow.
Users of C++ who require sign-magnitude or ones'-complement integers would be better served by a pure-library solution (and so would the rest of us).
This proposal leaves C unchanged; it merely restricts further the subset of C which applies to C++. The author will ensure that WG14 is made aware of this paper’s outcome.
The proposal you are now reading is intended to have the following specific effects on integral conversions and integral bit-level operations, where the bit-level representation of integers is relevant:
Expression | Current C++ | Proposed C++ | Relevant Wording |
---|---|---|---|
int8_t(uint8_t(255)) | Implementation-defined value | -1 | [conv.integral] |
int8_t(uint16_t(255)) | Implementation-defined value | -1 | [conv.integral] |
int8_t(uint16_t(511)) | Implementation-defined value | -1 | [conv.integral] |
int8_t(0xFF) | Implementation-defined value | -1 | [conv.integral] |
int64_t(-9223372036854775808) where the literal's type is uint64_t | Implementation-defined value | LLONG_MIN | [conv.integral] |
int64_t(-9223372036854775808) where the literal's type is __int128 | LLONG_MIN | LLONG_MIN | [conv.integral] |
int8_t(e255) where enum class E { e255=255 }; | Unspecified value | -1 | [expr.static.cast] |
E(255) where enum E : int8_t {} | Implementation-defined value | E(-1) | [expr.static.cast] |
E(255) where enum E { eMIN=-127, eMAX=128 } | Undefined behavior | E(-1) | [expr.static.cast] |
(1 << 31) | Implementation-defined value | INT_MIN | [expr.shift] |
(2 << 31) | Undefined behavior | 0 | [expr.shift] |
(-1 << 1) | Undefined behavior | -2 | [expr.shift] |
(-2 >> 1) | Implementation-defined value | -1 | [expr.shift] |
The proposal is intended to have the following specific non-effects on integral arithmetic operations and on overwide bit-shifts:
Expression | Current C++ | This Proposal | Under P0907R0 |
---|---|---|---|
INT_MAX + 1 | Undefined behavior | Undefined behavior | INT_MIN |
1000'000 * 1000'000 | Undefined behavior | Undefined behavior | -727379968
(assuming 32-bit int ) |
uint16_t(65535) * uint16_t(65535) | Undefined behavior | Undefined behavior | -131071
(assuming 32-bit int ) |
INT_MIN / -1 | Undefined behavior | Undefined behavior | Undefined behavior |
INT_MIN % -1 | Undefined behavior | Undefined behavior | Undefined behavior |
(1 << 32) | Undefined behavior | Undefined behavior | Undefined behavior |
(1 << -1) | Undefined behavior | Undefined behavior | Undefined behavior |
(1 << 32)
and even (1 << -1)
to produce
unspecified values, rather than undefined behavior; but the proposal you are now reading does not attempt to
propose that change.)
2. Proposed Wording
Modify Fundamental types [basic.fundamental] ❡1:
Objects declared as characters (char
) shall be large enough to store any member of the implementation’s basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether achar
object can hold negative values. Characters can be explicitly declaredunsigned
orsigned
. Plainchar
,signed char
, andunsigned char
are three distinct types, collectively called narrow character types. Achar
, asigned char
, and anunsigned char
occupy the same amount of storage and have the same alignment requirements (6.11); that is, they have the same object representation. For narrow character types, all bits of the object representation participate in the value representation. [Note: A bit-field of narrow character type whose length is larger than the number of bits in the object representation of that type has padding bits; see 6.9. —end note]For unsigned narrow character types, each possible bit pattern of the value representation represents a distinct number. These requirements do not hold for other types.In any particular implementation, a plainchar
object can take on either the same values as asigned char
or anunsigned char
; which one is implementation-defined. For each value i of typeunsigned char
in the range 0 to 255 inclusive, there exists a value j of typechar
such that the result of an integral conversion (7.8) from i tochar
is j, and the result of an integral conversion from j tounsigned char
is i.
Modify Fundamental types [basic.fundamental] ❡4 onwards:
Unsigned integers shall obey the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer.fn For unsigned integral types, each possible bit pattern of the value representation represents a distinct number.
[Footnote: This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type. —footnote]
Type
wchar_t
is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales. Typewchar_t
shall have the same size, signedness, and alignment requirements as one of the other integral types, called its underlying type. Typeschar16_t
andchar32_t
denote distinct types with the same size, signedness, and alignment asuint_least16_t
anduint_least32_t
, respectively, in<cstdint>
, called the underlying types.Values of type
bool
are eithertrue
orfalse
. [Note: There are nosigned
,unsigned
,short
, orlong bool
types or values. —end note] Values of typebool
participate in integral promotions.Types
bool
,char
,char16_t
,char32_t
,wchar_t
, and the signed and unsigned integer types are collectively called integral types. A synonym for integral type is integer type. The representations of integral types shall define values by use of apure binary numeration systemfntwo's complement representation.
[Footnote: A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position. (Adapted from the American National Dictionary for Information Processing Systems.) —footnote][Note: Addition, subtraction, and multiplication on signed and unsigned integral values with the same object representations, when the signed result is defined and representable, produce results with the same object representation. —end note]
Modify Integral conversions [conv.integral] ❡1 onwards:
A prvalue of an integer type can be converted to a prvalue of another integer type. A prvalue of an unscoped enumeration type can be converted to a prvalue of an integer type.
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type). [Note:
In a two’s complement representation, thisThis conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note]If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise,
the value is implementation-defined.the object representation remains the same if the source and destination have the same size, or the least-significant source bits are retained if the destination is smaller than the source.
Modify Static cast [expr.static.cast] ❡1 onwards:
A value of a scoped enumeration type can be explicitly converted to an integral type. When that type is cv
bool
, the resulting value isfalse
if the original value is zero andtrue
for all other values. For the remaining integral types, the value is unchanged if the original value can be represented by the specified type. Otherwise, theresulting value is unspecifiedthe object representation remains the same if the source and destination have the same size, or the least-significant source bits are retained if the destination is smaller than the source. A value of a scoped enumeration type can also be explicitly converted to a floating-point type; the result is the same as that of converting from the original value to the floating-point type.A value of integral or enumeration type can be explicitly converted to a complete enumeration type. If the enumeration type has a fixed underlying type, the value is first converted to that type by integral conversion, if necessary, and then to the enumeration type. If the enumeration type does not have a fixed underlying type, the value is unchanged if the original value is within the range of the enumeration values, and otherwise,
the behavior is undefinedthe object representation remains the same if the source and destination have the same size, or the least-significant source bits are retained if the destination is smaller than the source. A value of floating-point type can also be explicitly converted to an enumeration type. The resulting value is the same as converting the original value to the underlying type of the enumeration (conv.fpint), and subsequently to the enumeration type.
Modify Shift operators [expr.shift] ❡1 onwards:
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
The value of
E1 << E2
isE1
left-shiftedE2
bit positions; vacated bits are zero-filled. IfE1
has an unsigned type, the value of the result is E1×2E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, ifE1
has a signed typeand non-negative value,and E1×2E2 is representable in the corresponding unsigned type of the result type, thenthat value,the value ofE1
is converted to the corresponding unsigned type of the result type; an unsigned result is computed using unsigned arithmetic as above; and the unsigned result, converted to the result type, is the resulting value; otherwise, the behavior is undefined.The value of
E1 >> E2
isE1
right-shiftedE2
bit positions. IfE1
has an unsigned type or ifE1
has a signed type and a non-negative value, the value of the result is the integral part of the quotient of E1/2E2. IfE1
has a signed type and a negative value, the resulting valueis implementation-defined.the negative of the integral part of the quotient of E1/2E2. [Note: This implies that right-shift on signed integral types is an arithmetic right shift, and performs sign-extension. —end note]
Modify Enumeration declarations [dcl.enum] ❡8:
For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type. Otherwise, for an enumeration where emin is the smallest enumerator and emax is the largest, the values of the enumeration are the values in the range bmin to bmax, defined as follows:
Let K be 1 for a two’s complement representation and 0 for a ones' complement or sign-magnitude representation.bmax is the smallest value greater than or equal to max(|emin| -K1,|emax|) and equal to 2M-1, where M is a non-negative integer. bmin is zero if emin is non-negative and -(bmax+K1) otherwise. The size of the smallest bit-field large enough to hold all the values of the enumeration type is max(M,1) if bmin is zero and M+1 otherwise. It is possible to define an enumeration that has values not defined by any of its enumerators. If the enumerator-list is empty, the values of the enumeration are as if the enumeration had a single enumerator with value 0.
Modify Class template ratio
[ratio.ratio] ❡1:
If the template argument
D
is zero or the absolute values of either of the template argumentsN
andD
is not representable by typeintmax_t
, the program is ill-formed. [Note: These rules ensure that infinite ratios are avoided and that for any negative input, there exists a representable value of its absolute value which is positive.In a two’s complement representation, tThis excludes the most negative value. —end note]
3. Out of Scope
This proposal focuses on the representation of signed integers. It is out of scope for this proposal to deal with related issues which have more to them than simply the representation of signed integers.
A non-comprehensive list of items purposely left out:
-
Left and right shift with a right-hand-side equal to or wider than the bit-width of the left-hand-side. (This must remain implementation-defined or worse, because of x86's behavior.)
-
Integral division or modulo by zero.
-
Integral division or modulo of the signed minimum integral value for a particular integral type by minus one.
-
Overflow of pointer arithmetic.
-
Library solution for ones'-complement integers.
-
Library solution for sign-magnitude integers.
-
Library solution for two’s-complement integers with trapping or undefined overflow sepantics.
-
Language support for explicit signed overflow truncation such as Swift’s (
&+
,&-
, and&*
), or complementary trapping overflow operators. -
Library or language support for saturating arithmetic.
-
Mechanism to let the compiler assume that integers, signed or unsigned, do not experience signed or unsigned wrapping for:
-
A specific integral variable.
-
All integral variables (à la
-ftrapv
,-fno-wrapv
, and-fstrict-overflow
). -
A specific loop’s induction variable.
-
-
Mechanism to have the compiler list places where it could benefit from knowing that overflow cannot occur (à la
-Wstrict-overflow
). -
Endianness of integral storage (or endianness in general).
-
Bits per bytes, though we all know there are eight.
These items could be tackled in separate proposals.
3a. Arthur's Comments
This proposal is merely pruned down from JF Bastien's P0907R0, based on the work JF did to identify wording in need of changing. Arthur has not done his own independent review of the entire Standard document.
Arthur noticed that the current wording in [dcl.enum] appears wrong for negative enum values; this has already been filed as [CWG1636]. This proposal includes that (practically editorial) fix, which has already been adopted in practice by every compiler vendor.
A non-comprehensive list of related items purposely left out of this paper, yet on which Arthur happens to have reasonably strongly held opinions that may or may not agree with the reader's:
-
Atomic integral types should not be a special case; signed atomic integral types should preferably have "native" overflow behavior, not wrapping two's-complement behavior. However, Arthur doesn't know the rationale behind the current behavior of atomic.
-
Left and right shift with a right-hand-side equal to or wider than the bit-width of the left-hand-side should produce an "unspecified result", rather than having completely undefined behavior (or even unspecified behavior).
-
Left and right shift with a negative right-hand-side should produce an "unspecified result", rather than having completely undefined behavior (or even unspecified behavior).
-
Bits per byte should be fixed at eight.
-
The object representation of a signed integral type should not be allowed to have visible padding bits. (Implementations should remain free to use padding or parity bits "behind the scenes", but these bits if present should not interact with the representation of integral types according to the C++ abstract machine.)
4. C Signed Integer Wording
The following is the wording on integers from the C11 Standard.
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2N−1, so that objects of that type shall be capable of representing values from 0 to 2N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits;
signed char
shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value −(2M) (two’s complement);
the sign bit has the value −(2M − 1) (ones’ complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value. In the case of sign and magnitude and ones’ complement, if this representation is a normal value it is called a negative zero.
If the implementation supports negative zeros, they shall be generated only by:
the
&
,|
,^
,~
,<<
, and>>
operators with operands that produce such a value;the
+
,-
,*
,/
, and%
operators where one operand is a negative zero and the result is zero;compound assignment operators based on the above cases.
It is unspecified whether these cases actually generate a negative zero or a normal zero, and whether a negative zero becomes a normal zero when stored in an object.
If the implementation does not support negative zeros, the behavior of the
&
,|
,^
,~
,<<
, and>>
operators with operands that would produce such a value is undefined.The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.
The precision of an integer type is the number of bits it uses to represent values, excluding any sign and padding bits. The width of an integer type is the same but including any sign bit; thus for unsigned integer types the two values are the same, while for signed integer types the width is one greater than the precision.
5. Survey of Signed Integer Representations
Here is a non-comprehensive history of signed integer representations:
-
Two’s complement
-
John von Neumann suggested use of two’s complement binary representation in his 1945 First Draft of a Report on the EDVAC proposal for an electronic stored-program digital computer.
-
The 1949 EDSAC, which was inspired by the First Draft, used two’s complement representation of binary numbers.
-
Early commercial two’s complement computers include the Digital Equipment Corporation PDP-5 and the 1963 PDP-6.
-
The System/360, introduced in 1964 by IBM, then the dominant player in the computer industry, made two’s complement the most widely used binary representation in the computer industry.
-
The first minicomputer, the PDP-8 introduced in 1965, uses two’s complement arithmetic as do the 1969 Data General Nova, the 1970 PDP-11.
-
-
Ones' complement
-
Many early computers, including the CDC 6600, the LINC, the PDP-1, and the UNIVAC 1107.
-
Successors of the CDC 6600 continued to use ones' complement until the late 1980s.
-
Descendants of the UNIVAC 1107, the UNIVAC 1100/2200 series, continue to do so, although ClearPath machines are a common platform that implement either the 1100/2200 architecture (the ClearPath IX series) or the Burroughs large systems architecture (the ClearPath NX series). Everything is common except the actual CPUs, which are implemented as ASICs. In addition to the IX (1100/2200) CPUs and the NX (Burroughs large systems) CPU, the architecture had Xeon (and briefly Itanium) CPUs. Unisys' goal was to provide an orderly transition for their 1100/2200 customers to a more modern architecture.
-
-
Sign-magnitude
-
The IBM 700/7000 series scientific machines use sign/magnitude notation, except for the index registers which are two’s complement.
-
Wikipedia offers more details and has comprehensive sources for the above.
In short, the only machine the author could find using non-two’s complement are made by Unisys. Nowadays they emulate their old architecture using x86 CPUs for customers who have legacy applications which they’ve been unable to migrate. These applications are unlikely to be well served by modern C++, signed integers are the least of their problem. Post-modern C++ should focus on serving its existing users well, and incoming users should be blissfully unaware of integer esoterica.