History of non-standard-layout class layouts

Thanks to Jody Hagins for inspiring this blog post, for digging up some old wordings, and for reviewing a draft of this post. Also thanks to Pal Balog for P1847; its historical details were indispensable in writing this post.

In C++, we have the notion of “standard layout.” A standard-layout class type is basically guaranteed to be laid out in memory the same way as a plain old C struct: the first data member goes at offset zero, and subsequent members are laid out at increasing addresses in declaration order (possibly with some gaps for padding).

struct A { int i; int j; } a;
struct B { int m; int n; };

Both C and C++ have always guaranteed that (int*)&a == &a.i, and even that offsetof(A, j) == offsetof(B, n). These guarantees are due to the fact that A is a standard-layout type that’s layout-compatible with B. The rules for what kinds of classes count as standard-layout have drifted a bit over the years (see cppreference), but certainly anything that would have compiled in C89 will count as standard-layout in C++ forever.
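As a minimal sketch, the two guarantees above can be spelled out as compile-time checks. This assumes a C++17 compiler (for std::is_standard_layout_v and message-less static_assert); the helper name first_member_at_start is mine, purely for illustration.

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout_v

struct A { int i; int j; };
struct B { int m; int n; };

// Both types are standard-layout, so offsetof is fully supported on them.
static_assert(std::is_standard_layout_v<A>);
static_assert(std::is_standard_layout_v<B>);

// The first data member goes at offset zero...
static_assert(offsetof(A, i) == 0);
// ...and corresponding members of layout-compatible types share an offset.
static_assert(offsetof(A, j) == offsetof(B, n));

// Hypothetical helper: the runtime form of (int*)&a == &a.i.
bool first_member_at_start(A& a) {
    return reinterpret_cast<int*>(&a) == &a.i;
}
```

If any of these assertions fired, the compiler would be failing a guarantee that the text above describes as having always held.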

However, there is a second layout guarantee in C++ that applies to all class types, regardless of whether they’re standard-layout or not! Consider the following class type (Godbolt):

struct C {
    int a;
    int b;
private:
    int c;
public:
    int d;
} c;
static_assert(&c.a < &c.b);

Ever since C++98, it’s been guaranteed that &c.a < &c.b. C++98’s [expr.rel] (page 86):

If two pointers point to nonstatic data members of the same object, or to subobjects or array elements of such members, recursively, the pointer to the later declared member compares greater provided the two members are not separated by an access-specifier label and provided their class is not a union.

This is consistent with the idea that a.i must precede a.j in the memory layout of the POD type A, but it’s even stronger — because it also governs the layout of non-POD types like C.

Notice that in C++98, there was no guarantee that &c.b < &c.c, nor that &c.b < &c.d, because those members’ declarations are separated by access-specifier labels. C++98 guaranteed no reordering within a single access specifier’s “block” of data members, but still permitted reordering among those “blocks.” For example, the compiler might (if it wished) lay out C’s members in the order a b d c (private members at the back); or c a b d; or even d c a b (blocks ordered back to front). A really evil implementation might even order them as a c d b, since that technically conforms to the guarantee.

Between C++98 and C++11, the notion of “POD type” was replaced with the conjunction of “standard-layout type” and “trivial type,” and in the process the layout guarantee was strengthened to enable the programmer to interrupt a run of public members with some private member functions, or vice versa, without breaking ABI. That is, C++11 forbade the compiler to swap two “blocks” of data members with the same access. Since C’s members a, b, d are all public, C++11 required the compiler to lay them out in exactly that order, but still didn’t restrict the placement of c relative to the public members.
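To make the difference between the two rules concrete, here is a rough sketch that brute-forces every ordering of C’s four members and keeps the ones each rule would tolerate. The function names (before, permitted_cxx98, permitted_cxx11, permitted_orders) are my own inventions for this illustration, and the predicates encode only the two rules as described above for this particular class C.

```cpp
#include <algorithm>  // std::next_permutation
#include <string>
#include <vector>

// True if member x is laid out before member y in the given order.
bool before(const std::string& order, char x, char y) {
    return order.find(x) < order.find(y);
}

// C++98: only a and b share an access-specifier "block",
// so the only requirement is that a precedes b.
bool permitted_cxx98(const std::string& order) {
    return before(order, 'a', 'b');
}

// C++11: a, b, d all have public access, so a < b < d is required;
// the private member c may still land anywhere.
bool permitted_cxx11(const std::string& order) {
    return before(order, 'a', 'b') && before(order, 'b', 'd');
}

// Collect every ordering of "abcd" that satisfies the given rule.
template <class Rule>
std::vector<std::string> permitted_orders(Rule rule) {
    std::vector<std::string> result;
    std::string order = "abcd";  // sorted, so next_permutation visits all 24
    do {
        if (rule(order)) result.push_back(order);
    } while (std::next_permutation(order.begin(), order.end()));
    return result;
}
```

Calling permitted_orders(permitted_cxx98) yields 12 of the 24 orderings, while permitted_orders(permitted_cxx11) yields just four: abcd, abdc, acbd, and cabd.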

Finally, in C++23, P1847R4 “Make declaration order mandated” removes all of the compiler’s freedom to reorder the layout of data members… at least, for data members declared directly within the same class. I believe (and Clang agrees) that the standard still says nothing about inheritance situations such as:

struct Base { int i; };
struct Derived : Base { int j; } d;
static_assert(&d.i < &d.j); // Not guaranteed even in C++23
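If you want to see what your own compiler does in the inheritance case, a small sketch like the following may help. It assumes C++17, and the helper name base_member_comes_first is hypothetical. It uses std::less rather than the built-in < because std::less provides a total order over pointers even where the built-in comparison is unspecified.

```cpp
#include <functional>   // std::less
#include <type_traits>  // std::is_standard_layout_v

struct Base { int i; };
struct Derived : Base { int j; };

// Data members live in both the base and the derived class,
// so Derived is not a standard-layout type.
static_assert(std::is_standard_layout_v<Base>);
static_assert(!std::is_standard_layout_v<Derived>);

// Hypothetical helper: reports whether this implementation happens to
// place the base-class member before the derived-class member.
bool base_member_comes_first(const Derived& d) {
    return std::less<const int*>{}(&d.i, &d.j);
}
```

On the mainstream ABIs I know of, I would expect this to return true, but that is an observation about implementations, not something the standard promises.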

Here’s a table summarizing the history of the standard’s layout guarantees for non-standard-layout types like C. Please keep in mind that this table’s “Permitted for C” entries are purely hypothetical: they show what the DeathStation 9000 might have chosen to do. No actual shipping compiler has ever done any of these things, as far as I know (except for one near miss, mentioned in the numbered note below).

C++98 (ISO/IEC 14882:1998) and C++03 (ISO/IEC 14882:2003)

[expr.rel]: If two pointers point to nonstatic data members of the same object, or to subobjects or array elements of such members, recursively, the pointer to the later declared member compares greater provided the two members are not separated by an access-specifier label and provided their class is not a union.
[class.mem]: Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic data members separated by an access-specifier is unspecified.
Permitted for C: abcd, abdc[1], acbd, acdb, adbc, adcb, cabd, cadb, cdab, dabc, dacb, dcab
Forbidden for C: bacd, badc, bcad, bcda, bdac, bdca, cbad, cbda, cdba, dbac, dbca, dcba

[1] According to P1847, EDG's frontend does support this layout (all public members first, then protected, then private), but only under a build-time configuration flag that no customer of theirs has ever used.

Intervening changes:
CWG 568 "Definition of POD is too strict"
N2342 "PODs Revisited" (i.e. splitting up "POD" into "standard-layout and trivial")

C++11 (N3337)

[expr.rel]: If two pointers point to non-static data members of the same object, or to subobjects or array elements of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control and provided their class is not a union.
[class.mem]: Nonstatic data members of a (non-union) class with the same access control are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified.
Permitted for C: abcd, abdc, acbd, cabd
Forbidden for C: Everything else

Intervening changes:
CWG 1512 "Pointer comparison vs qualification conversions"
N3624 "Pointer comparison vs qualification conversions"

C++14 (N4140) and C++17 (N4659)

[expr.rel]: If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control and provided their class is not a union.
[class.mem]: Nonstatic data members of a (non-union) class with the same access control are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified.
Permitted for C: abcd, abdc, acbd, cabd
Forbidden for C: Everything else

Intervening changes:
P0840R2 "Language support for empty objects" (i.e., [[no_unique_address]])
CWG 2404 "[[no_unique_address]] and allocation order"
Editorial: Clarify auxiliary partial ordering

C++20 (N4868)

[expr.rel]: If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member is required to compare greater provided the two members have the same access control, neither member is a subobject of zero size, and their class is not a union.
[class.mem]: Note: Non-static data members of a (non-union) class with the same access control and non-zero size are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified.
Permitted for C: abcd, abdc, acbd, cabd
Forbidden for C: Everything else

Intervening changes:
P1847R4 "Make declaration order mandated"

C++2b

[expr.rel]: If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member is required to compare greater provided neither member is a subobject of zero size and their class is not a union.
[class.mem]: Note: Non-variant non-static data members of non-zero size are allocated so that later members have higher addresses within a class object.
Permitted for C: abcd
Forbidden for C: Everything else

Posted 2022-03-04