Removing an empty base class can break ABI
Earlier this week Eric Fiselier pointed out something that came as an unpleasant surprise
(to me, at least): Even though C++11 deprecated inheritance-from-std::unary_function
,
and C++14 deprecated inheritance-from-std::iterator
, the actual library vendors
cannot remove those deprecated relationships from their libraries without taking an ABI break!
This is eerily apropos to Bryce Adelstein Lelbach’s closing keynote at C++Now 2021, in which ABI breakage was a major theme.
Background
C++98 didn’t have auto
, so it was useful for “function objects” like std::plus<T>
to expose a member typedef result_type
, so that you could write nice generic code like
template<class F, class T>
typename F::result_type
apply_over(F f, T x, T y, T z) {
return f(f(x,y),z);
}
Also first_argument_type
and second_argument_type
. Writing these three member typedefs over
and over was a bit tedious, so the C++98 STL provided a base class template
std::binary_function
and mandated that plus
(and less
and so on) should inherit from that.
N1905 [lib.base] was literally
this simple:
template<class Arg1, class Arg2, class Result>
struct binary_function {
typedef Arg1 first_argument_type;
typedef Arg2 second_argument_type;
typedef Result result_type;
};
template<class T>
struct plus : binary_function<T,T,T> {
T operator()(const T& x, const T& y) const
{ return x + y; }
};
The same pattern occurs with the standard iterator model. Every iterator, if you want it
to work with std::iterator_traits
, must provide five member typedefs: value_type
, difference_type
,
pointer
, reference
, and iterator_category
. (C++20 finally eliminates the requirement
to provide pointer
and reference
!) Writing these five lines was a bit tedious, so C++98
provided a base class template std::iterator
and mandated that the standard iterator adaptors should inherit from it.
template<class Category, class T, class Distance = ptrdiff_t,
class Pointer = T*, class Reference = T&>
struct iterator {
typedef T value_type;
typedef Distance difference_type;
typedef Pointer pointer;
typedef Reference reference;
typedef Category iterator_category;
};
template<class Iterator>
class reverse_iterator : public iterator<
typename iterator_traits<Iterator>::iterator_category,
typename iterator_traits<Iterator>::value_type,
typename iterator_traits<Iterator>::difference_type,
typename iterator_traits<Iterator>::pointer,
typename iterator_traits<Iterator>::reference>
{
protected:
Iterator current;
public:
reverse_iterator();
// and so on
};
Notice in passing that reverse_iterator
suffers from the same addiction
to protected
members that plagues std::queue
and std::insert_iterator
.
Anyway, that was the situation in C++98.
C++11 gave us auto
and decltype
, and also lambdas. Giving
a bunch of member typedefs to every function object suddenly seemed like
a dumb idea. So C++11 deprecated the std::unary_function
and std::binary_function
base class templates (and also introduced std::function
, which was
not a base class), and removed them as base classes of std::plus
et al.
Notice that C++11 did not say that std::plus
must not
inherit from binary_function
! As far as I know, standard types may inherit
from whatever types they want. Heck, std::vector<int>
could inherit from std::regex
if it wanted to.
So, vendors continued to make plus
inherit from binary_function
, since it
was mandated in C++98 mode and not actively harmful in C++11 mode.
C++11 preserved std::iterator
as-is, since nothing was particularly
changing in that area.
A very late-breaking issue, LWG2438,
modified C++14 to remove iterator
as a base class of reverse_iterator
et al;
but did not actually deprecate iterator
.
Then in C++17 we almost got Concepts, which made it conspicuously awkward that std::iterator
was sitting on a great library name. And Ranges was starting to refactor which
of those typedefs you even needed. So
P0174 “Deprecating Vestigial Library Parts in C++17”
deprecated iterator
— but unfortunately not speedily enough for Concepts, which is how we
got stuck with the sesquipedalian std::input_or_output_iterator
in C++20.
Meanwhile, P0090 “Removing result_type
, etc.”
deprecated std::plus
’s result_type
, first_argument_type
, and second_argument_type
member typedefs for C++17, and
P0619 “Reviewing Deprecated Facilities of C++17 for C++20”
formally removed them in C++20.
(Of course vendors are allowed to provide those typedefs even in C++20.)
So the situation in C++20, as I understand it, is:
-
reverse_iterator
must provide those five member typedefs somehow, but not necessarily by inheriting fromiterator
. -
plus
needn’t provide those three member typedefs at all, let alone by inheriting frombinary_function
. (But it might.)
Now for the ABI break
Pretend we’re a standard library vendor. Our existing implementation looks like this (simplified):
template<class Iterator>
class reverse_iterator : public std::iterator< ~~~ > {
Iterator current;
};
template<class T>
class vector {
struct iterator { T *ptr_; ~~~ };
using reverse_iterator = std::reverse_iterator<iterator>;
reverse_iterator rbegin();
};
Consider some user code like
std::vector<int> v;
auto it = std::make_reverse_iterator(v.rbegin());
it
is now a variable of type reverse_iterator<reverse_iterator<vector<int>::iterator>>
.
You might think reversing a
reverse_iterator
should just unwrap it. Maybe. But weird special cases are hard to reason about. Ifmake_reverse_iterator(t)
didn’t always return areverse_iterator<T>
, you know I’d be blogging about some weird pitfall caused by that!
The class layout of decltype(it)
is:
-
Empty base class
iterator<random_access_iterator_tag, T, ptrdiff_t, T*, T&>
-
Member
current
, which is of typevector<int>::reverse_iterator
, i.e.std::reverse_iterator<std::vector<int>::iterator>
The class layout of decltype(it.current)
is:
-
Empty base class
iterator<random_access_iterator_tag, T, ptrdiff_t, T*, T&>
(again!) -
Member
current
, which is of typevector<int>::iterator
and occupiessizeof(T*)
bytes
So it
contains two empty base classes, both of type
iterator<random_access_iterator_tag, T, ptrdiff_t, T*, T&>
. In C++, base-class subobjects
are distinct objects in their own right, and two objects of the same type cannot occupy the same
address; so the EBO applies to
only one of them. (Jonathan Wakely has called this the “empty-base exclusion principle.”)
Our reversed reverse-iterator object ends up occupying 16 bytes, not just 8!
(Godbolt showing the behavior on libstdc++.)
If you try this on libc++, you’ll find that the original
reverse_iterator
occupies 16 bytes already, because libc++’sreverse_iterator
wraps a pair of iterators instead of just one. LWG2360 is related; and sadly the extra pointer can’t be dropped without… say it with me… breaking ABI. Yes, this is insane.
So, on a sane library (not you libc++) that supports C++11, sizeof(it)
is 16 bytes.
Now let’s pretend that the library vendor removes that std::iterator
base class,
as C++14 permits.
template<class Iterator>
class reverse_iterator
#if __cplusplus < 201402L
: public std::iterator< ~~~ >
#endif
{
Iterator current;
};
Well, now the class layout of decltype(it)
doesn’t have those empty base classes
anymore! So sizeof(it)
drops to 8 bytes. And that changes the size of Widget
,
and therefore the calling convention of make_widget
in
(Godbolt):
struct Widget {
std::reverse_iterator<std::reverse_iterator<int*>> rr;
bool b;
};
Widget make_widget() {
static_assert(std::is_trivially_copyable_v<Widget>);
return Widget{ {}, false };
// A trivially copyable 24-byte type is returned on the stack.
// A trivially copyable 16-byte type is returned in registers, instead.
}
Any C++11 caller linking against a C++14 make_widget
, or vice versa, will expect
the Widget
result in the wrong place and occupying the wrong number of bytes.
Conclusion: For a library vendor, removing the deprecated empty base class from
reverse_iterator
counts an ABI break.
UPDATE, 2021-05-08: It’s been brought to my attention that the C++20 Ranges library starts the whole damn cycle over again! Ranges provides a convenience base class
struct view_base {};
which isn’t even a template at all — it’s literally no more than a tag type —
but then the CRTP base class view_interface
inherits from that, and then every range adaptor
inherits from some specialization of view_interface
!
template<class Crtp>
struct view_interface : view_base { ~~~ };
template<class V>
class reverse_view : public view_interface<reverse_view<V>> { ~~~ };
The result is that you would get the same bloated class layout from reverse_view
as you get from reverse_iterator
… except that Ranges
actually does the “optimization” where reversing a reverse_view
produces the original view again! That is, std::views::reverse
is overloaded
so that when its argument x
is a reverse_view
, the result of std::views::reverse(x)
is just x.base()
. (Godbolt.)
If Ranges is still around 10 years from now, I predict at least a couple papers
deprecating inheritance from view_base
.