Removing an empty base class can break ABI
Earlier this week Eric Fiselier pointed out something that came as an unpleasant surprise
(to me, at least): Even though C++11 deprecated inheritance-from-std::unary_function,
and C++14 deprecated inheritance-from-std::iterator, the actual library vendors
cannot remove those deprecated relationships from their libraries without taking an ABI break!
This is eerily apropos to Bryce Adelstein Lelbach’s closing keynote at C++Now 2021, in which ABI breakage was a major theme.
Background
C++98 didn’t have auto, so it was useful for “function objects” like std::plus<T>
to expose a member typedef result_type, so that you could write nice generic code like
template<class F, class T>
typename F::result_type
apply_over(F f, T x, T y, T z) {
    return f(f(x,y),z);
}
Also first_argument_type and second_argument_type. Writing these three member typedefs over
and over was a bit tedious, so the C++98 STL provided a base class template
std::binary_function
and mandated that plus (and less and so on) should inherit from that.
N1905 [lib.base] was literally
this simple:
template<class Arg1, class Arg2, class Result>
struct binary_function {
    typedef Arg1 first_argument_type;
    typedef Arg2 second_argument_type;
    typedef Result result_type;
};

template<class T>
struct plus : binary_function<T,T,T> {
    T operator()(const T& x, const T& y) const
        { return x + y; }
};
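To see the two halves of that pattern working together, here is a minimal sketch. The names my_binary_function and my_plus are hypothetical stand-ins (so the example doesn't depend on the deprecated std:: facilities), and the constexpr is my addition so the result can be checked at compile time.

```cpp
#include <type_traits>

// Hypothetical stand-in for the C++98 base class template; not the real std:: one.
template<class Arg1, class Arg2, class Result>
struct my_binary_function {
    typedef Arg1 first_argument_type;
    typedef Arg2 second_argument_type;
    typedef Result result_type;
};

// Hypothetical stand-in for std::plus: it gets its three typedefs from the base.
template<class T>
struct my_plus : my_binary_function<T, T, T> {
    constexpr T operator()(const T& x, const T& y) const { return x + y; }
};

// C++98-style generic code: the return type is spelled via F::result_type.
template<class F, class T>
constexpr typename F::result_type apply_over(F f, T x, T y, T z) {
    return f(f(x, y), z);
}

// The typedef really does come from the (empty) base class.
static_assert(std::is_same<my_plus<int>::result_type, int>::value,
              "result_type is inherited from my_binary_function");
static_assert(apply_over(my_plus<int>(), 1, 2, 3) == 6, "(1 + 2) + 3");
```

The base class contributes nothing but names, which is exactly why it looked so harmless at the time.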
The same pattern occurs with the standard iterator model. Every iterator, if you want it
to work with std::iterator_traits, must provide five member typedefs: value_type, difference_type,
pointer, reference, and iterator_category. (C++20 finally eliminates the requirement
to provide pointer and reference!) Writing these five lines was a bit tedious, so C++98
provided a base class template std::iterator
and mandated that the standard iterator adaptors should inherit from it.
template<class Category, class T, class Distance = ptrdiff_t,
         class Pointer = T*, class Reference = T&>
struct iterator {
    typedef T value_type;
    typedef Distance difference_type;
    typedef Pointer pointer;
    typedef Reference reference;
    typedef Category iterator_category;
};

template<class Iterator>
class reverse_iterator : public iterator<
    typename iterator_traits<Iterator>::iterator_category,
    typename iterator_traits<Iterator>::value_type,
    typename iterator_traits<Iterator>::difference_type,
    typename iterator_traits<Iterator>::pointer,
    typename iterator_traits<Iterator>::reference>
{
protected:
    Iterator current;
public:
    reverse_iterator();
    // and so on
};
Notice in passing that reverse_iterator suffers from the same addiction
to protected members that plagues std::queue and std::insert_iterator.
Anyway, that was the situation in C++98.
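For comparison, here is a sketch of a user-written iterator that spells out the five member typedefs by hand rather than inheriting them. The class counting_iterator is hypothetical, invented only for this illustration; the point is just that std::iterator_traits picks the typedefs up either way.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical input iterator that counts upward from a starting int.
// It writes the five typedefs directly; no std::iterator base class is involved.
class counting_iterator {
    int value_;
public:
    typedef int value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const int* pointer;
    typedef int reference;  // operator* returns by value
    typedef std::input_iterator_tag iterator_category;

    explicit counting_iterator(int v = 0) : value_(v) {}
    reference operator*() const { return value_; }
    counting_iterator& operator++() { ++value_; return *this; }
    counting_iterator operator++(int) { counting_iterator old(*this); ++value_; return old; }
    friend bool operator==(const counting_iterator& a, const counting_iterator& b) {
        return a.value_ == b.value_;
    }
    friend bool operator!=(const counting_iterator& a, const counting_iterator& b) {
        return !(a == b);
    }
};

// iterator_traits finds the member typedefs without any help from a base class.
static_assert(std::is_same<std::iterator_traits<counting_iterator>::value_type, int>::value,
              "value_type is picked up from the member typedef");
static_assert(std::is_same<std::iterator_traits<counting_iterator>::iterator_category,
                           std::input_iterator_tag>::value,
              "iterator_category is picked up from the member typedef");
```

Five extra lines per iterator is the entire cost that the std::iterator base class was saving.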
C++11 gave us auto and decltype, and also lambdas. Giving
a bunch of member typedefs to every function object suddenly seemed like
a dumb idea. So C++11 deprecated the std::unary_function and std::binary_function
base class templates (and also introduced std::function, which was
not a base class), and removed them as base classes of std::plus et al.
Notice that C++11 did not say that std::plus must not
inherit from binary_function! As far as I know, standard types may inherit
from whatever types they want. Heck, std::vector<int> could inherit from std::regex
if it wanted to.
So, vendors continued to make plus inherit from binary_function, since it
was mandated in C++98 mode and not actively harmful in C++11 mode.
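Here is a sketch of why the typedefs stopped mattering: with C++11's trailing return type and decltype, the generic apply_over from earlier can deduce its return type from the call expression itself. The functor bare_plus is a hypothetical example type with no member typedefs and no base class, and the constexpr is my addition so the result can be checked at compile time.

```cpp
// C++11 sketch: the return type comes from the expression, not from F::result_type.
template<class F, class T>
constexpr auto apply_over(F f, T x, T y, T z) -> decltype(f(f(x, y), z)) {
    return f(f(x, y), z);
}

// Hypothetical function object: no typedefs, no base class.
struct bare_plus {
    constexpr int operator()(int x, int y) const { return x + y; }
};

static_assert(apply_over(bare_plus(), 1, 2, 3) == 6, "(1 + 2) + 3");
```

A lambda works the same way at run time: apply_over([](int a, int b) { return a * b; }, 2, 3, 4) computes (2 * 3) * 4, even though a closure type has no result_type at all.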
C++11 preserved std::iterator as-is, since nothing was particularly
changing in that area.
A very late-breaking issue, LWG2438,
modified C++14 to remove iterator as a base class of reverse_iterator et al;
but did not actually deprecate iterator.
Then in C++17 we almost got Concepts, which made it conspicuously awkward that std::iterator
was sitting on a great library name. And Ranges was starting to refactor which
of those typedefs you even needed. So
P0174 “Deprecating Vestigial Library Parts in C++17”
deprecated iterator — but unfortunately not speedily enough for Concepts, which is how we
got stuck with the sesquipedalian std::input_or_output_iterator
in C++20.
Meanwhile, P0090 “Removing result_type, etc.”
deprecated std::plus’s result_type, first_argument_type, and second_argument_type
member typedefs for C++17, and
P0619 “Reviewing Deprecated Facilities of C++17 for C++20”
formally removed them in C++20.
(Of course vendors are allowed to provide those typedefs even in C++20.)
So the situation in C++20, as I understand it, is:
- reverse_iterator must provide those five member typedefs somehow, but not necessarily by inheriting from iterator.
- plus needn’t provide those three member typedefs at all, let alone by inheriting from binary_function. (But it might.)
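The first bullet is easy to check mechanically. This sketch asserts that std::reverse_iterator<int*> exposes all five typedefs, without caring how the vendor supplies them; the alias name rev is mine. I'm not attempting the same for the second bullet, since whether std::plus still has result_type in C++20 mode is up to the vendor.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>

typedef std::reverse_iterator<int*> rev;  // alias name chosen for this example

static_assert(std::is_same<rev::value_type, int>::value, "value_type");
static_assert(std::is_same<rev::difference_type, std::ptrdiff_t>::value, "difference_type");
static_assert(std::is_same<rev::pointer, int*>::value, "pointer");
static_assert(std::is_same<rev::reference, int&>::value, "reference");
static_assert(std::is_same<rev::iterator_category, std::random_access_iterator_tag>::value,
              "iterator_category");
```

These assertions pass whether or not the implementation inherits from std::iterator, which is the whole point: the base class is an implementation detail as far as the typedefs are concerned.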
Now for the ABI break
Pretend we’re a standard library vendor. Our existing implementation looks like this (simplified):
template<class Iterator>
class reverse_iterator : public std::iterator< ~~~ > {
    Iterator current;
};

template<class T>
class vector {
    struct iterator { T *ptr_; ~~~ };
    using reverse_iterator = std::reverse_iterator<iterator>;
    reverse_iterator rbegin();
};
Consider some user code like
std::vector<int> v;
auto it = std::make_reverse_iterator(v.rbegin());
it is now a variable of type reverse_iterator<reverse_iterator<vector<int>::iterator>>.
You might think reversing a reverse_iterator should just unwrap it. Maybe. But weird special cases are hard to reason about. If make_reverse_iterator(t) didn’t always return a reverse_iterator<T>, you know I’d be blogging about some weird pitfall caused by that!
The class layout of decltype(it) is:
- Empty base class iterator<random_access_iterator_tag, T, ptrdiff_t, T*, T&>
- Member current, which is of type vector<int>::reverse_iterator, i.e. std::reverse_iterator<std::vector<int>::iterator>
The class layout of decltype(it.current) is:
- Empty base class iterator<random_access_iterator_tag, T, ptrdiff_t, T*, T&> (again!)
- Member current, which is of type vector<int>::iterator and occupies sizeof(T*) bytes
So it contains two empty base classes, both of type
iterator<random_access_iterator_tag, T, ptrdiff_t, T*, T&>. In C++, base-class subobjects
are distinct objects in their own right, and two objects of the same type cannot occupy the same
address; so the empty base optimization (EBO) applies to
only one of them. (Jonathan Wakely has called this the “empty-base exclusion principle.”)
The outer object’s empty base sits at offset 0, so the member current, whose own empty base
would otherwise land at offset 0 too, gets pushed out to offset 8 (assuming 8-byte pointers).
Our reversed reverse-iterator object ends up occupying 16 bytes, not just 8!
(Godbolt showing the behavior on libstdc++.)
If you try this on libc++, you’ll find that the original reverse_iterator occupies 16 bytes already, because libc++’s reverse_iterator wraps a pair of iterators instead of just one. LWG2360 is related; and sadly the extra pointer can’t be dropped without… say it with me… breaking ABI. Yes, this is insane.
So, on a sane library (not you, libc++) that supports C++11, sizeof(it) is 16 bytes.
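You can reproduce the effect without any standard library at all. In this sketch, tag_base and wrapper are hypothetical stand-ins for the empty std::iterator base and for reverse_iterator. The size comparison assumes an Itanium-ABI compiler such as GCC or Clang; I haven't checked what other ABIs do here.

```cpp
// Hypothetical miniature of the layout discussed above.
struct tag_base {};  // stands in for the empty iterator<...> base class

template<class It>
struct wrapper : tag_base {  // stands in for reverse_iterator<It>
    It current;
};

typedef wrapper<int*> single;             // like reverse_iterator<int*>
typedef wrapper<wrapper<int*> > doubled;  // like reverse_iterator<reverse_iterator<int*>>

// One level: the empty base shares its address with the member, so it costs nothing.
static_assert(sizeof(single) == sizeof(int*), "EBO applies at one level");

// Two levels: both tag_base subobjects would sit at offset 0, which is not
// allowed for two objects of the same type, so the member is pushed out.
static_assert(sizeof(doubled) > sizeof(single), "the second empty base costs space");
```

With 8-byte pointers I'd expect sizeof(doubled) to be 16, matching the libstdc++ numbers above.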
Now let’s pretend that the library vendor removes that std::iterator base class,
as C++14 permits.
template<class Iterator>
class reverse_iterator
#if __cplusplus < 201402L
    : public std::iterator< ~~~ >
#endif
{
    Iterator current;
};
Well, now the class layout of decltype(it) doesn’t have those empty base classes
anymore! So sizeof(it) drops to 8 bytes. And that changes the size of Widget,
and therefore the calling convention of make_widget, in the following example
(Godbolt):
#include <iterator>
#include <type_traits>

struct Widget {
    std::reverse_iterator<std::reverse_iterator<int*>> rr;
    bool b;
};

Widget make_widget() {
    static_assert(std::is_trivially_copyable_v<Widget>);
    return Widget{ {}, false };
    // A trivially copyable 24-byte type is returned on the stack.
    // A trivially copyable 16-byte type is returned in registers, instead.
}
Any C++11 caller linking against a C++14 make_widget, or vice versa, will expect
the Widget result in the wrong place and occupying the wrong number of bytes.
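The size change can be seen side by side in one translation unit if we fake the two library versions with two hypothetical wrapper templates. Everything named here (tag_base, with_base, without_base, widget_old, widget_new) is invented for the illustration, and the comparison again assumes an Itanium-ABI compiler such as GCC or Clang.

```cpp
struct tag_base {};  // stands in for the empty std::iterator<...> base

// "Old" library: the wrapper inherits from the empty tag.
template<class It> struct with_base : tag_base { It current; };

// "New" library: same member, no base class.
template<class It> struct without_base { It current; };

struct widget_old { with_base<with_base<int*> > rr; bool b; };
struct widget_new { without_base<without_base<int*> > rr; bool b; };

// Without the base, rr is one pointer wide, and the bool pads the struct to two.
static_assert(sizeof(widget_new) == 2 * sizeof(int*), "pointer plus padded bool");

// With the base, rr is already bloated, so the whole widget is bigger.
static_assert(sizeof(widget_new) < sizeof(widget_old), "removing the base shrinks the widget");
```

On a typical 64-bit target I'd expect those sizes to be 16 and 24 bytes, which is the difference that moves the return value between registers and memory.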
Conclusion: For a library vendor, removing the deprecated empty base class from
reverse_iterator counts as an ABI break.
UPDATE, 2021-05-08: It’s been brought to my attention that the C++20 Ranges library starts the whole damn cycle over again! Ranges provides a convenience base class
struct view_base {};
which isn’t even a template at all — it’s literally no more than a tag type —
but then the CRTP base class view_interface
inherits from that, and then every range adaptor
inherits from some specialization of view_interface!
template<class Crtp>
struct view_interface : view_base { ~~~ };
template<class V>
class reverse_view : public view_interface<reverse_view<V>> { ~~~ };
The result is that you would get the same bloated class layout from reverse_view
as you get from reverse_iterator… except that Ranges
actually does the “optimization” where reversing a reverse_view
produces the original view again! That is, std::views::reverse is overloaded
so that when its argument x is a reverse_view, the result of std::views::reverse(x)
is just x.base(). (Godbolt.)
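Here is a sketch of the layout you'd get if you nested the adaptor by hand instead of going through std::views::reverse. The types my_view_base, my_view_interface, and my_reverse_view are hypothetical stand-ins that copy only the inheritance structure, so this needs no <ranges> header; as before, the size comparison assumes an Itanium-ABI compiler such as GCC or Clang.

```cpp
struct my_view_base {};  // stands in for the non-template tag type

template<class Crtp>
struct my_view_interface : my_view_base {};  // CRTP base, also empty

template<class V>
struct my_reverse_view : my_view_interface<my_reverse_view<V> > {
    V base_;
};

typedef my_reverse_view<int*> once;                     // int* stands in for some underlying view
typedef my_reverse_view<my_reverse_view<int*> > twice;

// The two my_view_interface specializations are different types and could
// share an address, but both chains end in the same my_view_base.
static_assert(sizeof(once) == sizeof(int*), "one level costs nothing");
static_assert(sizeof(twice) > sizeof(once), "the shared tag base forces padding");
```

So the CRTP layer doesn't help: as long as every adaptor bottoms out in the same non-template tag, nesting runs into the same exclusion rule.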
If Ranges is still around 10 years from now, I predict at least a couple papers
deprecating inheritance from view_base.
