[[trivial_abi]] 101
Finally, a blog post on [[trivial_abi]]!
This is a brand-new feature in Clang trunk, new as of about February 2018. It is a vendor extension to the C++ language — it is not standard C++, it isn’t supported by GCC trunk, and there is no active WG21 proposal to add it to the standard C++ language, as far as I know.
Full disclosure: I am totally not involved in the implementation of this feature. I’m just watching its patches go by on the cfe-commits mailing list and applauding quietly to myself. But this is such a cool feature that I think everyone should know about it.
Okay, first of all, since this is a non-standard attribute, Clang trunk doesn’t actually support it under the standard attribute spelling [[trivial_abi]]. Instead, you must spell it in one of the following vendor-specific ways:
__attribute__((trivial_abi))
__attribute__((__trivial_abi__))
[[clang::trivial_abi]]
Also, being an attribute, the compiler will be super picky about where you put it — and passive-aggressively quiet if you accidentally put it in the wrong place (because unrecognized attributes are supposed to be quietly ignored). This is one of those “it’s a feature, not a bug!” situations. So the proper syntax, all in one place, is:
#define TRIVIAL_ABI __attribute__((trivial_abi))
class TRIVIAL_ABI Widget {
    // ...
};
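Because a misplaced or unsupported attribute can be dropped without complaint, it may help to guard the macro. The following is a minimal sketch, assuming a compiler that provides the __has_cpp_attribute feature test; the member id_ and the helpers take and demo_widget are hypothetical names added only for illustration.

```cpp
#include <cassert>

// Use the attribute only where the compiler reports support for it;
// elsewhere TRIVIAL_ABI expands to nothing and the class keeps the default ABI.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#endif

// The attribute goes between the class-key and the class name.
class TRIVIAL_ABI Widget {
public:
    explicit Widget(int id) : id_(id) {}
    ~Widget() {}  // deliberately non-trivial
    int id() const { return id_; }
private:
    int id_;
};

// Hypothetical helper: takes a Widget by value, which is where the ABI matters.
inline int take(Widget w) { return w.id(); }

// Hypothetical usage check.
inline void demo_widget() {
    assert(take(Widget(7)) == 7);
}
```

The program behaves the same whether or not the attribute is honored; only the calling convention for Widget changes.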
What is the problem being solved?
Remember my blog post from 2018-04-17 where I showed two versions of a class (there called Integer):
struct Foo {
    int value;
    ~Foo() = default; // trivial
};
struct Bar {
    int value;
    ~Bar() {} // deliberately non-trivial
};
In that post’s particular code snippet, the compiler produced worse codegen for Foo than it did for Bar. This was worth blogging about because it was surprising. Programmers intuitively expect that the “trivial” code will do better than the “non-trivial” code. In most situations, this is true. Specifically, this is true when we go to do a function call or return:
template<class T>
T incr(T obj) {
    obj.value += 1;
    return obj;
}
incr<Foo> compiles into the following code:
leal 1(%rdi), %eax
retq
(leal is x86-speak for “add”.)
We can see that our 4-byte obj will be passed in to incr<Foo> in the %edi register; and then we’ll add 1 to its value and return it in %eax. Four bytes in, four bytes out, easy peasy.
Now look at incr<Bar> (the case with the non-trivial destructor):
movl (%rsi), %eax
addl $1, %eax
movl %eax, (%rsi)
movl %eax, (%rdi)
movq %rdi, %rax
retq
Here, obj is not being passed in a register, even though it’s the same 4 bytes with all the same semantics. Here, obj is being passed and returned by address. So our caller has set up some space for the return value and given us a pointer to that space in %rdi; and our caller has given us a pointer to the value of obj in the next argument register %rsi. We fetch the value from (%rsi), add 1 to it, store it back into (%rsi) (so as to update the value of obj itself), and then (trivially) copy the 4 bytes of obj into the return slot pointed to by %rdi. Finally, we copy the caller’s original pointer %rdi into %rax, because the x86-64 ABI document (page 22) says we have to.
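To make the register walkthrough concrete, here is a rough hand-written C++ rendering of what incr<Bar> does under the invisible-reference convention. It is only a sketch: the names incr_lowered, ret_slot, and demo_lowered are hypothetical, and a real compiler does this lowering internally rather than in source code.

```cpp
#include <cassert>
#include <new>

struct Bar {
    int value;
    ~Bar() {}  // deliberately non-trivial
};

// Hypothetical hand-lowering of incr<Bar>.
// ret_slot plays the role of %rdi; obj plays the role of %rsi.
inline Bar *incr_lowered(void *ret_slot, Bar *obj) {
    obj->value += 1;                    // update the caller-owned parameter object
    return ::new (ret_slot) Bar(*obj);  // copy into the return slot and hand its address back
}

inline int demo_lowered() {
    Bar arg = {41};                                // the caller owns the argument object
    alignas(Bar) unsigned char slot[sizeof(Bar)];  // the caller also provides the return slot
    Bar *result = incr_lowered(slot, &arg);
    assert(arg.value == 42);                       // the parameter object itself was modified
    int v = result->value;
    result->~Bar();                                // the caller destroys the returned object
    return v;
}
```

Both objects live in memory owned by the caller, which is why the callee never needs to run a destructor in this convention.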
The reason Bar behaves so differently from Foo is that Bar has a non-trivial destructor, and the x86-64 ABI document (page 19) says specifically:
If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference (the object is replaced in the parameter list by a pointer […]).
The later Itanium C++ ABI document defines a term of art:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.
[…]
A type is considered non-trivial for the purposes of calls if:
- it has a non-trivial copy constructor, move constructor, or destructor, or
- all of its copy and move constructors are deleted.
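The quoted rule can be approximated with standard type traits. This is a rough sketch, not the ABI's actual algorithm: the trait name non_trivial_for_calls_approx is hypothetical, and because the standard traits cannot tell a deleted constructor from a non-trivial one, it will misclassify a type that has, say, a deleted copy constructor alongside a trivial move constructor.

```cpp
#include <cassert>
#include <type_traits>

struct Foo {
    int value;
    ~Foo() = default;  // trivial
};
struct Bar {
    int value;
    ~Bar() {}  // deliberately non-trivial
};

// Hypothetical approximation of "non-trivial for the purposes of calls":
// true unless the copy constructor, move constructor, and destructor all look trivial.
template<class T>
struct non_trivial_for_calls_approx : std::integral_constant<bool,
    !(std::is_trivially_copy_constructible<T>::value &&
      std::is_trivially_move_constructible<T>::value &&
      std::is_trivially_destructible<T>::value)> {};

static_assert(!non_trivial_for_calls_approx<Foo>::value, "Foo can travel in registers");
static_assert(non_trivial_for_calls_approx<Bar>::value, "Bar is passed by invisible reference");
```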
So that explains it: Bar gets worse codegen because it is passed by invisible reference. It is passed by invisible reference because of the unfortunate conjunction of two independent premises:
- the ABI document says that things with non-trivial destructors are passed by invisible reference, and
- Bar has a non-trivial destructor.
By the way, this is a classical syllogism: the first bullet point above is the major premise, and the second is the minor premise. The conclusion is “Bar is passed by invisible reference.”
Suppose someone presents us with the syllogism
- All men are mortal.
- Socrates is a man.
- Therefore Socrates is mortal.
If we wish to quibble with the conclusion “Socrates is mortal”, we must rebut one of the premises: either rebut the major premise (maybe some men aren’t mortal) or rebut the minor premise (maybe Socrates isn’t a man).
To get Bar to be passed in registers (just like Foo), we must rebut one or the other of our two premises. The standard-C++ way to do it is simply to give Bar a trivial destructor, negating the minor premise. But there is another way!
How [[trivial_abi]] solves the problem
Clang’s new trivial_abi attribute negates the major premise above. Clang extends the ABI document to say essentially the following:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.
[…]
A type is considered non-trivial for the purposes of calls if it has not been marked [[trivial_abi]] AND:
- it has a non-trivial copy constructor, move constructor, or destructor, or
- all of its copy and move constructors are deleted.
That is, even a class type with a non-trivial move constructor or destructor will be considered trivial for the purposes of calls, if it has been marked by the programmer as [[trivial_abi]].
So now (using Clang trunk) we can go back and write this:
#define TRIVIAL_ABI __attribute__((trivial_abi))
struct TRIVIAL_ABI Baz {
    int value;
    ~Baz() {} // deliberately non-trivial
};
and compile incr<Baz>, and we get the same code as incr<Foo>!
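One thing worth checking for yourself is that the attribute changes only how Baz is passed, not what the language says about it. A minimal sketch, assuming that compilers other than Clang simply omit the attribute:

```cpp
#include <cassert>
#include <type_traits>

#ifdef __clang__
#  define TRIVIAL_ABI __attribute__((trivial_abi))
#else
#  define TRIVIAL_ABI  // assumption: other compilers just leave the attribute out
#endif

struct TRIVIAL_ABI Baz {
    int value;
    ~Baz() {}  // deliberately non-trivial
};

template<class T>
T incr(T obj) {
    obj.value += 1;
    return obj;
}

// The standard type traits still see a user-provided destructor.
static_assert(!std::is_trivially_destructible<Baz>::value,
              "Baz still has a non-trivial destructor");
static_assert(sizeof(Baz) == sizeof(int), "Baz is still just one int");
```

So generic code that dispatches on is_trivially_destructible will keep treating Baz as non-trivial, even though calls now pass it in a register.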
Caveat #1: [[trivial_abi]] is sometimes a no-op
I would hope that we could make “trivial-for-purposes-of-calls” wrappers around standard library types like this:
template<class T, class D>
struct TRIVIAL_ABI trivial_unique_ptr : std::unique_ptr<T, D> {
    using std::unique_ptr<T, D>::unique_ptr;
};
Unfortunately, this doesn’t work. If your class has any base classes or non-static data members which are themselves “non-trivial for purposes of calls”, then Clang’s extension as currently written will make your class sort of “irreversibly non-trivial”: the attribute will have no effect. (It will not be diagnosed. This means you can use [[trivial_abi]] on a class template such as optional and have it be “conditionally trivial”, which is sometimes a useful feature. The downside, of course, is that you might mark a class trivial and then find out later that the compiler was giving you the silent treatment.)
The attribute will also be silently ignored if your class has virtual bases or virtual member functions. In these cases it probably won’t even fit in a register anyway, and I don’t know what you’re doing passing it around by value, but, just so you know.
So, as far as I know, the only ways to use TRIVIAL_ABI on “standard utility types” such as optional<T>, unique_ptr<T>, and shared_ptr<T> are:
- implement them from scratch yourself and apply the attribute, or
- break into your local libc++ and apply the attribute by hand there.
(In the open-source world, these are essentially the same thing anyway.)
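For a sense of what the from-scratch route looks like, here is a heavily simplified sketch. The names owning_ptr and consume are hypothetical, and this is nowhere near a complete unique_ptr replacement (no custom deleter, no array support, no release or reset). The point is only that its sole data member is a raw pointer, so nothing inside it blocks the attribute.

```cpp
#include <cassert>
#include <utility>

#ifdef __clang__
#  define TRIVIAL_ABI __attribute__((trivial_abi))
#else
#  define TRIVIAL_ABI  // assumption: other compilers just leave the attribute out
#endif

// Hypothetical, minimal stand-in for unique_ptr<T>.
template<class T>
class TRIVIAL_ABI owning_ptr {
    T *p_;
public:
    explicit owning_ptr(T *p = nullptr) noexcept : p_(p) {}
    owning_ptr(owning_ptr&& rhs) noexcept : p_(rhs.p_) { rhs.p_ = nullptr; }
    owning_ptr& operator=(owning_ptr&& rhs) noexcept {
        if (this != &rhs) {
            delete p_;
            p_ = rhs.p_;
            rhs.p_ = nullptr;
        }
        return *this;
    }
    ~owning_ptr() { delete p_; }
    T *get() const noexcept { return p_; }
};

// Takes ownership by value; the parameter's destructor frees the int.
inline int consume(owning_ptr<int> p) { return *p.get(); }
```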
Caveat #2: Destructor responsibility
In our Foo/Bar example, the class had a no-op destructor. Suppose we gave our class a really non-trivial destructor?
struct Up1 {
    int value;
    Up1(Up1&& u) : value(u.value) { u.value = 0; }
    ~Up1() { puts("destroyed"); }
};
This should look familiar; it’s unique_ptr<int> stripped to its bare essentials, and with puts standing in for delete.
Without TRIVIAL_ABI, incr<Up1> looks just like incr<Bar>:
movl (%rsi), %eax
addl $1, %eax
movl %eax, (%rdi)
movl $0, (%rsi)
movq %rdi, %rax
retq
With TRIVIAL_ABI added (call the attributed version of the class Up2), incr<Up2> looks much bigger and scarier!
pushq %rbx
leal 1(%rdi), %ebx
movl $.L.str, %edi
callq puts
movl %ebx, %eax
popq %rbx
retq
Under the traditional calling convention, types with non-trivial destructors are always passed by invisible reference, which means that the callee (incr in our case) always receives a pointer to a parameter object that it does not own. The caller owns the parameter object. This is what makes copy elision work!
When a type with [[trivial_abi]] is passed in registers, we are essentially making a copy of the parameter object. There is only one return register on x86-64 (handwave), so the callee has no way to give that object back to us when it’s finished. The callee must take ownership of the parameter object we gave it! Which means that the callee must call the destructor of the parameter object when it’s finished with it.
In our previous Foo/Bar/Baz examples, this destructor call was happening, but it was a no-op, so we didn’t notice. Now in incr<Up2> we see the additional code that is produced by a callee-side destructor.
It is conceivable that this extra code could add up, in certain use-cases. However, counterpoint: this destructor call is not appearing out of nowhere! It is being called in incr because it is not being called in incr’s caller. So in general the costs and benefits might be expected to balance out.
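The claim that the destructor call moves rather than multiplies can be checked with a live-object counter. This is a minimal sketch; the names Tracked, live, and run are hypothetical, and the non-Clang fallback for TRIVIAL_ABI is an assumption.

```cpp
#include <cassert>
#include <utility>

#ifdef __clang__
#  define TRIVIAL_ABI __attribute__((trivial_abi))
#else
#  define TRIVIAL_ABI  // assumption: other compilers just leave the attribute out
#endif

struct TRIVIAL_ABI Tracked {
    static int live;  // number of Tracked objects currently alive
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    Tracked(Tracked&& t) : value(t.value) { t.value = 0; ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

inline Tracked incr(Tracked obj) {
    obj.value += 1;
    return obj;  // obj is destroyed exactly once, by whichever side owns the parameter
}

inline int run() {
    int result;
    {
        Tracked t(41);
        Tracked r = incr(std::move(t));
        result = r.value;
    }  // every object constructed above has been destroyed by now
    return result;
}
```

With or without the attribute, the counter returns to zero: each constructed object is destroyed exactly once, and the attribute only changes which function contains the call.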
Caveat #3: Destructor ordering
The destructor for the trivial-abi parameter will be called by the callee, not the caller (Caveat 2). Richard Smith points out that this means it will be called out of order with respect to the other parameters’ destructors.
struct TRIVIAL_ABI alpha {
    alpha() { puts("alpha constructed"); }
    ~alpha() { puts("alpha destroyed"); }
};
struct beta {
    beta() { puts("beta constructed"); }
    ~beta() { puts("beta destroyed"); }
};
void foo(alpha, beta) {}
int main() {
    foo(alpha{}, beta{});
}
This code prints
alpha constructed
beta constructed
alpha destroyed
beta destroyed
when TRIVIAL_ABI is defined as [[clang::trivial_abi]], and prints
alpha constructed
beta constructed
beta destroyed
alpha destroyed
when TRIVIAL_ABI is defined away. Only the latter, with destruction in reverse order of construction, is C++-standard-conforming.
Relation to “trivially relocatable” / “move-relocates”
None… well, some?
As you can see, there is no requirement that a [[trivial_abi]] class type should have any particular semantics for its move constructor, its destructor, or its default constructor. Any given class type will likely be trivially relocatable, simply because most class types are trivially relocatable by accident.
We can easily design an offset_ptr which is super duper non-trivially relocatable:
template<class T>
class TRIVIAL_ABI offset_ptr {
    intptr_t value_;
public:
    offset_ptr(T *p) : value_((const char*)p - (const char*)this) {}
    offset_ptr(const offset_ptr& rhs) : value_((const char*)rhs.get() - (const char*)this) {}
    T *get() const { return (T *)((const char *)this + value_); }
    offset_ptr& operator=(const offset_ptr& rhs) {
        value_ = ((const char*)rhs.get() - (const char*)this);
        return *this;
    }
    offset_ptr& operator+=(int diff) {
        value_ += (diff * sizeof (T));
        return *this;
    }
};
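// Assumed but not shown here: a global array such as int a[10], and an
// incr() for offset_ptr that advances its by-value parameter with += 1.
int main() {
    offset_ptr<int> top = &a[4];
    top = incr(top);
    assert(top.get() == &a[5]);
}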
With TRIVIAL_ABI defined, Clang trunk passes this test at -O0 or -O1, but at -O2 (i.e., as soon as it tries to inline the calls to offset_ptr::operator+= and the copy constructor) it fails the assertion.
So there’s a caveat here too. If your type is doing something crazy with the this pointer, you probably don’t want to be passing it in registers.
Filed 37319, essentially a documentation request.
In this case, it turns out there’s just no way to make the code do what the programmer intends. We’re saying that the value of value_ should depend on the value of the this pointer; but at the caller–callee boundary, the object is in a register and there is no this pointer! So when the callee spills it back to memory and gives it a this pointer again, how should the callee compute the correct value to put into value_? Maybe the better question is, how does it even work at -O0? It shouldn’t work at all.
So anyway, if you’re going to use [[trivial_abi]], you must avoid having member functions (not just special member functions, but any member functions) that significantly depend on the object’s own address (for some hand-wavy value of “significantly”).
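To see why address dependence is fatal, it helps to model the register round-trip as a plain bitwise copy. The sketch below does that with a trivially copyable stand-in, so that memcpy is well-defined; the names SelfAware, init, consistent, and survives_bitwise_copy are hypothetical, and the memcpy is only a model of what the calling convention does, not the mechanism Clang actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a type whose state depends on its own address.
struct SelfAware {
    std::uintptr_t self;  // the address this object had when init() ran
};

inline void init(SelfAware &s) {
    s.self = reinterpret_cast<std::uintptr_t>(&s);
}

inline bool consistent(const SelfAware &s) {
    return s.self == reinterpret_cast<std::uintptr_t>(&s);
}

inline bool survives_bitwise_copy() {
    SelfAware a;
    init(a);
    SelfAware b;
    std::memcpy(&b, &a, sizeof b);  // models "into a register and back out to memory"
    // a is still consistent, but b carries a's address rather than its own.
    return consistent(a) && consistent(b);
}
```

The copy holds exactly the same bits but lives at a different address, so any invariant phrased in terms of this is broken without a single member function having run.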
The intuition here is that when a thing is marked [[trivial_abi]], then any time you expect a copy you might actually get a copy plus memcpy: The “put it in a register and then take it back out” operation is essentially tantamount to memcpy. And similarly when you expect a move you might actually get a move plus memcpy.
Whereas, when a type is “trivially relocatable” (according to my definition from this C++Now talk), then any time you expect a copy and destroy you might actually get a memcpy. And similarly when you expect a move and destroy you might actually get a memcpy. You actually lose calls to special member functions when you’re talking about “trivial relocation”; whereas with the Clang [[trivial_abi]] attribute you never lose calls. You just get (as if) memcpy in addition to the calls you expected. This (as if) memcpy is the price you pay for a faster, register-based calling convention.