[[trivial_abi]] 101
[[trivial_abi]] 101Finally, a blog post on [[trivial_abi]]!
This is a brand-new feature in Clang trunk, new as of about February 2018. It is a vendor extension to the C++ language — it is not standard C++, it isn’t supported by GCC trunk, and there is no active WG21 proposal to add it to the standard C++ language, as far as I know.
Full disclosure: I am totally not involved in the implementation of this
feature. I’m just watching its patches go by on the cfe-commits mailing list
and applauding quietly to myself. But this is such a cool feature that I think
everyone should know about it.
Okay, first of all, since this is a non-standard attribute, Clang trunk doesn’t
actually support it under the standard attribute spelling [[trivial_abi]].
Instead, you must spell it old-style as one of the following:
__attribute__((trivial_abi))__attribute__((__trivial_abi__))[[clang::trivial_abi]]
Also, being an attribute, the compiler will be super picky about where you put it — and passive-aggressively quiet if you accidentally put it in the wrong place (because unrecognized attributes are supposed to be quietly ignored). This is one of those “it’s a feature, not a bug!” situations. So the proper syntax, all in one place, is:
#define TRIVIAL_ABI __attribute__((trivial_abi))
class TRIVIAL_ABI Widget {
// ...
};
What is the problem being solved?
Remember my blog post from 2018-04-17
where I showed two versions of a class (there called Integer):
struct Foo {
int value;
~Foo() = default; // trivial
};
struct Bar {
int value;
~Bar() {} // deliberately non-trivial
};
In that post’s particular code snippet, the compiler produced
worse codegen for Foo than it did for Bar. This was worth blogging
about because it was surprising. Programmers intuitively
expect that the “trivial” code will do better than the “non-trivial” code.
In most situations, this is true. Specifically, this is true when we go to
do a function call or return:
template<class T>
T incr(T obj) {
obj.value += 1;
return obj;
}
incr<Foo> compiles into the following code:
leal 1(%rdi), %eax
retq
(leal is x86-speak for “add”.)
We can see that our 4-byte obj will be passed in to incr<Foo>
in the %edi register; and then we’ll add 1 to its value and return it
in %eax. Four bytes in, four bytes out, easy peasy.
Now look at incr<Bar> (the case with the non-trivial destructor).
movl (%rsi), %eax
addl $1, %eax
movl %eax, (%rsi)
movl %eax, (%rdi)
movq %rdi, %rax
retq
Here, obj is not being passed in a register, even though it’s the same 4 bytes
with all the same semantics. Here, obj is being passed and returned by address.
So our caller has set up some space for the return value and given us a pointer
to that space in %rdi; and our caller has given us a pointer to the value of obj
in the next argument register %rsi. We fetch the value from (%rsi),
add 1 to it, store it back into (%rsi) (so as to update the value of obj itself),
and then (trivially) copy the 4 bytes of obj into the return slot pointed to
by %rdi. Finally, we copy the caller’s original pointer %rdi into %rax,
because the x86-64 ABI document
(page 22) says we have to.
The reason Bar behaves so differently from Foo is that Bar has a non-trivial
destructor, and the x86-64 ABI document
(page 19) says specifically:
If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference (the object is replaced in the parameter list by a pointer […]).
The later Itanium C++ ABI document defines a term of art:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.
[…]
A type is considered non-trivial for the purposes of calls if:
- it has a non-trivial copy constructor, move constructor, or destructor, or
- all of its copy and move constructors are deleted.
So that explains it: Bar gets worse codegen because it is passed by
invisible reference. It is passed by invisible reference because of the
unfortunate conjunction of two independent premises:
- the ABI document says that things with non-trivial destructors are passed by invisible reference, and
Barhas a non-trivial destructor.
By the way, this is a classical syllogism:
the first bullet point above is the major premise, and the second is the
minor premise. The conclusion is “Bar is passed by invisible reference.”
Suppose someone presents us with the syllogism
- All men are mortal.
- Socrates is a man.
- Therefore Socrates is mortal.
If we wish to quibble with the conclusion “Socrates is mortal”, we must rebut one of the premises: either rebut the major premise (maybe some men aren’t mortal) or rebut the minor premise (maybe Socrates isn’t a man).
To get Bar to be passed in registers (just like Foo), we must rebut
one or the other of our two premises. The standard-C++ way to do it is
simply to give Bar a trivial destructor, negating the minor premise.
But there is another way!
How [[trivial_abi]] solves the problem
Clang’s new trivial_abi attribute negates the major premise above.
Clang extends the ABI document to say essentially the following:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.
[…]
A type is considered non-trivial for the purposes of calls if it has not been marked
[[trivial_abi]]AND:
- it has a non-trivial copy constructor, move constructor, or destructor, or
- all of its copy and move constructors are deleted.
That is, even a class type with a non-trivial move constructor or destructor
will be considered trivial for the purposes of calls, if it has been
marked by the programmer as [[trivial_abi]].
So now (using Clang trunk) we can go back and write this:
#define TRIVIAL_ABI __attribute__((trivial_abi))
struct TRIVIAL_ABI Baz {
int value;
~Baz() {} // deliberately non-trivial
};
and compile incr<Baz>, and we get the same code
as incr<Foo>!
Caveat #1: [[trivial_abi]] is sometimes a no-op
I would hope that we could make “trivial-for-purposes-of-calls” wrappers around standard library types like this:
template<class T, class D>
struct TRIVIAL_ABI trivial_unique_ptr : std::unique_ptr<T, D> {
using std::unique_ptr<T, D>::unique_ptr;
};
Unfortunately, this doesn’t work. If your class has any base classes
or non-static data members which are themselves “non-trivial for purposes
of calls”, then Clang’s extension as currently written will make your class
sort of “irreversibly non-trivial” — the attribute will have no effect.
(It will not be diagnosed. This means you can use [[trivial_abi]] on a
class template such as optional and have it be “conditionally trivial”,
which is sometimes a useful feature. The downside, of course, is that you
might mark a class trivial and then find out later that the compiler was
giving you the silent treatment.)
The attribute will also be silently ignored if your class has virtual bases or virtual member functions. In these cases it probably won’t even fit in a register anyway, and I don’t know what you’re doing passing it around by value, but, just so you know.
So, as far as I know, the only ways to use TRIVIAL_ABI on “standard utility types”
such as optional<T>, unique_ptr<T>, and shared_ptr<T> are
- implement them from scratch yourself and apply the attribute, or
- break into your local libc++ and apply the attribute by hand there.
(In the open-source world, these are essentially the same thing anyway.)
Caveat #2: Destructor responsibility
In our Foo/Bar example, the class had a no-op destructor. Suppose we
gave our class a really non-trivial destructor?
struct Up1 {
int value;
Up1(Up1&& u) : value(u.value) { u.value = 0; }
~Up1() { puts("destroyed"); }
};
This should look familiar; it’s unique_ptr<int> stripped to its
bare essentials, and with printf standing in for delete.
Without TRIVIAL_ABI, incr<Up1> looks just like incr<Bar>:
movl (%rsi), %eax
addl $1, %eax
movl %eax, (%rdi)
movl $0, (%rsi)
movq %rdi, %rax
retq
With TRIVIAL_ABI added, incr<Up2> looks much bigger and scarier!
pushq %rbx
leal 1(%rdi), %ebx
movl $.L.str, %edi
callq puts
movl %ebx, %eax
popq %rbx
retq
Under the traditional calling convention, types with non-trivial destructors
are always passed by invisible reference, which means that the callee (incr
in our case) always receives a pointer to a parameter object that it does not own.
The caller owns the parameter object. This is what makes copy elision
work!
When a type with [[trivial_abi]] is passed in registers, we are essentially
making a copy of the parameter object. There is only one return register on
x86-64 (handwave), so the callee has no way to give that object back to us
when it’s finished. The callee must take ownership of the parameter object
we gave it! Which means that the callee must call the destructor of the
parameter object when it’s finished with it.
In our previous Foo/Bar/Baz examples, this destructor call was happening,
but it was a no-op, so we didn’t notice. Now in incr<Up2> we see the additional
code that is produced by a callee-side destructor.
It is conceivable that this extra code could add up, in certain use-cases.
However, counterpoint: this destructor call is not appearing out of nowhere!
It is being called in incr because it is not being called in incr’s caller.
So in general the costs and benefits might be expected to balance out.
Caveat #3: Destructor ordering
The destructor for the trivial-abi parameter will be called by the callee, not the caller (Caveat 2). Richard Smith points out that this means it will be called out of order with respect to the other parameters’ destructors.
struct TRIVIAL_ABI alpha {
alpha() { puts("alpha constructed"); }
~alpha() { puts("alpha destroyed"); }
};
struct beta {
beta() { puts("beta constructed"); }
~beta() { puts("beta destroyed"); }
};
void foo(alpha, beta) {}
int main() {
foo(alpha{}, beta{});
}
This code prints
alpha constructed
beta constructed
alpha destroyed
beta destroyed
when TRIVIAL_ABI is defined as [[clang::trivial_abi]], and prints
alpha constructed
beta constructed
beta destroyed
alpha destroyed
when TRIVIAL_ABI is defined away. Only the latter — with destruction in
reverse order of construction — is C++-standard-conforming.
Relation to “trivially relocatable” / “move-relocates”
None… well, some?
As you can see, there is no requirement that a [[trivial_abi]] class type
should have any particular semantics for its move constructor, its destructor,
or its default constructor. Any given class type will likely be trivially
relocatable, simply because most class types are trivially relocatable by
accident.
We can easily design an offset_ptr which is
super duper non-trivially relocatable:
template<class T>
class TRIVIAL_ABI offset_ptr {
intptr_t value_;
public:
offset_ptr(T *p) : value_((const char*)p - (const char*)this) {}
offset_ptr(const offset_ptr& rhs) : value_((const char*)rhs.get() - (const char*)this) {}
T *get() const { return (T *)((const char *)this + value_); }
offset_ptr& operator=(const offset_ptr& rhs) {
value_ = ((const char*)rhs.get() - (const char*)this);
return *this;
}
offset_ptr& operator+=(int diff) {
value_ += (diff * sizeof (T));
return *this;
}
};
int main() {
offset_ptr<int> top = &a[4];
top = incr(top);
assert(top.get() == &a[5]);
}
With TRIVIAL_ABI defined, Clang trunk passes this test at -O0 or -O1, but at -O2 (i.e., as soon
as it tries to inline the calls to trivial_offset_ptr::operator+= and
the copy constructor) it fails the assertion.
So there’s a caveat here too. If your type is doing something crazy with
the this pointer, you probably don’t want to be passing it in registers.
Filed 37319, essentially a documentation request.
In this case, it turns out there’s just no way to make the code do what the programmer intends.
We’re saying that the
value of value_ should depend on the value of the this pointer; but
at the caller–callee boundary, the object is in a register and there is
no this pointer! So when the callee spills it back to memory and gives it
a this pointer again, how should the callee compute the correct value to
put into value_? Maybe the better question is,
how does it even work at -O0? It shouldn’t work at all.
So anyway, if you’re going to use [[trivial_abi]], you must avoid having
member functions (not just special member functions, but any member functions)
that significantly depend on the object’s own address (for some hand-wavy value of
“significantly”).
The intuition here is that when a thing is marked [[trivial_abi]], then any time
you expect a copy you might actually get a copy plus memcpy: The “put it in a register
and then take it back out” operation is essentially tantamount to memcpy. And similarly
when you expect a move you might actually get a move plus memcpy.
Whereas, when a type is “trivially relocatable” (according to my definition from
this C++Now talk),
then any time you expect a copy and destroy you might actually get a memcpy. And similarly
when you expect a move and destroy you might actually get a memcpy. You actually lose calls
to special member functions when you’re talking about “trivial relocation”; whereas with the
Clang [[trivial_abi]] attribute you never lose calls. You just get (as if) memcpy in addition to the
calls you expected. This (as if) memcpy is the price you pay for a faster, register-based calling
convention.
