[[trivial_abi]] 101

Finally, a blog post on [[trivial_abi]]!

This is a brand-new feature in Clang trunk, new as of about February 2018. It is a vendor extension to the C++ language — it is not standard C++, it isn’t supported by GCC trunk, and there is no active WG21 proposal to add it to the standard C++ language, as far as I know.

Full disclosure: I am totally not involved in the implementation of this feature. I’m just watching its patches go by on the cfe-commits mailing list and applauding quietly to myself. But this is such a cool feature that I think everyone should know about it.

Okay, first of all, since this is a non-standard attribute, Clang trunk doesn’t actually support it under the standard attribute spelling [[trivial_abi]]. Instead, you must use one of the following vendor-specific spellings:

  • __attribute__((trivial_abi))
  • __attribute__((__trivial_abi__))
  • [[clang::trivial_abi]]

Also, being an attribute, the compiler will be super picky about where you put it — and passive-aggressively quiet if you accidentally put it in the wrong place (because unrecognized attributes are supposed to be quietly ignored). This is one of those “it’s a feature, not a bug!” situations. So the proper syntax, all in one place, is:

#define TRIVIAL_ABI __attribute__((trivial_abi))

class TRIVIAL_ABI Widget {
    // ...
};
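
For code that also has to build with compilers lacking the attribute, one option is to define the macro only where support is advertised. This is a minimal sketch, assuming a compiler that provides `__has_cpp_attribute` (recent Clang and GCC do). The members of `Widget` and the helpers `doubled` and `check_widget` are made up for illustration.

```cpp
#include <cassert>

// Use the attribute only where the compiler advertises it. Elsewhere the
// macro expands to nothing, so the class still compiles (without the ABI change).
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#endif

// The attribute goes between the class-key and the class name.
class TRIVIAL_ABI Widget {
    int value_;
public:
    explicit Widget(int v) : value_(v) {}
    ~Widget() {}  // user-provided, hence non-trivial
    int value() const { return value_; }
};

// Pass and return by value: the situation the attribute is meant to affect.
inline Widget doubled(Widget w) { return Widget(w.value() * 2); }

// Illustrative self-check; behavior is the same with or without the attribute.
inline void check_widget() {
    Widget w = doubled(Widget(21));
    assert(w.value() == 42);
}
```

Note that this guards against an unsupported compiler, not against a misplaced attribute; putting the macro in the wrong spot is still on you.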

What is the problem being solved?

Remember my blog post from 2018-04-17 where I showed two versions of a class (there called Integer):

struct Foo {
    int value;
    ~Foo() = default; // trivial
};

struct Bar {
    int value;
    ~Bar() {} // deliberately non-trivial
};

In that post’s particular code snippet, the compiler produced worse codegen for Foo than it did for Bar. This was worth blogging about because it was surprising. Programmers intuitively expect that the “trivial” code will do better than the “non-trivial” code. In most situations, this is true. Specifically, this is true when we go to do a function call or return:

template<class T>
T incr(T obj) {
    obj.value += 1;
    return obj;
}

incr<Foo> compiles into the following code:

leal   1(%rdi), %eax
retq

(leal is x86-speak for “add”.) We can see that our 4-byte obj will be passed in to incr<Foo> in the %edi register; and then we’ll add 1 to its value and return it in %eax. Four bytes in, four bytes out, easy peasy.

Now look at incr<Bar> (the case with the non-trivial destructor).

movl   (%rsi), %eax
addl   $1, %eax
movl   %eax, (%rsi)
movl   %eax, (%rdi)
movq   %rdi, %rax
retq

Here, obj is not being passed in a register, even though it’s the same 4 bytes with all the same semantics. Here, obj is being passed and returned by address. So our caller has set up some space for the return value and given us a pointer to that space in %rdi; and our caller has given us a pointer to the value of obj in the next argument register %rsi. We fetch the value from (%rsi), add 1 to it, store it back into (%rsi) (so as to update the value of obj itself), and then (trivially) copy the 4 bytes of obj into the return slot pointed to by %rdi. Finally, we copy the caller’s original pointer %rdi into %rax, because the x86-64 ABI document (page 22) says we have to.
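
The two conventions can be pictured as hand-written C++. This is only an analogy for the register-level behavior described above, not something the compiler literally produces; the function names `incr_in_register`, `incr_by_address`, and `check_lowering` are hypothetical.

```cpp
#include <cassert>

struct Bar {
    int value;
    ~Bar() {}  // deliberately non-trivial
};

// Roughly what incr<Foo> does: the 4 bytes arrive in a register and the
// result leaves in a register.
inline int incr_in_register(int value) {
    return value + 1;
}

// Roughly what incr<Bar> does: `ret` plays the role of %rdi (the return slot)
// and `obj` plays the role of %rsi (the caller-owned parameter object).
inline Bar* incr_by_address(Bar* ret, Bar* obj) {
    obj->value += 1;          // update the parameter object in place
    ret->value = obj->value;  // copy it into the return slot
    return ret;               // hand the return-slot pointer back (%rax)
}

// Illustrative self-check of the analogy.
inline void check_lowering() {
    assert(incr_in_register(41) == 42);

    Bar arg{41};
    Bar slot{0};
    Bar* result = incr_by_address(&slot, &arg);
    assert(result == &slot);
    assert(slot.value == 42);
    assert(arg.value == 42);  // the caller's object was modified too
}
```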

The reason Bar behaves so differently from Foo is that Bar has a non-trivial destructor, and the x86-64 ABI document (page 19) says specifically:

If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference (the object is replaced in the parameter list by a pointer […]).

The later Itanium C++ ABI document defines a term of art:

If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.

[…]

A type is considered non-trivial for the purposes of calls if:

  • it has a non-trivial copy constructor, move constructor, or destructor, or
  • all of its copy and move constructors are deleted.
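
Standard type traits can show which side of this definition our two classes fall on. The sketch below is a compile-time check only. The traits describe language-level triviality, which for Foo and Bar is the property the ABI rule keys on.

```cpp
#include <type_traits>

struct Foo {
    int value;
    ~Foo() = default;  // trivial
};

struct Bar {
    int value;
    ~Bar() {}  // deliberately non-trivial
};

// Foo's destructor is trivial; Bar's is user-provided and therefore not.
static_assert(std::is_trivially_destructible<Foo>::value, "Foo should be trivially destructible");
static_assert(!std::is_trivially_destructible<Bar>::value, "Bar should not be trivially destructible");

// The two types are otherwise the same 4 bytes.
static_assert(sizeof(Foo) == sizeof(Bar), "Foo and Bar should have the same size");
```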

So that explains it: Bar gets worse codegen because it is passed by invisible reference. It is passed by invisible reference because of the unfortunate conjunction of two independent premises:

  • the ABI document says that things with non-trivial destructors are passed by invisible reference, and
  • Bar has a non-trivial destructor.

By the way, this is a classical syllogism: the first bullet point above is the major premise, and the second is the minor premise. The conclusion is “Bar is passed by invisible reference.”

Suppose someone presents us with the syllogism

  • All men are mortal.
  • Socrates is a man.
  • Therefore Socrates is mortal.

If we wish to quibble with the conclusion “Socrates is mortal”, we must rebut one of the premises: either rebut the major premise (maybe some men aren’t mortal) or rebut the minor premise (maybe Socrates isn’t a man).

To get Bar to be passed in registers (just like Foo), we must rebut one or the other of our two premises. The standard-C++ way to do it is simply to give Bar a trivial destructor, negating the minor premise. But there is another way!

How [[trivial_abi]] solves the problem

Clang’s new trivial_abi attribute negates the major premise above. Clang extends the ABI document to say essentially the following:

If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.

[…]

A type is considered non-trivial for the purposes of calls if it has not been marked [[trivial_abi]] AND:

  • it has a non-trivial copy constructor, move constructor, or destructor, or
  • all of its copy and move constructors are deleted.

That is, even a class type with a non-trivial move constructor or destructor will be considered trivial for the purposes of calls, if it has been marked by the programmer as [[trivial_abi]].

So now (using Clang trunk) we can go back and write this:

#define TRIVIAL_ABI __attribute__((trivial_abi))

struct TRIVIAL_ABI Baz {
    int value;
    ~Baz() {} // deliberately non-trivial
};

and compile incr<Baz>, and we get the same code as incr<Foo>!
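
It may help to see that the attribute changes only the calling convention, not how the language itself classifies the type. A small sketch, with the macro guarded so that it also compiles where the attribute is unavailable (the guard and `check_baz` are my additions, not part of the snippet above):

```cpp
#include <cassert>
#include <type_traits>

#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#endif

struct TRIVIAL_ABI Baz {
    int value;
    ~Baz() {}  // deliberately non-trivial
};

template<class T>
T incr(T obj) {
    obj.value += 1;
    return obj;
}

// As far as the language is concerned, the destructor is still non-trivial.
static_assert(!std::is_trivially_destructible<Baz>::value, "attribute does not make the destructor trivial");

// Illustrative self-check: the observable result is the same either way.
inline void check_baz() {
    Baz b = incr(Baz{41});
    assert(b.value == 42);
}
```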

Caveat #1: [[trivial_abi]] is sometimes a no-op

I would hope that we could make “trivial-for-purposes-of-calls” wrappers around standard library types like this:

template<class T, class D>
struct TRIVIAL_ABI trivial_unique_ptr : std::unique_ptr<T, D> {
    using std::unique_ptr<T, D>::unique_ptr;
};

Unfortunately, this doesn’t work. If your class has any base classes or non-static data members which are themselves “non-trivial for purposes of calls”, then Clang’s extension as currently written will make your class sort of “irreversibly non-trivial” — the attribute will have no effect. (It will not be diagnosed. This means you can use [[trivial_abi]] on a class template such as optional and have it be “conditionally trivial”, which is sometimes a useful feature. The downside, of course, is that you might mark a class trivial and then find out later that the compiler was giving you the silent treatment.)
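
Here is a sketch of what "conditionally trivial" might look like for a tiny wrapper template. `holder`, `pass_through`, and `check_holder` are hypothetical names. Going by the rule described above, the attribute would take effect for `holder<int>` and be ignored for `holder<std::string>`; the program's observable behavior is the same in both cases.

```cpp
#include <cassert>
#include <string>

#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#endif

template<class T>
struct TRIVIAL_ABI holder {
    T value;
    ~holder() {}  // user-provided, so holder<T> is never trivial by default
};

// By-value parameter and return: the place where the calling convention matters.
template<class T>
holder<T> pass_through(holder<T> h) { return h; }

inline void check_holder() {
    // int is trivial for purposes of calls, so the attribute can apply here.
    holder<int> a = pass_through(holder<int>{7});
    // std::string is not, so per the rule above the attribute has no effect here.
    holder<std::string> b = pass_through(holder<std::string>{"seven"});
    assert(a.value == 7);
    assert(b.value == "seven");
}
```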

The attribute will also be silently ignored if your class has virtual bases or virtual member functions. In these cases it probably won’t even fit in a register anyway, and I don’t know what you’re doing passing it around by value, but, just so you know.

So, as far as I know, the only ways to use TRIVIAL_ABI on “standard utility types” such as optional<T>, unique_ptr<T>, and shared_ptr<T> are

  • implement them from scratch yourself and apply the attribute, or
  • break into your local libc++ and apply the attribute by hand there.

(In the open-source world, these are essentially the same thing anyway.)
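
As a rough idea of the "from scratch" route, here is a minimal owning pointer with the attribute applied. It is a sketch under obvious assumptions (no custom deleter, no arrays, no reset or release), and `owning_ptr`, `read_and_release`, and `check_owning_ptr` are made-up names, not anything from libc++.

```cpp
#include <cassert>
#include <utility>

#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#endif

// Hypothetical minimal owning pointer; not a drop-in for std::unique_ptr.
template<class T>
class TRIVIAL_ABI owning_ptr {
    T* ptr_;
public:
    explicit owning_ptr(T* p = nullptr) noexcept : ptr_(p) {}
    owning_ptr(owning_ptr&& rhs) noexcept : ptr_(rhs.ptr_) { rhs.ptr_ = nullptr; }
    owning_ptr& operator=(owning_ptr&& rhs) noexcept {
        if (this != &rhs) {
            delete ptr_;
            ptr_ = rhs.ptr_;
            rhs.ptr_ = nullptr;
        }
        return *this;
    }
    ~owning_ptr() { delete ptr_; }

    T* get() const noexcept { return ptr_; }
    T& operator*() const { return *ptr_; }
};

// Takes ownership by value and reads the pointee before the parameter dies.
inline int read_and_release(owning_ptr<int> p) { return *p; }

inline void check_owning_ptr() {
    owning_ptr<int> p(new int(7));
    owning_ptr<int> q(std::move(p));
    assert(p.get() == nullptr);  // the move constructor nulled the source
    assert(*q == 7);
    assert(read_and_release(std::move(q)) == 7);
    assert(q.get() == nullptr);  // ownership went into the parameter
}
```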

Caveat #2: Destructor responsibility

In our Foo/Bar example, the class had a no-op destructor. Suppose we gave our class a really non-trivial destructor?

struct Up1 {
    int value;
    Up1(Up1&& u) : value(u.value) { u.value = 0; }
    ~Up1() { puts("destroyed"); }
};

This should look familiar; it’s unique_ptr<int> stripped to its bare essentials, and with puts standing in for delete.

Without TRIVIAL_ABI, incr<Up1> looks just like incr<Bar>:

movl   (%rsi), %eax
addl   $1, %eax
movl   %eax, (%rdi)
movl   $0, (%rsi)
movq   %rdi, %rax
retq

With TRIVIAL_ABI added (call the attributed version Up2), incr<Up2> looks much bigger and scarier!

pushq  %rbx
leal   1(%rdi), %ebx
movl   $.L.str, %edi
callq  puts
movl   %ebx, %eax
popq   %rbx
retq

Under the traditional calling convention, types with non-trivial destructors are always passed by invisible reference, which means that the callee (incr in our case) always receives a pointer to a parameter object that it does not own. The caller owns the parameter object. This is what makes copy elision work!

When a type with [[trivial_abi]] is passed in registers, we are essentially making a copy of the parameter object. There is only one return register on x86-64 (handwave), so the callee has no way to give that object back to us when it’s finished. The callee must take ownership of the parameter object we gave it! Which means that the callee must call the destructor of the parameter object when it’s finished with it.

In our previous Foo/Bar/Baz examples, this destructor call was happening, but it was a no-op, so we didn’t notice. Now in incr<Up2> we see the additional code that is produced by a callee-side destructor.

It is conceivable that this extra code could add up, in certain use-cases.

However, counterpoint: this destructor call is not appearing out of nowhere! It is being called in incr because it is not being called in incr’s caller. So in general the costs and benefits might be expected to balance out.
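
The "balance" can be checked by counting. In this sketch (the `Tracker` type, the counters, and the function names are all hypothetical), every constructed object is destroyed exactly once by the end of the statement, no matter which side of the call runs the destructor.

```cpp
#include <cassert>

#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#endif

static int constructed = 0;  // counts constructor calls
static int destroyed = 0;    // counts destructor calls

struct TRIVIAL_ABI Tracker {
    int value;
    explicit Tracker(int v) : value(v) { ++constructed; }
    Tracker(const Tracker& rhs) : value(rhs.value) { ++constructed; }
    ~Tracker() { ++destroyed; }
};

// By-value parameter: destroyed by the callee when the attribute is in
// effect, and by the caller otherwise.
static int consume(Tracker t) { return t.value; }

static void check_destructor_balance() {
    constructed = 0;
    destroyed = 0;
    int v = consume(Tracker(5));
    assert(v == 5);
    assert(constructed >= 1);
    assert(destroyed == constructed);  // no call gained, no call lost
}
```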

Caveat #3: Destructor ordering

The destructor for the trivial-abi parameter will be called by the callee, not the caller (Caveat 2). Richard Smith points out that this means it will be called out of order with respect to the other parameters’ destructors.

struct TRIVIAL_ABI alpha {
    alpha() { puts("alpha constructed"); }
    ~alpha() { puts("alpha destroyed"); }
};
struct beta {
    beta() { puts("beta constructed"); }
    ~beta() { puts("beta destroyed"); }
};
void foo(alpha, beta) {}
int main() {
    foo(alpha{}, beta{});
}

This code prints

alpha constructed
beta constructed
alpha destroyed
beta destroyed

when TRIVIAL_ABI is defined as [[clang::trivial_abi]], and prints

alpha constructed
beta constructed
beta destroyed
alpha destroyed

when TRIVIAL_ABI is defined away. Only the latter — with destruction in reverse order of construction — is C++-standard-conforming.

Relation to “trivially relocatable” / “move-relocates”

None… well, some?

As you can see, there is no requirement that a [[trivial_abi]] class type should have any particular semantics for its move constructor, its destructor, or its default constructor. Any given class type will likely be trivially relocatable, simply because most class types are trivially relocatable by accident.

We can easily design an offset_ptr which is super duper non-trivially relocatable:

template<class T>
class TRIVIAL_ABI offset_ptr {
    intptr_t value_;
public:
    offset_ptr(T *p) : value_((const char*)p - (const char*)this) {}
    offset_ptr(const offset_ptr& rhs) : value_((const char*)rhs.get() - (const char*)this) {}
    T *get() const { return (T *)((const char *)this + value_); }
    offset_ptr& operator=(const offset_ptr& rhs) {
        value_ = ((const char*)rhs.get() - (const char*)this);
        return *this;
    }
    offset_ptr& operator+=(int diff) {
        value_ += (diff * sizeof (T));
        return *this;
    }
};

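int a[10];

// Assumed helper: the earlier incr template touches .value directly,
// so this snippet needs a version that goes through operator+= instead.
template<class T>
offset_ptr<T> incr(offset_ptr<T> obj) {
    obj += 1;
    return obj;
}

int main() {
    offset_ptr<int> top = &a[4];
    top = incr(top);
    assert(top.get() == &a[5]);
}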

Here’s the full code.

With TRIVIAL_ABI defined, Clang trunk passes this test at -O0 or -O1, but at -O2 (i.e., as soon as it tries to inline the calls to offset_ptr::operator+= and the copy constructor) it fails the assertion.

So there’s a caveat here too. If your type is doing something crazy with the this pointer, you probably don’t want to be passing it in registers.

I filed this as bug 37319, essentially a documentation request. In this case, it turns out there’s just no way to make the code do what the programmer intends. We’re saying that the value of value_ should depend on the value of the this pointer; but at the caller–callee boundary, the object is in a register and there is no this pointer! So when the callee spills it back to memory and gives it a this pointer again, how should the callee compute the correct value to put into value_? Maybe the better question is, how does it even work at -O0? It shouldn’t work at all.

So anyway, if you’re going to use [[trivial_abi]], you must avoid having member functions (not just special member functions, but any member functions) that significantly depend on the object’s own address (for some hand-wavy value of “significantly”).

The intuition here is that when a thing is marked [[trivial_abi]], then any time you expect a copy you might actually get a copy plus memcpy: the “put it in a register and then take it back out” operation is tantamount to memcpy. And similarly when you expect a move you might actually get a move plus memcpy.

Whereas, when a type is “trivially relocatable” (according to my definition from this C++Now talk), then any time you expect a copy and destroy you might actually get a memcpy. And similarly when you expect a move and destroy you might actually get a memcpy. You actually lose calls to special member functions when you’re talking about “trivial relocation”; whereas with the Clang [[trivial_abi]] attribute you never lose calls. You just get (as if) memcpy in addition to the calls you expected. This (as if) memcpy is the price you pay for a faster, register-based calling convention.
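
To make the "(as if) memcpy" step concrete, here is a sketch that mimics the self-relative arithmetic with plain integers, so that copying the bits is well-defined. The helpers `encode`, `decode`, and `check_self_relative` are hypothetical; they only imitate what a self-relative pointer stores and how it resolves.

```cpp
#include <cassert>
#include <cstdint>

// Store a target as an offset from the slot's own address.
inline std::uintptr_t encode(const void* self, const void* target) {
    return reinterpret_cast<std::uintptr_t>(target) - reinterpret_cast<std::uintptr_t>(self);
}

// Resolve an offset relative to the address the slot lives at now.
inline std::uintptr_t decode(const void* self, std::uintptr_t offset) {
    return reinterpret_cast<std::uintptr_t>(self) + offset;
}

inline void check_self_relative() {
    int target = 0;
    std::uintptr_t slots[2];

    slots[0] = encode(&slots[0], &target);  // computed relative to slots[0]
    slots[1] = slots[0];                    // bitwise copy, like a trip through a register

    const std::uintptr_t expected = reinterpret_cast<std::uintptr_t>(&target);
    assert(decode(&slots[0], slots[0]) == expected);  // the original still resolves
    assert(decode(&slots[1], slots[1]) != expected);  // the relocated bits do not
}
```

The second assertion is the whole problem in miniature: the bits survived the copy, but their meaning depended on where they used to live.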

Posted 2018-05-02