When can the C++ compiler devirtualize a call?

Someone recently asked me about devirtualization optimizations: when do they happen? when can we rely on devirtualization? do different compilers do devirtualization differently? As usual, this led me down an experimental rabbit-hole. The answer seems to be: Modern compilers devirtualize calls to final methods pretty reliably. But there are many interesting corner cases — including some I haven’t thought of, I’m sure! — and different compilers do catch different subsets of those corner cases.

First, let’s observe that devirtualization can (probably?) be done more effectively via LTO, using whole-program analysis. I don’t know anything about the state of the art in link-time devirtualization, and it’s hard to experiment with on Compiler Explorer, so I’m not going to talk about LTO at all. We’re looking purely at what the compiler itself can do.

There are basically two situations where the compiler knows enough to devirtualize. They don’t have much in common:

When we know the instance’s dynamic type

The archetypical case here is

void test() {
    Apple o;
    o.f();
}

It doesn’t matter if Apple::f is virtual; all virtual dispatch ever does is invoke the method on the actual dynamic type of the object, and here we know the actual dynamic type is exactly Apple. Static and dynamic dispatch should give us the same result in this case.

A sufficiently smart compiler will use dataflow analysis to optimize non-trivial cases such as

Derived d;
Base *p = &d;
p->f();

It turns out that even this simple dodge is enough to fool MSVC and ICC. The next test case is

Derived da, db;
Base *p = cond ? &da : &db;
p->f();

This is too much for Clang, but GCC actually manages to survive it… until you move the conversions to Base* inside the conditional! Here is where even GCC’s analysis fails (Godbolt):

Derived da, db;
Base *p = cond ? (Base*)&da : (Base*)&db;
p->f();

When we know a “proof of leafness” for its static type

Okay, let’s suppose that we’re receiving a pointer from somewhere else in the system. We know its static type (e.g. Derived*), but we don’t know the actual dynamic type of the object instance to which it points. Still, the compiler can devirtualize a call to Derived::f if it can somehow prove that no type in the entire program can ever override Derived::f.

Proof-by-`final`

The simplest “proof of leafness” is if you’ve marked Derived as final.

struct Base {
    virtual int f();
};
struct Derived final : public Base {
    int f() override { return 2; }
};
int test(Derived *p) {
    return p->f();
}

A pointer of type Derived* must point to an object instance that is “at least Derived” — i.e., Derived or one of its children. Since Derived is final, it isn’t allowed to have children; therefore the dynamic type of the instance must be exactly Derived, and the compiler can devirtualize this call.

Or, you can mark the specific method Derived::f as final.

The same analysis should apply no matter whether Derived::f is declared in Derived itself, or inherited from Base. So for example the compiler should be equally able to devirtualize

struct Base {
    virtual int f() { return 1; }
};
struct Derived final : public Base {};
int test(Derived *p) {
    return p->f();
}

GCC, Clang, and MSVC pass this test (Godbolt, case one); ICC 21.1.9 is fooled.

An utterly bizarre proof-of-leafness is to observe that when class C’s destructor is final, C must be childless — because if C had a child, the child would have to have a destructor (since you can’t make a class without a destructor), which would then override C’s destructor, which isn’t allowed. Clang actually both warns on final destructors, and optimizes on them. Every other vendor considers this situation very silly and doesn’t dignify it with a codepath as far as I can tell.

Proof-by-internal-linkage

A class whose name has internal linkage cannot be named outside the current translation unit. Therefore, it cannot be derived from outside the current translation unit, either! As long as it has no children in the current TU — or at least no children that override its methods — calls to its virtual functions are devirtualizable.

namespace {
    class BaseImpl : public Base {};
}
int test(Base *p) {
    return static_cast<BaseImpl*>(p)->f();
}

If p really does point to an object instance that is “at least BaseImpl,” then the compiler can prove that the instance must be exactly BaseImpl. (And if p doesn’t point to an instance that is “at least BaseImpl,” the program has undefined behavior anyway.)

This strikes me as a case that might actually come up pretty commonly in real codebases. It’s common to have a base class exposed publicly in the header file, and then one or more derived implementations scoped tightly to a single .cpp file. If you go the extra mile and put those derived implementations into anonymous namespaces, you might be helping out the compiler’s devirtualization logic. Of course, by definition, any such benefit will be limited to that single .cpp file!

Another way a type’s name can get internal linkage is when it’s a class template instantiation where one of the template parameters involves a name with internal linkage. If the name T has internal linkage, then E<T> also has internal linkage, even if E itself has external linkage — because you can’t name E<T> without naming T. (Notice that here T must be a “true name”; we’re not talking about type aliases.)

It’s also possible to make a type whose name has external linkage, but where the compiler can prove that the type must be incomplete in every other TU. For example,

namespace {
    class Internal {};
}
class External { Internal m; };

Any other TU is allowed to forward-declare class External; as an incomplete type, but those TUs can’t ever complete the type because they can’t name the types of its data members. You can’t derive from an incomplete type. So all types derived from External (if any) must be present in this TU; if there are none here, well, that’s a proof-of-leafness! GCC alone detects this situation.

Table of results

Godbolt for the known-dynamic-type case; Godbolt for the proof-of-leafness case. In the latter, I’ve made separate tests for Derived::f-defined-directly-in-Derived and Derived::g-inherited-from-Base. GCC frequently gets f right but fails to devirtualize g. I’ve filed GCC bug #99093 about that.

Test case	Point	GCC	Clang	MSVC	ICC
`one`	trivial	✓	✓	✓	✓
`two`	cast to `Base*`	✓	✓
`three`	conditional, then cast	✓
`four`	cast, then conditional
`one`	final class	✓	✓	✓	`f`
`two`	final method	✓	✓	✓	✓
`three`	silly final destructor		✓
`four`	silly old-school trick
`five`	I.L. class	`f`
`six`	I.L. template parameter	`f`
`seven`	I.L. base	`f`
`eight`	I.L. member	`f`
`nine`	I.L. with child	`f`
`ten`	local class	`f`

Steve Dewhurst taught me four’s “silly old-school trick”: Virtual bases are always constructed in the context of the most-derived class. So, if class C has a virtual base all of whose constructors are private, then no child of C will ever be able to construct itself, and therefore no child of C can exist. (Of course that virtual base will have to list C among its friends, in order for C itself to be constructible.) I think this trick is foolproof, and thus constitutes a (very silly) proof-of-leafness for C; but of course no compiler cares to trace through this logical tangle, even if it is foolproof.

Can you think of some way to construct a “proof-of-leafness” that I’ve missed? Tell me about it!

Posted 2021-02-15