In 2022 we saw a lot of interest (finally!) in the costs of std::move and std::forward.
For example, in April Richard Smith landed -fbuiltin-std-forward in Clang;
in September Vittorio Romeo lamented “The sad state of debug performance in C++”;
and in December the MSVC team landed [[msvc::intrinsic]].

Recall that std::forward<Arg>(arg) should be used only on forwarding references,
and that when you do, it’s exactly equivalent to static_cast<Arg&&>(arg), or equivalently decltype(arg)(arg).
But historically std::forward has been vastly more expensive to compile, because as far as the compiler is concerned,
it’s just a function template that needs to be instantiated, codegenned, inlined, and so on.
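
To make that equivalence concrete, here is a minimal sketch, assuming C++17. The names Category, classify, via_forward, via_static_cast, via_decltype, and all_spellings_agree are hypothetical, invented only for this illustration. Each wrapper takes a forwarding reference and passes it along using one of the three spellings; the static_asserts check that all three produce an expression of type Arg&& with the same value category.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical helper: reports the value category of the argument it receives.
enum class Category { Lvalue, Rvalue };
constexpr Category classify(int&)  { return Category::Lvalue; }
constexpr Category classify(int&&) { return Category::Rvalue; }

// Spelling 1: the library function template.
template<class Arg>
constexpr Category via_forward(Arg&& arg) {
    static_assert(std::is_same_v<decltype(std::forward<Arg>(arg)), Arg&&>);
    return classify(std::forward<Arg>(arg));
}

// Spelling 2: a plain cast, with nothing for the compiler to instantiate.
template<class Arg>
constexpr Category via_static_cast(Arg&& arg) {
    static_assert(std::is_same_v<decltype(static_cast<Arg&&>(arg)), Arg&&>);
    return classify(static_cast<Arg&&>(arg));
}

// Spelling 3: a functional-style cast to the declared type of `arg`,
// which is Arg&& after reference collapsing.
template<class Arg>
constexpr Category via_decltype(Arg&& arg) {
    static_assert(std::is_same_v<decltype(decltype(arg)(arg)), Arg&&>);
    return classify(decltype(arg)(arg));
}

// Compile-time check that the three spellings agree for lvalues and rvalues.
constexpr bool all_spellings_agree() {
    int x = 0;
    return via_forward(x)                == Category::Lvalue
        && via_static_cast(x)            == Category::Lvalue
        && via_decltype(x)               == Category::Lvalue
        && via_forward(std::move(x))     == Category::Rvalue
        && via_static_cast(std::move(x)) == Category::Rvalue
        && via_decltype(std::move(x))    == Category::Rvalue;
}
static_assert(all_spellings_agree());
```

The only difference this sketch is meant to highlight is where the work happens: the first wrapper calls a function template, while the other two are casts built into the language.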

Way back in March 2015 (seven and a half years ago!) Louis Dionne did a little compile-time benchmark
and found that he could win “an improvement of […] about 13.9%” simply by search-and-replacing all of
Boost.Hana’s std::forwards into static_casts. So he did that.

Now, these days, Clang understands std::forward just like it understands strlen. (You can disable that
clever behavior with -fno-builtin-strlen, -fno-builtin-std-forward.) As I understand it, this means that
Clang will avoid generating debug info for instantiations of std::forward, and also inline it into the AST
more eagerly. Basically, Clang can short-circuit some of the compile-time cost of std::forward. But does
it short-circuit enough of the cost to win back Louis’s 13.9% improvement? Would that patch from 2015
still pass muster today? Let’s reproduce Louis’s benchmark numbers and find out!