Field-testing “Down with lifetime extension!”
I hacked my local Clang to produce a warning every time Sema::checkInitializerLifetime
detected that lifetime extension was necessary.
(My patch is on GitHub here.)
test.cpp:31:16: warning: binding temporary of type 'int' to a reference
relies on lifetime extension [-Wlifetime-extension]
const int& i2 = 42;
^~
Then I ran it over the LLVM codebase by rebuilding Clang with itself (“How to build LLVM from source, monorepo version” (2019-11-09)). Here’s what I found.
False positive #1: for
loops lower to lifetime-extension
The first thing I had to do was insert a special case to prevent the diagnostic from firing
on ranged for
loops.
std::vector<int> getVector();
for (int i : getVector()) { ... }
is lowered by the front-end into the moral equivalent of
auto&& __range1 = getVector();
auto __begin1 = __range1.begin();
auto __end1 = __range1.end();
for ( ; __begin1 != __end1; ++__begin1) { ... }
Since copy elision exists, I cannot think of any reason for the lowering to use auto&&
instead of decltype(auto)
. But the language standard says auto&&
, so that’s what we get.
(EDIT: Mathias Stearn points out that if you wanted to respecify the feature using decltype(auto)
,
you’d also have to wrap the range expression in an extra pair of parens. decltype(auto) r1 = a;
and decltype(auto) r1 = (a);
have different semantics, and we want the latter.)
In this example, the lifetime of the temporary returned by getVector()
is extended to match
the lifetime of __range1
. But this happens “behind the scenes”; we don’t want to discourage
the use of ranged for
-loops in general. So we suppress the diagnostic in this case.
Incidentally, I had hoped that I could detect this case by looking at
VarDecl *VD = dyn_cast<VarDecl>(ExtendingEntity->getDecl());
if (VD && VD->isCXXForRangeDecl()) { ... }
but nope. I don’t know what Clang uses that bit for, but apparently it’s not for marking
lowered variables like __range1
. I ended up just looking for
VD->getNameAsString().substr(0,7) == "__range"
.
False positive #2: std::initializer_list
uses lifetime extension
LLVM’s AMDGPUISelLowering.cpp contains a lot of code like this:
SDValue Ops[2];
SDValue Ops[2] = { Div, Rem };
SDValue Ops[] = {Join, DAG.getNode(ISD::TokenFactor, SL, MVT::Other,
LoLoad.getValue(1), HiLoad.getValue(1))};
And then one instance like this:
auto Ops = {DAG.getConstant(0, SDLoc(), Op.getValueType()), Op.getOperand(0)};
My diagnostic fired on that line. The line turns out to be okay… but can you see why Clang thinks lifetime extension is happening here?
It’s because auto
applied to a braced initializer-list
deduces std::initializer_list
.
What we have here is the equivalent of
std::initializer_list<SDValue> Ops = { sd1, sd2 };
That lowers into something that’s impossible to spell in C++. The initializer_list
object named Ops
is just a pair of pointers. Those pointers point into a backing array of type const SDValue[2]
…
but that array is actually a temporary. (It cannot be static data, because the initial values
of sd1
and sd2
may vary at runtime.) The temporary array undergoes lifetime extension so that its
lifetime will match the lifetime of Ops
.
I consider that line of code in AMDGPUISelLowering.cpp to be poorly written; it would be better if it
said SDValue Ops[2] = ...
instead of auto Ops = ...
. I suspect that it is almost
always unintended for anyone ever to create a local variable of type std::initializer_list
.
But that’s a job for a different diagnostic.
False positive #3: Clang’s GSL diagnostics interfere with lifetime-extension bookkeeping
I observed that a line like
std::reverse_iterator<int*> rit = std::reverse_iterator<int*>();
would trigger my diagnostic, if and only if the <array>
header was included. I eventually tracked
down the cause to Sema::inferGslPointerAttribute
,
which adds some kind of GSL-related ownership-tracking attribute to this class if and only if it’s
known to be the aliasee of std::array<int, 1>::reverse_iterator
. And then that attribute gets used
somehow in Sema::checkInitializerLifetime
.
Anyway, I hacked around that by removing all the code in Sema::inferGslPointerAttribute
.
True positive #1: Redundant lifetime extension on iterators
LLVM contains lots of instances of this pattern:
const auto &Begin = Indices.begin();
const auto &End = Indices.end();
I think where this comes from is that people see
for (const auto &Elt : Indices)
and then when they need to break it out into a more complicated loop for some algorithmic reason,
they propagate the const&
-ness of the “loop variables” without thinking about it.
Essentially, I call this fuzzy thinking about iterators versus references. C++’s lifetime extension means that you can often use these two constructs interchangeably:
const auto &it = vec.front();
use(it);
const auto &it = vec.begin(); // lifetime-extended
use(*it);
But should you use them interchangeably? I say no.
I count this one tentatively as a success story for -Wlifetime-extension
: it found silly code that was technically
correct but could be made more readable by eliminating the use of lifetime extension.
True positive #2: Redundant lifetime extension on APInt
This is an interesting one, because it is a true positive that nevertheless can’t be improved by refactoring. LLVM contains lots of instances of this pattern:
const APInt &StartInt = StartC->getAPInt();
const APInt &StartRem = StartInt.urem(StepInt);
const APInt &C = SC->getAPInt();
const APInt &D = extractConstantWithoutWrapping(*this, C, Step);
In each of these cases, the first variable is initialized with SC->getAPInt()
, which returns
const APInt&
. The second variable is initialized with the result of some function that returns
a new APInt
by value. So the one of these declarations involves lifetime extension, and the
other doesn’t. Yet they look parallel.
My kneejerk reaction is that this is more of a win than a lose. It is nice that we can make both declarations look the same, because we’re going to use them pretty much the same way. The compiler takes care of the details in the most efficient way possible. This is good.
As in the for-loop case, we could rewrite it to use decltype(auto)
and stay just as efficient
and just as parallel:
decltype(auto) StartInt = StartC->getAPInt(); // by reference
decltype(auto) StartRem = StartInt.urem(StepInt); // by value
But this syntax hides the fact that both of them are APInt
s, so it would never fly
with LLVM coding style, nor with me.
I count this as a good use of lifetime extension. I can’t see how avoiding lifetime extension could possibly improve this code.
True positive #3: Conditionally redundant lifetime extension in template code
In SetOperations.h we find the following template:
template <class S1Ty, class S2Ty>
void set_intersect(S1Ty &S1, const S2Ty &S2) {
for (typename S1Ty::iterator I = S1.begin(); I != S1.end();) {
const auto &E = *I;
++I;
if (!S2.count(E)) S1.erase(E); // Erase element if not in S2
}
}
If we’re intersecting std::set<std::string>
, it certainly makes sense to take a reference to
*I
, instead of copying it. But LLVM also provides sets like
llvm::SmallPtrSet<llvm::Value*, 4>
which are optimized for storing small trivially copyable elements while avoiding template bloat,
and in those cases, *I
actually returns by value!
const PtrTy operator*() const {
[...]
return PtrTraits::getFromVoidPointer(const_cast<void*>(*Bucket));
}
When *I
returns a value, that temporary value is bound to E
and lifetime-extended.
The spurious
const
at the front of that declaration is leftover cruft from a July 2007 commit that sprinkledconst
randomly, in utter mental confusion betweenT const*
andT *const
. For guidelines about when to useconst
, see “const
is a contract” (2019-01-03). This digression is completely irrelevant to lifetime extension.
We could avoid lifetime extension in set_intersect
by using decltype(auto) E
instead of const auto &E
.
However, const auto&
gives the reader the additional information that we do not plan to modify E
.
There is no such thing as decltype(const auto)
.
Also, the line const auto& E = *I;
is completely idiomatic C++, and it would be silly to change it
(A) just to avoid lifetime extension (B) in the presence of wacky iterator types.
If any line of code is at fault here, it’s SmallPtrSetIterator
for having a reference
typedef
that’s not a reference type.
I count this as an unavoidable use of lifetime extension,
although I would be moderately sympathetic to a compiler warning that fired in this case
and forced someone to grapple with the arguably unidiomatic design of llvm::SmallPtrSetIterator
.
True positive #4: Refactoring-induced lifetime extension on std::string
LLVM contains many instances of this pattern:
const std::string &LibPath =
std::string(GCCInstallation.getParentLibPath());
This looked silly enough that I went searching for the reason. It turns out that this line used to look saner:
const std::string &LibPath =
GCCInstallation.getParentLibPath();
Prior to November 2011, GCCInstallation.getParentLibPath()
actually returned const std::string&
, and so
this line did exactly what it looked like, and it was efficient.
But in November 2011, getParentLibPath()
was changed
to return llvm::StringRef
instead of const std::string&
. Since StringRef
was implicitly convertible to
std::string
, this line still compiled. But now it was quietly making a copy of the referenced string
and lifetime-extending that temporary copy.
Then, in January 2020, they decided that having an implicit conversion from StringRef
to string
was a bad idea.
(std::string_view
doesn’t have such a conversion, either.) So
a machine-generated refactoring
added explicit casts to std::string
around all the places where a StringRef
was being implicitly converted.
And thus we end up with the silly-looking line above, which gets a StringRef
, copies its contents into a
temporary std::string
, and then lifetime-extends the temporary by binding it to a reference variable.
What we actually want here is simply
StringRef LibPath =
GCCInstallation.getParentLibPath();
I count this one as an unqualified success story for -Wlifetime-extension
: it found silly code that was technically
correct but could be made more readable and more efficient by eliminating the use of lifetime extension.
True positive #5: Examples of utterly bogus lifetime extension
I don’t want to leave you with the impression that every usage of lifetime extension in LLVM is “interesting.” Here are some snippets where using lifetime extension is (technically correct but) obviously wrong:
auto &&ExtraVectorization = [this](Instruction *I, BoUpSLP &R) -> bool {
return tryToVectorize(I, R);
};
AliasAnalysis.h.
This was caused by refactoring; Loc
used to be a by-reference parameter.
ModRefInfo getModRefInfo(const Instruction *I,
const Optional<MemoryLocation> &OptLoc) {
const MemoryLocation &Loc = OptLoc.getValueOr(MemoryLocation());
Lint.cpp.
Prior to a February 2013 refactoring
that made getAttributes()
return by value instead of by const reference, this code correctly avoided a copy.
After the refactoring, this code relies on lifetime extension:
const AttributeList &PAL = CI->getAttributes();
Conclusion
If -Wlifetime-extension
existed as a clang-tidy diagnostic, I would use it. Lifetime extension is usually
unintentional, and therefore a diagnostic that calls it out will likely call out poorly understood areas of the
code (as in our false positive #2 and true positives #1, #4, and #5). But I have seen some good uses of
lifetime extension too (as in true positives #2, #3, and arguably by enabling #4).
There are certainly a lot of cases of lifetime extension in the wild.