What breaks without implicit T*-to-bool conversion?

Pop quiz: Does this program return 1 or 2?

int f(bool) { return 1; }
int f(std::string_view) { return 2; }
int main() { return f("hello"); }

It returns 1. Overload resolution sees that we have a choice of converting "hello" (effectively a const char*) to either bool or string_view. The former is a built-in standard conversion, while the latter is a user-defined conversion (which calls the constructor of the user-defined string_view class type), so overload resolution considers the former to be a better match.

However, this is surprising and usually unwanted. C++20 took a small step by deciding (via P1957) that a T*-to-bool conversion should be considered narrowing, and thus forbidden in list-initialization. But could we go further and make T*-to-bool conversion non-implicit?

The idea is to forbid implicit conversions—

void take(bool);
bool give(int *p) {
  bool b = p;  // OUTLAWED
  take(p);     // OUTLAWED
  return p;    // OUTLAWED
}

while still permitting explicit and contextual conversions—

bool okay(int *p) {
  bool b(p);            // OK
  if (p) blah();        // OK
  bool c = (p && p);    // OK
  bool d = (p || p);    // OK
  int e = (p ? 1 : 2);  // OK
}

So, I patched my local copy of Clang to forbid implicit conversion from T* to bool. See the patch here. Here’s all the things that broke when I compiled LLVM/Clang itself with my patched compiler.

Unless you’re an LLVM maintainer, you might want to skip to the end. The majority of this post is a catalog of bugs and unfortunate code patterns, similar to what you’ll find on the PVS-Studio blog.

The “hypocritical operator bool” pattern

As we know, you should always mark your operator bool as explicit, so that it will be available for contextual conversions but not implicit ones. (That is, your class type will have the behavior that T* today fails to have.) But it seems to be very common that a programmer writing such an operator bool will rely on implicit conversion in the return statement! “Rules for thee but not for me,” you know.

struct MyPtr {
  int *ptr_;
  explicit operator bool() const { return ptr_; }
};

The Clang/LLVM codebase itself contains 30 instances of this “hypocritical operator bool” pattern. To eliminate the implicit T*-to-bool conversion here, I recommend return ptr_ != nullptr, return bool(ptr_), or return !!ptr_.

(This audit incidentally turned up incorrectly non-explicit operator bools here here, here, here; and incorrectly non-const-qualified operator bools here, here. A good linter would flag all of these.)

Classes convertible to T* but not bool

Consider:

std::atomic<int*> a;
if (a) {

My patched Clang complains: no viable conversion from 'std::atomic<int *>' to 'bool'. That’s because the conversion from atomic<T*> to bool happens in two steps: first the user-defined conversion via atomic<T*>::operator T*() const, and then a standard conversion from T* to bool (which we chose to outlaw). The workaround here is to write explicitly if (a.load()).

Operations on atomics (even loads and stores) should always be explicit. Limit yourself to one atomic operation per source line.

The Clang/LLVM codebase contains two instances of this pattern with std::atomic, and nine instances involving user-defined “smart-pointer-like” types such as class TypedTrackingMDRef and class MDOperand which provide a converting operator T* but no dedicated operator bool.

Class types which provide operator T* should probably also provide explicit operator bool. However, your class type almost certainly should not provide operator T* to begin with. Prefer a named method such as .get(), .value(), or .load().

Also, classes whose operator T* is never-null should probably not provide an operator bool, because that would just mask misuses; see “Probable bug #1” below.

assert-like functions

The standard library’s assert macro is defined to wrap its argument in a contextual conversion to bool. But if you’ve written your own assert-like function which is not a macro and which takes bool as its first parameter…

void my_assert(bool cond);
int main(int, char **argv) {
  assert(argv);     // OK, no problem
  my_assert(argv);  // OUTLAWED
}

The LLVM/Clang codebase has one such function, used twice.

A surprising T *isFoo() antipattern

The usual convention in LLVM/Clang — and I hope in general — is to have paired accessors like this:

bool hasFoo() const;
const Foo *getFoo() const;

But in this one place the programmer has “cleverly” combined both accessors into one:

const FunctionDecl *isLocalClass() const;

Since pointers implicitly convert to bools, this lets the programmer write:

if (RD->isLocalClass())
  const FunctionDecl *FD = RD->isLocalClass();

But it means that our patched compiler complains here:

error: assigning to 'bool' from incompatible type 'FunctionDecl *'
 6902 |       NeedInstantiate = RD->isLocalClass();
      |                         ~~~~^~~~~~~~~~~~~~

The same pattern recurs here in LLVM.

Intentional use in bool initializers and assignments

LLVM/Clang relies on pointer-to-boolean conversion pretty heavily, for things like this:

bool EHa = M->getModuleFlag("eh-asynch");
const bool HaveSpace = ::strchr(ArgV[I], ' ');
bool IgnoreErrors = Errs;

This kind of thing comes up about 73 times in LLVM/Clang. Each instance is trivial (if tedious) to patch:

bool EHa = (M->getModuleFlag("eh-asynch") != nullptr);
bool HaveSpace = (::strchr(ArgV[I], ' ') != nullptr);
bool IgnoreErrors = (Errs != nullptr);

A special case comes up at least 6 times, e.g. here and here:

isInvariantLoad = LI->getMetadata(LLVMContext::MD_invariant_load);

would be better written as

isInvariantLoad = LI->hasMetadata(LLVMContext::MD_invariant_load);

This is why these boolean accessors exist, after all!

Actual bug #1

Consider this code:

MachineBasicBlock &Src = *MI.getParent();
for (auto &Succ : Src.successors())
  addInsertPoint(Src, Succ);

The patched compiler complains (in part):

llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp:850:7:
error: no matching member function for call to 'addInsertPoint'
  850 |       addInsertPoint(Src, Succ);
      |       ^~~~~~~~~~~~~~
llvm/include/llvm/CodeGen/GlobalISel/RegBankSelect.h:375:10:
note: candidate function not viable: no known conversion from 'llvm::MachineBasicBlock *'
to 'MachineBasicBlock &' for 2nd argument; dereference the argument with *
  375 |     void addInsertPoint(MachineBasicBlock &Src, MachineBasicBlock &Dst);
      |          ^                                      ~~~~~~~~~~~~~~~~~~~~~~
llvm/include/llvm/CodeGen/GlobalISel/RegBankSelect.h:371:10:
note: candidate function not viable: no known conversion from 'llvm::MachineBasicBlock *'
to 'bool' for 2nd argument
  371 |     void addInsertPoint(MachineBasicBlock &MBB, bool Beginning);
      |          ^                                      ~~~~~~~~~~~~~~

In vanilla C++, it’s unambiguously that second overload of addInsertPoint that gets called. But circumstantial evidence suggests that the programmer meant to write

MachineBasicBlock &Src = *MI.getParent();
for (auto *Succ : Src.successors())
  addInsertPoint(Src, *Succ);

to call the first overload instead.

Actual bug #2

Here:

error: no matching member function for call to 'shouldSignReturnAddress'
  125 |     if (!MFI->shouldSignReturnAddress(MF))
      |          ~~~~~^~~~~~~~~~~~~~~~~~~~~~~
note: candidate function not viable: no known conversion from 'const MachineFunction *'
to 'const MachineFunction' for 1st argument; dereference the argument with *
  551 |   bool shouldSignReturnAddress(const MachineFunction &MF) const;
      |        ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~
note: candidate function not viable: no known conversion from 'const MachineFunction *'
to 'bool' for 1st argument
  552 |   bool shouldSignReturnAddress(bool SpillsLR) const;
      |        ^                       ~~~~~~~~~~~~~

Again, in vanilla C++ MF would unambiguously convert to bool and call the second overload. The programmer wrote MF when he meant *MF.

Don’t gratuitously give two different functions the same name. These functions should have been named distinctly, e.g. shouldSignReturnAddressOf(MF) and shouldSignReturnAddressGivenSpill(bool). But even within an unavoidable overload set, you should avoid overloading on a bool argument, because of how promiscuously things convert to bool. Prefer to take an enum or maybe even an int, rather than a bool.

Actual bug #3

Here:

if (!UnrollRuntimeLoopRemainder(L, Count, /*AllowExpensiveTripCount*/ false,
        /*UseEpilogRemainder*/ true,
        UnrollRemainder, /*ForgetAllSCEV*/ false,
        LI, SE, DT, AC, TTI, true,
        SCEVCheapExpansionBudget, EpilogueLoop)) {

The patched compiler complains:

 error: no matching function for call to 'UnrollRuntimeLoopRemainder'
 note: candidate function not viable: no known conversion
 from 'Loop **' to 'bool' for 14th argument

When you see the compiler talking about a function’s “14th argument,” you know you’re in for a fun time. The intended callee’s signature is:

LLVM_ABI bool UnrollRuntimeLoopRemainder(
    Loop *L, unsigned Count, bool AllowExpensiveTripCount,
    bool UseEpilogRemainder, bool UnrollRemainder, bool ForgetAllSCEV,
    LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
    const TargetTransformInfo *TTI, bool PreserveLCSSA,
    unsigned SCEVExpansionBudget, bool RuntimeUnrollMultiExit,
    Loop **ResultLoop = nullptr);

Default function arguments are the devil. In this case, it looks like the programmer forgot to pass a boolean value for the 14th argument RuntimeUnrollMultiExit, and so his intended 15th argument, ResultLoop, was quietly coerced to bool instead. This should be fixed by eliminating the default function argument and passing all 15 arguments explicitly at each call-site.

Probable bug #1: Testing never-null raw pointers

LLVM/Clang is full of places where a function takes a simple boolean, but the programmer apparently thought something more structured was required. For example, here the programmer writes Builder.getTrue() when a simple true would suffice. Builder.getTrue() returns a pointer to a compiler-internal data structure representing truthiness — which, since it is a non-null pointer, implicitly converts to true! Note that Builder.getFalse() would also evaluate to true.

error: cannot initialize a parameter of type 'bool' with an rvalue of type 'ConstantInt *'
 1678 |       Builder.CreateFlagStore(Builder.getTrue(), NRVOFlag);
      |                               ^~~~~~~~~~~~~~~~~

Likewise here and here:

error: cannot initialize a parameter of type 'bool' with an lvalue of type 'llvm::IntegerType *'
 4581 |                                  ? EmitScalarExpr(Filter, CGM.Int32Ty)
      |                                                           ^~~~~~~~~~~

And here (along a codepath where Value is invariably non-null):

error: cannot initialize a parameter of type 'bool' with an lvalue of type 'APValue *'
 7061 |         MaterializedType.isConstantStorage(getContext(), /*ExcludeCtor*/ Value,
      |                                                                          ^~~~~

Probable bug #2: Testing never-null smart pointers

Here, here, and here:

error: no viable conversion from 'DirectoryEntryRef' to 'bool'
 10255 |       if (Cache->OrigEntry && Cache->OrigEntry->getDir()) {
       |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~
note: candidate function
   87 |   operator const DirectoryEntry *() const { return &getDirEntry(); }
      |   ^

This is the same pattern as “Classes convertible to T* but not bool” above; but in this case the existing operator T* invariably returns non-null. So this if test was never testing anything.

This bug was detectable only because DirectoryEntryRef defined operator T* without operator bool. It certainly should not define either of those conversions (and my understanding is that that’s the maintainers’ long-term plan, too).

Surprising usage in checkEnvVar

Here the Clang programmer has done something very weird. Essentially, it’s:

template<class T>
T checkEnvVar(const char *varname) {
    const char *p = ::getenv(varname);
    if (p == nullptr) return T();
    return value;
}

The checkEnvVar template is instantiated in only two ways. checkEnvVar<std::string> returns "" if the variable is unset, or the contents of the variable if it’s set. checkEnvVar<bool> returns false if the variable is unset, or true if it’s set (even if it’s set to the empty string, or to 0). The same template code happens to implement both behaviors. I was surprised to see this, but it really does seem to be intentional. The workaround is to change return value to return T(value).

The final tally

In the LLVM/Clang codebase as of this writing — I very roughly estimate 3.8 million lines of code — here’s the total count of places that rely on implicit T*-to-bool conversion in various ways:

Pattern Count
getMetadata for hasMetadata 6
checkEnvVar 1
Actual bug #1 1
Actual bug #2 1
Actual bug #3 1
Probable bug #1 3
Probable bug #2 3
bool x = p; 48
Other x = p; 19
atomic<T*> 2
Other operator T* without operator bool 9
Hypocritical operator bool 30
Other return p; 117
Hash.AddBoolean(p) 21
Diag(Loc, diag::err_riscv_type_requires_extension, D) 8
assert_with_loc 2
Other f(p) 50
Total 322

I also audited another, proprietary, C++17 codebase — very roughly 1.2 million lines of code. Its base rate of T*-to-bool conversions per line is only 30% of LLVM/Clang’s. The audit turned up one non-explicit operator bool, and zero bugs.

Pattern Count
bool x = getenv(s); 2
Other bool x = p; 3
atomic<T*> 3
Hypocritical operator bool 2
Other return p; 16
f(p) 4
Total 30
Posted 2025-10-05