How “static storage for initializers” did at Varna

It’s still on my to-do list to write a proper “How my papers did at Varna” post. But here’s my top story: P2752 “Static storage for braced initializers” was adopted for C++26! (Vendors are expected to DR it back to C++11.) This permits the compiler to store the backing array for {1,2,3} in static read-only data, instead of having to put it on the stack.


During the Varna meeting, the following interesting scenario was discovered:

struct S {
  constexpr S(int i) : i(i) {}
  mutable int i;
};

void f(std::initializer_list<S> il) {
  assert(il.begin()->i == 1); // OK
  il.begin()->i = 4;
}

int main() {
  for (int i = 0; i < 2; ++i) {
    f({1,2,3});
  }
}

It is not legal for a C++ program to modify the contents of a string literal like "abc", because the backing array of that literal is of type const char[4] and it’s not legal to modify an object whose dynamic type is const. But it is perfectly legal to modify the contents of il.begin()->i, because it’s mutable. So the compiler must resist the urge to place the constinit-able const S[3] backing {1,2,3} into static storage. This isn’t difficult for the compiler to do; it’s just difficult to think of.

This exact example is given in P2752R3 §4, so anyone implementing the paper should see it and give their implementation a matching test case.


Speaking of string literals, it was also discussed whether string literals and backing arrays should be allowed to share storage with other objects. (This is now CWG2753.) The status quo is that string literals are explicitly allowed to overlap other string literals; backing arrays (since P2752) are explicitly allowed to overlap other backing arrays; no other combinations are explicitly permitted. It certainly seems desirable to explicitly forbid named objects (such as variables and structured bindings) from overlapping other objects, but we don’t exactly say it straight out anywhere. Clang already exploits that lack of wording to miscompile the following toy hash table (Godbolt):

// Define a second singular value for `const char*`,
// distinct from nullptr and distinct from any string
// that might be added to the hash table.
//
const char *tombstone() {
  static constexpr char dummy[1] = {};
  return dummy;
}

struct HashTable {
  const char *table_[100] = {};
  const char **find(const char *s) {
    for (const char *&p : table_) {
      if (p == nullptr) {
        return nullptr;
      } else if (p == tombstone()) {
        // A value was here, but now it's removed
      } else if (std::strcmp(p, s) == 0) {
        return &p;
      }
    }
    return nullptr;
  }
  bool contains(const char *s) {
    return find(s) != nullptr;
  }
  void erase(const char *s) {
    if (const char **p = find(s)) {
      *p = tombstone();
    }
  }
  void insert(const char *s) {
    for (const char *&p : table_) {
      if (p == nullptr || p == tombstone()) {
        p = s;
        return;
      }
    }
    throw "table is out of space";
  }
};

int main() {
  HashTable ht;
  const char *x = "a";
  assert(!ht.contains(x));
  ht.insert(x);
  assert(ht.contains(x));

  const char *y = "";
  assert(!ht.contains(y));
  ht.insert(y);
  assert(ht.contains(y)); // fails on Clang
}

The problem is that Clang thinks it’s okay to alias the storage of dummy with the storage of the string literal "" pointed to by y (since they both are const char[1] arrays holding a single null byte). This defeats our attempt to get a singular tombstone value, unequal to any string in the program. This is Clang bug #57957. I expect it will be fixed within a year, now that there’s a CWG issue on the subject.


Previously on this blog:

Posted 2023-07-05