Arthur O’Dwyer

`std::is_heap` could be faster

2026-05-11T00:01:00+00:00

The other day I was noodling around with some libc++ unit-test code that looked roughly like this (Godbolt):

template<class A>
auto extract_container(A& a) {
  struct UnwrapAdaptor : A { A::container_type& cc = A::c; };
  return UnwrapAdaptor(a).cc;
}

template<class Adaptor>
void test_push_range(bool is_heapified) {
  int in1[] = {1,3,7};
  int in2[] = {2,4,5,6};
  int expected[] = {1,3,7,2,4,5,6};
  Adaptor a;
  a.push_range(in1);
  a.push_range(in2);
  if (auto c = extract_container(a); is_heapified) {
    assert(std::ranges::is_heap(c));
    assert(std::ranges::is_permutation(c, expected));
  } else {
    assert(std::ranges::equal(c, expected));
  }
}

int main() {
  test_push_range<std::stack<int>>(false);
  test_push_range<std::queue<int>>(false);
  test_push_range<std::priority_queue<int>>(true);
}

I tried to extend main to test also some non-default containers. (Incidentally, did you know stack’s default container is deque, not vector?)

  test_push_range<std::stack<int, std::list<int>>>(false);
  test_push_range<std::queue<int, std::list<int>>>(false);

…And suddenly the unit test no longer compiled!

error: no matching function for call to object of type 'const __is_heap'
   22 |     assert(std::ranges::is_heap(c));
      |            ^~~~~~~~~~~~~~~~~~~~
[...]
note: because 'std::list<int> &' does not satisfy 'random_access_range'

That caught me by surprise. Why should testing whether a range is heapified require random access? It’s simple to implement with only forward traversal, and make_heap/is_heap seems to analogize perfectly against sort/is_sorted. is_sorted doesn’t require random access; why should is_heap?

Ranges algorithm	Constraint
`sort`	`random_access_range`
`is_sorted`	`forward_range`
`is_sorted_until`	`forward_range`
`make_heap`	`random_access_range`
`is_heap`	`random_access_range` (could be `forward_range`)
`is_heap_until`	`random_access_range` (could be `forward_range`)
`partition`	`forward_range`
`is_partitioned`	`input_range`
`is_partitioned_until`	`input_range`
`unique`	`forward_range`
`is_uniqued`	`forward_range`
`adjacent_find`	`forward_range`

Note that is_partitioned only needs to view one element at a time, and remember the boolean value of pred(elt). By contrast, is_sorted, adjacent_find, and is_heap need to view two elements at a time; that’s why they can’t handle input_range.
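To make the one-element-at-a-time point concrete, here is a minimal sketch of a single-pass partition check. The names `is_partitioned_one_pass` and `evens_come_first` are made up for illustration; this is not how any particular standard library spells it. The only state carried between elements is one bool, which is why a true input iterator such as `std::istream_iterator` is enough.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper, not a standard algorithm: a one-pass partition check
// that remembers only whether a "false" element has been seen yet.
template <class InputIt, class Sentinel, class Pred>
bool is_partitioned_one_pass(InputIt first, Sentinel last, Pred pred) {
  bool seen_false = false;  // have we already left the "true" prefix?
  for (; first != last; ++first) {
    if (pred(*first)) {
      if (seen_false) return false;  // a "true" element after a "false" one
    } else {
      seen_false = true;
    }
  }
  return true;
}

// Hypothetical wrapper: parses ints from text through std::istream_iterator,
// a genuinely single-pass input iterator, and checks "evens before odds".
inline bool evens_come_first(const std::string& text) {
  std::istringstream in(text);
  return is_partitioned_one_pass(std::istream_iterator<int>(in),
                                 std::istream_iterator<int>(),
                                 [](int x) { return x % 2 == 0; });
}

inline void demo_is_partitioned_one_pass() {
  assert(evens_come_first("2 4 6 1 3"));
  assert(!evens_come_first("2 1 4"));
  assert(evens_come_first(""));
}
```

A sortedness or heap check cannot be written this way, because it has to compare the current element against an earlier one that a single-pass iterator can no longer reach.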

The two “could bes” in the table above seem to indicate a design defect.

Benchmark it!

libc++’s implementation of std::is_heap looks shockingly inefficient: its 24 lines spend a lot of time recomputing subexpressions like __first + __c (to get to the c’th element) and 2 * __p + 1 (to compute the child index from the parent). A straight-line implementation, avoiding all arithmetic and just advancing two iterators in lockstep, would have taken only 18 lines:

template <class _Compare, class _ForwardIterator, class _Sentinel>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _ForwardIterator
__is_heap_until(_ForwardIterator __first, _Sentinel __last, _Compare&& __comp) {
  _ForwardIterator __child = __first;
  if (__child == __last) {
    return __child;
  }
  while (true) {
    ++__child;
    if (__child == __last || __comp(*__first, *__child))
      break;
    ++__child;
    if (__child == __last || __comp(*__first, *__child))
      break;
    ++__first;
  }
  return __child;
}

Surprisingly, libstdc++ and MS STL also spend time adding and dividing (or shifting) when they don’t need to.
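The patch above depends on libc++'s internal macros, so here is a standalone sketch of the same lockstep walk that can be tried against any forward range. The name `is_heap_until_fwd` is hypothetical, and the default comparator `std::less<>` is an assumption chosen to match `std::is_heap_until`'s default behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical name: the same parent/child lockstep walk as the patch above,
// written without libc++ internals. Uses only forward-iterator operations.
template <class ForwardIt, class Sentinel, class Compare = std::less<>>
constexpr ForwardIt is_heap_until_fwd(ForwardIt first, Sentinel last,
                                      Compare comp = {}) {
  ForwardIt child = first;  // `first` plays the role of the current parent
  if (child == last) return child;
  while (true) {
    ++child;  // left child of *first
    if (child == last || comp(*first, *child)) break;
    ++child;  // right child of *first
    if (child == last || comp(*first, *child)) break;
    ++first;  // advance to the next parent
  }
  return child;
}

inline void demo_is_heap_until_fwd() {
  std::list<int> heap{9, 7, 8, 1, 3, 5};  // no random access available here
  assert(is_heap_until_fwd(heap.begin(), heap.end()) == heap.end());

  std::vector<int> v{9, 7, 8, 1, 10, 5};  // 10 is larger than its parent 7
  assert(is_heap_until_fwd(v.begin(), v.end()) ==
         std::is_heap_until(v.begin(), v.end()));
}
```

The `std::list` case is the one the unit test at the top of this post wanted; the `std::vector` case cross-checks the sketch against the library's own answer.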

Still, just because a piece of code looks inefficient doesn’t always mean that it is inefficient. So I whipped up a simple benchmark:

void BM_vector_half(benchmark::State& state) {
  auto n = state.range(0);
  auto v = std::vector<int>(n);
  std::mt19937 g;
  std::generate(v.begin(), v.end(), g);
  std::make_heap(v.begin(), v.begin() + (n / 2));
  for (auto _ : state) {
    benchmark::DoNotOptimize(v);
    auto it = std::is_heap_until(v.begin(), v.end());
    benchmark::DoNotOptimize(it);
  }
}

vector_full is the same but with make_heap(v.begin(), v.end()). deque_{half,full} are the same but with deque instead of vector.

Here’s the benchmark result before and after applying the patch above to libc++. In this case, our eyes don’t lie: what looks inefficient is inefficient! (Note that our comparator here is less, which is cheap. An expensive comparator could dwarf the cost of arithmetic, making the benefit of this patch less perceptible.)

Benchmark	`n`	CPU time (before)	CPU time (after)	%
`vector_half`	1K	4155 ns	2658 ns	−36%
`vector_half`	100K	311560 ns	263807 ns	−15%
`vector_half`	10M	32487789 ns	26089364 ns	−19%
`vector_full`	1K	8813 ns	5294 ns	−39%
`vector_full`	100K	644535 ns	535987 ns	−16%
`vector_full`	10M	59771100 ns	52807359 ns	−11%
`deque_half`	1K	4186 ns	2662 ns	−36%
`deque_half`	100K	375844 ns	264856 ns	−29%
`deque_half`	10M	31999750 ns	26152052 ns	−18%
`deque_full`	1K	9002 ns	5271 ns	−41%
`deque_full`	100K	619876 ns	523727 ns	−15%
`deque_full`	10M	65697333 ns	52488343 ns	−20%

Even with the current specification of is_heap, we can achieve this performance today. The algorithm can remain constrained on random_access_range (thus doing nothing to solve the motivating use-case that introduced this post) while internally using nothing more than forward-iterator operations. But once the algorithm uses nothing more than forward-iterator operations… wouldn’t it be nice for the Standard to say so?

Conclusions

libc++ should improve its is_heap implementation along the lines above, reaping the performance benefit.
So should libstdc++ and Microsoft STL, I expect.
WG21 should consider relaxing the constraint on is_heap and is_heap_until from “random access” to “forward,” incidentally solving my original use-case. I’ll probably bring a paper to this effect at some point. Note that if such a paper were adopted, all three vendors would have to change their implementations to the faster one.

Two-Minute _Iolanthe_

2026-05-08T00:01:00+00:00

The other day I came across Connie Kleinjans’ page of “two-minute versions” of G&S shows. She’s got two versions of Gondoliers and one each of Iolanthe and Ruddigore. The technique is the same as in blackout poetry: take the whole work and black out all but the most important and/or funniest bits.

Kleinjans’ short scripts are funny in their own right, but I wanted audio versions; so I made one. Presenting “Two-Minute Iolanthe in five minutes”.

The original recording I “blacked out” for this video is a TV broadcast of a 1976 production at the Sydney Opera House featuring Rosemary Gunn (Iolanthe), Heather Begg (Fairy Queen), Dennis Olsen (the Lord Chancellor), June Bronhill (Phyllis), Lyndon Terracini (Strephon), Graeme Ewer (Mountararat), Ronald Maconaghie (Tolloller), and Alan Light (Private Willis). This is a low-quality VHS rip of an excellent performance.

The VHS rip on YouTube is missing a chunk of the finale, which prompted some alterations to the script; I made a few other alterations for pacing. Even after those cuts, this “two-minute” Iolanthe is almost five minutes long; watch at 2.3x speed for a true two-minute experience.

To create the video, I used ffmpeg to clip and concat the snippets. When concatenating 169 tiny snippets, your biggest problem will be “timestamp drift” between the audio and video channels. I spent a long time cajoling ChatGPT into giving me new permutations of command-line switches to try before finally settling on the programs linked below.

Step 1 was to get the original video (2.4 gigabytes, saved as input.mkv):

brew install ffmpeg yt-dlp
yt-dlp 'https://www.youtube.com/watch?v=DLYSZN2IqeU'

Step 2 was to snip the constituent bits via ffmpeg commands. I factored out the common arguments into environment variables so I didn’t have to keep typing or tabbing over them.

PREFIX="-y"
SUFFIX="-i input.mkv -c:v libx264 -preset veryfast -crf 20 -c:a aac"
ffmpeg $PREFIX -ss 00:07:07.2 -to 00:07:10.5 $SUFFIX part001.mkv
ffmpeg $PREFIX -ss 00:07:45.0 -to 00:07:48.6 $SUFFIX part002.mkv

ffmpeg turns out to be supremely sensitive to whether you put the input (-i input.mkv) before, or after, the -ss and -to switches. With -i input.mkv as part of the SUFFIX, my whole script.txt runs in 61 seconds; as part of the PREFIX, it takes 98 minutes.

To concatenate all those clips and re-encode a “preview” video, we can do this:

rm list.txt
for i in part*.mkv ; do echo "file $i" >>list.txt ; done
ffmpeg -y -f concat -safe 0 -i list.txt \
  -c:v libx264 -crf 20 -preset veryfast -c:a aac -ar 48000 output.mp4

Step 3 was to fight timestamp desynchronization. My solution here was generated entirely by blind fumbling and incantations, with input from ChatGPT. It seems that there are basically two ways to get ffmpeg to “supercut” a video as we’re doing here: either clip out the clips into temporary files and then concatenate all those little files (as we did above — this way causes a lot of drift), or do one big “filter” operation to take just the frames you care about in a single ffmpeg invocation. That looks like this, except with 169 clips instead of 3:

ffmpeg -y -i input.mkv -filter_complex \
 "[0:v]trim=start=427.2:end=430.5,setpts=PTS-STARTPTS[v0];
  [0:a]atrim=start=427.2:end=430.5,asetpts=PTS-STARTPTS[a0];
  [0:v]trim=start=465.0:end=468.6,setpts=PTS-STARTPTS[v1];
  [0:a]atrim=start=465.0:end=468.6,asetpts=PTS-STARTPTS[a1];
  [0:v]trim=start=616.5:end=618.7,setpts=PTS-STARTPTS[v2];
  [0:a]atrim=start=616.5:end=618.7,asetpts=PTS-STARTPTS[a2];
  [v0][a0][v1][a1][v2][a2]concat=n=3:v=1:a=1[v][a]" \
  -map "[v]" -map "[a]" \
  -c:v libx264 -crf 20 -preset veryfast \
  -c:a aac -b:a 192k -ar 48000 \
  output.mp4

That’s horribly slow; it seems to have quadratic behavior as the number of clips increases. Even worse is trying to use the non-“complex” filter options -vf and -af:

ffmpeg -y -i input.mkv \
  -vf "select='between(t,427.2,430.5)+between(t,465.0,468.6)+between(t,616.5,618.7)',setpts=N/FRAME_RATE/TB" \
  -af "aselect='between(t,427.2,430.5)+between(t,465.0,468.6)+between(t,616.5,618.7)',asetpts=N/SR/TB" \
  -c:v libx264 -crf 20 -preset veryfast \
  -c:a aac -b:a 192k -ar 48000 \
  output.mp4

That just makes ffmpeg run out of memory before you’ve even hit 50 clips. So I ended up using a hybrid approach: I used -filter_complex to produce nine intermediate concatenations of 20 clips at a time, and then used

ffmpeg -y -f concat -i list.txt -c copy output.mp4

to paste those nine files together. I’ve saved my programs for posterity: script.txt runs as a Bash script (in just over 1 minute on my machine) to create that “draft preview” video output; its textual contents also serve as input to script.py, which creates the final product (in about 9 minutes) using the two-level -filter_complex approach. The finished output.mp4 is about 42 megabytes in size.
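The two-level approach can be sketched roughly as follows. This is not the linked script.py; it is a small C++ illustration under my own assumptions, with hypothetical names (`Clip`, `build_filter_complex`, `split_into_batches`). It builds the same kind of trim/atrim/concat graph shown earlier for one batch of clips, and splits the full clip list into batches of a chosen size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One clip's start and end in seconds, kept as text so that "427.2" is
// emitted exactly as written. (Hypothetical type for this sketch.)
struct Clip {
  std::string start;
  std::string end;
};

// Builds a -filter_complex graph for one batch: a trim/setpts chain and an
// atrim/asetpts chain per clip, then a single concat over all of them.
inline std::string build_filter_complex(const std::vector<Clip>& clips) {
  std::string chains;  // per-clip video and audio chains
  std::string pads;    // "[v0][a0][v1][a1]..." fed to the concat filter
  for (std::size_t i = 0; i < clips.size(); ++i) {
    const std::string n = std::to_string(i);
    const std::string range = "start=" + clips[i].start + ":end=" + clips[i].end;
    chains += "[0:v]trim=" + range + ",setpts=PTS-STARTPTS[v" + n + "];";
    chains += "[0:a]atrim=" + range + ",asetpts=PTS-STARTPTS[a" + n + "];";
    pads += "[v" + n + "][a" + n + "]";
  }
  return chains + pads + "concat=n=" + std::to_string(clips.size()) +
         ":v=1:a=1[v][a]";
}

// Splits the clip list into consecutive batches of at most batch_size clips.
inline std::vector<std::vector<Clip>> split_into_batches(
    const std::vector<Clip>& clips, std::size_t batch_size) {
  assert(batch_size > 0);
  std::vector<std::vector<Clip>> batches;
  for (std::size_t i = 0; i < clips.size(); i += batch_size) {
    const std::size_t stop = std::min(clips.size(), i + batch_size);
    batches.emplace_back(clips.begin() + static_cast<std::ptrdiff_t>(i),
                         clips.begin() + static_cast<std::ptrdiff_t>(stop));
  }
  return batches;
}
```

With 169 clips and a batch size of 20, `split_into_batches` yields nine batches (eight of 20 clips and one of 9), each of which would get its own `-filter_complex` invocation before the final `-f concat ... -c copy` step.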

C++ Alignment Chart

2026-05-06T00:01:00+00:00

ELF’s ways to combine potentially non-unique objects

2026-05-05T00:01:00+00:00

Previously I wrote:

[Template parameter objects of array type] are permitted to overlap or be coalesced, just like initializer_lists and string literals. Clang trunk isn’t smart enough to coalesce potentially non-unique objects [but] GCC, once it implements define_static_array, will presumably make them the same.

Well, GCC 16 has an experimental implementation of define_static_array (compile with g++ -std=c++26 -freflection), and it does not coalesce template parameter objects of array type in the way I expected. Digging deeper into why not, I learned that there are at least three ways compilers and linkers (on ELF — that is, non-Windows — platforms) conspire to “merge” potentially non-unique objects:

Merging at the compiler level (for initializer_list backing arrays)
Sections with SHF_MERGE (for string literals and backing arrays)
Sections with SHF_GROUP, a.k.a. COMDAT sections (for inline variables)

Sadly, no combination of these facilities quite achieves ideal behavior for define_static_array. Let’s take a look.

The compiler can merge similar data

GCC itself merges similar initializer_list backing arrays. For example (Godbolt):

void f(std::initializer_list<int>);
int main() {
  f({1,2,3}); // C.0.0
  f({1,2,3}); // C.1.1
}

turns into the assembly directives

  .section .rodata.cst16,"aM",@progbits,16
  .align 8
  .type C.0.0, @object
  .size C.0.0, 12
C.0.0:
  .long 1
  .long 2
  .long 3
  .zero 4
  .set C.1.1,C.0.0

The symbols C.0.0 and C.1.1 are set to the same memory address, because GCC itself can see that the two initializer_list objects should have the same backing array.

This is the most powerful approach, but at the same time the least elegant, because it requires ad-hoc “smarts” built directly into the compiler. For example, we could imagine GCC generating code that merges one list into the tail of another:

void f(std::initializer_list<int>);
int main() {
  f({1,2,3}); // C.0.0
  f({2,3});   // C.2.2
}

C.0.0:
  .long 1
  .long 2
  .long 3
  .zero 4
  .set C.2.2,C.0.0 + 4

but in fact GCC doesn’t generate that code, because nobody has taught GCC that specific trick. Nor will GCC 16 merge the backing arrays of {1,2,3} and {1u,2u,3u} using this technique, again because it hasn’t been taught to.
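For a sense of how small that particular trick would be, here is a sketch of the check a compiler would need before emitting something like `.set C.2.2,C.0.0 + 4`. The function name `tail_offset_bytes` is hypothetical, and this is a toy model over `std::vector`, not anything taken from GCC.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: if `small` is a suffix of `big`, return the byte
// offset into big's storage at which `small` could be aliased; otherwise
// return nullopt.
template <class T>
std::optional<std::size_t> tail_offset_bytes(const std::vector<T>& big,
                                             const std::vector<T>& small) {
  if (small.size() > big.size()) return std::nullopt;
  const std::size_t skip = big.size() - small.size();
  if (!std::equal(small.begin(), small.end(),
                  big.begin() + static_cast<std::ptrdiff_t>(skip))) {
    return std::nullopt;
  }
  return skip * sizeof(T);
}

inline void demo_tail_offset_bytes() {
  const std::vector<int> big{1, 2, 3};
  // {2,3} is the tail of {1,2,3}, one int past the start.
  assert(tail_offset_bytes(big, std::vector<int>{2, 3}) == sizeof(int));
  // {1,2} is a prefix, not a tail, so this sketch does not merge it.
  assert(!tail_offset_bytes(big, std::vector<int>{1, 2}).has_value());
}
```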

Merging things at the compiler level also, by definition, works only within a single translation unit (a single .cpp file). If you want to merge things between different TUs, you’ll need help from the linker. Which brings us to…

`SHF_MERGE` sections

I said GCC 16 wouldn’t merge {1,2,3} and {1u,2u,3u} in the compiler. But if you try this program (Godbolt), you’ll see that the backing arrays are indeed merged at runtime — the same pointer value is printed twice:

template<class... Ts>
void f(std::initializer_list<Ts>... ils) {
  (printf("%p\n", (const void*)ils.begin()) , ...);
}
int main() {
  f({1,2,3},     // C.0.0
    {1u,2u,3u}); // C.1.1
}

The same pointer value is printed twice, even though we can see GCC emitting two different objects into the assembly file:

  .section .rodata.cst16,"aM",@progbits,16
  .align 8
  .type C.0.0, @object
  .size C.0.0, 12
C.0.0:
  .long 1
  .long 2
  .long 3
  .zero 4
  .align 8
  .type C.1.1, @object
  .size C.1.1, 12
C.1.1:
  .long 1
  .long 2
  .long 3
  .zero 4

The trick here is in the .section directive, which creates an ELF section in the object file with the name .rodata.cst16, the section flag SHF_MERGE (that’s the M in "aM"), and a sh_entsize of 16 bytes. After concatenating every object file’s .rodata.cst16 sections as usual, the linker is permitted (but not required) to treat the contents of this section as an array of 16-byte elements, and to deduplicate any identical elements it finds. Since the 16-byte region starting at C.1.1 matches the 16-byte region starting at C.0.0, the linker is allowed to eliminate the 16 bytes at C.1.1 and point the label C.1.1 at C.0.0 (or vice versa).
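As a toy model of that deduplication step (not the code of any real linker), the following sketch treats a section as an array of `entsize`-byte elements, keeps the first copy of each distinct element, and records where every input element ended up. `MergedSection`, `merge_fixed_size`, and `append_le32` are hypothetical names, and the little-endian encoding is an assumption that matches the x86-64 listings above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct MergedSection {
  std::vector<std::uint8_t> bytes;         // deduplicated section contents
  std::vector<std::size_t> new_offset_of;  // per input element: new offset
};

// Toy SHF_MERGE: split `input` into entsize-byte elements and keep only the
// first occurrence of each distinct element.
inline MergedSection merge_fixed_size(const std::vector<std::uint8_t>& input,
                                      std::size_t entsize) {
  assert(entsize > 0 && input.size() % entsize == 0);
  MergedSection out;
  std::map<std::vector<std::uint8_t>, std::size_t> seen;  // bytes -> offset
  for (std::size_t off = 0; off < input.size(); off += entsize) {
    const auto first = input.begin() + static_cast<std::ptrdiff_t>(off);
    const std::vector<std::uint8_t> elem(
        first, first + static_cast<std::ptrdiff_t>(entsize));
    const auto [it, inserted] = seen.try_emplace(elem, out.bytes.size());
    if (inserted) out.bytes.insert(out.bytes.end(), elem.begin(), elem.end());
    out.new_offset_of.push_back(it->second);
  }
  return out;
}

// Appends a 32-bit value in little-endian byte order (like `.long` on x86-64).
inline void append_le32(std::vector<std::uint8_t>& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
  }
}
```

Feeding it the 32 bytes emitted for `C.0.0` and `C.1.1` above (two copies of 1, 2, 3, and four bytes of padding) produces a 16-byte section with both elements mapped to offset 0.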

The SHF_MERGE trick works across TUs, and even across types. For example, GCC 14+ makes the following four initializer lists share a single backing array at runtime by putting them all into .rodata.cst16 sections with the SHF_MERGE flag set. This works even if the lists appear in different TUs! (Godbolt.)

{1,2,3,4}
{1u,2u,3u,4u}
{0x200000001, 0x400000003} // given little-endian int64
{4.2439915824e-314, 8.4879831654e-314} // ditto
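Those four lists can share storage because their object representations are byte-for-byte identical. The following sketch checks that directly with `memcmp`; it assumes a little-endian target with 32-bit `int`, 64-bit `long long`, and IEEE 754 `double` (as on x86-64), and `same_object_bytes` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true if the two objects have the same size and the
// same bytes in memory.
template <class A, class B>
bool same_object_bytes(const A& a, const B& b) {
  return sizeof(A) == sizeof(B) && std::memcmp(&a, &b, sizeof(A)) == 0;
}

inline void demo_same_object_bytes() {
  // Assumptions of this sketch; it is not meaningful on other targets.
  static_assert(sizeof(int) == 4 && sizeof(long long) == 8 &&
                sizeof(double) == 8);
  const int i[] = {1, 2, 3, 4};
  const unsigned u[] = {1u, 2u, 3u, 4u};
  const long long q[] = {0x200000001, 0x400000003};
  const double d[] = {4.2439915824e-314, 8.4879831654e-314};
  assert(same_object_bytes(i, u));
  assert(same_object_bytes(i, q));  // holds on little-endian targets only
  assert(same_object_bytes(i, d));  // relies on IEEE 754 subnormal encoding
}
```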

The major optimization-related downside of the SHF_MERGE approach — as it is sketched in the System V ABI document (2001) and as it is implemented in GNU ld as far as I know — is that you can’t use it to merge data elements of different sizes or alignments. GCC 16 won’t merge {1,2} with {1,2,3} because GCC puts the former in section .rodata.cst8 and the latter in .rodata.cst16. (GCC 14 and 15 put the latter in plain old unmergeable .rodata instead, because its size — 12 bytes — isn’t precisely 16 bytes. GCC 16 fixed that.) Basically, GCC has to precommit every data element to a specific “bucket”; the linker will consider merging it only with other elements in its own bucket.

And SHF_MERGE cannot merge parts of elements; it can merge only full elements. So while you might think a “sufficiently smart linker” could merge {2,3} across the conjunction of {1,2} and {3,4}, the SHF_MERGE algorithm by itself will never do that. Some users might even rely on that guarantee (somehow), so I don’t imagine that any linker will ever gain the smarts to do that.

`SHF_MERGE | SHF_STRINGS`

When an ELF section specifies both SHF_MERGE and SHF_STRINGS, then instead of chunking the section into elements of size sh_entsize bytes, the linker chunks the section into variable-length elements each of which is a null-terminated C-style string. It then deduplicates those strings.

As in the previous subsection, it seems that no linker will merge "ello" into the tail of "Hello"; the SHF_MERGE|SHF_STRINGS algorithm alone will merge only full elements. (In this case, full null-terminated strings.)

The SHF_MERGE|SHF_STRINGS algorithm finds the “elements” of the section by simplemindedly scanning for null bytes; no additional metadata is involved. Therefore, such a section must never contain strings with embedded null bytes. Try the following on your machine:

cat >x.c <<EOF
#include <stdio.h>
int main() {
  puts("p");
  puts(&"qxp"[2]);
  puts("r");
}
EOF
gcc -S x.c -o - | sed 's/qxp/q\\0p/' > x.s
gcc x.s
./a.out

The assembly file x.s should end up containing something like this:

  .section .rodata.str1.1,"aMS",@progbits
.LC0:
  .string "p"
.LC1:
  .string "q\0p"  # Danger!
.LC2:
  .string "r"

and when executed, will print not p p r but rather p r r, because the SHF_MERGE algorithm understands "q\0p" as "q" "p" and eliminates the second "p" as a duplicate. Therefore, a C++ compiler must go out of its way never to store a string literal containing embedded null bytes into such a section.

const char *p1 = "hello world"; // literal in .rodata.str1.1 (SHF_MERGE)
const char *p2 = "hell\0world"; // literal in .rodata (not SHF_MERGE)
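The `p r r` result can be reproduced with a toy model of the string-merging step. This sketch is not a real linker; it assumes that each label moves to wherever its element landed and that any addend (the `[2]` in `&"qxp"[2]`) is applied afterwards, which is consistent with the output described above. `MergedStrings`, `merge_strings`, and `string_at` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct MergedStrings {
  std::string bytes;                            // merged section contents
  std::map<std::size_t, std::size_t> moved_to;  // old element start -> new
};

// Toy SHF_MERGE|SHF_STRINGS: find elements by scanning for null bytes, and
// keep only the first copy of each distinct string.
inline MergedStrings merge_strings(const std::string& input) {
  MergedStrings out;
  std::map<std::string, std::size_t> seen;  // string -> offset in out.bytes
  std::size_t start = 0;
  while (start < input.size()) {
    std::size_t nul = input.find('\0', start);
    if (nul == std::string::npos) nul = input.size();
    const std::string elem = input.substr(start, nul - start);
    const auto [it, inserted] = seen.try_emplace(elem, out.bytes.size());
    if (inserted) {
      out.bytes += elem;
      out.bytes += '\0';
    }
    out.moved_to[start] = it->second;
    start = nul + 1;
  }
  return out;
}

// What puts(label + addend) would print after merging, under the assumption
// stated above: relocate the label first, then add the addend.
inline std::string string_at(const MergedStrings& m, std::size_t old_label,
                             std::size_t addend) {
  const std::size_t pos = m.moved_to.at(old_label) + addend;
  assert(pos < m.bytes.size());
  return std::string(m.bytes.c_str() + pos);
}
```

With the section contents `"p\0q\0p\0r\0"` (labels at old offsets 0, 2, and 6), the merged section becomes `"p\0q\0r\0"`, and the three `puts` calls read "p", then "r" (label 2 plus addend 2), then "r".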

In theory, a compiler could place the backing array for an initializer_list into a mergeable string section, and potentially merge the backing arrays for {'x','y','z','\0'} and "xyz". In practice, neither GCC nor Clang does this (yet).

The big disadvantage of SHF_MERGE merging from the C++ compiler’s point of view — the thing that makes it unsuitable for certain use-cases such as merging the duplicate definitions of template parameter objects — is that it is completely optional. It’s legal for a dumb linker to just ignore the SHF_MERGE flag. It would be unwise to rely on SHF_MERGE to take care of merging objects that the C++ standard requires us to merge, such as the duplicate definitions of inline variables or template parameter objects.

And, of course, it would be wrong to place an inline variable into an SHF_MERGE section anyway, because the standard (and common sense) forbids us to merge unrelated inline variables just because they happen to have the same value!

inline constexpr char a[] = "hello world";
inline constexpr char b[] = "hello world";

Here &a == &b is guaranteed to be false; an implementation that made it true would be non-conforming. (Even gcc -fmerge-all-constants will make it true, non-conformingly, only if you remove the inline keyword in both places.)

To handle inline variables, which are deduplicated according to their symbol names rather than their data contents, we need the next approach, which is…

`SHF_GROUP`, a.k.a. COMDAT sections

For the past twenty-some years, C and C++ compilers have traditionally compiled inline functions, inline variables, and implicit instantiations of function and variable templates into what are called “COMDAT sections.” This feature came late to ELF; the name “COMDAT” apparently comes from Windows NT. For more than you ever wanted to know about COMDAT, see “COMDAT and section group” (Fangrui Song, July 2021).

ELF’s version of COMDAT was basically designed to do exactly what a C++ compiler needs in order to implement inline functions. The compiler can take C++ code such as

inline int f() {
  static int i = 42;
  return ++i;
}

and turn it into a whole group of sections (text, data, rodata, whatever else it needs) — basically a whole mini object file of its own — something like this:

  .section .text._Z1fv,"axG",@progbits,fgroup,comdat
  .globl _Z1fv
_Z1fv:
  movl _ZZ1fvE1i(%rip), %eax
  addl $1, %eax
  movl %eax, _ZZ1fvE1i(%rip)
  ret

  .section .data._ZZ1fvE1i,"awG",@progbits,fgroup,comdat
  .globl _ZZ1fvE1i
_ZZ1fvE1i:
  .long 42

The assembler emits an ELF section of type SHT_GROUP representing the group of sections with “group identifier” fgroup. The linker, at link time, will pick one object file’s fgroup group and throw away the rest.

Now, I simplified that codegen quite a bit. Really, GCC doesn’t make such a human-friendly section group; it dumps each section into its own individual section group (so in this example there will be two different group identifiers, not just one); and GCC also marks both f and i as .weak symbols rather than global. I’m not sure why GCC does these things; I conjecture “intermediate codegen targeting an object format less powerful than ELF” and “compatibility with very old ELF linkers lacking SHT_GROUP support” respectively, but I don’t know. Email and tell me!

COMDAT sections are exactly what you need to implement the definitions of (1) inline functions; (2) implicit instantiations of function templates; (3) the static local variables of inline functions and implicitly instantiated function templates; (4) implicit instantiations of variable templates; (5) inline variables and static inline data members of classes; and probably a few more things I’m forgetting. All of these are entities with names, and C++ requires them to be properly deduplicated by name: &myInlineFunc must have the same pointer value no matter what translation unit you’re in.

Another kind of entity we talked about the other day in “Things define_static_array can’t do” (2026-04-24): template parameter objects of class type. Code like this:

template<A t>
const A *f() { return &t; }

will produce a template parameter object “variable” in its own COMDAT section, like this (Godbolt):

  .section .rodata._ZTAXtl1AEE,"aG",@progbits,_ZTAXtl1AEE,comdat
  .weak _ZTAXtl1AEE
_ZTAXtl1AEE:
  .zero 1

That’s exactly the same strategy the compiler would use for a simple inline variable like

inline const A v;

Now, when the linker deduplicates COMDAT sections, it looks at the group identifier (a symbol name) to decide if two sections are “duplicates” or not. It doesn’t care whether they have the same bytewise contents. That makes sense, because usually the text sections corresponding to instantiations of the same inline function in different TUs won’t be byte-for-byte identical (especially if the two TUs were compiled with different optimization levels). For inline functions, that’s exactly what we need: deduping by name, not by contents.
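In the same toy-model spirit, name-addressed deduplication looks like this. The sketch assumes the linker keeps whichever copy it sees first (the choice of survivor is an assumption here), and the names `GroupFromTU` and `link_comdat` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One section group as contributed by one object file. (Hypothetical type.)
struct GroupFromTU {
  std::string group_id;  // the group identifier, a symbol name
  std::string contents;  // stand-in for the group's bytes
};

// Toy COMDAT resolution: keep one group per identifier and never compare
// contents. Assumption of this sketch: the first copy seen is the survivor.
inline std::map<std::string, std::string> link_comdat(
    const std::vector<GroupFromTU>& inputs) {
  std::map<std::string, std::string> kept;
  for (const GroupFromTU& g : inputs) {
    kept.try_emplace(g.group_id, g.contents);  // later duplicates are dropped
  }
  return kept;
}

inline void demo_link_comdat() {
  const std::vector<GroupFromTU> inputs{
      GroupFromTU{"_Z1fv", "f compiled at -O0"},
      GroupFromTU{"_Z1fv", "f compiled at -O2"},  // same name, other bytes
      GroupFromTU{"_Z1gv", "f compiled at -O0"},  // same bytes, other name
  };
  const auto kept = link_comdat(inputs);
  assert(kept.size() == 2);
  assert(kept.at("_Z1fv") == "f compiled at -O0");
}
```

The third entry shows the flip side: identical contents under a different name are kept as a separate copy.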

You might imagine abusing COMDAT sections to do content-addressed deduplication à la SHF_MERGE. We just have to put each element in its own individual section group, with a group identifier based on (the hash of) its contents. For example, instead of

  .section .rodata.str1.1,"aMS",@progbits
.LC0:
  .string "hello"
.LC1:
  .string "world"
.LC2:
  .string "hello"

we could emit

  .section .rodata.str1.hello,"aG",@progbits,group_hello,comdat
  .globl str_hello
str_hello:
  .string "hello"
  .section .rodata.str1.world,"aG",@progbits,group_world,comdat
  .globl str_world
str_world:
  .string "world"

But that would vastly increase the linker’s burden: not just processing all those tiny sections, but keeping track of all those new global symbols. (We couldn’t use local, internal-linkage symbols anymore, because the linker wouldn’t be helping us to repoint one symbol’s relocations at another.) And the compiler would have to dedupe within the TU: we could no longer refer to "hello" by local symbols like .LC0 and .LC2, but at the same time we couldn’t emit the global symbol str_hello twice in the same TU.

By the way, making symbol names that incorporate hashes of user-provided data is just asking for trouble. See “Hash-colliding string literals on MSVC” (2022-12-31). A hash collision is disastrous; and when someone discovers how to generate hash collisions and starts writing blog posts like the above, you can’t switch out your hash function for a better one because that would break ABI.

So it’s very good that we have different, essentially custom-fitted, tools for deduping inline functions versus string literals.

Duplicate inline functions must be merged (we forbid false negatives); non-duplicates must not be merged (we forbid false positives). Dupe-ness is decided by symbol name; contents can be different. Duplication need be detected only across TUs; duplication within a single TU is impossible. Use COMDAT sections.
Duplicate strings can be left unmerged (we permit false negatives), although we still forbid false positives. Dupe-ness is decided by bytewise contents. Duplication should be detected both within and across TUs. Use SHF_MERGE.

Templates and inline variables can also use COMDAT. A downside is that we’d like to merge the definitions of, e.g., two different specializations of a function template f when they’re identical (if nothing in the program compares their addresses), but if dupe-ness is decided by symbol name and those two have different symbols, we’re out of luck. MSVC has a thing called “Identical Comdat Folding” that helps with that. (MSVC’s ICF is a non-conforming optimization, because it doesn’t check the parenthetical above. This can break code: Godbolt.)

Initializer-list backing arrays can use SHF_MERGE too. A downside is that we’d really like to merge {2,3,4} into the tail of {1,2,3,4}, and the SHF_MERGE algorithm will never do that. At least the compiler can do that, if someone teaches it to. That compiler optimization “composes” properly with SHF_MERGE. The compiler could even merge {1,2}, {3,4}, and {2,3} into a single 16-byte element {1,2,3,4} with three embedded labels, which could then merge with {1u,2u,3u,4u} at link time:

  .section .rodata.cst16,"aM",@progbits,16
.C.0.0:
  .long 1
.C.1.1:
  .long 2
.C.2.2:
  .long 3
  .long 4

Of course it would be better if the linker could work this out itself; the linker has more information than the compiler and therefore can do a better job of merging elements. It would also be nice if we could do something like SHF_MERGE for literals larger than 32 bytes, and/or literals of lengths that aren’t powers of two. GCC bug #119153 is related.

Contrariwise, C++26 std::define_static_array seemingly cannot use SHF_MERGE, because it cannot tolerate false negatives: C++26 at present requires us to merge all copies of define_static_array(std::array{1,2,3}) into the same place in memory. That’s too bad, because forcing it to use COMDAT (name-addressed) instead of SHF_MERGE (data-addressed) means we cannot take advantage of the latitude C++26 gives us to merge define_static_array(array{1,2,3}) with define_static_array(array{1u,2u,3u}).

Well, we could merge define_static_array(array{1,2,3}) with define_static_array(array{1u,2u,3u}) if we used group identifiers based on a hash of the data! When we tried that on string data above we paid a huge performance penalty as well as a correctness penalty. Here we’d pay only the correctness penalty. It’s still not worth it, IMHO.

Conclusion

Figuring out the best way to “compress” static data using only the tools we’ve got — single-TU ad-hoc compiler smarts, data-addressed SHF_MERGE, and name-addressed COMDAT sections — is a very hard problem. The implementation won’t be optimum in every case.

Maybe we could give future linkers even better tools. For example, now that “potentially non-unique object” is a term of art in C++, maybe we could just dump all PNU objects’ initializers into a single section (.rodata.pnu?) and in one more special section (.pnu_symtab, storing just a list of indices into the real .symtab?) specify their starts and sizes — I think that’s all the information the linker needs in order to overlap them any way it sees fit and repair .symtab accordingly.

Something like that might already exist. If it does, I’d certainly like to hear about it. And if it doesn’t, I’d certainly like someone to build it!

_Adventure:_ Is there light in the cobble crawl?

2026-04-30T00:01:00+00:00

The original Colossal Cave Adventure consists basically of a Fortran source file and a textual data file. These files would often travel from one installation to another via paper printouts: printed out at one site, typed in by hand at another.

The lines of WOOD0350’s Fortran source (intentionally or not) never exceed 80 columns regardless of your tab stop:

$ detab -4 advent.for | awk '{print length}' | sort -n | uniq -c | tail -4
77
78
79
80
$ detab -8 advent.for | awk '{print length}' | sort -n | uniq -c | tail -4
77
78
79
80

But the data file fits within 80 columns only with a tab stop of four. With an eight-space tab stop, four lines of the data file exceed 80 columns:

$ detab -4 advent.dat | awk '{print length}' | sort -n | uniq -c | tail -4
71
72
73
74
$ detab -8 advent.dat | awk '{print length}' | sort -n | uniq -c | tail -4
76
77
78
82
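The measurement those pipelines perform can be sketched as a single function. This assumes that `detab -N` expands each tab to the next multiple of N columns, which is the usual meaning of a tab stop; `expanded_width` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Width of `line` in columns after expanding each tab to the next multiple
// of `tab_stop`. (Assumed to match what detab plus `awk '{print length}'`
// reports.)
inline std::size_t expanded_width(std::string_view line, std::size_t tab_stop) {
  assert(tab_stop > 0);
  std::size_t col = 0;
  for (char c : line) {
    if (c == '\t') {
      col += tab_stop - (col % tab_stop);  // jump to the next tab stop
    } else {
      ++col;
    }
  }
  return col;
}
```

A line made mostly of tab-separated numbers roughly doubles in width when the tab stop goes from four to eight, which is how a data line can fit comfortably at one setting and overflow 80 columns at the other.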

These are the four offending lines. The first comes from section 3 (travel table) and the rest from section 9 (bit flags).

   95556   43      45      46      47      48      49      50      29      30
     1       2       3       4       5       6       7       8       9       10
     42      43      44      45      46      47      48      49      50      51
     52      53      54      55      56      80      81      82      86      87

If your version of Adventure has suffered truncation at the 80th column at any point in its pedigree, you’d expect to see (1) going DOWN from Witt’s End is impossible; (2) the Cobble Crawl does not have light; (3) entering maze room 51 or 87 resets the maze hint counter.

I first became aware of this possibility in December 2025, when Mike Willegal sent me a fan-fold printout of HORV0350 (a SEL32/RTM port of WOOD0350 most recently touched by Ned Horvath) that he’d kept since March 1979. In that fan-fold printout, all four of the lines above are truncated and thus missing the last number.

Now, I have no evidence that HORV0350’s own data file was actually truncated. But anyone retyping the game from that printout could easily have assumed that the Cobble Crawl was meant to be dark, and that Witt’s End was meant to have no DOWN exit.

I see evidence that such truncation really did happen between LONG0501 and ANON0501/OSKA0501.

ANON0501 reflows line (1), preserving the exits from Witt’s End, but truncates (2) and (3).
OSKA0501 reflows (1) but truncates (2) and (3).
MCDO0551 reflows (1) but truncates (2) and (3).
ROBE0665 reflows all three.

My decompilation of the recovered LONG0751’s data file indicates that it did not truncate any of these lines; and my playtesting of the recovered LONG0501 indicates that it did not truncate (1) or (2) either. (It’s inconvenient to test (3) by playtesting.)

These observations are consistent with the hypothesis that first LONG0501 reflowed (1) but left (2) and (3) untouched; then ANON0501/OSKA0501 and MCDO0551 descended from truncated paper printouts of LONG0501 (or via a now-lost common ancestor that was itself descended from LONG0501 by truncation). Meanwhile LONG0751 and ROBE0665 descended, independently, from non-truncated copies of LONG0501.

_Adventure:_ Walking on the ceiling

2026-04-28T00:01:00+00:00

On 2012-12-01 I wrote to Don Woods (in a postscript to a production update on Colossal Cave: The Board Game):

By the way, I just noticed last week that in “Adventure”, in the Hall of the Mountain King, the directions NORTH and LEFT are synonyms, as are SOUTH and RIGHT… as are WEST and FORWARD! West being forward makes sense, if the Hall of Mists is back to the east; but for the rest I suppose the adventurer must be walking on the ceiling. :) This little mixup is present all the way back to Crowther’s code. I just thought it was funny that nobody had commented on it before, as far as I know.

Don Woods wrote back the same day:

Yes, I vaguely recall finding that at some point. It was wrong in my version 1 but got fixed somewhere between there and my version 2.5.

Indeed, where WOOD0350 (circa 1977) has:

  311028  45  36   # 45=N 36=LEFT
  311029  46  37   # 46=S 37=RIGHT
  311030  44  7    # 44=W  7=FORWA

WOOD0430 (his expanded Adventure 2.5, circa 1995) has:

  311028  45  37   # 45=N 37=RIGHT
  311029  46  36   # 46=S 36=LEFT
  311030  44  7    # 44=W  7=FORWA

The original inconsistency is preserved, without comment, in KNUT0350, GIBI0375 (Original Adventure), LUPI0440, PLAT0550, LONG0751, and SMIT0370 (Georgia Tech FunAdv).

ROBE0665 (Wellesley Adventure) and ARNA0770 eliminate the inconsistency — albeit not in an obviously purposeful way — by simply not recognizing LEFT, RIGHT, and FORWARD as exits from the Hall of the Mountain King.

ROBE0665 has a superficially similar slip-up at Three-Opening Arch:

   To the east stands a wide dark arch opening into three
   passages.  All lead eastwards; but the left-handed passage
   plunges down, while the right-hand climbs up, and the middle
   way seems to run on, smooth and level but very narrow.
   To the north of the great arch stands a stone door, half open.
   To the west the passage fades into darkness.

   0 66   LEFT  NE UP
   0 34   RIGHT SE DOWN
   0 273  EAST

Here LEFT correctly matches NE, but goes UP where the room description says “plunges down”; and RIGHT correctly matches SE, but goes DOWN where the room description says “climbs up.” I reported that bug to Eric Roberts on 2025-04-23, although at that time I don’t think I’d noticed that LEFT and RIGHT actually worked correctly in that location, and that it was only the UP and DOWN directions that were wrong.

Things C++26 `define_static_array` can’t do

2026-04-24T00:01:00+00:00

We’ve seen previously that it’s not possible to create a constexpr global variable of container type, when that container holds a pointer to a heap allocation. It’s fine to create a global constexpr std::array, or even a std::string that uses only its SSO buffer; but you can’t create a global constexpr std::vector or std::list (unless it’s empty) because it would have to hold a pointer to a heap allocation.

Think of constexpr evaluation as taking place “in the compiler’s imagination.” Since C++20 it’s fine to use new and delete at constexpr time; but there’s a firewall between constexpr evaluation and real, material runtime existence. You can’t, at runtime, get a pointer to a heap allocation that was made only “in the compiler’s imagination,” any more than you can get a pointer to a local variable of a stack frame that was made only “in the compiler’s imagination.” So none of these snippets will compile:

constexpr int *f() { int i = 42; return &i; }
constinit int *p = f(); // error

constexpr int *f() { return new int(42); }
constinit int *p = f(); // error

constexpr std::vector<int> f() { return {1,2,3}; }
constinit std::vector<int> p = f(); // error

But if you can compute a std::vector at constexpr time, then you can persist its contents into a global constexpr std::array of the appropriate size. The appropriate size is just the .size() of the vector you computed, of course. So we have what’s become known as the “constexpr two-step” (Godbolt):

constexpr std::vector<int> f() { return {1,2,3}; }

constinit auto a = []() {
  std::array<int, f().size()> a;
  std::ranges::copy(f(), a.begin());
  return a;
}();

Thanks to Barry Revzin’s P3491 (June 2025) and Jason Turner’s “Understanding the Constexpr 2-Step” (C++ On Sea 2024) for the term “constexpr two-step.” Jason’s talk deals with a specific formula in which instead of repeating — and repeatedly evaluating — f() in the body of the lambda, we factor it out into a template argument (Godbolt):

constexpr std::vector<int> f() { return {1,2,3}; }

template<auto B>
consteval auto to_array() {
  // MAGIC NUMBER WARNING!
  constexpr auto v = B() | std::ranges::to>();
  std::array<int, v.size()> a;
  std::ranges::copy(v, a.begin());
  return a;
}

constinit auto a = to_array<[]() { return f(); }>();

C++26 will introduce a new and improved tool for this kind of compile-time array generation. It’s spelled std::define_static_array. In C++26 you can just write this (Godbolt):

constexpr std::vector<int> f() { return {1,2,3}; }
constinit std::span<const int> sp = std::define_static_array(f());

This call to define_static_array returns a span over a static-storage constant array of three ints. Basically this is asking the compiler to take the data it’s come up with “in its imagination” and write down a copy of it in the object file. This is much cleaner and more compile-time-efficient than the “two-step”!

Unfortunately, if I understand it correctly, C++26 define_static_array does not (yet?) support several things that you can do using the “two-step.” Here are a few such things.

1. Non-structural types

std::define_static_array is defined in terms of std::meta::reflect_constant(e), which C++26 defines as std::meta::template_arguments_of(^^TCls<e>)[0] for some invented template TCls. That is, reflect_constant (and thus define_static_array) is defined only for structural types. int is a structural type, and thus we can write the code above. But we cannot write

using OInt = std::optional<int>;
constexpr std::vector<OInt> f() { return {1,2,3}; }
std::span<const OInt> sp = std::define_static_array(f());

because optional is not a structural type. Nor are string, string_view, span itself… There are many types that can’t be materialized using define_static_array, even though they work fine with the “constexpr two-step” (Godbolt).

2. Pointers to string literals

Because reflect_constant is defined in terms of TCls, not only must the type of e be structural, but each particular value e in the array must be suitable for use as a template argument. const char* is a structural type, but if that pointer points to a string literal, then it’s not suitable for use as a template argument. So we can use define_static_array to make an array of null pointers:

constexpr std::vector<const char*> f() { return {nullptr, nullptr, nullptr}; }
std::span<const char* const> sp = std::define_static_array(f());

but it cannot make an array of pointers to literals:

constexpr std::vector<const char*> f() { return {"a", "b", "c"}; }
std::span<const char* const> sp = std::define_static_array(f());

On the other hand, the “constexpr two-step” has no problem with string literals (Godbolt).

3. Move-only types

In order to create a template parameter object representing e, we must make a copy of e ([temp.arg.nontype]/4). Therefore NTTP types must be copyable. You can (with care) use the two-step to create a static array of move-only type:

constexpr auto a = []() {
  std::array a;
  std::ranges::copy(f() | std::views::as_rvalue, a.begin());
  return a;
}();

but you cannot do the same with define_static_array. (Godbolt.)

The above snippet, like all my other examples of the “two-step,” never actually uses move-construction; it uses default construction followed by assignment. This is unsatisfying, and prevents the two-step from creating e.g. an array of reference_wrapper. define_static_array, on the other hand, does not use default-construction (Godbolt). Can we rework the two-step to eliminate the default-constructibility requirement? I imagine we can, but at the moment I don’t see how.

4. Make the array mutable

define_static_array allocates its array in rodata and gives you a span over it. This allows the compiler to do cool things, like point multiple invocations of define_static_array at the same backing array (Godbolt). In fact, the compiler is actually required to do that, because reflect_constant is defined in terms of a template parameter object which for all intents and purposes behaves like an inline variable: there is guaranteed to be only one template parameter object with a given type and value in the whole program (Godbolt).

Treating template parameter objects as inline variables means the compiler must combine such objects when they have the same type and value (optimization! hooray!) but sadly also forbids an otherwise sufficiently smart compiler from combining such objects when their types are merely similar. Godbolt:

template auto tpo() { return std::span(V); }
template auto tpo2() { return std::span(V); }

const void *p1 = tpo{1,2,3}>().data();
const void *p2 = tpo2{1,2,3}>().data();
const void *p3 = tpo{1,2,3}>().data();
const void *p4 = tpo{1,2,3}>().data();

All four of these pointers point to arrays of the three bytes 01 02 03. p1 and p2 are required to point to the same byte; p3 and p4, since they point to std::array objects of different types, are required to point to different arrays. The compiler isn’t allowed to coalesce p3 and p4, the way it’s allowed to coalesce the backing arrays of differently typed initializer_lists (Godbolt).

But (hooray! and thanks to Tim Song for correcting me on this!) there is a special case specifically for the “template parameter objects of array type” created by reflect_constant_array and define_static_array. These objects are permitted ([intro.object]/9.3) to overlap or be coalesced, just like initializer_lists and string literals. Clang trunk isn’t smart enough to coalesce potentially non-unique objects; therefore the Clang reference implementation of C++26 Reflection doesn’t coalesce these array objects either; but it’s not the paper standard’s fault. Godbolt:

const void *p1 = std::define_static_array(std::vector{1,2,3}).data();
const void *p2 = std::define_static_array(std::list{1,2,3}).data();
const void *p3 = std::define_static_array(std::vector{1,2,3}).data();
const void *p4 = std::define_static_array(std::vector{1,2,3}).data();

All four of these pointers point to arrays of the three bytes 01 02 03. p1 and p2 are required to point to the same byte; p3 and p4 are permitted, but not required, to point to different arrays. In practice Clang makes them different; GCC, once it implements define_static_array, will presumably make them the same.

However, template parameter objects are invariably const! Therefore, you cannot use define_static_array to produce a constinit-but-mutable array, the way you can with the “constexpr two-step.” It seems to me perfectly reasonable to want a magic consteval function that says, “Please generate me a mutable array in static storage with these contents” — specified as a constexpr-time vector — “and give me a span over it”:

template
consteval auto define_mutable_static_storage_array(R&& r)
    -> std::span>;

Perfectly reasonable to want such an API; but C++26 define_static_array fundamentally isn’t that API. It can’t produce mutable data: it can’t produce anything except pointers into (potentially non-unique) template parameter objects, which behave like const inline variables.

Conclusion

In short, define_static_array is constitutionally unsuited for some conspicuous use-cases. I’m not sure what this means for the future. I’m sure we don’t want to require people to use the “constexpr two-step” forever; but define_static_array doesn’t seem suited to replace all of its uses — certainly not in C++26, and I don’t see how it could be extended in the future to solve any of the problems I outlined above.

I imagine the answer is not “define_static_array will solve all your problems today,” nor “a new and improved define_static_array will solve all your problems in C++XY,” but rather “C++XY will introduce a new and different facility for manipulating static storage” — possibly related to the as-yet-unstandardized code-generation side of reflection — and we’ll use that new facility to solve some (but perhaps not all) of the above problems.

UPDATE: Actually, problems (1), (2), and (3) all stem from define_static_array’s requirement that each element be usable as an NTTP. Barry Revzin’s P3380R1 “Extending support for class types as NTTPs” (December 2024) lays out a plan that would permit the programmer to mark their own types as explicitly structural, thus (if accepted) addressing all three of those problems. On the other hand, making a user-defined type explicitly structural per P3380R1 seems to involve pretty arcane programming. The “constexpr two-step” stays general by staying above the fray: it simply never requires anything to be encoded as a template argument.

`auto{x} != auto(x)`

2026-04-11T00:01:00+00:00

Recently it was asked: What’s the difference between the expressions auto(x) and auto{x} in C++23?

The construct auto(x) arrived via P0849 “decay-copy in the language”. We could already write direct-initialization to a named type as either a declaration or a cast-expression:

T y(x); // declaration
return T(x); // cast-expression

P0849 just extended this syntax to work for a placeholder type as well:

auto y(x); // declaration
return auto(x); // cast-expression

Both of the latter lines mean “Deduce the type of auto from the type of x (decaying to a non-array object type if necessary) — let’s call that type T — and then explicitly cast x to that type exactly as if the user had written T in place of auto.”

This usually means we’re just making a copy of x using its copy constructor. If x stands for an xvalue expression, we’re calling the move constructor, and if x is a prvalue, we’re probably not doing anything at all. auto(x) is simple.

But auto{x} is more complicated, because curly braces produce an initializer list. This means the same thing as T{x}: “given the list of elements {x}, make me a T with those elements.” As [dcl.init.list]/3.7 shows, that’s not always the same thing as “make me a copy of x.” Godbolt:

auto paren() {
    std::vector v;
    return auto(v);
}

auto curly() {
    std::vector v;
    return auto{v};
}

The former means “make an empty vector, then return a copy of that vector (with no elements).” The latter means “make an empty vector, then return a vector containing that vector (with one element).”

As of this writing, MSVC gets this wrong: it calls the copy constructor in both cases. But [dcl.init.list]/3.7 (last improved by CWG2638) makes it very clear that MSVC is in the wrong.

return (x); implicitly moves from x (see P2266 “Simpler implicit move”), but return auto(x); does not. This makes sense, because return T(x); doesn’t move-from x either. Remember, all auto does here is hold the place of an explicitly specified T.

Consider also these variations:

auto p = (v); // copy
auto c = {v}; // c is an initializer_list holding one copy of v

auto p(v); // copy
auto c{v}; // initialize a new vector of 1 element (MSVC gets this wrong)

Of course vector is a pathological case. Its benefit is that it’s also a simple case using only STL types, in case you ever need to demonstrate the difference between (x) and {x} to anyone else.

The takeaway: As always, you should use curly braces when you have a sequence of elements (such as when initializing an aggregate or a container); if you aren’t in that situation (such as when you’re writing generic code) you should use ordinary parentheses. See “The Knightmare of Initialization in C++” (2019-02-18).

I suspect there is no situation where it ever makes sense to use auto{x} in real code. But I’m glad it exists in the language, for symmetry and consistency with T{x}.

Note that all of these lines—

auto a(1,2,3);
auto a = auto(1,2,3);
auto a{1,2,3};
auto a = auto{1,2,3};

—are invalid C++. You can never direct-initialize auto with multiple arguments. However, both of the following copy-initializations are legal and silly:

auto i = (1,2,3); // comma operator; i is int
auto i = {1,2,3}; // i is initializer_list<int>

The latter is a historical accident which is supported these days, as far as I know, only so that we can specify the behavior of for (int i : {1,2,3}) without having to write a special case into [stmt.ranged].

N3922 “New rules for auto deduction from braced-init-list” is the paper that removed auto a{1,2,3} from the language. N3922 came in 2014, at the height of the “Almost Always Auto” and “Uniform Initialization” fads; it was widely assumed that newbies would write auto a{1,2} and shoot themselves in the foot, but writing auto a = {1,2} wasn’t so attractive to newbies and thus wasn’t treated so urgently as a footgun. At the same time, N3922 changed auto a{1} to deduce int rather than initializer_list<int>. Only the copy-initialization forms, auto a = {1} and auto a = {1,2}, still deduce initializer_list<int>; they remain as a special case inconsistent with the rest of the language. N3912 §1 says this special case will be useful to “advanced users,” which I think in hindsight was a bad justification for keeping it.
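The deduction rules as they stand today can be checked directly. This is a small sketch with variable names of my own choosing; it should compile on any compiler that implements N3922.

```cpp
#include <initializer_list>
#include <type_traits>

auto j = {1, 2, 3};   // copy-list-initialization, several elements
auto m = {1};         // copy-list-initialization, one element
auto k{1};            // direct-list-initialization, one element

static_assert(std::is_same<decltype(j), std::initializer_list<int>>::value,
              "auto j = {1,2,3} deduces initializer_list<int>");
static_assert(std::is_same<decltype(m), std::initializer_list<int>>::value,
              "auto m = {1} also deduces initializer_list<int>");
static_assert(std::is_same<decltype(k), int>::value,
              "auto k{1} deduces int since N3922");
```

Writing auto k{1, 2}; with more than one element would be rejected outright, so it cannot appear in a compilable example.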

The “macro overloading” idiom

2026-04-02T00:01:00+00:00

Here’s a neat trick to create an “overloaded macro” in C, such that M(x) does one thing and M(x, y) does something else. For example, we could make a macro ARCTAN such that ARCTAN(v) calls atan(v) and ARCTAN(y,x) calls atan2(y,x).

#define GET_ARCTAN_MACRO(_1, _2, x, ...) x
#define ARCTAN(...) GET_ARCTAN_MACRO(__VA_ARGS__, atan2, atan)(__VA_ARGS__)

So ARCTAN(1) expands to GET_ARCTAN_MACRO(1, atan2, atan)(1) expands to atan(1), while ARCTAN(2,3) expands to GET_ARCTAN_MACRO(2,3, atan2, atan)(2,3) expands to atan2(2,3).
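The same dispatch can be verified at compile time if the targets are constexpr functions. In this sketch scale1, scale2, and SCALE are hypothetical stand-ins for atan, atan2, and ARCTAN, chosen only so that static_assert can check the result.

```cpp
// Hypothetical one-argument and two-argument targets.
constexpr int scale1(int x) { return x * 10; }
constexpr int scale2(int x, int factor) { return x * factor; }

// The third parameter picks whichever name has been shifted into
// that slot by the caller's arguments.
#define GET_SCALE_MACRO(_1, _2, x, ...) x
#define SCALE(...) GET_SCALE_MACRO(__VA_ARGS__, scale2, scale1)(__VA_ARGS__)

// SCALE(4)    -> GET_SCALE_MACRO(4, scale2, scale1)(4)       -> scale1(4)
// SCALE(4, 3) -> GET_SCALE_MACRO(4, 3, scale2, scale1)(4, 3) -> scale2(4, 3)
static_assert(SCALE(4) == 40, "one argument selects scale1");
static_assert(SCALE(4, 3) == 12, "two arguments select scale2");
```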

Or again, to make an “overloaded” HYPOT macro:

#define GET_HYPOT_MACRO(_1, _2, _3, x, ...) x
#define HYPOT(...) GET_HYPOT_MACRO(__VA_ARGS__, hypot3, hypot, )(__VA_ARGS__)

So HYPOT(x) expands to (x), HYPOT(x,y) expands to hypot(x,y), and HYPOT(x,y,z) expands to hypot3(x,y,z).

HYPOT(1,2,3,4) expands to GET_HYPOT_MACRO(1,2,3,4, hypot3, hypot,)(1,2,3,4) expands to 4(1,2,3,4), which is garbage. It’s likely to be ill-formed garbage, though, so that’s not too user-unfriendly.
HYPOT() expands to GET_HYPOT_MACRO(,hypot3,hypot,)() expands to (). ARCTAN() expands to GET_ARCTAN_MACRO(, atan2, atan)() expands to atan(). These are less user-friendly.

If you don’t mind relying on a C23/C++20 preprocessor feature, you can improve the latter experience:

#define GET_ARCTAN_MACRO(_1, _2, x, ...) x
#define ARCTAN(...) GET_ARCTAN_MACRO(__VA_ARGS__ __VA_OPT__(,) atan2, atan)(__VA_ARGS__)

Now ARCTAN() expands to GET_ARCTAN_MACRO(atan2, atan)() which is more cleanly ill-formed. (It has too few macro arguments.)

You might think you could use a well-known GCC extension to write __VA_ARGS__##, — but no, the token-paste operator ## has its special meaning only within ,##__VA_ARGS__, not within __VA_ARGS__##,.

Boost.Preprocessor implements BOOST_PP_VARIADIC_SIZE via a minor variation on this idiom:

#define GET_SIZE_MACRO(_1, _2, _3, _4, _5, x, ...) x
#define SIZE(...) GET_SIZE_MACRO(__VA_ARGS__ __VA_OPT__(,) 5,4,3,2,1,0)
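Here is that counting macro exercised with static_asserts, as a sketch that assumes a preprocessor supporting __VA_OPT__. The argument tokens x, y, z, and friends are never evaluated; they are discarded during expansion.

```cpp
#define GET_SIZE_MACRO(_1, _2, _3, _4, _5, x, ...) x
#define SIZE(...) GET_SIZE_MACRO(__VA_ARGS__ __VA_OPT__(,) 5,4,3,2,1,0)

// SIZE()        -> GET_SIZE_MACRO(5,4,3,2,1,0)        -> 0
// SIZE(x)       -> GET_SIZE_MACRO(x, 5,4,3,2,1,0)     -> 1
// SIZE(x, y, z) -> GET_SIZE_MACRO(x,y,z, 5,4,3,2,1,0) -> 3
static_assert(SIZE() == 0, "no arguments");
static_assert(SIZE(x) == 1, "one argument");
static_assert(SIZE(x, y, z) == 3, "three arguments");
static_assert(SIZE(a, b, c, d, e) == 5, "five arguments, the maximum here");
```

With six or more arguments the sixth argument itself lands in the x slot, so this particular definition is only meaningful for zero through five arguments.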

Hat tip to this blog post by “Quarterstar” (March 2026); the technique is also shown by CodeLucky (September 2024), on StackOverflow (2012), and presumably much older places. GitHub search turns up many cases of the pattern, even without considering variations in the GET_*_MACRO naming convention.

Caveat: As of 2026, MSVC’s preprocessor can’t handle this trick by default. You have to tell it to behave conformingly, by adding -Zc:preprocessor to your command line. (This is also how you get it to recognize __VA_OPT__!) Alternatively, MSVC’s old non-conforming preprocessor will accept the code as long as it’s wrapped in an additional layer of indirection. See “The Fundamental Theorem of Software Engineering” (2018-06-18).

// Work around MSVC's non-conforming preprocessor
#define EXPAND(x) x
#define GET_ARCTAN_MACRO(_1, _2, x, ...) x
#define ARCTAN(...) EXPAND(GET_ARCTAN_MACRO(__VA_ARGS__, atan2, atan)(__VA_ARGS__))

Chromium’s `span`-over-initializer-list success story

2026-03-19T00:02:00+00:00

Previously: “span should have a converting constructor from initializer_list” (2021-10-03). This converting constructor was added by P2447 for C++26. Way back in 2024, Peter Kasting added the same constructor to Chromium’s base::span — he emailed me about it at the time — but I was only recently reminded that in the /r/cpp thread about the feature he’d written:

Yup, this change was so useful it led to me doing a ton of reworking of Chromium’s base::span just so I could implement it there.

Speaking of ambiguity: out of context that comment could be taken as sarcasm. What programmer enjoys “doing a ton of reworking just” to implement a single new constructor? Did he mean the change was so useful, or, like, “so useful”? :) So it’s worthwhile to track down pkasting’s actual commit from November 2024 and see all the places he sincerely did clean up as a result.

What follows is a “close reading” of all the client call sites changed in Chromium commit 7a129f92f5.

std::vector> certs(
    {kcer_cert_0, kcer_cert_1, kcer_cert_2, kcer_cert_3, kcer_cert_3,
     kcer_cert_2, kcer_cert_1, kcer_cert_0, kcer_cert_0, kcer_cert_2,
     kcer_cert_3, kcer_cert_1});
CertCache cache(certs);

becomes simply

CertCache cache({kcer_cert_0, kcer_cert_1, kcer_cert_2, kcer_cert_3,
                 kcer_cert_3, kcer_cert_2, kcer_cert_1, kcer_cert_0,
                 kcer_cert_0, kcer_cert_2, kcer_cert_3, kcer_cert_1});

This is the poster-child use-case: the new code directly views a stack-allocated initializer_list, where the old code had wasted time and memory copying the contents of that initializer_list into a heap-allocated vector. This being test code, we don’t really care about the new code’s improved efficiency, but we do care about its improved readability and convenience.

ASSERT_TRUE(ConfigureAppContainerSandbox(
    std::array{&pathA, &pathB}));

becomes simply

ASSERT_TRUE(ConfigureAppContainerSandbox({&pathA, &pathB}));

EXPECT_THAT(MapThenFilterStrings(
                {{"en", "de"}},
                base::BindRepeating(~~~~)),
            IsEmpty());

replaces its double-braces with single-braces.

FetchImagesForURLs(base::span_from_ref(card_art_url),
                   base::span({AutofillImageFetcherBase::ImageSize::kSmall,
                               AutofillImageFetcherBase::ImageSize::kLarge}));

becomes simply

FetchImagesForURLs(base::span_from_ref(card_art_url),
                   {AutofillImageFetcherBase::ImageSize::kSmall,
                    AutofillImageFetcherBase::ImageSize::kLarge});

Notice that these preceding three examples all had the same intent — to view a fixed list of two items — but in the absence of natural syntax they invented three different workarounds to imperfectly express their intent. (Temporary std::array; doubled curly braces; explicit cast to base::span.) All three converged on the natural syntax as soon as it became available. Two of them benefit from P2752, too.

There were two “failure stories” in Peter’s commit, both due to the new constructor’s lack of CTAD. (I still don’t think anyone should ever use CTAD, and LEWG was a little scared of adding it here anyway.) For example Peter rewrote

if (base::span(box.type) == base::span({'f', 't', 'y', 'p'}))

into

if (base::span(box.type) == base::span({'f', 't', 'y', 'p'}))

Now, you might think after P2447 this could have become simply

if (base::span(box.type) == {'f', 't', 'y', 'p'})

but sadly no, for historical reasons a braced initializer list is grammatically disallowed after most C++ operators (the exceptions being co_yield and the assignment operators). I myself would probably have written one of

if (std::string_view(box.type, 4) == "ftyp")

if (memcmp(box.type, "ftyp", 4) == 0)
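Both alternatives can be tried in isolation. The sketch below assumes box.type is a four-byte char array with no terminating NUL; Box and is_ftyp are hypothetical stand-ins, not Chromium's real names.

```cpp
#include <cstring>
#include <string_view>

// Hypothetical stand-in for the structure in the Chromium code.
struct Box { char type[4]; };

constexpr Box box = {{'f', 't', 'y', 'p'}};

// The explicit length matters: type[] is not NUL-terminated,
// so string_view must be told where to stop.
static_assert(std::string_view(box.type, 4) == "ftyp");

// The memcmp spelling does the same comparison at run time.
inline bool is_ftyp(const Box& b) {
    return std::memcmp(b.type, "ftyp", 4) == 0;
}
```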

In the other “failure case,” Peter rewrote

hosts[DnsHostsKey("localhost", ADDRESS_FAMILY_IPV4)] =
    IPAddress({192, 168, 1, 1});

into

hosts[DnsHostsKey("localhost", ADDRESS_FAMILY_IPV4)] =
    IPAddress(base::span({192, 168, 1, 1}));

The trick here is that the old code (Godbolt) wasn’t actually constructing a span at all; it was calling IPAddress’s four-argument converting constructor followed by a redundant explicit cast to IPAddress. Personally I would have preserved the old behavior and improved readability at the same time by simply removing the curly braces:

hosts[DnsHostsKey("localhost", ADDRESS_FAMILY_IPV4)] =
    IPAddress(192, 168, 1, 1);

UPDATE, 2026-03-27: After reading this blog post, Peter improved both of these “failure cases” in the suggested ways; see commit 7107d5e857!
