One of the satisfactions of recreational mathematics comes from finding better solutions for problems thought to have been already solved in the best possible way. Consider the following digital problem that appears as Number 81 in Henry Ernest Dudeney’s Amusements in Mathematics. (There is a Dover reprint of this 1917 book.) Nine digits (0 is excluded) are arranged in two groups. On the left a three-digit number is to be multiplied by a two-digit number. On the right both numbers have two digits each:
\[158\times 23 = 79\times 46\]In each case the product is the same: 3,634. How, Dudeney asked, can the same nine digits be arranged in the same pattern to produce as large a product as possible, and a product that is identical in both cases? Dudeney’s answer, which he said “is not to be found without the exercise of some judgment and patience,” was 5,568:
\[174\times 32 = 96\times 58\]Victor Meally of Dublin County in Ireland later greatly improved on Dudeney’s answer with 7,008:
\[584\times 12 = 96\times 73\]This remained the record until a Japanese reader found an even better solution. It is believed, although it has not yet been proved, to give the highest possible product. Can you find it without the aid of a computer?
With the aid of a computer (code), it’s easy to confirm that the Japanese reader’s solution is indeed the best of the 11 basic solutions (and Meally’s the runner-up):
134 * 29 = 58 * 67 = 3886
158 * 23 = 46 * 79 = 3634
138 * 27 = 54 * 69 = 3726
174 * 23 = 58 * 69 = 4002
146 * 29 = 58 * 73 = 4234
259 * 18 = 63 * 74 = 4662
186 * 27 = 54 * 93 = 5022
158 * 32 = 64 * 79 = 5056
174 * 32 = 58 * 96 = 5568
584 * 12 = 73 * 96 = 7008
532 * 14 = 76 * 98 = 7448
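That confirmation is a one-page brute force over the 9! arrangements of the digits; here is a sketch (the variable names are mine, not from the original code):

```python
from itertools import permutations

# Try every arrangement of the nine nonzero digits as abc x de = fg x hi,
# keeping each distinct solution once (the right-hand pair is unordered).
solutions = set()
for p in permutations('123456789'):
    abc, de = int(''.join(p[0:3])), int(''.join(p[3:5]))
    fg, hi = int(''.join(p[5:7])), int(''.join(p[7:9]))
    if abc * de == fg * hi:
        solutions.add((abc, de) + tuple(sorted((fg, hi))))

# Print the basic solutions in increasing order of product.
for abc, de, fg, hi in sorted(solutions, key=lambda s: s[0] * s[1]):
    print(f"{abc} * {de} = {fg} * {hi} = {abc * de}")
```

Running it reproduces exactly the eleven lines above.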
But now consider the ruminations of another Irishman, James Joyce’s Leopold Bloom, who in Chapter 17 of Ulysses recounts how he became aware of
the existence of a number computed to a relative degree of accuracy to be of such magnitude and of so many places, e.g., the 9th power of the 9th power of 9, that […] 33 closely printed volumes of 1000 pages each of innumerable quires and reams of India paper would have to be requisitioned in order to contain the complete tale of its printed integers […] the nucleus of the nebula of every digit of every series containing succinctly the potentiality of being raised to the utmost kinetic elaboration of any power of any of its powers.
There’s some confusion as to what number Joyce was really talking about (if any); but the mathematical community has apparently settled on \(9^{9^9}\), as seen in e.g. the “Ulysses sequence” OEIS A054382: \(\lceil\log_{10} 1^{1^1}\rceil, \lceil\log_{10} 2^{2^2}\rceil, \lceil\log_{10} 3^{3^3}\rceil,\ldots\)
The ninth element of this sequence — the number of decimal digits in \(9^{9^9}\) — is \(369\,693\,100\). If “closely printed” at the resolution of A Million Random Digits with 100,000 Normal Deviates, printed by the RAND Corporation (no relation) in 1955, the recording of the entire expansion of \(9^{9^9}\) would take up 148 thousand-page volumes. (Or, if you postulate a thousand physical pages, each printed on both sides: 74 volumes.)
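The digit count is a quick logarithm away. A sketch (the 2,500-digits-per-page figure is my assumption, matching RAND’s 50-line-by-50-digit layout):

```python
import math

# digits(N) = floor(log10 N) + 1; here N = 9^(9^9), so log10 N = 9^9 * log10(9).
# The fractional part is nowhere near an integer, so double precision is safe.
exponent = 9 ** 9                       # 387,420,489
digits = math.floor(exponent * math.log10(9)) + 1
print(digits)                           # 369693100

# RAND-style pages hold 50 lines of 50 digits = 2,500 digits per page.
per_volume = 2500 * 1000                # thousand-page volumes
print(math.ceil(digits / per_volume))   # 148
```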
Suppose we allow solutions of Dudeney’s problem to contain powers: not merely \(abc\times de = gh\times ij\), but for example \(ab^c\times d^e = g^h\times i^j\). Then there are five more basic solutions possible, the largest of which is
\[48^3 \times 9^1 = 2^7 \times 6^5\]The solutions are:
329 * 8^1 = 47 * 56 = 2632
574 * 9^1 = 63 * 82 = 5166
49^2 * 8^1 = 56 * 7^3 = 19208
1^67 * 8^5 = 2^9 * 4^3 = 32768 (also with 1^76)
48^3 * 9^1 = 2^7 * 6^5 = 995328
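These five are easy to machine-check; the sketch below just re-evaluates each claimed identity (with `^` written as Python’s `**`) and confirms the nine digits appear exactly once:

```python
import re

# (left side, right side, claimed product)
claims = [
    ("329 * 8**1",   "47 * 56",     2632),
    ("574 * 9**1",   "63 * 82",     5166),
    ("49**2 * 8**1", "56 * 7**3",   19208),
    ("1**67 * 8**5", "2**9 * 4**3", 32768),
    ("48**3 * 9**1", "2**7 * 6**5", 995328),
]
for lhs, rhs, product in claims:
    assert eval(lhs) == eval(rhs) == product
    digits = re.sub(r"\D", "", lhs + rhs)   # all nine digits, once each
    assert sorted(digits) == list("123456789")
print("all five check out")
```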
Clement Wood — compiler of The Best Irish Jokes (1926) — asserts, in the same Book of Mathematical Oddities (1927) which we previously mined in “Mathematical golf” (2023-03-23), that there are only two solutions to the double equality
29 * 6 = 58 * 3 = 174
39 * 4 = 78 * 2 = 156
Wood is correct, and admitting powers produces no further solutions to that puzzle.
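Wood’s constraint, as I read it from his two examples (my inference, not his wording), is that the six factor digits plus the three product digits use each of 1 through 9 exactly once. A brute-force check under that reading finds exactly those two solutions:

```python
from itertools import permutations

# Search ab * c = de * f = ghi, with all nine nonzero digits used once.
found = set()
for p in permutations('123456789'):
    ab, c = int(p[0] + p[1]), int(p[2])
    de, f = int(p[3] + p[4]), int(p[5])
    ghi = int(p[6] + p[7] + p[8])
    if ab * c == de * f == ghi:
        # A frozenset of the two factor pairs dedupes left/right swaps.
        found.add(frozenset({(ab, c), (de, f)}))

for pair in sorted(found, key=lambda s: min(s)):
    (a, b), (d, e) = sorted(pair)
    print(f"{a} * {b} = {d} * {e} = {a * b}")
```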
If you’re a fan of my “trivial relocatability” content, or just want to help relocation’s progress into the mainstream of C++, you might be particularly interested in these three GSoC sponsors:

Ste||ar HPX. Their ideas list includes a sequel to last summer’s successful project implementing hpx::uninitialized_relocate; this summer’s goal is to parallelize hpx::uninitialized_relocate (and perhaps also hpx::copy?) for ranges that overlap. Parallelizable relocation is a building block for parallel containers such as ParlayLib’s sequence<T>.
GCC. Their ideas list doesn’t include anything relocation-related — but you could propose something! For example: adding __is_trivially_relocatable(T) to the compiler, along these lines but better; implementing a way to mark types such as deque trivially relocatable (i.e. [[gnu::trivially_relocatable]]); and rewriting libstdc++’s __is_bitwise_relocatable in terms of your new __is_trivially_relocatable(T), so that aggregates containing deques get the same benefit as deque itself. (Godbolt.)
LLVM/Clang. Their ideas list doesn’t include anything relocation-related — but you could propose something! For example: Clang provides an __is_trivially_relocatable(T) builtin, but it often gives wrong answers (#69394, #77091). Fixing this would allow projects such as Folly, Abseil, and Qt to start conditionally using Clang’s builtin, say, in the Clang 19 timeframe. You could also implement relocation optimizations in libc++’s vector and swap_ranges.
Again, the application window for GSoC participants to submit a proposal application opens on March 18 and closes on April 2.
By “mathematical structure,” I mean something like how we make recipes and the metrics by which we might compare one recipe to another. For example, if our goal is to make Sandwich, we could do it like this:
Wave = Water + Wind
Steam = Fire + Water
Plant = Earth + Water
Sand = Earth + Wave
Tea = Plant + Steam
Sandwich = Sand + Tea
Or like this:
Wave = Water + Wind
Sand = Earth + Wave
Glass = Fire + Sand
Wine = Glass + Water
Sandwich = Sand + Wine
If we diagram these recipes, we find that the first one is shallower, in the sense that we combine only the primitive elements, elements made from the primitives (Wave, Steam, Plant), and elements made from those elements (Sand, Tea). But the second one is terser, in the sense that it is five lines long instead of six.
At first glance, the world of Infinite Craft forms a directed hypergraph with a lot of structure: each directed hyperedge connects a set of one or two vertices to a set of exactly one vertex. But a recipe isn’t merely a “path” on that hypergraph! Mathematicians define the hypergraph analogue of a “path” as a set of incident hyperedges — “incidence” meaning that the edges share at least one vertex. In this sense there is a “path” from the starting elements to Sandwich that consists of only a single hyperedge; namely,
Sandwich = Wind + Grilled Cheese
That definition of “path” doesn’t help us.
I asked MathOverflow, and they pointed me to another example of the same problem: addition chains. An “addition chain” for an integer \(n\) is a sequence starting with 1 and ending with \(n\), in which each subsequent element is the sum of two (not necessarily distinct) earlier elements. For example, we might make 31 in any of these three ways:
2 = 1 + 1 2 = 1 + 1 2 = 1 + 1
3 = 1 + 2 3 = 1 + 2 3 = 1 + 2
6 = 3 + 3 4 = 2 + 2 6 = 3 + 3
7 = 1 + 6 8 = 4 + 4 12 = 6 + 6
14 = 7 + 7 12 = 8 + 4 14 = 2 + 12
15 = 1 + 14 16 = 8 + 8 17 = 3 + 14
30 = 15 + 15 28 = 12 + 16 31 = 14 + 17
31 = 1 + 30 31 = 3 + 28
The middle chain is shallowest, but the right-hand one is tersest.
Addition chains are idiomatically written as just an increasing sequence of integers: (1 2 3 6 12 14 17 31). We don’t need to specify how each integer (say, 17) is constructed from the preceding elements, because it’s obvious. We could represent Infinite Craft recipes just as concisely — (Wave Sand Glass Wine Sandwich) — but that wouldn’t be very reader-friendly because it’s not obvious which two preceding elements combine to make, say, Wine.
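To see that the reconstruction really is mechanical for addition chains, here is a sketch that recovers a summand pair for each element of the compact notation:

```python
def expand(chain):
    """For each element after the first, find a pair of earlier
    elements that sums to it (for addition chains this is 'obvious')."""
    steps = []
    for k, n in enumerate(chain[1:], 1):
        prev = chain[:k]
        a, b = next((a, b) for a in prev for b in prev if a + b == n)
        steps.append((n, (a, b)))
    return steps

for n, (a, b) in expand((1, 2, 3, 6, 12, 14, 17, 31)):
    print(f"{n} = {a} + {b}")
```

For Infinite Craft recipes no such reconstruction is possible, which is exactly why the concise notation isn’t reader-friendly there.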
Finding the tersest addition chain is directly relevant to the world of computer programming. Suppose we want to calculate the 31st power of an unknown number in register A, using only multiplication. Then we can do any of:
mul A, A, B # A^2 mul A, A, B # A^2 mul A, A, B # A^2
mul A, B, C # A^3 mul A, B, C # A^3 mul A, B, C # A^3
mul C, C, D # A^6 mul B, B, D # A^4 mul C, C, D # A^6
mul A, D, E # A^7 mul D, D, E # A^8 mul D, D, E # A^12
mul E, E, F # A^14 mul D, E, F # A^12 mul B, E, F # A^14
mul A, F, G # A^15 mul E, E, G # A^16 mul C, F, G # A^17
mul G, G, H # A^30 mul F, G, H # A^28 mul F, G, R # A^31
mul A, H, R # A^31 mul C, H, R # A^31
Our “shallowness” metric translates into a measure of the data dependencies involved in these computations. The middle program, being the shallowest, is also the fastest on any machine with at least two multiplier units.
Another practically relevant metric for the “goodness” of a chain is its width: the number of registers it uses under an optimal register allocation (coloring). The left-hand recipe above is the narrowest, with width 2, whereas the others have width 3:
mul A, A, B # A^2 mul A, A, B # A^2 mul A, A, B # A^2
mul A, B, B # A^3 mul A, B, A # A^3 mul A, B, A # A^3
mul B, B, B # A^6 mul B, B, C # A^4 mul A, A, C # A^6
mul A, B, B # A^7 mul C, C, B # A^8 mul C, C, C # A^12
mul B, B, B # A^14 mul C, B, C # A^12 mul B, C, C # A^14
mul A, B, B # A^15 mul B, B, B # A^16 mul A, C, A # A^17
mul B, B, B # A^30 mul C, B, B # A^28 mul C, A, R # A^31
mul A, B, R # A^31 mul A, B, R # A^31
The left-hand recipe corresponds to Russian peasant multiplication, which always generates an addition chain of width 2. For in-depth coverage of various algorithms to generate addition chains, see Knuth Volume 2 §4.6.3 “Evaluation of Powers.”
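For concreteness, here is a sketch of the binary method as a chain generator; its output for 31 is exactly the left-hand chain above:

```python
def binary_chain(n):
    """Binary-method (Russian peasant) addition chain for n: a double
    for each bit after the leading 1, plus an add-one for each set bit.
    Always realizable in width 2."""
    chain = [1]
    for bit in bin(n)[3:]:          # bits of n after the leading 1
        chain.append(chain[-1] * 2)
        if bit == '1':
            chain.append(chain[-1] + 1)
    return chain

print(binary_chain(31))  # [1, 2, 3, 6, 7, 14, 15, 30, 31]
```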
Surprisingly, the “tersest chain” problem is non-trivial both in Infinite Craft and for addition chains. Knuth writes:
Several authors have published statements (without proof) that the binary method [that is, Russian peasant multiplication] actually gives the minimum possible number of multiplications. But that is not true. The smallest counterexample is \(n = 15\), when the binary method needs six multiplications, yet we can calculate \(y = x^3\) in two multiplications and \(x^{15} = y^5\) in three more, achieving the desired result with only five multiplications.
This suggests the algorithm Knuth calls the “factor method”; but yet again, you can find numbers whose optimal chain eludes both the binary method and the factor method! It appears that there is no fast (non-exponential-time) algorithm that generates an optimal addition chain for every input.
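Since no fast algorithm is known, finding a provably tersest chain means search. Here is a sketch of a depth-first search (restricted to increasing chains, which is safe, since every \(n\) has an increasing optimal chain); it confirms Knuth’s \(n = 15\) example needs only five additions:

```python
def shortest_chain(n):
    """Exhaustive search for a shortest increasing addition chain for n.
    Exponential time -- fine for small n, and a hint at why no fast
    general algorithm is known."""
    best = None
    def dfs(chain):
        nonlocal best
        if best is not None and len(chain) >= len(best):
            return  # can't beat the best chain found so far
        if chain[-1] == n:
            best = list(chain)
            return
        candidates = sorted({a + b for a in chain for b in chain
                             if chain[-1] < a + b <= n}, reverse=True)
        for s in candidates:
            dfs(chain + [s])
    dfs([1])
    return best

print(shortest_chain(15))  # a 6-element chain: five additions
```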
To get an intuitive sense of the difficulty — in particular, why no greedy algorithm helps — look again at our tersest route to Sandwich:
Wave = Water + Wind
Sand = Earth + Wave
Glass = Fire + Sand
Wine = Glass + Water
Sandwich = Sand + Wine
This route to Sandwich passes through Wine on the fourth step. Now, the tersest route to Wine itself is only three steps:
Plant = Earth + Water
Dandelion = Plant + Wind
Wine = Dandelion + Water
But if you make Wine by that route, you’ll never reach Sandwich in the optimum number of steps!
Similarly, our tersest route to 31 was (1 2 3 6 12 14 17 31), passing through 17 on the sixth step. There are two routes that make 17 in only five steps:
(1 2 4 8 9 17)
(1 2 4 8 16 17)
But if you make 17 by either of these routes, you’ll never reach 31 in the optimum number of steps!
Neill Clift of AdditionChains.com produced this example for me — my utmost thanks to him! According to Neill, there are exactly five optimal chains for 31 that contain the number 17: none of those chains contain (1 2 4 8). Meanwhile there are seventy-two other optimal chains for 31 that don’t contain 17 at all.
Still, knowing that Infinite Craft and addition chains are two examples of the same hypergraph structure doesn’t tell me whether there’s an accepted name for this particular hypergraph structure. If you have any leads, please pop over to MathOverflow and/or send me an email!
Observe that the addition-chain structure is commutative and associative, whereas the Infinite Craft structure is commutative but non-associative:
Plant = Water + Earth
Lava = Earth + Fire
Smoke = (Water + Earth) + Fire
Stone = Water + (Earth + Fire)
This makes much of Knuth’s discussion (particularly “Graphical representation” and the generation of equivalent dual addition chains) inapplicable to Infinite Craft.
To explore the Infinite Craft hypergraph offline — without stressing Neal’s backend or needing to evade his Cloudflare bot-detection filter — you can download a compressed database containing about 30,000 elements from Tom Fang’s GitHub, szdytom/infinite-craft-dictionary. Computing the tersest recipe for each element, and inventing a compact way to represent such a recipe in the database, is left as an exercise for the reader!
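As a starting point for that exercise, here is a sketch of iterative-deepening search over a toy recipe table (the table below is hard-coded from the recipes quoted earlier in this post; the real thing would load the downloaded database):

```python
from itertools import combinations_with_replacement

# Toy slice of the crafting table; keys are unordered ingredient pairs.
RECIPES = {frozenset(k): v for k, v in [
    (('Water', 'Wind'), 'Wave'),    (('Earth', 'Wave'), 'Sand'),
    (('Fire', 'Sand'), 'Glass'),    (('Glass', 'Water'), 'Wine'),
    (('Sand', 'Wine'), 'Sandwich'), (('Earth', 'Water'), 'Plant'),
    (('Fire', 'Water'), 'Steam'),   (('Plant', 'Steam'), 'Tea'),
    (('Sand', 'Tea'), 'Sandwich'),
]}

def tersest(target, start=('Water', 'Fire', 'Wind', 'Earth')):
    """Iterative deepening: the first depth at which DFS succeeds
    yields a recipe with the fewest lines."""
    def dfs(have, steps, budget):
        if steps and steps[-1][2] == target:
            return steps
        if budget == 0:
            return None
        for a, b in combinations_with_replacement(sorted(have), 2):
            new = RECIPES.get(frozenset((a, b)))
            if new is not None and new not in have:
                found = dfs(have | {new}, steps + [(a, b, new)], budget - 1)
                if found:
                    return found
        return None
    for depth in range(1, 12):
        found = dfs(frozenset(start), [], depth)
        if found:
            return found

for a, b, new in tersest('Sandwich'):
    print(f"{new} = {a} + {b}")
```

On this toy table it finds the five-line Wine route rather than the six-line Tea route, as it should.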
In “Sorting integer_sequence at compile time” (2024-02-19), I wrote:

I don’t know any good way to operate on a whole pack of Is... in value-space and then “return” them into type-space as Vector<Is...>.
So I had written a helper function to get each individual element of the result one by one, like this:
template<class> struct Sort;
template<int... Is> struct Sort<Vector<Is...>> {
static constexpr int GetNthElement(int i) {
int arr[] = {Is...};
std::ranges::sort(arr);
return arr[i];
}
template<class> struct Helper;
template<int... Indices> struct Helper<Vector<Indices...>> {
using type = Vector<GetNthElement(Indices)...>;
};
using type = Helper<Iota<sizeof...(Is)>>::type;
};
Alert reader “Chlorie” writes in to tell me the proper approach. We just need our helper to return a std::array of all the elements at once, and then — still — use a pack-expansion to unpack the elements out of that constexpr std::array and into Vector’s template argument list. (Here we use std::array, not int[], despite my usual advice, because we need something that can be returned from a function.) Godbolt:
template<int N> struct Array { int data[N]; };
template<class> struct Sort;
template<int... Is> struct Sort<Vector<Is...>> {
static constexpr auto sorted = []() {
Array<sizeof...(Is)> arr = {Is...};
std::ranges::sort(arr.data);
return arr;
}();
template<class> struct Helper;
template<int... Indices> struct Helper<Vector<Indices...>> {
using type = Vector<sorted.data[Indices]...>;
};
using type = Helper<Iota<sizeof...(Is)>>::type;
};
Now that sort is called only once, we have a truly \(O(n\log n)\) compile-time sorting operation.

Yesterday’s benchmark is still here; but I’ve updated it to include Chlorie’s proper \(O(n\log n)\) constexpr solution, using both std::sort and std::ranges::sort.

Here’s Clang trunk (the future Clang 19) with libc++, running on my MacBook. The new solutions are the orange and red lines closest to the X-axis. The orange and red lines higher up (unchanged from yesterday’s graph) are yesterday’s std::sort and std::ranges::sort solutions, which do basically \(n\) times as much work as today’s solutions.

Here’s GCC 12.2 with libstdc++, running on RHEL 7.1. There’s still a noticeable compile-time-perf difference between libstdc++’s sort and ranges::sort, but even ranges::sort (in orange) now wins handily against yesterday’s recursive-template selection sort (in black).
The exercises build gradually, from Prepend and Append, to RemoveFirst and RemoveAll, to Sort. The only things I’d have done differently in that sequence are to include PopFront (which is easier) before PopBack (which, unless I’m missing something, is harder); and to include Iota (i.e. std::make_index_sequence) after Length.

Now, one way to implement Sort is to implement a selection sort from scratch by combining Min, RemoveFirst, and Prepend. In fact that’s how Ondřej’s sample solution does it.
template<class> struct Sort;
template<> struct Sort<Vector<>> : TypeIdentity<Vector<>> {};
template<int... Is> struct Sort<Vector<Is...>> {
using Min = Min<Vector<Is...>>;
using Tail = RemoveFirst<Min::value, Vector<Is...>>::type;
using type = Prepend<Min::value, typename Sort<Tail>::type>::type;
};
But since as a Professional Software Engineer™ I have a (possibly irrational) allergy to writing sort algorithms by hand, and since C++20 gives us a constexpr-friendly std::sort, the way that first occurred to me was simply to do the sorting in value-space using std::sort, and then lower the answer back down into type-space. So that’s how I did it (Godbolt):
template<class> struct Sort;
template<int... Is> struct Sort<Vector<Is...>> {
static constexpr int GetNthElement(int i) {
int arr[] = {Is...};
std::ranges::sort(arr);
return arr[i];
}
template<class> struct Helper;
template<int... Indices> struct Helper<Vector<Indices...>> {
using type = Vector<GetNthElement(Indices)...>;
};
using type = Helper<Iota<sizeof...(Is)>>::type;
};
static_assert(std::same_as<Sort<Vector<2,4,1,3>>::type, Vector<1,2,3,4>>);
I don’t know any good way to operate on a whole pack of Is... in value-space and then “return” them into type-space as Vector<Is...>; but I do know the trick above, which is to return each element of the result individually and then glue those \(n\) individual constexpr results back together as Vector<GetNthElement(Indices)...>. This ends up doing the constexpr work \(n\) times instead of just once, so we expect it to be super inefficient: as shown, this is an \(O(n^2 \log n)\) sorting algorithm!

But the name of GetNthElement hints that we can replace std::ranges::sort(arr) with std::ranges::nth_element(arr, arr+i). This makes the whole sorting operation \(O(n^2)\), just like the open-coded selection sort above.

The good way I was missing has been pointed out by alert reader “Chlorie.” Basically it’s to return an array, capture it in a constexpr variable, and then use Vector<arr[Indices]...>. See “Sorting at compile time, part 2” (2024-02-20).
Which compile-time sort is faster? Naïvely, I expect the constexpr version to be faster, because it avoids “recursive templates”; see “Iteration is better than recursion” (2018-07-23). On the other hand, its “gluing individual elements” trick causes it to do the same work \(n\) times in a row, which sounds slow, even if each individual computation is constexpr and therefore fast. But, back on the first hand, we know that both versions have the same asymptotic complexity — even the dumbest version, calling std::ranges::sort \(n\) times, is only off by a factor of \(\log n\) (which, “for most practical purposes, is less than 64”). And we expect the compiler to do better at direct value-space constexpr evaluation than at instantiating \(O(n)\) intermediate class types. So I clearly cannot choose the implementation in front of you!

We also expect the compiler to do better at simple constexpr evaluation than at instantiating \(O(n)\) intermediate class types. But, on the other hand, std::sort and std::nth_element are hardly “simple” functions! Our STL vendor (whether it’s libc++, libstdc++, or Microsoft) will send those functions through several layers of indirection and customization — even a simple operation like dereferencing the raw pointer arr+i might cause the instantiation of std::iter_move and/or std::iter_swap. And if we use std::ranges::sort, then we’re touching everything from std::ranges::random_access_range to decltype(std::ranges::iter_move)::operator(). There’s no world in which std::ranges::sort is less complicated than our simple open-coded selection sort. So I clearly cannot choose the implementation in front of me!
Fortunately, C++ programmers have spent the last few years building up an immunity to long compile times.
To satisfy my curiosity, I wrote a little benchmark for this (here). The benchmark generates several random lists of integers of a given length \(n\) and calls Sort on them, then static_asserts that the output is what we expect. The benchmark script compares selection sort against four minor variations of the constexpr solution: std::sort, std::ranges::sort, std::nth_element, and std::ranges::nth_element. It also compares what happens if Vector<Is...> is a hand-coded empty struct, versus an alias for std::integer_sequence<int, Is...>. We always include the <algorithm> header, even when it’s not needed. We always compile with -O2, since that’s what we’d expect in production.
Here’s Clang trunk (the future Clang 19) with libc++, running on my MacBook.
Here’s GCC 12.2 with libstdc++, running on RHEL 7.1.
I don’t have the ability to run this benchmark on MSVC, but if you do, please send me your results! Get my Python script here.
Observation #1: The constexpr-STL-algorithm version really doesn’t care whether you use std::integer_sequence or Vector. This actually surprised me a little bit, because the constexpr version is the one that uses Iota, and I expected my hand-rolled Iota to be slower than the STL vendor’s make_integer_sequence (which takes advantage of the compiler’s builtin __make_integer_seq). On such small inputs, though, I can see how the speed of Iota is the least of our worries.
Observation #2: Below the two lines for selection sort, we see distinctive curves for \(O(n^2\log n)\) std::sort and \(O(n^2)\) std::nth_element. As predicted, nth_element handily beats sort. This also gives a vendor-independent answer to my biggest question: Is constexpr-evaluated std::sort faster than a hand-metaprogrammed selection sort? Yes, constexpr beats metaprogramming.
Observation #3: On Clang/libc++, the Classic and Ranges algorithms are extremely similar in performance. On GCC/libstdc++, the Ranges algorithms take a huge penalty. At first I chalked this up to how on libc++ the guts of std::sort and std::ranges::sort are literally the same template, parameterized by an _AlgPolicy. (At the LLVM Dev Meeting in October 2023, Konstantin Varlamov gave a 20-minute talk touching on this. Obviously in 20 minutes one can’t go deep into the details; but he did claim that this design is unique to libc++. So this was on my mind already.) However, libstdc++’s std::ranges::sort is also merely a thin wrapper around std::sort. It boils down to this:
It operator()(It first, Sentinel last, Comp comp = {}, Proj proj = {}) const {
auto lasti = std::ranges::next(first, last);
std::sort(first, lasti, detail::make_comparator_by_pasting_together(comp, proj));
return lasti;
}
So if there’s a slowness to instantiating or evaluating this, it must be in the part that creates the comparator by pasting together std::ranges::less and std::identity to come up with a type whose behavior one could summarize as “std::less<int>, but slower.” Both libc++ and libstdc++ need to do that pasting-together, so why should libstdc++ be so much worse?… Well, libc++ optimizes the “99% case” where Proj is std::identity. libstdc++, as far as I can tell as of this writing, does not.

So that’s why libstdc++’s std::ranges::sort(first, last) is vastly slower than their std::sort(first, last) — every call to the comparator turns into three or more calls to std::invoke!

In short, libstdc++’s Ranges algorithms suffer a huge constexpr penalty right now; but maybe they could easily fix that, by adding the same special case we see in libc++. For the time being, if your code uses STL algorithms at constexpr time, all else being equal, you should prefer std over std::ranges. In fact I’d give this advice to ordinary runtime code, too. In non-generic code that doesn’t need Ranges’ arcane features, why pay their cost?
See the followup for a better, faster solution.

My Auto scope-guard macro (“The Auto macro” (2018-08-11)) occasionally attracts feature requests. Usually, I say “No need!” The neat thing about Auto’s particular syntax is that it’s conceptually just a way to defer code to the end of a scope. Feature requests usually take the form of modifying the deferred code in some way — which is already (and more explicitly) allowed simply by… writing the code that way.
Before we look at those “not-a-bug” examples, let’s see the one feature request I’ve actually agreed with and adopted:

“My deferred code might throw, which smacks into the implicit noexcept of the Auto object’s destructor and terminates. That destructor should be noexcept(false)!”
Quite true! Instead of the internal lambda-holder looking like this:
template<class L>
class AtScopeExit {
L& m_lambda;
public:
AtScopeExit(L& action) : m_lambda(action) {}
~AtScopeExit() { m_lambda(); }
};
its destructor should really look like this:
~AtScopeExit() noexcept(false) { m_lambda(); }
That explicit noexcept(false) undoes the implicit noexcept that C++ places on most destructors.

Formally, noexceptness is propagated from the bases and data members into the destructor’s implicit noexcept-spec ([except.spec]/8); in other words, given struct Y { X x; ~Y() {} }, ~Y will have the same noexceptness as ~X even though ~Y() {} is user-provided. This is different from how it works for, say, move constructors, where a user-provided Y(Y&&) {} is implicitly non-noexcept even if X(X&&) is noexcept.

Adding noexcept(false) to the internal type’s destructor allows us to support code like this (Godbolt):
void example1() {
Auto(throw 42);
if (cond)
return; // here 42 is thrown
neverThrows();
// here 42 is thrown
}
Of course, if we’re exiting the scope because of an exception, and then the Auto’s code throws its own exception, the runtime will call std::terminate anyway (because you can’t propagate two exceptions at once). In that case, our addition of noexcept(false) is harmless but not helpful either.

Removing that implicit noexcept from the destructor arguably alters the meaning of this code (Godbolt):
void cleanup();
void test() {
Auto(cleanup());
}
Before, test couldn’t ever propagate an exception; if cleanup threw, it would smack into the destructor’s implicit noexcept and terminate. So there was only one way to exit from test. After, there are two ways to exit, so the behavior isn’t as simple — but GCC 13 generates the same text section for both (pushing the whole difference into the unwind tables), and Clang 17 actually generates smaller code for the new Auto! Anyway, if this matters to you, you can generate even smaller code — identical before and after this change — by declaring void cleanup() noexcept.
Now let’s see some “rejected” feature requests.
“My deferred code might throw, which terminates if there’s already an exception in flight. Therefore, Auto’s lambda should wrap my code in try/catch!”

“My deferred code might throw, which terminates if there’s already an exception in flight. Therefore, Auto’s lambda should wrap my code in an if, so it won’t run if there’s an exception in flight!”
The Auto-world answer is that you’re responsible for what code runs at the end of your scope; if you want a particular control-flow construct, just write it! Like this:
void example2() {
Auto(
try {
mightThrowC();
} catch (...) {}
);
mightThrowA();
mightThrowB();
}
void example3() {
int inflight = std::uncaught_exceptions();
Auto(
if (std::uncaught_exceptions() == inflight) {
mightThrowC();
}
);
mightThrowA();
mightThrowB();
}
But in most cases such code isn’t needed. Auto’s simple definition ensures that “you don’t pay for what you don’t use.”

“I should be able to name the Auto object and say guard.commit() at the end of my transaction; only uncommitted transactions should run the deferred code!”
The Auto-world answer is that conceptually there is no “Auto object”; Auto simulates a control-flow construct, not an object with state. Instead of something like this:
void fantasy4() {
FantasyTransactionGuard g(
puts("Transaction failed");
);
mightThrowA();
if (rand()) return;
mightThrowB();
g.commit(); // now the puts won't run!
}
in Auto-world you’d write the boolean variable explicitly in your code:
void example4(int k, int v) {
bool committed = false;
Auto(
if (!committed) {
puts("Transaction failed");
}
);
mightThrowA();
if (rand()) return;
mightThrowB();
committed = true; // now the puts won't run!
}
“Auto’s lambda captures everything by [&], but I need a version that captures by value instead!”

The Auto-world answer is that there is no “lambda”; Auto simulates a control-flow construct, not an object with state. If you need a copy of some variable i, just make a copy! Instead of something like this:
void fantasy5() {
static int counter = 0;
FantasyAutoCapturingByValue(
printf("finished operation %d\n", counter);
);
counter += rand();
// here the original value is printed
}
in Auto-world you’d write the copy operation explicitly in your code:
void example5() {
static int counter = 0;
int originalCounter = counter;
Auto(
printf("finished operation %d\n", originalCounter);
);
counter += rand();
// here the original value is printed
}
This keeps all the important control flow and data-copying visible in your code, while hiding only the very smallest amount of “magic” behind the macro.
Here’s a partial map of the starting elements’ neighborhood:
The game’s UI is quite different on mobile versus desktop; I like the mobile version much better.
It’s also easy to create your own UI for it, e.g. at the command line using Python requests to hit the neal.fun/api/infinite-craft/pair API endpoint. (Out of respect for his backend, I’ll refrain from posting the exact code. He’s got what looks like a proper rate-limiter, though.)
The cleverest part of the design is that the backend doesn’t go to the LLM every time; that would be expensive and slow. The backend (certainly looks as if it) keeps the entire history of the world in something like memcached. Every time you combine a pair, the backend looks to see if that pair has ever been combined before, and if so, it serves you the cached response. Only completely new pairings are sent to the LLM, and then the response cached for posterity. This means that the same pairing will give the same result every time, for every player. Cutely, this also allows the game to detect if you’ve just created a never-before-seen element! For example, everyone knows that Water plus Fire equals Steam, but maybe you’re the first person ever to see that Brunch plus Sandwich equals Sandbrunch. (Hey, it’s an LLM. They can’t all be winners.) So then the UI can display a little “First Discovery” flair on that element for you. Sadly, as of this writing the flair is shown only on desktop, not on mobile.
Infinite Craft is a sandbox game; but you can give yourself a goal, such as to create Math, or Potato, or Hamlet. (I’ve achieved Hamlet. Math and — surprisingly — Potato remain elusive.) Play competitively by racing to create an element in the shortest time, or with the smallest number of byproducts. Play on one device by taking turns à la pick-up sticks or Jenga: the first player to fail to create a new result on their turn loses. The possibilities, like the combinations, are endless!
Since the number of possible interactions grows quadratically with the number of elements you’ve already found, Infinite Craft reminds me of HyperRogue in that the number of positions reachable in \(n\) steps is counterintuitively large and the task of retracing one’s steps (say, to remember the recipe for Hamlet) is counterintuitively difficult.
Every C++ implementation is required to provide feature-test macros indicating which features it supports. These come in two flavors:
Core-language macros such as __cpp_generic_lambdas, specified in [cpp.predefined]. These are provided by the compiler vendor via the same mechanism as __FUNCTION__ and __GNUC__.

Library macros such as __cpp_lib_ranges_iota, specified in [version.syn]. These are provided by the library vendor via the same mechanism as assert and _GLIBCXX_RELEASE. They all begin with __cpp_lib_. The easiest way to get all of them at once is to #include <version> (since C++20).
Most WG21 proposal papers not only change the Standard in some way but also add a feature-test macro of the appropriate kind. This lets the end-user C++ programmer detect whether their vendor claims to fully implement that feature yet.
I’ve always thought that the primary reason for feature-testing was conditional preprocessing. We can write polyfills like this:
#include <version>
#if __cpp_lib_expected >= 202211L
#include <expected>
#else
#include <tl/expected.hpp>
namespace std {
using tl::expected;
using tl::unexpected;
}
#endif
Or we can conditionally enable some functionality of our own library:
template<class PairLike>
auto extract_x_coord(const PairLike& p) {
#if __cpp_structured_bindings >= 201606L
const auto& [x, y] = p;
return x;
#else
// We can still work for std::pair, at least
return p.first;
#endif
}
This implies that if a paper does some very minor tweak, such that the code to detect-and-work-around the absence of that feature would always cost more than simply avoiding the feature altogether, then the paper doesn’t really need an associated feature-test macro. Semi-example 1: We might think it’s “obvious” that the code above should instead be written like this, so that it’s portable even to compilers without structured bindings. This attitude would reduce the necessity for a feature-test macro:
template<class PairLike>
auto extract_x_coord(const PairLike& p) {
static_assert(std::tuple_size<PairLike>::value == 2,
"p must have exactly two parts");
using std::get;
return get<0>(p);
}
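For what it’s worth, the get<0> formulation really is more general, not just more portable. Here is the same template again, self-contained (with <array> and <tuple> pulled in purely for demonstration): written against the tuple protocol, one definition handles std::pair, std::tuple, std::array, and any user-defined type that provides its own ADL get.

```cpp
#include <array>
#include <tuple>
#include <utility>

// Works for anything that speaks the tuple protocol, no structured
// bindings (and therefore no feature-test macro) required.
template<class PairLike>
auto extract_x_coord(const PairLike& p) {
    static_assert(std::tuple_size<PairLike>::value == 2,
                  "p must have exactly two parts");
    using std::get;   // permits ADL to find a user-defined get(), too
    return get<0>(p);
}
```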
Semi-example 2: We vacillated up to the last minute on whether P2447 “std::span over initializer list” needed a feature-test macro or not, since we couldn’t imagine anybody writing conditional code like this:
void f(std::span<const int>);
int main() {
#if __cpp_lib_span_initializer_list >= 202311L
f({1,2,3});
#else
f({{1,2,3}});
#endif
}
However, in the end we did add that feature-test macro.
The other day, I heard a second plausible use-case for feature-test macros. This new use-case makes it a good idea to add a macro for pretty much every paper, no matter how small. The new idea is that you can write your code “at head,” using whatever features of modern C++ you care to use, with no #ifs at all; and then you can just assert to your build system that all the features you use are in fact implemented by your current platform.
So we simply write:
#include <expected>
#include <span>
template<class PairLike>
auto extract_x_coord(const PairLike& p) {
const auto& [x,y] = p;
return x;
}
void f(std::span<const int>);
int main() {
f({1,2,3});
}
and then in a companion file (which could be a file built only by CMake, or a unit test, or simply a header pulled into our build at any point) we assert the features we expect to be available:
#include <version>
static_assert(__cpp_lib_expected >= 202211L);
static_assert(__cpp_lib_span_initializer_list >= 202311L);
static_assert(__cpp_structured_bindings >= 201606L);
If all these assertions pass, then the platform is “modern enough” for our codebase. If any assertion fails, we simply tell the library-user to upgrade their C++ compiler and we’re done. No fallbacks, no polyfills, just a simple file full of features and version numbers — just like a pip requirements.txt file or the dependencies in a package.json.
In this model, there’s no cost at all to “detecting the absence of a feature,” because we never intend to work around that absence. It’s cheap to add a single line to the “requirements header.” So paper authors should be liberal in adding feature-test macros to their proposals.
Once a feature-test macro has been added to the Standard it can never be removed, by definition. Will the C++11-era system of named feature-test macros eventually collapse under its own weight? Perhaps.
The C++11 system has already buckled in one sense: The original idea was that each macro’s value could just be bumped every time a change was made to that feature. For example, __cpp_constexpr has been bumped eight times as more and more of the core language has been made constexpr-friendly. To take another relatively extreme example, __cpp_lib_ranges started at 201911L, then P2415 owning_view bumped it to 202110L, then P2602 “Poison Pills Are Too Toxic” bumped it to 202211L, and so on (omitting some intermediate bumps). But this system works only if all vendors can be relied on to implement P2415 before P2602, since there’s no value of the macro that would correspond to “I implement P2602 without P2415.” Many Ranges-relevant papers (e.g. P1206 ranges::to) introduce their own macros different from __cpp_lib_ranges, to allow vendors to implement them independently of any other changes to the same header.
Eventually, the good names might all be taken. I can imagine someday simply naming macros after paper numbers, e.g. __cpp_lib_p1234r6; but I think that day is still a long way off.
Why can a captureless lambda have a “deducing this” explicit object parameter of an arbitrary type, but I can’t do the same for a capturing lambda?
auto alpha = [](this std::any a) { return 42; };   // OK
auto beta = [&](this std::any a) { return 42; };   // ill-formed
GCC complains: “a lambda with captures may not have an explicit object parameter of an unrelated type.” (GCC’s diagnostic is — I think properly — SFINAE-friendly. Clang and MSVC — I think improperly — allow you to form the lambda type, and then error on any attempt to call it.)
This restriction exists by the following logic: A capturing lambda has captures, presumably, because it wants to use them. To use its captures, the lambda’s operator() must have access to the lambda object (because its captures are stored as data members of that object). Therefore, the operator() must have an object parameter of the lambda’s own type — or at least a type unambiguously derived from that type!
auto gamma = [x]() { return x + 1; };
is lowered by the compiler into basically
struct Gamma {
int x_;
auto operator()() const { return x_ + 1; }
};
auto gamma = Gamma{x};
It’s able to get at x_ (i.e. this->x_) only because it has a this parameter (an “implicit object parameter”) of type Gamma. Change the lambda to use either of the new C++23 syntaxes which fiddle with that object parameter, and you’ll find trouble. The static specifier removes the object parameter entirely:
auto delta = [x]() static { return x + 1; };
// is ill-formed...
struct Delta {
int x_;
static auto operator()() { return x_ + 1; }
// ...because this is ill-formed!
};
The this specifier fiddles with the object parameter’s type, either by pinning it down to one concrete type, or by making it a template parameter:
auto zeta = [x](this std::any self) { return x + 1; };
// is ill-formed...
struct Zeta {
int x_;
auto operator()(this std::any self) {
return static_cast<Zeta&>(self).x_ + 1;
// ...because this is ill-formed!
}
};
auto eta = [x](this auto self) { return x + 1; };
// can be ill-formed...
struct Eta {
int x_;
auto operator()(this auto self) {
return static_cast<Eta&>(self).x_ + 1;
// ...(roughly) whenever this is ill-formed!
}
};
But the lowering is not the reality!
In the above examples, I was careful to use the name x for the lambda’s capture (in the real C++23 code) and the name x_ for the struct’s data member (in the lowered code). That reminds us that the lowering operation isn’t a program transformation; you can’t blithely assume that everything inside the lambda-expression’s curly braces works exactly the same as it would in an ordinary member function. For example, the keyword this acts “differently” inside a lambda — which is to say, it acts the same as it does outside the lambda. That’s usually what we want.
struct Worker {
void run();
void sync_run_on_this_thread() {
this->run(); // OK
}
void sync_run_on_another_thread() {
std::thread([&]() {
this->run(); // OK
}).join();
}
};
It’s convenient that the two lines marked OK both mean the same thing. Inside a lambda, historically, there’s been no sense in letting this->run() mean “invoke the run method of this lambda object itself,” because that’s never been something you can do with a lambda. Starting in C++23, we can create types derived from lambda types whose operator() (thanks to “deducing this”) can actually get at the derived type. So we can now create puzzles like this…
return [&](this auto self) {
printf("%d %d %d\n", x, this->x, self.x);
};
When embedded at the line marked HERE in this program, the lambda above prints “1 2 3” (Godbolt) — that is, each name x refers to a different object.
int one = 1;
struct Enclosing {
int x = 2;
auto factory(int& x = one) {
// HERE
}
};
using Base = decltype(Enclosing().factory());
struct Derived : Base {
int x = 3;
Derived(Enclosing& e) : Base(e.factory()) {}
};
int main() {
auto e = Enclosing();
auto theta = Derived(e);
theta();
}
Of course, this assumes you find a C++23 compiler capable of compiling this program at all! Readers may recall Knuth’s “man or boy” test for ALGOL 60; this program seems to be a bit like that for C++23 at the moment.
Pack the 12 pentominoes into a 6x10 rectangle, then label each cell of that rectangle with a digit from 1 to 5 such that each pentomino contains all of 1 through 5, and, whenever two adjacent cells belong to different pentominoes, their labels’ sum is a prime number.
“marpocky” pointed out that the puzzle was unsolvable, because of the X pentomino: no matter how you label its legs, you arrive at a contradiction. For example, if two adjacent legs of the X are labeled 3 and 4, then the corner cell touching both legs must be labeled with some \(x\) such that \(x+3\) and \(x+4\) are both prime; this is impossible, since those two sums are consecutive integers and one of them must be an even number greater than 2.
Likewise, adjacent legs cannot be labeled (2,3), (2,5), or (4,5).
Suppose one of the legs of the X is labeled 2; then neither adjacent leg can be either 3 or 5, so they must be 1 and 4 in either order. With 1,2,4 accounted for, the fourth leg (opposite the 2) can’t be either 3 (because (3,4) is verboten) or 5 (because (4,5) is verboten). Ergo, none of the X’s four legs can be labeled 2.
Suppose one of the legs is labeled 3; then neither adjacent leg can be either 2 or 4 (because (2,3) and (3,4) are verboten), so they must be 1 and 5 in either order. With 1,3,5 accounted for, the fourth leg (opposite the 3) can’t be either 2 (because (2,5) is verboten) or 4 (because (4,5) is verboten). Ergo, none of the X’s four legs can be labeled 3.
Without 2 and 3, we’re left with only three labels for the X’s four legs; Q.E.D., the X can’t be labeled with these digits at all.
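This case analysis is easy to confirm mechanically. Here is a small brute-force sketch of my own (not from the thread): it recovers the forbidden pairs, those leg labels (a, b) for which no label x in 1..5 on the shared corner neighbor makes both x+a and x+b prime, and then exhaustively confirms that no assignment of four distinct digits to the X’s legs avoids all of them.

```cpp
#include <algorithm>
#include <set>
#include <utility>

bool isPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Pairs (a,b), a<b, such that no label x in 1..5 on the shared corner
// neighbor makes both x+a and x+b prime. Such a pair can never label
// two adjacent legs of the X.
std::set<std::pair<int, int>> forbiddenLegPairs() {
    std::set<std::pair<int, int>> bad;
    for (int a = 1; a <= 5; ++a) {
        for (int b = a + 1; b <= 5; ++b) {
            bool ok = false;
            for (int x = 1; x <= 5; ++x)
                if (isPrime(x + a) && isPrime(x + b)) ok = true;
            if (!ok) bad.insert({a, b});
        }
    }
    return bad;
}

// The X's four legs form a cycle; each cyclically consecutive pair of
// legs shares a corner neighbor. Can the legs get four distinct digits?
bool xPentominoIsLabelable() {
    auto bad = forbiddenLegPairs();
    int legs[4];
    for (legs[0] = 1; legs[0] <= 5; ++legs[0])
    for (legs[1] = 1; legs[1] <= 5; ++legs[1])
    for (legs[2] = 1; legs[2] <= 5; ++legs[2])
    for (legs[3] = 1; legs[3] <= 5; ++legs[3]) {
        bool distinct = true, ok = true;
        for (int i = 0; i < 4; ++i)
            for (int j = i + 1; j < 4; ++j)
                if (legs[i] == legs[j]) distinct = false;
        for (int i = 0; i < 4; ++i) {
            int a = legs[i], b = legs[(i + 1) % 4];
            if (bad.count({std::min(a, b), std::max(a, b)})) ok = false;
        }
        if (distinct && ok) return true;
    }
    return false;
}
```

The forbidden pairs come out to exactly {(2,3), (2,5), (3,4), (4,5)}, and the exhaustive search over leg assignments comes back empty, matching the hand argument above.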
However, “electricmaster23” quickly found a solution to the following variation:
Pack the 12 pentominoes into a 6x10 rectangle, then label each cell of that rectangle with a digit from 0 to 4 such that each pentomino contains all of 0 through 4, and, whenever two adjacent cells belong to different pentominoes, their labels’ sum is a prime number. (Neither 0 nor 1 is prime.)
There are 2339 possible packings to consider (up to rotation and reflection). I didn’t quickly find a listing of all 2339 packings of the 12 pentominoes into a 6x10 rectangle, so I’ve made my own listing, here. It starts with the line
FIIIIILLLLFFFNWWTTTLYFNNXWWTZZYYNXXXWTZVYUNUXPPZZVYUUUPPPVVV
which represents the packing
I wrote a little C program to find all the labelings of these 2339 graphs satisfying the sum-to-a-prime criterion. There are a lot of them! For the packing above, there are 12,347,672 possible labelings. This is the lexicographically first of them:
When we consider not just this one packing but also the other 2338 possible packings, we find that the total number of viable labelings satisfying the sum-to-a-prime criterion is truly astronomical: 20,116,548,805. Here’s the number of labelings found by my computer search (up to rotation and reflection — we ensure the F pentomino isn’t flipped and occupies the left side of the rectangle) where the upper right corner is occupied by:
F | 585,401,512   | U | 2,503,719,527 |
I | 3,674,641,616 | V | 2,290,775,642 |
L | 2,471,381,448 | W | 559,212,338   |
N | 1,182,035,984 | X | —             |
P | 2,405,917,776 | Y | 1,320,223,809 |
T | 2,419,068,152 | Z | 704,171,001   |
There are other labeling criteria we could investigate. Many possible criteria are simply unsolvable, just like the 12345-coloring criterion we started with:
Label with 1,2,3,4,5 such that adjacent cells belonging to different pentominoes always sum to k mod n. (No solutions; consider the X tile.)
Label with 1,2,3,4,5 such that adjacent cells belonging to different pentominoes always differ by at least 3. (No solutions; consider the X tile.)
Label with 1,2,3,4,5 such that adjacent cells belonging to different pentominoes always multiply to at least 6. (No solutions; consider the F tile’s 1-cell.)
But some are even more “interesting” (in the sense of having fewer solutions) than the 01234-coloring criterion. For example:
Label with 1,2,3,4,5 such that adjacent cells belonging to different pentominoes always differ by at most 1. There are 1,059,492,120 labelings that satisfy this criterion. 517 of the 2339 possible packings have no viable labeling at all. The packing with the fewest viable labelings (at 64) is:
Label with 1,2,3,4,5 such that adjacent cells belonging to different pentominoes always differ by at least 2. There are 101,275,328 labelings that satisfy this criterion. 91 of the 2339 possible packings have no viable labeling at all. The packing with the fewest viable labelings (at 48) is:
Label with 0,1,2,3,4 such that adjacent cells belonging to different pentominoes always sum to a power of two. (0 is not a power of two.) There are only 240 labelings that satisfy this criterion. 2331 of the 2339 possible packings have no viable labeling at all. Here’s a representative labeling from each viable packing:
In fact, we can completely abstract the criterion-construction part: Label with \(a_1, a_2, a_3, a_4, a_5\) such that the labels \((a_i, a_j)\) of adjacent cells belonging to different pentominoes satisfy \(R(a_i, a_j)\), for a given symmetric relation \(R\). There are only 544 possible values for \(R\) (that’s OEIS A000666), so we could just run through them and see if any of them produce a thrilling puzzle.
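That count of 544 can be double-checked by brute force. A sketch, assuming a symmetric relation on five labels is a symmetric 5x5 0/1 matrix (with loops, i.e. \(R(a_i, a_i)\), allowed) counted up to relabeling: there are only \(2^{15}\) such matrices, so we can canonicalize each one over all 120 permutations of the labels and count distinct canonical forms.

```cpp
#include <algorithm>
#include <array>
#include <set>

// Count symmetric relations on a 5-element set, up to relabeling.
// A symmetric 5x5 0/1 matrix with loops allowed has 15 free bits
// (the entries with i <= j); two relations are the same puzzle if
// some permutation of the labels carries one to the other.
int countSymmetricRelationsOn5() {
    std::set<int> canonical;
    for (int bits = 0; bits < (1 << 15); ++bits) {
        // Unpack the 15 bits into a symmetric matrix m.
        bool m[5][5] = {};
        int k = 0;
        for (int i = 0; i < 5; ++i)
            for (int j = i; j < 5; ++j)
                m[i][j] = m[j][i] = ((bits >> k++) & 1) != 0;
        // Canonical form: the minimum encoding over all 120 relabelings.
        int best = 1 << 15;
        std::array<int, 5> p = {0, 1, 2, 3, 4};
        do {
            int enc = 0, t = 0;
            for (int i = 0; i < 5; ++i)
                for (int j = i; j < 5; ++j, ++t)
                    if (m[p[i]][p[j]]) enc |= 1 << t;
            best = std::min(best, enc);
        } while (std::next_permutation(p.begin(), p.end()));
        canonical.insert(best);
    }
    return (int)canonical.size();
}
```

Each distinct canonical form is one essentially different relation \(R\), i.e. one essentially different puzzle criterion.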
Here’s Python code to generate those 544 relations. But to exhaustively explore all 544 puzzles corresponding to those 544 relations, you’ll need a smarter algorithm and/or a faster computer than I’ve got!
Looking at these pictures, it strikes me that the “right” presentation for this kind of puzzle would be to give the pentominoes cut free of the grid, with their cells still labeled; then the puzzle-solver’s task would be to fit them back into the grid such that the criterion was satisfied.
Can you fit these pentominoes back into a 6x10 grid such that adjacent cells belonging to different pentominoes always differ by at least 2? (Some pieces must be rotated, none reflected.)
Now, a really clever puzzle-constructor would present a single set of pieces that could be put back together in two different ways — one way satisfying the “at least 2” criterion and another way satisfying the “at most 1” criterion. I’m afraid I don’t have nearly that much patience.