Computing OEIS A360447

Last week an interesting integer sequence popped up on /r/CasualMath and then OEIS: OEIS A360447. This isn’t a “numerical” sequence per se: it’s a permutation of the positive integers achieved by the following rule:

  • Start with the list 0, 1, 2.
  • For each integer i, in order, starting with 3, find the two adjacent integers in the list which sum to i and add i between them. For example, 3 goes between 1 and 2; then 4 goes between 1 and 3; producing the list 0, 1, 4, 3, 2 so far.
  • Now 5 could go between 1, 4 or between 3, 2. When you have multiple options, choose the pair with the smallest absolute difference: (3 - 2) = 1 is smaller than (4 - 1) = 3, so 5 goes between 3 and 2.
  • Now there’s no pair of adjacent integers that sums to 6. When you have no options, just insert the integer at the end of the list; producing 0, 1, 4, 3, 5, 2, 6 so far.
  • Consider the prefix 0, 1, 4. Notice that the only numbers that could ever possibly be inserted in between these elements are 1 and 5, and we’ve already passed those values: we’re working on i=7 now. So 0, 1, 4 is guaranteed never to change after this point — it’s a prefix of the “true” sequence OEIS A360447.

You can compute this sequence by brute force something like this:

std::vector<int> v = {0, 1, 2};
for (int i=3; i < MAX; ++i) {
    auto p = find_insertion_point(v, i);
    v.insert(p, i);
}

But finding the insertion point, with no other bookkeeping to speed the computation, takes O(n) time. So does v.insert. We can do vastly better with an algorithm like this:

std::list<int> v = {0, 1, 2};
std::map<int, ListIterator> sums = {
    { 3, std::prev(v.end()) },
};
for (int i=3; i < MAX; ++i) {
    auto it = sums.find(i);
    if (it == sums.end()) {
        v.push_back(i);
        maybe_update_sums(std::prev(v.end()));
    } else {
        auto p = it->second;
        p = v.insert(p, i);
        sums.erase(it);
        maybe_update_sums(p);
        maybe_update_sums(std::next(p));
    }
}

where maybe_update_sums(pos) checks to see if pos is now the best place to insert the number i in the future, and if so, updates the sums map with { i, pos }.

Using this algorithm, we’re bottlenecked mainly on memory — and what a bottleneck it is! We can save some memory by observing that we’ll never look up any integer bigger than MAX in sums, so we never have to insert sums bigger than MAX into sums. We can save more memory, and also save a lot of time, by switching from std::list to std::forward_list: a list node holding an int plus a next pointer is only 66% the size of a list node holding that int plus next and prev pointers. It turns out we can also save a lot of time by switching from std::map to std::unordered_map.

In fact, we can save the most memory by switching both of our data structures to completely preallocated arrays of int: The “list” structure becomes an array where next_[i] == j indicates that the integer j follows the integer i in our current list. The “map” structure becomes an array where data_[s] == i if and only if i + next_[i] == s. This lowers our memory usage to the probably-minimal 2*MAX integers (where MAX is the number of integers we’ll insert into our list before stopping and reporting our results).

Find a snapshot of my code here. On my laptop, the STL forward_list/unordered_map version chugs through the first 100 million integers, producing entries a(0)=0 through a(59)=44289045 of OEIS A360447, in 49 seconds. The “most custom” version, using preallocated arrays for both data structures, does the same in 4.5 seconds.

Tackling the first billion integers, it produces a(95)=362806936 in 60 seconds:

$ g++ -std=c++20 -O2 -march=native "-DMAX=1'000'000'000" \
      -DUSE_ARRAY_LIST -DUSE_ARRAY_MAP \
      2023-03-05-oeis-a360447.cpp
$ ./a.out
[...]
0, 1, 4, 11, 29, 47, 18, 61, 165, 434, 703, 1675, 972, 2213,
10093, 17973, 25853, 59586, 33733, 7880, 21427, 56401, 204177,
147776, 91375, 217724, 126349, 414021, 287672, 161323, 34974,
48521, 13547, 5667, 9121, 12575, 28604, 16029, 3454, 1241, 269,
1180, 3271, 2091, 911, 2464, 11409, 20354, 90361, 250729, 160368,
551111, 1492965, 941854, 390743, 1011861, 621118, 2714847,
15667964, 44289045, 161488216, 278687387, 117199171, 72910126,
174441333, 450413873, 275972540, 101531207, 28621081, 12953117,
36144504, 23191387, 33429657, 10238270, 7523423, 19855422,
52042843, 32187421, 44519420, 12331999, 4808576, 21328033,
59175523, 156198536, 253221549, 350244562, 97023013, 231893516,
134870503, 37847490, 16519457, 61269252, 167288299, 273307346,
106019047, 362806936,
(i=982401761, t=60s, next update at i=1345208697)

a(96)=1345208697 takes another five minutes beyond that. At that point, the list-so-far continues:

1345208697, (982401761, 1601996586, 619594825, ...)

and 2327610458 is in fact inserted between 1345208697 and 982401761. Whether 3672819155 will be inserted between 1345208697 and 2327610458, I don’t know — I run out of RAM before then.

Update (2023-03-08)

Christian Sievers made the excellent observation that we don’t even need two distinct data structures — all we need is a single array of Int, where the ith element of the array holds the successor of integer i in the list if it’s already been inserted, or the (currently anticipated) predecessor of i if it hasn’t yet been inserted. A single pass over the array constructs the whole list. using only two loads and two stores for elements added at the back of the list, and three loads and four stores for elements added in the middle. The resulting program feels very TAOCP.

The new and improved code is available here. Brendan McKay used it to produce a(0) through a(106) of OEIS A360447:

0, 1, 4, 11, 29, 47, 18, 61, 165, 434, 703, 1675, 972, 2213,
10093, 17973, 25853, 59586, 33733, 7880, 21427, 56401, 204177,
147776, 91375, 217724, 126349, 414021, 287672, 161323, 34974,
48521, 13547, 5667, 9121, 12575, 28604, 16029, 3454, 1241, 269,
1180, 3271, 2091, 911, 2464, 11409, 20354, 90361, 250729, 160368,
551111, 1492965, 941854, 390743, 1011861, 621118, 2714847,
15667964, 44289045, 161488216, 278687387, 117199171, 72910126,
174441333, 450413873, 275972540, 101531207, 28621081, 12953117,
36144504, 23191387, 33429657, 10238270, 7523423, 19855422,
52042843, 32187421, 44519420, 12331999, 4808576, 21328033,
59175523, 156198536, 253221549, 350244562, 97023013, 231893516,
134870503, 37847490, 16519457, 61269252, 167288299, 273307346,
106019047, 362806936, 1345208697, 3672819155, 6000429613,
2327610458, 982401761, 1601996586, 5425584583, 14674757163,
9249172580, 3823587997, 9868767405,
(i=25782714218, next update at i=35651481623, elapsed=5744s)

See also:

Posted 2023-03-05