Underseeding mt19937; introducing xoshiro256ss

TLDR: std::mt19937 is half the speed of a good PRNG (and 500 times slower to seed, for what that’s worth). These days I copy-and-paste xoshiro256ss.h when I need random numbers for a project.

The immediate impetus for this post comes from Melissa O’Neill’s blog post “C++ Seeding Surprises” (April 2015). That post is six years old, but just the other day Dimitrij Mijoski discovered a bug in its code: Melissa’s post had claimed that these two snippets (which both underseed std::mt19937 by initializing it with only 32 bits of entropy) had identical effects, when in fact they don’t.

// #1
std::random_device rdev;
uint32_t random_seed = rdev();
std::seed_seq seeder{random_seed};
std::mt19937 my_rng(seeder);
if (my_rng() == 7) ~~~

// #2
std::mt19937 my_rng(std::random_device{}());
if (my_rng() == 7) ~~~

Snippet #1 never produces 7 as its first output; but snippet #2 sometimes does. (For example, when rdev() returns 1080100664: Godbolt.) To determine that snippet #1 never produces 7, you basically have to run it exhaustively for every 32-bit value from 0 to 4294967295. On my machine, that snippet takes 20 microseconds per iteration, which means 4.3 billion iterations takes just about 24 hours.

The standard std::seed_seq is basically a thin wrapper around std::vector<int>, so creating a seed_seq (as in snippet #1) requires heap traffic. Snippet #2 eliminates the heap traffic, which makes it run much faster, at about 1.2 microseconds per iteration. Snippet #2 can iterate all 4.3 billion possible seeds in about 90 minutes. (Which reveals that snippet #2 never produces 42 as its first output.)

But the real offender here is the Mersenne Twister itself. The point of Melissa’s original post is that both of these snippets “under-seed” std::mt19937 with only 32 bits of entropy (so it’s totally unsurprising that many values never appear as the engine’s first output) — and unfortunately it’s much easier to under-seed mt19937 than to seed it correctly. To seed it correctly, you need to provide 624 32-bit words’ worth of entropy. (There is no API for computing the number 624 from within C++; you “just have to know.”)

Seeding the Mersenne Twister is slow… and pulling random numbers out of it is also slow, because it has that huge internal state to update.

static_assert(sizeof(std::mt19937) == 5000);

This discussion reminded me that about six months back, I’d switched some of my hobby projects (notably Buddhabrot and Knuth’s elevator simulator) away from std::mt19937 to a fast random number generator known as Xoshiro256** (Sebastiano Vigna and David Blackman, 2018). I took Vigna and Blackman’s code, translated it from C to C++, and put the resulting 50-line snippet on GitHub so I’d always have it handy to copy and paste.

xoshiro256ss has an internal state of only four 64-bit words, so it is much faster to seed and to generate from. Substituting xoshiro256ss for mt19937, we can test snippet #2 above:

xoshiro256ss my_rng(std::random_device{}());
if (my_rng() == 7) ~~~

and blaze through all 4.3 billion seeds in eight seconds.

Here’s a QuickBench — libc++, libstdc++ — showing the time to seed mt19937 versus xoshiro256ss. The latter is several hundred times faster; and even 50% faster than the linear congruential generator in std::default_random_engine.

Keep in mind:

  • In this post we are unrealistically emphasizing the performance of seeding, which a good program does only once, over the performance of generating output, which a program will do many more times. (This is unsurprising, because our original motivation was to prove something about underseeding by exhaustively testing all 32-bit seeds.) But, for the record, xoshiro256ss is also markedly faster at generating output than std::mt19937 (QuickBench).

  • Switching PRNGs doesn’t magically fix underseeding. We are still giving xoshiro256ss only 32 bits of entropy when it really wants 256 bits. (Seeding xoshiro256ss according to snippet #2, as above, will also never produce 7 as the engine’s first output. You have to set some high bits to get a 7; for example, seeding with 0x2'0f1d46b1 will produce a first output of 0x5827a259'00000007.)

You might call underseeding a problem; but really, I think of it as a universal constant, given the current state of the language. Given the current library APIs, programmers simply are going to underseed their PRNGs by passing in small (32-bit or 64-bit) seeds. So we have two independent lines of attack here:

  • WG21 ought to make seeding a PRNG easier to do properly than improperly. P0205R0 “Allow Seeding Random Number Engines with std::random_device (Moritz Klammler, 2016) proposed the obvious syntax std::mt19937 g(std::random_device{}); to mean “hey PRNG, please seed yourself appropriately from this source of seed material.” It is not clear to me whether P0205R1 (2021) has preserved this simple syntax. If it hasn’t, then it’s already broken.

  • In the meantime, given that you’re going to seed your PRNG with a relatively small amount of entropy, you might as well use a PRNG that has good performance characteristics. std::mt19937 is the worst possible default if you care about performance (which you do, if you’re using random numbers for Monte Carlo simulation or anything of that nature). Saying “oh, but mt19937 has a big state space!” is irrelevant if you’re underseeding it anyway.

In my own hobby projects, I’ve taken to using Quuxplusone/Xoshiro256ss in lieu of std::mt19937.

Posted 2021-11-23