Underseeding mt19937
; introducing xoshiro256ss
mt19937
; introducing xoshiro256ss
TLDR: std::mt19937
is half the speed of a good PRNG (and 500 times slower to seed,
for what that’s worth).
These days I copy-and-paste xoshiro256ss.h
when I need random numbers for a project.
The immediate impetus for this post comes from Melissa O’Neill’s
blog post “C++ Seeding Surprises”
(April 2015). That post is six years old, but just the other day
Dimitrij Mijoski discovered a bug in its code: Melissa’s post had
claimed that these two snippets (which both underseed std::mt19937
by initializing it with only 32 bits of entropy) had identical effects,
when in fact they don’t.
// #1
std::random_device rdev;
uint32_t random_seed = rdev();
std::seed_seq seeder{random_seed};
std::mt19937 my_rng(seeder);
if (my_rng() == 7) ~~~
// #2
std::mt19937 my_rng(std::random_device{}());
if (my_rng() == 7) ~~~
Snippet #1 never produces 7
as its first output; but snippet #2 sometimes does.
(For example, when rdev()
returns 1080100664: Godbolt.)
To determine that snippet #1 never produces 7
, you basically have to run it exhaustively
for every 32-bit value from 0 to 4294967295. On my machine, that snippet takes 20 microseconds
per iteration, which means 4.3 billion iterations takes just about 24 hours.
The standard std::seed_seq
is basically a thin wrapper around std::vector<int>
, so
creating a seed_seq
(as in snippet #1) requires heap traffic. Snippet #2 eliminates the
heap traffic, which makes it run much faster, at about 1.2 microseconds per iteration.
Snippet #2 can iterate all 4.3 billion possible seeds in about 90 minutes. (Which reveals
that snippet #2 never produces 42
as its first output.)
But the real offender here is the Mersenne Twister itself. The point of Melissa’s original
post is that both of these snippets “under-seed” std::mt19937
with only 32 bits of
entropy (so it’s totally unsurprising that many values never appear as the engine’s first
output) — and unfortunately it’s much easier to under-seed mt19937
than to seed it
correctly. To seed it correctly, you need to provide 624 32-bit words’ worth of entropy.
(There is no API
for computing the number 624 from within C++; you “just have to know.”)
Seeding the Mersenne Twister is slow… and pulling random numbers out of it is also slow, because it has that huge internal state to update.
static_assert(sizeof(std::mt19937) == 5000);
This discussion reminded me that about six months back, I’d switched some of my hobby
projects (notably Buddhabrot and
Knuth’s elevator simulator)
away from std::mt19937
to a fast random number generator known as
Xoshiro256** (Sebastiano Vigna and David Blackman, 2018).
I took Vigna and Blackman’s code, translated it from C to C++, and put the
resulting 50-line snippet on GitHub so I’d always have it handy to copy and paste.
xoshiro256ss
has an internal state of only four 64-bit words, so it is much faster
to seed and to generate from. Substituting xoshiro256ss
for mt19937
, we can test
snippet #2 above:
xoshiro256ss my_rng(std::random_device{}());
if (my_rng() == 7) ~~~
and blaze through all 4.3 billion seeds in eight seconds.
Here’s a QuickBench — libc++,
libstdc++ — showing the time
to seed mt19937
versus xoshiro256ss
. The latter is several hundred times faster;
and even 50% faster than the linear congruential generator in std::default_random_engine
.
Keep in mind:
-
In this post we are unrealistically emphasizing the performance of seeding, which a good program does only once, over the performance of generating output, which a program will do many more times. (This is unsurprising, because our original motivation was to prove something about underseeding by exhaustively testing all 32-bit seeds.) But, for the record,
xoshiro256ss
is also markedly faster at generating output thanstd::mt19937
(QuickBench). -
Switching PRNGs doesn’t magically fix underseeding. We are still giving
xoshiro256ss
only 32 bits of entropy when it really wants 256 bits. (Seedingxoshiro256ss
according to snippet #2, as above, will also never produce7
as the engine’s first output. You have to set some high bits to get a7
; for example, seeding with0x2'0f1d46b1
will produce a first output of0x5827a259'00000007
.)
You might call underseeding a problem; but really, I think of it as a universal constant, given the current state of the language. Given the current library APIs, programmers simply are going to underseed their PRNGs by passing in small (32-bit or 64-bit) seeds. So we have two independent lines of attack here:
-
WG21 ought to make seeding a PRNG easier to do properly than improperly. P0205R0 “Allow Seeding Random Number Engines with
std::random_device
” (Moritz Klammler, 2016) proposed the obvious syntaxstd::mt19937 g(std::random_device{});
to mean “hey PRNG, please seed yourself appropriately from this source of seed material.” It is not clear to me whether P0205R1 (2021) has preserved this simple syntax. If it hasn’t, then it’s already broken. -
In the meantime, given that you’re going to seed your PRNG with a relatively small amount of entropy, you might as well use a PRNG that has good performance characteristics.
std::mt19937
is the worst possible default if you care about performance (which you do, if you’re using random numbers for Monte Carlo simulation or anything of that nature). Saying “oh, butmt19937
has a big state space!” is irrelevant if you’re underseeding it anyway.
In my own hobby projects, I’ve taken to using
Quuxplusone/Xoshiro256ss
in lieu of std::mt19937
.