PSA: <random>
’s distributions are stateful
<random>
’s distributions are statefulTwice in the past week I’ve run into this issue (once on Code Review StackExchange and once on Slack), so I thought I’d make a blog post about it.
Distributions are stateful!
Consider the following code:
std::mt19937 g;
std::cout << std::uniform_int_distribution<int>(0, 10)(g) << '\n';
This code works fine. But, a student may think, let’s pull out the distribution into a named local variable:
std::mt19937 g;
const auto dist = std::uniform_int_distribution<int>(0, 10);
std::cout << dist(g) << '\n';
This code does not compile (except on MSVC, which is how I think
students get lulled into thinking it’s correct). The problem is that dist
’s operator()
is
not const-qualified, so you can’t call it on a const-qualified object.
(See “const
is a contract” (2019-01-03).)
The operator()
is not const-qualified because — just like the operator()
of std::mt19937
itself — a call to that operator may modify the internal state of the distribution.
That’s right, distributions are just as stateful as random number engines!
Why might a distribution want to keep internal state? Well, the most popular algorithm for producing
numbers in a bell-shaped normal distribution
is the Box–Muller transform, which produces
two independently distributed outputs in a single computation.
libc++ (here)
uses the Marsaglia polar method, which
similarly produces two results at a time. Having computed those two results, it would be wasteful
to throw the second one away; so std::normal_distribution
will cache the second result inside
itself until the user asks for another output.
For uniform_int_distribution
, which produces one value at a time, there’s no need to keep
any internal state (other than the parameters min
and max
); but all vendors except MSVC
continue to mark uniform_int_distribution::operator()()
as non-const, for two reasons:
-
The Standard mandates that they do it. Vendors are not free to arbitrarily add
const
to random member functions. [EDIT: Oops! [member.functions]/2 seems to say that vendors are free to do that! Really? Thanks to Tim Song for bringing this to my attention.) -
It protects against Hyrum’s Law. If your code compiles with
uniform_real_distribution
but fails to compile when you change it tonormal_distribution
, that’s brittle code. It’s beneficial for all distributions to have the same API and conform to the same concepts.
So, engines and distributions are both stateful. You can observe the effects of these two different levels of statefulness with a test program like this (Godbolt):
template<class G, class D, class F>
void f(G& g, D& d, F reset_some_stuff) {
g.seed(1);
d.reset();
printf("Expected output: ");
printf("%0.2f ", d(g));
printf("%0.2f ", d(g));
printf("%0.2f ", d(g));
printf("%0.2f ", d(g));
printf("%0.2f ...\n", d(g));
g.seed(1);
d.reset();
reset_some_stuff();
printf("%0.2f ", d(g));
printf("%0.2f ", d(g));
printf("%0.2f ...\n", d(g));
}
int main() {
std::mt19937 g;
std::normal_distribution<float> dist(0.5f, 1.0f / 6);
f(g, dist, [&]() {
dist(g);
printf("Grab one, then reset both: ");
g.seed(1); dist.reset();
});
f(g, dist, [&]() {
dist(g);
printf("Grab one, then reset just the engine: ");
g.seed(1);
});
f(g, dist, [&]() {
dist(g);
printf("Grab one, then reset just the distribution: ");
dist.reset();
});
f(g, dist, [&]() {
dist(g);
printf("Grab one, then reset neither: ");
});
}
Interestingly enough, you can also observe that the normal_distribution
s of libstdc++ and libc++
produce their outputs using the same algorithm… but produce them in reversed order! This
illustrates another common <random>
pitfall (and a personal pet peeve of mine): the standard
library’s distributions are not portable from one implementation to another. If you are generating
random numbers for use in simulations or games where you need reproducible results, you can
get away with using the standard random number engines
(if you don’t care about performance), but you should
never use the standard distributions to produce your random results, if you care about
being able to reproduce those results later on someone else’s machine.