Bit-vector manipulations in standard C++
Recently on the std-proposals mailing list, Madhur Chauhan proposed
that C++ needs a better way to find the first set bit in an arbitrary-length bit-string.
Today, if you want to manipulate bit-strings of length N
, you have these options:
- For
N <= 64
, useunsigned long long
. - On some platforms, for
N <= 128
, use__uint128
. - On Clang 13, use
_ExtInt(N)
. (Clang 14 drops support forN > 128
, which removes any reason to use_ExtInt
as far as I know.) - Use something like my
Wider<T>
, which is statically sized and stack-allocated. - Use
std::bitset<N>
, which is statically sized and stack-allocated. - Use
std::vector<bool>
, which is dynamically resizeable and heap-allocated. - Use
boost::dynamic_bitset<>
, which is dynamically resizeable and heap-allocated.
Suppose we have a bit-string stored in one of these ways, and we want to find the second-lowest 1-bit.
Conventionally, the lowest-order bit will be
(x & 1)
for integer-like types which are written high-to-low, butx[0]
for sequence-like types which are written low-to-high. So the operation that for__uint128
andstd::bitset<N>
we spell>>
(“right-shift”) is, forvector<bool>
, spelledstd::shift_left
.
We’ll generalize the problem by defining a function f(n, x)
that finds the lowest
1-bit in x
whose index is greater than or equal to the given index n
, or returns
some large value if no such 1-bit exists.
The original problem, of finding the index of the second-lowest 1-bit in x
, becomes:
y = f(f(0, x) + 1, x);
if (y < N) ~~~
Here are the current state-of-the-art methods for implementing f
, as far as I know.
Notice that I’ll use int
for bit indices, since 2147483647 bits
ought to be enough for anyone.
Godbolt:
int f(int n, unsigned long long x) {
return n >= N ? N : std::countr_zero(x >> n) + n;
}
#ifndef _MSC_VER
static int f(int n, __uint128_t x) {
constexpr unsigned long long mask = ~0uLL;
return (x == 0) ? 128 :
(n < 64 && (x & mask) != 0) ?
std::countr_zero((unsigned long long)(x >> n)) + n :
f(n - 64, x >> 64) + 64;
}
#endif
int f(int n, const Wider<T>& x) {
Wider<T> mask = Wider<T>(1) << n;
for (int i = n; i < N; ++i, mask <<= 1) {
if (x & mask) return i;
}
return N;
}
int f(int n, const std::bitset<N>& x) {
#ifdef __GLIBCXX__
return (n == 0) ? x._Find_first() : x._Find_next(n-1);
#else
for (int i = n; i < N; ++i) {
if (x[i]) return i;
}
return N;
#endif
}
static int f(int n, const std::vector<bool>& x) {
if (n >= N) return N;
return std::find(x.begin() + n, x.end(), true) - x.begin();
}
static int f(int n, const boost::dynamic_bitset<>& x) {
if (n >= N) return N;
return (n == 0) ? x.find_first() : x.find_next(n-1);
}
For unsigned long long
, we can’t just call std::countr_zero(x >> n)
because n
might be out of range, making x >> n
undefined behavior.
libc++ provides std::countr_zero
for __uint128_t
, but GNU libstdc++ does not,
and MSVC doesn’t support __uint128_t
at all. See also
“Is __int128
integral?” (2019-02-28).
GNU libstdc++ provides implementation-detail methods _Find_first
and _Find_next
on std::bitset
, with exactly the same semantics as the public find_first
and find_next
methods on boost::dynamic_bitset
. Oddly, libstdc++ does not
provide those methods for their vector<bool>
, even though the two types’ elements
are laid out in the same way.
STL algorithms for bit iterators
libc++ provides clever specializations of certain STL algorithms for bit iterators
specifically, so that they can exploit full-word-length instructions instead of
extracting “elements” bit by bit. For example, std::find
can use rep bsfq
,
and std::count
can use popcntq
.
The full list of algorithms that libc++ optimizes is:
std::find
,
count
,
fill
, fill_n
,
copy
,
copy_backward
,
move
, move_backward
,
swap_ranges
,
rotate
, and
equal
.
Oddly, as of this writing libc++ does not optimize std::mismatch
for bit iterators, even though
mismatch
can be considered a building block of equal
. Maybe a mismatch
optimization will be added
along the way to implementing vector<bool>::operator<=>
.
Even more oddly, libc++ fails to optimize
copy_n
, so in Clang 15 std::copy(first, first+n, dest)
is 1000x faster
than std::copy_n(first, n, dest)
when first
is a bit iterator.
I’ve submitted a patch for the issue and expect it’ll be fixed soon.
You might expect libraries also to specialize
std::all_of
,any_of
, andnone_of
. The problem with those algorithms is that they don’t default theirPredicate
argument: you must provide a predicate, makingany_of
more likecount_if
than likecount
. Thus, on libc++,std::count(v.begin(), v.end(), false) == 0
will run much faster thanstd::ranges::all_of(v, std::identity())
.
Microsoft STL optimizes fill
, fill_n
, find
, and count
for bit iterators.
(Search the code
for uses of _Is_vb_iterator
.)
libstdc++ optimizes only std::fill
for bit iterators.
From the vantage point of 2022, you might think that std::bitset<N>
is to std::vector<bool>
in the same way that std::array<T, N>
is to std::vector<T>
. That is, it uses stack storage
with a fixed size instead of dynamic storage, but ought to provide basically the same iterator API
otherwise. Sadly, this is not the case!
std::bitset
is one of those “not quite really STL” types, like std::string
and std::valarray
,
that tries to fit a couple of use-cases but none of them are “STL iterable container.” Arguably, the
hint is in its name: whereas vector<bool>
is clearly a vector, a sequence container, of boolean values,
bitset
is a set, a collection of small integer indices. bs.test(42)
is the equivalent of s.find(42)
;
bs.set(42)
is the equivalent of s.insert(42)
; and so on. (As usual, these named methods throw
std::out_of_range
on error, and bitset
provides operator[]
for faster unchecked access.)
If you were going to iterate over the “contents” of such an object, what would you expect to see?
std::bitset<1000> bs;
bs.set(42);
for (auto elt : bs) {
std::cout << bs;
}
Surely you wouldn’t expect to see “false, false, false, false, …”!
So std::bitset
(and boost::dynamic_bitset
too) simply don’t provide iterators at all.
Instead, dynamic_bitset
provides find_first
and find_next
, and libstdc++’s bitset
provides secret _Uglified
versions of the same methods.
Like a Python set, bitset
provides overloaded &
, |
, ^
, ~
that perform intersection,
union, symmetric-difference, and invert operations. But then, as if the type author were free-associating
on the other meanings of those operators, it goes on to provide <<
and >>
. Which makes no sense at all
if you’re thinking of bitset
either as a sequence container of bits (like vector<bool>
) or a set of
indices (as we were just doing in the previous paragraph), but perfect sense if you’re thinking of
it as an integer type.
std::bitset<N>
ends up behaving similarly to an integer type with N
bits… except that it
can’t do the full suite of arithmetic operations out of the box. You can still implement those operations
tediously by hand. (Thanks to Maciej Hehl on StackOverflow for this code.
Note that adding operators to a type we don’t own, such as bitset
, is very ill-advised in practice;
we’d do better to name this function plus
, or wrap the bitset
s in a class of our own devising.)
template<size_t N>
auto operator+(const std::bitset<N>& a, const std::bitset<N>& b) {
auto carry = a & b;
auto result = a ^ b;
while (carry.any()) {
auto shifted = carry << 1;
carry = result & shifted;
result ^= shifted;
}
return result;
}
For a move-semantic type like
string
orset<int>
, we’d want to passa
by value and accumulate the result ina
instead of making a whole new copy. But for giant trivial stack-storage types likebitset
andarray
, move buys us nothing and NRVO buys us much. So, for these types, we pass by reference and return in ways that can be NRVO’ed.
Conclusion
There’s not much to conclude here, except that C++ currently has a bunch of ways
to do bit-strings, and none of them are particularly well crafted.
vector<bool>
is iterable but not arithmetic’able; bitset<N>
is arithmetic’able
but fails to be iterable or even resizable; different library vendors provide vastly
different performance profiles for STL algorithms on bit iterators.
Madhur and I produced a benchmark (backup) comparing all these different ways of
“find-next’ing” in a bit-string of a million elements. Here
are the results on libc++; you can see that libc++’s std::find
is a huge winner.
----------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------
VectorBoolManual 1862 ns 1862 ns 100000
VectorBoolStdFind 69 ns 44 ns 20286454
BitsetManual 7300 ns 2357 ns 100000
DynBitsetManual 6864 ns 2876 ns 252099
DynBitsetFindnext 73 ns 41 ns 15469004
Here
are the results on libstdc++; note that libstdc++’s std::find
is a huge loser,
even compared to manual iteration:
----------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------
VectorBoolManual 2017 ns 2016 ns 100000
VectorBoolStdFind 8323 ns 3479 ns 100000
BitsetManual 3532 ns 3481 ns 100000
BitsetFindnext 57 ns 30 ns 18880905
DynBitsetManual 3611 ns 2099 ns 331986
DynBitsetFindnext 47 ns 30 ns 29609779
These benchmark numbers were prooduced on VMs at godbolt.org, so they should be taken with a grain of salt especially when comparing across the two benchmark runs. But no amount of salt will erase a 100x speedup or slowdown.