Help wanted: Compile your codebase with P1144 and P2786 relocatability!

At today’s WG21 SG14 (Low Latency) meeting, there was a discussion of P1144 trivial relocatability versus P2786 trivial relocatability. It was remarked that each proposal has a corresponding Clang fork.

So I suggested that anyone interested in relocation could really help us out by downloading both compiler implementations and trying them out on their own codebases! Of course, that means you need to know how to compile them from scratch. Here’s the answer for my P1144 implementation and (UPDATE, 2024-04-22) for Corentin’s P2786 implementation, as far as I know.

I would love to turn these instructions into Dockerfiles so that you could just build Docker containers containing each Clang, and somehow build your codebase with those Dockerized Clangs. I’ve heard that VS Code actually makes that “easy.” If you do it, I’d love to hear about it. I’ll upload the Dockerfiles here and credit you.

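Get the code and build it

These commands are written for macOS: the DEFAULT_SYSROOT line relies on xcrun, which ships with Xcode. On Linux, omit that line.
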
cd ~
git clone --depth=20 --single-branch --branch=trivially-relocatable-prod \
    https://github.com/Quuxplusone/llvm-project p1144-llvm-project
cd p1144-llvm-project
git checkout origin/trivially-relocatable-prod
mkdir build
cd build
cmake -G Ninja \
    -DDEFAULT_SYSROOT="$(xcrun --show-sdk-path)" \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
    -DLLVM_TARGETS_TO_BUILD="AArch64;X86" \
    -DCMAKE_BUILD_TYPE=Release ../llvm
ninja clang
ninja cxx

For Corentin’s P2786 branch, I believe the same approach should work; substitute these lines for the corresponding ones above:

git clone --depth=20 --single-branch --branch=corentin/trivially_relocatable \
    https://github.com/cor3ntin/llvm-project p2786-llvm-project
cd p2786-llvm-project
git checkout origin/corentin/trivially_relocatable
mkdir build
[etc]

Optionally, run tests

ninja check-clang
ninja check-cxx

Do the new features compile and link?

For P1144

cat >hello.cpp <<EOF
 #include <stdio.h>
 #include <memory>
 #include <tuple>
 #include <type_traits>
 static_assert(std::is_trivially_relocatable<std::unique_ptr<int>>::value, "");
 struct RuleOfZero { std::unique_ptr<int> p_; };
 static_assert(std::is_trivially_relocatable<RuleOfZero>::value, "");
 struct [[trivially_relocatable]] Wrap0 {
   static_assert(not std::is_trivially_relocatable<std::tuple<int&>>::value, "");
   std::tuple<int&> t_;
 };
 static_assert(std::is_trivially_relocatable<Wrap0>::value, "");
 template<class T> void open_window(T *s, int n, int k) {
   std::uninitialized_relocate_backward(s, s+n, s+n+k);
 }
 template<class T> void close_window(T *s, int n, int k) {
   std::uninitialized_relocate(s+n, s+n+k, s);
 }
 template void open_window(Wrap0*, int, int);
 template void close_window(Wrap0*, int, int);
 int main() { puts("Success!"); }
EOF
~/p1144-llvm-project/build/bin/clang++ -std=c++11 hello.cpp -o ./a.out
./a.out

The above should compile fine, and print “Success!”

~/p1144-llvm-project/build/bin/clang++ -std=c++11 hello.cpp -S -o - | grep memmove

The above should find two memmoves: one in open_window and one in close_window.

For P2786

cat >hello.cpp <<EOF
 #include <memory>
 #include <type_traits>
 struct Wrap0 trivially_relocatable {
   static_assert(not std::is_trivially_relocatable<std::unique_ptr<int>>::value, "");
   std::unique_ptr<int> t_;
 };
EOF
~/p2786-llvm-project/build/bin/clang++ -std=c++20 hello.cpp -o ./a.out

The above should error out with this diagnostic:

hello.cpp:3:15: error: invalid 'trivially_relocatable' specifier on non trivially-relocatable class 'Wrap0'
    3 |  struct Wrap0 trivially_relocatable {
      |               ^
hello.cpp:5:4: note: because it has a non trivially-relocatable member 't_'
    5 |    std::unique_ptr<int> t_;
      |    ^~~~~~~~~~~~~~~~~~~~~~~

Now try:

cat >hello.cpp <<EOF
 #include <stdio.h>
 #include <memory>
 #include <tuple>
 #include <type_traits>
 static_assert(std::is_trivially_relocatable<std::tuple<int&>>::value, "");
 struct RuleOfZero { std::tuple<int&> p_; };
 static_assert(std::is_trivially_relocatable<RuleOfZero>::value, "");
 struct Poly {
    virtual int f() = 0;
    virtual ~Poly() = default;
 };
 template<class T> void open_window(T *s, int n, int k) {
   std::trivially_relocate(s, s+n, s+n+k);
 }
 template void open_window(Poly*, int, int);
 template void open_window(RuleOfZero*, int, int);
 int main() { puts("Success!"); }
EOF
~/p2786-llvm-project/build/bin/clang++ -std=c++20 hello.cpp -o ./a.out
./a.out

The above should compile fine, and print “Success!”

Plug it into your own code!

If your codebase compiles with Clang trunk and libc++ trunk, it will also compile fine with the P1144-enabled Clang and libc++. There is no minimum -std=c++XX version required.

If your codebase requires GNU libstdc++ instead of libc++, then you’ll also want to download and build my libstdc++ fork, in order to get the full experience. I don’t have good instructions for that, though, sorry.

If your codebase requires GCC instead of Clang, then I can’t (yet) help you. I’m offering a large bounty to whoever produces a GCC fork that implements P1144! GCC hackers, email me for details.


If you’re using my P1144-enabled libc++, you’ll see some containers and algorithms immediately get faster. Everywhere you’d expect relocation to be used, it’s used. Notably, vector::insert uses the equivalent of our test code’s open_window function, and vector::erase uses the equivalent of close_window. P1144 learned this pattern from folly::fbvector; you can see the fully evolved pattern in amc::Vector.
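
To make the close_window idea concrete, here is a minimal portable sketch of erasing one element from a raw buffer. The names relocate_forward and erase_at are hypothetical, not part of P1144 or libc++; relocate_forward is a hand-written stand-in for std::uninitialized_relocate that relocates element by element (move-construct into the destination, then destroy the source). For a trivially relocatable T, a P1144-aware library can replace that loop with a single memmove, which is what the grep test above looks for.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for std::uninitialized_relocate, pointers only.
// Each source element is moved into raw storage and then destroyed.
// Iterating forward is safe here because the destination precedes the source.
template<class T>
T* relocate_forward(T* first, T* last, T* dfirst) {
    for (; first != last; ++first, ++dfirst) {
        ::new (static_cast<void*>(std::addressof(*dfirst))) T(std::move(*first));
        first->~T();
    }
    return dfirst;
}

// Hypothetical erase: remove the element at index pos from a buffer that
// holds `size` live elements, and return the new size.
template<class T>
std::size_t erase_at(T* data, std::size_t size, std::size_t pos) {
    assert(pos < size);
    data[pos].~T();  // leaves a one-element hole of raw storage
    relocate_forward(data + pos + 1, data + size, data + pos);  // close the window
    return size - 1;
}
```

The sketch ignores exception safety; a production container would have to decide what happens if a move constructor throws partway through.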

With libstdc++, you won’t see much difference in speed, because I haven’t implemented many relocation optimizations in libstdc++.
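
If you want a rough number to compare across toolchains, a timing harness along these lines might be a starting point. The function name time_front_inserts_ms is hypothetical, and the workload (repeated insertion at the front of a vector of unique_ptr) is only an assumption about what would exercise the open_window path; your own hot loops are a better benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper: time n insertions at the front of a vector.
// Every insertion shifts all existing elements up by one slot, which is
// the work that relocation is meant to speed up.
inline double time_front_inserts_ms(std::size_t n) {
    std::vector<std::unique_ptr<int>> v;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        v.insert(v.begin(), std::unique_ptr<int>(new int(static_cast<int>(i))));
    }
    auto stop = std::chrono::steady_clock::now();
    assert(v.size() == n);  // also keeps the vector observably used
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Build it with the same optimization flags on each compiler and repeat the measurement several times; a single run is too noisy to draw conclusions from.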

Suppose your codebase has a lot of Rule-of-Five types that you’d like the STL to relocate “as if by memcpy.” Then you can use P1144’s proposed standard attribute, [[trivially_relocatable]], as shown in our test code’s struct Wrap0. But since other compilers don’t recognize that attribute (because it’s not yet standard), you’ll prefer to use the vendor-specific attribute:

struct [[clang::trivially_relocatable]] S {
    ~~~~
};
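
If the same headers must also build with compilers that warn about unknown attributes, one option is to hide the attribute behind a macro. This is a sketch, not something the P1144 fork ships: the macro name MYLIB_TRIVIALLY_RELOCATABLE and the class IntBox are hypothetical, and it assumes the fork reports the attribute through __has_cpp_attribute, as Clang normally does for its vendor attributes.

```cpp
#include <cassert>
#include <utility>

// Expand to the vendor attribute only where the compiler claims to know it.
#ifdef __has_cpp_attribute
#  if __has_cpp_attribute(clang::trivially_relocatable)
#    define MYLIB_TRIVIALLY_RELOCATABLE [[clang::trivially_relocatable]]
#  endif
#endif
#ifndef MYLIB_TRIVIALLY_RELOCATABLE
#  define MYLIB_TRIVIALLY_RELOCATABLE  // unknown attribute: expand to nothing
#endif

// A Rule-of-Five style owner of a heap-allocated int. Moving it and then
// destroying the source is equivalent to copying its bytes, so the
// annotation is a truthful promise for this type.
class MYLIB_TRIVIALLY_RELOCATABLE IntBox {
    int* p_;
public:
    explicit IntBox(int v) : p_(new int(v)) {}
    IntBox(const IntBox& o) : p_(o.p_ ? new int(*o.p_) : nullptr) {}
    IntBox(IntBox&& o) noexcept : p_(o.p_) { o.p_ = nullptr; }
    IntBox& operator=(IntBox o) noexcept { std::swap(p_, o.p_); return *this; }
    ~IntBox() { delete p_; }
    int value() const { assert(p_ != nullptr); return *p_; }
};
```

On a compiler without the attribute the class behaves exactly as before; only the relocation optimization is lost.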

If your codebase has its own containers and algorithms that would benefit from relocating trivially relocatable types “as if by memcpy,” then you can express that desire using the P1144 library API:

  • std::is_trivially_relocatable<T>
  • std::uninitialized_relocate(first, last, dfirst)
  • std::uninitialized_relocate_n(first, n, dfirst)
  • std::uninitialized_relocate_backward(first, last, dlast)
  • std::relocate_at(psource, pdest)
  • T std::relocate(psource)

For this purpose, Stéphane Janel’s AMC is a great library to study: he’s implemented so many optimizations, very cleanly, in terms of P1144’s library functions.

But since other STLs don’t provide these functions (because they’re not standard), you’ll prefer to implement them yourself in your own namespace and just do something like:

#ifdef __cpp_lib_trivially_relocatable
  using std::is_trivially_relocatable;
  using std::uninitialized_relocate;
  ~~~~
#endif

to get the as-if-by-memcpy benefits on STLs that do support P1144.

Or, instead of using-declarations, you could implement your namespace’s functions as wrappers around the P1144 functions whenever they exist, as HPX does.
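
A minimal sketch of the using-declaration approach, with a hand-written fallback, might look like the following. Everything in namespace mylib is hypothetical. So that the sketch builds unchanged on any toolchain, it keys off a hypothetical opt-in macro, MYLIB_USE_P1144; you could instead test __cpp_lib_trivially_relocatable directly, as in the snippet above. The fallback trait is deliberately conservative (an assumption of this sketch, not P1144’s definition), and the fallback algorithm accepts only pointers and ignores exception safety.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

namespace mylib {  // hypothetical project namespace

#if defined(MYLIB_USE_P1144)
// Building against a P1144 library: reuse its trait and algorithm.
using std::is_trivially_relocatable;
using std::uninitialized_relocate;
#else
// Conservative fallback: only claim types whose move and destroy are trivial.
template<class T>
struct is_trivially_relocatable
    : std::integral_constant<bool,
          std::is_trivially_move_constructible<T>::value &&
          std::is_trivially_destructible<T>::value> {};

// Fallback relocation: move-construct into raw storage, then destroy the
// source. Returns a pointer one past the last element constructed.
template<class T>
T* uninitialized_relocate(T* first, T* last, T* dfirst) {
    for (; first != last; ++first, ++dfirst) {
        ::new (static_cast<void*>(std::addressof(*dfirst))) T(std::move(*first));
        first->~T();
    }
    return dfirst;
}
#endif

}  // namespace mylib
```

Container code then calls mylib::uninitialized_relocate unconditionally and picks up the as-if-by-memcpy path only when the underlying library provides it.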

Let us know how it went!

Finally, the most important step: Send an email to me (at arthur.j.odwyer@gmail.com) and/or SG14 (at sg14@lists.isocpp.org) and describe your experience! The important questions are all about usability and fitness-for-purpose:

  • Was switching compilers and libraries sufficiently seamless, or did something break?

  • If you tried annotating your own types with [[clang::trivially_relocatable]], how did that go?

  • If you tried using (or wrapping) the library interface, how did that go?

  • Did you see a change in performance?


And then, of course, repeat the process with the Clang fork implementing P2786, and let us know how that went, too!

Posted 2024-04-10