Memory Ordering in C++ and OS support (Linux)
Memory Order enum in C/C++
Some basic things to note:
- All atomic operations are guaranteed to be atomic within themselves. These ones cannot be reordered at all. Also, a binary operation on two atomic vars isn't guaranteed to be atomic in itself. That operation needs to be marked atomic.
- A relaxed operation is just this, it provides no restrictions, it can be anything. For architectures like x86/AMD64 where everything is strongly ordered by the hardware, this boils down to plain, ordinary move (this can be kept in the cache though, which is a whole different story).
- Sequentially consistent means that it enforces strict ordering for everything. Nothing is reorganized, it appears exactly where it is in the flow of memory.
- A release operation prevents ordinary loads or stores from being reordered after the atomic operation. This is kind of akin to releasing a lock except the lock is being held in terms of the loads and stores happening.
- An acquire means that every load and store operation after it has to happen after it. This is like acquiring a lock for the loads and stores.
- A consume is like an acquire except it only does that for dependent data, this may not be sound.
We have 4 kinds of memory orders in C++. They act as follows:
memory_order_relaxed
: These are not synchronization primitives. The only thing they order is atomicity and modification order consistency for atomics.An example: assume that
x
andy
are initially 0// Thread 1: r1 = atomic_load_explicit(y, memory_order_relaxed); // A atomic_store_explicit(x, r1, memory_order_relaxed); // B // Thread 2: r2 = atomic_load_explicit(x, memory_order_relaxed); // C atomic_store_explicit(y, 42, memory_order_relaxed); // D
This can produce r1 = r2 = 42 as shown
x y r1 r2 -------------------------------------------------------------------------- 0 | | | | | | | | | | | | | |<---------- D ------------------------------------ 42 | | | | | | | | | |----------- A ------->| | | | | | | | | | | | | | |<---------------- B ----------------------------| | | | | | | | | | | | | | |-------------------------------------- C ---------------------------->|
or r1 = 0, r2 = 42
x y r1 r2 -------------------------------------------------------------------------- 0 | | | | | | | | | | | | | |<---------- D ------------------------------------ 42 | | | | | | | | |<---------------- B ----------------------------| | | | | | | | | | | |----------- A ------->| | | | | | | | | | | | | | | | | | |-------------------------------------- C ---------------------------->|
memory_order_release
andmemory_order_acquire
: These have some sort of synchronization in them. Take two threads A and B. A does an atomic store on some variablex
with release semantics, and then thread B reads the value ofx
with acquire semantics, then B will have all the previous writes, atomic or not, release or not, as a visible effect for itself after that read.#include <atomic> #include <cassert> #include <string> #include <thread> std::atomic<std::string*> ptr; int data; void producer() { std::string* p = new std::string("Hello"); // A data = 42; // B ptr.store(p, std::memory_order_release); // C } void consumer() { std::string* p2; while (!(p2 = ptr.load(std::memory_order_acquire))) // D ; assert(*p2 == "Hello"); // never fires assert(data == 42); // never fires } int main() { std::thread t1(producer); std::thread t2(consumer); t1.join(); t2.join(); }
data ptr p p2 | | | | | |----- E --------------------------->| (in the while loop | | | | it does not proceed | | | | and keeps spinning) | | | | | | | | | | |<-------- A -------------- "Hello" | | | | |<--------------------------- B ---------------------------------- 42 | | | | | | | | | |<--- C -------| | (this forces the above to | | | | happen) | | | | | | | | | | | | | |----- E --------------------------->| (this time it succeeds) | | | | | | | | | | | |
Another example with three threads and mixing relaxed semantics with release and acquire:
#include <atomic> #include <cassert> #include <thread> #include <vector> std::vector<int> data; std::atomic<int> flag = {0}; void thread_1() { data.push_back(42); // A flag.store(1, std::memory_order_release); // B } void thread_2() { int expected = 1; // memory_order_relaxed is okay because this is an RMW, // and RMWs (with any ordering) following a release form a release sequence while (!flag.compare_exchange_strong(expected, 2, std::memory_order_relaxed)) // C { expected = 1; } } void thread_3() { while (flag.load(std::memory_order_acquire) < 2) // D ; // if we read the value 2 from the atomic flag, we see 42 in the vector assert(data.at(0) == 42); // will never fire } int main() { std::thread a(thread_1); std::thread b(thread_2); std::thread c(thread_3); a.join(); b.join(); c.join(); }
data flag expected | | | | |------ C ------->| (C keeps on happening, and since its | | |<---------------- 1 atomic, it is sequenced with itself | | | as of now) | |------ D -------------------------> (comparison keeps happening here) | | | |<-------------- A ----------------------------- 42 (Happens before B because of release) | | | | |<------- B ------------------------ 1 (Makes A visible to every other thread) | | | | | | | |------ C ------->| (while loop is exited here) | | | | |------ D -------------------------> (comparison exits here) | | | (assert happens after)
memory_order_seq_cst
: This not only does what release and acquire do, but also establishes a single total global memory ordering for all the operations tagged with this semantic.