Memory Ordering in C++ and OS support (Linux)
Memory Order enum in C/C++
Some basic things to note:
- All atomic operations are guaranteed to be atomic within themselves. These ones cannot be reordered at all. Also, a binary operation on two atomic vars isn't guaranteed to be atomic in itself. That operation needs to be marked atomic.
- A relaxed operation is just this, it provides no restrictions, it can be anything. For architectures like x86/AMD64 where everything is strongly ordered by the hardware, this boils down to plain, ordinary move (this can be kept in the cache though, which is a whole different story).
- Sequentially consistent means that it enforces strict ordering for everything. Nothing is reorganized, it appears exactly where it is in the flow of memory.
- A release operation prevents ordinary loads or stores from being reordered after the atomic operation. This is kind of akin to releasing a lock except the lock is being held in terms of the loads and stores happening.
- An acquire means that every load and store operation after it has to happen after it. This is like acquiring a lock for the loads and stores.
- A consume is like an acquire except it only does that for dependent data, this may not be sound.
We have 4 kinds of memory orders in C++. They act as follows:
memory_order_relaxed
These are not synchronization primitives. The only thing they order is atomicity and modification order consistency for atomics.
An example: assume that x
and y
are initially 0
// Thread 1: r1 = atomic_load_explicit(y, memory_order_relaxed); // A atomic_store_explicit(x, r1, memory_order_relaxed); // B // Thread 2: r2 = atomic_load_explicit(x, memory_order_relaxed); // C atomic_store_explicit(y, 42, memory_order_relaxed); // D
This can produce r1 = r2 = 42 as shown
x y r1 r2 -------------------------------------------------------------------------- 0 | | | | | | | | | | | | | |<---------- D ------------------------------------ 42 | | | | | | | | | |----------- A ------->| | | | | | | | | | | | | | |<---------------- B ----------------------------| | | | | | | | | | | | | | |-------------------------------------- C ---------------------------->|
or r1 = 0, r2 = 42
x y r1 r2 -------------------------------------------------------------------------- 0 | | | | | | | | | | | | | |<---------- D ------------------------------------ 42 | | | | | | | | |<---------------- B ----------------------------| | | | | | | | | | | |----------- A ------->| | | | | | | | | | | | | | | | | | |-------------------------------------- C ---------------------------->|
memory_order_release
and memory_order_acquire
These have some sort of synchronization in them. Take two threads A and B. A does an atomic
store on some variable x
with release semantics, and then thread B reads the value of x
with
acquire semantics, then B will have all the previous writes, atomic or not, release or not,
as a visible effect for itself after that read.
#include <atomic> #include <cassert> #include <string> #include <thread> std::atomic<std::string*> ptr; int data; void producer() { std::string* p = new std::string("Hello"); // A data = 42; // B ptr.store(p, std::memory_order_release); // C } void consumer() { std::string* p2; while (!(p2 = ptr.load(std::memory_order_acquire))) // D ; assert(*p2 == "Hello"); // never fires assert(data == 42); // never fires } int main() { std::thread t1(producer); std::thread t2(consumer); t1.join(); t2.join(); }
data ptr p p2 | | | | | |----- E --------------------------->| (in the while loop | | | | it does not proceed | | | | and keeps spinning) | | | | | | | | | | |<-------- A -------------- "Hello" | | | | |<--------------------------- B ---------------------------------- 42 | | | | | | | | | |<--- C -------| | (this forces the above to | | | | happen) | | | | | | | | | | | | | |----- E --------------------------->| (this time it succeeds) | | | | | | | | | | | |
Another example with three threads and mixing relaxed semantics with release and acquire:
#include <atomic> #include <cassert> #include <thread> #include <vector> std::vector<int> data; std::atomic<int> flag = {0}; void thread_1() { data.push_back(42); // A flag.store(1, std::memory_order_release); // B } void thread_2() { int expected = 1; // memory_order_relaxed is okay because this is an RMW, // and RMWs (with any ordering) following a release form a release sequence while (!flag.compare_exchange_strong(expected, 2, std::memory_order_relaxed)) // C { expected = 1; } } void thread_3() { while (flag.load(std::memory_order_acquire) < 2) // D ; // if we read the value 2 from the atomic flag, we see 42 in the vector assert(data.at(0) == 42); // will never fire } int main() { std::thread a(thread_1); std::thread b(thread_2); std::thread c(thread_3); a.join(); b.join(); c.join(); }
data flag expected | | | | |------ C ------->| (C keeps on happening, and since its | | |<---------------- 1 atomic, it is sequenced with itself | | | as of now) | |------ D -------------------------> (comparison keeps happening here) | | | |<-------------- A ----------------------------- 42 (Happens before B because of release) | | | | |<------- B ------------------------ 1 (Makes A visible to every other thread) | | | | | | | |------ C ------->| (while loop is exited here) | | | | |------ D -------------------------> (comparison exits here) | | | (assert happens after)
memory_order_seq_cst
This not only does what release and acquire do, but also establishes a single total global memory ordering for all the operations tagged with this semantic.
An example of a program that does that:
#include <atomic> #include <cassert> #include <thread> std::atomic<bool> x = {false}; std::atomic<bool> y = {false}; std::atomic<int> z = {0}; void write_x() { x.store(true, std::memory_order_seq_cst); } void write_y() { y.store(true, std::memory_order_seq_cst); } void read_x_then_y() { while (!x.load(std::memory_order_seq_cst)) ; if (y.load(std::memory_order_seq_cst)) ++z; } void read_y_then_x() { while (!y.load(std::memory_order_seq_cst)) ; if (x.load(std::memory_order_seq_cst)) ++z; } int main() { std::thread a(write_x); std::thread b(write_y); std::thread c(read_x_then_y); std::thread d(read_y_then_x); a.join(); b.join(); c.join(); d.join(); assert(z.load() != 0); // will never happen }
The key issue is the fact that the writes to the variables happen in different threads. So,
any other synchronization only works when the same thread orders them in some way. However,
right now, the way c sees it may not be the way d sees it, unless it is seq_cst.