11 Nov 2024

Memory Ordering in C++ and OS support (Linux)

Memory Order enum in C/C++

Some basic things to note:

All atomic operations are guaranteed to be atomic within themselves. These ones cannot be reordered at all. Also, a binary operation on two atomic vars isn't guaranteed to be atomic in itself. That operation needs to be marked atomic.
A relaxed operation is just this, it provides no restrictions, it can be anything. For architectures like x86/AMD64 where everything is strongly ordered by the hardware, this boils down to plain, ordinary move (this can be kept in the cache though, which is a whole different story).
Sequentially consistent means that it enforces strict ordering for everything. Nothing is reorganized, it appears exactly where it is in the flow of memory.
A release operation prevents ordinary loads or stores from being reordered after the atomic operation. This is kind of akin to releasing a lock except the lock is being held in terms of the loads and stores happening.
An acquire means that every load and store operation after it has to happen after it. This is like acquiring a lock for the loads and stores.
A consume is like an acquire except it only does that for dependent data, this may not be sound.

We have 4 kinds of memory orders in C++. They act as follows:

memory_order_relaxed: These are not synchronization primitives. The only thing they order is atomicity and modification order consistency for atomics.

An example: assume that x and y are initially 0

// Thread 1:
r1 = atomic_load_explicit(y, memory_order_relaxed); // A
atomic_store_explicit(x, r1, memory_order_relaxed); // B
// Thread 2:
r2 = atomic_load_explicit(x, memory_order_relaxed); // C
atomic_store_explicit(y, 42, memory_order_relaxed); // D

This can produce r1 = r2 = 42 as shown

x                         y                      r1                    r2
--------------------------------------------------------------------------    0
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |<---------- D ------------------------------------ 42
|                         |                      |                     |
|                         |                      |                     |
|                         |----------- A ------->|                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|<---------------- B ----------------------------|                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|-------------------------------------- C ---------------------------->|

or r1 = 0, r2 = 42

x                         y                      r1                    r2
--------------------------------------------------------------------------    0
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |<---------- D ------------------------------------ 42
|                         |                      |                     |
|                         |                      |                     |
|<---------------- B ----------------------------|                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |----------- A ------->|                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|-------------------------------------- C ---------------------------->|

memory_order_release and memory_order_acquire: These have some sort of synchronization in them. Take two threads A and B. A does an atomic store on some variable x with release semantics, and then thread B reads the value of x with acquire semantics, then B will have all the previous writes, atomic or not, release or not, as a visible effect for itself after that read.

#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<std::string*> ptr;
int data;

void producer()
{
    std::string* p = new std::string("Hello"); // A
    data = 42; // B
    ptr.store(p, std::memory_order_release); // C
}

void consumer()
{
    std::string* p2; 
    while (!(p2 = ptr.load(std::memory_order_acquire))) // D
        ;
    assert(*p2 == "Hello"); // never fires
    assert(data == 42); // never fires
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join(); t2.join();
}

data                    ptr             p                    p2
 |                       |              |                     |
 |                       |----- E --------------------------->| (in the while loop 
 |                       |              |                     |  it does not proceed
 |                       |              |                     |  and keeps spinning)
 |                       |              |                     |
 |                       |              |                     |
 |                       |              |<-------- A -------------- "Hello"
 |                       |              |                     |
 |<--------------------------- B ---------------------------------- 42
 |                       |              |                     |
 |                       |              |                     |
 |                       |<--- C -------|                     | (this forces the above to 
 |                       |              |                     |  happen)
 |                       |              |                     |
 |                       |              |                     |
 |                       |              |                     |
 |                       |----- E --------------------------->| (this time it succeeds)
 |                       |              |                     |  
 |                       |              |                     | 
 |                       |              |                     |

Another example with three threads and mixing relaxed semantics with release and acquire:

#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::vector<int> data;
std::atomic<int> flag = {0};

void thread_1()
{
    data.push_back(42); // A
    flag.store(1, std::memory_order_release); // B
}

void thread_2()
{
    int expected = 1;
    // memory_order_relaxed is okay because this is an RMW,
    // and RMWs (with any ordering) following a release form a release sequence
    while (!flag.compare_exchange_strong(expected, 2, std::memory_order_relaxed)) // C
    {
        expected = 1;
    }
}

void thread_3()
{
    while (flag.load(std::memory_order_acquire) < 2) // D
        ;
    // if we read the value 2 from the atomic flag, we see 42 in the vector
    assert(data.at(0) == 42); // will never fire
}

int main()
{
    std::thread a(thread_1);
    std::thread b(thread_2);
    std::thread c(thread_3);
    a.join(); b.join(); c.join();
}

data        flag            expected
 |           |                 |
 |           |------ C ------->|                    (C keeps on happening, and since its
 |           |                 |<---------------- 1  atomic, it is sequenced with itself
 |           |                 |                     as of now)
 |           |------ D ------------------------->   (comparison keeps happening here)
 |           |                 |
 |<-------------- A ----------------------------- 42 (Happens before B because of release)
 |           |                 |
 |           |<------- B ------------------------ 1  (Makes A visible to every other thread)
 |           |                 |
 |           |                 |
 |           |------ C ------->| (while loop is exited here)
 |           |                 |
 |           |------ D ------------------------->   (comparison exits here)
 |           |                 |                    (assert happens after)

memory_order_seq_cst: This not only does what release and acquire do, but also establishes a single total global memory ordering for all the operations tagged with this semantic.

InnocentZero's Treasure Chest

Memory Ordering in C++ and OS support (Linux)

Memory Order enum in C/C++