Memory Ordering

Memory Ordering

innocentzero

2026-06-15

#c #cpp #memorymodel #multithread #concurrency | Status: Complete

Memory Order enum in C/C++

Some basic things to note:

We have 4 kinds of memory orders in C++. They act as follows:

memory_order_relaxed

These are not synchronization primitives. The only thing they order is atomicity and modification order consistency for atomics.

An example: assume that x and y are initially 0

// Thread 1:
r1 = atomic_load_explicit(y, memory_order_relaxed); // A
atomic_store_explicit(x, r1, memory_order_relaxed); // B
// Thread 2:
r2 = atomic_load_explicit(x, memory_order_relaxed); // C
atomic_store_explicit(y, 42, memory_order_relaxed); // D

This can produce r1 = r2 = 42 as shown

x                         y                      r1                    r2
--------------------------------------------------------------------------    0
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |<---------- D ------------------------------------ 42
|                         |                      |                     |
|                         |                      |                     |
|                         |----------- A ------->|                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|<---------------- B ----------------------------|                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|-------------------------------------- C ---------------------------->|

or r1 = 0, r2 = 42

x                         y                      r1                    r2
--------------------------------------------------------------------------    0
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |<---------- D ------------------------------------ 42
|                         |                      |                     |
|                         |                      |                     |
|<---------------- B ----------------------------|                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |----------- A ------->|                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|                         |                      |                     |
|-------------------------------------- C ---------------------------->|

memory_order_release and memory_order_acquire

These have some sort of synchronization in them. Take two threads A and B. A does an atomic store on some variable x with release semantics, and then thread B reads the value of x with acquire semantics, then B will have all the previous writes, atomic or not, release or not, as a visible effect for itself after that read.

#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<std::string*> ptr;
int data;

void producer()
{
    std::string* p = new std::string("Hello"); // A
    data = 42; // B
    ptr.store(p, std::memory_order_release); // C
}

void consumer()
{
    std::string* p2; 
    while (!(p2 = ptr.load(std::memory_order_acquire))) // D
        ;
    assert(*p2 == "Hello"); // never fires
    assert(data == 42); // never fires
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join(); t2.join();
}
data                    ptr             p                    p2
 |                       |              |                     |
 |                       |----- E --------------------------->| (in the while loop 
 |                       |              |                     |  it does not proceed
 |                       |              |                     |  and keeps spinning)
 |                       |              |                     |
 |                       |              |                     |
 |                       |              |<-------- A -------------- "Hello"
 |                       |              |                     |
 |<--------------------------- B ---------------------------------- 42
 |                       |              |                     |
 |                       |              |                     |
 |                       |<--- C -------|                     | (this forces the above to 
 |                       |              |                     |  happen)
 |                       |              |                     |
 |                       |              |                     |
 |                       |              |                     |
 |                       |----- E --------------------------->| (this time it succeeds)
 |                       |              |                     |  
 |                       |              |                     | 
 |                       |              |                     |

Another example with three threads and mixing relaxed semantics with release and acquire:

#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::vector<int> data;
std::atomic<int> flag = {0};

void thread_1()
{
    data.push_back(42); // A
    flag.store(1, std::memory_order_release); // B
}

void thread_2()
{
    int expected = 1;
    // memory_order_relaxed is okay because this is an RMW,
    // and RMWs (with any ordering) following a release form a release sequence
    while (!flag.compare_exchange_strong(expected, 2, std::memory_order_relaxed)) // C
    {
        expected = 1;
    }
}

void thread_3()
{
    while (flag.load(std::memory_order_acquire) < 2) // D
        ;
    // if we read the value 2 from the atomic flag, we see 42 in the vector
    assert(data.at(0) == 42); // will never fire
}

int main()
{
    std::thread a(thread_1);
    std::thread b(thread_2);
    std::thread c(thread_3);
    a.join(); b.join(); c.join();
}
data        flag            expected
 |           |                 |
 |           |------ C ------->|                    (C keeps on happening, and since its
 |           |                 |<---------------- 1  atomic, it is sequenced with itself
 |           |                 |                     as of now)
 |           |------ D ------------------------->   (comparison keeps happening here)
 |           |                 |
 |<-------------- A ----------------------------- 42 (Happens before B because of release)
 |           |                 |
 |           |<------- B ------------------------ 1  (Makes A visible to every other thread)
 |           |                 |
 |           |                 |
 |           |------ C ------->| (while loop is exited here)
 |           |                 |
 |           |------ D ------------------------->   (comparison exits here)
 |           |                 |                    (assert happens after)

memory_order_seq_cst

This not only does what release and acquire do, but also establishes a single total global memory ordering for all the operations tagged with this semantic.

An example of a program that does that:

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x = {false};
std::atomic<bool> y = {false};
std::atomic<int> z = {0};

void write_x()
{
    x.store(true, std::memory_order_seq_cst);
}

void write_y()
{
    y.store(true, std::memory_order_seq_cst);
}

void read_x_then_y()
{
    while (!x.load(std::memory_order_seq_cst))
        ;
    if (y.load(std::memory_order_seq_cst))
        ++z;
}

void read_y_then_x()
{
    while (!y.load(std::memory_order_seq_cst))
        ;
    if (x.load(std::memory_order_seq_cst))
        ++z;
}

int main()
{
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    assert(z.load() != 0); // will never happen
}

The key issue is the fact that the writes to the variables happen in different threads. So, any other synchronization only works when the same thread orders them in some way. However, right now, the way c sees it may not be the way d sees it, unless it is seq_cst.