InnocentZero's Treasure Chest

HomeFeedAbout Me

11 Nov 2024

Memory Ordering in C++ and OS support (Linux)

Memory Order enum in C/C++

Some basic things to note:

  • All atomic operations are guaranteed to be atomic within themselves. These ones cannot be reordered at all. Also, a binary operation on two atomic vars isn't guaranteed to be atomic in itself. That operation needs to be marked atomic.
  • A relaxed operation is just this, it provides no restrictions, it can be anything. For architectures like x86/AMD64 where everything is strongly ordered by the hardware, this boils down to plain, ordinary move (this can be kept in the cache though, which is a whole different story).
  • Sequentially consistent means that it enforces strict ordering for everything. Nothing is reorganized, it appears exactly where it is in the flow of memory.
  • A release operation prevents ordinary loads or stores from being reordered after the atomic operation. This is kind of akin to releasing a lock except the lock is being held in terms of the loads and stores happening.
  • An acquire means that every load and store operation after it has to happen after it. This is like acquiring a lock for the loads and stores.
  • A consume is like an acquire except it only does that for dependent data, this may not be sound.

We have 4 kinds of memory orders in C++. They act as follows:

  • memory_order_relaxed: These are not synchronization primitives. The only thing they order is atomicity and modification order consistency for atomics.

    An example: assume that x and y are initially 0

    // Thread 1:
    r1 = atomic_load_explicit(y, memory_order_relaxed); // A
    atomic_store_explicit(x, r1, memory_order_relaxed); // B
    // Thread 2:
    r2 = atomic_load_explicit(x, memory_order_relaxed); // C
    atomic_store_explicit(y, 42, memory_order_relaxed); // D
    

    This can produce r1 = r2 = 42 as shown

    x                         y                      r1                    r2
    --------------------------------------------------------------------------    0
    |                         |                      |                     |
    |                         |                      |                     |
    |                         |                      |                     |
    |                         |<---------- D ------------------------------------ 42
    |                         |                      |                     |
    |                         |                      |                     |
    |                         |----------- A ------->|                     |
    |                         |                      |                     |
    |                         |                      |                     |
    |                         |                      |                     |
    |<---------------- B ----------------------------|                     |
    |                         |                      |                     |
    |                         |                      |                     |
    |                         |                      |                     |
    |-------------------------------------- C ---------------------------->|
    

    or r1 = 0, r2 = 42

    x                         y                      r1                    r2
    --------------------------------------------------------------------------    0
    |                         |                      |                     |
    |                         |                      |                     |
    |                         |                      |                     |
    |                         |<---------- D ------------------------------------ 42
    |                         |                      |                     |
    |                         |                      |                     |
    |<---------------- B ----------------------------|                     |
    |                         |                      |                     |
    |                         |                      |                     |
    |                         |----------- A ------->|                     |
    |                         |                      |                     |
    |                         |                      |                     |
    |                         |                      |                     |
    |                         |                      |                     |
    |-------------------------------------- C ---------------------------->|
    
  • memory_order_release and memory_order_acquire: These have some sort of synchronization in them. Take two threads A and B. A does an atomic store on some variable x with release semantics, and then thread B reads the value of x with acquire semantics, then B will have all the previous writes, atomic or not, release or not, as a visible effect for itself after that read.

    #include <atomic>
    #include <cassert>
    #include <string>
    #include <thread>
    
    std::atomic<std::string*> ptr;
    int data;
    
    void producer()
    {
        std::string* p = new std::string("Hello"); // A
        data = 42; // B
        ptr.store(p, std::memory_order_release); // C
    }
    
    void consumer()
    {
        std::string* p2; 
        while (!(p2 = ptr.load(std::memory_order_acquire))) // D
            ;
        assert(*p2 == "Hello"); // never fires
        assert(data == 42); // never fires
    }
    
    int main()
    {
        std::thread t1(producer);
        std::thread t2(consumer);
        t1.join(); t2.join();
    }
    
    data                    ptr             p                    p2
     |                       |              |                     |
     |                       |----- E --------------------------->| (in the while loop 
     |                       |              |                     |  it does not proceed
     |                       |              |                     |  and keeps spinning)
     |                       |              |                     |
     |                       |              |                     |
     |                       |              |<-------- A -------------- "Hello"
     |                       |              |                     |
     |<--------------------------- B ---------------------------------- 42
     |                       |              |                     |
     |                       |              |                     |
     |                       |<--- C -------|                     | (this forces the above to 
     |                       |              |                     |  happen)
     |                       |              |                     |
     |                       |              |                     |
     |                       |              |                     |
     |                       |----- E --------------------------->| (this time it succeeds)
     |                       |              |                     |  
     |                       |              |                     | 
     |                       |              |                     |
    
    

    Another example with three threads and mixing relaxed semantics with release and acquire:

    #include <atomic>
    #include <cassert>
    #include <thread>
    #include <vector>
    
    std::vector<int> data;
    std::atomic<int> flag = {0};
    
    void thread_1()
    {
        data.push_back(42); // A
        flag.store(1, std::memory_order_release); // B
    }
    
    void thread_2()
    {
        int expected = 1;
        // memory_order_relaxed is okay because this is an RMW,
        // and RMWs (with any ordering) following a release form a release sequence
        while (!flag.compare_exchange_strong(expected, 2, std::memory_order_relaxed)) // C
        {
            expected = 1;
        }
    }
    
    void thread_3()
    {
        while (flag.load(std::memory_order_acquire) < 2) // D
            ;
        // if we read the value 2 from the atomic flag, we see 42 in the vector
        assert(data.at(0) == 42); // will never fire
    }
    
    int main()
    {
        std::thread a(thread_1);
        std::thread b(thread_2);
        std::thread c(thread_3);
        a.join(); b.join(); c.join();
    }
    
    data        flag            expected
     |           |                 |
     |           |------ C ------->|                    (C keeps on happening, and since its
     |           |                 |<---------------- 1  atomic, it is sequenced with itself
     |           |                 |                     as of now)
     |           |------ D ------------------------->   (comparison keeps happening here)
     |           |                 |
     |<-------------- A ----------------------------- 42 (Happens before B because of release)
     |           |                 |
     |           |<------- B ------------------------ 1  (Makes A visible to every other thread)
     |           |                 |
     |           |                 |
     |           |------ C ------->| (while loop is exited here)
     |           |                 |
     |           |------ D ------------------------->   (comparison exits here)
     |           |                 |                    (assert happens after)
    
    
  • memory_order_seq_cst: This not only does what release and acquire do, but also establishes a single total global memory ordering for all the operations tagged with this semantic.
Tags: programming C C++ hardware

Other posts
Creative Commons License
This website by Md Isfarul Haque is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.