C++ Lock-Free Design: How It Works, STL Usage, Scenarios, and Examples

Lock-free programming eliminates mutexes and locks, using atomic operations instead. This approach can provide better performance and avoid deadlocks, but requires careful design and understanding of memory ordering.

Table of Contents

  1. What is Lock-Free Design?
  2. How Lock-Free Works
  3. STL Atomic Operations
  4. Memory Ordering
  5. Common Scenarios
  6. Lock-Free Data Structures
  7. Templates and Examples
  8. Best Practices and Pitfalls

What is Lock-Free Design?

Lock-free programming uses atomic operations instead of mutexes to achieve thread safety. A data structure or algorithm is lock-free if:

  • Progress guarantee: At least one thread makes progress
  • No blocking: Threads don’t wait for locks
  • Atomic operations: Uses hardware-level atomic instructions

Lock-Free vs Lock-Based

// Lock-based (using mutex)
class LockBasedCounter {
    int count_ = 0;
    mutex mtx_;
public:
    void increment() {
        lock_guard<mutex> lock(mtx_);  // Blocks other threads
        ++count_;
    }
};

// Lock-free (using atomic)
class LockFreeCounter {
    atomic<int> count_{0};
public:
    void increment() {
        count_.fetch_add(1);  // No blocking!
    }
};

Benefits

  • No deadlocks: No locks means no deadlock risk
  • Better performance: Avoids lock contention
  • Scalability: Works well with many threads
  • Responsiveness: No thread blocking

Trade-offs

  • Complexity: Harder to design and verify
  • ABA problem: Value changes but appears same
  • Memory ordering: Requires understanding of memory models
  • Limited operations: Not all operations can be lock-free

How Lock-Free Works

Atomic Operations

Atomic operations are indivisible - they either complete entirely or not at all:

#include <atomic>
using namespace std;

atomic<int> counter{0};

// These are atomic (indivisible):
counter.store(42);           // Write
int val = counter.load();    // Read
counter.fetch_add(1);        // Read-modify-write

Compare-and-Swap (CAS)

The fundamental operation for lock-free programming:

atomic<int> value{10};

int expected = 10;
bool success = value.compare_exchange_weak(expected, 20);
// If value == expected, set to 20 and return true
// Otherwise, set expected to current value and return false

Lock-Free Progress Guarantees

  1. Wait-free: Every operation completes in bounded steps
  2. Lock-free: At least one thread makes progress
  3. Obstruction-free: Thread makes progress if no contention

STL Atomic Operations

std::atomic

The foundation of lock-free programming:

#include <atomic>
using namespace std;

// Atomic integer
atomic<int> counter{0};

// Atomic pointer
atomic<int*> ptr{nullptr};

// Atomic boolean
atomic<bool> flag{false};

// Atomic custom type (if trivially copyable)
struct Point { int x, y; };
atomic<Point> point{{0, 0}};

Atomic Operations

Load and Store

atomic<int> value{42};

// Load (read)
int read = value.load(memory_order_acquire);

// Store (write)
value.store(100, memory_order_release);

// Exchange (read and write atomically)
int old = value.exchange(50);

Read-Modify-Write

atomic<int> counter{0};

// Add
counter.fetch_add(5);        // counter += 5, return old value
counter += 5;                // Same, but return new value

// Subtract
counter.fetch_sub(3);

// Bitwise operations
atomic<int> flags{0};
flags.fetch_and(0xFF);       // flags &= 0xFF
flags.fetch_or(0x01);        // flags |= 0x01
flags.fetch_xor(0x10);       // flags ^= 0x10

Compare-and-Swap

atomic<int> value{10};

// Weak version (may fail spuriously)
int expected = 10;
bool success = value.compare_exchange_weak(expected, 20);
// If value == 10, set to 20
// Otherwise, expected = current value

// Strong version (never fails spuriously)
int expected2 = 10;
bool success2 = value.compare_exchange_strong(expected2, 20);

// Typical pattern in loop
int expected = value.load();
do {
    int desired = expected + 1;
} while (!value.compare_exchange_weak(expected, desired));

Memory Ordering

Memory ordering controls how atomic operations synchronize with other memory operations:

Memory Order Options

// Sequential consistency (default, strongest)
atomic<int> x{0};
x.store(1, memory_order_seq_cst);

// Acquire (for loads)
int val = x.load(memory_order_acquire);

// Release (for stores)
x.store(1, memory_order_release);

// Acquire-Release (for RMW operations)
x.fetch_add(1, memory_order_acq_rel);

// Relaxed (weakest, no synchronization)
x.store(1, memory_order_relaxed);

Memory Order Semantics

  1. memory_order_relaxed: No ordering constraints
  2. memory_order_acquire: Loads can’t be reordered before this
  3. memory_order_release: Stores can’t be reordered after this
  4. memory_order_acq_rel: Both acquire and release
  5. memory_order_seq_cst: Sequential consistency (default)

Example: Release-Acquire Synchronization

atomic<bool> ready{false};
int data = 0;

// Thread 1: Producer
void producer() {
    data = 42;  // Non-atomic write
    ready.store(true, memory_order_release);  // Release store
}

// Thread 2: Consumer
void consumer() {
    while (!ready.load(memory_order_acquire)) {  // Acquire load
        // Wait
    }
    // data is guaranteed to be 42 here
    cout << data << endl;
}

Common Scenarios

Scenario 1: Lock-Free Counter

class LockFreeCounter {
private:
    atomic<int> count_{0};

public:
    void increment() {
        count_.fetch_add(1, memory_order_relaxed);
    }

    void decrement() {
        count_.fetch_sub(1, memory_order_relaxed);
    }

    int get() const {
        return count_.load(memory_order_acquire);
    }

    int getAndReset() {
        return count_.exchange(0, memory_order_acq_rel);
    }
};

Scenario 2: Lock-Free Flag

class LockFreeFlag {
private:
    atomic<bool> flag_{false};

public:
    void set() {
        flag_.store(true, memory_order_release);
    }

    bool test() {
        return flag_.load(memory_order_acquire);
    }

    bool testAndSet() {
        return flag_.exchange(true, memory_order_acq_rel);
    }

    void clear() {
        flag_.store(false, memory_order_release);
    }
};

Scenario 3: Lock-Free Stack

template<typename T>
class LockFreeStack {
private:
    struct Node {
        T data;
        atomic<Node*> next;
        Node(const T& d) : data(d), next(nullptr) {}
    };

    atomic<Node*> head_{nullptr};

public:
    void push(const T& data) {
        Node* new_node = new Node(data);
        Node* old_head = head_.load();
        do {
            new_node->next = old_head;
        } while (!head_.compare_exchange_weak(old_head, new_node));
    }

    bool pop(T& result) {
        Node* old_head = head_.load();
        do {
            if (old_head == nullptr) {
                return false;
            }
        } while (!head_.compare_exchange_weak(old_head, old_head->next));
        
        result = old_head->data;
        delete old_head;
        return true;
    }
};

Scenario 4: Lock-Free Queue

template<typename T>
class LockFreeQueue {
private:
    struct Node {
        atomic<T*> data{nullptr};
        atomic<Node*> next{nullptr};
    };

    atomic<Node*> head_{new Node};
    atomic<Node*> tail_{head_.load()};

public:
    void enqueue(const T& item) {
        Node* new_node = new Node;
        T* data = new T(item);
        
        Node* prev_tail = tail_.exchange(new_node, memory_order_acq_rel);
        prev_tail->data.store(data, memory_order_release);
        prev_tail->next.store(new_node, memory_order_release);
    }

    bool dequeue(T& result) {
        Node* head = head_.load();
        Node* next = head->next.load();
        
        if (next == nullptr) {
            return false;  // Empty
        }
        
        T* data = next->data.load();
        if (data == nullptr) {
            return false;  // Not ready yet
        }
        
        result = *data;
        head_.store(next);
        delete data;
        delete head;
        return true;
    }
};

Lock-Free Data Structures

Lock-Free Single-Producer Single-Consumer Queue

template<typename T>
class SPSCQueue {
private:
    struct Node {
        T data;
        atomic<bool> ready{false};
    };

    vector<Node> buffer_;
    atomic<size_t> write_pos_{0};
    atomic<size_t> read_pos_{0};
    size_t capacity_;

public:
    SPSCQueue(size_t capacity) : capacity_(capacity) {
        buffer_.resize(capacity);
    }

    bool push(const T& item) {
        size_t pos = write_pos_.load(memory_order_relaxed);
        size_t next_pos = (pos + 1) % capacity_;
        
        if (next_pos == read_pos_.load(memory_order_acquire)) {
            return false;  // Full
        }
        
        buffer_[pos].data = item;
        buffer_[pos].ready.store(true, memory_order_release);
        write_pos_.store(next_pos, memory_order_release);
        return true;
    }

    bool pop(T& result) {
        size_t pos = read_pos_.load(memory_order_relaxed);
        
        if (pos == write_pos_.load(memory_order_acquire)) {
            return false;  // Empty
        }
        
        if (!buffer_[pos].ready.load(memory_order_acquire)) {
            return false;  // Not ready
        }
        
        result = buffer_[pos].data;
        buffer_[pos].ready.store(false, memory_order_release);
        read_pos_.store((pos + 1) % capacity_, memory_order_release);
        return true;
    }
};

Templates and Examples

Generic Lock-Free Counter Template

template<typename T>
class LockFreeCounter {
private:
    atomic<T> count_{0};

public:
    void increment(T delta = 1) {
        count_.fetch_add(delta, memory_order_relaxed);
    }

    void decrement(T delta = 1) {
        count_.fetch_sub(delta, memory_order_relaxed);
    }

    T get() const {
        return count_.load(memory_order_acquire);
    }

    T addAndGet(T delta) {
        return count_.fetch_add(delta, memory_order_acq_rel) + delta;
    }

    T getAndAdd(T delta) {
        return count_.fetch_add(delta, memory_order_acq_rel);
    }
};

// Usage
LockFreeCounter<int> int_counter;
LockFreeCounter<long> long_counter;

Lock-Free Reference Counting

template<typename T>
class LockFreeSharedPtr {
private:
    struct ControlBlock {
        atomic<int> ref_count_{1};
        T* ptr_;
        
        ControlBlock(T* p) : ptr_(p) {}
        
        void add_ref() {
            ref_count_.fetch_add(1, memory_order_relaxed);
        }
        
        bool release() {
            if (ref_count_.fetch_sub(1, memory_order_acq_rel) == 1) {
                delete ptr_;
                delete this;
                return true;
            }
            return false;
        }
    };

    ControlBlock* control_;

public:
    explicit LockFreeSharedPtr(T* ptr) 
        : control_(new ControlBlock(ptr)) {}

    LockFreeSharedPtr(const LockFreeSharedPtr& other) 
        : control_(other.control_) {
        if (control_) {
            control_->add_ref();
        }
    }

    ~LockFreeSharedPtr() {
        if (control_) {
            control_->release();
        }
    }

    T& operator*() { return *control_->ptr_; }
    T* operator->() { return control_->ptr_; }
};

Complete Example: Lock-Free Producer-Consumer

#include <iostream>
#include <thread>
#include <atomic>
#include <vector>
using namespace std;

template<typename T, size_t Size>
class LockFreeRingBuffer {
private:
    array<T, Size> buffer_;
    atomic<size_t> write_pos_{0};
    atomic<size_t> read_pos_{0};

public:
    bool push(const T& item) {
        size_t current_write = write_pos_.load(memory_order_relaxed);
        size_t next_write = (current_write + 1) % Size;
        
        if (next_write == read_pos_.load(memory_order_acquire)) {
            return false;  // Full
        }
        
        buffer_[current_write] = item;
        write_pos_.store(next_write, memory_order_release);
        return true;
    }

    bool pop(T& item) {
        size_t current_read = read_pos_.load(memory_order_relaxed);
        
        if (current_read == write_pos_.load(memory_order_acquire)) {
            return false;  // Empty
        }
        
        item = buffer_[current_read];
        read_pos_.store((current_read + 1) % Size, memory_order_release);
        return true;
    }
};

void lockFreeProducerConsumer() {
    LockFreeRingBuffer<int, 100> buffer;
    atomic<bool> done{false};

    // Producer
    thread producer([&]() {
        for (int i = 0; i < 50; ++i) {
            while (!buffer.push(i)) {
                this_thread::yield();
            }
        }
        done = true;
    });

    // Consumer
    thread consumer([&]() {
        int value;
        while (!done || buffer.pop(value)) {
            if (buffer.pop(value)) {
                cout << "Consumed: " << value << endl;
            }
        }
    });

    producer.join();
    consumer.join();
}

Best Practices and Pitfalls

1. Understand Memory Ordering

// GOOD: Correct memory ordering
atomic<bool> ready{false};
int data = 0;

void producer() {
    data = 42;
    ready.store(true, memory_order_release);  // Release
}

void consumer() {
    if (ready.load(memory_order_acquire)) {  // Acquire
        // data is guaranteed to be 42
    }
}

// BAD: Wrong memory ordering
void bad_producer() {
    data = 42;
    ready.store(true, memory_order_relaxed);  // No ordering!
}

void bad_consumer() {
    if (ready.load(memory_order_relaxed)) {
        // data might not be 42 yet!
    }
}

2. Handle ABA Problem

// ABA Problem: Value changes A->B->A, but we think it's still A
// Solution: Use version numbers or hazard pointers

template<typename T>
class ABAFreeStack {
    struct Node {
        T data;
        atomic<Node*> next;
        atomic<size_t> version{0};  // Version counter
    };
    // ... implementation with version checking
};

3. Avoid False Sharing

// BAD: False sharing (same cache line)
struct {
    atomic<int> counter1;
    atomic<int> counter2;  // Same cache line!
} counters;

// GOOD: Separate cache lines
alignas(64) atomic<int> counter1;  // 64-byte alignment
alignas(64) atomic<int> counter2;

4. Use Appropriate Atomic Types

// GOOD: Use atomic for shared data
atomic<int> shared_counter{0};

// BAD: Non-atomic shared data
int shared_counter = 0;  // Race condition!

Common Pitfalls

  1. Wrong memory ordering: Using relaxed when acquire/release needed
  2. ABA problem: Not handling value reuse
  3. False sharing: Contention on same cache line
  4. Non-atomic operations: Mixing atomic and non-atomic
  5. Infinite loops: CAS loops without proper exit conditions

Summary

Lock-free design provides high-performance thread-safe operations:

  • Atomic operations: Foundation of lock-free programming
  • Memory ordering: Critical for correctness
  • CAS operations: Enable complex lock-free algorithms
  • Performance: Better scalability than locks
  • Complexity: Requires careful design

Key takeaways:

  • Use std::atomic for shared data
  • Understand memory ordering semantics
  • Use CAS for complex operations
  • Beware of ABA problem
  • Profile to verify performance gains

By mastering lock-free design, you can build high-performance concurrent systems in C++.