C++ Lock-Free Design: How It Works, STL Usage, Scenarios, and Examples
Lock-free programming eliminates mutexes and locks, using atomic operations instead. This approach can provide better performance and avoid deadlocks, but requires careful design and understanding of memory ordering.
Table of Contents
- What is Lock-Free Design?
- How Lock-Free Works
- STL Atomic Operations
- Memory Ordering
- Common Scenarios
- Lock-Free Data Structures
- Templates and Examples
- Best Practices and Pitfalls
What is Lock-Free Design?
Lock-free programming uses atomic operations instead of mutexes to achieve thread safety. A data structure or algorithm is lock-free if:
- Progress guarantee: At least one thread makes progress
- No blocking: Threads don’t wait for locks
- Atomic operations: Uses hardware-level atomic instructions
Lock-Free vs Lock-Based
#include <atomic>
#include <mutex>
using namespace std;

// Lock-based (using mutex)
class LockBasedCounter {
    int count_ = 0;
    mutex mtx_;
public:
    void increment() {
        lock_guard<mutex> lock(mtx_); // Blocks other threads
        ++count_;
    }
};

// Lock-free (using atomic)
class LockFreeCounter {
    atomic<int> count_{0};
public:
    void increment() {
        count_.fetch_add(1); // No blocking!
    }
};
Benefits
- No deadlocks: No locks means no deadlock risk
- Better performance: Avoids lock contention
- Scalability: Works well with many threads
- Responsiveness: No thread blocking
Trade-offs
- Complexity: Harder to design and verify
- ABA problem: A value changes from A to B and back to A, so a compare-and-swap mistakenly treats it as unchanged
- Memory ordering: Requires understanding of memory models
- Limited operations: Not all operations can be lock-free
How Lock-Free Works
Atomic Operations
Atomic operations are indivisible - they either complete entirely or not at all:
#include <atomic>
using namespace std;
atomic<int> counter{0};
// These are atomic (indivisible):
counter.store(42); // Write
int val = counter.load(); // Read
counter.fetch_add(1); // Read-modify-write
Compare-and-Swap (CAS)
The fundamental operation for lock-free programming:
atomic<int> value{10};
int expected = 10;
bool success = value.compare_exchange_weak(expected, 20);
// If value == expected, set to 20 and return true
// Otherwise, set expected to current value and return false
Lock-Free Progress Guarantees
- Wait-free: Every operation completes in bounded steps
- Lock-free: At least one thread makes progress
- Obstruction-free: Thread makes progress if no contention
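To make the first two guarantees concrete, here is a small sketch. One hedge: fetch_add is typically wait-free only on hardware that implements it as a single atomic instruction; on LL/SC architectures it may itself be a retry loop.
#include <atomic>
using namespace std;

atomic<int> hits{0};

// Typically wait-free: one hardware atomic add, a bounded number of steps
// regardless of what other threads are doing.
void record_hit() {
    hits.fetch_add(1, memory_order_relaxed);
}

// Lock-free but not wait-free: this particular thread may retry forever
// under contention, but every failed CAS means some other thread's CAS
// succeeded, so the system as a whole keeps making progress.
void double_hits() {
    int current = hits.load(memory_order_relaxed);
    while (!hits.compare_exchange_weak(current, current * 2)) {
        // 'current' was refreshed with the latest value; try again
    }
}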
STL Atomic Operations
std::atomic
The foundation of lock-free programming:
#include <atomic>
using namespace std;
// Atomic integer
atomic<int> counter{0};
// Atomic pointer
atomic<int*> ptr{nullptr};
// Atomic boolean
atomic<bool> flag{false};
// Atomic custom type (if trivially copyable)
struct Point { int x, y; };
atomic<Point> point{{0, 0}};
Atomic Operations
Load and Store
atomic<int> value{42};
// Load (read)
int read = value.load(memory_order_acquire);
// Store (write)
value.store(100, memory_order_release);
// Exchange (read and write atomically)
int old = value.exchange(50);
Read-Modify-Write
atomic<int> counter{0};
// Add
counter.fetch_add(5); // counter += 5, returns the old value
counter += 5; // also atomic; operator+= returns the new value
// Subtract
counter.fetch_sub(3);
// Bitwise operations
atomic<int> flags{0};
flags.fetch_and(0xFF); // flags &= 0xFF
flags.fetch_or(0x01); // flags |= 0x01
flags.fetch_xor(0x10); // flags ^= 0x10
Compare-and-Swap
atomic<int> value{10};
// Weak version (may fail spuriously)
int expected = 10;
bool success = value.compare_exchange_weak(expected, 20);
// If value == 10, set to 20
// Otherwise, expected = current value
// Strong version (never fails spuriously)
int expected2 = 10;
bool success2 = value.compare_exchange_strong(expected2, 20);
// Typical pattern: retry in a loop until the CAS succeeds
int current = value.load();
int desired;
do {
    desired = current + 1;
} while (!value.compare_exchange_weak(current, desired));
// On failure, compare_exchange_weak reloads 'current' with the latest value
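A common use of this retry pattern is building read-modify-write operations that std::atomic does not offer as a single member function, for example "record the maximum value seen so far". A minimal sketch (continuing the includes from the snippets above; update_max is an illustrative name, not a standard function):
// Atomically raise 'maximum' to 'sample' if sample is larger.
// On CAS failure, 'current' is refreshed, so we always compare against
// the latest value and stop once we are no longer larger.
void update_max(atomic<int>& maximum, int sample) {
    int current = maximum.load(memory_order_relaxed);
    while (sample > current &&
           !maximum.compare_exchange_weak(current, sample)) {
        // another thread raised the maximum; re-check against it
    }
}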
Memory Ordering
Memory ordering controls how atomic operations synchronize with other memory operations:
Memory Order Options
// Sequential consistency (default, strongest)
atomic<int> x{0};
x.store(1, memory_order_seq_cst);
// Acquire (for loads)
int val = x.load(memory_order_acquire);
// Release (for stores)
x.store(1, memory_order_release);
// Acquire-Release (for RMW operations)
x.fetch_add(1, memory_order_acq_rel);
// Relaxed (weakest, no synchronization)
x.store(1, memory_order_relaxed);
Memory Order Semantics
- memory_order_relaxed: Atomicity only, no ordering constraints
- memory_order_acquire: For loads; later reads and writes in this thread cannot be reordered before the load
- memory_order_release: For stores; earlier reads and writes in this thread cannot be reordered after the store
- memory_order_acq_rel: Both acquire and release (for read-modify-write operations)
- memory_order_seq_cst: Sequential consistency, a single global order of all seq_cst operations (the default)
Example: Release-Acquire Synchronization
atomic<bool> ready{false};
int data = 0;

// Thread 1: Producer
void producer() {
    data = 42;                               // Non-atomic write
    ready.store(true, memory_order_release); // Release store
}

// Thread 2: Consumer
void consumer() {
    while (!ready.load(memory_order_acquire)) { // Acquire load
        // Wait
    }
    // data is guaranteed to be 42 here
    cout << data << endl;
}
Common Scenarios
Scenario 1: Lock-Free Counter
class LockFreeCounter {
private:
    atomic<int> count_{0};
public:
    void increment() {
        count_.fetch_add(1, memory_order_relaxed);
    }
    void decrement() {
        count_.fetch_sub(1, memory_order_relaxed);
    }
    int get() const {
        return count_.load(memory_order_acquire);
    }
    int getAndReset() {
        return count_.exchange(0, memory_order_acq_rel);
    }
};
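A minimal usage sketch for this counter (the thread and iteration counts are arbitrary choices for illustration):
#include <thread>
#include <vector>
#include <iostream>

int main() {
    LockFreeCounter counter;
    vector<thread> workers;
    for (int t = 0; t < 8; ++t) {
        workers.emplace_back([&counter] {
            for (int i = 0; i < 10000; ++i) counter.increment();
        });
    }
    for (auto& w : workers) w.join();
    cout << counter.get() << endl; // 80000: no increment is ever lost
}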
Scenario 2: Lock-Free Flag
class LockFreeFlag {
private:
    atomic<bool> flag_{false};
public:
    void set() {
        flag_.store(true, memory_order_release);
    }
    bool test() {
        return flag_.load(memory_order_acquire);
    }
    bool testAndSet() {
        return flag_.exchange(true, memory_order_acq_rel);
    }
    void clear() {
        flag_.store(false, memory_order_release);
    }
};
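One common use of such a flag is as an "only one thread wins" guard. A sketch (lazyInit is an illustrative name, not a library function); note that the losing callers return immediately rather than waiting for the winner to finish, so real one-time initialization usually pairs this with a second "done" flag or simply uses std::call_once:
LockFreeFlag claimed;

void lazyInit() {
    // exchange returns the previous value, so exactly one caller sees false
    if (!claimed.testAndSet()) {
        // ... one-time setup would go here (illustrative placeholder)
    }
    // Callers that lose the race do NOT wait for the setup to complete.
}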
Scenario 3: Lock-Free Stack
template<typename T>
class LockFreeStack {
private:
    struct Node {
        T data;
        atomic<Node*> next;
        Node(const T& d) : data(d), next(nullptr) {}
    };
    atomic<Node*> head_{nullptr};
public:
    void push(const T& data) {
        Node* new_node = new Node(data);
        Node* old_head = head_.load();
        do {
            new_node->next = old_head;
        } while (!head_.compare_exchange_weak(old_head, new_node));
    }
    // Simplified for illustration: deleting nodes here is only safe if no
    // other thread can still be reading them. A production version needs a
    // safe-reclamation scheme (hazard pointers, epochs, or reference counts)
    // and must also consider the ABA problem discussed later.
    bool pop(T& result) {
        Node* old_head = head_.load();
        do {
            if (old_head == nullptr) {
                return false;
            }
        } while (!head_.compare_exchange_weak(old_head, old_head->next));
        result = old_head->data;
        delete old_head;
        return true;
    }
};
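A conservative usage sketch: to stay clear of the reclamation caveat noted in pop(), all pops here happen after the pushing threads have joined (thread and item counts are arbitrary):
#include <thread>
#include <vector>
#include <iostream>

int main() {
    LockFreeStack<int> stack;
    vector<thread> pushers;
    for (int t = 0; t < 4; ++t) {
        pushers.emplace_back([&stack, t] {
            for (int i = 0; i < 1000; ++i) stack.push(t * 1000 + i);
        });
    }
    for (auto& p : pushers) p.join();

    int value, popped = 0;
    while (stack.pop(value)) ++popped;
    cout << "Popped " << popped << " items" << endl; // 4000
}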
Scenario 4: Lock-Free Queue
template<typename T>
class LockFreeQueue {
private:
    struct Node {
        atomic<T*> data{nullptr};
        atomic<Node*> next{nullptr};
    };
    atomic<Node*> head_{new Node};   // dummy node
    atomic<Node*> tail_{head_.load()};
public:
    // Multiple producers are fine: each one atomically claims the old tail,
    // then publishes the data and the link to the new tail node.
    void enqueue(const T& item) {
        Node* new_node = new Node;
        T* data = new T(item);
        Node* prev_tail = tail_.exchange(new_node, memory_order_acq_rel);
        prev_tail->data.store(data, memory_order_release);
        prev_tail->next.store(new_node, memory_order_release);
    }
    // Simplified: head_ is updated with a plain store, so this dequeue is
    // only safe for a single consumer thread. (Destructor cleanup of any
    // remaining nodes is also omitted for brevity.)
    bool dequeue(T& result) {
        Node* head = head_.load();
        Node* next = head->next.load();
        if (next == nullptr) {
            return false; // Empty (or an enqueue is still in progress)
        }
        // The item lives on the current head node; 'next' becomes the new dummy.
        T* data = head->data.load();
        if (data == nullptr) {
            return false; // Not ready yet
        }
        result = *data;
        head_.store(next);
        delete data;
        delete head;
        return true;
    }
};
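A usage sketch respecting the constraint above: several producers, exactly one consumer. The item count of 3000 simply matches the 3 × 1000 enqueues and is purely illustrative:
#include <thread>
#include <vector>
#include <iostream>

int main() {
    LockFreeQueue<int> queue;
    vector<thread> producers;
    for (int t = 0; t < 3; ++t) {
        producers.emplace_back([&queue, t] {
            for (int i = 0; i < 1000; ++i) queue.enqueue(t * 1000 + i);
        });
    }

    // Single consumer, as required by this dequeue implementation
    int received = 0, value = 0;
    thread consumer([&] {
        while (received < 3000) {
            if (queue.dequeue(value)) ++received;
            else this_thread::yield();
        }
    });

    for (auto& p : producers) p.join();
    consumer.join();
    cout << "Received " << received << " items" << endl;
}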
Lock-Free Data Structures
Lock-Free Single-Producer Single-Consumer Queue
// Safe for exactly one producer thread and one consumer thread.
template<typename T>
class SPSCQueue {
private:
    struct Node {
        T data;
        atomic<bool> ready{false};
    };
    vector<Node> buffer_;
    atomic<size_t> write_pos_{0};
    atomic<size_t> read_pos_{0};
    size_t capacity_;
public:
    // Construct the buffer in the initializer list: vector::resize would not
    // compile here because Node (which contains an atomic) is not movable.
    SPSCQueue(size_t capacity) : buffer_(capacity), capacity_(capacity) {}
    bool push(const T& item) {
        size_t pos = write_pos_.load(memory_order_relaxed);
        size_t next_pos = (pos + 1) % capacity_;
        if (next_pos == read_pos_.load(memory_order_acquire)) {
            return false; // Full (one slot is always left unused)
        }
        buffer_[pos].data = item;
        buffer_[pos].ready.store(true, memory_order_release);
        write_pos_.store(next_pos, memory_order_release);
        return true;
    }
    bool pop(T& result) {
        size_t pos = read_pos_.load(memory_order_relaxed);
        if (pos == write_pos_.load(memory_order_acquire)) {
            return false; // Empty
        }
        if (!buffer_[pos].ready.load(memory_order_acquire)) {
            return false; // Not ready
        }
        result = buffer_[pos].data;
        buffer_[pos].ready.store(false, memory_order_release);
        read_pos_.store((pos + 1) % capacity_, memory_order_release);
        return true;
    }
};
Templates and Examples
Generic Lock-Free Counter Template
// T is expected to be an integral type (required by fetch_add/fetch_sub here).
template<typename T>
class LockFreeCounter {
private:
    atomic<T> count_{0};
public:
    void increment(T delta = 1) {
        count_.fetch_add(delta, memory_order_relaxed);
    }
    void decrement(T delta = 1) {
        count_.fetch_sub(delta, memory_order_relaxed);
    }
    T get() const {
        return count_.load(memory_order_acquire);
    }
    T addAndGet(T delta) {
        return count_.fetch_add(delta, memory_order_acq_rel) + delta;
    }
    T getAndAdd(T delta) {
        return count_.fetch_add(delta, memory_order_acq_rel);
    }
};

// Usage
LockFreeCounter<int> int_counter;
LockFreeCounter<long> long_counter;
Lock-Free Reference Counting
template<typename T>
class LockFreeSharedPtr {
private:
    struct ControlBlock {
        atomic<int> ref_count_{1};
        T* ptr_;
        ControlBlock(T* p) : ptr_(p) {}
        void add_ref() {
            ref_count_.fetch_add(1, memory_order_relaxed);
        }
        bool release() {
            // acq_rel: the final decrement must see all prior writes
            // before it deletes the object
            if (ref_count_.fetch_sub(1, memory_order_acq_rel) == 1) {
                delete ptr_;
                delete this;
                return true;
            }
            return false;
        }
    };
    ControlBlock* control_;
public:
    explicit LockFreeSharedPtr(T* ptr)
        : control_(new ControlBlock(ptr)) {}
    LockFreeSharedPtr(const LockFreeSharedPtr& other)
        : control_(other.control_) {
        if (control_) {
            control_->add_ref();
        }
    }
    // Deleted because the implicit copy assignment would not adjust the
    // reference count; a complete version would provide it (and move
    // operations), omitted here for brevity.
    LockFreeSharedPtr& operator=(const LockFreeSharedPtr&) = delete;
    ~LockFreeSharedPtr() {
        if (control_) {
            control_->release();
        }
    }
    T& operator*() { return *control_->ptr_; }
    T* operator->() { return control_->ptr_; }
};
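A brief usage sketch. Copies are only ever made from an object that is guaranteed to outlive the copy (here, original on the main thread), which is the situation this simplified design supports; Widget is just an illustrative payload type:
#include <thread>
#include <vector>
#include <iostream>

struct Widget {
    int id;
};

int main() {
    LockFreeSharedPtr<Widget> original(new Widget{42});

    vector<thread> readers;
    for (int t = 0; t < 4; ++t) {
        readers.emplace_back([original]() mutable {
            // Each thread holds its own copy; the atomic reference count
            // guarantees the Widget is deleted exactly once.
            cout << original->id << '\n';
        });
    }
    for (auto& r : readers) r.join();
}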
Complete Example: Lock-Free Producer-Consumer
#include <iostream>
#include <thread>
#include <atomic>
#include <array>
using namespace std;

template<typename T, size_t Size>
class LockFreeRingBuffer {
private:
    array<T, Size> buffer_;
    atomic<size_t> write_pos_{0};
    atomic<size_t> read_pos_{0};
public:
    bool push(const T& item) {
        size_t current_write = write_pos_.load(memory_order_relaxed);
        size_t next_write = (current_write + 1) % Size;
        if (next_write == read_pos_.load(memory_order_acquire)) {
            return false; // Full
        }
        buffer_[current_write] = item;
        write_pos_.store(next_write, memory_order_release);
        return true;
    }
    bool pop(T& item) {
        size_t current_read = read_pos_.load(memory_order_relaxed);
        if (current_read == write_pos_.load(memory_order_acquire)) {
            return false; // Empty
        }
        item = buffer_[current_read];
        read_pos_.store((current_read + 1) % Size, memory_order_release);
        return true;
    }
};

void lockFreeProducerConsumer() {
    LockFreeRingBuffer<int, 100> buffer;
    atomic<bool> done{false};

    // Producer
    thread producer([&]() {
        for (int i = 0; i < 50; ++i) {
            while (!buffer.push(i)) {
                this_thread::yield();
            }
        }
        done = true;
    });

    // Consumer: keep draining until the producer is done AND the buffer is empty
    thread consumer([&]() {
        int value;
        while (true) {
            if (buffer.pop(value)) {
                cout << "Consumed: " << value << endl;
            } else if (done) {
                // Final drain: items pushed just before 'done' was set may
                // not have been visible to the failed pop above.
                while (buffer.pop(value)) {
                    cout << "Consumed: " << value << endl;
                }
                break;
            } else {
                this_thread::yield();
            }
        }
    });

    producer.join();
    consumer.join();
}
Best Practices and Pitfalls
1. Understand Memory Ordering
// GOOD: Correct memory ordering
atomic<bool> ready{false};
int data = 0;

void producer() {
    data = 42;
    ready.store(true, memory_order_release); // Release
}

void consumer() {
    if (ready.load(memory_order_acquire)) { // Acquire
        // data is guaranteed to be 42
    }
}

// BAD: Wrong memory ordering
void bad_producer() {
    data = 42;
    ready.store(true, memory_order_relaxed); // No ordering!
}

void bad_consumer() {
    if (ready.load(memory_order_relaxed)) {
        // data might not be 42 yet!
    }
}
2. Handle ABA Problem
// ABA Problem: the value changes A -> B -> A, so a CAS still sees "A" and
// succeeds even though the state it observed is stale
// Solution: pair the pointer with a version counter that the CAS compares
// and swaps together with it, or use hazard pointers / epoch reclamation
template<typename T>
class ABAFreeStack {
    struct Node {
        T data;
        Node* next;
    };
    struct TaggedHead {
        Node* ptr;
        size_t version; // incremented on every successful pop
    };
    atomic<TaggedHead> head_{TaggedHead{nullptr, 0}};
    // ... implementation with version checking
};
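One complete sketch of this idea (illustrative; the class and member names are mine, and it assumes atomic of a pointer-plus-counter struct is lock-free on the target, which is worth checking with is_lock_free() since std::atomic may otherwise fall back to an internal lock). It addresses ABA only; reclaiming popped nodes safely still requires hazard pointers or a similar scheme:
#include <atomic>
#include <cstddef>
using namespace std;

template<typename T>
class TaggedStack {
    struct Node {
        T data;
        Node* next;
        Node(const T& d) : data(d), next(nullptr) {}
    };
    struct Head {
        Node* ptr;
        size_t tag; // bumped on every successful pop
    };
    atomic<Head> head_{Head{nullptr, 0}};
public:
    void push(const T& data) {
        Node* n = new Node(data);
        Head old_head = head_.load(memory_order_relaxed);
        Head new_head;
        do {
            n->next = old_head.ptr;
            new_head = {n, old_head.tag};            // tag unchanged on push
        } while (!head_.compare_exchange_weak(old_head, new_head,
                                              memory_order_release,
                                              memory_order_relaxed));
    }
    bool pop(T& result) {
        Head old_head = head_.load(memory_order_acquire);
        Head new_head;
        do {
            if (old_head.ptr == nullptr) return false;
            new_head = {old_head.ptr->next, old_head.tag + 1}; // bump tag
        } while (!head_.compare_exchange_weak(old_head, new_head,
                                              memory_order_acquire,
                                              memory_order_relaxed));
        result = old_head.ptr->data;
        delete old_head.ptr; // still needs safe reclamation in general
        return true;
    }
};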
3. Avoid False Sharing
// BAD: False sharing (same cache line)
struct {
    atomic<int> counter1;
    atomic<int> counter2; // Same cache line!
} counters;

// GOOD: Separate cache lines
alignas(64) atomic<int> counter1; // 64-byte alignment
alignas(64) atomic<int> counter2;
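Where the standard library provides it (C++17 <new>, guarded by the __cpp_lib_hardware_interference_size feature macro), std::hardware_destructive_interference_size is the portable spelling of "one cache line". A sketch of padding an array of per-thread counters; the 64-byte fallback and the slot count of 8 are assumptions for illustration:
#include <atomic>
#include <new>
using namespace std;

#ifdef __cpp_lib_hardware_interference_size
constexpr size_t kCacheLine = hardware_destructive_interference_size;
#else
constexpr size_t kCacheLine = 64; // common fallback assumption
#endif

// Each counter gets its own cache line, so threads updating different
// slots do not invalidate each other's cached lines.
struct alignas(kCacheLine) PaddedCounter {
    atomic<int> value{0};
};

PaddedCounter per_thread_counters[8]; // e.g. one slot per worker thread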
4. Use Appropriate Atomic Types
// GOOD: Use atomic for shared data
atomic<int> shared_counter{0};
// BAD: Non-atomic shared data
int shared_counter = 0; // Race condition!
Common Pitfalls
- Wrong memory ordering: Using relaxed when acquire/release needed
- ABA problem: Not handling value reuse
- False sharing: Contention on same cache line
- Non-atomic operations: Mixing atomic and non-atomic
- Infinite loops: CAS loops without proper exit conditions
Summary
Lock-free design provides high-performance thread-safe operations:
- Atomic operations: Foundation of lock-free programming
- Memory ordering: Critical for correctness
- CAS operations: Enable complex lock-free algorithms
- Performance: Better scalability than locks
- Complexity: Requires careful design
Key takeaways:
- Use std::atomic for shared data
- Understand memory ordering semantics
- Use CAS for complex operations
- Beware of ABA problem
- Profile to verify performance gains
By mastering lock-free design, you can build high-performance concurrent systems in C++.