C++ Fork-Join Pattern: Real-World Engineering Guide

Problem Solved

Divide a task into smaller tasks recursively, process in parallel, then join results.

How It Works

  • Fork: Split task into sub-tasks
  • Join: Merge results at the end
  • Recursive divide-and-conquer approach

STL Usage

#include <future>
#include <vector>
#include <algorithm>
#include <thread>
#include <numeric>
using namespace std;

template<typename InputIt, typename Func>
auto forkJoin(InputIt first, InputIt last, Func func, size_t threshold) 
    -> decltype(func(first, last)) {
    
    size_t size = distance(first, last);
    
    // Base case: process directly
    if (size <= threshold) {
        return func(first, last);
    }
    
    // Fork: split into two halves
    auto mid = first + size / 2;
    
    auto left_future = async(launch::async, [=]() {
        return forkJoin(first, mid, func, threshold);
    });
    
    auto right_result = forkJoin(mid, last, func, threshold);
    auto left_result = left_future.get();
    
    // Join: combine results (simplified - actual join depends on operation)
    return left_result + right_result;  // Example: sum
}

// Parallel for_each using fork-join
template<typename InputIt, typename Func>
void parallelForEach(InputIt first, InputIt last, Func func, size_t threshold = 1000) {
    size_t size = distance(first, last);
    
    if (size <= threshold) {
        for_each(first, last, func);
        return;
    }
    
    auto mid = first + size / 2;
    
    auto left_future = async(launch::async, [=]() {
        parallelForEach(first, mid, func, threshold);
    });
    
    parallelForEach(mid, last, func, threshold);
    left_future.wait();
}

Example

#include <iostream>
using namespace std;

int parallelSum(const vector<int>& data, size_t threshold = 1000) {
    return forkJoin(
        data.begin(), data.end(),
        [](auto first, auto last) {
            return accumulate(first, last, 0);
        },
        threshold
    );
}

void forkJoinExample() {
    vector<int> data(10000);
    iota(data.begin(), data.end(), 1);
    
    int sum = parallelSum(data);
    cout << "Sum: " << sum << endl;
}

Use Cases

  • Parallel algorithms: Divide and conquer
  • Sorting: Parallel merge sort, quick sort
  • Tree processing: Parallel tree traversal
  • Scientific computing: Parallel numerical methods

Key Takeaways

  • Natural for divide-and-conquer algorithms
  • Recursive parallelization
  • Built into many frameworks (Java ForkJoinPool, C++ parallel algorithms)
  • Efficient for CPU-bound tasks

Things to Be Careful About

  • Threshold: Too small causes overhead, too large limits parallelism
  • Load balancing: Ensure even work distribution
  • Memory: Recursive calls use stack space
  • Overhead: Task creation has cost

Summary

Fork-Join enables efficient recursive parallelization, ideal for divide-and-conquer algorithms and parallel processing.