C++ Map-Reduce Pattern: Real-World Engineering Guide
C++ Map-Reduce Pattern: Real-World Engineering Guide
Problem Solved
Parallelize independent computations then aggregate results efficiently.
How It Works
- Map phase: Split work into parallel chunks, process independently
- Reduce phase: Merge results (sum, max, combine objects)
- Each phase can run in parallel
STL Usage
#include <vector>
#include <thread>
#include <numeric>
#include <algorithm>
#include <future>
using namespace std;
template<typename InputIt, typename MapFunc, typename ReduceFunc>
auto mapReduce(InputIt first, InputIt last, MapFunc map_func, ReduceFunc reduce_func, size_t num_threads)
-> decltype(reduce_func(map_func(*first), map_func(*first))) {
using MapResult = decltype(map_func(*first));
using ReduceResult = decltype(reduce_func(MapResult{}, MapResult{}));
size_t total = distance(first, last);
size_t chunk_size = total / num_threads;
vector<future<MapResult>> futures;
// Map phase: parallel processing
for (size_t i = 0; i < num_threads; ++i) {
auto chunk_start = first + i * chunk_size;
auto chunk_end = (i == num_threads - 1) ? last : chunk_start + chunk_size;
futures.push_back(async(launch::async, [=]() {
MapResult result{};
for (auto it = chunk_start; it != chunk_end; ++it) {
result = reduce_func(result, map_func(*it));
}
return result;
}));
}
// Reduce phase: combine results
ReduceResult final_result{};
for (auto& fut : futures) {
final_result = reduce_func(final_result, fut.get());
}
return final_result;
}
Example
#include <iostream>
using namespace std;
void mapReduceExample() {
vector<int> data(1000);
iota(data.begin(), data.end(), 1);
// Map: square each number, Reduce: sum all
int result = mapReduce(
data.begin(), data.end(),
[](int x) { return x * x; }, // Map function
[](int a, int b) { return a + b; }, // Reduce function
4 // Number of threads
);
cout << "Sum of squares: " << result << endl;
}
Use Cases
- Data processing: Process large datasets in parallel
- Parallel algorithms: Divide and conquer
- Big data: Distributed computation
- Scientific computing: Parallel numerical operations
Key Takeaways
- Excellent for parallelizable computations
- Scales well with data size
- Two-phase approach: map then reduce
- Common in distributed systems
Things to Be Careful About
- Data dependencies: Map operations must be independent
- Reduce associativity: Reduce function must be associative
- Load balancing: Ensure even work distribution
- Memory usage: Large intermediate results
Summary
Map-Reduce is powerful for parallel data processing, enabling efficient parallelization of independent computations.