C++ std::unordered_set Guide: Hash-Based Set Container

A comprehensive guide to std::unordered_set, a hash-based associative container that stores unique elements with average O(1) operations, covering all essential methods, common use cases, and best practices.

Unordered Set Basics
Element Access Methods
Modifiers
Lookup Operations
Bucket Interface
Common Use Cases
Runtime Complexity Analysis
Best Practices

Unordered Set Basics

std::unordered_set is a hash-based associative container that stores unique elements. It provides average O(1) operations for insert, find, and erase.

#include <unordered_set>
#include <string>
#include <iostream>
using namespace std;

int main() {
    // Default construction
    unordered_set<int> set1;
    
    // Initializer list (C++11)
    unordered_set<int> set2 = {1, 2, 3, 4, 5};
    
    // Copy construction
    unordered_set<int> set3(set2);
    
    // Custom hash and equality
    struct MyHash {
        size_t operator()(const string& s) const {
            return hash<string>{}(s);
        }
    };
    unordered_set<string, MyHash> set4;
    
    // Insert elements
    set2.insert(6);
    set2.insert(7);
    
    // Iterate (order is not guaranteed)
    for (const auto& elem : set2) {
        cout << elem << " ";
    }
}

Key Characteristics

Hash-based: Uses hash table for O(1) average operations
Unique elements: Each element appears only once
Unordered: Elements are not sorted (order is implementation-defined)
Fast operations: Average O(1) for insert, find, erase
Load factor: Automatically manages hash table load

Element Access Methods

Iterator-Based Access

#include <unordered_set>
using namespace std;

unordered_set<int> s = {1, 2, 3, 4, 5};

// Access via iterator
auto it = s.begin();
int first = *it;  // Value of first element (order not guaranteed)

// No direct access by value - must use find()

⚠️ Note: std::unordered_set doesn’t provide front(), back(), or operator[] because elements are not ordered and not key-value pairs.

Modifiers

Insertion

#include <unordered_set>
#include <iostream>
using namespace std;

unordered_set<int> s;

// Method 1: insert() with value
s.insert(42);

// Method 2: insert() with hint (may improve performance)
auto hint = s.end();
s.insert(hint, 100);

// Method 3: insert() with range
vector<int> vec = {1, 2, 3};
s.insert(vec.begin(), vec.end());

// Method 4: emplace() (C++11) - constructs in place
s.emplace(200);

// Method 5: emplace_hint() - with hint
s.emplace_hint(s.end(), 300);

// Check if insertion succeeded
auto [it, inserted] = s.insert(42);
if (!inserted) {
    cout << "Element already exists" << endl;
}

Erasure

#include <unordered_set>
using namespace std;

unordered_set<int> s = {1, 2, 3, 4, 5};

// Erase by value
size_t erased = s.erase(3);  // Returns 1 if erased, 0 if not found

// Erase by iterator
auto it = s.find(4);
if (it != s.end()) {
    s.erase(it);  // Returns iterator to next element
}

// Erase range
auto first = s.find(2);
auto last = s.end();
s.erase(first, last);  // Erases from first to last (exclusive)

// Clear all
s.clear();

Lookup Operations

`find()` - Find Element

#include <unordered_set>
#include <iostream>
using namespace std;

unordered_set<int> s = {1, 2, 3, 4, 5};

auto it = s.find(3);
if (it != s.end()) {
    cout << "Found: " << *it << endl;
} else {
    cout << "Not found" << endl;
}

`count()` - Check Existence

#include <unordered_set>
#include <iostream>
using namespace std;

unordered_set<int> s = {1, 2, 3};

if (s.count(2) > 0) {
    cout << "Element exists" << endl;
}
// Returns 1 if exists, 0 if not (since elements are unique)

`contains()` - Check Existence (C++20)

#include <unordered_set>
#include <iostream>
using namespace std;

unordered_set<int> s = {1, 2, 3};

if (s.contains(2)) {
    cout << "Element exists" << endl;
}

`equal_range()` - Find All Elements with Key

#include <unordered_set>
#include <iostream>
using namespace std;

unordered_set<int> s = {1, 2, 3, 4, 5};

// For unordered_set, equal_range returns range of size 0 or 1
auto [first, last] = s.equal_range(3);
if (first != last) {
    cout << *first << endl;  // 3
}

Bucket Interface

std::unordered_set uses buckets to store elements. Understanding the bucket interface helps optimize performance.

Bucket Operations

#include <unordered_set>
using namespace std;

unordered_set<int> s = {1, 2, 3, 4, 5};

// Get number of buckets
size_t bucket_count = s.bucket_count();

// Get bucket for a value
size_t bucket = s.bucket(3);

// Get bucket size
size_t size = s.bucket_size(bucket);

// Get load factor
float load_factor = s.load_factor();  // size() / bucket_count()

// Get max load factor
float max_load = s.max_load_factor();

// Set max load factor (triggers rehash if needed)
s.max_load_factor(0.75f);

// Rehash to accommodate at least n elements
s.rehash(100);

// Reserve space for at least n elements
s.reserve(100);  // More efficient than rehash

Iterating Buckets

#include <unordered_set>
#include <iostream>
using namespace std;

unordered_set<int> s = {1, 2, 3, 4, 5};

// Iterate through buckets
for (size_t i = 0; i < s.bucket_count(); ++i) {
    cout << "Bucket " << i << ": ";
    for (auto it = s.begin(i); it != s.end(i); ++it) {
        cout << *it << " ";
    }
    cout << endl;
}

Common Use Cases

1. Fast Lookup/Existence Check

#include <unordered_set>
#include <vector>
#include <iostream>
using namespace std;

unordered_set<int> lookup;

// Build lookup set
lookup.insert(100);
lookup.insert(200);
lookup.insert(300);

// Fast O(1) average lookup
if (lookup.find(200) != lookup.end()) {
    cout << "Found!" << endl;
}

2. Removing Duplicates

#include <unordered_set>
#include <vector>
#include <iostream>
using namespace std;

vector<int> vec = {1, 2, 2, 3, 3, 3, 4, 5};

// Remove duplicates
unordered_set<int> unique_set(vec.begin(), vec.end());
vector<int> unique_vec(unique_set.begin(), unique_set.end());

// unique_vec: {1, 2, 3, 4, 5} (order may vary)

3. Fast Membership Testing

#include <unordered_set>
#include <iostream>
using namespace std;

unordered_set<string> valid_keys = {"key1", "key2", "key3"};

string input = "key2";
if (valid_keys.contains(input)) {  // C++20
    cout << "Valid key" << endl;
}

// Pre-C++20
if (valid_keys.find(input) != valid_keys.end()) {
    cout << "Valid key" << endl;
}

4. Caching/Memoization

#include <unordered_set>
#include <iostream>
using namespace std;

unordered_set<int> computed_values;

bool is_computed(int value) {
    return computed_values.find(value) != computed_values.end();
}

void mark_computed(int value) {
    computed_values.insert(value);
}

// Use in computation
int compute(int n) {
    if (is_computed(n)) {
        return n;  // Already computed
    }
    mark_computed(n);
    // ... computation
    return n;
}

5. Custom Hash Function

#include <unordered_set>
#include <string>
using namespace std;

struct Person {
    string name;
    int age;
};

// Custom hash function
struct PersonHash {
    size_t operator()(const Person& p) const {
        return hash<string>{}(p.name) ^ (hash<int>{}(p.age) << 1);
    }
};

// Custom equality
struct PersonEqual {
    bool operator()(const Person& a, const Person& b) const {
        return a.name == b.name && a.age == b.age;
    }
};

unordered_set<Person, PersonHash, PersonEqual> people;
people.insert({Person{"Alice", 30}});
people.insert({Person{"Bob", 25}});

Runtime Complexity Analysis

Understanding the time and space complexity of std::unordered_set operations is crucial for writing efficient code.

Time Complexity

Operation	Average	Worst Case	Notes
Element Access
`begin()`, `end()`	O(1)	O(1)	Iterator creation
Lookup
`find()`, `count()`, `contains()` (C++20)	O(1)	O(n)	Hash collision worst case
`equal_range()`	O(1)	O(n)	Hash collision worst case
Modifiers
`insert()` (single element)	O(1)	O(n)	Rehash may occur
`insert()` (hint)	O(1)	O(n)	Hint may improve performance
`insert()` (range)	O(m)	O(m × n)	m = range size
`emplace()`, `emplace_hint()`	O(1)	O(n)	Similar to insert
`erase()` (by value)	O(1)	O(n)	Hash collision worst case
`erase()` (by iterator)	O(1)	O(n)	Hash collision worst case
`erase()` (range)	O(m)	O(m × n)	m = number of elements erased
`clear()`	O(n)	O(n)	Destroys all elements
Bucket Operations
`bucket()`, `bucket_count()`, `bucket_size()`	O(1)	O(1)	Constant time
`load_factor()`, `max_load_factor()`	O(1)	O(1)	Constant time
`rehash()`, `reserve()`	O(n)	O(n)	Rebuilds hash table
Operations
`size()`, `empty()`, `max_size()`	O(1)	O(1)	Constant time
`swap()`	O(1)	O(1)	Constant time, swaps internal pointers
`merge()` (C++17)	O(n)	O(n × m)	n = source size, m = destination size
Comparison
`==`, `!=`	O(n)	O(n)	Element-wise comparison

Space Complexity

Storage: O(n) where n is the number of elements
Bucket overhead: Hash table with buckets (typically 1-2x the number of elements)
Node overhead: Each node stores value and next pointer (for chaining)
Total: Typically ~24-32 bytes per element on 64-bit systems (including hash table structure)

Hash Table Characteristics

std::unordered_set is implemented as a hash table:

Load factor: Ratio of elements to buckets (default max ~1.0)
Rehashing: Automatically rehashes when load factor exceeds max
Collision resolution: Typically uses chaining (linked lists in buckets)
Hash function: Uses std::hash<Key> by default
Order: Elements are not ordered (order is implementation-defined)

Factors Affecting Performance

Hash function quality: Poor hash function leads to more collisions
Load factor: Higher load factor increases collision probability
Element distribution: Uniform element distribution improves performance
Rehashing: Occurs when load factor exceeds max, causing O(n) operation

Comparison with Other Containers

Operation	`std::unordered_set`	`std::set`	`std::vector`
Insert	O(1) average	O(log n)	O(1) amortized (end)
Find	O(1) average	O(log n)	O(n)
Erase	O(1) average	O(log n)	O(n)
Ordered iteration	❌ No	✅ Yes	✅ Yes (if sorted)
Memory overhead	Higher	Higher	Lower
Worst case	O(n)	O(log n)	O(n)

Performance Tips Based on Complexity

Use reserve() when size is known → Avoids rehashing, improves performance
Choose good hash function → Reduces collisions, improves average performance
Monitor load factor → Adjust max_load_factor() if needed
Prefer emplace() for complex types → Avoids unnecessary copies
Use find() instead of count() == 1 → Both O(1) average, but find() is more semantic
Consider std::set if ordering is needed → O(log n) vs O(1) average, but ordered

When to Use `std::unordered_set`

✅ Use std::unordered_set when:

Fast lookups are critical (O(1) average)
Ordering is not needed
Elements are hashable
Insert/erase operations are frequent
Large datasets with good hash distribution
Removing duplicates efficiently

❌ Avoid std::unordered_set when:

Ordering is required → Use std::set (O(log n))
Worst-case performance matters → std::set has guaranteed O(log n)
Elements are not hashable → Use std::set
Memory is very constrained → Consider std::vector with sorted elements

Best Practices

✅ Do’s

Use reserve() when size is known

#include <unordered_set>
using namespace std;
   
unordered_set<int> s;
s.reserve(1000);  // Avoids rehashing

Use find() or contains() for existence checks

#include <unordered_set>
using namespace std;
   
unordered_set<int> s = {1, 2, 3};
   
// ✅ Good
if (s.find(2) != s.end()) {
    // Element exists
}
   
// ✅ Good (C++20)
if (s.contains(2)) {
    // Element exists
}

Use structured bindings (C++17) for iteration

#include <unordered_set>
#include <iostream>
using namespace std;
   
unordered_set<int> s = {1, 2, 3, 4, 5};
for (const auto& elem : s) {
    cout << elem << " ";
}

Use emplace() for complex types

#include <unordered_set>
using namespace std;
   
unordered_set<ComplexType> s;
s.emplace(arg1, arg2);  // Constructs in place

Provide custom hash for custom types

#include <unordered_set>
#include <string>
using namespace std;
   
struct MyHash {
    size_t operator()(const MyType& t) const {
        // Good hash function
    }
};

❌ Don’ts

Don’t assume ordering

#include <unordered_set>
using namespace std;
   
unordered_set<int> s = {1, 2, 3, 4, 5};
   
// ❌ Bad - order is not guaranteed
for (const auto& elem : s) {
    // Order may vary between runs
}

Don’t use std::unordered_set when ordering is needed

#include <unordered_set>
#include <set>
using namespace std;
   
// ❌ If ordering needed
unordered_set<int> s;
   
// ✅ Use set
set<int> s;

Don’t ignore hash function quality

#include <unordered_set>
using namespace std;
   
// ❌ Bad hash function (all map to same bucket)
struct BadHash {
    size_t operator()(const int&) const { return 0; }
};
   
// ✅ Good hash function
struct GoodHash {
    size_t operator()(const int& x) const {
        return hash<int>{}(x);
    }
};

Don’t forget to reserve space when size is known

#include <unordered_set>
#include <vector>
using namespace std;
   
vector<int> data = {1, 2, 3, 4, 5};
   
// ❌ May cause rehashing
unordered_set<int> s;
for (int x : data) {
    s.insert(x);
}
   
// ✅ Better - reserve space
unordered_set<int> s;
s.reserve(data.size());
for (int x : data) {
    s.insert(x);
}

Performance Tips

Reserve space: Use reserve() to avoid rehashing
Good hash function: Distributes elements evenly across buckets
Monitor load factor: Adjust max_load_factor() if needed
Custom hash: Provide efficient hash function for custom types
Avoid rehashing: Use reserve() when size is known

Summary: std::unordered_set provides fast O(1) average operations for unique element storage. Use it when ordering is not needed and fast lookups are critical. For ordered element storage, use std::set instead.

C++ & Systems Blog

C++ std::unordered_set Guide: Hash-Based Set Container

C++ std::unordered_set Guide: Hash-Based Set Container

Table of Contents

Unordered Set Basics

Key Characteristics

Element Access Methods

Iterator-Based Access

Modifiers

Insertion

Erasure

Lookup Operations

`find()` - Find Element

`count()` - Check Existence

`contains()` - Check Existence (C++20)

`equal_range()` - Find All Elements with Key

Bucket Interface

Bucket Operations

Iterating Buckets

Common Use Cases

1. Fast Lookup/Existence Check

2. Removing Duplicates

3. Fast Membership Testing

4. Caching/Memoization

5. Custom Hash Function

Runtime Complexity Analysis

Time Complexity

Space Complexity

Hash Table Characteristics

Factors Affecting Performance

Comparison with Other Containers

Performance Tips Based on Complexity

When to Use `std::unordered_set`

Best Practices

✅ Do’s

❌ Don’ts

Performance Tips

Related Posts

Recent Posts

C++ std::unordered_set Guide: Hash-Based Set Container

Table of Contents

Unordered Set Basics

Key Characteristics

Element Access Methods

Iterator-Based Access

Modifiers

Insertion

Erasure

Lookup Operations

find() - Find Element

count() - Check Existence

contains() - Check Existence (C++20)

equal_range() - Find All Elements with Key

Bucket Interface

Bucket Operations

Iterating Buckets

Common Use Cases

1. Fast Lookup/Existence Check

2. Removing Duplicates

3. Fast Membership Testing

4. Caching/Memoization

5. Custom Hash Function

Runtime Complexity Analysis

Time Complexity

Space Complexity

Hash Table Characteristics

Factors Affecting Performance

Comparison with Other Containers

Performance Tips Based on Complexity

When to Use std::unordered_set

Best Practices

✅ Do’s

❌ Don’ts

Performance Tips

Related Posts

Recent Posts

`find()` - Find Element

`count()` - Check Existence

`contains()` - Check Existence (C++20)

`equal_range()` - Find All Elements with Key

When to Use `std::unordered_set`