[Medium] 49. Group Anagrams

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

Examples

Example 1:

Input: strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Example 2:

Input: strs = [""]
Output: [[""]]

Example 3:

Input: strs = ["a"]
Output: [["a"]]

Constraints

1 <= strs.length <= 10^4
0 <= strs[i].length <= 100
strs[i] consists of lowercase English letters.

Clarification Questions

Before diving into the solution, here are 5 important clarifications and assumptions to discuss during an interview:

Anagram definition: What is an anagram? (Assumption: Words with same characters in different order - same character frequencies)
Grouping rule: How should we group anagrams? (Assumption: Group strings that are anagrams of each other together)
Return format: What should we return? (Assumption: List of groups - each group contains anagrams)
Order requirement: Does order of groups matter? (Assumption: No - can return in any order)
Single character: Do single characters count as anagrams? (Assumption: Yes - “a” is anagram of “a”)

Interview Deduction Process (20 minutes)

Step 1: Brute-Force Approach (5 minutes)

For each string, compare it with all other strings to check if they are anagrams. To check if two strings are anagrams, sort both strings and compare, or count character frequencies. Group strings that are anagrams together. This approach has O(n² × m log m) complexity where n is the number of strings and m is the average length, which is too slow for large inputs.

Step 2: Semi-Optimized Approach (7 minutes)

Sort each string and use the sorted string as a key to group anagrams. Use a hash map where the key is the sorted string and the value is a list of original strings. This reduces comparison time but still requires sorting each string, giving O(n × m log m) time complexity. This works well but can be optimized further by avoiding sorting.

Step 3: Optimized Solution (8 minutes)

Use character frequency as the key instead of sorting. For each string, count the frequency of each character and create a key (e.g., “a2b1c3” for “aabccc”). Use this key to group anagrams in a hash map. This avoids sorting and achieves O(n × m) time complexity where m is the average string length. Alternatively, use a tuple or array of 26 integers as the key. The key insight is that anagrams have the same character frequencies, so frequency counts serve as a perfect grouping key without the overhead of sorting.

Solution: Character Count Key Approach

Time Complexity: O(N * K) where N is the number of strings and K is the maximum length of a string
Space Complexity: O(N * K) for storing all strings in the hash map

The key insight is to use a character frequency count as the hash map key. Strings with the same character frequencies are anagrams of each other.

Solution: Character Count Key

class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs) {
        if(strs.size() == 0) return vector<vector<string>>();
        
        unordered_map<string, vector<string>> hm;
        int count[26];
        
        for(string& s: strs) {
            fill(begin(count), end(count), 0);
            
            for(char c: s) count[c-'a']++;
            
            string key = "";
            for(int i = 0; i < 26; i++) {
                key += "#";
                key += to_string(count[i]);
            }
            
            if(!hm.contains(key)) hm[key] = vector<string>();
            hm[key].push_back(s);
        }
        
        vector<vector<string>> rtn;
        for(auto itr = hm.begin(); itr != hm.end(); itr++) {
            rtn.push_back(itr->second);
        }
        
        return rtn;
    }
};

How the Algorithm Works

Step-by-Step Example: `strs = ["eat","tea","tan","ate","nat","bat"]`

Step 1: Process "eat"
  Count: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#0#0#0#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0"
  hm[key] = ["eat"]

Step 2: Process "tea"
  Count: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#0#0#0#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0" (same as "eat")
  hm[key] = ["eat", "tea"]

Step 3: Process "tan"
  Count: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#1#0#0#0#0#0#0"
  hm[key] = ["tan"]

Step 4: Process "ate"
  Count: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#0#0#0#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0" (same as "eat")
  hm[key] = ["eat", "tea", "ate"]

Step 5: Process "nat"
  Count: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#1#0#0#0#0#0#0" (same as "tan")
  hm[key] = ["tan", "nat"]

Step 6: Process "bat"
  Count: [1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0"
  hm[key] = ["bat"]

Result:
  [["eat","tea","ate"], ["tan","nat"], ["bat"]]

Visual Representation

Input:  ["eat", "tea", "tan", "ate", "nat", "bat"]
         ↓       ↓       ↓       ↓       ↓       ↓
Keys:   key1   key1   key2   key1   key2   key3
         ↓       ↓       ↓       ↓       ↓       ↓
Groups: ["eat","tea","ate"]  ["tan","nat"]  ["bat"]

Key Insights

Character Frequency as Key: Use character count array to create a unique key for each anagram group
Hash Map Grouping: Strings with identical character frequencies map to the same key
Delimiter Usage: Using “#” delimiter ensures keys are unique (e.g., “1#2” vs “12#”)
Efficient Counting: Count array of size 26 (for lowercase letters) is space-efficient

Algorithm Breakdown

vector<vector<string>> groupAnagrams(vector<string>& strs) {
    // Handle empty input
    if(strs.size() == 0) return vector<vector<string>>();
    
    // Map: character count key -> list of anagrams
    unordered_map<string, vector<string>> hm;
    int count[26];  // Count array for 26 lowercase letters
    
    for(string& s: strs) {
        // Reset count array
        fill(begin(count), end(count), 0);
        
        // Count characters in current string
        for(char c: s) count[c-'a']++;
        
        // Build key from character counts
        string key = "";
        for(int i = 0; i < 26; i++) {
            key += "#";           // Delimiter
            key += to_string(count[i]);  // Count for each letter
        }
        
        // Add string to appropriate group
        if(!hm.contains(key)) hm[key] = vector<string>();
        hm[key].push_back(s);
    }
    
    // Convert map values to result vector
    vector<vector<string>> rtn;
    for(auto itr = hm.begin(); itr != hm.end(); itr++) {
        rtn.push_back(itr->second);
    }
    
    return rtn;
}

Edge Cases

Empty input: strs = [] → return []
Single empty string: strs = [""] → return [[""]]
Single character: strs = ["a"] → return [["a"]]
All anagrams: strs = ["eat","tea","ate"] → return [["eat","tea","ate"]]
No anagrams: strs = ["abc","def","ghi"] → return [["abc"],["def"],["ghi"]]

Alternative Approaches

Approach 2: Sorted String Key

Time Complexity: O(N * K log K) where K is the average string length
Space Complexity: O(N * K)

class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs) {
        unordered_map<string, vector<string>> hm;
        
        for(string& s: strs) {
            string key = s;
            sort(key.begin(), key.end());
            hm[key].push_back(s);
        }
        
        vector<vector<string>> rtn;
        for(auto& [key, group] : hm) {
            rtn.push_back(group);
        }
        
        return rtn;
    }
};

Pros:

Simpler implementation
Easier to understand

Cons:

Slower due to sorting: O(K log K) per string
Less efficient for long strings

Approach 3: Prime Number Hash (Advanced)

Time Complexity: O(N * K)
Space Complexity: O(N * K)

Uses prime numbers to create hash keys, avoiding string concatenation overhead.

class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs) {
        // Prime numbers for each letter
        int primes[26] = {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101};
        
        unordered_map<long long, vector<string>> hm;
        
        for(string& s: strs) {
            long long key = 1;
            for(char c: s) {
                key *= primes[c - 'a'];
            }
            hm[key].push_back(s);
        }
        
        vector<vector<string>> rtn;
        for(auto& [key, group] : hm) {
            rtn.push_back(group);
        }
        
        return rtn;
    }
};

Pros:

Fast key generation (multiplication)
No string concatenation overhead

Cons:

Risk of integer overflow for very long strings
More complex to implement

Complexity Analysis

Approach	Time	Space	Pros	Cons
Character Count Key	O(N * K)	O(N * K)	Fast, no sorting	String concatenation overhead
Sorted String Key	O(N * K log K)	O(N * K)	Simple, readable	Slower due to sorting
Prime Number Hash	O(N * K)	O(N * K)	Very fast key generation	Overflow risk, complex

Why Character Count Key is Preferred

Optimal Time Complexity: O(N * K) without sorting overhead
Predictable Performance: No dependency on string length for key generation
Memory Efficient: Fixed-size count array (26 integers)
Robust: Works for any string length without overflow concerns

Implementation Details

Character Count Array

int count[26];  // For 26 lowercase letters a-z
fill(begin(count), end(count), 0);  // Reset to zero

// Count characters
for(char c: s) count[c-'a']++;  // 'a' maps to index 0, 'z' to 25

Key Construction

string key = "";
for(int i = 0; i < 26; i++) {
    key += "#";              // Delimiter prevents ambiguity
    key += to_string(count[i]);  // Count for letter at position i
}

Why use “#” delimiter?

Without delimiter: “12” could mean count[0]=1, count[1]=2 OR count[0]=12
With delimiter: “#1#2” unambiguously means count[0]=1, count[1]=2

C++20 contains() Method

if(!hm.contains(key)) hm[key] = vector<string>();

Alternative (C++11/14):

if(hm.find(key) == hm.end()) hm[key] = vector<string>();

Common Mistakes

Forgetting to reset count array: Must reset for each string
Wrong delimiter: Using numbers without delimiter causes key collisions
Case sensitivity: Assuming uppercase letters (this problem uses lowercase only)
Empty string handling: Not handling empty input or empty strings correctly
Inefficient key generation: Using sorting when counting is faster

Optimization Tips

Pre-allocate result vector: Can reserve space if you know approximate number of groups
Use emplace_back: More efficient than push_back for strings
Avoid string concatenation: Character count approach minimizes this overhead
Early return: Handle empty input immediately

242. Valid Anagram - Check if two strings are anagrams
438. Find All Anagrams in a String - Find anagram substrings
2273. Find Resultant Array After Removing Anagrams - Remove anagrams from array
49. Group Anagrams - This problem

Real-World Applications

Word Games: Grouping words by anagram patterns (Scrabble, Boggle)
Text Analysis: Finding similar words or patterns in text
Cryptography: Anagram-based ciphers and puzzles
Search Engines: Grouping similar search terms
Data Deduplication: Identifying similar strings

This problem demonstrates the power of using character frequency as a hash key, showing how counting can be more efficient than sorting for certain string problems.

Robina Li

[Medium] 49. Group Anagrams

[Medium] 49. Group Anagrams

Examples

Constraints

Clarification Questions

Interview Deduction Process (20 minutes)

Solution: Character Count Key Approach

Solution: Character Count Key

How the Algorithm Works

Step-by-Step Example: `strs = ["eat","tea","tan","ate","nat","bat"]`

Visual Representation

Key Insights

Algorithm Breakdown

Edge Cases

Alternative Approaches

Approach 2: Sorted String Key

Approach 3: Prime Number Hash (Advanced)

Complexity Analysis

Why Character Count Key is Preferred

Implementation Details

Character Count Array

Key Construction

C++20 contains() Method

Common Mistakes

Optimization Tips

Real-World Applications

Related Posts

Recent Posts

[Medium] 49. Group Anagrams

Examples

Constraints

Clarification Questions

Interview Deduction Process (20 minutes)

Solution: Character Count Key Approach

Solution: Character Count Key

How the Algorithm Works

Step-by-Step Example: strs = ["eat","tea","tan","ate","nat","bat"]

Visual Representation

Key Insights

Algorithm Breakdown

Edge Cases

Alternative Approaches

Approach 2: Sorted String Key

Approach 3: Prime Number Hash (Advanced)

Complexity Analysis

Why Character Count Key is Preferred

Implementation Details

Character Count Array

Key Construction

C++20 contains() Method

Common Mistakes

Optimization Tips

Related Problems

Real-World Applications

Related Posts

Recent Posts

Step-by-Step Example: `strs = ["eat","tea","tan","ate","nat","bat"]`