[Medium] 49. Group Anagrams
[Medium] 49. Group Anagrams
Given an array of strings strs, group the anagrams together. You can return the answer in any order.
An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.
Examples
Example 1:
Input: strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]
Example 2:
Input: strs = [""]
Output: [[""]]
Example 3:
Input: strs = ["a"]
Output: [["a"]]
Constraints
1 <= strs.length <= 10^40 <= strs[i].length <= 100strs[i]consists of lowercase English letters.
Clarification Questions
Before diving into the solution, here are 5 important clarifications and assumptions to discuss during an interview:
-
Anagram definition: What is an anagram? (Assumption: Words with same characters in different order - same character frequencies)
-
Grouping rule: How should we group anagrams? (Assumption: Group strings that are anagrams of each other together)
-
Return format: What should we return? (Assumption: List of groups - each group contains anagrams)
-
Order requirement: Does order of groups matter? (Assumption: No - can return in any order)
-
Single character: Do single characters count as anagrams? (Assumption: Yes - “a” is anagram of “a”)
Interview Deduction Process (20 minutes)
Step 1: Brute-Force Approach (5 minutes)
For each string, compare it with all other strings to check if they are anagrams. To check if two strings are anagrams, sort both strings and compare, or count character frequencies. Group strings that are anagrams together. This approach has O(n² × m log m) complexity where n is the number of strings and m is the average length, which is too slow for large inputs.
Step 2: Semi-Optimized Approach (7 minutes)
Sort each string and use the sorted string as a key to group anagrams. Use a hash map where the key is the sorted string and the value is a list of original strings. This reduces comparison time but still requires sorting each string, giving O(n × m log m) time complexity. This works well but can be optimized further by avoiding sorting.
Step 3: Optimized Solution (8 minutes)
Use character frequency as the key instead of sorting. For each string, count the frequency of each character and create a key (e.g., “a2b1c3” for “aabccc”). Use this key to group anagrams in a hash map. This avoids sorting and achieves O(n × m) time complexity where m is the average string length. Alternatively, use a tuple or array of 26 integers as the key. The key insight is that anagrams have the same character frequencies, so frequency counts serve as a perfect grouping key without the overhead of sorting.
Solution: Character Count Key Approach
Time Complexity: O(N * K) where N is the number of strings and K is the maximum length of a string
Space Complexity: O(N * K) for storing all strings in the hash map
The key insight is to use a character frequency count as the hash map key. Strings with the same character frequencies are anagrams of each other.
Solution: Character Count Key
class Solution {
public:
vector<vector<string>> groupAnagrams(vector<string>& strs) {
if(strs.size() == 0) return vector<vector<string>>();
unordered_map<string, vector<string>> hm;
int count[26];
for(string& s: strs) {
fill(begin(count), end(count), 0);
for(char c: s) count[c-'a']++;
string key = "";
for(int i = 0; i < 26; i++) {
key += "#";
key += to_string(count[i]);
}
if(!hm.contains(key)) hm[key] = vector<string>();
hm[key].push_back(s);
}
vector<vector<string>> rtn;
for(auto itr = hm.begin(); itr != hm.end(); itr++) {
rtn.push_back(itr->second);
}
return rtn;
}
};
How the Algorithm Works
Step-by-Step Example: strs = ["eat","tea","tan","ate","nat","bat"]
Step 1: Process "eat"
Count: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
Key: "#1#0#0#0#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0"
hm[key] = ["eat"]
Step 2: Process "tea"
Count: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
Key: "#1#0#0#0#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0" (same as "eat")
hm[key] = ["eat", "tea"]
Step 3: Process "tan"
Count: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
Key: "#1#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#1#0#0#0#0#0#0"
hm[key] = ["tan"]
Step 4: Process "ate"
Count: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
Key: "#1#0#0#0#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0" (same as "eat")
hm[key] = ["eat", "tea", "ate"]
Step 5: Process "nat"
Count: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
Key: "#1#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#1#0#0#0#0#0#0" (same as "tan")
hm[key] = ["tan", "nat"]
Step 6: Process "bat"
Count: [1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
Key: "#1#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0"
hm[key] = ["bat"]
Result:
[["eat","tea","ate"], ["tan","nat"], ["bat"]]
Visual Representation
Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
↓ ↓ ↓ ↓ ↓ ↓
Keys: key1 key1 key2 key1 key2 key3
↓ ↓ ↓ ↓ ↓ ↓
Groups: ["eat","tea","ate"] ["tan","nat"] ["bat"]
Key Insights
- Character Frequency as Key: Use character count array to create a unique key for each anagram group
- Hash Map Grouping: Strings with identical character frequencies map to the same key
- Delimiter Usage: Using “#” delimiter ensures keys are unique (e.g., “1#2” vs “12#”)
- Efficient Counting: Count array of size 26 (for lowercase letters) is space-efficient
Algorithm Breakdown
vector<vector<string>> groupAnagrams(vector<string>& strs) {
// Handle empty input
if(strs.size() == 0) return vector<vector<string>>();
// Map: character count key -> list of anagrams
unordered_map<string, vector<string>> hm;
int count[26]; // Count array for 26 lowercase letters
for(string& s: strs) {
// Reset count array
fill(begin(count), end(count), 0);
// Count characters in current string
for(char c: s) count[c-'a']++;
// Build key from character counts
string key = "";
for(int i = 0; i < 26; i++) {
key += "#"; // Delimiter
key += to_string(count[i]); // Count for each letter
}
// Add string to appropriate group
if(!hm.contains(key)) hm[key] = vector<string>();
hm[key].push_back(s);
}
// Convert map values to result vector
vector<vector<string>> rtn;
for(auto itr = hm.begin(); itr != hm.end(); itr++) {
rtn.push_back(itr->second);
}
return rtn;
}
Edge Cases
- Empty input:
strs = []→ return[] - Single empty string:
strs = [""]→ return[[""]] - Single character:
strs = ["a"]→ return[["a"]] - All anagrams:
strs = ["eat","tea","ate"]→ return[["eat","tea","ate"]] - No anagrams:
strs = ["abc","def","ghi"]→ return[["abc"],["def"],["ghi"]]
Alternative Approaches
Approach 2: Sorted String Key
Time Complexity: O(N * K log K) where K is the average string length
Space Complexity: O(N * K)
class Solution {
public:
vector<vector<string>> groupAnagrams(vector<string>& strs) {
unordered_map<string, vector<string>> hm;
for(string& s: strs) {
string key = s;
sort(key.begin(), key.end());
hm[key].push_back(s);
}
vector<vector<string>> rtn;
for(auto& [key, group] : hm) {
rtn.push_back(group);
}
return rtn;
}
};
Pros:
- Simpler implementation
- Easier to understand
Cons:
- Slower due to sorting: O(K log K) per string
- Less efficient for long strings
Approach 3: Prime Number Hash (Advanced)
Time Complexity: O(N * K)
Space Complexity: O(N * K)
Uses prime numbers to create hash keys, avoiding string concatenation overhead.
class Solution {
public:
vector<vector<string>> groupAnagrams(vector<string>& strs) {
// Prime numbers for each letter
int primes[26] = {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101};
unordered_map<long long, vector<string>> hm;
for(string& s: strs) {
long long key = 1;
for(char c: s) {
key *= primes[c - 'a'];
}
hm[key].push_back(s);
}
vector<vector<string>> rtn;
for(auto& [key, group] : hm) {
rtn.push_back(group);
}
return rtn;
}
};
Pros:
- Fast key generation (multiplication)
- No string concatenation overhead
Cons:
- Risk of integer overflow for very long strings
- More complex to implement
Complexity Analysis
| Approach | Time | Space | Pros | Cons |
|---|---|---|---|---|
| Character Count Key | O(N * K) | O(N * K) | Fast, no sorting | String concatenation overhead |
| Sorted String Key | O(N * K log K) | O(N * K) | Simple, readable | Slower due to sorting |
| Prime Number Hash | O(N * K) | O(N * K) | Very fast key generation | Overflow risk, complex |
Why Character Count Key is Preferred
- Optimal Time Complexity: O(N * K) without sorting overhead
- Predictable Performance: No dependency on string length for key generation
- Memory Efficient: Fixed-size count array (26 integers)
- Robust: Works for any string length without overflow concerns
Implementation Details
Character Count Array
int count[26]; // For 26 lowercase letters a-z
fill(begin(count), end(count), 0); // Reset to zero
// Count characters
for(char c: s) count[c-'a']++; // 'a' maps to index 0, 'z' to 25
Key Construction
string key = "";
for(int i = 0; i < 26; i++) {
key += "#"; // Delimiter prevents ambiguity
key += to_string(count[i]); // Count for letter at position i
}
Why use “#” delimiter?
- Without delimiter: “12” could mean count[0]=1, count[1]=2 OR count[0]=12
- With delimiter: “#1#2” unambiguously means count[0]=1, count[1]=2
C++20 contains() Method
if(!hm.contains(key)) hm[key] = vector<string>();
Alternative (C++11/14):
if(hm.find(key) == hm.end()) hm[key] = vector<string>();
Common Mistakes
- Forgetting to reset count array: Must reset for each string
- Wrong delimiter: Using numbers without delimiter causes key collisions
- Case sensitivity: Assuming uppercase letters (this problem uses lowercase only)
- Empty string handling: Not handling empty input or empty strings correctly
- Inefficient key generation: Using sorting when counting is faster
Optimization Tips
- Pre-allocate result vector: Can reserve space if you know approximate number of groups
- Use emplace_back: More efficient than push_back for strings
- Avoid string concatenation: Character count approach minimizes this overhead
- Early return: Handle empty input immediately
Related Problems
- 242. Valid Anagram - Check if two strings are anagrams
- 438. Find All Anagrams in a String - Find anagram substrings
- 2273. Find Resultant Array After Removing Anagrams - Remove anagrams from array
- 49. Group Anagrams - This problem
Real-World Applications
- Word Games: Grouping words by anagram patterns (Scrabble, Boggle)
- Text Analysis: Finding similar words or patterns in text
- Cryptography: Anagram-based ciphers and puzzles
- Search Engines: Grouping similar search terms
- Data Deduplication: Identifying similar strings
This problem demonstrates the power of using character frequency as a hash key, showing how counting can be more efficient than sorting for certain string problems.