[Medium] 49. Group Anagrams

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

Examples

Example 1:

Input: strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Example 2:

Input: strs = [""]
Output: [[""]]

Example 3:

Input: strs = ["a"]
Output: [["a"]]

Constraints

  • 1 <= strs.length <= 10^4
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.

Clarification Questions

Before diving into the solution, here are 5 important clarifications and assumptions to discuss during an interview:

  1. Anagram definition: What is an anagram? (Assumption: Words with same characters in different order - same character frequencies)

  2. Grouping rule: How should we group anagrams? (Assumption: Group strings that are anagrams of each other together)

  3. Return format: What should we return? (Assumption: List of groups - each group contains anagrams)

  4. Order requirement: Does order of groups matter? (Assumption: No - can return in any order)

  5. Single character: Do single characters count as anagrams? (Assumption: Yes - “a” is anagram of “a”)

Interview Deduction Process (20 minutes)

Step 1: Brute-Force Approach (5 minutes)

For each string, compare it with all other strings to check if they are anagrams. To check if two strings are anagrams, sort both strings and compare, or count character frequencies. Group strings that are anagrams together. This approach has O(n² × m log m) complexity where n is the number of strings and m is the average length, which is too slow for large inputs.

Step 2: Semi-Optimized Approach (7 minutes)

Sort each string and use the sorted string as a key to group anagrams. Use a hash map where the key is the sorted string and the value is a list of original strings. This reduces comparison time but still requires sorting each string, giving O(n × m log m) time complexity. This works well but can be optimized further by avoiding sorting.

Step 3: Optimized Solution (8 minutes)

Use character frequency as the key instead of sorting. For each string, count the frequency of each character and create a key (e.g., “a2b1c3” for “aabccc”). Use this key to group anagrams in a hash map. This avoids sorting and achieves O(n × m) time complexity where m is the average string length. Alternatively, use a tuple or array of 26 integers as the key. The key insight is that anagrams have the same character frequencies, so frequency counts serve as a perfect grouping key without the overhead of sorting.

Solution: Character Count Key Approach

Time Complexity: O(N * K) where N is the number of strings and K is the maximum length of a string
Space Complexity: O(N * K) for storing all strings in the hash map

The key insight is to use a character frequency count as the hash map key. Strings with the same character frequencies are anagrams of each other.

Solution: Character Count Key

class Solution:
def groupAnagrams(self, strs):
    if len(strs) == 0) return list[list[str>>(:
    dict[str, list[str>> hm
    count[26]
    for s in strs:
        fill(begin(count), end(count), 0)
        for(char c: s) count[c-'a']++
        str key = ""
        for(i = 0 i < 26 i += 1) :
        key += "#"
        key += to_string(count[i])
    if not key in hm) hm[key] = list[str>(:
    hm[key].append(s)
list[list[str>> rtn
for(itr = hm.begin() itr != hm.end() itr += 1) :
rtn.append(itr.second)
return rtn

How the Algorithm Works

Step-by-Step Example: strs = ["eat","tea","tan","ate","nat","bat"]

Step 1: Process "eat"
  Count: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#0#0#0#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0"
  hm[key] = ["eat"]

Step 2: Process "tea"
  Count: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#0#0#0#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0" (same as "eat")
  hm[key] = ["eat", "tea"]

Step 3: Process "tan"
  Count: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#1#0#0#0#0#0#0"
  hm[key] = ["tan"]

Step 4: Process "ate"
  Count: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#0#0#0#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0" (same as "eat")
  hm[key] = ["eat", "tea", "ate"]

Step 5: Process "nat"
  Count: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#1#0#0#0#0#0#0" (same as "tan")
  hm[key] = ["tan", "nat"]

Step 6: Process "bat"
  Count: [1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
  Key: "#1#1#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0#1#0#0#0#0#0#0"
  hm[key] = ["bat"]

Result:
  [["eat","tea","ate"], ["tan","nat"], ["bat"]]

Visual Representation

Input:  ["eat", "tea", "tan", "ate", "nat", "bat"]
         ↓       ↓       ↓       ↓       ↓       ↓
Keys:   key1   key1   key2   key1   key2   key3
         ↓       ↓       ↓       ↓       ↓       ↓
Groups: ["eat","tea","ate"]  ["tan","nat"]  ["bat"]

Key Insights

  1. Character Frequency as Key: Use character count array to create a unique key for each anagram group
  2. Hash Map Grouping: Strings with identical character frequencies map to the same key
  3. Delimiter Usage: Using “#” delimiter ensures keys are unique (e.g., “1#2” vs “12#”)
  4. Efficient Counting: Count array of size 26 (for lowercase letters) is space-efficient

Algorithm Breakdown

def groupAnagrams(self, strs):
    # Handle empty input
    if len(strs) == 0) return list[list[str>>(:
    # Map: character count key . list of anagrams
    dict[str, list[str>> hm
    count[26]  # Count array for 26 lowercase letters
    for s in strs:
        # Reset count array
        fill(begin(count), end(count), 0)
        # Count characters in current str
        for(char c: s) count[c-'a']++
        # Build key from character counts
        str key = ""
        for(i = 0 i < 26 i += 1) :
        key += "#"           # Delimiter
        key += to_string(count[i])  # Count for each letter
    # Add str to appropriate group
    if not key in hm) hm[key] = list[str>(:
    hm[key].append(s)
# Convert map values to result vector
list[list[str>> rtn
for(itr = hm.begin() itr != hm.end() itr += 1) :
rtn.append(itr.second)
return rtn

Edge Cases

  1. Empty input: strs = [] → return []
  2. Single empty string: strs = [""] → return [[""]]
  3. Single character: strs = ["a"] → return [["a"]]
  4. All anagrams: strs = ["eat","tea","ate"] → return [["eat","tea","ate"]]
  5. No anagrams: strs = ["abc","def","ghi"] → return [["abc"],["def"],["ghi"]]

Alternative Approaches

Approach 2: Sorted String Key

Time Complexity: O(N * K log K) where K is the average string length
Space Complexity: O(N * K)

class Solution:
def groupAnagrams(self, strs):
    dict[str, list[str>> hm
    for s in strs:
        str key = s
        key.sort()
        hm[key].append(s)
    list[list[str>> rtn
    for([key, group] : hm) :
    rtn.append(group)
return rtn

Pros:

  • Simpler implementation
  • Easier to understand

Cons:

  • Slower due to sorting: O(K log K) per string
  • Less efficient for long strings

Approach 3: Prime Number Hash (Advanced)

Time Complexity: O(N * K)
Space Complexity: O(N * K)

Uses prime numbers to create hash keys, avoiding string concatenation overhead.

class Solution:
def groupAnagrams(self, strs):
    # Prime numbers for each letter
    primes[26] = :2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101
dict[long long, list[str>> hm
for s in strs:
    long long key = 1
    for c in s:
        key = primes[c - 'a']
    hm[key].append(s)
list[list[str>> rtn
for([key, group] : hm) :
rtn.append(group)
return rtn

Pros:

  • Fast key generation (multiplication)
  • No string concatenation overhead

Cons:

  • Risk of integer overflow for very long strings
  • More complex to implement

Complexity Analysis

Approach Time Space Pros Cons
Character Count Key O(N * K) O(N * K) Fast, no sorting String concatenation overhead
Sorted String Key O(N * K log K) O(N * K) Simple, readable Slower due to sorting
Prime Number Hash O(N * K) O(N * K) Very fast key generation Overflow risk, complex

Why Character Count Key is Preferred

  1. Optimal Time Complexity: O(N * K) without sorting overhead
  2. Predictable Performance: No dependency on string length for key generation
  3. Memory Efficient: Fixed-size count array (26 integers)
  4. Robust: Works for any string length without overflow concerns

Implementation Details

Character Count Array

count[26]  # For 26 lowercase letters a-z
fill(begin(count), end(count), 0)  # Reset to zero
# Count characters
for(char c: s) count[c-'a']++  # 'a' maps to index 0, 'z' to 25

Key Construction

str key = ""
for(i = 0 i < 26 i += 1) :
key += "#"              # Delimiter prevents ambiguity
key += to_string(count[i])  # Count for letter at position i

Why use “#” delimiter?

  • Without delimiter: “12” could mean count[0]=1, count[1]=2 OR count[0]=12
  • With delimiter: “#1#2” unambiguously means count[0]=1, count[1]=2

Python20 contains() Method

if not key in hm) hm[key] = list[str>(:

Alternative (Python11/14):

if hm.find(key) == hm.end()) hm[key] = list[str>(:

Common Mistakes

  1. Forgetting to reset count array: Must reset for each string
  2. Wrong delimiter: Using numbers without delimiter causes key collisions
  3. Case sensitivity: Assuming uppercase letters (this problem uses lowercase only)
  4. Empty string handling: Not handling empty input or empty strings correctly
  5. Inefficient key generation: Using sorting when counting is faster

Optimization Tips

  1. Pre-allocate result vector: Can reserve space if you know approximate number of groups
  2. Use emplace_back: More efficient than push_back for strings
  3. Avoid string concatenation: Character count approach minimizes this overhead
  4. Early return: Handle empty input immediately

Real-World Applications

  1. Word Games: Grouping words by anagram patterns (Scrabble, Boggle)
  2. Text Analysis: Finding similar words or patterns in text
  3. Cryptography: Anagram-based ciphers and puzzles
  4. Search Engines: Grouping similar search terms
  5. Data Deduplication: Identifying similar strings

This problem demonstrates the power of using character frequency as a hash key, showing how counting can be more efficient than sorting for certain string problems.