692. Top K Frequent Words
Problem Statement
Given an array of strings words and an integer k, return the k most frequent strings.
Return the answer sorted by the frequency from highest to lowest. Sort the words with the same frequency by their lexicographical order.
Examples
Example 1:
Input: words = ["i","love","leetcode","i","love","coding"], k = 2
Output: ["i","love"]
Explanation: "i" and "love" are the two most frequent words.
Note that "i" comes before "love" due to a lower alphabetical order.
Example 2:
Input: words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 2
Output: ["the","is"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words, with frequencies 4, 3, 2 and 1 respectively.
Constraints
- 1 <= words.length <= 500
- 1 <= words[i].length <= 10
- words[i] consists of lowercase English letters.
- k is in the range [1, the number of unique words[i]].
Clarification Questions
Before diving into the solution, here are 5 important clarifications and assumptions to discuss during an interview:
- Tie-breaking: When words have the same frequency, how should we break ties? (Assumption: Sort lexicographically, i.e., alphabetical order)
- Frequency calculation: How is frequency calculated? (Assumption: Count occurrences of each word in the input array)
- Case sensitivity: Are word comparisons case-sensitive? (Assumption: The constraints allow only lowercase letters, so the question does not arise)
- Output format: Should we return k words or all words with the same frequency? (Assumption: Return exactly k words - the top k most frequent)
- Word uniqueness: Can the same word appear multiple times in the result? (Assumption: No - each word appears once in the result, sorted by frequency then lexicographically)
Interview Deduction Process (20 minutes)
Step 1: Brute-Force Approach (5 minutes)
Initial Thought: “I need to find top k frequent words. Let me count frequencies and sort.”
Naive Solution: Count frequency of each word, sort by frequency (descending) then lexicographically, return top k.
Complexity: O(n log n) time, O(n) space
Issues:
- O(n log n) time when O(n log k) is possible
- Sorts all unique words when only the top k are needed
- Doesn't leverage a heap for partial selection
- Can be optimized
Step 2: Semi-Optimized Approach (7 minutes)
Insight: “I can use min-heap of size k to track top k words, with custom comparator.”
Improved Solution: Count frequencies, then maintain a min-heap of size k with a custom comparator (frequency first, then lexicographic order). Push each unique word onto the heap; whenever the heap size exceeds k, pop the top, which is the least frequent word (or the lexicographically largest among ties).
Complexity: O(n log k) time, O(n) space
Improvements:
- O(n log k) time is better than O(n log n)
- Heap efficiently maintains top k
- Custom comparator handles tie-breaking
- Can optimize further
Step 3: Optimized Solution (8 minutes)
Final Optimization: “Min-heap with custom comparator is optimal. Sort result at end for correct order.”
Best Solution: Min-heap approach is optimal. Count frequencies, use min-heap of size k. After processing, extract and sort heap elements for final result (frequency descending, then lexicographic).
Complexity: O(n log k) time, O(n) space
Key Realizations:
- Heap is perfect for top-k problems
- Custom comparator handles tie-breaking
- O(n log k) time is optimal for heap approach
- Final sort ensures correct order
Solution Approach
This problem is similar to LC 347: Top K Frequent Elements, but with two key differences:
- Strings instead of integers: Need to handle string comparison
- Lexicographic ordering: When frequencies are equal, sort alphabetically
Key Insights:
- Frequency Counting: Use hash map to count word frequencies
- Custom Sorting: Sort by frequency (descending), then by lexicographic order (ascending)
- Top K Selection: Return only the first k elements after sorting
Algorithm:
- Count frequencies: Use unordered_map to count occurrences of each word
- Collect unique words: Extract all unique words from the map
- Sort: Sort by frequency (descending), then lexicographically (ascending)
- Return top k: Take the first k elements
Solution
Solution: Hash Map + Custom Sorting
class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        unordered_map<string, int> cnt;
        for (const string& word : words) cnt[word]++;
        vector<string> rtn;
        for (const auto& [key, value] : cnt) rtn.emplace_back(key);
        sort(rtn.begin(), rtn.end(), [&](const string& a, const string& b) {
            return cnt[a] == cnt[b] ? a < b : cnt[a] > cnt[b];
        });
        rtn.erase(rtn.begin() + k, rtn.end());
        return rtn;
    }
};
Algorithm Explanation:
- Count Frequencies: Create an unordered_map<string, int> and iterate through all words, incrementing each word's count
- Collect Unique Words: Extract all unique words (keys) from the frequency map and store them in the result vector rtn
- Custom Sort: Primary sort by frequency (descending) - cnt[a] > cnt[b]; secondary sort by lexicographic order (ascending) - a < b when frequencies are equal. The lambda captures by reference ([&]) to access the cnt map
- Return Top K: Erase the elements after index k to keep only the top k, then return the result
Why This Works:
- Hash map counting: Efficiently counts frequencies in O(n) time
- Custom comparator: Handles both frequency and lexicographic ordering
- Sorting approach: Simple and straightforward for small input sizes
Example Walkthrough:
For words = ["i","love","leetcode","i","love","coding"], k = 2:
Step 1: Count frequencies
cnt = {
"i": 2,
"love": 2,
"leetcode": 1,
"coding": 1
}
Step 2: Collect unique words
rtn = ["i", "love", "leetcode", "coding"]
Step 3: Sort with custom comparator
Compare pairs:
- "i" vs "love": cnt["i"] == cnt["love"] (both 2) → "i" < "love" → "i" comes first
- "leetcode" vs "coding": cnt["leetcode"] == cnt["coding"] (both 1) → "coding" < "leetcode" → "coding" comes first
- "i"/"love" vs "leetcode"/"coding": cnt["i"] > cnt["leetcode"] → higher frequency comes first
After sorting:
rtn = ["i", "love", "coding", "leetcode"]
Step 4: Take top k = 2
rtn = ["i", "love"]
Result: ["i", "love"]
Complexity Analysis:
- Time Complexity: O(n + m log m) where n is total words, m is unique words
- O(n) for frequency counting
- O(m) for collecting unique words
- O(m log m) for sorting m unique words
- O(m - k) for erasing the tail elements
- Space Complexity: O(m) where m is number of unique words
- O(m) for frequency map
- O(m) for result vector
Optimization Note:
Instead of erase, we could use resize(k), which is more concise:
rtn.resize(k);
return rtn;
Alternative Approaches
Approach 2: Min Heap (Better for Large k)
For cases where k is much smaller than the number of unique words, a min heap approach would be more efficient:
class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        unordered_map<string, int> cnt;
        for (const string& word : words) cnt[word]++;
        // Min-heap of size k: the worst candidate (lowest frequency,
        // lexicographically largest among ties) sits on top
        auto cmp = [&](const string& a, const string& b) {
            return cnt[a] == cnt[b] ? a < b : cnt[a] > cnt[b];
        };
        priority_queue<string, vector<string>, decltype(cmp)> pq(cmp);
        for (const auto& [word, freq] : cnt) {
            pq.push(word);
            if ((int)pq.size() > k) pq.pop();
        }
        vector<string> rtn;
        while (!pq.empty()) {
            rtn.push_back(pq.top());
            pq.pop();
        }
        reverse(rtn.begin(), rtn.end());
        return rtn;
    }
};
Time Complexity: O(n + m log k) where m is unique words
Space Complexity: O(m + k)
Approach 3: Bucket Sort
Similar to LC 347, but requires additional sorting within each bucket:
class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        unordered_map<string, int> cnt;
        for (const string& word : words) cnt[word]++;
        int maxFreq = 0;
        for (const auto& [word, freq] : cnt) maxFreq = max(maxFreq, freq);
        vector<vector<string>> buckets(maxFreq + 1);
        for (const auto& [word, freq] : cnt) buckets[freq].push_back(word);
        vector<string> rtn;
        for (int i = maxFreq; i >= 1 && (int)rtn.size() < k; i--) {
            sort(buckets[i].begin(), buckets[i].end());
            for (const string& word : buckets[i]) {
                rtn.push_back(word);
                if ((int)rtn.size() == k) break;
            }
        }
        return rtn;
    }
};
Time Complexity: O(n + m log m) in worst case
Space Complexity: O(m)
Key Insights
- Custom Comparator: The key is the two-level sorting: frequency first, then lexicographic order
- Hash Map Efficiency: unordered_map provides O(1) average-case lookups for frequency counting
- Sorting Trade-off: Simple sorting works well for small inputs; a heap is better when k is small relative to the number of unique words
- Lexicographic Order: When frequencies are equal, use standard string comparison (<)
Edge Cases
- All words have same frequency: Sort purely by lexicographic order
- k equals number of unique words: Return all words sorted
- Single word repeated: Return that word
- Different frequencies, same lexicographic prefix: Frequency takes priority
Related Problems
- LC 347: Top K Frequent Elements - Similar problem with integers
- LC 215: Kth Largest Element in an Array - Kth largest element
- LC 451: Sort Characters By Frequency - Sort by frequency
- LC 973: K Closest Points to Origin - Top K with custom ordering
This problem demonstrates the importance of custom comparators for multi-criteria sorting, combining frequency-based and lexicographic ordering.