[Medium] 398. Random Pick Index

Given an integer array nums with possible duplicates, randomly output the index of a given target number. You can assume that the given target number must exist in the array.

Implement the Solution class:

  • Solution(vector<int>& nums) Initializes the object with the array nums.
  • int pick(int target) Picks a random index i from nums where nums[i] == target. If there are multiple valid i’s, then each index should have an equal probability of being returned.

Examples

Example 1:

Input
["Solution", "pick", "pick", "pick"]
[[[1, 2, 3, 3, 3]], [3], [3], [3]]
Output
[null, 4, 4, 2]

Explanation
Solution solution = new Solution([1, 2, 3, 3, 3]);
solution.pick(3); // It should return either index 2, 3, or 4 randomly. Each index should have an equal probability of being returned.
solution.pick(3); // It should return either index 2, 3, or 4 randomly. Each index should have an equal probability of being returned.
solution.pick(3); // It should return either index 2, 3, or 4 randomly. Each index should have an equal probability of being returned.

Example 2:

Input
["Solution", "pick", "pick", "pick"]
[[[1, 2, 3, 3, 3]], [1], [3], [3]]
Output
[null, 0, 4, 2]

Explanation
solution.pick(1); // Should return 0, since there's only one element with value 1 in the array.
solution.pick(3); // Should return either index 2, 3, or 4 randomly.
solution.pick(3); // Should return either index 2, 3, or 4 randomly.

Constraints

  • 1 <= nums.length <= 2 * 10^4
  • -2^31 <= nums[i] <= 2^31 - 1
  • target is an integer from nums.
  • At most 10^4 calls will be made to pick.

Clarification Questions

Before diving into the solution, here are 5 important clarifications and assumptions to discuss during an interview:

  1. Pick operation: What does pick() do? (Assumption: Randomly selects one index where nums[index] == target, returns that index)

  2. Randomness requirement: What is the randomness requirement? (Assumption: All indices with target value should have equal probability of being selected)

  3. Return value: What should pick() return? (Assumption: Integer - randomly selected index where nums[index] == target)

  4. Multiple calls: Are pick() calls independent? (Assumption: Yes - each call is independent random selection)

  5. Target existence: Is target guaranteed to exist? (Assumption: Yes - per constraints, target is an integer from nums)

Interview Deduction Process (20 minutes)

Step 1: Brute-Force Approach (5 minutes)

For each pick(target) call, scan the array to find all indices where nums[i] == target, store them in a list, then randomly select one index from the list. This approach has O(n) time per pick() call, which is inefficient if pick() is called many times.

Step 2: Semi-Optimized Approach (7 minutes)

Preprocess the array once: build a hash map mapping each value to a list of indices where it appears. For pick(target), get the list of indices from the hash map and randomly select one. This reduces pick() to O(1) average time (assuming random selection is O(1)), but requires O(n) space and O(n) preprocessing time.

Step 3: Optimized Solution (8 minutes)

Two candidates emerge. The first is hash map preprocessing: build a value -> [indices] mapping once in the constructor, so each pick(target) looks up the index list in O(1) average time and randomly selects one entry. The second is reservoir sampling: skip preprocessing and scan the array once per pick() call, maintaining a random candidate as matches are found. Reservoir sampling uses O(1) extra space but costs O(n) per call, so when pick() is called many times the preprocessing approach wins: O(n) construction time and O(n) space buy O(1) average time per pick() call.

Solution: Hash Map Preprocessing (Optimized)

Time Complexity: O(n) preprocessing, O(1) per pick() call
Space Complexity: O(n)

The key insight is to preprocess the array once during construction, storing all indices for each value in a hash map. This allows O(1) random selection per pick() call.

Solution: Hash Map Approach (Optimized for Multiple Picks)

class Solution {
private:
    unordered_map<int, vector<int>> pos;

public:
    Solution(vector<int>& nums) {
        // Preprocess: store all indices for each value
        for(int i = 0; i < (int)nums.size(); i++) {
            pos[nums[i]].push_back(i);
        }
    }
    
    int pick(int target) {
        auto& indices = pos[target];
        // Randomly select one index from the stored indices
        return indices[rand() % indices.size()];
    }
};
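
For completeness, here is a minimal driver that mirrors Example 1. It is a hypothetical sketch, not part of the LeetCode template, and it assumes the headers below plus using namespace std are placed before the Solution class definition above.

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <unordered_map>
#include <vector>
using namespace std;

// ... Solution class from above goes here ...

int main() {
    srand((unsigned)time(nullptr));     // seed rand() once per run
    vector<int> nums = {1, 2, 3, 3, 3};
    Solution solution(nums);            // builds pos = {1:[0], 2:[1], 3:[2,3,4]}
    cout << solution.pick(3) << "\n";   // prints 2, 3, or 4 with equal probability
    cout << solution.pick(1) << "\n";   // always prints 0
    return 0;
}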

How the Algorithm Works

Key Insight: Preprocessing for Fast Lookup

Problem: Need to randomly pick an index where nums[i] == target with equal probability.

Naive Approach: Scan array each time → O(n) per pick() call → risks TLE if called many times

Optimized Approach:

  • Preprocess once: Build hash map of value → list of indices
  • Each pick() call: O(1) lookup and random selection

Step-by-Step Example: nums = [1, 2, 3, 3, 3]

Step 1: Constructor - Build hash map
  pos[1] = [0]
  pos[2] = [1]
  pos[3] = [2, 3, 4]

Step 2: pick(3)
  indices = pos[3] = [2, 3, 4]
  Random index: rand() % 3 could be 0, 1, or 2
  Return indices[random_index] = one of {2, 3, 4}

Step 3: pick(3) again
  Same process, returns a random index from {2, 3, 4}

Visual Representation:

nums = [1, 2, 3, 3, 3]
       0  1  2  3  4

Hash Map:
  1 → [0]
  2 → [1]
  3 → [2, 3, 4]

pick(3): Randomly select from [2, 3, 4]
  - Each index has 1/3 probability
  - Returns 2, 3, or 4 with equal chance

Key Insights

  1. Preprocessing pays off: One-time O(n) cost amortized over many pick() calls
  2. Hash map lookup: O(1) average case for finding indices
  3. Random selection: Use rand() % size for uniform distribution
  4. Space-time tradeoff: Use O(n) space to achieve O(1) time per pick

Algorithm Breakdown

Constructor: Build Index Map

Solution(vector<int>& nums) {
    for(int i = 0; i < (int)nums.size(); i++) {
        pos[nums[i]].push_back(i);
    }
}

Why:

  • Iterate through array once: O(n) time
  • For each value, store all its indices
  • Hash map provides O(1) average lookup time

Pick Function: Random Selection

int pick(int target) {
    auto& indices = pos[target];
    return indices[rand() % indices.size()];
}

Why:

  • Lookup indices for target: O(1) average case
  • Random selection: rand() % size gives uniform distribution
  • Return selected index: O(1) total time

Why Reservoir Sampling Can Time Out

Solution 1: Reservoir Sampling (O(n) per pick)

class Solution {
    vector<int>& nums;  // reference to the caller's array (no copy); it must outlive this object

public:
    Solution(vector<int>& nums) : nums(nums) {}
    
    int pick(int target) {
        int rtn = -1;
        for(int i = 0, cnt = 0; i < (int)nums.size(); ++i) {
            if(nums[i] == target) {
                ++cnt;
                if(rand() % cnt == 0) {
                    rtn = i;
                }
            }
        }
        return rtn;
    }
};

Time Complexity: O(n) per pick() call
Space Complexity: O(1)

Why it can time out:

  • Each pick() call scans the entire array: O(n)
  • If pick() is called k times: O(k × n) total time
  • With k = 10^4 and n = 2×10^4: roughly 200 million operations, which risks exceeding the time limit

When to use:

  • Single or few pick() calls
  • Memory-constrained environments
  • Streaming data (can’t preprocess)

Complexity Comparison

Approach             Constructor   pick()   Total (k calls)   Space   Best For
Hash Map             O(n)          O(1)     O(n + k)          O(n)    Multiple picks
Reservoir Sampling   O(1)          O(n)     O(k × n)          O(1)    Single pick

Key Insight:

  • Hash map: Amortized cost per pick is O(1) after preprocessing
  • Reservoir sampling: Per-call cost is O(n), expensive for many calls

Edge Cases

  1. Single occurrence: pick() returns the only index
  2. All same values: All indices have equal probability
  3. Large array: Hash map handles efficiently
  4. Many pick calls: Hash map approach scales better

Alternative Approaches

Approach 2: Reservoir Sampling (For Reference)

Use when: Memory is critical or only few picks needed

class Solution {
    vector<int>& nums;

public:
    Solution(vector<int>& nums) : nums(nums) {}
    
    int pick(int target) {
        int count = 0;
        int result = -1;
        
        for(int i = 0; i < (int)nums.size(); i++) {
            if(nums[i] == target) {
                count++;
                // With probability 1/count, select current index
                if(rand() % count == 0) {
                    result = i;
                }
            }
        }
        
        return result;
    }
};

How Reservoir Sampling Works:

  • For each occurrence of target, increment count
  • With probability 1/count, replace current selection
  • Ensures uniform distribution: each index has 1/k probability (where k = total occurrences)

Proof of Uniformity:

  • The 1st occurrence is provisionally selected with probability 1/1 = 1
  • The 2nd occurrence replaces it with probability 1/2
  • The 3rd occurrence replaces the current pick with probability 1/3
  • In general, with k total occurrences, the i-th occurrence ends up as the final answer with probability (1/i) × (i/(i+1)) × ((i+1)/(i+2)) × … × ((k-1)/k) = 1/k
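
If the telescoping argument feels abstract, the distribution can also be checked empirically. The following standalone sketch (a hypothetical test program, with the reservoir selection inlined as a free function named reservoirPick) calls it many times and counts how often each index wins; indices 2, 3, and 4 should each receive roughly a third of the trials.

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

// Reservoir-style pick over one scan, same logic as the class above.
int reservoirPick(const vector<int>& nums, int target) {
    int count = 0, result = -1;
    for (int i = 0; i < (int)nums.size(); i++) {
        if (nums[i] == target) {
            count++;
            if (rand() % count == 0) result = i;   // keep i with probability 1/count
        }
    }
    return result;
}

int main() {
    srand((unsigned)time(nullptr));
    vector<int> nums = {1, 2, 3, 3, 3};
    vector<int> hits(nums.size(), 0);
    const int trials = 300000;
    for (int t = 0; t < trials; t++) hits[reservoirPick(nums, 3)]++;
    // Expect indices 2, 3, and 4 to each receive roughly trials / 3 hits.
    for (int i = 0; i < (int)nums.size(); i++)
        cout << "index " << i << ": " << hits[i] << "\n";
    return 0;
}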

Approach 3: Two-Pass (Less Efficient)

class Solution {
    vector<int> nums;

public:
    Solution(vector<int>& nums) : nums(nums) {}
    
    int pick(int target) {
        // First pass: collect all indices
        vector<int> indices;
        for(int i = 0; i < (int)nums.size(); i++) {
            if(nums[i] == target) {
                indices.push_back(i);
            }
        }
        
        // Second pass: random selection
        return indices[rand() % indices.size()];
    }
};

Why not optimal:

  • Rebuilds indices list on each pick() call
  • O(n) time per call, same as reservoir sampling
  • No preprocessing benefit

Implementation Details

Why Hash Map is Optimal

For multiple pick() calls:

  • Preprocessing: O(n) one-time cost
  • Each pick: O(1) lookup + O(1) random selection
  • Total for k picks: O(n + k) vs O(k × n) for reservoir sampling

Example:

n = 20,000, k = 10,000 picks

Hash Map:
  Constructor: 20,000 operations
  Picks: 10,000 operations
  Total: 30,000 operations ✓

Reservoir Sampling:
  Constructor: 0 operations
  Picks: 10,000 × 20,000 = 200,000,000 operations ✗ likely too slow

Random Number Generation

rand() % indices.size()

Why this works:

  • rand() returns pseudo-random integer
  • % size maps to range [0, size-1]
  • Approximately uniform; the modulo introduces a slight bias when (RAND_MAX + 1) is not divisible by size, which is negligible for small index lists

Note: For better randomness (and no modulo bias), consider <random>; in practice, seed the engine once and store it as a class member instead of recreating it on every call:

#include <random>
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, indices.size() - 1);
return indices[dis(gen)];
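
For instance, the hash map solution can keep the engine as a member so the seed is drawn only once. The class name SolutionRng is hypothetical, chosen here only to avoid clashing with the Solution class above; this is a sketch of one reasonable wiring, not the only one.

#include <random>
#include <unordered_map>
#include <vector>
using namespace std;

class SolutionRng {
private:
    unordered_map<int, vector<int>> pos;
    mt19937 gen;   // seeded once in the constructor, reused by every pick()

public:
    SolutionRng(vector<int>& nums) : gen(random_device{}()) {
        for (int i = 0; i < (int)nums.size(); i++)
            pos[nums[i]].push_back(i);
    }

    int pick(int target) {
        auto& indices = pos[target];
        uniform_int_distribution<int> dis(0, (int)indices.size() - 1);
        return indices[dis(gen)];   // uniform over [0, size-1], no modulo bias
    }
};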

Common Mistakes

  1. Using reservoir sampling for many picks: O(n) per call can time out on large inputs
  2. Not preprocessing: Rebuilding indices each time
  3. Wrong random selection: Not using modulo correctly
  4. Memory concerns: Hash map uses O(n) space, but worth it for speed
  5. Assuming single occurrence: Must handle multiple indices

Optimization Tips

  1. Preprocess in constructor: Build hash map once
  2. Use reference: auto& indices avoids copying the index list (see the sketch after this list)
  3. Hash map over array: O(1) lookup vs O(n) scan
  4. Consider when to use each:
    • Many picks → Hash map
    • Single pick → Reservoir sampling
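
To make tip 2 concrete, here is a small hypothetical comparison; pickByReference and pickByCopy are illustrative free functions, not part of the solution class.

#include <cstdlib>
#include <unordered_map>
#include <vector>
using namespace std;

// Tip 2 illustrated: bind the index list by reference instead of copying it.
int pickByReference(unordered_map<int, vector<int>>& pos, int target) {
    auto& indices = pos.at(target);            // no copy; .at() also avoids inserting
                                               // an empty vector if target were missing
    return indices[rand() % indices.size()];
}

int pickByCopy(unordered_map<int, vector<int>>& pos, int target) {
    auto indices = pos[target];                // copies the whole index list: extra O(m) work
    return indices[rand() % indices.size()];   // same result, just slower
}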

Real-World Applications

  1. Load Balancing: Randomly select server from pool
  2. Sampling: Randomly sample data points
  3. Game Development: Random item/spawn selection
  4. Testing: Random test case generation

Pattern Recognition

This problem demonstrates the “Preprocessing for Fast Lookup” pattern:

1. Identify repeated operations (pick() called many times)
2. Preprocess data once (build hash map in constructor)
3. Optimize each operation (O(1) lookup instead of O(n) scan)
4. Trade space for time (O(n) space for O(1) time)

Similar problems:

  • Design data structures with fast operations
  • Caching frequently accessed data
  • Precomputation for repeated queries

Why Hash Map Solution is Optimized

Time Analysis:

  • Constructor: O(n) - one-time cost
  • pick(): O(1) average case - hash lookup + random selection
  • Total for k picks: O(n + k) - linear in total operations

Space Analysis:

  • O(n) - stores all indices (at most n indices total)

Comparison:

  • Reservoir sampling: O(k × n) time for k picks
  • Hash map: O(n + k) time for k picks
  • Improvement: from O(k × n) to O(n + k); with n = 2×10^4 and k = 10^4 that is roughly 200,000,000 vs 30,000 operations, about a 6,600× reduction

Step-by-Step Trace: nums = [1, 2, 3, 3, 3], pick(3) called 3 times

Constructor:
  i=0: nums[0]=1 → pos[1] = [0]
  i=1: nums[1]=2 → pos[2] = [1]
  i=2: nums[2]=3 → pos[3] = [2]
  i=3: nums[3]=3 → pos[3] = [2, 3]
  i=4: nums[4]=3 → pos[3] = [2, 3, 4]

pick(3) call 1:
  indices = pos[3] = [2, 3, 4]
  random = rand() % 3 = 1
  return indices[1] = 3

pick(3) call 2:
  indices = pos[3] = [2, 3, 4]
  random = rand() % 3 = 0
  return indices[0] = 2

pick(3) call 3:
  indices = pos[3] = [2, 3, 4]
  random = rand() % 3 = 2
  return indices[2] = 4

Mathematical Proof: Uniform Distribution

Hash Map Approach:

  • Each index in indices array has equal probability
  • rand() % size gives uniform distribution over [0, size-1]
  • Therefore, each index has probability 1/size

Reservoir Sampling:

  • For k occurrences, each index i has probability:
    • Selected at position i: 1/i
    • Not replaced by later indices: (i/(i+1)) × ((i+1)/(i+2)) × ... × ((k-1)/k)
    • Product = 1/k
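
Written out in full for the i-th of k occurrences, the telescoping product is:

P(\text{index of the } i\text{-th occurrence is returned})
    = \frac{1}{i} \cdot \prod_{j=i+1}^{k} \left(1 - \frac{1}{j}\right)
    = \frac{1}{i} \cdot \frac{i}{i+1} \cdot \frac{i+1}{i+2} \cdots \frac{k-1}{k}
    = \frac{1}{k}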

Both approaches guarantee uniform distribution!


This problem demonstrates the importance of choosing the right data structure and algorithm based on usage patterns. For multiple queries, preprocessing with a hash map provides optimal performance.