645. Set Mismatch
645. Set Mismatch
Problem Statement
You have a set of integers s, which originally contains all the numbers from 1 to n. Unfortunately, due to some error, one of the numbers in s got duplicated to another number in the set, which results in repetition of one number and loss of another number.
You are given an integer array nums representing the data status of this set after the error.
Find the number that occurs twice and the number that is missing and return them in the form [duplicate, missing].
Examples
Example 1:
Input: nums = [1,2,2,4]
Output: [2,3]
Explanation: 2 is duplicated, 3 is missing.
Example 2:
Input: nums = [1,1]
Output: [1,2]
Explanation: 1 is duplicated, 2 is missing.
Constraints
2 <= nums.length <= 10^41 <= nums[i] <= 10^4- Exactly one number appears twice, exactly one is missing.
Clarification Questions
Before diving into the solution, here are 5 important clarifications and assumptions to discuss during an interview:
-
Array properties: What are the properties of the array? (Assumption: Array contains numbers from 1 to n, with one duplicate and one missing)
-
Output format: What should we return - duplicate first or missing first? (Assumption: Return [duplicate, missing] - duplicate appears first)
-
Single duplicate: Is there exactly one duplicate and one missing? (Assumption: Yes - exactly one number appears twice, one number is missing)
-
Array modification: Can we modify the input array? (Assumption: Typically yes for space optimization, but should clarify)
-
Range: What is the range of numbers? (Assumption: Numbers are from 1 to n, where n is array length)
Interview Deduction Process (10 minutes)
Step 1: Brute-Force Approach (2 minutes)
Use a hash set to track seen numbers. Iterate through the array, if a number is already in the set, it’s the duplicate. After iteration, find the missing number by checking which number from 1 to n is not in the set. This approach uses O(n) extra space and O(n) time.
Step 2: Semi-Optimized Approach (3 minutes)
Use the array itself for marking: since numbers are from 1 to n, we can use the array indices. For each number, mark the corresponding index as negative (or use a different marking scheme). The duplicate will be detected when we try to mark an already-marked index. The missing number is the index that remains unmarked. This uses O(1) extra space but modifies the input array.
Step 3: Optimized Solution (5 minutes)
Use mathematical approach: calculate the expected sum S = n×(n+1)/2 and actual sum. The difference gives us duplicate - missing. Also calculate expected sum of squares and actual sum of squares. Use these two equations to solve for duplicate and missing. Alternatively, use the marking approach with O(1) space if array modification is allowed. The mathematical approach achieves O(n) time with O(1) space without modifying the array, which is optimal. The key insight is leveraging the properties that we have exactly one duplicate and one missing, allowing us to use mathematical relationships to find both.
Solution Approach
This problem requires finding both a duplicate and a missing number in an array that should contain numbers from 1 to n. There are several approaches:
- Mathematical Approach: Use sum and sum of squares to derive equations
- Negative Marking: Mark visited indices in-place
- Hash Map/Count Array: Count occurrences of each number
- XOR Approach: Use bit manipulation
Key Insights:
- Mathematical Relations:
- Sum difference:
sum(nums) - sum(1..n) = duplicate - missing - Square sum difference:
sum(nums²) - sum(1²..n²) = duplicate² - missing² - Can solve system of equations
- Sum difference:
- Negative Marking: Use array indices to track visited numbers
- XOR: Use XOR properties to find both numbers
Solution 1: Mathematical Approach (Sum & Square Sum)
class Solution {
public:
vector<int> findErrorNums(vector<int>& nums) {
const int N = nums.size();
long long x = 0, y = 0;
for(int i = 1; i <= N; i++) {
x += nums[i - 1] - i;
y += (long long) nums[i - 1] * nums[i - 1] - (long long)i * i;
}
int missing = (y - x * x) / (2 * x);
int duplicate = missing + x;
return {duplicate, missing};
}
};
Algorithm Explanation:
- Calculate Differences:
x = sum(nums) - sum(1..n) = duplicate - missingy = sum(nums²) - sum(1²..n²) = duplicate² - missing²
- Derive Equations:
- From
y = duplicate² - missing² = (duplicate - missing)(duplicate + missing) - We have:
duplicate + missing = y / x - And:
duplicate - missing = x
- From
- Solve System:
missing = (y/x - x) / 2 = (y - x²) / (2x)duplicate = missing + x
Example Walkthrough:
Input: nums = [1,2,2,4], N = 4
Expected: [1,2,3,4]
Actual: [1,2,2,4]
Calculate x (sum difference):
i=1: nums[0] - 1 = 1 - 1 = 0
i=2: nums[1] - 2 = 2 - 2 = 0
i=3: nums[2] - 3 = 2 - 3 = -1
i=4: nums[3] - 4 = 4 - 4 = 0
x = 0 + 0 + (-1) + 0 = -1
x = duplicate - missing = -1
Calculate y (square sum difference):
i=1: 1² - 1² = 0
i=2: 2² - 2² = 0
i=3: 2² - 3² = 4 - 9 = -5
i=4: 4² - 4² = 0
y = 0 + 0 + (-5) + 0 = -5
y = duplicate² - missing² = -5
Solve:
missing = (y - x²) / (2x) = (-5 - 1) / (-2) = -6 / -2 = 3 ✓
duplicate = missing + x = 3 + (-1) = 2 ✓
Return: [2, 3] ✓
Complexity Analysis:
- Time Complexity: O(n)
- Single pass through array to calculate sums
- Constant time arithmetic operations
- Space Complexity: O(1)
- Only using a few variables (x, y, missing, duplicate)
- No extra data structures
Solution 2: Negative Marking (In-place)
class Solution {
public:
vector<int> findErrorNums(vector<int>& nums) {
int duplicate = -1;
int n = nums.size();
// First pass: find duplicate by marking visited indices negative
for (int x : nums) {
int idx = abs(x) - 1; // convert to 0-based index
if (nums[idx] < 0) {
// Already negative => this value seen before
duplicate = abs(x);
} else {
nums[idx] = -nums[idx]; // Mark as visited
}
}
// Second pass: find missing number (index with positive value)
int missing = -1;
for (int i = 0; i < n; ++i) {
if (nums[i] > 0) {
missing = i + 1;
break;
}
}
return {duplicate, missing};
}
};
Algorithm Explanation:
- First Pass - Find Duplicate:
- For each number
x, useabs(x) - 1as index - If
nums[idx]is already negative,xis duplicate - Otherwise, mark
nums[idx]as negative (visited)
- For each number
- Second Pass - Find Missing:
- Find index
iwherenums[i] > 0 - Missing number is
i + 1
- Find index
Complexity Analysis:
- Time Complexity: O(n) - Two passes
- Space Complexity: O(1) - Modifies input array in-place
Solution 3: Hash Map / Count Array
class Solution {
public:
vector<int> findErrorNums(vector<int>& nums) {
int n = nums.size();
vector<int> count(n + 1, 0);
int duplicate = -1, missing = -1;
// Count occurrences
for (int x : nums) {
count[x]++;
}
// Find duplicate (count == 2) and missing (count == 0)
for (int i = 1; i <= n; ++i) {
if (count[i] == 2) duplicate = i;
else if (count[i] == 0) missing = i;
}
return {duplicate, missing};
}
};
Complexity Analysis:
- Time Complexity: O(n) - Count then scan
- Space Complexity: O(n) - Count array
Solution 4: XOR Approach
class Solution {
public:
vector<int> findErrorNums(vector<int>& nums) {
int n = nums.size();
int xor_all = 0;
// XOR all numbers in nums and 1..n
for (int i = 0; i < n; i++) {
xor_all ^= nums[i];
xor_all ^= (i + 1);
}
// xor_all = duplicate ^ missing
// Find rightmost set bit
int rightmost_bit = xor_all & (-xor_all);
int xor0 = 0, xor1 = 0;
// Separate numbers into two groups based on rightmost bit
for (int i = 0; i < n; i++) {
if (nums[i] & rightmost_bit) {
xor1 ^= nums[i];
} else {
xor0 ^= nums[i];
}
if ((i + 1) & rightmost_bit) {
xor1 ^= (i + 1);
} else {
xor0 ^= (i + 1);
}
}
// Determine which is duplicate and which is missing
for (int num : nums) {
if (num == xor0) {
return {xor0, xor1};
}
}
return {xor1, xor0};
}
};
Complexity Analysis:
- Time Complexity: O(n) - Multiple passes
- Space Complexity: O(1) - Only variables
Key Insights
- Mathematical Approach: Elegant solution using sum and square sum differences
- Negative Marking: Space-efficient in-place solution
- Hash Map: Simple and intuitive, uses extra space
- XOR: Bit manipulation approach, more complex but interesting
Edge Cases
- Smallest input:
nums = [1,1]→[1,2] - Largest input:
nums = [1,2,3,...,n-1,n,n]→[n, n+1](but n+1 > n, so invalid) - Duplicate at start:
nums = [2,2,3,4]→[2,1] - Duplicate at end:
nums = [1,2,3,3]→[3,4](but 4 > 3, so invalid) - Missing at start:
nums = [2,2,3,4]→[2,1] - Missing at end:
nums = [1,1,2,3]→[1,4](but 4 > 3, so invalid)
Common Mistakes
- Integer overflow: Not using
long longfor square calculations - Index off-by-one: Confusing 0-based vs 1-based indexing
- Sign errors: Incorrect handling of negative values in mathematical approach
- Division by zero: Not checking if
x == 0(shouldn’t happen per constraints) - Wrong order: Returning
[missing, duplicate]instead of[duplicate, missing]
Comparison of Approaches
| Approach | Time | Space | Pros | Cons |
|---|---|---|---|---|
| Mathematical | O(n) | O(1) | Elegant, no modification | Requires careful math, overflow risk |
| Negative Marking | O(n) | O(1) | In-place, efficient | Modifies input array |
| Hash Map | O(n) | O(n) | Simple, clear | Uses extra space |
| XOR | O(n) | O(1) | Bit manipulation | Complex logic |
When to Use Each Approach
- Mathematical: When you want elegant solution, no array modification
- Negative Marking: When space optimization is critical, modification allowed
- Hash Map: When clarity is priority, space is available
- XOR: When learning bit manipulation techniques
Related Problems
- LC 41: First Missing Positive - Find missing positive number
- LC 268: Missing Number - Find single missing number
- LC 442: Find All Duplicates in an Array - Find all duplicates
- LC 448: Find All Numbers Disappeared in an Array - Find all missing numbers
- LC 287: Find the Duplicate Number - Find duplicate (no missing)
This problem demonstrates multiple solution approaches including mathematical derivation, in-place marking, and hash-based counting. The mathematical approach is particularly elegant, using sum and square sum differences to solve a system of equations.