645. Set Mismatch

Problem Statement

You have a set of integers s, which originally contains all the numbers from 1 to n. Unfortunately, due to some error, one of the numbers in s got duplicated to another number in the set, which results in repetition of one number and loss of another number.

You are given an integer array nums representing the data status of this set after the error.

Find the number that occurs twice and the number that is missing and return them in the form [duplicate, missing].

Examples

Example 1:

Input: nums = [1,2,2,4]
Output: [2,3]
Explanation: 2 is duplicated, 3 is missing.

Example 2:

Input: nums = [1,1]
Output: [1,2]
Explanation: 1 is duplicated, 2 is missing.

Constraints

2 <= nums.length <= 10^4
1 <= nums[i] <= 10^4
Exactly one number appears twice, exactly one is missing.

Clarification Questions

Before diving into the solution, here are 5 important clarifications and assumptions to discuss during an interview:

Array properties: What are the properties of the array? (Assumption: Array contains numbers from 1 to n, with one duplicate and one missing)
Output format: What should we return - duplicate first or missing first? (Assumption: Return [duplicate, missing] - duplicate appears first)
Single duplicate: Is there exactly one duplicate and one missing? (Assumption: Yes - exactly one number appears twice, one number is missing)
Array modification: Can we modify the input array? (Assumption: Typically yes for space optimization, but should clarify)
Range: What is the range of numbers? (Assumption: Numbers are from 1 to n, where n is array length)

Interview Deduction Process (10 minutes)

Step 1: Brute-Force Approach (2 minutes)

Use a hash set to track seen numbers. Iterate through the array, if a number is already in the set, it’s the duplicate. After iteration, find the missing number by checking which number from 1 to n is not in the set. This approach uses O(n) extra space and O(n) time.

Step 2: Semi-Optimized Approach (3 minutes)

Use the array itself for marking: since numbers are from 1 to n, we can use the array indices. For each number, mark the corresponding index as negative (or use a different marking scheme). The duplicate will be detected when we try to mark an already-marked index. The missing number is the index that remains unmarked. This uses O(1) extra space but modifies the input array.

Step 3: Optimized Solution (5 minutes)

Use mathematical approach: calculate the expected sum S = n×(n+1)/2 and actual sum. The difference gives us duplicate - missing. Also calculate expected sum of squares and actual sum of squares. Use these two equations to solve for duplicate and missing. Alternatively, use the marking approach with O(1) space if array modification is allowed. The mathematical approach achieves O(n) time with O(1) space without modifying the array, which is optimal. The key insight is leveraging the properties that we have exactly one duplicate and one missing, allowing us to use mathematical relationships to find both.

Solution Approach

This problem requires finding both a duplicate and a missing number in an array that should contain numbers from 1 to n. There are several approaches:

Mathematical Approach: Use sum and sum of squares to derive equations
Negative Marking: Mark visited indices in-place
Hash Map/Count Array: Count occurrences of each number
XOR Approach: Use bit manipulation

Key Insights:

Mathematical Relations:
- Sum difference: sum(nums) - sum(1..n) = duplicate - missing
- Square sum difference: sum(nums²) - sum(1²..n²) = duplicate² - missing²
- Can solve system of equations
Negative Marking: Use array indices to track visited numbers
XOR: Use XOR properties to find both numbers

Solution 1: Mathematical Approach (Sum & Square Sum)

class Solution {
public:
    vector<int> findErrorNums(vector<int>& nums) {
        const int N = nums.size();
        long long x = 0, y = 0;
        for(int i = 1; i <= N; i++) {
            x += nums[i - 1] - i;
            y += (long long) nums[i - 1] * nums[i - 1] - (long long)i * i;
        }
        int missing = (y - x * x) / (2 * x);
        int duplicate = missing + x;
        return {duplicate, missing};
    }
};

Algorithm Explanation:

Calculate Differences:
- x = sum(nums) - sum(1..n) = duplicate - missing
- y = sum(nums²) - sum(1²..n²) = duplicate² - missing²
Derive Equations:
- From y = duplicate² - missing² = (duplicate - missing)(duplicate + missing)
- We have: duplicate + missing = y / x
- And: duplicate - missing = x
Solve System:
- missing = (y/x - x) / 2 = (y - x²) / (2x)
- duplicate = missing + x

Example Walkthrough:

Input: nums = [1,2,2,4], N = 4

Expected: [1,2,3,4]
Actual:   [1,2,2,4]

Calculate x (sum difference):
  i=1: nums[0] - 1 = 1 - 1 = 0
  i=2: nums[1] - 2 = 2 - 2 = 0
  i=3: nums[2] - 3 = 2 - 3 = -1
  i=4: nums[3] - 4 = 4 - 4 = 0
  x = 0 + 0 + (-1) + 0 = -1
  x = duplicate - missing = -1

Calculate y (square sum difference):
  i=1: 1² - 1² = 0
  i=2: 2² - 2² = 0
  i=3: 2² - 3² = 4 - 9 = -5
  i=4: 4² - 4² = 0
  y = 0 + 0 + (-5) + 0 = -5
  y = duplicate² - missing² = -5

Solve:
  missing = (y - x²) / (2x) = (-5 - 1) / (-2) = -6 / -2 = 3 ✓
  duplicate = missing + x = 3 + (-1) = 2 ✓

Return: [2, 3] ✓

Complexity Analysis:

Time Complexity: O(n)
- Single pass through array to calculate sums
- Constant time arithmetic operations
Space Complexity: O(1)
- Only using a few variables (x, y, missing, duplicate)
- No extra data structures

Solution 2: Negative Marking (In-place)

class Solution {
public:
    vector<int> findErrorNums(vector<int>& nums) {
        int duplicate = -1;
        int n = nums.size();

        // First pass: find duplicate by marking visited indices negative
        for (int x : nums) {
            int idx = abs(x) - 1;  // convert to 0-based index
            if (nums[idx] < 0) {
                // Already negative => this value seen before
                duplicate = abs(x);
            } else {
                nums[idx] = -nums[idx];  // Mark as visited
            }
        }

        // Second pass: find missing number (index with positive value)
        int missing = -1;
        for (int i = 0; i < n; ++i) {
            if (nums[i] > 0) {
                missing = i + 1;
                break;
            }
        }

        return {duplicate, missing};
    }
};

Algorithm Explanation:

First Pass - Find Duplicate:
- For each number x, use abs(x) - 1 as index
- If nums[idx] is already negative, x is duplicate
- Otherwise, mark nums[idx] as negative (visited)
Second Pass - Find Missing:
- Find index i where nums[i] > 0
- Missing number is i + 1

Complexity Analysis:

Time Complexity: O(n) - Two passes
Space Complexity: O(1) - Modifies input array in-place

Solution 3: Hash Map / Count Array

class Solution {
public:
    vector<int> findErrorNums(vector<int>& nums) {
        int n = nums.size();
        vector<int> count(n + 1, 0);
        int duplicate = -1, missing = -1;

        // Count occurrences
        for (int x : nums) {
            count[x]++;
        }

        // Find duplicate (count == 2) and missing (count == 0)
        for (int i = 1; i <= n; ++i) {
            if (count[i] == 2) duplicate = i;
            else if (count[i] == 0) missing = i;
        }

        return {duplicate, missing};
    }
};

Complexity Analysis:

Time Complexity: O(n) - Count then scan
Space Complexity: O(n) - Count array

Solution 4: XOR Approach

class Solution {
public:
    vector<int> findErrorNums(vector<int>& nums) {
        int n = nums.size();
        int xor_all = 0;
        
        // XOR all numbers in nums and 1..n
        for (int i = 0; i < n; i++) {
            xor_all ^= nums[i];
            xor_all ^= (i + 1);
        }
        
        // xor_all = duplicate ^ missing
        // Find rightmost set bit
        int rightmost_bit = xor_all & (-xor_all);
        
        int xor0 = 0, xor1 = 0;
        // Separate numbers into two groups based on rightmost bit
        for (int i = 0; i < n; i++) {
            if (nums[i] & rightmost_bit) {
                xor1 ^= nums[i];
            } else {
                xor0 ^= nums[i];
            }
            if ((i + 1) & rightmost_bit) {
                xor1 ^= (i + 1);
            } else {
                xor0 ^= (i + 1);
            }
        }
        
        // Determine which is duplicate and which is missing
        for (int num : nums) {
            if (num == xor0) {
                return {xor0, xor1};
            }
        }
        return {xor1, xor0};
    }
};

Complexity Analysis:

Time Complexity: O(n) - Multiple passes
Space Complexity: O(1) - Only variables

Key Insights

Mathematical Approach: Elegant solution using sum and square sum differences
Negative Marking: Space-efficient in-place solution
Hash Map: Simple and intuitive, uses extra space
XOR: Bit manipulation approach, more complex but interesting

Edge Cases

Smallest input: nums = [1,1] → [1,2]
Largest input: nums = [1,2,3,...,n-1,n,n] → [n, n+1] (but n+1 > n, so invalid)
Duplicate at start: nums = [2,2,3,4] → [2,1]
Duplicate at end: nums = [1,2,3,3] → [3,4] (but 4 > 3, so invalid)
Missing at start: nums = [2,2,3,4] → [2,1]
Missing at end: nums = [1,1,2,3] → [1,4] (but 4 > 3, so invalid)

Common Mistakes

Integer overflow: Not using long long for square calculations
Index off-by-one: Confusing 0-based vs 1-based indexing
Sign errors: Incorrect handling of negative values in mathematical approach
Division by zero: Not checking if x == 0 (shouldn’t happen per constraints)
Wrong order: Returning [missing, duplicate] instead of [duplicate, missing]

Comparison of Approaches

Approach	Time	Space	Pros	Cons
Mathematical	O(n)	O(1)	Elegant, no modification	Requires careful math, overflow risk
Negative Marking	O(n)	O(1)	In-place, efficient	Modifies input array
Hash Map	O(n)	O(n)	Simple, clear	Uses extra space
XOR	O(n)	O(1)	Bit manipulation	Complex logic

When to Use Each Approach

Mathematical: When you want elegant solution, no array modification
Negative Marking: When space optimization is critical, modification allowed
Hash Map: When clarity is priority, space is available
XOR: When learning bit manipulation techniques

LC 41: First Missing Positive - Find missing positive number
LC 268: Missing Number - Find single missing number
LC 442: Find All Duplicates in an Array - Find all duplicates
LC 448: Find All Numbers Disappeared in an Array - Find all missing numbers
LC 287: Find the Duplicate Number - Find duplicate (no missing)

This problem demonstrates multiple solution approaches including mathematical derivation, in-place marking, and hash-based counting. The mathematical approach is particularly elegant, using sum and square sum differences to solve a system of equations.

Robina Li

645. Set Mismatch

645. Set Mismatch

Problem Statement

Examples

Constraints

Clarification Questions

Interview Deduction Process (10 minutes)

Solution Approach

Key Insights:

Solution 1: Mathematical Approach (Sum & Square Sum)

Algorithm Explanation:

Example Walkthrough:

Complexity Analysis:

Solution 2: Negative Marking (In-place)

Algorithm Explanation:

Complexity Analysis:

Solution 3: Hash Map / Count Array

Complexity Analysis:

Solution 4: XOR Approach

Complexity Analysis:

Key Insights

Edge Cases

Common Mistakes

Comparison of Approaches

When to Use Each Approach

Related Posts

Recent Posts

645. Set Mismatch

Problem Statement

Examples

Constraints

Clarification Questions

Interview Deduction Process (10 minutes)

Solution Approach

Key Insights:

Solution 1: Mathematical Approach (Sum & Square Sum)

Algorithm Explanation:

Example Walkthrough:

Complexity Analysis:

Solution 2: Negative Marking (In-place)

Algorithm Explanation:

Complexity Analysis:

Solution 3: Hash Map / Count Array

Complexity Analysis:

Solution 4: XOR Approach

Complexity Analysis:

Key Insights

Edge Cases

Common Mistakes

Comparison of Approaches

When to Use Each Approach

Related Problems

Related Posts

Recent Posts