NP-Hard Introduction: The Subset Sum Problem
Introduction
The Subset Sum Problem is a fundamental number problem that serves as an excellent example of NP-completeness. Unlike the graph problems we’ve explored (Clique, Independent Set, Vertex Cover), Subset Sum deals with numbers and arithmetic, providing a different perspective on NP-completeness. It’s particularly interesting because it has a pseudo-polynomial time dynamic programming solution, making it a great example of the distinction between weakly NP-complete and strongly NP-complete problems.
What is Subset Sum?
The Subset Sum Problem asks: Given a set of integers and a target sum, does there exist a subset whose elements sum to exactly the target?
Problem Definition
Subset Sum Decision Problem:
Input:
- A set of integers S = {a_1, a_2, …, a_n}
- A target integer t
Output: YES if there exists a subset S’ subseteq S such that ∑_{a_i ∈ S’} a_i = t, NO otherwise
Subset Sum Optimization Problem:
Input: A set of integers S = {a_1, a_2, …, a_n} and target t
Output: The maximum sum ≤ t achievable by any subset, or the subset itself
Example
Consider the set S = {3, 34, 4, 12, 5, 2} and target t = 9.
- Subset {4, 5} sums to 4 + 5 = 9 ✓
- Subset {3, 4, 2} sums to 3 + 4 + 2 = 9 ✓
- Subset {12} sums to 12 > 9 ✗
- Subset {3, 5} sums to 8 < 9 ✗
So the answer is YES - there exists a subset that sums to exactly 9.
Visual Example
For S = {2, 3, 7, 8, 10} and t = 11:
- Try {2, 3, 7} = 12 (too large)
- Try {3, 8} = 11 ✓ (found it!)
- Try {2, 3, 6} (6 not in set)
- Try {11} (11 not in set)
Answer: YES, subset {3, 8} sums to 11.
Why Subset Sum is in NP
To show that Subset Sum is NP-complete, we first need to show it’s in NP.
Subset Sum ∈ NP:
Given a candidate solution (a subset S’), we can verify in polynomial time:
- Check that S’ subseteq S: O(n) time
- Sum the elements: O(n) time
- Check if sum equals t: O(1) time
Total verification time: O(n), which is polynomial in the input size. Therefore, Subset Sum is in NP.
NP-Completeness: Reduction from Partition
The most common proof that Subset Sum is NP-complete reduces from the Partition Problem.
Partition Problem
Partition Problem:
- Input: A set of integers A = {a_1, a_2, …, a_n}
- Output: YES if A can be partitioned into two subsets with equal sum, NO otherwise
Partition is known to be NP-complete (can be proven by reduction from 3-SAT or Subset Sum itself).
Reduction from Partition to Subset Sum
Reduction:
- Given a Partition instance: set A = {a₁, a₂, …, aₙ}
- Compute s = ∑ᵢ₌₁ⁿ aᵢ (total sum)
- Return Subset Sum instance: set A and target t = s/2
Example:
Consider a Partition instance:
- Set A = {3, 1, 4, 2, 2}
- Total sum: s = 3 + 1 + 4 + 2 + 2 = 12
- Target for partition: s/2 = 6
Reduction to Subset Sum:
- Subset Sum instance: set A = {3, 1, 4, 2, 2}, target t = 6
Verification:
Partition → Subset Sum:
- Partition solution: {3, 1, 2} and {4, 2} (both sum to 6)
- Subset Sum solution: {3, 1, 2} sums to 6 ✓
- Therefore, Subset Sum returns YES
Subset Sum → Partition:
- Subset Sum solution: {3, 1, 2} sums to 6
- Complement: {4, 2} sums to 12 - 6 = 6 ✓
- Therefore, Partition returns YES (partition: {3, 1, 2} and {4, 2})
Correctness:
- If Partition has a solution, then A can be split into two subsets each summing to s/2
- Therefore, one of these subsets sums to s/2, so Subset Sum with target s/2 returns YES
- If Subset Sum with target s/2 returns YES, then there’s a subset summing to s/2
- The complement of this subset sums to s - s/2 = s/2, so Partition returns YES
Polynomial Time:
- Computing s takes O(n) time
- The reduction is trivial
Therefore, Subset Sum is NP-complete.
Alternative Reduction: Direct from 3-SAT
We can also prove Subset Sum is NP-complete by directly reducing from 3-SAT.
Construction
For a 3-SAT formula φ = C_1 ∧ C_2 ∧ … ∧ C_m with variables x₁, x₂, …, x_n:
Key Idea: Encode the satisfiability constraints using numbers in base 10 (or any base > number of clauses).
- Create numbers:
- For each variable x_i, create two numbers: v_i (for x_i = TRUE) and ̅{v_i} (for x_i = FALSE)
- Each number has n + m digits:
- First n digits: 1 in position i (for variable x_i), 0 elsewhere
- Last m digits: 1 in position j if the literal appears in clause C_j, 0 otherwise
- Create target number t:
- First n digits: all 1’s (each variable must be assigned TRUE or FALSE)
- Last m digits: all 1’s (each clause must be satisfied)
- Set S: All v_i and ̅{v_i} numbers
Why This Works
Intuition:
- The first n digits ensure we pick exactly one literal per variable (either v_i or ̅{v_i})
- The last m digits ensure each clause has at least one true literal
- A subset summing to t corresponds to a satisfying assignment
Formal Proof:
Forward Direction (3-SAT satisfiable → Subset Sum solvable):
- If φ is satisfiable, pick v_i if x_i = TRUE, else pick ̅{v_i}
- Sum these numbers: first n digits sum to all 1’s, last m digits sum to at least all 1’s (each clause satisfied)
- Add slack numbers (0-padded) if needed to reach exactly t
Reverse Direction (Subset Sum solvable → 3-SAT satisfiable):
- If subset sums to t, the first n digits force exactly one choice per variable
- The last m digits being all 1’s means each clause has at least one true literal
- This gives a satisfying assignment
Pseudo-Polynomial Dynamic Programming Solution
Despite being NP-complete, Subset Sum has a pseudo-polynomial time dynamic programming solution. This makes it weakly NP-complete (as opposed to strongly NP-complete problems that remain hard even when numbers are polynomially bounded).
DP Algorithm
Subproblem: dp[i][s] = true if there exists a subset of {a_1, a_2, …, a_i} that sums to exactly s.
Recurrence:
- Base case: dp[0][0] = true, dp[0][s] = false for s > 0
- Recurrence: dp[i][s] = dp[i-1][s] ∨ dp[i-1][s - a_i] (if s ≥ a_i)
- Either don’t include a_i: dp[i-1][s]
- Or include a_i: dp[i-1][s - a_i]
Algorithm:
Algorithm: SubsetSumDP(S, t)
1. n = `|S|`
2. Let dp[0..n][0..t] be a 2D boolean array
3. dp[0][0] = true
4. for i = 1 to n:
5. for s = 0 to t:
6. dp[i][s] = dp[i-1][s]
7. if s >= a_i:
8. dp[i][s] = dp[i][s] OR dp[i-1][s - a_i]
9. return dp[n][t]
Time Complexity: O(n · t) Space Complexity: O(n · t) (can be optimized to O(t) using 1D array)
Why “Pseudo-Polynomial”?
- The algorithm runs in O(n · t) time
- If t is exponential in n (e.g., t = 2^n), then O(n · t) = O(n · 2^n) is exponential
- If t is polynomial in n (e.g., t = n^2), then O(n · t) = O(n^3) is polynomial
- The complexity depends on the value of t, not just its representation size
- Since t requires log t bits to represent, if t is exponential, log t is polynomial, but t itself is exponential
Space Optimization
We can optimize space to O(t):
Algorithm: SubsetSumDPOptimized(S, t)
1. Let dp[0..t] be a boolean array
2. dp[0] = true
3. for i = 1 to n:
4. for s = t down to a_i: // iterate backwards!
5. dp[s] = dp[s] OR dp[s - a_i]
6. return dp[t]
Key: Iterate backwards to avoid using updated values in the same iteration.
Relationship to Other Problems
The Subset Sum Problem is closely related to several other important problems:
Partition
As we saw:
- Partition reduces to Subset Sum (set target to half the total sum)
- Subset Sum can also reduce to Partition (add elements to make partition work)
- They are polynomially equivalent
Knapsack
0/1 Knapsack Problem:
- Items with weights and values
- Maximize value subject to weight constraint
- Subset Sum is a special case where weight = value and we want exact sum
Reduction:
- Subset Sum instance: S = {a_1, …, a_n}, target t
- Knapsack instance: n items, item i has weight a_i and value a_i, capacity t, maximize value
- Subset Sum solvable ↔ Knapsack optimal value = t
Coin Change
Coin Change Problem:
- Given coin denominations and target amount
- Can we make exact change?
- Subset Sum is a special case (unlimited coins would be different)
Number Partitioning
Related to Partition, asking if numbers can be partitioned into k equal-sum subsets.
Practical Implications
Why Subset Sum Matters
The Subset Sum Problem is NP-complete, which means:
- No Known Polynomial-Time Algorithm: For the general case with arbitrary numbers
- Pseudo-Polynomial Solution: Dynamic programming works when target is small
- Practical Algorithms: DP is very efficient for practical instances with bounded targets
- Approximation: Can use greedy or other heuristics for optimization version
Real-World Applications
Subset Sum has numerous applications:
- Resource Allocation: Allocating resources to match a target exactly
- Budget Planning: Finding combinations that sum to a budget
- Cryptography: Some cryptographic schemes rely on Subset Sum hardness
- Scheduling: Allocating time slots to match requirements
- Financial Planning: Portfolio selection, investment matching
- Data Analysis: Finding subsets with specific aggregate properties
Runtime Analysis
Brute Force Approach
Algorithm: Try all 2ⁿ subsets
- Time Complexity: O(2^n · n)
- Space Complexity: O(n) for storing current subset
- Analysis: For each subset, sum elements (O(n) time)
Dynamic Programming (Pseudo-Polynomial)
Algorithm: DP table dp[i][s] = true if sum s achievable with first i elements
- Time Complexity: O(n · t) where t is target sum
- Space Complexity: O(n · t) (can be optimized to O(t))
- Subproblem: dp[i][s] = dp[i-1][s] ∨ dp[i-1][s-a_i] (if s ≥ a_i)
- Why Pseudo-Polynomial: Depends on value of t, not just input size. If t = 2^n, then O(n · 2^n) is exponential.
Space-Optimized DP
Algorithm: Use 1D array, iterate backwards
- Time Complexity: O(n · t)
- Space Complexity: O(t)
- Key: Iterate backwards to avoid using updated values in same iteration
Meet-in-the-Middle
Algorithm: Split array in half, find all sums for each half, combine
- Time Complexity: O(2^{n/2} · n)
- Space Complexity: O(2^{n/2})
- Improvement: Reduces exponential factor from 2ⁿ to 2^{n/2}
Approximation Algorithms
Greedy Approach:
- Time Complexity: O(n log n) (sort first)
- Space Complexity: O(n)
- Approximation: No guaranteed ratio, but works well in practice
Verification Complexity
Given a candidate subset:
- Time Complexity: O(n) - sum elements and compare to target
- Space Complexity: O(1) additional space
- This polynomial-time verifiability shows Subset Sum is in NP
When DP Works Well
The dynamic programming solution is practical when:
- Target t is relatively small (e.g., t ≤ 10^6)
- Numbers are bounded
- Need exact solutions (not approximations)
When DP Fails
The DP solution becomes impractical when:
- Target t is very large (exponential in n)
- Numbers are very large
- Need to handle floating-point numbers (requires different approach)
Key Takeaways
- Subset Sum is NP-Complete: Proven by reduction from Partition or directly from 3-SAT
- Weakly NP-Complete: Has a pseudo-polynomial time DP solution, distinguishing it from strongly NP-complete problems
- Dynamic Programming: O(n · t) time solution works well when target is bounded
- Space Optimization: Can reduce space from O(n · t) to O(t) with careful iteration
- Related Problems: Connected to Partition, Knapsack, and Coin Change problems
Reduction Summary
Partition ≤ₚ Subset Sum:
- Given Partition instance: set A with sum s
- Return Subset Sum instance: set A and target t = s/2
- Partition solvable ↔ Subset Sum with target s/2 solvable
3-SAT ≤ₚ Subset Sum:
- Encode variables and clauses as digits in base representation
- Satisfying assignment ↔ Subset summing to target with all 1’s in key positions
Both reductions are polynomial-time, establishing Subset Sum as NP-complete.
Further Reading
- Garey & Johnson: “Computers and Intractability” - Standard reference for NP-completeness proofs
- Dynamic Programming: CLRS or other algorithms textbooks cover the DP solution in detail
- Weakly vs Strongly NP-Complete: Understanding the distinction and when pseudo-polynomial algorithms exist
- Approximation Algorithms: Greedy and other approximation approaches for optimization version
Practice Problems
-
Solve by hand: For S = {2, 3, 7, 8, 10} and t = 11, find a subset that sums to 11. How many such subsets exist?
-
Prove the reduction: Show that Partition reduces to Subset Sum. Is the reverse reduction also possible?
-
DP implementation: Implement the dynamic programming algorithm for Subset Sum. Test it on various inputs and analyze its performance.
-
Space optimization: Implement the space-optimized version using a 1D array. Why must we iterate backwards?
- Time complexity analysis: For the DP algorithm, what is the time complexity if:
- t = O(n)?
- t = O(2^n)?
- t = O(n^2)?
-
Extension: Modify the algorithm to return the actual subset (not just YES/NO). How does this affect time and space complexity?
-
Related problem: Research the Partition Problem. How does it relate to Subset Sum? Can you reduce Subset Sum to Partition?
- Approximation: Design a greedy approximation algorithm for the optimization version of Subset Sum (maximize sum ≤ target). What approximation ratio does it achieve?
Understanding the Subset Sum Problem provides crucial insight into weakly NP-complete problems and the power of dynamic programming. The pseudo-polynomial solution demonstrates that NP-completeness doesn’t always mean “impossible in practice” - it depends on the problem structure and input characteristics.