Introduction

The Subset Sum Problem is a fundamental number problem that serves as an excellent example of NP-completeness. Unlike the graph problems we’ve explored (Clique, Independent Set, Vertex Cover), Subset Sum deals with numbers and arithmetic, providing a different perspective on NP-completeness. It’s particularly interesting because it has a pseudo-polynomial time dynamic programming solution, making it a great example of the distinction between weakly NP-complete and strongly NP-complete problems.

What is Subset Sum?

The Subset Sum Problem asks: Given a set of integers and a target sum, does there exist a subset whose elements sum to exactly the target?

Problem Definition

Subset Sum Decision Problem:

Input:

  • A set of integers S = {a_1, a_2, …, a_n}
  • A target integer t

Output: YES if there exists a subset S’ subseteq S such that ∑_{a_i ∈ S’} a_i = t, NO otherwise

Subset Sum Optimization Problem:

Input: A set of integers S = {a_1, a_2, …, a_n} and target t

Output: The maximum sum ≤ t achievable by any subset, or the subset itself

Example

Consider the set S = {3, 34, 4, 12, 5, 2} and target t = 9.

  • Subset {4, 5} sums to 4 + 5 = 9 ✓
  • Subset {3, 4, 2} sums to 3 + 4 + 2 = 9 ✓
  • Subset {12} sums to 12 > 9 ✗
  • Subset {3, 5} sums to 8 < 9 ✗

So the answer is YES - there exists a subset that sums to exactly 9.

Visual Example

For S = {2, 3, 7, 8, 10} and t = 11:

  • Try {2, 3, 7} = 12 (too large)
  • Try {3, 8} = 11 ✓ (found it!)
  • Try {2, 3, 6} (6 not in set)
  • Try {11} (11 not in set)

Answer: YES, subset {3, 8} sums to 11.

Why Subset Sum is in NP

To show that Subset Sum is NP-complete, we first need to show it’s in NP.

Subset Sum ∈ NP:

Given a candidate solution (a subset S’), we can verify in polynomial time:

  1. Check that S’ subseteq S: O(n) time
  2. Sum the elements: O(n) time
  3. Check if sum equals t: O(1) time

Total verification time: O(n), which is polynomial in the input size. Therefore, Subset Sum is in NP.

NP-Completeness: Reduction from Partition

The most common proof that Subset Sum is NP-complete reduces from the Partition Problem.

Partition Problem

Partition Problem:

  • Input: A set of integers A = {a_1, a_2, …, a_n}
  • Output: YES if A can be partitioned into two subsets with equal sum, NO otherwise

Partition is known to be NP-complete (can be proven by reduction from 3-SAT or Subset Sum itself).

Reduction from Partition to Subset Sum

Reduction:

  1. Given a Partition instance: set A = {a₁, a₂, …, aₙ}
  2. Compute s = ∑ᵢ₌₁ⁿ aᵢ (total sum)
  3. Return Subset Sum instance: set A and target t = s/2

Example:

Consider a Partition instance:

  • Set A = {3, 1, 4, 2, 2}
  • Total sum: s = 3 + 1 + 4 + 2 + 2 = 12
  • Target for partition: s/2 = 6

Reduction to Subset Sum:

  • Subset Sum instance: set A = {3, 1, 4, 2, 2}, target t = 6

Verification:

Partition → Subset Sum:

  • Partition solution: {3, 1, 2} and {4, 2} (both sum to 6)
  • Subset Sum solution: {3, 1, 2} sums to 6 ✓
  • Therefore, Subset Sum returns YES

Subset Sum → Partition:

  • Subset Sum solution: {3, 1, 2} sums to 6
  • Complement: {4, 2} sums to 12 - 6 = 6 ✓
  • Therefore, Partition returns YES (partition: {3, 1, 2} and {4, 2})

Correctness:

  • If Partition has a solution, then A can be split into two subsets each summing to s/2
  • Therefore, one of these subsets sums to s/2, so Subset Sum with target s/2 returns YES
  • If Subset Sum with target s/2 returns YES, then there’s a subset summing to s/2
  • The complement of this subset sums to s - s/2 = s/2, so Partition returns YES

Polynomial Time:

  • Computing s takes O(n) time
  • The reduction is trivial

Therefore, Subset Sum is NP-complete.

Alternative Reduction: Direct from 3-SAT

We can also prove Subset Sum is NP-complete by directly reducing from 3-SAT.

Construction

For a 3-SAT formula φ = C_1 ∧ C_2 ∧ … ∧ C_m with variables x₁, x₂, …, x_n:

Key Idea: Encode the satisfiability constraints using numbers in base 10 (or any base > number of clauses).

  1. Create numbers:
    • For each variable x_i, create two numbers: v_i (for x_i = TRUE) and ̅{v_i} (for x_i = FALSE)
    • Each number has n + m digits:
      • First n digits: 1 in position i (for variable x_i), 0 elsewhere
      • Last m digits: 1 in position j if the literal appears in clause C_j, 0 otherwise
  2. Create target number t:
    • First n digits: all 1’s (each variable must be assigned TRUE or FALSE)
    • Last m digits: all 1’s (each clause must be satisfied)
  3. Set S: All v_i and ̅{v_i} numbers

Why This Works

Intuition:

  • The first n digits ensure we pick exactly one literal per variable (either v_i or ̅{v_i})
  • The last m digits ensure each clause has at least one true literal
  • A subset summing to t corresponds to a satisfying assignment

Formal Proof:

Forward Direction (3-SAT satisfiable → Subset Sum solvable):

  • If φ is satisfiable, pick v_i if x_i = TRUE, else pick ̅{v_i}
  • Sum these numbers: first n digits sum to all 1’s, last m digits sum to at least all 1’s (each clause satisfied)
  • Add slack numbers (0-padded) if needed to reach exactly t

Reverse Direction (Subset Sum solvable → 3-SAT satisfiable):

  • If subset sums to t, the first n digits force exactly one choice per variable
  • The last m digits being all 1’s means each clause has at least one true literal
  • This gives a satisfying assignment

Pseudo-Polynomial Dynamic Programming Solution

Despite being NP-complete, Subset Sum has a pseudo-polynomial time dynamic programming solution. This makes it weakly NP-complete (as opposed to strongly NP-complete problems that remain hard even when numbers are polynomially bounded).

DP Algorithm

Subproblem: dp[i][s] = true if there exists a subset of {a_1, a_2, …, a_i} that sums to exactly s.

Recurrence:

  • Base case: dp[0][0] = true, dp[0][s] = false for s > 0
  • Recurrence: dp[i][s] = dp[i-1][s] ∨ dp[i-1][s - a_i] (if s ≥ a_i)
    • Either don’t include a_i: dp[i-1][s]
    • Or include a_i: dp[i-1][s - a_i]

Algorithm:

Algorithm: SubsetSumDP(S, t)
1. n = `|S|`
2. Let dp[0..n][0..t] be a 2D boolean array
3. dp[0][0] = true
4. for i = 1 to n:
5.     for s = 0 to t:
6.         dp[i][s] = dp[i-1][s]
7.         if s >= a_i:
8.             dp[i][s] = dp[i][s] OR dp[i-1][s - a_i]
9. return dp[n][t]

Time Complexity: O(n · t) Space Complexity: O(n · t) (can be optimized to O(t) using 1D array)

Why “Pseudo-Polynomial”?

  • The algorithm runs in O(n · t) time
  • If t is exponential in n (e.g., t = 2^n), then O(n · t) = O(n · 2^n) is exponential
  • If t is polynomial in n (e.g., t = n^2), then O(n · t) = O(n^3) is polynomial
  • The complexity depends on the value of t, not just its representation size
  • Since t requires log t bits to represent, if t is exponential, log t is polynomial, but t itself is exponential

Space Optimization

We can optimize space to O(t):

Algorithm: SubsetSumDPOptimized(S, t)
1. Let dp[0..t] be a boolean array
2. dp[0] = true
3. for i = 1 to n:
4.     for s = t down to a_i:  // iterate backwards!
5.         dp[s] = dp[s] OR dp[s - a_i]
6. return dp[t]

Key: Iterate backwards to avoid using updated values in the same iteration.

Relationship to Other Problems

The Subset Sum Problem is closely related to several other important problems:

Partition

As we saw:

  • Partition reduces to Subset Sum (set target to half the total sum)
  • Subset Sum can also reduce to Partition (add elements to make partition work)
  • They are polynomially equivalent

Knapsack

0/1 Knapsack Problem:

  • Items with weights and values
  • Maximize value subject to weight constraint
  • Subset Sum is a special case where weight = value and we want exact sum

Reduction:

  • Subset Sum instance: S = {a_1, …, a_n}, target t
  • Knapsack instance: n items, item i has weight a_i and value a_i, capacity t, maximize value
  • Subset Sum solvable ↔ Knapsack optimal value = t

Coin Change

Coin Change Problem:

  • Given coin denominations and target amount
  • Can we make exact change?
  • Subset Sum is a special case (unlimited coins would be different)

Number Partitioning

Related to Partition, asking if numbers can be partitioned into k equal-sum subsets.

Practical Implications

Why Subset Sum Matters

The Subset Sum Problem is NP-complete, which means:

  1. No Known Polynomial-Time Algorithm: For the general case with arbitrary numbers
  2. Pseudo-Polynomial Solution: Dynamic programming works when target is small
  3. Practical Algorithms: DP is very efficient for practical instances with bounded targets
  4. Approximation: Can use greedy or other heuristics for optimization version

Real-World Applications

Subset Sum has numerous applications:

  1. Resource Allocation: Allocating resources to match a target exactly
  2. Budget Planning: Finding combinations that sum to a budget
  3. Cryptography: Some cryptographic schemes rely on Subset Sum hardness
  4. Scheduling: Allocating time slots to match requirements
  5. Financial Planning: Portfolio selection, investment matching
  6. Data Analysis: Finding subsets with specific aggregate properties

Runtime Analysis

Brute Force Approach

Algorithm: Try all 2ⁿ subsets

  • Time Complexity: O(2^n · n)
  • Space Complexity: O(n) for storing current subset
  • Analysis: For each subset, sum elements (O(n) time)

Dynamic Programming (Pseudo-Polynomial)

Algorithm: DP table dp[i][s] = true if sum s achievable with first i elements

  • Time Complexity: O(n · t) where t is target sum
  • Space Complexity: O(n · t) (can be optimized to O(t))
  • Subproblem: dp[i][s] = dp[i-1][s] ∨ dp[i-1][s-a_i] (if s ≥ a_i)
  • Why Pseudo-Polynomial: Depends on value of t, not just input size. If t = 2^n, then O(n · 2^n) is exponential.

Space-Optimized DP

Algorithm: Use 1D array, iterate backwards

  • Time Complexity: O(n · t)
  • Space Complexity: O(t)
  • Key: Iterate backwards to avoid using updated values in same iteration

Meet-in-the-Middle

Algorithm: Split array in half, find all sums for each half, combine

  • Time Complexity: O(2^{n/2} · n)
  • Space Complexity: O(2^{n/2})
  • Improvement: Reduces exponential factor from 2ⁿ to 2^{n/2}

Approximation Algorithms

Greedy Approach:

  • Time Complexity: O(n log n) (sort first)
  • Space Complexity: O(n)
  • Approximation: No guaranteed ratio, but works well in practice

Verification Complexity

Given a candidate subset:

  • Time Complexity: O(n) - sum elements and compare to target
  • Space Complexity: O(1) additional space
  • This polynomial-time verifiability shows Subset Sum is in NP

When DP Works Well

The dynamic programming solution is practical when:

  • Target t is relatively small (e.g., t ≤ 10^6)
  • Numbers are bounded
  • Need exact solutions (not approximations)

When DP Fails

The DP solution becomes impractical when:

  • Target t is very large (exponential in n)
  • Numbers are very large
  • Need to handle floating-point numbers (requires different approach)

Key Takeaways

  1. Subset Sum is NP-Complete: Proven by reduction from Partition or directly from 3-SAT
  2. Weakly NP-Complete: Has a pseudo-polynomial time DP solution, distinguishing it from strongly NP-complete problems
  3. Dynamic Programming: O(n · t) time solution works well when target is bounded
  4. Space Optimization: Can reduce space from O(n · t) to O(t) with careful iteration
  5. Related Problems: Connected to Partition, Knapsack, and Coin Change problems

Reduction Summary

Partition ≤ₚ Subset Sum:

  • Given Partition instance: set A with sum s
  • Return Subset Sum instance: set A and target t = s/2
  • Partition solvable ↔ Subset Sum with target s/2 solvable

3-SAT ≤ₚ Subset Sum:

  • Encode variables and clauses as digits in base representation
  • Satisfying assignment ↔ Subset summing to target with all 1’s in key positions

Both reductions are polynomial-time, establishing Subset Sum as NP-complete.

Further Reading

  • Garey & Johnson: “Computers and Intractability” - Standard reference for NP-completeness proofs
  • Dynamic Programming: CLRS or other algorithms textbooks cover the DP solution in detail
  • Weakly vs Strongly NP-Complete: Understanding the distinction and when pseudo-polynomial algorithms exist
  • Approximation Algorithms: Greedy and other approximation approaches for optimization version

Practice Problems

  1. Solve by hand: For S = {2, 3, 7, 8, 10} and t = 11, find a subset that sums to 11. How many such subsets exist?

  2. Prove the reduction: Show that Partition reduces to Subset Sum. Is the reverse reduction also possible?

  3. DP implementation: Implement the dynamic programming algorithm for Subset Sum. Test it on various inputs and analyze its performance.

  4. Space optimization: Implement the space-optimized version using a 1D array. Why must we iterate backwards?

  5. Time complexity analysis: For the DP algorithm, what is the time complexity if:
    • t = O(n)?
    • t = O(2^n)?
    • t = O(n^2)?
  6. Extension: Modify the algorithm to return the actual subset (not just YES/NO). How does this affect time and space complexity?

  7. Related problem: Research the Partition Problem. How does it relate to Subset Sum? Can you reduce Subset Sum to Partition?

  8. Approximation: Design a greedy approximation algorithm for the optimization version of Subset Sum (maximize sum ≤ target). What approximation ratio does it achieve?

Understanding the Subset Sum Problem provides crucial insight into weakly NP-complete problems and the power of dynamic programming. The pseudo-polynomial solution demonstrates that NP-completeness doesn’t always mean “impossible in practice” - it depends on the problem structure and input characteristics.