NP-Hard Introduction: The Subset Sum Problem

Introduction

The Subset Sum Problem is a fundamental number problem that serves as an excellent example of NP-completeness. Unlike the graph problems we’ve explored (Clique, Independent Set, Vertex Cover), Subset Sum deals with numbers and arithmetic, providing a different perspective on NP-completeness. It’s particularly interesting because it has a pseudo-polynomial time dynamic programming solution, making it a great example of the distinction between weakly NP-complete and strongly NP-complete problems.

What is Subset Sum?

The Subset Sum Problem asks: Given a set of integers and a target sum, does there exist a subset whose elements sum to exactly the target?

Problem Definition

Subset Sum Decision Problem:

Input:

A set of integers S = {a_1, a_2, …, a_n}
A target integer t

Output: YES if there exists a subset S’ subseteq S such that ∑_{a_i ∈ S’} a_i = t, NO otherwise

Subset Sum Optimization Problem:

Input: A set of integers S = {a_1, a_2, …, a_n} and target t

Output: The maximum sum ≤ t achievable by any subset, or the subset itself

Example

Consider the set S = {3, 34, 4, 12, 5, 2} and target t = 9.

Subset {4, 5} sums to 4 + 5 = 9 ✓
Subset {3, 4, 2} sums to 3 + 4 + 2 = 9 ✓
Subset {12} sums to 12 > 9 ✗
Subset {3, 5} sums to 8 < 9 ✗

So the answer is YES - there exists a subset that sums to exactly 9.

Visual Example

For S = {2, 3, 7, 8, 10} and t = 11:

Try {2, 3, 7} = 12 (too large)
Try {3, 8} = 11 ✓ (found it!)
Try {2, 3, 6} (6 not in set)
Try {11} (11 not in set)

Answer: YES, subset {3, 8} sums to 11.

Why Subset Sum is in NP

To show that Subset Sum is NP-complete, we first need to show it’s in NP.

Subset Sum ∈ NP:

Given a candidate solution (a subset S’), we can verify in polynomial time:

Check that S’ subseteq S: O(n) time
Sum the elements: O(n) time
Check if sum equals t: O(1) time

Total verification time: O(n), which is polynomial in the input size. Therefore, Subset Sum is in NP.

NP-Completeness: Reduction from Partition

The most common proof that Subset Sum is NP-complete reduces from the Partition Problem.

Partition Problem

Partition Problem:

Input: A set of integers A = {a_1, a_2, …, a_n}
Output: YES if A can be partitioned into two subsets with equal sum, NO otherwise

Partition is known to be NP-complete (can be proven by reduction from 3-SAT or Subset Sum itself).

Reduction from Partition to Subset Sum

Reduction:

Given a Partition instance: set A = {a₁, a₂, …, aₙ}
Compute s = ∑ᵢ₌₁ⁿ aᵢ (total sum)
Return Subset Sum instance: set A and target t = s/2

Example:

Consider a Partition instance:

Set A = {3, 1, 4, 2, 2}
Total sum: s = 3 + 1 + 4 + 2 + 2 = 12
Target for partition: s/2 = 6

Reduction to Subset Sum:

Subset Sum instance: set A = {3, 1, 4, 2, 2}, target t = 6

Verification:

Partition → Subset Sum:

Partition solution: {3, 1, 2} and {4, 2} (both sum to 6)
Subset Sum solution: {3, 1, 2} sums to 6 ✓
Therefore, Subset Sum returns YES

Subset Sum → Partition:

Subset Sum solution: {3, 1, 2} sums to 6
Complement: {4, 2} sums to 12 - 6 = 6 ✓
Therefore, Partition returns YES (partition: {3, 1, 2} and {4, 2})

Correctness:

If Partition has a solution, then A can be split into two subsets each summing to s/2
Therefore, one of these subsets sums to s/2, so Subset Sum with target s/2 returns YES
If Subset Sum with target s/2 returns YES, then there’s a subset summing to s/2
The complement of this subset sums to s - s/2 = s/2, so Partition returns YES

Polynomial Time:

Computing s takes O(n) time
The reduction is trivial

Therefore, Subset Sum is NP-complete.

Alternative Reduction: Direct from 3-SAT

We can also prove Subset Sum is NP-complete by directly reducing from 3-SAT.

Construction

For a 3-SAT formula φ = C_1 ∧ C_2 ∧ … ∧ C_m with variables x₁, x₂, …, x_n:

Key Idea: Encode the satisfiability constraints using numbers in base 10 (or any base > number of clauses).

Create numbers:
- For each variable x_i, create two numbers: v_i (for x_i = TRUE) and ̅{v_i} (for x_i = FALSE)
- Each number has n + m digits:
  - First n digits: 1 in position i (for variable x_i), 0 elsewhere
  - Last m digits: 1 in position j if the literal appears in clause C_j, 0 otherwise
Create target number t:
- First n digits: all 1’s (each variable must be assigned TRUE or FALSE)
- Last m digits: all 1’s (each clause must be satisfied)
Set S: All v_i and ̅{v_i} numbers

Why This Works

Intuition:

The first n digits ensure we pick exactly one literal per variable (either v_i or ̅{v_i})
The last m digits ensure each clause has at least one true literal
A subset summing to t corresponds to a satisfying assignment

Formal Proof:

Forward Direction (3-SAT satisfiable → Subset Sum solvable):

If φ is satisfiable, pick v_i if x_i = TRUE, else pick ̅{v_i}
Sum these numbers: first n digits sum to all 1’s, last m digits sum to at least all 1’s (each clause satisfied)
Add slack numbers (0-padded) if needed to reach exactly t

Reverse Direction (Subset Sum solvable → 3-SAT satisfiable):

If subset sums to t, the first n digits force exactly one choice per variable
The last m digits being all 1’s means each clause has at least one true literal
This gives a satisfying assignment

Pseudo-Polynomial Dynamic Programming Solution

Despite being NP-complete, Subset Sum has a pseudo-polynomial time dynamic programming solution. This makes it weakly NP-complete (as opposed to strongly NP-complete problems that remain hard even when numbers are polynomially bounded).

DP Algorithm

Subproblem: dp[i][s] = true if there exists a subset of {a_1, a_2, …, a_i} that sums to exactly s.

Recurrence:

Base case: dp[0][0] = true, dp[0][s] = false for s > 0
Recurrence: dp[i][s] = dp[i-1][s] ∨ dp[i-1][s - a_i] (if s ≥ a_i)
- Either don’t include a_i: dp[i-1][s]
- Or include a_i: dp[i-1][s - a_i]

Algorithm:

Algorithm: SubsetSumDP(S, t)
n = `|S|`
Let dp[0..n][0..t] be a 2D boolean array
dp[0][0] = true
for i = 1 to n:
    for s = 0 to t:
        dp[i][s] = dp[i-1][s]
        if s >= a_i:
            dp[i][s] = dp[i][s] OR dp[i-1][s - a_i]
return dp[n][t]

Time Complexity: O(n · t) Space Complexity: O(n · t) (can be optimized to O(t) using 1D array)

Why “Pseudo-Polynomial”?

The algorithm runs in O(n · t) time
If t is exponential in n (e.g., t = 2^n), then O(n · t) = O(n · 2^n) is exponential
If t is polynomial in n (e.g., t = n^2), then O(n · t) = O(n^3) is polynomial
The complexity depends on the value of t, not just its representation size
Since t requires log t bits to represent, if t is exponential, log t is polynomial, but t itself is exponential

Space Optimization

We can optimize space to O(t):

Algorithm: SubsetSumDPOptimized(S, t)
Let dp[0..t] be a boolean array
dp[0] = true
for i = 1 to n:
    for s = t down to a_i:  // iterate backwards!
        dp[s] = dp[s] OR dp[s - a_i]
return dp[t]

Key: Iterate backwards to avoid using updated values in the same iteration.

Relationship to Other Problems

The Subset Sum Problem is closely related to several other important problems:

Partition

As we saw:

Partition reduces to Subset Sum (set target to half the total sum)
Subset Sum can also reduce to Partition (add elements to make partition work)
They are polynomially equivalent

Knapsack

0/1 Knapsack Problem:

Items with weights and values
Maximize value subject to weight constraint
Subset Sum is a special case where weight = value and we want exact sum

Reduction:

Subset Sum instance: S = {a_1, …, a_n}, target t
Knapsack instance: n items, item i has weight a_i and value a_i, capacity t, maximize value
Subset Sum solvable ↔ Knapsack optimal value = t

Coin Change

Coin Change Problem:

Given coin denominations and target amount
Can we make exact change?
Subset Sum is a special case (unlimited coins would be different)

Number Partitioning

Related to Partition, asking if numbers can be partitioned into k equal-sum subsets.

Practical Implications

Why Subset Sum Matters

The Subset Sum Problem is NP-complete, which means:

No Known Polynomial-Time Algorithm: For the general case with arbitrary numbers
Pseudo-Polynomial Solution: Dynamic programming works when target is small
Practical Algorithms: DP is very efficient for practical instances with bounded targets
Approximation: Can use greedy or other heuristics for optimization version

Real-World Applications

Subset Sum has numerous applications:

Resource Allocation: Allocating resources to match a target exactly
Budget Planning: Finding combinations that sum to a budget
Cryptography: Some cryptographic schemes rely on Subset Sum hardness
Scheduling: Allocating time slots to match requirements
Financial Planning: Portfolio selection, investment matching
Data Analysis: Finding subsets with specific aggregate properties

Runtime Analysis

Brute Force Approach

Algorithm: Try all 2ⁿ subsets

Time Complexity: O(2^n · n)
Space Complexity: O(n) for storing current subset
Analysis: For each subset, sum elements (O(n) time)

Dynamic Programming (Pseudo-Polynomial)

Algorithm: DP table dp[i][s] = true if sum s achievable with first i elements

Time Complexity: O(n · t) where t is target sum
Space Complexity: O(n · t) (can be optimized to O(t))
Subproblem: dp[i][s] = dp[i-1][s] ∨ dp[i-1][s-a_i] (if s ≥ a_i)
Why Pseudo-Polynomial: Depends on value of t, not just input size. If t = 2^n, then O(n · 2^n) is exponential.

Space-Optimized DP

Algorithm: Use 1D array, iterate backwards

Time Complexity: O(n · t)
Space Complexity: O(t)
Key: Iterate backwards to avoid using updated values in same iteration

Meet-in-the-Middle

Algorithm: Split array in half, find all sums for each half, combine

Time Complexity: O(2^{n/2} · n)
Space Complexity: O(2^{n/2})
Improvement: Reduces exponential factor from 2ⁿ to 2^{n/2}

Approximation Algorithms

Greedy Approach:

Time Complexity: O(n log n) (sort first)
Space Complexity: O(n)
Approximation: No guaranteed ratio, but works well in practice

Verification Complexity

Given a candidate subset:

Time Complexity: O(n) - sum elements and compare to target
Space Complexity: O(1) additional space
This polynomial-time verifiability shows Subset Sum is in NP

When DP Works Well

The dynamic programming solution is practical when:

Target t is relatively small (e.g., t ≤ 10^6)
Numbers are bounded
Need exact solutions (not approximations)

When DP Fails

The DP solution becomes impractical when:

Target t is very large (exponential in n)
Numbers are very large
Need to handle floating-point numbers (requires different approach)

Key Takeaways

Subset Sum is NP-Complete: Proven by reduction from Partition or directly from 3-SAT
Weakly NP-Complete: Has a pseudo-polynomial time DP solution, distinguishing it from strongly NP-complete problems
Dynamic Programming: O(n · t) time solution works well when target is bounded
Space Optimization: Can reduce space from O(n · t) to O(t) with careful iteration
Related Problems: Connected to Partition, Knapsack, and Coin Change problems

Reduction Summary

Partition ≤ₚ Subset Sum:

Given Partition instance: set A with sum s
Return Subset Sum instance: set A and target t = s/2
Partition solvable ↔ Subset Sum with target s/2 solvable

3-SAT ≤ₚ Subset Sum:

Encode variables and clauses as digits in base representation
Satisfying assignment ↔ Subset summing to target with all 1’s in key positions

Both reductions are polynomial-time, establishing Subset Sum as NP-complete.

Practice Problems

Solve by hand: For S = {2, 3, 7, 8, 10} and t = 11, find a subset that sums to 11. How many such subsets exist?
Prove the reduction: Show that Partition reduces to Subset Sum. Is the reverse reduction also possible?
DP implementation: Implement the dynamic programming algorithm for Subset Sum. Test it on various inputs and analyze its performance.
Space optimization: Implement the space-optimized version using a 1D array. Why must we iterate backwards?
Time complexity analysis: For the DP algorithm, what is the time complexity if:
- t = O(n)?
- t = O(2^n)?
- t = O(n^2)?
Extension: Modify the algorithm to return the actual subset (not just YES/NO). How does this affect time and space complexity?
Related problem: Research the Partition Problem. How does it relate to Subset Sum? Can you reduce Subset Sum to Partition?
Approximation: Design a greedy approximation algorithm for the optimization version of Subset Sum (maximize sum ≤ target). What approximation ratio does it achieve?

Understanding the Subset Sum Problem provides crucial insight into weakly NP-complete problems and the power of dynamic programming. The pseudo-polynomial solution demonstrates that NP-completeness doesn’t always mean “impossible in practice” - it depends on the problem structure and input characteristics.