686. Repeated String Match
686. Repeated String Match
Problem Statement
Given two strings a and b, return the minimum number of times you should repeat string a so that string b is a substring of it. If it is impossible for b to be a substring of a after repeating it, return -1.
Notice: String "abc" repeated 0 times is "", repeated 1 time is "abc", and repeated 2 times is "abcabc".
Examples
Example 1:
Input: a = "abcd", b = "cdabcdab"
Output: 3
Explanation: We return 3 because by repeating a three times "abcdabcdabcd", b is a substring of it.
Example 2:
Input: a = "a", b = "aa"
Output: 2
Example 3:
Input: a = "a", b = "a"
Output: 1
Example 4:
Input: a = "abc", b = "wxyz"
Output: -1
Constraints
1 <= a.length, b.length <= 10^4aandbconsist of lowercase English letters.
Clarification Questions
Before diving into the solution, here are 5 important clarifications and assumptions to discuss during an interview:
-
Repetition definition: What does “repeated string match” mean? (Assumption: Repeat string ‘a’ multiple times until ‘b’ becomes a substring of the repeated string)
-
Minimum repetitions: What are we trying to find? (Assumption: Minimum number of times to repeat ‘a’ so that ‘b’ is a substring)
-
Substring matching: Does ‘b’ need to be an exact substring? (Assumption: Yes - ‘b’ must appear as a contiguous substring in the repeated ‘a’)
-
No match: What should we return if ‘b’ can never be a substring? (Assumption: Return -1 - impossible to form ‘b’ as substring)
-
Case sensitivity: Are character comparisons case-sensitive? (Assumption: Based on constraints, only lowercase letters, so case doesn’t matter)
Interview Deduction Process (20 minutes)
Step 1: Brute-Force Approach (5 minutes)
Initial Thought: “I need to find minimum repetitions. Let me try repeating string a and checking if b is substring.”
Naive Solution: Repeat string a multiple times, check if b is substring using simple string matching. Try repetitions from 1 to some upper bound.
Complexity: O(n × m) per repetition check, where n = len(a), m = len(b)
Issues:
- Simple substring matching is inefficient
- May need many repetitions
- Doesn’t leverage string matching algorithms
- Timeout for large strings
Step 2: Semi-Optimized Approach (7 minutes)
Insight: “I can use KMP algorithm for efficient substring matching, or optimize repetition checking.”
Improved Solution: Use KMP algorithm for substring matching. Repeat a until length >= b, then check if b is substring. Can also use Rabin-Karp for alternative.
Complexity: O(n + m) per check with KMP, but need multiple repetitions
Improvements:
- KMP makes substring matching efficient
- Still need to determine upper bound for repetitions
- Much faster than naive matching
Step 3: Optimized Solution (8 minutes)
Final Optimization: “I can determine upper bound: need at most ceil(m/n) + 1 repetitions. Use KMP for matching.”
Best Solution: Calculate upper bound: need at most ceil(len(b)/len(a)) + 1 repetitions. Use KMP algorithm for efficient substring matching. Can also use Rabin-Karp as alternative.
Complexity: O(n + m) time with KMP, O(n + m) space
Key Realizations:
- Upper bound is ceil(m/n) + 1 repetitions
- KMP algorithm is optimal for substring matching
- O(n + m) time is optimal for string matching
- Rabin-Karp is alternative with similar complexity
Solution Approach
This problem requires finding the minimum number of repetitions of string a such that string b becomes a substring. This is a string matching problem that can be efficiently solved using advanced algorithms like Knuth-Morris-Pratt (KMP) or Rabin-Karp.
Key Insights:
- Minimum Repetitions: Need at least
⌈b.length / a.length⌉repetitions - Maximum Repetitions: At most
⌈b.length / a.length⌉ + 1repetitions needed - Circular Matching: Since
ais repeated, we can use circular string matching - Efficient Algorithms: KMP or Rabin-Karp provide O(n + m) time complexity
Algorithm Options:
- KMP Algorithm: Preprocess pattern to avoid backtracking
- Rabin-Karp: Use rolling hash for efficient substring matching
- Naive Approach: O(n × m) - check each position
String Matching Algorithms
Knuth-Morris-Pratt (KMP) Algorithm
KMP is an efficient string-searching algorithm that preprocesses the pattern to create a prefix function (also called LPS - Longest Proper Prefix which is also a Suffix). This allows skipping unnecessary comparisons.
Key Concepts:
- Prefix Function (π/LPS): For each position
iin pattern,π[i]is the length of the longest proper prefix that is also a suffix ofpattern[0..i] - No Backtracking: When a mismatch occurs, we don’t reset to the beginning but use the prefix function to determine the next position
- Time Complexity: O(n + m) where n = text length, m = pattern length
How KMP Works:
- Preprocessing Phase: Build prefix function for pattern
- For each position, find longest prefix-suffix match
- Use previous values to compute current value efficiently
- Search Phase: Match pattern in text
- Compare characters from left to right
- On mismatch, use prefix function to skip ahead
- Never backtrack in text
Prefix Function Example:
For pattern "ababaca":
Pattern: a b a b a c a
Index: 0 1 2 3 4 5 6
π[i]: 0 0 1 2 3 0 1
Explanation:
- π[0] = 0 (no proper prefix)
- π[1] = 0 ("ab" has no prefix-suffix match)
- π[2] = 1 ("aba" has "a" as prefix-suffix)
- π[3] = 2 ("abab" has "ab" as prefix-suffix)
- π[4] = 3 ("ababa" has "aba" as prefix-suffix)
- π[5] = 0 ("ababac" has no prefix-suffix match)
- π[6] = 1 ("ababaca" has "a" as prefix-suffix)
Rabin-Karp Algorithm
Rabin-Karp uses rolling hash to efficiently compute hash values for substrings. It compares hash values first, then verifies with character-by-character comparison if hashes match.
Key Concepts:
- Rolling Hash: Compute hash of substring in O(1) time using previous hash
- Hash Function: Use polynomial rolling hash:
hash = (hash * base + char) % mod - Collision Handling: When hashes match, verify with actual string comparison
- Time Complexity: Average O(n + m), worst case O(n × m) if many hash collisions
How Rabin-Karp Works:
- Precompute Pattern Hash: Calculate hash of pattern string
- Rolling Hash in Text:
- Compute hash of first window
- Slide window and update hash in O(1)
- Compare hashes, verify if match
- Base and Modulo: Use large base and prime modulo to reduce collisions
Rolling Hash Formula:
For substring s[i..i+m-1]:
hash = (s[i] * base^(m-1) + s[i+1] * base^(m-2) + ... + s[i+m-1]) % mod
To slide window from i to i+1:
new_hash = ((old_hash - s[i] * base^(m-1)) * base + s[i+m]) % mod
Solution 1: KMP Algorithm (Recommended)
Solution: KMP with Circular String Matching
class Solution:
def strStr(self, haystack, needle):
n = len(haystack), m = len(needle)
if(m == 0) return 0
list[int> pi(m)
for(i = 1, j = 0 i < m i += 1) :
while j > 0 and needle[i] != needle[j]:
j = pi[j - 1]
if needle[i] == needle[j]:
j += 1
pi[i] = j
for(i = 0, j = 0 i - j < n i += 1) :
while j > 0 and haystack[i % n] != needle[j]:
j = pi[j - 1]
if haystack[i % n] == needle[j]:
j += 1
if j == m:
return i - m + 1
return -1
def repeatedStringMatch(self, a, b):
an = len(a), bn = len(b)
idx = strStr(a, b)
if(idx == -1) return -1
if an - idx >= bn:
return 1
return (bn + idx - an - 1) / an + 2
Algorithm Explanation:
- Prefix Function Construction (Lines 7-15):
- Build
piarray for patternneedle pi[i]= length of longest prefix-suffix forneedle[0..i]- Use previous values to compute efficiently
- Build
- KMP Search with Circular Matching (Lines 16-28):
- Search for
needlein circularhaystack - Use
i % nto handle circular indexing - On mismatch:
j = pi[j - 1](skip using prefix function) - On match: increment
j - When
j == m: pattern found at positioni - m + 1
- Search for
- Calculate Minimum Repetitions (Lines 30-38):
- If pattern found at
idx:- If remaining characters in first repetition ≥
bn: return 1 - Otherwise: calculate repetitions needed
- Formula:
(bn + idx - an - 1) / an + 2
- If remaining characters in first repetition ≥
- If pattern found at
Example Walkthrough:
For a = "abcd", b = "cdabcdab":
Step 1: Build prefix function for "cdabcdab"
Pattern: c d a b c d a b
π[i]: 0 0 0 0 1 2 3 4
Step 2: KMP search in circular "abcd"
Text (circular): a b c d a b c d a b c d ...
Pattern: c d a b c d a b
i=0: 'a' vs 'c' → mismatch, j=0
i=1: 'b' vs 'c' → mismatch, j=0
i=2: 'c' vs 'c' → match, j=1
i=3: 'd' vs 'd' → match, j=2
i=4: 'a' vs 'a' → match, j=3
i=5: 'b' vs 'b' → match, j=4
i=6: 'c' vs 'c' → match, j=5
i=7: 'd' vs 'd' → match, j=6
i=8: 'a' vs 'a' → match, j=7
i=9: 'b' vs 'b' → match, j=8 → FOUND!
idx = 9 - 8 + 1 = 2
an = 4, bn = 8
an - idx = 4 - 2 = 2 < 8
repetitions = (8 + 2 - 4 - 1) / 4 + 2 = 5/4 + 2 = 1 + 2 = 3
Solution 2: Rabin-Karp Algorithm
Solution: Rabin-Karp with Rolling Hash
class Solution:
long long BASE = 256
long long MOD = 1e9 + 7
def computeHash(self, s, start, len):
long long hash = 0
for(i = 0 i < len i += 1) :
hash = (hash BASE + s[start + i]) % MOD
return hash
def updateHash(self, oldHash, remove, add, power):
oldHash = (oldHash - (remove power) % MOD + MOD) % MOD
oldHash = (oldHash BASE + add) % MOD
return oldHash
def verifyMatch(self, text, start, pattern):
for(i = 0 i < len(pattern) i += 1) :
if(text[start + i] != pattern[i]) return False
return True
def rabinKarpSearch(self, text, pattern, circular):
n = len(text), m = len(pattern)
if(m == 0) return 0
if(n < m) return -1
# Compute pattern hash
long long patternHash = computeHash(pattern, 0, m)
# Compute power for rolling hash
long long power = 1
for(i = 0 i < m - 1 i += 1) :
power = (power BASE) % MOD
# Compute initial window hash
long long textHash = computeHash(text, 0, m)
# Check first window
if textHash == patternHash and verifyMatch(text, 0, pattern):
return 0
# Rolling hash
(n + m - 1 if maxIterations = circular else n - m + 1)
for(i = 1 i < maxIterations i += 1) :
removeIdx = (i - 1) % n
addIdx = (i + m - 1) % n
textHash = updateHash(textHash, text[removeIdx], text[addIdx], power)
if textHash == patternHash:
start = i % n
if verifyMatch(text, start, pattern):
return i
return -1
def repeatedStringMatch(self, a, b):
an = len(a), bn = len(b)
# Try circular search
idx = rabinKarpSearch(a, b, True)
if(idx == -1) return -1
if an - idx >= bn:
return 1
return (bn + idx - an - 1) / an + 2
Algorithm Explanation:
- Hash Computation (Lines 6-11):
- Compute hash for substring using polynomial rolling hash
- Formula:
hash = (hash * BASE + char) % MOD
- Rolling Hash Update (Lines 13-17):
- Remove leftmost character: subtract
remove * power - Add rightmost character: multiply by BASE and add
- Handle modulo arithmetic correctly
- Remove leftmost character: subtract
- Verification (Lines 19-24):
- When hashes match, verify with character-by-character comparison
- Prevents false positives from hash collisions
- Rabin-Karp Search (Lines 26-58):
- Precompute pattern hash and power value
- Slide window and update hash in O(1)
- Check hash matches, verify if needed
- Support circular string matching
KMP Template
Here’s the general template for KMP algorithm:
class KMP:
# Build prefix function (LPS array)
def buildPrefixFunction(self, pattern):
m = len(pattern)
list[int> pi(m, 0)
for(i = 1, j = 0 i < m i += 1) :
# Mismatch: backtrack using prefix function
while j > 0 and pattern[i] != pattern[j]:
j = pi[j - 1]
# Match: extend prefix
if pattern[i] == pattern[j]:
j += 1
pi[i] = j
return pi
# Search for pattern in text
def search(self, text, pattern):
n = len(text), m = len(pattern)
if(m == 0) return 0
list[int> pi = buildPrefixFunction(pattern)
for(i = 0, j = 0 i < n i += 1) :
# Mismatch: use prefix function to skip
while j > 0 and text[i] != pattern[j]:
j = pi[j - 1]
# Match: advance both pointers
if text[i] == pattern[j]:
j += 1
# Pattern found
if j == m:
return i - m + 1 # Return starting index
return -1 # Not found
Key Template Components:
- Prefix Function (π/LPS):
pi[i]= longest prefix-suffix length forpattern[0..i]- Built in O(m) time
- Search Algorithm:
- No backtracking in text
- Use prefix function to skip on mismatch
- Time complexity: O(n + m)
- Circular Matching:
- Use
i % nfor circular text - Adjust loop condition:
i - j < n
- Use
Rabin-Karp Template
Here’s the general template for Rabin-Karp algorithm:
class RabinKarp:
long long BASE = 256
long long MOD = 1e9 + 7
def computeHash(self, s, start, len):
long long hash = 0
for(i = 0 i < len i += 1) :
hash = (hash BASE + s[start + i]) % MOD
return hash
def updateHash(self, oldHash, remove, add, power):
oldHash = (oldHash - (remove power) % MOD + MOD) % MOD
oldHash = (oldHash BASE + add) % MOD
return oldHash
def search(self, text, pattern):
n = len(text), m = len(pattern)
if(m == 0) return 0
if(n < m) return -1
# Precompute pattern hash
long long patternHash = computeHash(pattern, 0, m)
# Precompute BASE^(m-1) for rolling hash
long long power = 1
for(i = 0 i < m - 1 i += 1) :
power = (power BASE) % MOD
# Initial window hash
long long textHash = computeHash(text, 0, m)
# Check first window
if textHash == patternHash:
if(text.substr(0, m) == pattern) return 0
# Rolling hash: slide window
for(i = 1 i <= n - m i += 1) :
textHash = updateHash(textHash, text[i-1], text[i+m-1], power)
if textHash == patternHash:
if(text.substr(i, m) == pattern) return i
return -1
Key Template Components:
- Hash Function:
- Polynomial rolling hash:
hash = (hash * BASE + char) % MOD - BASE = 256 (for ASCII), MOD = large prime
- Polynomial rolling hash:
- Rolling Hash:
- Update hash in O(1) when sliding window
- Remove left char, add right char
- Collision Handling:
- Always verify hash matches with actual string comparison
- Prevents false positives
Complexity Analysis
Solution 1: KMP
Time Complexity: O(n + m)
- Prefix function: O(m) - build LPS array
- Search phase: O(n) - each character visited at most twice
- Total: O(n + m) where n = a.length, m = b.length
Space Complexity: O(m)
- Prefix function array: O(m)
- Total: O(m)
Solution 2: Rabin-Karp
Time Complexity: O(n + m) average, O(n × m) worst case
- Hash computation: O(m) for pattern
- Rolling hash: O(n) for text (O(1) per window)
- Verification: O(m) per hash match (rare collisions)
- Total: O(n + m) average, O(n × m) worst case with many collisions
Space Complexity: O(1)
- Hash variables: O(1)
- Total: O(1) excluding input strings
Key Points
- KMP is Optimal: Guaranteed O(n + m) time, no worst-case degradation
- Rabin-Karp: Average O(n + m), but can degrade with hash collisions
- Circular Matching: Use modulo arithmetic for repeated strings
- Minimum Repetitions: At least
⌈b.length / a.length⌉, at most⌈b.length / a.length⌉ + 1 - Prefix Function: Key to KMP’s efficiency - avoids backtracking
- Rolling Hash: Key to Rabin-Karp’s efficiency - O(1) hash updates
Comparison: KMP vs Rabin-Karp
| Aspect | KMP | Rabin-Karp |
|---|---|---|
| Time Complexity | O(n + m) guaranteed | O(n + m) average, O(n × m) worst |
| Space Complexity | O(m) | O(1) |
| Preprocessing | O(m) for prefix function | O(m) for pattern hash |
| Backtracking | None (no text backtracking) | None (sliding window) |
| Collision Handling | Not needed | Required (verify matches) |
| Implementation | More complex | Simpler |
| Recommended | ✅ Yes (guaranteed performance) | ⚠️ Good for average case |
Alternative Approaches
Approach 1: KMP (Current Solution 1)
- Time: O(n + m)
- Space: O(m)
- Best for: Guaranteed performance, no worst-case degradation
Approach 2: Rabin-Karp (Current Solution 2)
- Time: O(n + m) average
- Space: O(1)
- Use case: When space is limited, average case is acceptable
Approach 3: Naive String Matching
- Time: O(n × m)
- Space: O(1)
- Use case: Simple but inefficient for large inputs
Related Problems
- 28. Find the Index of the First Occurrence in a String - KMP application
- 214. Shortest Palindrome - KMP for palindrome
- 1392. Longest Happy Prefix - Prefix function
- 187. Repeated DNA Sequences - Rolling hash
Tags
String Matching, KMP, Knuth-Morris-Pratt, Rabin-Karp, Rolling Hash, Prefix Function, Medium