686. Repeated String Match
686. Repeated String Match
Problem Statement
Given two strings a and b, return the minimum number of times you should repeat string a so that string b is a substring of it. If it is impossible for b to be a substring of a after repeating it, return -1.
Notice: String "abc" repeated 0 times is "", repeated 1 time is "abc", and repeated 2 times is "abcabc".
Examples
Example 1:
Input: a = "abcd", b = "cdabcdab"
Output: 3
Explanation: We return 3 because by repeating a three times "abcdabcdabcd", b is a substring of it.
Example 2:
Input: a = "a", b = "aa"
Output: 2
Example 3:
Input: a = "a", b = "a"
Output: 1
Example 4:
Input: a = "abc", b = "wxyz"
Output: -1
Constraints
1 <= a.length, b.length <= 10^4aandbconsist of lowercase English letters.
Clarification Questions
Before diving into the solution, here are 5 important clarifications and assumptions to discuss during an interview:
-
Repetition definition: What does “repeated string match” mean? (Assumption: Repeat string ‘a’ multiple times until ‘b’ becomes a substring of the repeated string)
-
Minimum repetitions: What are we trying to find? (Assumption: Minimum number of times to repeat ‘a’ so that ‘b’ is a substring)
-
Substring matching: Does ‘b’ need to be an exact substring? (Assumption: Yes - ‘b’ must appear as a contiguous substring in the repeated ‘a’)
-
No match: What should we return if ‘b’ can never be a substring? (Assumption: Return -1 - impossible to form ‘b’ as substring)
-
Case sensitivity: Are character comparisons case-sensitive? (Assumption: Based on constraints, only lowercase letters, so case doesn’t matter)
Interview Deduction Process (20 minutes)
Step 1: Brute-Force Approach (5 minutes)
Initial Thought: “I need to find minimum repetitions. Let me try repeating string a and checking if b is substring.”
Naive Solution: Repeat string a multiple times, check if b is substring using simple string matching. Try repetitions from 1 to some upper bound.
Complexity: O(n × m) per repetition check, where n = len(a), m = len(b)
Issues:
- Simple substring matching is inefficient
- May need many repetitions
- Doesn’t leverage string matching algorithms
- Timeout for large strings
Step 2: Semi-Optimized Approach (7 minutes)
Insight: “I can use KMP algorithm for efficient substring matching, or optimize repetition checking.”
Improved Solution: Use KMP algorithm for substring matching. Repeat a until length >= b, then check if b is substring. Can also use Rabin-Karp for alternative.
Complexity: O(n + m) per check with KMP, but need multiple repetitions
Improvements:
- KMP makes substring matching efficient
- Still need to determine upper bound for repetitions
- Much faster than naive matching
Step 3: Optimized Solution (8 minutes)
Final Optimization: “I can determine upper bound: need at most ceil(m/n) + 1 repetitions. Use KMP for matching.”
Best Solution: Calculate upper bound: need at most ceil(len(b)/len(a)) + 1 repetitions. Use KMP algorithm for efficient substring matching. Can also use Rabin-Karp as alternative.
Complexity: O(n + m) time with KMP, O(n + m) space
Key Realizations:
- Upper bound is ceil(m/n) + 1 repetitions
- KMP algorithm is optimal for substring matching
- O(n + m) time is optimal for string matching
- Rabin-Karp is alternative with similar complexity
Solution Approach
This problem requires finding the minimum number of repetitions of string a such that string b becomes a substring. This is a string matching problem that can be efficiently solved using advanced algorithms like Knuth-Morris-Pratt (KMP) or Rabin-Karp.
Key Insights:
- Minimum Repetitions: Need at least
⌈b.length / a.length⌉repetitions - Maximum Repetitions: At most
⌈b.length / a.length⌉ + 1repetitions needed - Circular Matching: Since
ais repeated, we can use circular string matching - Efficient Algorithms: KMP or Rabin-Karp provide O(n + m) time complexity
Algorithm Options:
- KMP Algorithm: Preprocess pattern to avoid backtracking
- Rabin-Karp: Use rolling hash for efficient substring matching
- Naive Approach: O(n × m) - check each position
String Matching Algorithms
Knuth-Morris-Pratt (KMP) Algorithm
KMP is an efficient string-searching algorithm that preprocesses the pattern to create a prefix function (also called LPS - Longest Proper Prefix which is also a Suffix). This allows skipping unnecessary comparisons.
Key Concepts:
- Prefix Function (π/LPS): For each position
iin pattern,π[i]is the length of the longest proper prefix that is also a suffix ofpattern[0..i] - No Backtracking: When a mismatch occurs, we don’t reset to the beginning but use the prefix function to determine the next position
- Time Complexity: O(n + m) where n = text length, m = pattern length
How KMP Works:
- Preprocessing Phase: Build prefix function for pattern
- For each position, find longest prefix-suffix match
- Use previous values to compute current value efficiently
- Search Phase: Match pattern in text
- Compare characters from left to right
- On mismatch, use prefix function to skip ahead
- Never backtrack in text
Prefix Function Example:
For pattern "ababaca":
Pattern: a b a b a c a
Index: 0 1 2 3 4 5 6
π[i]: 0 0 1 2 3 0 1
Explanation:
- π[0] = 0 (no proper prefix)
- π[1] = 0 ("ab" has no prefix-suffix match)
- π[2] = 1 ("aba" has "a" as prefix-suffix)
- π[3] = 2 ("abab" has "ab" as prefix-suffix)
- π[4] = 3 ("ababa" has "aba" as prefix-suffix)
- π[5] = 0 ("ababac" has no prefix-suffix match)
- π[6] = 1 ("ababaca" has "a" as prefix-suffix)
Rabin-Karp Algorithm
Rabin-Karp uses rolling hash to efficiently compute hash values for substrings. It compares hash values first, then verifies with character-by-character comparison if hashes match.
Key Concepts:
- Rolling Hash: Compute hash of substring in O(1) time using previous hash
- Hash Function: Use polynomial rolling hash:
hash = (hash * base + char) % mod - Collision Handling: When hashes match, verify with actual string comparison
- Time Complexity: Average O(n + m), worst case O(n × m) if many hash collisions
How Rabin-Karp Works:
- Precompute Pattern Hash: Calculate hash of pattern string
- Rolling Hash in Text:
- Compute hash of first window
- Slide window and update hash in O(1)
- Compare hashes, verify if match
- Base and Modulo: Use large base and prime modulo to reduce collisions
Rolling Hash Formula:
For substring s[i..i+m-1]:
hash = (s[i] * base^(m-1) + s[i+1] * base^(m-2) + ... + s[i+m-1]) % mod
To slide window from i to i+1:
new_hash = ((old_hash - s[i] * base^(m-1)) * base + s[i+m]) % mod
Solution 1: KMP Algorithm (Recommended)
Solution: KMP with Circular String Matching
class Solution {
private:
int strStr(string &haystack, string &needle) {
const int n = haystack.size(), m = needle.size();
if(m == 0) return 0;
vector<int> pi(m);
for(int i = 1, j = 0; i < m; i++) {
while(j > 0 && needle[i] != needle[j]) {
j = pi[j - 1];
}
if(needle[i] == needle[j]) {
j++;
}
pi[i] = j;
}
for(int i = 0, j = 0; i - j < n; i++) {
while(j > 0 && haystack[i % n] != needle[j]) {
j = pi[j - 1];
}
if(haystack[i % n] == needle[j]) {
j++;
}
if(j == m) {
return i - m + 1;
}
}
return -1;
}
public:
int repeatedStringMatch(string a, string b) {
const int an = a.size(), bn = b.size();
int idx = strStr(a, b);
if(idx == -1) return -1;
if(an - idx >= bn) {
return 1;
}
return (bn + idx - an - 1) / an + 2;
}
};
Algorithm Explanation:
- Prefix Function Construction (Lines 7-15):
- Build
piarray for patternneedle pi[i]= length of longest prefix-suffix forneedle[0..i]- Use previous values to compute efficiently
- Build
- KMP Search with Circular Matching (Lines 16-28):
- Search for
needlein circularhaystack - Use
i % nto handle circular indexing - On mismatch:
j = pi[j - 1](skip using prefix function) - On match: increment
j - When
j == m: pattern found at positioni - m + 1
- Search for
- Calculate Minimum Repetitions (Lines 30-38):
- If pattern found at
idx:- If remaining characters in first repetition ≥
bn: return 1 - Otherwise: calculate repetitions needed
- Formula:
(bn + idx - an - 1) / an + 2
- If remaining characters in first repetition ≥
- If pattern found at
Example Walkthrough:
For a = "abcd", b = "cdabcdab":
Step 1: Build prefix function for "cdabcdab"
Pattern: c d a b c d a b
π[i]: 0 0 0 0 1 2 3 4
Step 2: KMP search in circular "abcd"
Text (circular): a b c d a b c d a b c d ...
Pattern: c d a b c d a b
i=0: 'a' vs 'c' → mismatch, j=0
i=1: 'b' vs 'c' → mismatch, j=0
i=2: 'c' vs 'c' → match, j=1
i=3: 'd' vs 'd' → match, j=2
i=4: 'a' vs 'a' → match, j=3
i=5: 'b' vs 'b' → match, j=4
i=6: 'c' vs 'c' → match, j=5
i=7: 'd' vs 'd' → match, j=6
i=8: 'a' vs 'a' → match, j=7
i=9: 'b' vs 'b' → match, j=8 → FOUND!
idx = 9 - 8 + 1 = 2
an = 4, bn = 8
an - idx = 4 - 2 = 2 < 8
repetitions = (8 + 2 - 4 - 1) / 4 + 2 = 5/4 + 2 = 1 + 2 = 3
Solution 2: Rabin-Karp Algorithm
Solution: Rabin-Karp with Rolling Hash
class Solution {
private:
const long long BASE = 256;
const long long MOD = 1e9 + 7;
long long computeHash(const string& s, int start, int len) {
long long hash = 0;
for(int i = 0; i < len; i++) {
hash = (hash * BASE + s[start + i]) % MOD;
}
return hash;
}
long long updateHash(long long oldHash, char remove, char add, long long power) {
oldHash = (oldHash - (remove * power) % MOD + MOD) % MOD;
oldHash = (oldHash * BASE + add) % MOD;
return oldHash;
}
bool verifyMatch(const string& text, int start, const string& pattern) {
for(int i = 0; i < pattern.size(); i++) {
if(text[start + i] != pattern[i]) return false;
}
return true;
}
int rabinKarpSearch(const string& text, const string& pattern, bool circular) {
int n = text.size(), m = pattern.size();
if(m == 0) return 0;
if(n < m) return -1;
// Compute pattern hash
long long patternHash = computeHash(pattern, 0, m);
// Compute power for rolling hash
long long power = 1;
for(int i = 0; i < m - 1; i++) {
power = (power * BASE) % MOD;
}
// Compute initial window hash
long long textHash = computeHash(text, 0, m);
// Check first window
if(textHash == patternHash && verifyMatch(text, 0, pattern)) {
return 0;
}
// Rolling hash
int maxIterations = circular ? n + m - 1 : n - m + 1;
for(int i = 1; i < maxIterations; i++) {
int removeIdx = (i - 1) % n;
int addIdx = (i + m - 1) % n;
textHash = updateHash(textHash, text[removeIdx], text[addIdx], power);
if(textHash == patternHash) {
int start = i % n;
if(verifyMatch(text, start, pattern)) {
return i;
}
}
}
return -1;
}
public:
int repeatedStringMatch(string a, string b) {
int an = a.size(), bn = b.size();
// Try circular search
int idx = rabinKarpSearch(a, b, true);
if(idx == -1) return -1;
if(an - idx >= bn) {
return 1;
}
return (bn + idx - an - 1) / an + 2;
}
};
Algorithm Explanation:
- Hash Computation (Lines 6-11):
- Compute hash for substring using polynomial rolling hash
- Formula:
hash = (hash * BASE + char) % MOD
- Rolling Hash Update (Lines 13-17):
- Remove leftmost character: subtract
remove * power - Add rightmost character: multiply by BASE and add
- Handle modulo arithmetic correctly
- Remove leftmost character: subtract
- Verification (Lines 19-24):
- When hashes match, verify with character-by-character comparison
- Prevents false positives from hash collisions
- Rabin-Karp Search (Lines 26-58):
- Precompute pattern hash and power value
- Slide window and update hash in O(1)
- Check hash matches, verify if needed
- Support circular string matching
KMP Template
Here’s the general template for KMP algorithm:
class KMP {
private:
// Build prefix function (LPS array)
vector<int> buildPrefixFunction(const string& pattern) {
int m = pattern.size();
vector<int> pi(m, 0);
for(int i = 1, j = 0; i < m; i++) {
// Mismatch: backtrack using prefix function
while(j > 0 && pattern[i] != pattern[j]) {
j = pi[j - 1];
}
// Match: extend prefix
if(pattern[i] == pattern[j]) {
j++;
}
pi[i] = j;
}
return pi;
}
public:
// Search for pattern in text
int search(const string& text, const string& pattern) {
int n = text.size(), m = pattern.size();
if(m == 0) return 0;
vector<int> pi = buildPrefixFunction(pattern);
for(int i = 0, j = 0; i < n; i++) {
// Mismatch: use prefix function to skip
while(j > 0 && text[i] != pattern[j]) {
j = pi[j - 1];
}
// Match: advance both pointers
if(text[i] == pattern[j]) {
j++;
}
// Pattern found
if(j == m) {
return i - m + 1; // Return starting index
}
}
return -1; // Not found
}
};
Key Template Components:
- Prefix Function (π/LPS):
pi[i]= longest prefix-suffix length forpattern[0..i]- Built in O(m) time
- Search Algorithm:
- No backtracking in text
- Use prefix function to skip on mismatch
- Time complexity: O(n + m)
- Circular Matching:
- Use
i % nfor circular text - Adjust loop condition:
i - j < n
- Use
Rabin-Karp Template
Here’s the general template for Rabin-Karp algorithm:
class RabinKarp {
private:
const long long BASE = 256;
const long long MOD = 1e9 + 7;
long long computeHash(const string& s, int start, int len) {
long long hash = 0;
for(int i = 0; i < len; i++) {
hash = (hash * BASE + s[start + i]) % MOD;
}
return hash;
}
long long updateHash(long long oldHash, char remove, char add, long long power) {
oldHash = (oldHash - (remove * power) % MOD + MOD) % MOD;
oldHash = (oldHash * BASE + add) % MOD;
return oldHash;
}
public:
int search(const string& text, const string& pattern) {
int n = text.size(), m = pattern.size();
if(m == 0) return 0;
if(n < m) return -1;
// Precompute pattern hash
long long patternHash = computeHash(pattern, 0, m);
// Precompute BASE^(m-1) for rolling hash
long long power = 1;
for(int i = 0; i < m - 1; i++) {
power = (power * BASE) % MOD;
}
// Initial window hash
long long textHash = computeHash(text, 0, m);
// Check first window
if(textHash == patternHash) {
if(text.substr(0, m) == pattern) return 0;
}
// Rolling hash: slide window
for(int i = 1; i <= n - m; i++) {
textHash = updateHash(textHash, text[i-1], text[i+m-1], power);
if(textHash == patternHash) {
if(text.substr(i, m) == pattern) return i;
}
}
return -1;
}
};
Key Template Components:
- Hash Function:
- Polynomial rolling hash:
hash = (hash * BASE + char) % MOD - BASE = 256 (for ASCII), MOD = large prime
- Polynomial rolling hash:
- Rolling Hash:
- Update hash in O(1) when sliding window
- Remove left char, add right char
- Collision Handling:
- Always verify hash matches with actual string comparison
- Prevents false positives
Complexity Analysis
Solution 1: KMP
Time Complexity: O(n + m)
- Prefix function: O(m) - build LPS array
- Search phase: O(n) - each character visited at most twice
- Total: O(n + m) where n = a.length, m = b.length
Space Complexity: O(m)
- Prefix function array: O(m)
- Total: O(m)
Solution 2: Rabin-Karp
Time Complexity: O(n + m) average, O(n × m) worst case
- Hash computation: O(m) for pattern
- Rolling hash: O(n) for text (O(1) per window)
- Verification: O(m) per hash match (rare collisions)
- Total: O(n + m) average, O(n × m) worst case with many collisions
Space Complexity: O(1)
- Hash variables: O(1)
- Total: O(1) excluding input strings
Key Points
- KMP is Optimal: Guaranteed O(n + m) time, no worst-case degradation
- Rabin-Karp: Average O(n + m), but can degrade with hash collisions
- Circular Matching: Use modulo arithmetic for repeated strings
- Minimum Repetitions: At least
⌈b.length / a.length⌉, at most⌈b.length / a.length⌉ + 1 - Prefix Function: Key to KMP’s efficiency - avoids backtracking
- Rolling Hash: Key to Rabin-Karp’s efficiency - O(1) hash updates
Comparison: KMP vs Rabin-Karp
| Aspect | KMP | Rabin-Karp |
|---|---|---|
| Time Complexity | O(n + m) guaranteed | O(n + m) average, O(n × m) worst |
| Space Complexity | O(m) | O(1) |
| Preprocessing | O(m) for prefix function | O(m) for pattern hash |
| Backtracking | None (no text backtracking) | None (sliding window) |
| Collision Handling | Not needed | Required (verify matches) |
| Implementation | More complex | Simpler |
| Recommended | ✅ Yes (guaranteed performance) | ⚠️ Good for average case |
Alternative Approaches
Approach 1: KMP (Current Solution 1)
- Time: O(n + m)
- Space: O(m)
- Best for: Guaranteed performance, no worst-case degradation
Approach 2: Rabin-Karp (Current Solution 2)
- Time: O(n + m) average
- Space: O(1)
- Use case: When space is limited, average case is acceptable
Approach 3: Naive String Matching
- Time: O(n × m)
- Space: O(1)
- Use case: Simple but inefficient for large inputs
Related Problems
- 28. Find the Index of the First Occurrence in a String - KMP application
- 214. Shortest Palindrome - KMP for palindrome
- 1392. Longest Happy Prefix - Prefix function
- 187. Repeated DNA Sequences - Rolling hash
Tags
String Matching, KMP, Knuth-Morris-Pratt, Rabin-Karp, Rolling Hash, Prefix Function, Medium