[Medium] 2539. Count the Number of Good Subsequences
[Medium] 2539. Count the Number of Good Subsequences
Given a string s, count how many non-empty subsequences are good.
A subsequence is good if every character that appears in it appears the same number of times.
Examples:
"aabb"— good counts include patterns where each present letter appears twice (e.g. twoas and twobs)."aab"— not good if bothaandbappear but with different frequencies (e.g. twoas and oneb).
Return the answer modulo 10^9 + 7.
Problem summary
Brute force over subsequences is hopeless. The trick is to count combinatorially using character frequencies.
Key insight
Instead of generating subsequences, fix a target frequency k, and count subsequences where every chosen character appears exactly k times (and you may omit any letter entirely).
Step-by-step
-
Frequency map:
freq[c] =number of occurrences ofcins. -
For a fixed
k: For each character withfreq[c] ≥ k, you may either skip that letter or take exactlykcopies of it. The number of ways to choose thosekcopies fromfreq[c]positions isC(freq[c], k).Independently across letters, the contribution for that letter is
(1 + C(freq[c], k)): the1is “do not include this letter,” andC(freq[c], k)is “include it exactlyktimes.” -
Multiply and subtract empty: The product over all letters gives all combinations including “choose nothing from everyone.” Subtract
1to remove the empty subsequence:ways(k) = ∏_{c : freq[c] ≥ k} (1 + C(freq[c], k)) - 1 -
Sum over
k: Letmax_f = max(freq.values()). The answer is:answer = Σ_{k=1}^{max_f} ways(k)(all moduloMOD).
Group characters by frequency (same formula, less redundancy)
Letters that share the same total count f behave identically for a fixed k. Let freqCount[f] be how many distinct characters have frequency f in s (i.e. Counter(freq.values())).
Then the product over characters becomes a product over distinct frequencies:
ways(k) = ∏_{f ≥ k} (1 + C(f, k))^{freqCount[f]} - 1
The inner loop runs over distinct frequency values (at most the alphabet size) and uses modular exponentiation instead of repeating the same factor once per letter.
Complexity (all correct variants)
- Time:
O(n)to scan the string and build counters; plus preprocessing for binomials; plus roughlyO(max_f × 26)for the double loop overkand letters (or over distinct frequencies — still bounded by a constant alphabet). - Space: dominated by how you store
C(n, k):Θ(max_f²)for a full Pascal table vsΘ(max_f)for factorial / inverse factorial arrays.
Approaches and trade-offs
Two independent choices:
- How you compute
C(n, k) mod MOD— Pascal row/column DP vs factorials + modular inverse. - How you multiply letter contributions — one factor per character vs one factor per distinct frequency (with an exponent when several letters share the same count).
| Approach | Precompute C |
Inner loop (per k) |
Space | Notes |
|---|---|---|---|---|
| Pascal triangle | O(max_f²) time, O(max_f²) space |
O(26) chars |
Huge if max_f ≈ 10⁴ |
Easy to write; impractical here. |
| Factorial + inverse | O(max_f) time, O(max_f) space |
O(26) over freq |
Linear in max_f |
Standard for prime modulus. |
Factorial + freq_count + pow |
same as above | O(F) distinct freqs, F ≤ 26 |
same | Same math; cleaner when many letters share one frequency. |
1) Pascal’s triangle (simple, wrong scale for large max_f)
from collections import Counter
MOD = 10**9 + 7
class Solution:
def countGoodSubsequences(self, s: str) -> int:
freq = Counter(s)
max_f = max(freq.values())
n = max_f
C = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(n + 1):
C[i][0] = 1
for j in range(1, i + 1):
C[i][j] = (C[i - 1][j - 1] + C[i - 1][j]) % MOD
result = 0
for k in range(1, max_f + 1):
ways = 1
for c in freq:
if freq[c] >= k:
ways = ways * (1 + C[freq[c]][k]) % MOD
ways = (ways - 1) % MOD
result = (result + ways) % MOD
return result
Trade-off: No modular inverses; very readable. But building (max_f + 1)² entries is O(max_f²) memory — with max_f up to 10⁴, that is on the order of 10⁸ cells (and similar time), which is not acceptable for this problem’s constraints.
2) Factorials + Fermat inverse (per character)
from collections import Counter
MOD = 10**9 + 7
class Solution:
def countGoodSubsequences(self, s: str) -> int:
freq = Counter(s)
max_f = max(freq.values())
fact = [1] * (max_f + 1)
invfact = [1] * (max_f + 1)
for i in range(1, max_f + 1):
fact[i] = fact[i - 1] * i % MOD
invfact[max_f] = pow(fact[max_f], MOD - 2, MOD)
for i in range(max_f - 1, -1, -1):
invfact[i] = invfact[i + 1] * (i + 1) % MOD
def comb(f: int, k: int) -> int:
return fact[f] * invfact[k] % MOD * invfact[f - k] % MOD
rtn = 0
for k in range(1, max_f + 1):
ways = 1
for f in freq.values():
if f >= k:
ways = ways * (1 + comb(f, k)) % MOD
ways = (ways - 1) % MOD
rtn = (rtn + ways) % MOD
return rtn
Trade-off: O(max_f) extra space and O(max_f) preprocessing; each comb in O(1). Requires MOD prime (here it is). Inner loop is one factor per letter that appears (at most 26).
3) Same as (2), but group identical frequencies (freq_count + pow)
If you build freq_count = Counter(freq.values()), you must multiply (1 + C(f, k)) once per distinct frequency f, raised to the number of letters with that frequency — not by iterating freq.values() again (that would repeat the same f many times and overcount).
from collections import Counter
MOD = 10**9 + 7
class Solution:
def countGoodSubsequences(self, s: str) -> int:
freq = Counter(s)
freq_count = Counter(freq.values())
max_f = max(freq_count)
fact = [1] * (max_f + 1)
invfact = [1] * (max_f + 1)
for i in range(1, max_f + 1):
fact[i] = fact[i - 1] * i % MOD
invfact[max_f] = pow(fact[max_f], MOD - 2, MOD)
for i in range(max_f - 1, -1, -1):
invfact[i] = invfact[i + 1] * (i + 1) % MOD
def comb(f: int, k: int) -> int:
return fact[f] * invfact[k] % MOD * invfact[f - k] % MOD
rtn = 0
for k in range(1, max_f + 1):
ways = 1
for f, cnt in freq_count.items():
if f >= k:
ways = ways * pow(1 + comb(f, k), cnt, MOD) % MOD
ways = (ways - 1) % MOD
rtn = (rtn + ways) % MOD
return rtn
Trade-off: Same asymptotic cost as (2) for this alphabet size; clearer algebra and fewer redundant multiplications when several characters share one frequency. Prefer this version for submissions: correct grouping, good constants, and O(max_f) precomputation.
Takeaway
For large max_f, avoid a dense O(max_f²) Pascal table; use factorials + inverse factorials modulo a prime. Whether you loop freq.values() or freq_count + pow is the same big‑O here, but freq_count matches the grouped formula directly and stays tidy when many letters share one frequency.