433. Minimum Genetic Mutation

Problem Statement

A gene string is represented by an 8-character string consisting only of 'A', 'C', 'G', and 'T'.

You are given a start gene string startGene, an end gene string endGene, and a list of valid gene strings bank.

You may mutate the gene from one string to another by changing exactly one character at a time. Each intermediate gene string must be in bank.

Return the minimum number of mutations needed to change startGene to endGene. If it is impossible, return -1.

Examples

Example 1:

Input: startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]
Output: 1

Example 2:

Input: startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]
Output: 2
# One shortest sequence:
# AACCGGTT → AACCGGTA → AAACGGTA

Example 3:

Input: startGene = "AAAAACCC", endGene = "AACCCCCC", bank = ["AAAACCCC","AAACCCCC","AACCCCCC"]
Output: 3

Constraints

startGene.length == 8
endGene.length == 8
1 <= bank.length <= 10^4
bank[i].length == 8
startGene, endGene, and bank[i] consist only of 'A', 'C', 'G', 'T'

Clarification Questions

Single-character mutation: Each step must change exactly one position?
Assumption: Yes — Hamming distance between consecutive genes must be 1.
Validity: Every intermediate gene must belong to bank?
Assumption: Yes, and we can only traverse through genes in bank.
Reusing genes: Can we revisit the same gene multiple times?
Assumption: We could, but it’s never helpful for shortest path, so we will mark visited genes and avoid revisiting.

Abstraction

Model this as a shortest path problem in an unweighted graph:

Each valid gene string is a node (including startGene if not in bank).
There is an edge between two genes if they differ by exactly one character.
We want the minimum number of edges from startGene to endGene.

This is a classic BFS-on-strings problem, similar to Word Ladder but with a fixed 4-letter alphabet and fixed length 8.

Baseline (Naive) — Why it’s suboptimal

Naive idea:

From current gene, scan all strings in bank and check if they differ by one character, then recurse/DFS.

This can:

Revisit many genes, causing exponential blowup.
Not guarantee we find the shortest path first (DFS explores deep before shallow).

Optimized Approach — BFS over gene space

Since each mutation step has equal cost 1, BFS from startGene to endGene guarantees the minimum number of mutations.

Strategy:

Put all bank genes into a set for O(1) membership.\n2. If endGene is not in bank, return -1 (cannot end at an invalid gene).\n3. BFS from startGene, level by level, treating each mutation as an edge.\n4. For a gene curr, generate all possible one-character mutations by trying the 4 bases 'A','C','G','T' at each position. When a mutation is in bank_set, push it to the queue and remove it from the set (mark visited).\n5. When we pop endGene, return the current steps count.

Complexity

Let L = 8 (fixed gene length), N = len(bank).
For each gene we process, we generate L * 4 = 32 potential neighbors and check membership in a set.
In the worst case we process each bank gene at most once.
Time: O(N * L * 4) ≈ O(N).
Space: O(N) for the bank set and BFS queue.

Key Insights

BFS for shortest path — All edges cost 1 mutation; BFS finds the minimum steps.
Generate neighbors by position + base — Instead of precomputing full adjacency, generate possible mutations on the fly.
Use a set for bank / visited — O(1) checks for valid, unvisited genes.

Python Solution (BFS)

from collections import deque
from typing import List


class Solution:
    def minMutation(self, startGene: str, endGene: str, bank: List[str]) -> int:
        bank_set = set(bank)
        if endGene not in bank_set:
            return -1

        q = deque([startGene])
        steps = 0
        genes = ["A", "C", "G", "T"]

        while q:
            for _ in range(len(q)):
                curr = q.popleft()
                if curr == endGene:
                    return steps
                for i in range(len(curr)):
                    for g in genes:
                        mutation = curr[:i] + g + curr[i + 1 :]
                        if mutation in bank_set:
                            q.append(mutation)
                            bank_set.remove(mutation)
            steps += 1
        return -1

Algorithm Explanation

We treat each gene as a node in a graph, with edges between genes differing by one character. We start BFS from startGene with steps = 0. For each level:

Pop all genes at the current distance.
If any equals endGene, return steps.
For each gene, try mutating each position with each of the 4 characters; if the resulting gene is in bank_set, we add it to the queue and remove it from bank_set to mark it visited.

After exhausting the queue without seeing endGene, we return -1 (no valid mutation sequence).

Complexity Analysis

Time: O(N * L * 4) where N = len(bank) and L = 8; effectively O(N).
Space: O(N) for bank_set and the BFS queue.

Edge Cases

endGene not in bank → immediately return -1.
startGene == endGene → if you want to be strict, return 0 (though LeetCode usually ensures startGene ≠ endGene).
Duplicate genes in bank are handled by the set.\n\n## Common Mistakes\n\n- Not checking endGene in bank first — Then you may do unnecessary work or incorrectly return a non-negative value.
Revisiting genes — Without marking visited (removing from set), BFS can cycle and blow up in time.
Precomputing all edges — You can, but generating neighbors on the fly is simpler and efficient enough.\n\n## Related Problems\n\n- LC 127: Word Ladder — Similar BFS over string space for shortest transformation sequence.
LC 1091: Shortest Path in Binary Matrix — BFS for shortest path in a grid.
LC 752: Open the Lock — BFS over state space with digits.\n*** End Patch”}``` завершается ошибкой because ApplyPatch expects just patch, not JSON. I already provided correctly: good. Let’s wait.}–>

Robina Li

433. Minimum Genetic Mutation

433. Minimum Genetic Mutation

Problem Statement

Examples

Constraints

Clarification Questions

Abstraction

Baseline (Naive) — Why it’s suboptimal

Optimized Approach — BFS over gene space

Complexity

Key Insights

Python Solution (BFS)

Algorithm Explanation

Complexity Analysis

Edge Cases

Related Posts

Recent Posts