1233. Remove Sub-Folders from the Filesystem
1233. Remove Sub-Folders from the Filesystem
Problem Statement
Given a list of folders folder, return the folders after removing all sub-folders in those folders. You may return the answer in any order.
If a folder[i] is located within another folder[j], it is called a sub-folder of it.
The format of a path is one or more concatenated strings of the form: '/' followed by one or more lowercase English letters.
For example, "/leetcode" and "/leetcode/problems" are valid paths while an empty string and "/" are not.
Examples
Example 1:
Input: folder = ["/a","/a/b","/c/d","/c/d/e","/c/f"]
Output: ["/a","/c/d","/c/f"]
Explanation: Folders "/a/b" is a subfolder of "/a" and "/c/d/e" is inside of folder "/c/d" in our filesystem.
Example 2:
Input: folder = ["/a","/a/b/c","/a/b/d"]
Output: ["/a"]
Explanation: Folders "/a/b/c" and "/a/b/d" will be removed because they are subfolders of "/a".
Example 3:
Input: folder = ["/a/b/c","/a/b/ca","/a/b/d"]
Output: ["/a/b/c","/a/b/ca","/a/b/d"]
Explanation: None of the folders are subfolders of another folder.
Constraints
1 <= folder.length <= 4 * 10^42 <= folder[i].length <= 100folder[i]contains only lowercase letters and'/'folder[i]always starts with'/'- Each folder name is unique
Clarification Questions
Before diving into the solution, here are 5 important clarifications and assumptions to discuss during an interview:
-
Subfolder definition: What exactly defines a subfolder? (Assumption: Folder A is a subfolder of folder B if A’s path starts with B’s path followed by ‘/’, e.g., “/a/b” is subfolder of “/a”)
-
Path format: Are all paths absolute (starting with ‘/’)? (Assumption: Yes - all paths start with ‘/’ and don’t have trailing ‘/’)
-
Case sensitivity: Are folder names case-sensitive? (Assumption: Yes - only lowercase letters are allowed per constraints)
-
Empty folders: Can there be empty folder paths? (Assumption: No - minimum length is 2, so at least “/a” format)
-
Duplicate folders: Can the same folder path appear multiple times? (Assumption: Based on constraints, folder names are unique, so no duplicates)
Interview Deduction Process (20 minutes)
Step 1: Brute-Force Approach (5 minutes)
For each folder path, check if it’s a subfolder of any other folder by checking if any other path is a prefix of it. Remove all folders that are subfolders of others. This requires comparing each folder with every other folder, giving O(n² × m) complexity where n is the number of folders and m is the average path length. This is too slow for large inputs.
Step 2: Semi-Optimized Approach (7 minutes)
Sort the folder paths lexicographically. After sorting, if a folder is a subfolder of another, its parent must appear before it in the sorted order. Iterate through sorted paths, and for each path, check if it starts with the previous path followed by ‘/’. If yes, it’s a subfolder and should be removed. This reduces complexity to O(n log n + n × m) for sorting and checking, which is better but can be optimized further.
Step 3: Optimized Solution (8 minutes)
Use a Trie data structure to efficiently check prefix relationships. Insert all folder paths into a trie, marking nodes that represent complete folder paths. When inserting, if we encounter a node already marked as a complete path, the current path is a subfolder and should be skipped. Alternatively, after building the trie, traverse it and collect only paths that are not subfolders. This achieves O(n × m) time complexity where m is the average path length, as we process each character once. The trie naturally handles prefix checking efficiently, making it the optimal approach for this problem.
Solution Approach
This problem requires identifying and removing folder paths that are subfolders of other folders. A folder is a subfolder if its path is a prefix of another folder’s path.
Key Insights:
- Prefix Matching: A folder is a subfolder if its path is a prefix of another folder
- Trie Approach: Build a trie of all folders, then check if any prefix is marked as end
- Sorting Approach: Sort folders lexicographically, then check if current folder is a prefix of previous
- Path Parsing: Split paths by
'/'to process folder names individually
Solution 1: Trie-Based Approach
struct TrieNode{
bool isEnd;
unordered_map<string, TrieNode*> children;
TrieNode(): isEnd(false){}
};
class Solution {
public:
Solution(): root(new TrieNode()) {}
~Solution() {deleteTrie(root);}
vector<string> removeSubfolders(vector<string>& folder) {
for(auto& path: folder) {
TrieNode* curr = root;
istringstream iss(path);
string folderName;
while(getline(iss, folderName, '/')) {
if(folderName.empty()) continue;
if(!curr->children.contains(folderName)) {
curr->children[folderName] = new TrieNode();
}
curr = curr->children[folderName];
}
curr->isEnd = true;
}
vector<string> rtn;
for(auto& path: folder) {
TrieNode* curr = root;
istringstream iss(path);
string folderName;
bool isSubFolder = false;
while(getline(iss, folderName, '/')) {
if(folderName.empty()) continue;
auto it = curr->children.find(folderName);
if(it == curr->children.end()) break;
TrieNode* nextNode = it->second;
if(nextNode->isEnd && iss.rdbuf()->in_avail() != 0) {
isSubFolder = true;
break;
}
curr = nextNode;
}
if(!isSubFolder) rtn.push_back(path);
}
return rtn;
}
private:
TrieNode* root;
void deleteTrie(TrieNode* node) {
if(!node) return;
for(auto& [_, child]: node->children) {
deleteTrie(child);
}
delete node;
}
};
Algorithm Explanation:
TrieNode Structure:
isEnd: Marks if this node represents the end of a folder pathchildren: Map from folder name to child TrieNode
Build Trie (First Pass):
- Parse Each Path: Split path by
'/'usingistringstreamandgetline - Skip Empty: Ignore empty strings (from leading
/) - Create Nodes: For each folder name, create node if missing
- Mark End: Mark final node as
isEnd = true
Check Subfolders (Second Pass):
- Traverse Path: For each folder path, traverse trie following folder names
- Check Prefix: If we encounter a node with
isEnd = trueand there are more folders remaining (iss.rdbuf()->in_avail() != 0), it’s a subfolder - Add to Result: Only add folders that are not subfolders
Memory Management:
- Destructor: Recursively deletes all nodes to prevent memory leaks
deleteTrie: Helper function for recursive deletion
Example Walkthrough:
Input: folder = ["/a","/a/b","/c/d","/c/d/e","/c/f"]
Step 1: Build Trie
"/a": root -> "a" (isEnd=true)
"/a/b": root -> "a" -> "b" (isEnd=true)
"/c/d": root -> "c" -> "d" (isEnd=true)
"/c/d/e": root -> "c" -> "d" -> "e" (isEnd=true)
"/c/f": root -> "c" -> "f" (isEnd=true)
Step 2: Check Each Path
"/a": Traverse "a", isEnd=true, no more folders → NOT subfolder ✓
"/a/b": Traverse "a", isEnd=true, more folders exist → IS subfolder ✗
"/c/d": Traverse "c" -> "d", isEnd=true, no more → NOT subfolder ✓
"/c/d/e": Traverse "c" -> "d", isEnd=true, more exist → IS subfolder ✗
"/c/f": Traverse "c" -> "f", isEnd=true, no more → NOT subfolder ✓
Result: ["/a","/c/d","/c/f"] ✓
Complexity Analysis:
- Time Complexity: O(n × m)
- Building trie: O(n × m) where n = number of folders, m = average path length
- Checking subfolders: O(n × m)
- Overall: O(n × m)
- Space Complexity: O(n × m)
- Trie structure: O(n × m) in worst case
- Result array: O(n)
- Overall: O(n × m)
Solution 2: Sorting-Based Approach
class Solution {
public:
vector<string> removeSubfolders(vector<string>& folder) {
sort(folder.begin(), folder.end());
vector<string> rtn;
for(const string& f: folder) {
if(rtn.empty() || f.compare(0, rtn.back().size(), rtn.back()) != 0 ||
f[rtn.back().size()] != '/'){
rtn.push_back(f);
}
}
return rtn;
}
};
Algorithm Explanation:
- Sort Folders: Sort all folders lexicographically
- After sorting, parent folders come before their subfolders
- Example:
["/a", "/a/b", "/c/d"]stays in order
- Check Prefix: For each folder:
- If result is empty, add it
- Otherwise, check if current folder is a prefix of the last added folder
- Key Check:
f.compare(0, rtn.back().size(), rtn.back()) != 0- Compares first
rtn.back().size()characters offwithrtn.back() - Returns non-zero if they differ
- Compares first
- Additional Check:
f[rtn.back().size()] != '/'- Ensures the next character after prefix is
/(not just a prefix match) - Prevents false positives like
/a/bcmatching/a/b
- Ensures the next character after prefix is
- Add to Result: Only add if not a subfolder
Example Walkthrough:
Input: folder = ["/a","/a/b","/c/d","/c/d/e","/c/f"]
Step 1: Sort
["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]
Step 2: Process
"/a": rtn empty → add → rtn = ["/a"]
"/a/b": compare("/a/b", 0, 2, "/a") = 0 (matches)
"/a/b"[2] = '/' → IS subfolder ✗
"/c/d": compare("/c/d", 0, 2, "/a") != 0 → NOT subfolder ✓
rtn = ["/a", "/c/d"]
"/c/d/e": compare("/c/d/e", 0, 4, "/c/d") = 0 (matches)
"/c/d/e"[4] = '/' → IS subfolder ✗
"/c/f": compare("/c/f", 0, 4, "/c/d") != 0 → NOT subfolder ✓
rtn = ["/a", "/c/d", "/c/f"]
Result: ["/a","/c/d","/c/f"] ✓
Why the Additional Check is Needed:
Consider: folder = ["/a/b", "/a/bc"]
- Without
f[rtn.back().size()] != '/'check:/a/bcwould be incorrectly identified as subfolder of/a/b
- With the check:
"/a/bc"[4] = 'c'(not/) → NOT subfolder ✓
Complexity Analysis:
- Time Complexity: O(n log n + n × m)
- Sorting: O(n log n)
- Processing: O(n × m) where m = average path length
- Overall: O(n log n + n × m)
- Space Complexity: O(1) excluding output
- Sorting: O(1) if in-place
- Result array: O(n)
- Overall: O(n) for output
Key Insights
- Prefix Matching: A folder is a subfolder if its path is a prefix of another folder
- Trie Approach:
- Efficient for checking if any prefix exists
- More memory intensive but clearer logic
- Sorting Approach:
- Simpler code, more efficient
- Leverages lexicographic ordering property
- Path Parsing: Use
istringstreamandgetlineto split by/ - Edge Case: Check that prefix match is followed by
/to avoid false positives
Edge Cases
- Single folder:
["/a"]→["/a"] - No subfolders:
["/a/b", "/c/d"]→["/a/b", "/c/d"] - All subfolders:
["/a", "/a/b", "/a/b/c"]→["/a"] - Similar prefixes:
["/a/b", "/a/bc"]→ Both kept (not subfolders) - Deep nesting:
["/a", "/a/b", "/a/b/c", "/a/b/c/d"]→["/a"]
Common Mistakes
- False prefix match:
/a/bcmatching/a/bwithout checking next character - Wrong comparison: Using
startsWithwithout verifying/boundary - Memory leaks: Not deleting trie nodes in C++
- Empty string handling: Not skipping empty strings from leading
/ - Sorting order: Not understanding lexicographic ordering
Comparison of Approaches
| Aspect | Trie Approach | Sorting Approach |
|---|---|---|
| Time | O(n × m) | O(n log n + n × m) |
| Space | O(n × m) | O(n) |
| Code Complexity | More verbose | Simpler |
| Memory Management | Requires cleanup | No cleanup needed |
| Clarity | More explicit | More concise |
| Best For | When paths are long | When simplicity preferred |
Related Problems
- LC 208: Implement Trie (Prefix Tree) - Trie basics
- LC 648: Replace Words - Trie for prefix replacement
- LC 720: Longest Word in Dictionary - Trie traversal
- LC 14: Longest Common Prefix - Prefix matching
This problem demonstrates two approaches to prefix matching: Trie for explicit path structure representation and Sorting for leveraging lexicographic ordering. The sorting approach is generally preferred for its simplicity and efficiency.