System Design: URL Shortener
Requirements
- Shorten and resolve URLs; support custom aliases and per-link expiry (TTL); analytics are optional.
APIs
POST /v1/shorten { long_url, custom? } -> { short_code }
GET /{short_code} -> 302 Location: long_url
High-level
Clients → API → Cache (Redis) → DB (MySQL/Cassandra); click events → Kafka → Analytics
Data model
urls(code PK, long_url, created_at, expires_at, owner)
Code generation
- Base62-encode a numeric id, or use hash(long_url) with a collision check; keep a blocklist of reserved words.
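A minimal sketch of the Base62 option, assuming a numeric id is already available (for example from a database sequence). The alphabet ordering and the contents of RESERVED are illustrative assumptions, not part of the notes.

```python
import string

# Alphabet order is an assumption; any fixed ordering of 62 symbols works.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62

# Hypothetical reserved words that must never be issued as codes.
RESERVED = {"admin", "api", "login", "v1"}


def encode_base62(n: int) -> str:
    """Convert a non-negative integer id to a Base62 string."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


def is_allowed(code: str) -> bool:
    """Reject codes that match a reserved word (case-insensitive)."""
    return code.lower() not in RESERVED
```

Seven Base62 characters cover 62^7 (about 3.5 trillion) codes, which at 100M new links per day lasts roughly 96 years.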
Consistency
- Strong on create (unique code constraint); eventual for analytics counts.
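The unique code constraint can be sketched with an in-memory SQLite table standing in for the real database. The helper names (make_db, claim_code) and the two-column table are assumptions made for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory table with a unique code column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls (code TEXT PRIMARY KEY, long_url TEXT NOT NULL)"
    )
    return conn


def claim_code(conn: sqlite3.Connection, code: str, long_url: str) -> bool:
    """Try to claim a code; return False if it is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO urls (code, long_url) VALUES (?, ?)",
                (code, long_url),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected a duplicate: the database, not the
        # application, decides which writer wins.
        return False
```

Because the constraint is enforced by the store, two concurrent creates for the same code cannot both succeed; the loser can retry with a different code or report that the alias is taken.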
Capacity
- 1B reads/day (~11.6k rps avg, 100k rps peak); 100M writes/day (~1.2k rps avg).
- Cache hit > 95% for resolves; Redis with TTL and negative caching.
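The capacity figures can be checked with a few lines of arithmetic. The sketch below also derives the read load that reaches the database at a 95% cache hit ratio; that derived number is not stated in the notes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

reads_per_day = 1_000_000_000
writes_per_day = 100_000_000
peak_read_rps = 100_000
cache_hit_ratio = 0.95  # the stated floor for resolves

avg_read_rps = reads_per_day / SECONDS_PER_DAY
avg_write_rps = writes_per_day / SECONDS_PER_DAY

# Only cache misses reach the database.
db_read_rps_avg = avg_read_rps * (1 - cache_hit_ratio)
db_read_rps_peak = peak_read_rps * (1 - cache_hit_ratio)

print(f"avg reads:     {avg_read_rps:,.0f} rps")
print(f"avg writes:    {avg_write_rps:,.0f} rps")
print(f"DB reads avg:  {db_read_rps_avg:,.0f} rps")
print(f"DB reads peak: {db_read_rps_peak:,.0f} rps")
```

At the stated hit ratio the database sees about 5k read rps at peak, which is the number to size replicas and shards against.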
SLOs
- Resolve P95 < 50 ms; Shorten P95 < 200 ms; 99.9% availability.
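To make the SLOs concrete, the sketch below converts an availability target into a downtime budget and checks a latency sample against a P95 limit. The 30-day window and the nearest-rank percentile method are assumptions; the notes do not specify either.

```python
from typing import Sequence


def downtime_budget_minutes(availability: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability within the window."""
    return (1.0 - availability) * window_days * 24 * 60


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def meets_resolve_slo(samples_ms: Sequence[float], limit_ms: float = 50.0) -> bool:
    """True if the sample's P95 is below the resolve limit."""
    return p95(samples_ms) < limit_ms
```

Under these assumptions, 99.9% availability allows about 43 minutes of downtime per 30 days.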
Failures
- Cache miss storm → single-flight loads per code; pre-warm popular codes; rate limit.
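Single-flight means that concurrent misses for the same code share one backend load. A minimal in-process sketch using threads follows; the class and method names are hypothetical, and a multi-node deployment would need a distributed lock or request coalescing at the cache layer instead.

```python
import threading
from typing import Callable, Dict


class _Call:
    """State shared between the leader and the waiting followers."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Collapse concurrent loads of the same key into one call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, loader: Callable[[], object]) -> object:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = loader()  # only the leader hits the backend
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()  # wake every follower
        else:
            call.done.wait()  # followers reuse the leader's outcome
        if call.error is not None:
            raise call.error
        return call.result
```

Followers receive the leader's result or its exception, so a failing backend is hit once per burst rather than once per request.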
Evolution
- Shard by code prefix; add geo DNS; move analytics to stream processing.
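A small sketch of shard routing, comparing the prefix rule named above with a hash-based alternative. The function names and shard count are assumptions. One caveat it illustrates: if codes come from sequential ids, consecutive codes share a prefix and land on the same shard.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def shard_by_prefix(code: str, num_shards: int) -> int:
    """Route on the first character of the code."""
    return ALPHABET.index(code[0]) % num_shards


def shard_by_hash(code: str, num_shards: int) -> int:
    """Route on a stable hash of the whole code.

    hashlib is used instead of the built-in hash(), which is salted
    per process and would route differently on each node.
    """
    digest = hashlib.sha256(code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Prefix routing keeps range scans by prefix local to one shard; hash routing trades that away for an even spread of writes.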
DDL sketch
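CREATE TABLE urls (
  -- PostgreSQL-style types; on MySQL use TIMESTAMP or DATETIME in place of timestamptz.
  code       varchar(10) PRIMARY KEY,  -- short code; uniqueness enforced here
  long_url   text        NOT NULL,
  created_at timestamptz NOT NULL,
  expires_at timestamptz,              -- NULL: no expiry
  owner      bigint
);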
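The expires_at column implies an expiry check on the resolve path. A minimal sketch of that decision follows; returning 410 Gone for expired links is an assumption (the notes only specify the 302 case), and the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# A row as read from urls: (long_url, expires_at); None if the code is unknown.
Row = Optional[Tuple[str, Optional[datetime]]]


def resolve(row: Row, now: datetime) -> Tuple[int, Optional[str]]:
    """Map a looked-up row to an HTTP status and redirect target."""
    if row is None:
        return 404, None  # unknown code
    long_url, expires_at = row
    if expires_at is not None and expires_at <= now:
        return 410, None  # assumption: expired links answer 410 Gone
    return 302, long_url


if __name__ == "__main__":
    print(resolve(("https://example.com", None), datetime.now(timezone.utc)))
```

Passing the current time in as an argument keeps the check easy to test and avoids mixing naive and timezone-aware datetimes.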
Cache strategy
- Cache-aside with negative caching for 404s; TTL jitter to avoid stampedes; single-flight locks per code.
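The cache-aside flow with negative caching and randomized TTLs can be sketched with a dictionary standing in for Redis. The class name, TTL defaults, and the 10% jitter are illustrative assumptions.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple

_MISSING = object()  # sentinel cached for codes known to be absent


class CacheAside:
    """In-memory stand-in for Redis; names and defaults are illustrative."""

    def __init__(
        self,
        load_from_db: Callable[[str], Optional[str]],
        ttl_s: float = 3600.0,
        negative_ttl_s: float = 60.0,
        jitter: float = 0.1,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.load_from_db = load_from_db
        self.ttl_s = ttl_s
        self.negative_ttl_s = negative_ttl_s
        self.jitter = jitter
        self.clock = clock
        self._store: Dict[str, Tuple[object, float]] = {}

    def _expires_at(self, base_ttl_s: float) -> float:
        # Spread expiries by +/- jitter so hot keys do not expire together.
        skew = random.uniform(-self.jitter, self.jitter)
        return self.clock() + base_ttl_s * (1.0 + skew)

    def get(self, code: str) -> Optional[str]:
        entry = self._store.get(code)
        if entry is not None and entry[1] > self.clock():
            value = entry[0]
            return None if value is _MISSING else value  # cache hit
        value = self.load_from_db(code)  # miss: fall through to the database
        if value is None:
            # Negative entry: remember the 404 briefly to shield the DB.
            self._store[code] = (_MISSING, self._expires_at(self.negative_ttl_s))
        else:
            self._store[code] = (value, self._expires_at(self.ttl_s))
        return value
```

The negative TTL is kept much shorter than the positive one so that a code created shortly after a 404 becomes resolvable quickly.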
Failure drills
- DB hot partition → shard on a hash of the code instead of a raw prefix; add read replicas; rate limit per IP.
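Per-IP rate limiting is commonly done with a token bucket; the notes do not name an algorithm, so that choice is an assumption here. The sketch keeps state in process memory, whereas a real deployment would typically hold it in a shared store.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Allow `burst` requests at once, refilled at `rate_per_s` tokens per second."""

    def __init__(
        self,
        rate_per_s: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_s
        self.burst = burst
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # ip -> (tokens, last_ts)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(ip, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

A request that returns False would be answered with a throttling response before it reaches the cache or the database.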
What Interviewers Look For
URL Shortener Skills
- Code Generation
  - Base62 encoding
  - Collision handling
  - Custom aliases
  - Red Flags: Unhandled collisions, no custom alias support, inefficient encoding
- Caching Strategy
  - Cache-aside pattern
  - Negative caching
  - TTL management
  - Red Flags: No caching, cache stampedes, poor hit rate
- High Read/Write Ratio
  - Read-heavy optimization
  - Cache hit rate > 95%
  - Red Flags: Poor read performance, low cache hit rate, slow resolves
Distributed Systems Skills
- Scalability Design
  - Sharding by code prefix
  - Horizontal scaling
  - Red Flags: No sharding, vertical scaling only, unaddressed bottlenecks
- Consistency Models
  - Strong consistency for creates
  - Eventual for analytics
  - Red Flags: Wrong consistency level for the operation, cannot justify the choice
- Idempotency
  - Unique code constraint
  - Safe retries
  - Red Flags: No idempotency, duplicate codes, race conditions
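One way to get safe retries is to derive the code deterministically from the URL, so a repeated request lands on the same row. The sketch below combines that with a collision check; the function names, the 7-character length, and the use of SHA-256 are assumptions, and a dictionary stands in for the table with its unique constraint.

```python
import hashlib
import string
from typing import Dict

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def code_for(long_url: str, attempt: int = 0, length: int = 7) -> str:
    """Deterministic fixed-length Base62 code from SHA-256 of (attempt, URL)."""
    digest = hashlib.sha256(f"{attempt}:{long_url}".encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big") % (62 ** length)
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def shorten(store: Dict[str, str], long_url: str, max_attempts: int = 5) -> str:
    """Create or return the code for long_url; safe to retry."""
    for attempt in range(max_attempts):
        code = code_for(long_url, attempt)
        # setdefault stands in for an INSERT guarded by the unique constraint.
        owner_url = store.setdefault(code, long_url)
        if owner_url == long_url:
            return code  # newly created, or a retry of an earlier success
        # Otherwise another URL holds this code: try the next attempt.
    raise RuntimeError("could not find a free code")
```

With this scheme every caller shortening the same URL gets the same code. If links must be distinct per owner, the owner id would need to be part of the hashed input.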
Problem-Solving Approach
- Cache Stampede Prevention
  - Single-flight locks
  - Warmup strategies
  - Red Flags: Cache stampedes, no protection, poor performance
- Edge Cases
  - Hot partitions
  - Cache misses
  - Expired URLs
  - Red Flags: Ignoring edge cases, no handling
- Trade-off Analysis
  - Consistency vs performance
  - Storage vs cost
  - Red Flags: No trade-offs, dogmatic choices
System Design Skills
- Component Design
  - Shorten service
  - Resolve service
  - Analytics service
  - Red Flags: Monolithic, unclear boundaries
- Database Design
  - Proper indexing
  - Sharding strategy
  - Red Flags: Missing indexes, no sharding, poor queries
- Analytics Design
  - Click tracking
  - Stream processing
  - Red Flags: No analytics, synchronous processing on the resolve path, bottlenecks
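Keeping click tracking off the resolve path can be sketched with a bounded in-process queue and a background worker. The queue is only a stand-in for a Kafka topic and its consumer; the class name and the drop-when-full policy are assumptions.

```python
import queue
import threading
from collections import Counter


class ClickTracker:
    """Record clicks without blocking the caller; count them in the background."""

    def __init__(self, max_pending: int = 10_000) -> None:
        self._events: "queue.Queue[str]" = queue.Queue(maxsize=max_pending)
        self.counts: Counter = Counter()
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def record(self, code: str) -> bool:
        """Called on the resolve path. Returns False if the event was dropped."""
        try:
            self._events.put_nowait(code)
            return True
        except queue.Full:
            return False  # shed analytics load rather than slow down redirects

    def _consume(self) -> None:
        while True:
            code = self._events.get()
            self.counts[code] += 1
            self._events.task_done()

    def flush(self) -> None:
        """Block until every queued event has been counted."""
        self._events.join()
```

Counts lag behind clicks by however long events sit in the queue, which matches the eventual consistency chosen for analytics.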
Communication Skills
- Architecture Explanation
  - Can explain code generation
  - Understands caching strategy
  - Red Flags: Cannot explain the design, vague explanations
- Scale Explanation
  - Can explain scaling strategies
  - Understands bottlenecks
  - Red Flags: Cannot identify bottlenecks, vague scaling claims
Meta-Specific Focus
- High-Throughput Systems Expertise
  - Read-heavy optimization
  - Caching expertise
  - Key: Show high-throughput systems expertise
- Simple but Scalable Design
  - Clean architecture
  - Efficient operations
  - Key: Demonstrate simple but scalable design