System Design: Notification/Push Service
Requirements
- Multi-channel delivery: mobile push (APNs/FCM), email, SMS.
- Templates, user preferences, retries, and deduplication.
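A minimal sketch of the template requirement, assuming templates are keyed by template ID and channel. The `TEMPLATES` table, the `order_shipped` ID, and the `render` helper are hypothetical names used only for illustration; a real service would load templates from a store rather than a module-level dict.

```python
from string import Template

# Hypothetical templates keyed by (template_id, channel).
TEMPLATES = {
    ("order_shipped", "push"): Template("Your order $order_id has shipped."),
    ("order_shipped", "email"): Template("Hi $name, your order $order_id is on its way."),
}


def render(template_id: str, channel: str, params: dict) -> str:
    """Render the channel-specific body.

    Raises KeyError if the template or a required parameter is missing,
    so a bad payload fails before anything is sent.
    """
    return TEMPLATES[(template_id, channel)].substitute(params)
```

Failing loudly on a missing parameter is one possible choice; it keeps half-rendered messages from reaching users.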
Architecture
Producers → Topic (Kafka) → Router → Channel Workers (APNs/FCM/SMTP/SMS) → Providers
  ↘ Store (Postgres) for receipts, logs, preferences
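The Router step in the flow above could look roughly like this. It is a sketch under assumptions: the `Event` shape, the `CHANNELS` tuple, and the in-memory `preferences` mapping are all hypothetical, and treating an unknown user as opted out of everything is a design choice made here, not something the notes specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    user_id: str
    template_id: str


# Hypothetical channel names; order here sets the order of emitted jobs.
CHANNELS = ("push", "email", "sms")


def route(event: Event, preferences: dict[str, set[str]]) -> list[tuple[str, Event]]:
    """Return one (channel, event) job per channel the user has opted into.

    Users with no stored preferences get no jobs (assumed default-deny).
    """
    enabled = preferences.get(event.user_id, set())
    return [(channel, event) for channel in CHANNELS if channel in enabled]
```

Each returned job would then be published to the matching channel worker queue.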
SLOs
- Enqueue→provider ACK: P95 < 300 ms.
- Exactly-once user experience via dedupe keys: delivery is at-least-once, and duplicates are dropped before the provider call.
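A sketch of how dedupe keys might give the exactly-once experience. `DedupeStore` is an in-memory stand-in for a shared store that supports set-if-absent with a TTL; the key layout in `dedupe_key` (event, user, channel) is an assumption, as is the injectable `clock` used to make the behavior testable.

```python
import time


class DedupeStore:
    """In-memory stand-in for a shared set-if-absent store with TTL."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: dict[str, float] = {}

    def first_seen(self, key: str) -> bool:
        """Return True only the first time `key` is seen within the TTL window."""
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate inside the window: caller should skip the send
        # New or expired key: (re)claim it. Expired keys are only overwritten
        # lazily; this sketch has no background eviction.
        self._expires_at[key] = now + self._ttl
        return True


def dedupe_key(event_id: str, user_id: str, channel: str) -> str:
    # Assumed key layout: one delivery per event, user, and channel.
    return f"{event_id}:{user_id}:{channel}"
```

A worker would call `first_seen(dedupe_key(...))` before contacting the provider and drop the job when it returns False.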
Capacity
- Fanout bursts of 100k notifications/sec; workers scale horizontally; sends stay within provider quotas.
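One way to turn the burst figure into a worker count is Little's law (in-flight sends = arrival rate × time per send). Only the 100k/sec rate comes from the notes; the 50 ms provider round trip and 100 concurrent sends per worker below are assumed numbers chosen for illustration.

```python
import math


def workers_needed(rate_per_sec: int, send_latency_ms: int, concurrency_per_worker: int) -> int:
    """Estimate workers from Little's law, ignoring headroom and retries."""
    in_flight = rate_per_sec * send_latency_ms / 1000  # sends in progress at once
    return math.ceil(in_flight / concurrency_per_worker)


# 100k/sec (from the notes), 50 ms per send and 100 sends per worker (assumed).
print(workers_needed(100_000, 50, 100))  # 50
```

In practice the estimate would be padded for retries, uneven partitions, and provider quota limits.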
Failure modes
- Provider outage → open the circuit breaker and requeue with exponential backoff; fall back to another channel where one is available.
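The requeue delay could be computed with exponential backoff plus full jitter, sketched below. The `base` and `cap` defaults are assumptions, and the injectable `rng` exists only so the function can be tested deterministically.

```python
import random


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0, rng=random.random) -> float:
    """Full-jitter exponential backoff.

    Returns a delay drawn uniformly from [0, min(cap, base * 2**attempt)),
    where attempt starts at 0 for the first retry.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

Jitter spreads retries out so that many workers requeueing at once do not hit a recovering provider in lockstep.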
What Interviewers Look For
Notification Systems Skills
- Multi-Channel Support
  - APNs/FCM for mobile push
  - Email/SMS channels
  - Template management
  - Red Flags: Single channel, no templates, poor template management
- Reliability & Delivery
  - Exactly-once user experience (at-least-once delivery plus dedupe)
  - Retry mechanisms
  - Deduplication
  - Red Flags: Duplicate notifications, no retry, unreliable delivery
- Provider Integration
  - Provider quotas
  - Circuit breakers
  - Fallback strategies
  - Red Flags: No quota management, no circuit breakers, unhandled provider failures
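The circuit-breaker item under Provider Integration can be sketched as a small state holder that each channel worker consults before calling its provider. The thresholds and the `clock` parameter are assumptions; this version also lets every call through once the cool-down has elapsed, where a stricter half-open state would admit a single trial call.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        """Closed: allow. Open: block until reset_after has elapsed, then allow trial calls."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self._reset_after

    def record_success(self) -> None:
        # A successful call closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            # Open the circuit, or re-open it after a failed trial call.
            self._opened_at = self._clock()
```

When `allow()` returns False, the worker would requeue the job or hand it to a fallback channel instead of calling the provider.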
Distributed Systems Skills
- Message Queue Design
  - Kafka for high throughput
  - Topic partitioning
  - Consumer groups
  - Red Flags: Wrong queue choice, no partitioning, bottlenecks
- Scalability Design
  - Horizontal scaling
  - Worker pools
  - Load distribution
  - Red Flags: Vertical-only scaling, bottlenecks, poor distribution
- SLO Management
  - Latency targets (P95 < 300 ms)
  - Throughput targets (100k/sec)
  - Red Flags: No SLOs, high latency, low throughput
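Topic partitioning is often discussed in terms of a partition key. The sketch below assumes the key is the user ID, so one user's notifications land on one partition and keep their order. `partition_for` is a hypothetical helper using SHA-256 purely for illustration; it is not the hash function Kafka clients apply to record keys.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user to a stable partition for a fixed partition count.

    Uses a content hash rather than Python's built-in hash(), which is
    randomized per process for strings.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Changing the partition count remaps users, which is worth raising as a trade-off when discussing repartitioning.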
Problem-Solving Approach
- Failure Handling
  - Provider outages
  - Network failures
  - Rate limiting
  - Red Flags: Ignoring failures, no handling, poor recovery
- Edge Cases
  - Duplicate notifications
  - Provider quotas
  - Channel failures
  - Red Flags: Ignoring edge cases, no handling
- Trade-off Analysis
  - Latency vs reliability
  - Cost vs features
  - Red Flags: No trade-offs, dogmatic choices
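Rate limiting against provider quotas is commonly handled with a token bucket; a minimal single-process sketch follows. The rate and burst values are whatever quota the provider grants (none are given in the notes), and a fleet of workers would need a shared limiter rather than this local one.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: float, clock=time.monotonic):
        self._rate = rate_per_sec
        self._capacity = burst
        self._tokens = burst  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Take n tokens if available; never blocks."""
        now = self._clock()
        # Refill for the elapsed time, capped at the bucket capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= n:
            self._tokens -= n
            return True
        return False
```

A worker that fails to acquire a token would delay or requeue the send rather than risk a quota rejection from the provider.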
System Design Skills
- Component Design
  - Router service
  - Channel workers
  - Preference store
  - Red Flags: Monolithic, unclear boundaries
- Storage Design
  - Receipt tracking
  - Preference management
  - Logging
  - Red Flags: No tracking, no preferences, no logs
- Monitoring
  - Delivery rates
  - Provider health
  - Latency metrics
  - Red Flags: No monitoring, no metrics, no visibility
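The delivery-rate and latency metrics could be derived from stored receipts roughly as follows. Both helpers are hypothetical, and the nearest-rank percentile is one of several common definitions; a production system would more likely use histogram-based metrics than sort raw samples.

```python
def delivery_rate(delivered: int, attempted: int) -> float:
    """Fraction of attempted sends that the provider acknowledged."""
    return delivered / attempted if attempted else 0.0


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[rank - 1]


# Check a batch of enqueue-to-ACK latencies (ms) against a P95 < 300 ms target.
latencies_ms = [float(x) for x in range(1, 101)]
print(percentile(latencies_ms, 95) < 300)  # True
```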
Communication Skills
- Architecture Explanation
  - Can explain notification flow
  - Understands multi-channel
  - Red Flags: No understanding, vague explanations
- Reliability Explanation
  - Can explain retry logic
  - Understands deduplication
  - Red Flags: No understanding, vague
Meta-Specific Focus
- Notification Systems Expertise
  - Multi-channel knowledge
  - Reliability focus
  - Key: Show notification systems expertise
- Scale & Performance
  - High throughput design
  - Low latency
  - Key: Demonstrate scale expertise