Design a Pub/Sub System - System Design Interview

Introduction

Designing a publish-subscribe (pub/sub) messaging system is a complex distributed systems problem that tests your ability to build low-latency, real-time messaging infrastructure. The system must support publishers sending messages to topics and subscribers receiving those messages in near real-time, with the key characteristic that messages are ephemeral—they are lost if no subscribers are currently listening.

This post provides a detailed walkthrough of designing a pub/sub system, covering key architectural decisions, topic routing, connection management, ephemeral delivery guarantees, hot-topic handling, backpressure, and horizontal scaling. This is a common system design interview question that tests your understanding of distributed systems, WebSocket connections, message fan-out, and real-time communication patterns.

Problem Statement
Requirements
- Functional Requirements
- Non-Functional Requirements
Capacity Estimation
- Traffic Estimates
- Storage Estimates
Core Entities
API
Data Flow
High-Level Design
Deep Dive
What Interviewers Look For
Summary

Problem Statement

Design a publish-subscribe messaging system where:

Publishers can send messages to topics
Subscribers can listen to topics and receive messages in real-time
Messages are ephemeral—lost if no subscribers are currently listening
Low-latency delivery to connected subscribers
Support for topic management and permissions
Handle hot topics with many subscribers
Support horizontal scaling

Scale Requirements:

1M+ concurrent subscribers
100K+ topics
1M+ messages per second
< 50ms message delivery latency (P95)
Support topics with 100K+ subscribers
Handle connection churn (frequent connect/disconnect)

Key Characteristics:

Ephemeral Delivery: Messages only delivered to currently connected subscribers
At-Most-Once: Messages may be lost (no durability)
Real-Time: Low-latency delivery (< 50ms)
No Retention: Messages not stored or replayed
Topic-Based: Messages routed by topic

Requirements

Functional Requirements

Core Features:

Topic Management: Create topics, manage permissions (who can publish/subscribe)
Publish Messages: Publishers send messages to topics
Subscribe to Topics: Subscribers connect and listen to topics
Unsubscribe: Subscribers can unsubscribe from topics
Real-Time Delivery: Messages delivered immediately to connected subscribers
Presence: See who’s subscribed to a topic
Ephemeral Delivery: Messages dropped if no subscribers connected

Topic Permissions:

Public: Anyone can publish/subscribe
Private: Only authorized users can publish/subscribe
Publish-Only: Anyone can publish, only authorized can subscribe
Subscribe-Only: Only authorized can publish, anyone can subscribe

Out of Scope:

Message persistence/durability
Message replay/history
Message ordering guarantees (best-effort)
Message acknowledgments
Dead letter queues

Non-Functional Requirements

Availability: 99.9% uptime
Performance:
- Message delivery latency: < 50ms (P95)
- Topic creation: < 100ms
- Subscribe/unsubscribe: < 100ms
Scalability: Handle 1M+ concurrent subscribers, 1M+ messages/second
Consistency: Eventually consistent for presence (who’s subscribed)
Ephemeral: No message storage, messages lost if no subscribers
Real-Time: Messages delivered immediately to connected subscribers

Capacity Estimation

Traffic Estimates

Concurrent Subscribers: 1M+
Topics: 100K+
Messages per Second: 1M+ (peak)
Average Subscribers per Topic: 10
Hot Topics: 100K+ subscribers per topic
Connection Churn: 10% per minute (100K connects/disconnects per minute)

Storage Estimates

Topic Metadata:

100K topics × 1KB = 100MB
Topic permissions, settings

Connection State:

1M connections × 256 bytes = 256MB (in-memory)
Connection-to-topic mappings

Presence Data:

1M subscribers × 128 bytes = 128MB (in-memory)
Who’s subscribed to which topics

Total Storage: ~500MB (mostly in-memory, no message storage)

Core Entities

Topic

Attributes: topic_id, name, owner_id, permissions, created_at, subscriber_count
Permissions: PUBLIC, PRIVATE, PUBLISH_ONLY, SUBSCRIBE_ONLY
Relationships: Has many subscribers, receives many messages

Subscription

Attributes: subscription_id, topic_id, user_id, subscribed_at, connection_id
Relationships: Links user to topic via connection
Purpose: Track active subscriptions

Message

Attributes: message_id, topic_id, publisher_id, payload, timestamp
Relationships: Belongs to topic, sent by publisher
Purpose: Ephemeral message (not stored)

Connection

Attributes: connection_id, user_id, connected_at, last_heartbeat, topics
Relationships: Has many subscriptions
Purpose: WebSocket/SSE connection

API

1. Create Topic

POST /api/v1/topics
Headers:
  - Authorization: Bearer <token>
Body:
  - name: string
  - permissions: string (PUBLIC, PRIVATE, PUBLISH_ONLY, SUBSCRIBE_ONLY)
Response:
  - topic_id: string
  - name: string
  - permissions: string

2. Publish Message

POST /api/v1/topics/{topic_id}/publish
Headers:
  - Authorization: Bearer <token>
Body:
  - payload: object (JSON)
Response:
  - message_id: string
  - delivered_to: integer (number of subscribers)

WS /ws/subscribe?topic_id={topic_id}&token={auth_token}
Messages:
  - Incoming: {"type": "message", "topic_id": "...", "payload": {...}}
  - Incoming: {"type": "presence", "topic_id": "...", "subscribers": [...]}
  - Outgoing: {"type": "subscribe", "topic_id": "..."}
  - Outgoing: {"type": "unsubscribe", "topic_id": "..."}

GET /api/v1/topics/{topic_id}/subscribe
Headers:
  - Authorization: Bearer <token>
  - Accept: text/event-stream
Response:
  - Stream of Server-Sent Events
  - data: {"type": "message", "payload": {...}}

5. Unsubscribe from Topic

DELETE /api/v1/topics/{topic_id}/subscribe
Headers:
  - Authorization: Bearer <token>
Response:
  - success: boolean

6. Get Topic Info

GET /api/v1/topics/{topic_id}
Headers:
  - Authorization: Bearer <token>
Response:
  - topic_id: string
  - name: string
  - subscriber_count: integer
  - permissions: string

7. List Subscriptions

GET /api/v1/topics/{topic_id}/subscribers
Headers:
  - Authorization: Bearer <token>
Response:
  - subscribers: array of user objects
  - count: integer

Data Flow

Publish Message Flow

1. Publisher → API Gateway
2. API Gateway → Auth Service (validate token)
3. API Gateway → Pub/Sub Service
4. Pub/Sub Service:
   a. Validate topic exists
   b. Check publish permission
   c. Get list of active subscribers for topic
   d. If no subscribers: Drop message (ephemeral)
   e. If subscribers exist:
      - Create message object
      - Fan-out to all subscriber connections
      - Return delivery count

1. Client → WebSocket/SSE Connection
2. Connection Manager:
   a. Authenticate connection
   b. Create connection record
   c. Add subscription to topic
   d. Update topic subscriber count
   e. Send initial presence info
   f. Start delivering messages

Message Delivery Flow

1. Publisher publishes message to topic
2. Pub/Sub Service:
   a. Get all active subscriptions for topic
   b. For each subscription:
      - Get connection
      - Check connection is alive (heartbeat)
      - Send message via WebSocket/SSE
   c. If connection dead: Remove subscription
   d. Return delivery count

Unsubscribe Flow

1. Client → Unsubscribe request
2. Connection Manager:
   a. Remove subscription from topic
   b. Update topic subscriber count
   c. Stop delivering messages
   d. Return success

High-Level Design

┌─────────────────────────────────────────────────────────┐
│                    Publishers                            │
│              (Send Messages to Topics)                  │
└────────────────────┬────────────────────────────────────┘
                     │
                     │ HTTP POST
                     │
┌────────────────────▼────────────────────────────────────┐
│                  API Gateway / LB                        │
│              (Rate Limiting, Auth)                       │
└────────────────────┬────────────────────────────────────┘
                     │
                     │
┌────────────────────▼────────────────────────────────────┐
│              Pub/Sub Service                            │
│  (Topic Management, Message Routing, Fan-Out)           │
└──────┬───────────────────────────────────┬──────────────┘
       │                                   │
       │                                   │
┌──────▼──────────┐              ┌─────────▼───────────┐
│  Topic Router   │              │  Connection Manager │
│  (Route by Topic)│              │  (WebSocket/SSE)    │
└──────┬──────────┘              └─────────┬───────────┘
       │                                   │
       │                                   │
┌──────▼───────────────────────────────────▼───────────┐
│         Redis Cluster                                  │
│  (Topic Metadata, Subscriptions, Presence)            │
└──────┬─────────────────────────────────────────────────┘
       │
       │
┌──────▼─────────────────────────────────────────────────┐
│         Message Queue (Optional)                      │
│  (For Cross-Server Fan-Out, Hot Topics)                │
└───────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│                    Subscribers                          │
│              (WebSocket/SSE Connections)                │
└─────────────────────────────────────────────────────────┘

Deep Dive

Component Design

1. Pub/Sub Service

Responsibilities: Topic management, message routing, fan-out
Optimization:
- Fast topic lookups
- Efficient subscriber enumeration
- Message batching for hot topics

2. Topic Router

Responsibilities: Route messages to correct topic handlers
Optimization:
- Hash-based routing
- Topic sharding
- Load balancing

3. Connection Manager

Responsibilities: Manage WebSocket/SSE connections, subscriptions
Optimization:
- Connection pooling
- Efficient connection-to-topic mapping
- Heartbeat management

4. Redis Cluster

Topic Metadata: Topic info, permissions
Subscriptions: Topic → [connection_ids]
Presence: Who’s subscribed to which topics
Connection State: Connection info, heartbeat

Topic Routing

Topic Sharding

Challenge: Distribute topics across multiple servers.

Solution: Hash-based sharding by topic_id

class TopicRouter:
    def __init__(self, num_shards=100):
        self.num_shards = num_shards
        self.shards = [TopicShard(i) for i in range(num_shards)]
    
    def get_shard(self, topic_id):
        """Route topic to shard"""
        shard_index = hash(topic_id) % self.num_shards
        return self.shards[shard_index]
    
    def publish(self, topic_id, message):
        """Route publish to correct shard"""
        shard = self.get_shard(topic_id)
        return shard.publish(topic_id, message)

Benefits:

Even distribution of topics
Independent scaling per shard
Isolated failures

Topic Metadata Storage

Redis Hash:

# Topic metadata
topic:{topic_id} = {
    "name": "chat-room-1",
    "owner_id": "user_123",
    "permissions": "PUBLIC",
    "subscriber_count": 1000,
    "created_at": "2025-11-25T10:00:00Z"
}

Connection Management

WebSocket Connection

Connection Lifecycle:

Connect: Client opens WebSocket connection
Authenticate: Validate token, create connection record
Subscribe: Client subscribes to topics
Heartbeat: Ping/pong every 30 seconds
Disconnect: Clean up subscriptions

Connection Mapping:

class ConnectionManager:
    def __init__(self):
        self.connections = {}  # connection_id -> Connection
        self.topic_subscriptions = {}  # topic_id -> [connection_ids]
        self.user_connections = {}  # user_id -> [connection_ids]
    
    def add_connection(self, connection_id, user_id, websocket):
        """Add new connection"""
        connection = Connection(
            connection_id=connection_id,
            user_id=user_id,
            websocket=websocket,
            connected_at=now()
        )
        self.connections[connection_id] = connection
        
        if user_id not in self.user_connections:
            self.user_connections[user_id] = []
        self.user_connections[user_id].append(connection_id)
    
    def subscribe(self, connection_id, topic_id):
        """Subscribe connection to topic"""
        if topic_id not in self.topic_subscriptions:
            self.topic_subscriptions[topic_id] = []
        
        if connection_id not in self.topic_subscriptions[topic_id]:
            self.topic_subscriptions[topic_id].append(connection_id)
        
        # Update Redis
        redis.sadd(f"topic:{topic_id}:subscribers", connection_id)
    
    def unsubscribe(self, connection_id, topic_id):
        """Unsubscribe connection from topic"""
        if topic_id in self.topic_subscriptions:
            self.topic_subscriptions[topic_id].remove(connection_id)
        
        # Update Redis
        redis.srem(f"topic:{topic_id}:subscribers", connection_id)

Heartbeat Management

Purpose: Detect dead connections

Implementation:

class ConnectionManager:
    def heartbeat(self, connection_id):
        """Update connection heartbeat"""
        if connection_id in self.connections:
            self.connections[connection_id].last_heartbeat = now()
            redis.setex(
                f"connection:{connection_id}:heartbeat",
                60,  # 60 second TTL
                str(now())
            )
    
    def check_connections(self):
        """Check for dead connections"""
        for connection_id, connection in self.connections.items():
            if now() - connection.last_heartbeat > 60:
                # Connection dead, cleanup
                self.remove_connection(connection_id)
    
    def remove_connection(self, connection_id):
        """Remove dead connection"""
        connection = self.connections[connection_id]
        
        # Unsubscribe from all topics
        for topic_id in connection.subscribed_topics:
            self.unsubscribe(connection_id, topic_id)
        
        # Remove connection
        del self.connections[connection_id]
        self.user_connections[connection.user_id].remove(connection_id)

Message Fan-Out

Direct Fan-Out

For Small Topics (< 1000 subscribers):

def publish_message(topic_id, message):
    """Publish message to topic"""
    # Get all subscribers
    subscribers = get_topic_subscribers(topic_id)
    
    if not subscribers:
        # No subscribers, drop message (ephemeral)
        return {"delivered_to": 0}
    
    # Fan-out to all subscribers
    delivered = 0
    for connection_id in subscribers:
        connection = get_connection(connection_id)
        if connection and connection.is_alive():
            try:
                connection.send(message)
                delivered += 1
            except:
                # Connection dead, remove subscription
                unsubscribe(connection_id, topic_id)
    
    return {"delivered_to": delivered}

Distributed Fan-Out (Hot Topics)

For Large Topics (> 1000 subscribers):

def publish_message_hot_topic(topic_id, message):
    """Publish to hot topic using message queue"""
    # Publish to Redis pub/sub or Kafka
    redis.publish(f"topic:{topic_id}", json.dumps(message))
    
    # Or use Kafka for better scalability
    kafka.produce(f"topic-{topic_id}", message)

Redis Pub/Sub:

# Publisher
redis.publish(f"topic:{topic_id}", json.dumps(message))

# Subscriber (each server)
redis_subscriber = redis.pubsub()
redis_subscriber.subscribe(f"topic:{topic_id}")

for message in redis_subscriber.listen():
    # Fan-out to local connections
    fan_out_to_local_connections(topic_id, message)

Ephemeral Delivery

No Message Storage

Key Design: Messages are never stored, only delivered to active subscribers.

Implementation:

def publish_message(topic_id, message):
    """Publish message (ephemeral)"""
    # Get active subscribers
    subscribers = get_active_subscribers(topic_id)
    
    if not subscribers:
        # No subscribers, message is dropped
        return {"delivered_to": 0, "dropped": True}
    
    # Deliver to active subscribers only
    delivered = fan_out_to_subscribers(subscribers, message)
    
    # Message is not stored anywhere
    return {"delivered_to": delivered, "dropped": False}

Benefits:

No storage overhead
Simple implementation
Low latency (no disk I/O)

Trade-offs:

Messages lost if no subscribers
No message replay
No durability guarantees

Hot-Topic Handling

Challenge

Topics with 100K+ subscribers create fan-out bottlenecks.

Solution: Multi-Level Fan-Out

Approach 1: Redis Pub/Sub

# Publisher publishes to Redis channel
redis.publish(f"topic:{topic_id}", message)

# Each server subscribes to Redis channel
# Then fans out to local connections
redis_subscriber.subscribe(f"topic:{topic_id}")
for message in redis_subscriber.listen():
    local_subscribers = get_local_subscribers(topic_id)
    fan_out_to_connections(local_subscribers, message)

Approach 2: Kafka Topics

# Publisher publishes to Kafka topic
kafka.produce(f"pubsub-topic-{topic_id}", message)

# Each server consumes from Kafka
# Then fans out to local connections
for message in kafka.consume(f"pubsub-topic-{topic_id}"):
    local_subscribers = get_local_subscribers(topic_id)
    fan_out_to_connections(local_subscribers, message)

Approach 3: Topic Partitioning

# Partition hot topic across multiple servers
def get_subscriber_shard(topic_id, connection_id):
    """Route subscriber to shard"""
    return hash(f"{topic_id}:{connection_id}") % num_shards

# Each shard handles subset of subscribers
# Publisher sends to all shards
for shard in range(num_shards):
    shard_connections = get_shard_subscribers(topic_id, shard)
    fan_out_to_connections(shard_connections, message)

Backpressure

Challenge

Subscribers can’t keep up with message rate.

Solution: Connection-Level Backpressure

WebSocket Backpressure:

class Connection:
    def __init__(self):
        self.send_queue = queue.Queue(maxsize=1000)
        self.is_ready = True
    
    def send(self, message):
        """Send message with backpressure"""
        try:
            self.send_queue.put_nowait(message)
            if self.is_ready:
                self.flush_queue()
        except queue.Full:
            # Queue full, drop message or disconnect
            self.handle_backpressure()
    
    def flush_queue(self):
        """Flush queued messages"""
        self.is_ready = False
        while not self.send_queue.empty():
            message = self.send_queue.get()
            try:
                self.websocket.send(message)
            except:
                # Connection dead, stop flushing
                return
        self.is_ready = True

Server-Level Backpressure:

def publish_message(topic_id, message):
    """Publish with server-level backpressure"""
    subscribers = get_topic_subscribers(topic_id)
    
    # Check server load
    if server_load > 0.8:
        # High load, drop low-priority messages
        if message.priority == "LOW":
            return {"delivered_to": 0, "dropped": "backpressure"}
    
    # Fan-out with rate limiting
    delivered = 0
    for connection_id in subscribers:
        connection = get_connection(connection_id)
        if connection and connection.can_send():
            connection.send(message)
            delivered += 1
    
    return {"delivered_to": delivered}

Protocols

WebSocket

Advantages:

Full-duplex communication
Low overhead
Binary support

Implementation:

# Server
import websockets

async def handle_websocket(websocket, path):
    connection_id = generate_id()
    user_id = authenticate(websocket)
    
    connection_manager.add_connection(connection_id, user_id, websocket)
    
    try:
        async for message in websocket:
            data = json.loads(message)
            if data['type'] == 'subscribe':
                connection_manager.subscribe(connection_id, data['topic_id'])
            elif data['type'] == 'unsubscribe':
                connection_manager.unsubscribe(connection_id, data['topic_id'])
    except websockets.exceptions.ConnectionClosed:
        connection_manager.remove_connection(connection_id)

Server-Sent Events (SSE)

Advantages:

Simpler than WebSocket
Automatic reconnection
HTTP-based

Implementation:

# Server
from flask import Response, stream_with_context

@app.route('/api/v1/topics/<topic_id>/subscribe')
def subscribe_sse(topic_id):
    user_id = authenticate(request)
    
    def event_stream():
        connection_id = generate_id()
        connection_manager.add_connection(connection_id, user_id, None)
        connection_manager.subscribe(connection_id, topic_id)
        
        try:
            while True:
                # Get messages for this topic
                message = message_queue.get(topic_id, connection_id)
                yield f"data: {json.dumps(message)}\n\n"
        except GeneratorExit:
            connection_manager.remove_connection(connection_id)
    
    return Response(stream_with_context(event_stream()), 
                   mimetype='text/event-stream')

Failure Handling

Connection Failures

Scenario: Connection drops during message delivery.

Solution:

Detect dead connections via heartbeat
Remove subscription on disconnect
Continue delivering to other subscribers

Server Failures

Scenario: Server crashes, connections lost.

Solution:

Clients reconnect automatically
Re-subscribe to topics on reconnect
Load balancer routes to healthy servers

Topic Shard Failures

Scenario: Shard handling topic fails.

Solution:

Failover to replica shard
Re-route topic to different shard
Clients reconnect and re-subscribe

Trade-offs and Optimizations

Trade-offs

Durability vs Latency
- Choice: Ephemeral (no durability)
- Reason: Low latency, simplicity
- Benefit: Fast delivery, no storage overhead
At-Most-Once vs At-Least-Once
- Choice: At-most-once (may lose messages)
- Reason: Simpler, no deduplication needed
- Benefit: Lower latency, simpler implementation
Ordering vs Performance
- Choice: Best-effort ordering
- Reason: Performance over strict ordering
- Benefit: Better throughput, lower latency

Optimizations

Message Batching
- Batch multiple messages for same topic
- Reduce WebSocket overhead
- Better throughput
Connection Pooling
- Reuse WebSocket connections
- Reduce connection overhead
- Better resource utilization
Topic Sharding
- Distribute topics across servers
- Better load distribution
- Isolated failures
Caching
- Cache topic metadata
- Cache subscription lists
- Reduce database queries

What Interviewers Look For

Distributed Systems Skills

Low-Latency Fan-Out
- Efficient message distribution
- Direct fan-out for small topics
- Distributed fan-out for hot topics
- Red Flags: Inefficient fan-out, no hot-topic handling, high latency
Connection Management
- WebSocket/SSE connection handling
- Heartbeat management
- Connection lifecycle
- Red Flags: No heartbeat, dead connections, poor lifecycle management
Ephemeral Delivery
- No message storage
- Drop messages if no subscribers
- Clear delivery guarantees
- Red Flags: Storing messages, durability, over-engineering

Problem-Solving Approach

Trade-off Awareness
- At-most-once vs at-least-once
- Durability vs latency
- Ordering vs performance
- Red Flags: No trade-offs, wrong trade-offs, over-engineering
Hot-Topic Handling
- Recognize hot topics
- Multi-level fan-out
- Topic partitioning
- Red Flags: No hot-topic handling, single-level fan-out, bottlenecks
Backpressure
- Connection-level backpressure
- Server-level backpressure
- Message dropping strategies
- Red Flags: No backpressure, resource exhaustion, no dropping

System Design Skills

Component Design
- Clear service boundaries
- Proper abstractions
- Efficient interfaces
- Red Flags: Monolithic design, poor abstractions, inefficient APIs
Scalability
- Horizontal scaling
- Topic sharding
- Load balancing
- Red Flags: Vertical scaling only, no sharding, no load balancing
Protocols
- WebSocket vs SSE trade-offs
- Protocol selection
- Implementation details
- Red Flags: Wrong protocol, no protocol choice, poor implementation

Communication Skills

Clear Explanation
- Explains ephemeral delivery
- Discusses trade-offs
- Justifies design decisions
- Red Flags: Unclear explanations, no justification, confusing
Architecture Diagrams
- Clear component diagram
- Shows message flow
- Fan-out architecture
- Red Flags: No diagrams, unclear diagrams, missing components

Meta-Specific Focus

Real-Time Systems
- Understanding of low-latency requirements
- WebSocket expertise
- Connection management
- Key: Demonstrate real-time systems expertise
Pub/Sub Patterns
- Topic-based routing
- Message fan-out
- Ephemeral delivery
- Key: Show pub/sub pattern mastery
Production-Grade Robustness
- Failure handling
- Backpressure
- Hot-topic handling
- Key: Demonstrate production thinking

Summary

Designing a pub/sub system requires careful consideration of ephemeral delivery, low-latency fan-out, connection management, and horizontal scaling. Key design decisions include:

Architecture Highlights:

Topic-based routing with hash-based sharding
WebSocket/SSE for real-time delivery
Ephemeral message delivery (no storage)
Multi-level fan-out for hot topics
Connection management with heartbeat

Key Patterns:

Ephemeral Delivery: Messages dropped if no subscribers
At-Most-Once: No durability, messages may be lost
Topic Routing: Hash-based sharding for scalability
Connection Management: WebSocket/SSE with heartbeat
Hot-Topic Handling: Multi-level fan-out (Redis/Kafka)

Scalability Solutions:

Horizontal scaling (multiple servers)
Topic sharding (distribute topics)
Connection pooling (efficient connections)
Message queue for cross-server fan-out

Trade-offs:

Durability vs latency (ephemeral for low latency)
At-most-once vs at-least-once (at-most-once for simplicity)
Ordering vs performance (best-effort ordering)

This design handles 1M+ concurrent subscribers, 1M+ messages/second, and maintains < 50ms message delivery latency while ensuring messages are ephemeral and dropped if no subscribers are listening. The system is scalable, fault-tolerant, and optimized for real-time pub/sub messaging.

Introduction

Table of Contents

Problem Statement

Requirements

Functional Requirements

Non-Functional Requirements

Capacity Estimation

Traffic Estimates

Storage Estimates

Core Entities

Topic

Subscription

Message

Connection

API

1. Create Topic

2. Publish Message

3. Subscribe to Topic (WebSocket)

4. Subscribe to Topic (SSE)

5. Unsubscribe from Topic

6. Get Topic Info

7. List Subscriptions

Data Flow

Publish Message Flow

Subscribe Flow

Message Delivery Flow

Unsubscribe Flow

High-Level Design

Deep Dive

Component Design

1. Pub/Sub Service

2. Topic Router

3. Connection Manager

4. Redis Cluster

Topic Routing

Topic Sharding

Topic Metadata Storage

Connection Management

WebSocket Connection

Heartbeat Management

Message Fan-Out

Direct Fan-Out

Distributed Fan-Out (Hot Topics)

Ephemeral Delivery

No Message Storage

Hot-Topic Handling

Challenge

Solution: Multi-Level Fan-Out

Backpressure

Challenge

Solution: Connection-Level Backpressure

Protocols

WebSocket

Server-Sent Events (SSE)

Failure Handling

Connection Failures

Server Failures

Topic Shard Failures

Trade-offs and Optimizations

Trade-offs

Optimizations

What Interviewers Look For

Distributed Systems Skills

Problem-Solving Approach

System Design Skills

Communication Skills

Meta-Specific Focus

Summary

Related Posts

Recent Posts