Design Instagram - System Design Interview

Introduction

Instagram is a photo and video-sharing social networking service where users can upload, edit, and share content with followers. Designing Instagram requires handling massive scale: billions of users, millions of uploads per day, and petabytes of media storage.

This post provides a detailed walkthrough of designing Instagram, covering key features, scalability challenges, and architectural decisions. This is one of the most common system design interview questions that tests your understanding of distributed systems, media storage, feed generation, and real-time features.

Problem Statement
Requirements
- Functional Requirements
- Non-Functional Requirements
Capacity Estimation
Core Entities
API
Data Flow
Database Design
- Schema Design
- Database Sharding Strategy
High-Level Design
Deep Dive
Summary

Problem Statement

Design Instagram with the following features:

Users can upload photos and videos
Users can follow other users
Users see a feed of photos/videos from users they follow
Users can like and comment on posts
Users can search for photos by tags/usernames
Support for stories (temporary content that expires after 24 hours)

Scale Requirements:

1 billion+ users
500 million+ daily active users
100 million+ photos/videos uploaded per day
23 billion+ photos viewed per day
Average photo size: 200KB
Average video size: 2MB

Requirements

Functional Requirements

Core Features:

User Management: Registration, authentication, profiles
Media Upload: Upload photos and videos
Feed Generation: Display posts from followed users
Social Graph: Follow/unfollow users
Interactions: Like, comment on posts
Search: Search by username, hashtags, locations
Stories: Temporary content that expires after 24 hours
Notifications: Real-time notifications for likes, comments, follows

Out of Scope:

Direct messaging (DM) feature
Video streaming (assume simple video playback)
Advanced filters and editing
Live streaming
Reels/IGTV (focus on basic feed)

Non-Functional Requirements

Availability: 99.9% uptime
Reliability: No data loss, all uploads must succeed
Performance:
- Feed load time: < 200ms
- Photo upload: < 3 seconds
- Video upload: < 30 seconds
Scalability: Handle 100M+ uploads per day
Consistency: Eventually consistent is acceptable for feed
Durability: All media must be stored reliably

Capacity Estimation

Traffic Estimates

Daily Active Users (DAU): 500 million
Daily uploads: 100 million (photos + videos)
Read:Write ratio: 100:1 (23B views / 100M uploads)
Average user views: 46 photos/videos per day
Peak traffic: 3x average = 300M uploads/day peak

Storage Estimates

Photos:

100M photos/day × 200KB = 20TB/day
5 years retention: 20TB × 365 × 5 = 36.5PB

Videos:

20M videos/day × 2MB = 40TB/day
5 years retention: 40TB × 365 × 5 = 73PB

Total Media Storage: ~110PB over 5 years

Metadata Storage:

User data: 1B users × 1KB = 1TB
Posts metadata: 100M/day × 365 × 5 × 500 bytes = 91TB
Social graph: 1B users × 500 followers avg × 8 bytes = 4TB
Total metadata: ~100TB

Bandwidth Estimates

Upload bandwidth: 20TB/day (photos) + 40TB/day (videos) = 60TB/day = 694MB/s
Download bandwidth: 60TB/day × 100 (read:write ratio) = 6PB/day = 69GB/s

Core Entities

User

Attributes: user_id, username, email, password_hash, full_name, bio, profile_picture_url
Relationships: Follows other users, creates posts, likes/comments on posts

Post

Attributes: post_id, user_id, media_type, media_url, thumbnail_url, caption, location, like_count, comment_count, created_at
Relationships: Belongs to user, has likes and comments

Follow

Attributes: follower_id, followee_id, created_at
Relationships: Links users (follower → followee)

Like

Attributes: like_id, user_id, post_id, created_at
Relationships: Links user to post

Comment

Attributes: comment_id, post_id, user_id, text, created_at
Relationships: Belongs to post and user

Story

Attributes: story_id, user_id, media_url, media_type, expires_at, created_at
Relationships: Belongs to user, expires after 24 hours

API

1. Upload Photo/Video

POST /api/v1/media/upload
Parameters:
  - file: photo/video file
  - user_id: user ID
  - caption: optional text caption
  - location: optional location data
Response:
  - media_id: unique media identifier
  - upload_url: URL for direct upload

2. Get User Feed

GET /api/v1/feed
Parameters:
  - user_id: user ID
  - max_id: pagination cursor (optional)
  - count: number of posts to return (default: 20)
Response:
  - posts: array of post objects
  - next_max_id: cursor for next page

3. Follow User

POST /api/v1/follow
Parameters:
  - user_id: current user ID
  - follow_user_id: user to follow
Response:
  - success: boolean

4. Like Post

POST /api/v1/posts/{post_id}/like
Parameters:
  - user_id: user ID
  - post_id: post ID
Response:
  - like_count: updated like count

5. Comment on Post

POST /api/v1/posts/{post_id}/comments
Parameters:
  - user_id: user ID
  - post_id: post ID
  - text: comment text
Response:
  - comment_id: unique comment identifier

6. Get User Stories

GET /api/v1/stories/{user_id}
Parameters:
  - user_id: user ID
Response:
  - stories: array of story objects

Data Flow

Upload Flow

Client uploads media file → Load Balancer
Load Balancer → API Gateway
API Gateway → Upload Service
Upload Service validates file and generates media ID
Upload Service → Object Storage (S3) for media storage
Upload Service → Database for metadata storage
Upload Service → Message Queue (for async processing)
Message Queue → Thumbnail Service (async)
Upload Service → Cache invalidation (user feed)
Response returned to client

Feed Generation Flow

Client requests feed → Load Balancer
Load Balancer → API Gateway
API Gateway → Feed Service
Feed Service checks Redis cache
If cache miss:
- Fetch followed users from Social Graph Service
- Fetch posts from Database (or cache)
- Merge and sort posts
- Cache result in Redis
Return feed to client

Like/Comment Flow

Client submits like/comment → API Gateway
API Gateway → Interaction Service
Interaction Service → Database (write)
Interaction Service → Cache invalidation (post cache)
Interaction Service → Message Queue (for notifications)
Message Queue → Notification Service
Response returned to client

Database Design

Schema Design

Users Table:

CREATE TABLE users (
    user_id BIGINT PRIMARY KEY,
    username VARCHAR(255) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    full_name VARCHAR(255),
    bio TEXT,
    profile_picture_url VARCHAR(512),
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    INDEX idx_username (username),
    INDEX idx_email (email)
);

Posts Table:

CREATE TABLE posts (
    post_id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    media_type ENUM('photo', 'video') NOT NULL,
    media_url VARCHAR(512) NOT NULL,
    thumbnail_url VARCHAR(512),
    caption TEXT,
    location VARCHAR(255),
    like_count INT DEFAULT 0,
    comment_count INT DEFAULT 0,
    created_at TIMESTAMP,
    INDEX idx_user_id (user_id),
    INDEX idx_created_at (created_at),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);

Follows Table:

CREATE TABLE follows (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id),
    INDEX idx_follower (follower_id),
    INDEX idx_followee (followee_id),
    FOREIGN KEY (follower_id) REFERENCES users(user_id),
    FOREIGN KEY (followee_id) REFERENCES users(user_id)
);

Likes Table:

CREATE TABLE likes (
    like_id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    post_id BIGINT NOT NULL,
    created_at TIMESTAMP,
    UNIQUE KEY unique_like (user_id, post_id),
    INDEX idx_post_id (post_id),
    INDEX idx_user_id (user_id),
    FOREIGN KEY (user_id) REFERENCES users(user_id),
    FOREIGN KEY (post_id) REFERENCES posts(post_id)
);

Comments Table:

CREATE TABLE comments (
    comment_id BIGINT PRIMARY KEY,
    post_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    text TEXT NOT NULL,
    created_at TIMESTAMP,
    INDEX idx_post_id (post_id),
    FOREIGN KEY (post_id) REFERENCES posts(post_id),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);

Stories Table:

CREATE TABLE stories (
    story_id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    media_url VARCHAR(512) NOT NULL,
    media_type ENUM('photo', 'video') NOT NULL,
    expires_at TIMESTAMP NOT NULL,
    created_at TIMESTAMP,
    INDEX idx_user_id (user_id),
    INDEX idx_expires_at (expires_at),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);

Database Sharding Strategy

Shard by User ID:

Use consistent hashing to distribute users across shards
User data, posts, and follows stored on same shard for locality
Enables efficient feed generation for a user’s own posts

Challenges:

Cross-shard queries for feed generation (posts from multiple users)
Need to aggregate data from multiple shards

High-Level Design

┌─────────────┐
│   Client    │
│  (Mobile/   │
│    Web)     │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────────┐
│         Load Balancer                   │
└──────┬──────────────────┬───────────────┘
       │                  │
       ▼                  ▼
┌──────────────┐   ┌──────────────┐
│  API Gateway │   │  API Gateway │
└──────┬───────┘   └──────┬───────┘
       │                  │
       ├──────────────────┼──────────────────┐
       │                  │                  │
       ▼                  ▼                  ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Upload       │  │ Feed        │  │ Social Graph │
│ Service      │  │ Service     │  │ Service      │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Object       │  │ Cache        │  │ Database     │
│ Storage      │  │ (Redis)      │  │ (Sharded)    │
│ (S3/CDN)     │  └──────────────┘  └──────────────┘
└──────────────┘

Deep Dive

Component Design

1. Upload Service

Flow:

Client uploads media file to Upload Service
Upload Service validates file (size, format)
Generate unique media ID
Upload to Object Storage (S3)
Generate thumbnail for videos
Store metadata in database
Invalidate cache for user’s feed
Return success response

Optimizations:

Resumable uploads: For large videos, support chunked uploads
Async processing: Thumbnail generation and metadata extraction done asynchronously
CDN integration: Upload directly to CDN edge locations

2. Feed Service

Feed Generation Approaches:

Approach 1: Pull Model (Fan-out on Read)

When user requests feed, fetch posts from all followed users
Aggregate and sort posts
Pros: Simple, real-time data
Cons: Slow for users with many followees, high database load

Approach 2: Push Model (Fan-out on Write)

When user posts, push to all followers’ feed caches
Pros: Fast feed retrieval
Cons: High write load, storage overhead

Approach 3: Hybrid Approach (Recommended)

Push model for users with < 1000 followers
Pull model for users with > 1000 followers (celebrities)
Use Redis sorted sets for feed storage

Feed Generation Flow:

Check Redis cache for user feed
If cache miss:
- Fetch recent posts from followed users
- Merge and sort by timestamp
- Cache result in Redis
Return feed to user

Redis Data Structure:

Key: feed:{user_id}
Type: Sorted Set
Score: timestamp
Value: post_id
TTL: 7 days

Stores follow relationships:

Use graph database (Neo4j) or relational DB with optimized indexes
Cache frequently accessed relationships in Redis

Redis Cache:

Key: followers:{user_id}
Type: Set
Value: follower user_ids
TTL: 1 hour

Key: following:{user_id}
Type: Set
Value: followee user_ids
TTL: 1 hour

4. Search Service

Features:

Search by username
Search by hashtags
Search by location

Implementation:

Use Elasticsearch for full-text search
Index usernames, hashtags, captions, locations
Real-time indexing via message queue

5. Stories Service

Features:

Stories expire after 24 hours
Background job removes expired stories
Stories appear at top of feed

Implementation:

Store stories in database with expires_at timestamp
Background cron job deletes expired stories
Cache active stories in Redis with TTL

Detailed Design

Media Storage Architecture

Object Storage (S3):

Store original photos and videos
Use multiple availability zones for redundancy
Lifecycle policies for old content

CDN (CloudFront):

Cache frequently accessed media
Serve media from edge locations
Reduce origin server load

Storage Tiers:

Hot storage: Recent posts (last 30 days) - SSD
Warm storage: Older posts (30 days - 1 year) - Standard
Cold storage: Archive (1+ years) - Glacier

Caching Strategy

Multi-level Caching:

L1: Application Cache (In-memory)

Cache user profiles, recent posts
TTL: 5 minutes

L2: Redis Cache

Feed data, social graph, trending posts
TTL: 1 hour for feeds, 24 hours for profiles

L3: CDN Cache

Media files, static content
TTL: 7 days

Cache Invalidation:

On post upload: Invalidate user’s feed cache
On like/comment: Invalidate post cache
On follow/unfollow: Invalidate feed cache

Load Balancing

Strategy:

Layer 4 (NLB): For media uploads (high bandwidth)
Layer 7 (ALB): For API requests (routing based on path)

Load Balancer Features:

Health checks
SSL termination
Request routing
Rate limiting

Database Replication

Master-Slave Replication:

Master handles writes
Slaves handle reads
Automatic failover

Read Replicas:

Distribute read load across multiple replicas
Geographic distribution for low latency

Message Queue

Use Cases:

Async processing: Thumbnail generation, metadata extraction
Feed updates: Fan-out to followers
Notifications: Like, comment, follow notifications
Search indexing: Update search index

Technology: Apache Kafka or AWS SQS

Scalability Considerations

Horizontal Scaling

Stateless Services:

Upload Service, Feed Service, API Gateway
Scale based on CPU/memory metrics

Database Scaling:

Shard by user_id
Use consistent hashing
Replicate shards for availability

Data Partitioning

Sharding Strategy:

By User ID: User data, posts, follows on same shard
By Post ID: For global post lookups
By Location: For location-based queries

Handling Hot Users

Problem: Celebrities with millions of followers

Solutions:

Separate feed generation: Use pull model for hot users
Dedicated infrastructure: Separate servers for hot users
Rate limiting: Prevent abuse
Caching: Aggressive caching for hot content

Security Considerations

Authentication: JWT tokens, OAuth 2.0
Authorization: Role-based access control
Media validation: File type, size, content scanning
Rate limiting: Prevent abuse
DDoS protection: CloudFlare, AWS Shield
Data encryption: Encrypt data at rest and in transit
Privacy: User privacy settings, content visibility controls

Monitoring & Observability

Key Metrics:

Upload success rate
Feed generation latency
API response times
Cache hit rates
Database query performance
CDN bandwidth usage

Tools:

Prometheus + Grafana for metrics
ELK stack for logging
Distributed tracing (Jaeger/Zipkin)

Trade-offs and Optimizations

Trade-offs

Consistency vs Availability
- Choose eventual consistency for feeds (better availability)
- Strong consistency for critical operations (likes, follows)
Storage vs Compute
- Pre-compute feeds (more storage, faster reads)
- Compute on-demand (less storage, slower reads)
- Hybrid approach balances both
Latency vs Freshness
- Cache feeds (lower latency, stale data)
- Real-time updates (higher latency, fresh data)
- Use cache with short TTL

Optimizations

Image Optimization
- Multiple resolutions (thumbnail, medium, full)
- WebP format for better compression
- Lazy loading
Video Optimization
- Multiple bitrates for adaptive streaming
- H.264/H.265 encoding
- Progressive download
Database Optimization
- Indexes on frequently queried fields
- Connection pooling
- Query optimization
Caching Optimization
- Cache warming for popular content
- Cache aside pattern
- Write-through for critical data

What Interviewers Look For

Distributed Systems Skills

Scalability Design
- Horizontal scaling strategies
- Sharding and partitioning
- Load balancing approaches
- Red Flags: Vertical scaling only, no sharding strategy, single points of failure
Media Storage Architecture
- Object storage design (S3-like)
- CDN integration
- Efficient media delivery
- Red Flags: Database storage for media, no CDN, inefficient delivery
Feed Generation Strategy
- Push vs pull trade-offs
- Hybrid approach understanding
- Timeline caching
- Red Flags: Only push or only pull, no hybrid, no caching

Problem-Solving Approach

Scale Thinking
- Billions of users consideration
- Petabytes of storage
- Millions of requests per second
- Red Flags: Designing for small scale, ignoring scale challenges
Trade-off Analysis
- Consistency vs availability
- Latency vs freshness
- Storage cost vs performance
- Red Flags: No trade-off discussion, dogmatic choices
Edge Cases
- Celebrity users (millions of followers)
- Viral content
- Storage failures
- Red Flags: Ignoring edge cases, no failure handling

System Design Skills

Component Design
- Clear service boundaries
- Proper API design
- Data flow understanding
- Red Flags: Monolithic design, unclear boundaries
Caching Strategy
- Multi-level caching
- Cache invalidation
- Cache warming
- Red Flags: No caching, poor invalidation, cache stampede
Database Design
- Proper sharding strategy
- Index design
- Read replicas
- Red Flags: No sharding, missing indexes, no read scaling

Communication Skills

Clear Architecture Explanation
- Can explain design clearly
- Justifies decisions
- Discusses alternatives
- Red Flags: Unclear explanations, no justification
Capacity Estimation
- Realistic estimates
- Proper calculations
- Resource planning
- Red Flags: Unrealistic estimates, no calculations

Meta-Specific Focus

Feed Generation Patterns
- Understanding of push/pull/hybrid
- Timeline caching knowledge
- Key: Show understanding of Meta’s patterns
Scale-First Thinking
- Design for billions from start
- Global distribution
- Key: Demonstrate scale awareness

Summary

Designing Instagram requires handling:

Massive Scale: Billions of users, petabytes of media
High Read:Write Ratio: 100:1 read to write ratio
Real-time Features: Feed updates, notifications
Media Storage: Efficient storage and delivery of photos/videos
Social Graph: Efficient follow/unfollow operations
Feed Generation: Fast feed retrieval with hybrid push/pull model

Key Architectural Decisions:

Object storage (S3) + CDN for media
Sharded databases for scalability
Redis for caching and feed storage
Hybrid feed generation (push for normal users, pull for celebrities)
Message queues for async processing
Multi-level caching strategy

This design can handle Instagram’s scale while maintaining low latency and high availability.