Introduction
Designing a social media feed is a system design interview question that tests your ability to model social media entities, implement feed generation algorithms, and handle local data efficiently. This question focuses on:
- Data modeling: Posts, comments, likes, users
- Feed generation: Sorting, filtering, pagination
- Local caching: Efficient retrieval
- Content management: Deletion, edits, updates
This guide covers the design of a local social media feed system with proper data modeling and efficient feed generation.
Table of Contents
- Problem Statement
- Requirements
- Data Modeling
- Feed Generation
- Sorting and Filtering
- Content Management
- Implementation
- Summary
Problem Statement
Design a local social media feed system that:
- Stores posts (text, images, timestamps)
- Stores comments and likes on posts
- Generates user timeline (posts from followed users)
- Supports sorting (chronological, popularity)
- Handles content deletion and edits
- Provides efficient retrieval (< 100ms)
Scale Requirements:
- Support 10K-1M posts
- Support 100K-10M comments/likes
- Fast feed generation: < 100ms
- Efficient storage and retrieval
Requirements
Functional Requirements
- Create Post: Create new post
- Get Feed: Get user’s timeline feed
- Add Comment: Comment on post
- Like Post: Like/unlike post
- Delete Post: Delete post and related data
- Edit Post: Update post content
- Follow/Unfollow: Manage user relationships
Non-Functional Requirements
Performance:
- Fast feed generation
- Efficient sorting
- Quick content retrieval
Consistency:
- Accurate feed content
- Handle deletions properly
- Update on edits
Data Modeling
Database Schema
CREATE TABLE users (
user_id BIGINT PRIMARY KEY AUTO_INCREMENT,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_username (username)
);
CREATE TABLE posts (
post_id BIGINT PRIMARY KEY AUTO_INCREMENT,
user_id BIGINT NOT NULL,
content TEXT NOT NULL,
post_type VARCHAR(20) DEFAULT 'text', -- text, image, video
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL,
like_count INT DEFAULT 0,
comment_count INT DEFAULT 0,
INDEX idx_user_id (user_id),
INDEX idx_created_at (created_at),
INDEX idx_deleted_at (deleted_at),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
CREATE TABLE comments (
comment_id BIGINT PRIMARY KEY AUTO_INCREMENT,
post_id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL,
INDEX idx_post_id (post_id),
INDEX idx_user_id (user_id),
FOREIGN KEY (post_id) REFERENCES posts(post_id),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
CREATE TABLE likes (
like_id BIGINT PRIMARY KEY AUTO_INCREMENT,
post_id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE KEY unique_post_user (post_id, user_id),
INDEX idx_post_id (post_id),
INDEX idx_user_id (user_id),
FOREIGN KEY (post_id) REFERENCES posts(post_id),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
CREATE TABLE follows (
follow_id BIGINT PRIMARY KEY AUTO_INCREMENT,
follower_id BIGINT NOT NULL,
followee_id BIGINT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE KEY unique_follow (follower_id, followee_id),
INDEX idx_follower (follower_id),
INDEX idx_followee (followee_id),
FOREIGN KEY (follower_id) REFERENCES users(user_id),
FOREIGN KEY (followee_id) REFERENCES users(user_id)
);
Data Model Classes
from datetime import datetime
from typing import Optional, List
from dataclasses import dataclass
@dataclass
class User:
user_id: Optional[int]
username: str
email: str
created_at: datetime
@dataclass
class Post:
post_id: Optional[int]
user_id: int
content: str
post_type: str
created_at: datetime
updated_at: datetime
deleted_at: Optional[datetime]
like_count: int
comment_count: int
user: Optional[User] = None # For convenience
def is_deleted(self) -> bool:
return self.deleted_at is not None
@dataclass
class Comment:
comment_id: Optional[int]
post_id: int
user_id: int
content: str
created_at: datetime
deleted_at: Optional[datetime]
@dataclass
class Like:
like_id: Optional[int]
post_id: int
user_id: int
created_at: datetime
Feed Generation
Basic Feed Generation
class SocialMediaFeed:
def __init__(self, db):
self.db = db
def get_user_feed(self, user_id: int, limit: int = 20, offset: int = 0) -> List[Post]:
"""Get user's timeline feed."""
# Get users that this user follows
followees = self.db.get_followees(user_id)
followee_ids = [f.followee_id for f in followees]
# Include own posts
followee_ids.append(user_id)
# Get posts from followed users
posts = self.db.get_posts_by_users(followee_ids, limit=limit, offset=offset)
# Filter deleted posts
posts = [p for p in posts if not p.is_deleted()]
# Sort by created_at (newest first)
posts.sort(key=lambda p: p.created_at, reverse=True)
return posts
def get_user_feed_chronological(self, user_id: int, limit: int = 20) -> List[Post]:
"""Get feed sorted chronologically."""
followees = self.db.get_followees(user_id)
followee_ids = [f.followee_id for f in followees] + [user_id]
posts = self.db.get_posts_by_users_chronological(followee_ids, limit=limit)
return [p for p in posts if not p.is_deleted()]
def get_user_feed_popular(self, user_id: int, limit: int = 20) -> List[Post]:
"""Get feed sorted by popularity (likes + comments)."""
followees = self.db.get_followees(user_id)
followee_ids = [f.followee_id for f in followees] + [user_id]
posts = self.db.get_posts_by_users(followee_ids)
posts = [p for p in posts if not p.is_deleted()]
# Sort by engagement score
posts.sort(key=lambda p: p.like_count + p.comment_count, reverse=True)
return posts[:limit]
Sorting and Filtering
Efficient Sorting
class SocialMediaFeed:
def get_user_feed_sorted(self, user_id: int, sort_by: str = 'time',
limit: int = 20) -> List[Post]:
"""Get feed with different sorting options."""
followees = self.db.get_followees(user_id)
followee_ids = [f.followee_id for f in followees] + [user_id]
if sort_by == 'time':
return self._get_sorted_by_time(followee_ids, limit)
elif sort_by == 'popular':
return self._get_sorted_by_popularity(followee_ids, limit)
elif sort_by == 'recent':
return self._get_sorted_by_recent_activity(followee_ids, limit)
else:
return self._get_sorted_by_time(followee_ids, limit)
def _get_sorted_by_time(self, user_ids: List[int], limit: int) -> List[Post]:
"""Sort by creation time."""
posts = self.db.get_posts_by_users(user_ids, limit=limit * 2)
posts = [p for p in posts if not p.is_deleted()]
posts.sort(key=lambda p: p.created_at, reverse=True)
return posts[:limit]
def _get_sorted_by_popularity(self, user_ids: List[int], limit: int) -> List[Post]:
"""Sort by engagement (likes + comments)."""
posts = self.db.get_posts_by_users(user_ids, limit=limit * 2)
posts = [p for p in posts if not p.is_deleted()]
posts.sort(key=lambda p: (p.like_count + p.comment_count, p.created_at),
reverse=True)
return posts[:limit]
def _get_sorted_by_recent_activity(self, user_ids: List[int], limit: int) -> List[Post]:
"""Sort by recent activity (comments/likes)."""
posts = self.db.get_posts_by_users(user_ids, limit=limit * 2)
posts = [p for p in posts if not p.is_deleted()]
# Get most recent activity for each post
for post in posts:
recent_comment = self.db.get_most_recent_comment(post.post_id)
recent_like = self.db.get_most_recent_like(post.post_id)
recent_activity = max(
recent_comment.created_at if recent_comment else post.created_at,
recent_like.created_at if recent_like else post.created_at
)
post.recent_activity = recent_activity
posts.sort(key=lambda p: p.recent_activity, reverse=True)
return posts[:limit]
Content Management
Post Operations
class SocialMediaFeed:
def create_post(self, user_id: int, content: str, post_type: str = 'text') -> Post:
"""Create new post."""
post = Post(
post_id=None,
user_id=user_id,
content=content,
post_type=post_type,
created_at=datetime.now(),
updated_at=datetime.now(),
deleted_at=None,
like_count=0,
comment_count=0
)
return self.db.create_post(post)
def delete_post(self, post_id: int, user_id: int) -> bool:
"""Delete post (soft delete)."""
post = self.db.get_post(post_id)
if not post or post.user_id != user_id:
return False
# Soft delete
post.deleted_at = datetime.now()
self.db.update_post(post)
# Delete related comments and likes (optional: soft delete)
# self.db.delete_comments_by_post(post_id)
# self.db.delete_likes_by_post(post_id)
return True
def edit_post(self, post_id: int, user_id: int, new_content: str) -> bool:
"""Edit post content."""
post = self.db.get_post(post_id)
if not post or post.user_id != user_id:
return False
if post.is_deleted():
return False
post.content = new_content
post.updated_at = datetime.now()
self.db.update_post(post)
return True
def add_comment(self, post_id: int, user_id: int, content: str) -> Comment:
"""Add comment to post."""
post = self.db.get_post(post_id)
if not post or post.is_deleted():
raise ValueError("Post not found or deleted")
comment = Comment(
comment_id=None,
post_id=post_id,
user_id=user_id,
content=content,
created_at=datetime.now(),
deleted_at=None
)
comment = self.db.create_comment(comment)
# Update comment count
post.comment_count += 1
self.db.update_post(post)
return comment
def like_post(self, post_id: int, user_id: int) -> bool:
"""Like or unlike post."""
post = self.db.get_post(post_id)
if not post or post.is_deleted():
return False
existing_like = self.db.get_like(post_id, user_id)
if existing_like:
# Unlike
self.db.delete_like(existing_like.like_id)
post.like_count -= 1
else:
# Like
like = Like(
like_id=None,
post_id=post_id,
user_id=user_id,
created_at=datetime.now()
)
self.db.create_like(like)
post.like_count += 1
self.db.update_post(post)
return True
Local Caching
Feed Caching
class SocialMediaFeed:
def __init__(self, db, cache=None):
self.db = db
self.cache = cache # Optional cache (e.g., Redis, in-memory)
self.cache_ttl = 300 # 5 minutes
def get_user_feed_cached(self, user_id: int, limit: int = 20) -> List[Post]:
"""Get feed with caching."""
cache_key = f"feed:{user_id}:{limit}"
# Check cache
if self.cache:
cached = self.cache.get(cache_key)
if cached:
return cached
# Generate feed
feed = self.get_user_feed(user_id, limit=limit)
# Cache result
if self.cache:
self.cache.set(cache_key, feed, ttl=self.cache_ttl)
return feed
def invalidate_feed_cache(self, user_id: int):
"""Invalidate feed cache for user."""
if self.cache:
# Invalidate for user and their followers
followers = self.db.get_followers(user_id)
for follower in followers:
self.cache.delete(f"feed:{follower.follower_id}:*")
Implementation
Complete Example
# Initialize
db = Database()
cache = Cache()
feed_system = SocialMediaFeed(db, cache)
# Create users
user1 = feed_system.create_user("alice", "alice@example.com")
user2 = feed_system.create_user("bob", "bob@example.com")
# Follow
feed_system.follow(user1.user_id, user2.user_id)
# Create posts
post1 = feed_system.create_post(user2.user_id, "Hello World!")
post2 = feed_system.create_post(user1.user_id, "My first post")
# Get feed
feed = feed_system.get_user_feed(user1.user_id)
for post in feed:
print(f"{post.user.username}: {post.content}")
# Like and comment
feed_system.like_post(post1.post_id, user1.user_id)
feed_system.add_comment(post1.post_id, user1.user_id, "Nice post!")
What Interviewers Look For
Data Modeling Skills
- Relational Design
- Proper schema for posts, comments, likes
- Correct relationships and foreign keys
- Appropriate indexes
- Red Flags: Poor schema, missing indexes
- Entity Relationships
- User-Post-Comment-Like relationships
- Follow relationships
- Red Flags: Incorrect relationships, missing entities
Algorithm Implementation
- Feed Generation
- Efficient querying
- Proper sorting algorithms
- Red Flags: Inefficient queries, slow sorting
- Sorting Strategies
- Chronological vs popularity sorting
- Efficient implementation
- Red Flags: No sorting options, inefficient
Problem-Solving Approach
- Content Management
- Soft delete implementation
- Edit handling
- Red Flags: Hard delete, no edit support
- Caching Strategy
- Local caching approach
- Cache invalidation
- Red Flags: No caching, incorrect invalidation
- Edge Cases
- Deleted posts in feed
- Empty feeds
- Large result sets
- Red Flags: Ignoring edge cases
Code Quality
- Query Efficiency
- Proper use of indexes
- Efficient joins
- Red Flags: Slow queries, missing indexes
- Data Consistency
- Accurate counts
- Consistent state
- Red Flags: Wrong counts, inconsistent data
Interview Focus
- Data Modeling
- Strong database design skills
- Efficient queries
- Key: Show database expertise
- Local System Design
- No distributed concerns
- Focus on correctness
- Key: Demonstrate local system understanding
Summary
Key Takeaways
- Data Modeling: Users, Posts, Comments, Likes, Follows
- Feed Generation: Get posts from followed users, sort by time/popularity
- Content Management: Create, edit, delete posts with soft delete
- Local Caching: Cache feeds for fast retrieval
- Efficient Retrieval: Index on user_id, created_at for fast queries
Common Interview Questions
- How would you store posts, comments, and likes?
- Posts: user_id, content, timestamps, counts
- Comments: post_id, user_id, content
- Likes: post_id, user_id (unique constraint)
- How would you retrieve a user’s timeline efficiently?
- Get followees, query posts by user_ids
- Sort by created_at or engagement
- Use indexes on user_id, created_at
- How would you handle content deletion or edits?
- Soft delete: Set deleted_at timestamp
- Filter deleted posts in queries
- Update content and updated_at for edits
Understanding social media feed design is crucial for Meta interviews focusing on:
- Data modeling
- Feed generation algorithms
- Sorting and filtering
- Local caching
- Content management