System Design Interview: Technologies, Options, and Trade-offs - Complete Reference Guide

Introduction

System design interviews require making informed decisions about which technologies to use for different components of a system. Understanding the trade-offs, use cases, and alternatives for each technology is crucial for designing scalable, reliable, and efficient systems.

This comprehensive guide covers all major technologies used in system design interviews, their characteristics, trade-offs, and when to use them. Use this as a reference when preparing for system design interviews or making architectural decisions.

Databases
Caching Solutions
Message Queues
Search Engines
Content Delivery Networks (CDNs)
Load Balancers
Storage Systems
Stream Processing
API Design Patterns
Consistency Models
Replication Strategies
Sharding Strategies
Summary

Databases

SQL Databases

Examples: PostgreSQL, MySQL, Oracle, SQL Server

Characteristics:

ACID transactions
Relational data model
Strong consistency
SQL query language
Vertical scaling (can scale horizontally with sharding)

Use Cases:

Financial transactions
User accounts and authentication
E-commerce orders
Systems requiring strong consistency
Complex queries with joins
Data integrity critical applications

Pros:

ACID guarantees
Strong consistency
Mature ecosystem
Rich query capabilities (joins, aggregations)
Data integrity constraints
Well-understood by developers

Cons:

Limited horizontal scalability
Schema changes can be expensive
Performance issues with large datasets
Joins can be slow at scale
Vertical scaling is expensive

Trade-offs:

Consistency vs. Availability: Strong consistency may reduce availability
Flexibility vs. Performance: Fixed schema improves performance but reduces flexibility
Scalability: Vertical scaling is limited; horizontal scaling requires complex sharding

When to Use:

Need ACID transactions
Complex relational queries
Strong consistency requirements
Data integrity is critical
Moderate scale (< 100M records per table)

Alternatives:

NoSQL for better scalability
NewSQL (CockroachDB, Spanner) for distributed SQL

NoSQL Databases

Document Databases

Examples: MongoDB, CouchDB, DynamoDB

Characteristics:

Document-based storage (JSON/BSON)
Schema-less or flexible schema
Horizontal scaling
Eventual consistency (can be configured)

Use Cases:

Content management systems
User profiles
Product catalogs
Real-time analytics
Mobile applications

Pros:

Flexible schema
Easy horizontal scaling
Good for semi-structured data
Fast writes
JSON-friendly

Cons:

No joins (must denormalize)
Eventual consistency
Limited transaction support
Query capabilities less rich than SQL

Trade-offs:

Flexibility vs. Consistency: Flexible schema but eventual consistency
Performance vs. Features: Fast but limited query capabilities

When to Use:

Flexible schema requirements
High write throughput
Horizontal scaling needed
Document-based data model fits well

Key-Value Stores

Examples: Redis, DynamoDB, Riak, Voldemort

Characteristics:

Simple key-value model
Very fast reads/writes
Horizontal scaling
Simple data model

Use Cases:

Caching
Session storage
Shopping carts
User preferences
Real-time leaderboards

Pros:

Extremely fast
Simple data model
Horizontal scaling
Low latency

Cons:

Limited query capabilities
No complex data structures
Data modeling limitations

Trade-offs:

Simplicity vs. Functionality: Simple but limited query capabilities
Performance vs. Features: Fast but basic

When to Use:

Simple data model (key-value)
High performance requirements
Caching layer
Session management

Column-Family Stores (Wide-Column)

Examples: Cassandra, HBase, ScyllaDB

Characteristics:

Column-family data model
Excellent horizontal scaling
High write throughput
Eventual consistency
No single point of failure

Use Cases:

Time-series data
Logging systems
IoT data
High write throughput systems
Multi-region deployments

Pros:

Excellent horizontal scaling
High write throughput
No single point of failure
Multi-region support
Tunable consistency

Cons:

Eventual consistency
Limited query capabilities
Complex data modeling
No joins or transactions

Trade-offs:

Scalability vs. Consistency: Excellent scalability but eventual consistency
Write Performance vs. Read Complexity: Fast writes but complex reads

When to Use:

High write throughput
Time-series data
Multi-region requirements
Horizontal scaling critical
Can tolerate eventual consistency

Graph Databases

Examples: Neo4j, Amazon Neptune, ArangoDB

Characteristics:

Graph data model (nodes, edges, properties)
Optimized for graph traversals
Relationship queries
ACID transactions (some)

Use Cases:

Social networks
Recommendation engines
Fraud detection
Knowledge graphs
Network analysis

Pros:

Excellent for relationship queries
Fast graph traversals
Flexible schema
Good for complex relationships

Cons:

Limited horizontal scaling
Specialized use case
Can be expensive
Learning curve

Trade-offs:

Relationship Queries vs. Scalability: Great for relationships but limited scaling
Specialization vs. General Use: Optimized for graphs but not general purpose

When to Use:

Complex relationships
Graph traversals needed
Social networks
Recommendation systems
Fraud detection

Time-Series Databases

Examples: InfluxDB, TimescaleDB, Prometheus

Characteristics:

Optimized for time-series data
Efficient compression
Time-based queries
High write throughput

Use Cases:

IoT sensor data
Monitoring and metrics
Financial tick data
Log aggregation
Real-time analytics

Pros:

Optimized for time-series
Efficient storage (compression)
Fast time-range queries
High write throughput

Cons:

Specialized use case
Limited general-purpose queries
Can be expensive

Trade-offs:

Specialization vs. General Use: Optimized for time-series but limited elsewhere

When to Use:

Time-series data
Monitoring/metrics
IoT applications
High-frequency time-stamped data

In-Memory Databases

Examples: Redis, Memcached, Apache Ignite

Characteristics:

Data stored in RAM
Extremely fast
Volatile (data lost on restart)
Limited capacity

Use Cases:

Caching
Session storage
Real-time analytics
Leaderboards
Rate limiting

Pros:

Extremely fast (microsecond latency)
Low latency
High throughput

Cons:

Volatile (data can be lost)
Limited capacity (RAM is expensive)
Cost per GB is high

Trade-offs:

Speed vs. Durability: Fast but volatile
Performance vs. Cost: Fast but expensive

When to Use:

Caching layer
Session storage
Real-time data
Temporary data
Performance critical

Caching Solutions

Redis

Characteristics:

In-memory data store
Rich data structures (strings, lists, sets, sorted sets, hashes)
Persistence options (RDB, AOF)
Pub/Sub support
Lua scripting

Use Cases:

Caching
Session storage
Real-time leaderboards
Rate limiting
Pub/Sub messaging
Distributed locks

Pros:

Very fast
Rich data structures
Persistence options
Pub/Sub support
Widely used and well-documented

Cons:

Single-threaded (can be bottleneck)
Memory limited
Cost per GB is high

Trade-offs:

Performance vs. Cost: Fast but expensive
Features vs. Simplicity: Rich features but more complex than Memcached

When to Use:

Need rich data structures
Pub/Sub required
Complex caching needs
Real-time features needed

Memcached

Characteristics:

Simple key-value cache
Multi-threaded
No persistence
Simple API

Use Cases:

Simple caching
Session storage
Database query caching

Pros:

Simple
Multi-threaded (better CPU utilization)
Lightweight
Fast

Cons:

No persistence
Limited data structures
No advanced features

Trade-offs:

Simplicity vs. Features: Simple but limited features

When to Use:

Simple caching needs
No persistence required
High throughput needed

CDN Caching

Characteristics:

Edge caching
Geographic distribution
Static content caching
Reduced latency

Use Cases:

Static assets (images, CSS, JS)
Video streaming
API responses (if cacheable)
Global content delivery

Pros:

Reduced latency
Reduced origin server load
Global distribution
High availability

Cons:

Cache invalidation complexity
Cost for high traffic
Not suitable for dynamic content

Trade-offs:

Latency vs. Freshness: Low latency but cache invalidation needed

When to Use:

Static content
Global audience
High read traffic
Latency sensitive

Application-Level Caching

Characteristics:

In-process caching
Local to application
Very fast
Limited capacity

Use Cases:

Frequently accessed data
Configuration data
Reference data

Pros:

Extremely fast (no network)
No external dependency
Simple

Cons:

Limited capacity
Not shared across instances
Memory overhead

Trade-offs:

Speed vs. Capacity: Fast but limited

When to Use:

Small, frequently accessed data
Data that doesn’t change often
Performance critical paths

Message Queues

Apache Kafka

Characteristics:

Distributed streaming platform
High throughput
Durable (disk-based)
Pub/Sub model
Partitioned topics
Exactly-once semantics

Use Cases:

Event streaming
Log aggregation
Real-time analytics
Event sourcing
Microservices communication

Pros:

Very high throughput
Durable (disk-based)
Horizontal scaling
Exactly-once semantics
Long retention
Replay capability

Cons:

Complex setup and operations
Higher latency than in-memory queues
Requires Zookeeper (or KRaft)
Learning curve

Trade-offs:

Throughput vs. Latency: High throughput but higher latency
Durability vs. Performance: Durable but slower than in-memory

When to Use:

High throughput requirements
Event streaming
Log aggregation
Need replay capability
Long retention needed

RabbitMQ

Characteristics:

Traditional message broker
Multiple messaging patterns (queues, topics, pub/sub)
ACK-based delivery
Management UI
Plugins ecosystem

Use Cases:

Task queues
Work distribution
Request/response patterns
Traditional messaging

Pros:

Mature and stable
Rich features
Good management tools
Multiple messaging patterns
Easy to use

Cons:

Lower throughput than Kafka
Single broker can be bottleneck
Clustering complexity

Trade-offs:

Features vs. Performance: Rich features but lower throughput

When to Use:

Traditional messaging needs
Task queues
Work distribution
Moderate throughput

Amazon SQS

Characteristics:

Managed message queue
Serverless
Auto-scaling
At-least-once delivery
Dead-letter queues

Use Cases:

Decoupling services
Task queues
Event-driven architectures
AWS-native applications

Pros:

Fully managed
Auto-scaling
Pay-per-use
No infrastructure management
Integrates with AWS services

Cons:

Vendor lock-in
Limited throughput per queue
At-least-once delivery (not exactly-once)
Cost at scale

Trade-offs:

Management vs. Control: Managed but less control
Cost vs. Scale: Pay-per-use but can be expensive at scale

When to Use:

AWS-native applications
Want managed service
Moderate throughput
Decoupling services

Apache Pulsar

Characteristics:

Distributed pub/sub messaging
Multi-tenancy
Geo-replication
Tiered storage
Unified messaging model

Use Cases:

Multi-tenant systems
Geo-distributed systems
Event streaming
High throughput messaging

Pros:

Multi-tenancy
Geo-replication
Tiered storage (cost-effective)
Unified messaging model
Better than Kafka for some use cases

Cons:

Less mature than Kafka
Smaller ecosystem
Learning curve

Trade-offs:

Features vs. Maturity: Rich features but less mature

When to Use:

Multi-tenant requirements
Geo-replication needed
Event streaming
Want alternatives to Kafka

Search Engines

Elasticsearch

Characteristics:

Distributed search engine
Full-text search
Real-time indexing
RESTful API
Rich query DSL
Aggregations

Use Cases:

Full-text search
Log analysis
Real-time analytics
Application search
Security analytics

Pros:

Powerful search capabilities
Real-time indexing
Rich query DSL
Aggregations
Horizontal scaling
Good documentation

Cons:

Complex to operate
Resource intensive
Can be expensive
Requires expertise

Trade-offs:

Features vs. Complexity: Powerful but complex
Performance vs. Cost: Fast but resource-intensive

When to Use:

Full-text search needed
Complex queries
Real-time search
Log analysis
Analytics

Apache Solr

Characteristics:

Search platform
Full-text search
Faceted search
RESTful API
Similar to Elasticsearch

Use Cases:

Enterprise search
E-commerce search
Content search

Pros:

Mature
Good for faceted search
Stable

Cons:

Less popular than Elasticsearch
Smaller ecosystem
Less real-time than Elasticsearch

Trade-offs:

Maturity vs. Innovation: Mature but less innovative

When to Use:

Enterprise search
Faceted search needed
Prefer mature solutions

Algolia

Characteristics:

Managed search service
Typo tolerance
Instant search
Analytics
API-first

Use Cases:

E-commerce search
Application search
Mobile app search

Pros:

Managed service
Typo tolerance
Instant search
Good UX features
Easy to integrate

Cons:

Vendor lock-in
Cost at scale
Less control

Trade-offs:

Ease vs. Control: Easy but less control
Cost vs. Scale: Can be expensive at scale

When to Use:

Want managed service
E-commerce search
Need typo tolerance
Quick integration needed

Content Delivery Networks (CDNs)

CloudFront (AWS)

Characteristics:

Global CDN
Edge locations
Integration with AWS services
DDoS protection
Custom SSL certificates

Use Cases:

Static content delivery
Video streaming
API acceleration
Global content distribution

Pros:

AWS integration
Global network
DDoS protection
Pay-per-use

Cons:

AWS vendor lock-in
Cost at scale
Less control than self-hosted

Trade-offs:

Integration vs. Flexibility: Good AWS integration but less flexible

When to Use:

AWS-native applications
Want managed CDN
Global distribution needed

Cloudflare

Characteristics:

CDN + security
DDoS protection
WAF (Web Application Firewall)
Free tier available
Global network

Use Cases:

Website acceleration
DDoS protection
Security services
Global content delivery

Pros:

Free tier
Security features
DDoS protection
Good performance

Cons:

Less control than self-hosted
Can be expensive at scale

Trade-offs:

Features vs. Cost: Rich features but can be expensive

When to Use:

Need security features
Want free tier
Website acceleration
DDoS protection needed

Self-Hosted CDN

Characteristics:

Full control
Custom configuration
Can use Varnish, Nginx, etc.

Use Cases:

Custom requirements
Cost optimization
Full control needed

Pros:

Full control
Cost optimization possible
Custom configuration

Cons:

Operational overhead
Requires expertise
Infrastructure management

Trade-offs:

Control vs. Management: Full control but more management

When to Use:

Custom requirements
Want full control
Have operational expertise
Cost optimization critical

Load Balancers

Application Load Balancer (ALB)

Characteristics:

Layer 7 (HTTP/HTTPS)
Content-based routing
SSL termination
Health checks
Auto-scaling

Use Cases:

HTTP/HTTPS traffic
Microservices routing
Content-based routing
SSL termination

Pros:

Content-based routing
SSL termination
Health checks
Auto-scaling
Managed service

Cons:

Higher latency than NLB
Cost
AWS vendor lock-in

Trade-offs:

Features vs. Latency: Rich features but higher latency

When to Use:

HTTP/HTTPS traffic
Content-based routing needed
Microservices
SSL termination needed

Network Load Balancer (NLB)

Characteristics:

Layer 4 (TCP/UDP)
Low latency
High throughput
Connection-based routing

Use Cases:

TCP/UDP traffic
Low latency requirements
High throughput
Non-HTTP protocols

Pros:

Low latency
High throughput
Connection-based routing
Managed service

Cons:

No content-based routing
Less features than ALB

Trade-offs:

Performance vs. Features: Fast but fewer features

When to Use:

Low latency critical
High throughput needed
TCP/UDP protocols
Non-HTTP traffic

HAProxy / Nginx

Characteristics:

Software load balancers
Full control
Configurable
Can be self-hosted

Use Cases:

Custom load balancing
Cost optimization
Full control needed

Pros:

Full control
Cost-effective
Highly configurable
No vendor lock-in

Cons:

Operational overhead
Requires expertise
Infrastructure management

Trade-offs:

Control vs. Management: Full control but more management

When to Use:

Custom requirements
Want full control
Cost optimization
Have operational expertise

Storage Systems

Object Storage (S3, GCS, Azure Blob)

Characteristics:

Key-value storage
Unlimited scale
Durable
Cost-effective
RESTful API

Use Cases:

File storage
Backup and archival
Static assets
Data lakes
Media storage

Pros:

Unlimited scale
Durable
Cost-effective
RESTful API
Versioning support

Cons:

Eventual consistency
Not for frequent updates
Higher latency than block storage

Trade-offs:

Scale vs. Performance: Unlimited scale but higher latency

When to Use:

File storage
Backup/archival
Static assets
Data lakes
Media storage

Block Storage (EBS, Persistent Disk)

Characteristics:

Block-level storage
Attached to instances
Low latency
Limited scale per volume

Use Cases:

Database storage
Application storage
Boot volumes
High IOPS requirements

Pros:

Low latency
High IOPS
Direct attachment
Good for databases

Cons:

Limited scale per volume
More expensive than object storage
Tied to instances

Trade-offs:

Performance vs. Scale: Low latency but limited scale

When to Use:

Database storage
High IOPS needed
Low latency critical
Application storage

Distributed File Systems (HDFS, GlusterFS)

Characteristics:

Distributed across nodes
High throughput
Fault tolerant
Good for large files

Use Cases:

Big data processing
Data lakes
Analytics workloads
Large file storage

Pros:

High throughput
Fault tolerant
Good for large files
Cost-effective

Cons:

Not for small files
Higher latency
Complex operations

Trade-offs:

Throughput vs. Latency: High throughput but higher latency

When to Use:

Big data processing
Large files
Analytics workloads
Batch processing

Stream Processing

Apache Flink

Characteristics:

Stream processing framework
Low latency
Exactly-once semantics
Stateful processing
Event time processing

Use Cases:

Real-time analytics
Event-driven applications
Complex event processing
Top-K calculations
Windowed aggregations

Pros:

Low latency
Exactly-once semantics
Stateful processing
Event time processing
Good fault tolerance

Cons:

Complex to operate
Learning curve
Resource intensive

Trade-offs:

Features vs. Complexity: Powerful but complex

When to Use:

Real-time processing
Exactly-once needed
Stateful processing
Low latency critical

Apache Kafka Streams

Characteristics:

Stream processing library
Part of Kafka ecosystem
Lightweight
Exactly-once semantics

Use Cases:

Kafka-native processing
Simple stream processing
Microservices

Pros:

Kafka integration
Lightweight
Exactly-once semantics
Easy to deploy

Cons:

Less features than Flink
Tied to Kafka
Limited for complex processing

Trade-offs:

Simplicity vs. Features: Simple but fewer features

When to Use:

Kafka ecosystem
Simple processing
Microservices
Lightweight needs

Apache Storm

Characteristics:

Stream processing framework
Real-time processing
At-least-once semantics
Mature

Use Cases:

Real-time processing
Simple stream processing

Pros:

Mature
Real-time processing
Good for simple cases

Cons:

At-least-once (not exactly-once)
Less features than Flink
Declining popularity

Trade-offs:

Maturity vs. Features: Mature but fewer features

When to Use:

Simple stream processing
Real-time needed
At-least-once acceptable

API Design Patterns

REST

Characteristics:

Stateless
Resource-based
HTTP methods
JSON/XML
Cacheable

Use Cases:

General APIs
CRUD operations
Web services
Public APIs

Pros:

Simple
Well-understood
Cacheable
Stateless
Tooling support

Cons:

Over-fetching/under-fetching
Multiple round trips
No real-time updates

Trade-offs:

Simplicity vs. Efficiency: Simple but can be inefficient

When to Use:

General APIs
CRUD operations
Public APIs
Simple use cases

GraphQL

Characteristics:

Query language
Single endpoint
Client-specified queries
Strongly typed
Real-time subscriptions

Use Cases:

Mobile applications
Complex data requirements
Multiple clients
Real-time updates needed

Pros:

Flexible queries
Single endpoint
No over-fetching
Strongly typed
Real-time subscriptions

Cons:

Complexity
Caching challenges
Over-fetching prevention needed
Learning curve

Trade-offs:

Flexibility vs. Complexity: Flexible but complex

When to Use:

Complex data requirements
Multiple clients
Mobile applications
Real-time needed

gRPC

Characteristics:

RPC framework
Protocol Buffers
HTTP/2
Strongly typed
High performance

Use Cases:

Microservices communication
Internal APIs
High performance needed
Streaming

Pros:

High performance
Strongly typed
Streaming support
Efficient serialization
Multi-language support

Cons:

Less human-readable
Browser support limited
Learning curve

Trade-offs:

Performance vs. Usability: Fast but less user-friendly

When to Use:

Microservices
Internal APIs
High performance critical
Streaming needed

Consistency Models

Strong Consistency

Characteristics:

All reads see latest write
ACID transactions
Synchronous replication

Use Cases:

Financial transactions
User accounts
Critical data
ACID requirements

Pros:

Data always correct
Predictable behavior
Easier to reason about

Cons:

Lower availability
Higher latency
Limited scalability

Trade-offs:

Correctness vs. Availability: Correct but lower availability

When to Use:

Critical data
Financial transactions
ACID requirements
Correctness > Availability

Eventual Consistency

Characteristics:

Reads may see stale data
Asynchronous replication
Eventually consistent

Use Cases:

Social media feeds
User profiles
Non-critical data
High availability needed

Pros:

High availability
Lower latency
Better scalability
Global distribution

Cons:

Stale reads possible
Complex conflict resolution
Harder to reason about

Trade-offs:

Availability vs. Correctness: High availability but may be stale

When to Use:

High availability critical
Can tolerate stale data
Global distribution
High scale needed

Read-Your-Writes Consistency

Characteristics:

User sees own writes immediately
Others may see stale data
Session-based

Use Cases:

User profiles
Social media
E-commerce

Pros:

Good user experience
Better than eventual consistency for users
Reasonable availability

Cons:

Not globally consistent
Complex to implement

Trade-offs:

UX vs. Complexity: Good UX but complex

When to Use:

User-specific data
Good UX needed
Can tolerate eventual consistency for others

Replication Strategies

Master-Slave (Primary-Replica)

Characteristics:

One master, multiple replicas
Master handles writes
Replicas handle reads
Asynchronous replication

Use Cases:

Read-heavy workloads
Scaling reads
Backup and disaster recovery

Pros:

Simple
Scales reads
Backup available
Disaster recovery

Cons:

Single point of failure (master)
Replication lag
Write bottleneck

Trade-offs:

Simplicity vs. Availability: Simple but single point of failure

When to Use:

Read-heavy workloads
Simple setup needed
Can tolerate replication lag

Master-Master (Multi-Master)

Characteristics:

Multiple masters
Writes to any master
Conflict resolution needed
Synchronous or asynchronous

Use Cases:

High availability
Geographic distribution
Write scaling

Pros:

High availability
No single point of failure
Geographic distribution
Write scaling

Cons:

Conflict resolution complexity
Consistency challenges
Complex operations

Trade-offs:

Availability vs. Complexity: High availability but complex

When to Use:

High availability critical
Geographic distribution
Can handle conflicts
Write scaling needed

Leader-Follower (Raft, Paxos)

Characteristics:

Consensus algorithm
Leader elected
Followers replicate
Strong consistency

Use Cases:

Distributed systems
Configuration management
Strong consistency needed

Pros:

Strong consistency
Fault tolerant
No single point of failure
Consensus guaranteed

Cons:

Complex
Higher latency
Requires majority

Trade-offs:

Consistency vs. Latency: Consistent but higher latency

When to Use:

Strong consistency needed
Configuration management
Distributed coordination
Can tolerate higher latency

Sharding Strategies

Range-Based Sharding

Characteristics:

Shard by value ranges
Sequential data
Easy to understand

Use Cases:

Time-series data
Sequential IDs
Range queries

Pros:

Simple
Good for range queries
Easy to understand

Cons:

Hot spots possible
Uneven distribution
Rebalancing needed

Trade-offs:

Simplicity vs. Distribution: Simple but can have hot spots

When to Use:

Time-series data
Range queries
Sequential data

Hash-Based Sharding

Characteristics:

Shard by hash of key
Even distribution
No hot spots

Use Cases:

User data
General sharding
Even distribution needed

Pros:

Even distribution
No hot spots
Simple hashing

Cons:

Range queries difficult
Rebalancing complex
No locality

Trade-offs:

Distribution vs. Queries: Even distribution but hard range queries

When to Use:

Even distribution needed
No range queries
User data
General sharding

Directory-Based Sharding

Characteristics:

Shard lookup service
Flexible mapping
Can change mapping

Use Cases:

Flexible sharding
Changing requirements
Complex sharding logic

Pros:

Flexible
Can change mapping
Complex logic possible

Cons:

Single point of failure
Lookup overhead
Complexity

Trade-offs:

Flexibility vs. Complexity: Flexible but complex

When to Use:

Flexible sharding needed
Complex logic
Changing requirements

What Interviewers Look For

Technology Selection Skills

Understanding Trade-offs
- Consistency vs. availability
- Latency vs. throughput
- Performance vs. cost
- Red Flags: No trade-off awareness, dogmatic choices, ignoring constraints
Technology Knowledge
- When to use SQL vs. NoSQL
- When to use Redis vs. Memcached
- When to use Kafka vs. RabbitMQ
- Red Flags: Wrong technology choice, no justification, can’t explain differences
Decision-Making Framework
- Requirements analysis
- Constraint identification
- Trade-off evaluation
- Red Flags: No framework, random choices, no analysis

System Design Skills

Database Selection
- SQL for ACID transactions
- NoSQL for scale and flexibility
- Time-series for metrics
- Red Flags: Wrong database choice, no justification, ignoring requirements
Caching Strategy
- Redis for rich features
- Memcached for simplicity
- CDN for static content
- Red Flags: No caching, wrong cache choice, no strategy
Message Queue Selection
- Kafka for high throughput
- RabbitMQ for traditional messaging
- SQS for managed service
- Red Flags: Wrong queue choice, no justification, ignoring scale

Problem-Solving Approach

Requirements Analysis
- Scale requirements
- Consistency requirements
- Latency requirements
- Red Flags: No requirements analysis, assumptions, ignoring constraints
Cost Consideration
- Cost optimization
- Budget constraints
- Operational costs
- Red Flags: Ignoring costs, no optimization, expensive choices
Operational Complexity
- Team expertise
- Maintenance overhead
- Managed vs. self-hosted
- Red Flags: Ignoring complexity, no operational consideration

Communication Skills

Technology Justification
- Can explain why each technology
- Understands trade-offs
- Red Flags: No justification, vague explanations, can’t defend choices
Alternative Discussion
- Considers alternatives
- Explains why not chosen
- Red Flags: No alternatives, single solution, no comparison

Meta-Specific Focus

Judgment in Technology Selection
- Right tool for the job
- Understanding of trade-offs
- Key: Show good judgment in technology selection
Practical Knowledge
- Real-world experience
- Understanding of limitations
- Key: Demonstrate practical knowledge, not just theory

Summary

Key Decision Factors

When choosing technologies for system design, consider:

Scale Requirements
- Read/write throughput
- Data volume
- Number of users
- Geographic distribution
Consistency Requirements
- Strong consistency vs. eventual consistency
- ACID transactions needed?
- Can tolerate stale data?
Latency Requirements
- Real-time vs. batch
- P50, P95, P99 latency targets
- User-facing vs. background
Availability Requirements
- Uptime SLA (99.9%, 99.99%, etc.)
- Disaster recovery needs
- Multi-region requirements
Cost Constraints
- Budget limitations
- Cost optimization needed
- Pay-per-use vs. fixed cost
Operational Complexity
- Team expertise
- Operational overhead
- Managed vs. self-hosted
Data Model
- Relational vs. document vs. key-value
- Query patterns
- Relationships complexity

Quick Reference: When to Use What

Databases:

SQL: ACID transactions, complex queries, strong consistency
NoSQL Document: Flexible schema, horizontal scaling, high writes
NoSQL Key-Value: Simple model, caching, high performance
NoSQL Column: Time-series, high writes, multi-region
Graph: Complex relationships, social networks
Time-Series: IoT, metrics, monitoring

Caching:

Redis: Rich data structures, pub/sub, persistence
Memcached: Simple caching, high throughput
CDN: Static content, global distribution
Application Cache: Small, frequent data, no network

Message Queues:

Kafka: High throughput, event streaming, replay
RabbitMQ: Traditional messaging, task queues
SQS: Managed, AWS-native, decoupling
Pulsar: Multi-tenant, geo-replication

Search:

Elasticsearch: Full-text search, complex queries, analytics
Solr: Enterprise search, faceted search
Algolia: Managed, typo tolerance, instant search

Storage:

Object Storage: Files, backups, static assets, unlimited scale
Block Storage: Databases, high IOPS, low latency
Distributed FS: Big data, large files, analytics

Stream Processing:

Flink: Low latency, exactly-once, stateful processing
Kafka Streams: Kafka-native, lightweight, simple
Storm: Real-time, simple processing

API Patterns:

REST: General APIs, CRUD, simple
GraphQL: Complex queries, multiple clients, flexible
gRPC: Microservices, high performance, streaming

Consistency:

Strong: Financial, critical data, ACID
Eventual: High availability, global, scale
Read-Your-Writes: User data, good UX

Replication:

Master-Slave: Read scaling, simple, backup
Master-Master: High availability, geo-distribution
Leader-Follower: Strong consistency, consensus

Sharding:

Range: Time-series, sequential, range queries
Hash: Even distribution, user data, general
Directory: Flexible, complex logic, changing needs

Common Trade-offs Summary

Consistency vs. Availability: Strong consistency reduces availability
Latency vs. Throughput: Lower latency often means lower throughput
Performance vs. Cost: Higher performance usually costs more
Simplicity vs. Features: More features increase complexity
Control vs. Management: More control requires more management
Scale vs. Complexity: Better scaling often means more complexity

Remember: There’s no one-size-fits-all solution. The best technology choice depends on your specific requirements, constraints, and trade-offs you’re willing to make.

Introduction

Table of Contents

Databases

SQL Databases

NoSQL Databases

Document Databases

Key-Value Stores

Column-Family Stores (Wide-Column)

Graph Databases

Time-Series Databases

In-Memory Databases

Caching Solutions

Redis

Memcached

CDN Caching

Application-Level Caching

Message Queues

Apache Kafka

RabbitMQ

Amazon SQS

Apache Pulsar

Search Engines

Elasticsearch

Apache Solr

Algolia

Content Delivery Networks (CDNs)

CloudFront (AWS)

Cloudflare

Self-Hosted CDN

Load Balancers

Application Load Balancer (ALB)

Network Load Balancer (NLB)

HAProxy / Nginx

Storage Systems

Object Storage (S3, GCS, Azure Blob)

Block Storage (EBS, Persistent Disk)

Distributed File Systems (HDFS, GlusterFS)

Stream Processing

Apache Flink

Apache Kafka Streams

Apache Storm

API Design Patterns

REST

GraphQL

gRPC

Consistency Models

Strong Consistency

Eventual Consistency

Read-Your-Writes Consistency

Replication Strategies

Master-Slave (Primary-Replica)

Master-Master (Multi-Master)

Leader-Follower (Raft, Paxos)

Sharding Strategies

Range-Based Sharding

Hash-Based Sharding

Directory-Based Sharding

What Interviewers Look For

Technology Selection Skills

System Design Skills

Problem-Solving Approach

Communication Skills

Meta-Specific Focus

Summary

Key Decision Factors

Quick Reference: When to Use What

Common Trade-offs Summary

Related Posts

Recent Posts