Introduction

The CAP Theorem (Consistency, Availability, Partition Tolerance) is a fundamental principle in distributed systems that states that it’s impossible for a distributed system to simultaneously guarantee all three properties. Understanding CAP Theorem is crucial for system design interviews and building distributed systems.

This guide covers:

  • CAP Theorem Fundamentals: Understanding the three properties
  • Trade-offs: Why you can only choose two out of three
  • Real-World Examples: How different systems handle CAP trade-offs
  • System Design Applications: How to apply CAP Theorem in interviews
  • Practical Patterns: CP, AP, and CA systems

What is CAP Theorem?

The CAP Theorem, proposed by Eric Brewer in 2000, states that in a distributed system, you can only guarantee two out of three properties:

  1. Consistency (C): All nodes see the same data at the same time
  2. Availability (A): System remains operational and responds to requests
  3. Partition Tolerance (P): System continues to operate despite network partitions

The Impossibility

Why can’t we have all three?

When a network partition occurs:

  • To maintain Consistency: System must reject writes (unavailable)
  • To maintain Availability: System must accept writes (inconsistent)
  • Partition Tolerance: Must be handled (network failures are inevitable)

Conclusion: You must choose between Consistency and Availability during partitions.

Understanding the Three Properties

Consistency (C)

Definition: Every read receives the most recent write or an error.

Characteristics:

  • All nodes have the same data at the same time
  • Strong consistency: Linearizability, sequential consistency
  • Weak consistency: Eventual consistency, causal consistency

Examples:

  • Strong Consistency: ACID databases (PostgreSQL, MySQL)
  • Eventual Consistency: DNS, CDNs, DynamoDB

Trade-offs:

  • ✅ Data is always correct
  • ❌ Higher latency (wait for replication)
  • ❌ Lower availability (may reject requests)

Availability (A)

Definition: Every request receives a response (non-error), without guarantee that it contains the most recent write.

Characteristics:

  • System remains operational
  • No downtime
  • Accepts all requests

Examples:

  • Highly Available: Cassandra, DynamoDB, CDNs
  • Always On: DNS, load balancers

Trade-offs:

  • ✅ System always responds
  • ✅ Better user experience
  • ❌ May return stale data
  • ❌ Eventual consistency

Partition Tolerance (P)

Definition: System continues to operate despite network partitions (message loss or delay between nodes).

Characteristics:

  • Network failures are inevitable
  • System must handle partitions
  • Cannot be sacrificed in distributed systems

Examples:

  • Partition Tolerant: All distributed systems
  • Not Partition Tolerant: Single-node systems

Trade-offs:

  • ✅ Handles network failures
  • ✅ Distributed architecture
  • ❌ Must choose C or A during partitions

Architecture

CAP Theorem Trade-off Visualization

                    CAP Theorem
                        │
        ┌───────────────┼───────────────┐
        │               │               │
    Consistency    Availability   Partition
        (C)            (A)        Tolerance
                                        (P)
        │               │               │
        └───────┬───────┴───────┬───────┘
                │               │
        ┌───────▼───────┐ ┌──────▼───────┐
        │   CP Systems  │ │   AP Systems │
        │               │ │              │
        │ - MongoDB     │ │ - Cassandra  │
        │ - HBase       │ │ - DynamoDB   │
        │ - PostgreSQL  │ │ - CouchDB    │
        │   (Replica)   │ │ - Riak       │
        └───────────────┘ └──────────────┘
                │               │
                └───────┬───────┘
                        │
                Network Partition
                        │
        ┌───────────────┴───────────────┐
        │                               │
┌───────▼───────┐           ┌──────────▼───────┐
│ Choose C      │           │ Choose A          │
│ (Reject Writes│           │ (Accept Writes   │
│  - Unavailable│           │  - Inconsistent)│
└───────────────┘           └──────────────────┘

Explanation:

  • CAP Theorem: States that in a distributed system, you can only guarantee two out of three properties: Consistency, Availability, and Partition Tolerance.
  • CP Systems: Prioritize Consistency and Partition Tolerance. During network partitions, they reject writes to maintain consistency (become unavailable).
  • AP Systems: Prioritize Availability and Partition Tolerance. During network partitions, they accept writes but may return inconsistent data.
  • Network Partition: When network communication between nodes fails, the system must choose between maintaining consistency (CP) or availability (AP).

CAP Theorem Combinations

CP Systems (Consistency + Partition Tolerance)

Characteristics:

  • Strong consistency
  • Sacrifices availability during partitions
  • Rejects requests when partition occurs

Examples:

  • MongoDB (with strong consistency)
  • HBase
  • Traditional RDBMS (with replication)
  • Zookeeper

Use Cases:

  • Financial systems (banking, trading)
  • Critical data (user accounts, payments)
  • Systems where consistency is paramount

Behavior During Partition:

Partition occurs → System detects partition → Rejects writes → Maintains consistency

AP Systems (Availability + Partition Tolerance)

Characteristics:

  • High availability
  • Sacrifices consistency during partitions
  • Accepts requests even during partitions

Examples:

  • Cassandra
  • DynamoDB (eventual consistency mode)
  • CouchDB
  • DNS

Use Cases:

  • Social media feeds
  • Content delivery (CDN)
  • Real-time analytics
  • Systems where availability is critical

Behavior During Partition:

Partition occurs → System accepts writes → May have conflicts → Resolves later (eventual consistency)

CA Systems (Consistency + Availability)

Characteristics:

  • Strong consistency
  • High availability
  • Not partition tolerant (single node or tightly coupled)

Examples:

  • Single-node databases (PostgreSQL on one server)
  • Traditional monolithic systems
  • In-memory databases (single instance)

Limitations:

  • Cannot scale horizontally
  • Single point of failure
  • Not suitable for distributed systems

Note: In practice, CA systems don’t exist in distributed systems because partitions are inevitable.

Real-World Examples

CP System: MongoDB (with Strong Consistency)

Configuration:

  • Write concern: majority
  • Read concern: majority
  • Replica set with majority writes

Behavior:

  • During partition: Rejects writes if majority unavailable
  • Maintains consistency
  • Sacrifices availability

Example:

// MongoDB with strong consistency
db.collection.insertOne(
  { user_id: 123, balance: 1000 },
  { writeConcern: { w: "majority" } }
);

// During partition:
// - If majority nodes available: Write succeeds
// - If majority nodes unavailable: Write fails (CP)

AP System: Cassandra

Configuration:

  • Tunable consistency (QUORUM, ONE, ALL)
  • Default: Eventual consistency
  • Multi-master replication

Behavior:

  • During partition: Accepts writes to available nodes
  • May have conflicts
  • Resolves conflicts later (vector clocks, last-write-wins)

Example:

# Cassandra with eventual consistency
session.execute(
    "INSERT INTO users (user_id, name) VALUES (123, 'John')",
    consistency_level=ConsistencyLevel.ONE  # AP mode
);

# During partition:
# - Accepts writes to any available node
# - May have conflicts (different nodes have different data)
# - Resolves conflicts later (eventual consistency)

Hybrid: DynamoDB

Configuration:

  • Strong consistency (optional)
  • Eventual consistency (default)
  • Tunable per request

Behavior:

  • Strong Consistency Mode: CP (may reject during partition)
  • Eventual Consistency Mode: AP (always available)

Example:

# DynamoDB with eventual consistency (AP)
response = table.get_item(
    Key={'user_id': 123},
    ConsistentRead=False  # AP mode
)

# DynamoDB with strong consistency (CP)
response = table.get_item(
    Key={'user_id': 123},
    ConsistentRead=True  # CP mode (may be slower/unavailable)
)

CAP Theorem in System Design Interviews

How to Apply CAP Theorem

1. Identify System Requirements:

  • What consistency level is needed?
  • What availability requirements?
  • Is the system distributed?

2. Choose CAP Combination:

  • CP: Financial systems, critical data
  • AP: Social media, content delivery, analytics
  • CA: Not applicable for distributed systems

3. Explain Trade-offs:

  • Why you chose CP or AP
  • What you’re sacrificing
  • How you handle partitions

Interview Example: Design a Payment System

Question: “Design a payment processing system”

CAP Analysis:

  • Consistency: Critical (can’t have double charges)
  • Availability: Important (but can accept brief downtime)
  • Partition Tolerance: Required (distributed system)

Choice: CP System

Reasoning:

  • Consistency is paramount (financial correctness)
  • Can sacrifice availability during partitions
  • Reject transactions if partition detected
  • Better to be unavailable than inconsistent

Implementation:

  • Use strong consistency (majority writes)
  • Reject writes during partitions
  • Use two-phase commit for transactions
  • Maintain consistency at all costs

Interview Example: Design a Social Media Feed

Question: “Design a social media feed system”

CAP Analysis:

  • Consistency: Less critical (eventual consistency acceptable)
  • Availability: Critical (users expect always-on)
  • Partition Tolerance: Required (distributed system)

Choice: AP System

Reasoning:

  • Availability is critical (user experience)
  • Eventual consistency acceptable (feeds can be slightly stale)
  • Better to show stale data than no data
  • Resolve conflicts later

Implementation:

  • Use eventual consistency
  • Accept writes during partitions
  • Resolve conflicts (last-write-wins, vector clocks)
  • Prioritize availability

Beyond CAP: PACELC Theorem

PACELC extends CAP Theorem:

  • PAC: If Partition (P), choose Availability (A) or Consistency (C)
  • ELC: Else (no partition), choose Latency (L) or Consistency (C)

Examples:

  • DynamoDB: PACELC = PA/EC (Partition: Availability, Else: Consistency)
  • MongoDB: PACELC = PC/EC (Partition: Consistency, Else: Consistency)
  • Cassandra: PACELC = PA/EL (Partition: Availability, Else: Latency)

Common Misconceptions

Misconception 1: “You can have 2.5 out of 3”

Reality: You can only have 2 out of 3. There’s no “partial” guarantee.

Misconception 2: “CA systems exist in distributed systems”

Reality: CA systems are not partition tolerant, so they’re not truly distributed. In practice, all distributed systems must handle partitions (P).

Misconception 3: “CP means always consistent”

Reality: CP means consistent during normal operation, but may reject requests during partitions to maintain consistency.

Misconception 4: “AP means always available”

Reality: AP means available during partitions, but may return stale data (eventual consistency).

Best Practices

Choosing CP vs AP

Choose CP when:

  • Data correctness is critical
  • Financial transactions
  • User accounts, authentication
  • Can accept brief unavailability

Choose AP when:

  • User experience is critical
  • Content delivery
  • Social media, feeds
  • Analytics, logging
  • Can accept eventual consistency

Hybrid Approaches

Many systems use both:

  • Strong consistency for critical operations
  • Eventual consistency for non-critical operations
  • Tunable consistency per operation

Example:

  • Payment processing: CP (strong consistency)
  • User profiles: AP (eventual consistency)
  • Analytics: AP (eventual consistency)

What Interviewers Look For

Distributed Systems Theory

  1. CAP Theorem Understanding
    • Understanding of the three properties
    • Why you can only choose two
    • Trade-off analysis
    • Red Flags: Thinks you can have all three, no trade-off understanding, wrong choices
  2. System Classification
    • Can classify systems as CP or AP
    • Understands when to use each
    • Explains trade-offs clearly
    • Red Flags: Wrong classification, no reasoning, can’t explain trade-offs
  3. Practical Application
    • Can apply CAP Theorem to design decisions
    • Explains choices in interviews
    • Understands real-world implications
    • Red Flags: Theoretical only, can’t apply, no practical understanding

Problem-Solving Approach

  1. Trade-off Analysis
    • Identifies consistency vs availability trade-offs
    • Makes informed decisions
    • Justifies choices
    • Red Flags: No trade-offs, dogmatic choices, no justification
  2. System Design Application
    • Applies CAP Theorem to system design
    • Chooses appropriate consistency model
    • Explains impact on design
    • Red Flags: Ignores CAP, wrong choices, no impact analysis

Communication Skills

  1. Clear Explanation
    • Explains CAP Theorem clearly
    • Discusses trade-offs
    • Justifies design decisions
    • Red Flags: Unclear explanations, no justification, confusing
  2. Real-World Examples
    • Provides real-world examples
    • Understands how systems handle CAP
    • Can discuss specific systems
    • Red Flags: No examples, theoretical only, wrong examples

Meta-Specific Focus

  1. Distributed Systems Expertise
    • Deep understanding of distributed systems
    • CAP Theorem mastery
    • Trade-off thinking
    • Key: Demonstrate distributed systems expertise
  2. System Design Skills
    • Can apply theory to practice
    • Makes informed design decisions
    • Understands implications
    • Key: Show practical application skills

Summary

CAP Theorem Key Points:

  • You can only guarantee 2 out of 3 properties
  • Partition Tolerance is required in distributed systems
  • Choose between Consistency (CP) or Availability (AP)
  • CP Systems: Consistency + Partition Tolerance (sacrifice availability)
  • AP Systems: Availability + Partition Tolerance (sacrifice consistency)
  • CA Systems: Not applicable for distributed systems

System Design Applications:

  • Financial Systems: CP (consistency critical)
  • Social Media: AP (availability critical)
  • Hybrid: Use both CP and AP for different operations

Understanding CAP Theorem helps you make informed decisions about consistency, availability, and partition handling in distributed systems.