Introduction

Apache Cassandra is an open-source, distributed NoSQL database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Originally developed at Facebook and released as open-source in 2008, Cassandra has become one of the most popular distributed databases, powering systems at companies like Netflix, Apple, Instagram, and many others.

What is Cassandra?

Cassandra is a wide-column store NoSQL database that provides:

  • Distributed Architecture: Data distributed across multiple nodes
  • High Availability: No single point of failure
  • Linear Scalability: Add nodes to increase capacity
  • Tunable Consistency: Choose consistency level per operation
  • Multi-Datacenter Support: Built-in replication across data centers

Why Cassandra?

Key Advantages:

  • Massive Scalability: Handle petabytes of data across thousands of nodes
  • High Availability: Replication ensures data is always available
  • Write Performance: Optimized for high write throughput
  • Fault Tolerance: Continues operating even if nodes fail
  • No Single Point of Failure: Peer-to-peer architecture
  • Multi-Region: Built-in support for multiple data centers

Common Use Cases:

  • Time-series data (IoT, metrics, logs)
  • High write throughput applications
  • Global applications requiring multi-region support
  • Event logging and tracking
  • Real-time analytics
  • Content management systems

Architecture

High-Level Architecture

┌──────────────┐     ┌───────────────┐     ┌───────────────┐
│    Client    │────▶│ Load Balancer │────▶│   Cassandra   │
│ (Application)│     │               │     │    Cluster    │
└──────┬───────┘     └───────┬───────┘     └───────┬───────┘
       │                     │                     │
       │                     ▼                     │
       │              ┌─────────────┐              │
       │              │ Coordinator │              │
       │              │    Node     │              │
       │              └──────┬──────┘              │
       │                     │                     │
       │                     ▼                     │
       │              ┌─────────────┐              │
       │              │   Replica   │              │
       │              │    Nodes    │              │
       │              └─────────────┘              │
       │                     │                     │
       └─────────────────────┴─────────────────────┘

Explanation:

  • Client: Your application (e.g., web server, microservice) that interacts with Cassandra.
  • Load Balancer: Distributes client requests across Cassandra nodes. Can be a dedicated load balancer or client-side load balancing in the Cassandra driver.
  • Coordinator Node: Any Cassandra node can act as a coordinator. It receives the client request, determines which replica nodes should be involved based on the partition key and replication strategy, and then forwards the request to them.
  • Cassandra Cluster: A collection of Cassandra nodes working together. Data is distributed across these nodes.
  • Replica Nodes: Nodes that store copies of the data. The number of replicas is determined by the replication factor.

Detailed Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Client Applications                       │
│  (Web Apps, Mobile Apps, Microservices, Data Pipelines)          │
└────────────────────────────┬──────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Cassandra Cluster                           │
│                                                                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Datacenter 1 (US-East)                 │  │
│  │                                                             │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │
│  │  │   Node 1     │  │   Node 2     │  │   Node 3     │    │  │
│  │  │   (Peer)     │  │   (Peer)     │  │   (Peer)     │    │  │
│  │  │              │  │              │  │              │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │ Memtable│  │  │  │ Memtable│  │  │  │ Memtable│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │CommitLog│  │  │  │CommitLog│  │  │  │CommitLog│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │SSTables│  │  │  │SSTables│  │  │  │SSTables│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘    │  │
│  │                                                             │  │
│  │  Rack 1              Rack 2              Rack 3            │  │
│  └───────────────────────────────────────────────────────────┘  │
│                             │                                     │
│                             │ Replication                         │
│                             │                                     │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Datacenter 2 (EU-West)                 │  │
│  │                                                             │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │
│  │  │   Node 4     │  │   Node 5     │  │   Node 6     │    │  │
│  │  │   (Peer)     │  │   (Peer)     │  │   (Peer)     │    │  │
│  │  │              │  │              │  │              │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │ Memtable│  │  │  │ Memtable│  │  │  │ Memtable│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │CommitLog│  │  │  │CommitLog│  │  │  │CommitLog│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │SSTables│  │  │  │SSTables│  │  │  │SSTables│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘    │  │
│  │                                                             │  │
│  │  Rack 1              Rack 2              Rack 3            │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Token Ring (Consistent Hashing)         │  │
│  │                                                             │  │
│  │      Node1 ──► Node2 ──► Node3 ──► Node4 ──► Node5         │  │
│  │        ▲                                    │              │  │
│  │        └────────────────────────────────────┘              │  │
│  │                                                             │  │
│  │  Partition Key Hash → Determines Node Location             │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Keyspaces/Tables                        │  │
│  │                                                             │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │
│  │  │   Table A    │  │   Table B    │  │   Table C    │    │  │
│  │  │              │  │              │  │              │    │  │
│  │  │ Partition 0  │  │ Partition 0  │  │ Partition 0  │    │  │
│  │  │ Partition 1  │  │ Partition 1  │  │ Partition 1  │    │  │
│  │  │ Partition 2  │  │ Partition 2  │  │ Partition 2  │    │  │
│  │  │              │  │              │  │              │    │  │
│  │  │ RF=3         │  │ RF=3         │  │ RF=3         │    │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘    │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                             │
                             │ Gossip Protocol (Node Communication)
                             │
┌─────────────────────────────────────────────────────────────────┐
│                    Cluster Management                            │
│  - Gossip Protocol: Node discovery and health monitoring        │
│  - Snitch: Network topology awareness                            │
│  - Partitioner: Data distribution (Murmur3Partitioner)          │
│  - Compaction: SSTable merging and cleanup                       │
└─────────────────────────────────────────────────────────────────┘

Key Components:

  1. Client Applications: Connect to any node in the cluster
  2. Cassandra Nodes: Peer-to-peer architecture, no master node
  3. Datacenters: Logical grouping of nodes (often by geographic location)
  4. Racks: Physical grouping within datacenter (for fault tolerance)
  5. Token Ring: Consistent hashing for data distribution
  6. Replication: Data replicated across multiple nodes/datacenters
  7. Gossip Protocol: Nodes communicate to discover cluster state

Data Flow:

  1. Write Path:
    • Client sends write to coordinator node
    • Coordinator determines replica nodes (based on partition key)
    • Write to commit log (durability)
    • Write to memtable (in-memory)
    • Replicate to other replicas (based on consistency level)
    • Return success to client
    • Memtable flushed to SSTable (when full)
  2. Read Path:
    • Client sends read to coordinator node
    • Coordinator determines replica nodes (based on consistency level)
    • Read from replicas (may read from multiple)
    • Merge results (if different versions)
    • Return result to client

Distributed Architecture

Peer-to-Peer Model:

  • No master node (unlike master-slave architectures)
  • All nodes are equal (peer-to-peer)
  • Data distributed across all nodes
  • No single point of failure

Key Concepts:

  • Node: Single server in cluster
  • Datacenter: Group of nodes (often physical location)
  • Rack: Group of nodes within datacenter
  • Cluster: Collection of nodes
  • Ring: Logical representation of cluster

Data Distribution

Consistent Hashing:

  • Each node assigned a token (position on ring)
  • Data partitioned by partition key
  • Hash(partition_key) → determines which node stores the data (see the token() sketch below)
  • Even distribution across nodes
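
The CQL token() function exposes the partitioner's hash for a row's partition key, which is a quick way to see the ring position described above. A minimal sketch, using the users table defined in the Data Model section below:

-- Show the token each row hashes to; the token determines
-- which nodes own the row's replicas on the ring
SELECT user_id, token(user_id) FROM users;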

Partitioning:

  • Data split into partitions
  • Each partition stored on multiple nodes (replication)
  • Partition key determines data location
  • Enables parallel processing

Replication

Replication Strategy:

  • Replication Factor (RF): Number of copies of data
  • Replication Strategy: How replicas are placed
    • SimpleStrategy: For single datacenter
    • NetworkTopologyStrategy: For multiple datacenters

Replica Placement:

Replication Factor = 3

Partition → Node A (primary)
         → Node B (replica)
         → Node C (replica)

All replicas on different nodes/racks/datacenters

Multi-Datacenter Replication:

  • Replicas distributed across datacenters
  • Local reads (read from local datacenter)
  • Cross-datacenter replication for disaster recovery

Data Model

Tables and Columns

Table Structure:

  • Similar to SQL tables but with important differences
  • Partition Key: Determines which node stores data
  • Clustering Columns: Determine sort order within partition
  • Regular Columns: Data columns

Example:

CREATE TABLE users (
    user_id UUID,
    email TEXT,
    first_name TEXT,
    last_name TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id)
);

Composite Primary Key:

CREATE TABLE user_posts (
    user_id UUID,
    post_id UUID,
    title TEXT,
    content TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, post_id)
);
-- user_id = partition key
-- post_id = clustering column

Partition Key

Purpose:

  • Determines which node stores the data
  • All rows with same partition key stored together
  • Enables efficient queries

Best Practices:

  • High Cardinality: Many unique values
  • Even Distribution: Avoid hotspots
  • Query Pattern: Match your query patterns

Example:

-- Good: High cardinality
PRIMARY KEY (user_id, post_id)

-- Bad: Low cardinality (hotspot)
PRIMARY KEY (status, user_id)  -- status has few values

Clustering Columns

Purpose:

  • Determines sort order within partition
  • Enables range queries
  • Multiple clustering columns create nested sorting

Example:

CREATE TABLE sensor_data (
    sensor_id UUID,
    date DATE,
    timestamp TIMESTAMP,
    value DOUBLE,
    PRIMARY KEY (sensor_id, date, timestamp)
);
-- Partition by sensor_id
-- Sort by date, then timestamp
-- Can query: WHERE sensor_id = ? AND date = ? AND timestamp > ?

Data Types

Basic Types:

  • TEXT, VARCHAR, ASCII: String types
  • INT, BIGINT, SMALLINT: Integer types
  • FLOAT, DOUBLE: Floating point
  • BOOLEAN: Boolean
  • UUID, TIMEUUID: UUID types
  • TIMESTAMP: Date/time
  • BLOB: Binary data

Collection Types:

  • LIST<TEXT>: Ordered list
  • SET<TEXT>: Unordered unique set
  • MAP<TEXT, TEXT>: Key-value map

Example:

CREATE TABLE user_profile (
    user_id UUID PRIMARY KEY,
    tags LIST<TEXT>,
    preferences MAP<TEXT, TEXT>,
    favorite_colors SET<TEXT>
);
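
Collections can also be modified in place with CQL update operations. A brief sketch against the table above (the UUID literal is just an example value):

-- Append to a list, add to a set, and set a map entry
UPDATE user_profile SET tags = tags + ['cassandra']
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

UPDATE user_profile SET favorite_colors = favorite_colors + {'blue'}
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

UPDATE user_profile SET preferences['theme'] = 'dark'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;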

Consistency Model

Tunable Consistency

Cassandra provides tunable consistency: the consistency level is chosen per operation rather than fixed cluster-wide (see the cqlsh sketch after the list below).

Consistency Levels:

  1. ONE
    • Read/write from/to one replica
    • Fastest, least consistent
    • May see stale data
  2. QUORUM
    • Read/write from/to majority of replicas
    • RF=3: Read/write from 2 replicas
    • Good balance of consistency and performance
  3. ALL
    • Read/write from/to all replicas
    • Strongest consistency
    • Slowest, may fail if any replica unavailable
  4. LOCAL_QUORUM
    • Quorum within local datacenter
    • Faster than QUORUM for multi-datacenter
  5. EACH_QUORUM
    • Quorum in each datacenter
    • Strong consistency across datacenters
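
In cqlsh, the consistency level applies to the session and can be changed before any statement, which illustrates the per-operation tuning described above. A minimal sketch (users table from the Data Model section; the UUID literal is an example value):

-- Subsequent statements in this cqlsh session use QUORUM
CONSISTENCY QUORUM;
INSERT INTO users (user_id, email) VALUES (uuid(), 'user@example.com');

-- Trade consistency for lower read latency
CONSISTENCY ONE;
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;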

Consistency Formula:

R + W > RF  → Strong Consistency
R + W ≤ RF  → Eventual Consistency

Where:
R = Read consistency level
W = Write consistency level
RF = Replication factor

Example:

RF = 3
R = QUORUM (2)
W = QUORUM (2)
R + W = 4 > 3 → Strong Consistency

Eventual Consistency

Default Behavior:

  • Writes propagate asynchronously
  • Reads may see stale data temporarily
  • Eventually all replicas converge
  • Trade-off: Availability vs Consistency

Use Cases:

  • High availability required
  • Can tolerate temporary inconsistency
  • High write throughput needed

Replication

Replication Factor

Definition:

  • Number of copies of data stored
  • Typically 3 (for fault tolerance)
  • Higher values are possible but uncommon; RF cannot exceed the number of nodes in a datacenter

Choosing Replication Factor:

  • RF=1: No redundancy (not recommended for production)
  • RF=3: Standard for single datacenter
  • RF=5: Higher durability, multi-datacenter

Example:

CREATE KEYSPACE my_keyspace
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3,
    'datacenter2': 3
};

Replication Strategies

1. SimpleStrategy

  • For single datacenter
  • Places replicas sequentially on ring
  • Not recommended for production
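
A minimal SimpleStrategy keyspace, suitable only for development or a single-datacenter test cluster (keyspace name illustrative):

CREATE KEYSPACE dev_keyspace
WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 3
};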

2. NetworkTopologyStrategy

  • For multiple datacenters
  • Places replicas based on network topology
  • Recommended for production
  • Considers racks and datacenters

Example:

CREATE KEYSPACE my_keyspace
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3,  -- 3 replicas in DC1
    'datacenter2': 2   -- 2 replicas in DC2
};

Replica Placement

Strategy:

  • First replica: Node determined by partitioner
  • Additional replicas: Next nodes on ring
  • NetworkTopologyStrategy: Considers racks/datacenters

Fault Tolerance:

  • RF=3: Up to two replicas can be lost without data loss (QUORUM operations still need two of the three)
  • Replicas on different racks/datacenters
  • Automatic repair on node recovery

Write Path

Write Process

Steps:

  1. Client sends write request
  2. Coordinator node receives request
  3. Coordinator determines replica nodes
  4. Write to commit log (durability)
  5. Write to memtable (in-memory)
  6. Replicate to other replicas (based on consistency level)
  7. Return success to client
  8. Memtable flushed to SSTable (when full)

Write Components:

1. Commit Log

  • Append-only log for durability
  • All writes logged before acknowledgment
  • Survives crashes
  • Replayed on recovery

2. Memtable

  • In-memory structure (sorted)
  • Fast writes
  • Flushed to disk when full
  • One active memtable per table (per node)

3. SSTable (Sorted String Table)

  • Immutable data files on disk
  • Sorted by partition key
  • Created when memtable flushed
  • Multiple SSTables per table

Write Performance:

  • Fast: Sequential writes to commit log
  • No Random I/O: Append-only writes
  • In-Memory: Memtable for speed
  • Batch Writes: Efficient batching of related writes (see the sketch below)
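
As a sketch of batching, writes that target the same partition can be grouped into a single logged batch so they are applied together on that partition (user_posts table from the Data Model section; the UUID is an example value). Multi-partition logged batches, by contrast, add coordinator overhead and should be used sparingly.

BEGIN BATCH
    INSERT INTO user_posts (user_id, post_id, title)
    VALUES (123e4567-e89b-12d3-a456-426614174000, uuid(), 'First post');
    INSERT INTO user_posts (user_id, post_id, title)
    VALUES (123e4567-e89b-12d3-a456-426614174000, uuid(), 'Second post');
APPLY BATCH;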

Read Path

Read Process

Steps:

  1. Client sends read request
  2. Coordinator node receives request
  3. Determine replica nodes (based on consistency level)
  4. Read from replicas (may read from multiple)
  5. Merge results (if different versions)
  6. Return result to client

Read Components:

1. Memtable

  • Check in-memory memtable first
  • Fastest access

2. SSTables

  • Check on-disk SSTables
  • May need to check multiple SSTables
  • Bloom filters reduce unnecessary reads

3. Bloom Filters

  • Probabilistic data structure
  • Quickly determine if key exists in SSTable
  • Reduces disk I/O
  • False positives possible (but rare)
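
The false-positive rate is tunable per table through the bloom_filter_fp_chance option; a brief sketch, reusing the sensor_data table defined earlier:

-- Lower values cost more memory but skip more unnecessary SSTable reads
ALTER TABLE sensor_data WITH bloom_filter_fp_chance = 0.01;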

4. Compaction

  • Merges multiple SSTables
  • Removes deleted/tombstoned data
  • Improves read performance

Read Performance:

  • Bloom Filters: Reduce unnecessary reads
  • Caching: Row cache, key cache
  • Compaction: Reduces SSTable count
  • Consistency Level: Affects latency

Compaction

What is Compaction?

Purpose:

  • Merge multiple SSTables into one
  • Remove deleted/tombstoned data
  • Improve read performance
  • Reclaim disk space

Why Needed:

  • Multiple SSTables per table (from memtable flushes)
  • Reading from multiple SSTables is slow
  • Deleted data takes space
  • Old data versions accumulate

Compaction Strategies

1. SizeTieredCompactionStrategy (STCS)

  • Merges SSTables of similar size
  • Simple, good for write-heavy workloads
  • Can create large SSTables
  • More disk space needed temporarily

2. LeveledCompactionStrategy (LCS)

  • Organizes SSTables into levels
  • Each level 10x larger than previous
  • Better read performance
  • More CPU intensive
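
Switching an existing table's compaction strategy is a one-line schema change; for example, moving a read-heavy table to LCS (here the users table from the Data Model examples):

ALTER TABLE users
WITH compaction = {'class': 'LeveledCompactionStrategy'};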

3. TimeWindowCompactionStrategy (TWCS)

  • For time-series data
  • Compacts SSTables within time windows
  • Good for TTL-based data

Example:

CREATE TABLE sensor_data (
    sensor_id UUID,
    timestamp TIMESTAMP,
    value DOUBLE,
    PRIMARY KEY (sensor_id, timestamp)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};

Data Modeling

Query-Driven Design

Cassandra Data Modeling Principles:

  1. Design for Queries: Model based on query patterns
  2. Denormalize: Duplicate data if needed
  3. One Table Per Query: Create separate tables for different queries
  4. Partition Key: Choose based on query patterns

Modeling Patterns

1. Single Partition Queries

-- Query: Get user by ID
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    email TEXT,
    name TEXT
);

SELECT * FROM users WHERE user_id = ?;

2. Partition + Clustering Queries

-- Query: Get user's posts, sorted by date
CREATE TABLE user_posts (
    user_id UUID,
    post_id UUID,
    title TEXT,
    content TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);

SELECT * FROM user_posts
WHERE user_id = ?
ORDER BY created_at DESC;

3. Multiple Tables for Different Queries

-- Query 1: Get posts by user
CREATE TABLE posts_by_user (
    user_id UUID,
    post_id UUID,
    title TEXT,
    PRIMARY KEY (user_id, post_id)
);

-- Query 2: Get posts by date
CREATE TABLE posts_by_date (
    date DATE,
    post_id UUID,
    user_id UUID,
    title TEXT,
    PRIMARY KEY (date, post_id)
);

-- Denormalize: Store same data in both tables

Anti-Patterns

❌ Don’t Do:

  • Low cardinality partition keys
  • Secondary indexes on high-cardinality columns
  • Queries requiring joins
  • Queries filtering by non-partition key without clustering key

Example of Bad Design:

-- Bad: Low cardinality partition key
CREATE TABLE posts (
    status TEXT,  -- Only 'published', 'draft' (2 values)
    post_id UUID,
    title TEXT,
    PRIMARY KEY (status, post_id)
);
-- Problem: All published posts on one node (hotspot)

✅ Good Design:

-- Good: High cardinality partition key
CREATE TABLE posts (
    user_id UUID,  -- Many unique values
    post_id UUID,
    title TEXT,
    PRIMARY KEY (user_id, post_id)
);

Secondary Indexes

When to Use

Secondary Indexes:

  • Query by non-partition key column
  • Limited use cases
  • Performance considerations

Limitations:

  • Performance: Can be slow (scans all nodes)
  • Cardinality: Best for low-medium cardinality
  • Not for High Cardinality: Avoid on unique columns

Example:

-- Create secondary index
CREATE INDEX ON users (email);

-- Query using index
SELECT * FROM users WHERE email = 'user@example.com';

When NOT to Use:

  • High cardinality columns (e.g., UUID, timestamps)
  • Frequently updated columns
  • Columns with many null values

Alternatives:

  • Denormalize into a separate lookup table (see the sketch after this list)
  • Use materialized views
  • Application-level indexing
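
As a sketch of the denormalization alternative, a lookup table keyed by email avoids the secondary index entirely; the application writes to both tables (table name illustrative):

CREATE TABLE users_by_email (
    email TEXT PRIMARY KEY,
    user_id UUID,
    name TEXT
);

SELECT user_id FROM users_by_email WHERE email = 'user@example.com';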

Materialized Views

Purpose

Materialized Views:

  • Pre-computed query results
  • Automatically maintained
  • Alternative to denormalization

Example:

-- Base table
CREATE TABLE posts (
    user_id UUID,
    post_id UUID,
    title TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, post_id)
);

-- Materialized view
CREATE MATERIALIZED VIEW posts_by_date AS
SELECT created_at, post_id, user_id, title
FROM posts
WHERE created_at IS NOT NULL AND user_id IS NOT NULL AND post_id IS NOT NULL
PRIMARY KEY (created_at, user_id, post_id);

Trade-offs:

  • Automatic maintenance
  • Additional storage
  • Write overhead (updates both tables)

Performance Characteristics

Maximum Read & Write Throughput

Single Node:

  • Max Write Throughput: 10K-50K writes/sec
  • Max Read Throughput: 5K-25K reads/sec (depends on consistency level and data locality)

Cluster (Horizontal Scaling):

  • Max Write Throughput: 10K-50K writes/sec per node (linear scaling)
  • Max Read Throughput: 5K-25K reads/sec per node (linear scaling)
  • Example: 100-node cluster can handle 1M-5M writes/sec and 500K-2.5M reads/sec total

Factors Affecting Throughput:

  • Commit log disk speed (SSD recommended)
  • Memtable size and flush frequency
  • Replication factor (higher RF = more writes)
  • Consistency level (QUORUM slower than ONE)
  • Data locality (local reads faster)
  • Compaction strategy
  • Network latency between nodes
  • Number of nodes in cluster

Optimized Configuration:

  • Max Write Throughput: 50K-100K writes/sec per node (with optimized settings)
  • Max Read Throughput: 25K-50K reads/sec per node (with proper consistency level and data locality)

Write Performance

Throughput:

  • Single Node: 10K-50K writes/second
  • Cluster: Scales linearly with nodes
  • 100 Nodes: 1M-5M writes/second

Factors:

  • Commit log disk speed
  • Memtable size
  • Replication factor
  • Consistency level

Optimization:

  • Fast commit log disk (SSD)
  • Batch writes
  • Lower consistency level (if acceptable)
  • Tune memtable size

Read Performance

Latency:

  • Single Partition: < 10ms (p95)
  • Multi-Partition: Depends on consistency level
  • With Caching: < 1ms

Factors:

  • Number of SSTables
  • Bloom filter effectiveness
  • Cache hit ratio
  • Consistency level
  • Compaction strategy

Optimization:

  • Row cache for hot data
  • Key cache for partition keys
  • Reduce SSTable count (compaction)
  • Appropriate consistency level

Scalability

Linear Scalability:

  • Add nodes → increase capacity
  • No bottleneck (peer-to-peer)
  • Scales to thousands of nodes
  • Handles petabytes of data

Example:

  • 10 nodes: 100K writes/sec
  • 100 nodes: 1M writes/sec
  • 1000 nodes: 10M writes/sec

Use Cases

1. Time-Series Data

Characteristics:

  • High write volume
  • Time-ordered data
  • Rarely updated
  • Queries by time range

Example:

CREATE TABLE sensor_readings (
    sensor_id UUID,
    timestamp TIMESTAMP,
    value DOUBLE,
    PRIMARY KEY (sensor_id, timestamp)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy'
};
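
Time-series rows are often written with a TTL so expired data is dropped during compaction, which pairs naturally with TWCS. A brief sketch (literal values are illustrative):

-- Keep each reading for 30 days
INSERT INTO sensor_readings (sensor_id, timestamp, value)
VALUES (123e4567-e89b-12d3-a456-426614174000, '2024-01-01 00:00:00+0000', 21.5)
USING TTL 2592000;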

Use Cases:

  • IoT sensor data
  • Metrics and monitoring
  • Financial tick data
  • Log aggregation

2. High Write Throughput

Characteristics:

  • Many writes per second
  • Simple queries
  • Event logging

Example:

CREATE TABLE user_events (
    user_id UUID,
    event_id TIMEUUID,
    event_type TEXT,
    event_data TEXT,
    PRIMARY KEY (user_id, event_id)
);
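
Because event_id is a TIMEUUID clustering column, a user's recent events can be read with a time-bounded range query; a sketch using the built-in minTimeuuid() function (literal values illustrative):

SELECT event_type, event_data
FROM user_events
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
  AND event_id > minTimeuuid('2024-01-01 00:00:00+0000');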

Use Cases:

  • Event tracking
  • Activity logs
  • Clickstream data
  • Audit logs

3. Multi-Region Applications

Characteristics:

  • Global user base
  • Low latency requirements
  • Data locality

Example:

CREATE KEYSPACE my_app
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3,
    'eu-west': 3,
    'ap-south': 3
};

Use Cases:

  • Global applications
  • Content delivery
  • User data replication
  • Disaster recovery

4. Content Management

Characteristics:

  • Large amounts of content
  • Metadata storage
  • High availability

Example:

CREATE TABLE content_metadata (
    content_id UUID,
    user_id UUID,
    title TEXT,
    tags SET<TEXT>,
    metadata MAP<TEXT, TEXT>,
    PRIMARY KEY (content_id)
);

Use Cases:

  • Media metadata
  • Product catalogs
  • Document storage
  • User-generated content

Best Practices

1. Data Modeling

✅ Do:

  • Design tables based on query patterns
  • Use high cardinality partition keys
  • Denormalize for different queries
  • Use clustering columns for sorting

❌ Don’t:

  • Use low cardinality partition keys
  • Create too many secondary indexes
  • Try to normalize data
  • Query by non-partition key without clustering key

2. Partition Key Design

Guidelines:

  • High Cardinality: Many unique values
  • Even Distribution: Avoid hotspots
  • Query Pattern: Match your queries
  • Size: Keep partition size reasonable (< 100MB)

Example:

-- Good: High cardinality
PRIMARY KEY (user_id, post_id)

-- Bad: Low cardinality
PRIMARY KEY (status, post_id)  -- status has few values

3. Consistency Levels

Recommendations:

  • Writes: QUORUM (good balance)
  • Reads: QUORUM (for consistency) or ONE (for speed)
  • Multi-Datacenter: LOCAL_QUORUM
  • Critical Data: ALL (if acceptable latency)

4. Replication Factor

Guidelines:

  • Single Datacenter: RF=3
  • Multi-Datacenter: RF=3 per datacenter
  • High Durability: RF=5
  • Development: RF=1 (not production)

5. Compaction Strategy

Choose Based on:

  • Write-Heavy: SizeTieredCompactionStrategy
  • Read-Heavy: LeveledCompactionStrategy
  • Time-Series: TimeWindowCompactionStrategy

6. Monitoring

Key Metrics:

  • Read/Write Latency: p50, p95, p99
  • Throughput: Operations per second
  • Compaction: Pending tasks, duration
  • Node Health: Disk usage, CPU, memory
  • Repair: Last repair time, status

7. Maintenance

Regular Tasks:

  • Repair: Run nodetool repair regularly
  • Compaction: Monitor compaction status
  • Cleanup: Remove old snapshots
  • Backup: Regular backups

Configuration

Key Configuration Parameters

cassandra.yaml:

# Cluster name
cluster_name: 'MyCluster'

# Data directories
data_file_directories:
  - /var/lib/cassandra/data

# Commit log directory
commitlog_directory: /var/lib/cassandra/commitlog

# Seed nodes (for node discovery)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.1,192.168.1.2"

# Client connection address (0.0.0.0 requires broadcast_rpc_address to be set)
rpc_address: 0.0.0.0
broadcast_rpc_address: 192.168.1.1

# Native transport port (CQL clients)
native_transport_port: 9042

# Endpoint snitch (network topology)
endpoint_snitch: GossipingPropertyFileSnitch

# Partitioner (how data is distributed)
partitioner: org.apache.cassandra.dht.Murmur3Partitioner

# Compaction
compaction_throughput_mb_per_sec: 16

# Memtable settings
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048

JVM Settings

Heap Size:

  • Recommended: 8GB-32GB
  • Don’t exceed ~32GB (compressed object pointers are lost and GC pauses grow)
  • Leave memory for OS page cache

GC Settings:

JVM_OPTS="$JVM_OPTS -Xms8G"
JVM_OPTS="$JVM_OPTS -Xmx8G"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"

Comparison with Other Databases

Cassandra vs MongoDB

Aspect              Cassandra                   MongoDB
Data Model          Wide-column                 Document
Scaling             Horizontal (easy)           Horizontal (sharding)
Consistency         Tunable                     Strong (replica sets)
Write Performance   Very High                   High
Query Language      CQL                         MongoDB Query Language
Use Cases           Time-series, high writes    General purpose, flexible

Cassandra vs DynamoDB

Aspect              Cassandra                   DynamoDB
Deployment          Self-hosted                 Managed
Cost                Lower (self-hosted)         Higher (pay-per-use)
Scaling             Manual                      Automatic
Multi-Region        Built-in                    Built-in
Consistency         Tunable                     Tunable
Use Cases           Large scale, cost-sensitive Managed service needed

Cassandra vs PostgreSQL

Aspect              Cassandra                   PostgreSQL
Data Model          NoSQL (wide-column)         Relational
Scaling             Horizontal                  Vertical + sharding
Consistency         Eventual (tunable)          Strong (ACID)
Write Throughput    Very High                   High
Complex Queries     Limited                     Excellent (SQL)
Use Cases           High writes, simple queries Complex queries, transactions

Limitations & Considerations

Limitations

  1. No Joins
    • Cannot join tables
    • Must denormalize or application-level joins
  2. No Transactions
    • No ACID transactions across partitions
    • Lightweight transactions (limited; see the sketch after this list)
  3. Secondary Indexes
    • Limited performance
    • Not for high cardinality
  4. Deletes
    • Creates tombstones
    • Must wait for compaction to reclaim space
  5. Aggregations
    • Limited aggregation support
    • May need application-level aggregation
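
Lightweight transactions (item 2 above) cover single-partition compare-and-set operations only, and they use a Paxos round that is noticeably slower than a plain write. A minimal sketch against the users table (literal values illustrative):

-- Insert only if the row does not already exist
INSERT INTO users (user_id, email)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'user@example.com')
IF NOT EXISTS;

-- Conditional update
UPDATE users SET email = 'new@example.com'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
IF email = 'user@example.com';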

When NOT to Use Cassandra

❌ Not Suitable For:

  • Complex relational queries
  • ACID transactions across partitions
  • Frequent updates to same rows
  • Small datasets (< 100GB)
  • Applications requiring strong consistency everywhere

✅ Suitable For:

  • High write throughput
  • Time-series data
  • Simple query patterns
  • Multi-region requirements
  • Large scale (petabytes)

Real-World Examples

Netflix

Use Case:

  • User viewing history
  • Recommendations
  • Content metadata

Scale:

  • Billions of writes per day
  • Petabytes of data
  • Multi-region deployment

Instagram

Use Case:

  • User activity feeds
  • Direct messages
  • Media metadata

Scale:

  • Millions of users
  • High write throughput
  • Global distribution

Apple

Use Case:

  • iCloud metadata
  • Device synchronization
  • Service logs

Scale:

  • Billions of devices
  • High availability requirements
  • Multi-region

Additional Resources

Video Tutorials

  • Cassandra Tutorial Video: Comprehensive video tutorial covering Cassandra concepts, architecture, and practical examples

Official Documentation

Learning Resources

  • Cassandra University (free online courses)
  • DataStax Academy (free courses)
  • Apache Cassandra GitHub Repository

What Interviewers Look For

Cassandra Knowledge & Application

  1. Data Modeling Skills
    • Partition key design
    • Clustering columns
    • Denormalization strategies
    • Red Flags: Poor key design, hot partitions, inefficient queries
  2. Consistency Understanding
    • Tunable consistency
    • QUORUM, ALL, ONE
    • Red Flags: Wrong consistency, no understanding
  3. Replication Strategy
    • Replication factor
    • Multi-datacenter
    • Red Flags: No replication, single region, data loss

System Design Skills

  1. When to Use Cassandra
    • High write throughput
    • Time-series data
    • Multi-region
    • Red Flags: Wrong use case, low writes, can’t justify
  2. Scalability Design
    • Horizontal scaling
    • Partition strategy
    • Red Flags: Vertical scaling, hot partitions, bottlenecks
  3. Query Design
    • Query-first modeling
    • Denormalization
    • Red Flags: Normalized design, query issues, poor modeling

Problem-Solving Approach

  1. Trade-off Analysis
    • Consistency vs availability
    • Write vs read optimization
    • Red Flags: No trade-offs, dogmatic choices
  2. Edge Cases
    • Hot partitions
    • Compaction issues
    • Node failures
    • Red Flags: Ignoring edge cases, no handling
  3. Data Modeling
    • Denormalization
    • Materialized views
    • Red Flags: Normalized design, query issues, poor modeling

Communication Skills

  1. Cassandra Explanation
    • Can explain Cassandra architecture
    • Understands data modeling
    • Red Flags: No understanding, vague explanations
  2. Decision Justification
    • Explains why Cassandra
    • Discusses alternatives
    • Red Flags: No justification, no alternatives

Meta-Specific Focus

  1. Distributed Database Expertise
    • Deep Cassandra knowledge
    • Data modeling skills
    • Key: Show distributed database expertise
  2. High-Throughput Design
    • Write optimization
    • Scale thinking
    • Key: Demonstrate high-throughput expertise

Conclusion

Apache Cassandra is a powerful distributed database built for massive scale, high write throughput, and always-on availability.

Key Strengths:

  • Massive Scalability: Handle petabytes across thousands of nodes
  • High Availability: No single point of failure
  • Write Performance: Optimized for high write throughput
  • Multi-Region: Built-in support for global deployments
  • Fault Tolerance: Continues operating during failures

Best Use Cases:

  • Time-series data (IoT, metrics)
  • High write throughput applications
  • Event logging and tracking
  • Multi-region applications
  • Content management systems

Key Takeaways:

  1. Design for Queries: Model data based on query patterns
  2. Partition Key: Choose high cardinality, even distribution
  3. Denormalize: Duplicate data for different queries
  4. Tunable Consistency: Choose consistency level per operation
  5. Monitor & Maintain: Regular repair, compaction, backups

Cassandra excels when you need to handle massive scale, high write throughput, and global distribution while maintaining high availability and fault tolerance.