Introduction

Apache Cassandra is an open-source, distributed NoSQL database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Originally developed at Facebook and released as open-source in 2008, Cassandra has become one of the most popular distributed databases, powering systems at companies like Netflix, Apple, Instagram, and many others.

What is Cassandra?

Cassandra is a wide-column store NoSQL database that provides:

  • Distributed Architecture: Data distributed across multiple nodes
  • High Availability: No single point of failure
  • Linear Scalability: Add nodes to increase capacity
  • Tunable Consistency: Choose consistency level per operation
  • Multi-Datacenter Support: Built-in replication across data centers

Why Cassandra?

Key Advantages:

  • Massive Scalability: Handle petabytes of data across thousands of nodes
  • High Availability: Replication ensures data is always available
  • Write Performance: Optimized for high write throughput
  • Fault Tolerance: Continues operating even if nodes fail
  • No Single Point of Failure: Peer-to-peer architecture
  • Multi-Region: Built-in support for multiple data centers

Common Use Cases:

  • Time-series data (IoT, metrics, logs)
  • High write throughput applications
  • Global applications requiring multi-region support
  • Event logging and tracking
  • Real-time analytics
  • Content management systems

Architecture

High-Level Architecture

┌──────────────┐     ┌───────────────┐     ┌───────────────┐
│    Client    │────▶│ Load Balancer │────▶│   Cassandra   │
│ (Application)│     │               │     │    Cluster    │
└──────┬───────┘     └───────┬───────┘     └───────┬───────┘
       │                     │                     │
       │                     ▼                     │
       │              ┌─────────────┐              │
       │              │ Coordinator │              │
       │              │    Node     │              │
       │              └──────┬──────┘              │
       │                     │                     │
       │                     ▼                     │
       │              ┌─────────────┐              │
       │              │   Replica   │              │
       │              │    Nodes    │              │
       │              └─────────────┘              │
       │                     │                     │
       └─────────────────────┴─────────────────────┘

Explanation:

  • Client: Your application (e.g., web server, microservice) that interacts with Cassandra.
  • Load Balancer: Distributes client requests across Cassandra nodes. Can be a dedicated load balancer or client-side load balancing in the Cassandra driver.
  • Coordinator Node: Any Cassandra node can act as a coordinator. It receives the client request, determines which replica nodes should be involved based on the partition key and replication strategy, and then forwards the request to them.
  • Cassandra Cluster: A collection of Cassandra nodes working together. Data is distributed across these nodes.
  • Replica Nodes: Nodes that store copies of the data. The number of replicas is determined by the replication factor.

Detailed Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Client Applications                       │
│  (Web Apps, Mobile Apps, Microservices, Data Pipelines)          │
└────────────────────────────┬──────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Cassandra Cluster                           │
│                                                                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Datacenter 1 (US-East)                 │  │
│  │                                                             │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │
│  │  │   Node 1     │  │   Node 2     │  │   Node 3     │    │  │
│  │  │   (Peer)     │  │   (Peer)     │  │   (Peer)     │    │  │
│  │  │              │  │              │  │              │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │ Memtable│  │  │  │ Memtable│  │  │  │ Memtable│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │CommitLog│  │  │  │CommitLog│  │  │  │CommitLog│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │SSTables│  │  │  │SSTables│  │  │  │SSTables│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘    │  │
│  │                                                             │  │
│  │  Rack 1              Rack 2              Rack 3            │  │
│  └───────────────────────────────────────────────────────────┘  │
│                             │                                     │
│                             │ Replication                         │
│                             │                                     │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Datacenter 2 (EU-West)                 │  │
│  │                                                             │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │
│  │  │   Node 4     │  │   Node 5     │  │   Node 6     │    │  │
│  │  │   (Peer)     │  │   (Peer)     │  │   (Peer)     │    │  │
│  │  │              │  │              │  │              │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │ Memtable│  │  │  │ Memtable│  │  │  │ Memtable│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │CommitLog│  │  │  │CommitLog│  │  │  │CommitLog│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │    │  │
│  │  │  │SSTables│  │  │  │SSTables│  │  │  │SSTables│  │    │  │
│  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │    │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘    │  │
│  │                                                             │  │
│  │  Rack 1              Rack 2              Rack 3            │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Token Ring (Consistent Hashing)         │  │
│  │                                                             │  │
│  │      Node1 ──► Node2 ──► Node3 ──► Node4 ──► Node5         │  │
│  │        ▲                                    │              │  │
│  │        └────────────────────────────────────┘              │  │
│  │                                                             │  │
│  │  Partition Key Hash → Determines Node Location             │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Keyspaces/Tables                        │  │
│  │                                                             │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │
│  │  │   Table A    │  │   Table B    │  │   Table C    │    │  │
│  │  │              │  │              │  │              │    │  │
│  │  │ Partition 0  │  │ Partition 0  │  │ Partition 0  │    │  │
│  │  │ Partition 1  │  │ Partition 1  │  │ Partition 1  │    │  │
│  │  │ Partition 2  │  │ Partition 2  │  │ Partition 2  │    │  │
│  │  │              │  │              │  │              │    │  │
│  │  │ RF=3         │  │ RF=3         │  │ RF=3         │    │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘    │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                             │
                             │ Gossip Protocol (Node Communication)
                             │
┌─────────────────────────────────────────────────────────────────┐
│                    Cluster Management                            │
│  - Gossip Protocol: Node discovery and health monitoring        │
│  - Snitch: Network topology awareness                            │
│  - Partitioner: Data distribution (Murmur3Partitioner)          │
│  - Compaction: SSTable merging and cleanup                       │
└─────────────────────────────────────────────────────────────────┘

Key Components:

  1. Client Applications: Connect to any node in the cluster
  2. Cassandra Nodes: Peer-to-peer architecture, no master node
  3. Datacenters: Logical grouping of nodes (often by geographic location)
  4. Racks: Physical grouping within datacenter (for fault tolerance)
  5. Token Ring: Consistent hashing for data distribution
  6. Replication: Data replicated across multiple nodes/datacenters
  7. Gossip Protocol: Nodes communicate to discover cluster state

Data Flow:

  1. Write Path:
    • Client sends write to coordinator node
    • Coordinator determines replica nodes (based on partition key)
    • Write to commit log (durability)
    • Write to memtable (in-memory)
    • Replicate to other replicas (based on consistency level)
    • Return success to client
    • Memtable flushed to SSTable (when full)
  2. Read Path:
    • Client sends read to coordinator node
    • Coordinator determines replica nodes (based on consistency level)
    • Read from replicas (may read from multiple)
    • Merge results (if different versions)
    • Return result to client

Distributed Architecture

Peer-to-Peer Model:

  • No master node (unlike master-slave architectures)
  • All nodes are equal (peer-to-peer)
  • Data distributed across all nodes
  • No single point of failure

Key Concepts:

  • Node: Single server in cluster
  • Datacenter: Group of nodes (often physical location)
  • Rack: Group of nodes within datacenter
  • Cluster: Collection of nodes
  • Ring: Logical representation of cluster

Data Distribution

Consistent Hashing:

  • Each node assigned a token (position on ring)
  • Data partitioned by partition key
  • Hash(partition_key) → determines which node stores the data (see the token() sketch below)
  • Even distribution across nodes
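
The CQL token() function exposes the partitioner's hash for a row's partition key, which is a quick way to see the ring position described above. A minimal sketch, using the users table defined in the Data Model section below:

-- Show the token each row hashes to; the token determines
-- which nodes own the row's replicas on the ring
SELECT user_id, token(user_id) FROM users;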

Partitioning:

  • Data split into partitions
  • Each partition stored on multiple nodes (replication)
  • Partition key determines data location
  • Enables parallel processing

Replication

Replication Strategy:

  • Replication Factor (RF): Number of copies of data
  • Replication Strategy: How replicas are placed
    • SimpleStrategy: For single datacenter
    • NetworkTopologyStrategy: For multiple datacenters

Replica Placement:

Replication Factor = 3

Partition → Node A (primary)
         → Node B (replica)
         → Node C (replica)

All replicas on different nodes/racks/datacenters

Multi-Datacenter Replication:

  • Replicas distributed across datacenters
  • Local reads (read from local datacenter)
  • Cross-datacenter replication for disaster recovery

Data Model

Tables and Columns

Table Structure:

  • Similar to SQL tables but with important differences
  • Partition Key: Determines which node stores data
  • Clustering Columns: Determine sort order within partition
  • Regular Columns: Data columns

Example:

CREATE TABLE users (
    user_id UUID,
    email TEXT,
    first_name TEXT,
    last_name TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id)
);

Composite Primary Key:

CREATE TABLE user_posts (
    user_id UUID,
    post_id UUID,
    title TEXT,
    content TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, post_id)
);
-- user_id = partition key
-- post_id = clustering column

Partition Key

Purpose:

  • Determines which node stores the data
  • All rows with same partition key stored together
  • Enables efficient queries

Best Practices:

  • High Cardinality: Many unique values
  • Even Distribution: Avoid hotspots
  • Query Pattern: Match your query patterns

Example:

-- Good: High cardinality
PRIMARY KEY (user_id, post_id)

-- Bad: Low cardinality (hotspot)
PRIMARY KEY (status, user_id)  -- status has few values

Clustering Columns

Purpose:

  • Determines sort order within partition
  • Enables range queries
  • Multiple clustering columns create nested sorting

Example:

CREATE TABLE sensor_data (
    sensor_id UUID,
    date DATE,
    timestamp TIMESTAMP,
    value DOUBLE,
    PRIMARY KEY (sensor_id, date, timestamp)
);
-- Partition by sensor_id
-- Sort by date, then timestamp
-- Can query: WHERE sensor_id = ? AND date = ? AND timestamp > ?

Data Types

Basic Types:

  • TEXT, VARCHAR, ASCII: String types
  • INT, BIGINT, SMALLINT: Integer types
  • FLOAT, DOUBLE: Floating point
  • BOOLEAN: Boolean
  • UUID, TIMEUUID: UUID types
  • TIMESTAMP: Date/time
  • BLOB: Binary data

Collection Types:

  • LIST<TEXT>: Ordered list
  • SET<TEXT>: Unordered unique set
  • MAP<TEXT, TEXT>: Key-value map

Example:

CREATE TABLE user_profile (
    user_id UUID PRIMARY KEY,
    tags LIST<TEXT>,
    preferences MAP<TEXT, TEXT>,
    favorite_colors SET<TEXT>
);
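
Collections can also be modified in place with CQL update operations. A brief sketch against the table above (the UUID literal is just an example value):

-- Append to a list, add to a set, and set a map entry
UPDATE user_profile SET tags = tags + ['cassandra']
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

UPDATE user_profile SET favorite_colors = favorite_colors + {'blue'}
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

UPDATE user_profile SET preferences['theme'] = 'dark'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;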

Consistency Model

Tunable Consistency

Cassandra provides tunable consistency: the consistency level is chosen per operation rather than fixed cluster-wide (see the cqlsh sketch after the list below).

Consistency Levels:

  1. ONE
    • Read/write from/to one replica
    • Fastest, least consistent
    • May see stale data
  2. QUORUM
    • Read/write from/to majority of replicas
    • RF=3: Read/write from 2 replicas
    • Good balance of consistency and performance
  3. ALL
    • Read/write from/to all replicas
    • Strongest consistency
    • Slowest, may fail if any replica unavailable
  4. LOCAL_QUORUM
    • Quorum within local datacenter
    • Faster than QUORUM for multi-datacenter
  5. EACH_QUORUM
    • Quorum in each datacenter
    • Strong consistency across datacenters
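
In cqlsh, the consistency level applies to the session and can be changed before any statement, which illustrates the per-operation tuning described above. A minimal sketch (users table from the Data Model section; the UUID literal is an example value):

-- Subsequent statements in this cqlsh session use QUORUM
CONSISTENCY QUORUM;
INSERT INTO users (user_id, email) VALUES (uuid(), 'user@example.com');

-- Trade consistency for lower read latency
CONSISTENCY ONE;
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;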

Consistency Formula:

R + W > RF  → Strong Consistency
R + W ≤ RF  → Eventual Consistency

Where:
R = Read consistency level
W = Write consistency level
RF = Replication factor

Example:

RF = 3
R = QUORUM (2)
W = QUORUM (2)
R + W = 4 > 3 → Strong Consistency

Eventual Consistency

Default Behavior:

  • Writes propagate asynchronously
  • Reads may see stale data temporarily
  • Eventually all replicas converge
  • Trade-off: Availability vs Consistency

Use Cases:

  • High availability required
  • Can tolerate temporary inconsistency
  • High write throughput needed

Replication

Replication Factor

Definition:

  • Number of copies of data stored
  • Typically 3 (for fault tolerance)
  • Higher values are possible but uncommon; RF cannot exceed the number of nodes in a datacenter

Choosing Replication Factor:

  • RF=1: No redundancy (not recommended for production)
  • RF=3: Standard for single datacenter
  • RF=5: Higher durability, multi-datacenter

Example:

CREATE KEYSPACE my_keyspace
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3,
    'datacenter2': 3
};

Replication Strategies

1. SimpleStrategy

  • For single datacenter
  • Places replicas sequentially on ring
  • Not recommended for production
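
A minimal SimpleStrategy keyspace, suitable only for development or a single-datacenter test cluster (keyspace name illustrative):

CREATE KEYSPACE dev_keyspace
WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 3
};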

2. NetworkTopologyStrategy

  • For multiple datacenters
  • Places replicas based on network topology
  • Recommended for production
  • Considers racks and datacenters

Example:

CREATE KEYSPACE my_keyspace
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3,  -- 3 replicas in DC1
    'datacenter2': 2   -- 2 replicas in DC2
};

Replica Placement

Strategy:

  • First replica: Node determined by partitioner
  • Additional replicas: Next nodes on ring
  • NetworkTopologyStrategy: Considers racks/datacenters

Fault Tolerance:

  • RF=3: Up to two replicas can be lost without data loss (QUORUM operations still need two of the three)
  • Replicas on different racks/datacenters
  • Automatic repair on node recovery

Write Path

Write Process

Steps:

  1. Client sends write request
  2. Coordinator node receives request
  3. Coordinator determines replica nodes
  4. Write to commit log (durability)
  5. Write to memtable (in-memory)
  6. Replicate to other replicas (based on consistency level)
  7. Return success to client
  8. Memtable flushed to SSTable (when full)

Write Components:

1. Commit Log

  • Append-only log for durability
  • All writes logged before acknowledgment
  • Survives crashes
  • Replayed on recovery

2. Memtable

  • In-memory structure (sorted)
  • Fast writes
  • Flushed to disk when full
  • One active memtable per table (per node)

3. SSTable (Sorted String Table)

  • Immutable data files on disk
  • Sorted by partition key
  • Created when memtable flushed
  • Multiple SSTables per table

Write Performance:

  • Fast: Sequential writes to commit log
  • No Random I/O: Append-only writes
  • In-Memory: Memtable for speed
  • Batch Writes: Efficient batching of related writes (see the sketch below)
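
As a sketch of batching, writes that target the same partition can be grouped into a single logged batch so they are applied together on that partition (user_posts table from the Data Model section; the UUID is an example value). Multi-partition logged batches, by contrast, add coordinator overhead and should be used sparingly.

BEGIN BATCH
    INSERT INTO user_posts (user_id, post_id, title)
    VALUES (123e4567-e89b-12d3-a456-426614174000, uuid(), 'First post');
    INSERT INTO user_posts (user_id, post_id, title)
    VALUES (123e4567-e89b-12d3-a456-426614174000, uuid(), 'Second post');
APPLY BATCH;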

Read Path

Read Process

Steps:

  1. Client sends read request
  2. Coordinator node receives request
  3. Determine replica nodes (based on consistency level)
  4. Read from replicas (may read from multiple)
  5. Merge results (if different versions)
  6. Return result to client

Read Components:

1. Memtable

  • Check in-memory memtable first
  • Fastest access

2. SSTables

  • Check on-disk SSTables
  • May need to check multiple SSTables
  • Bloom filters reduce unnecessary reads

3. Bloom Filters

  • Probabilistic data structure
  • Quickly determine if key exists in SSTable
  • Reduces disk I/O
  • False positives possible (but rare)
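
The false-positive rate is tunable per table through the bloom_filter_fp_chance option; a brief sketch, reusing the sensor_data table defined earlier:

-- Lower values cost more memory but skip more unnecessary SSTable reads
ALTER TABLE sensor_data WITH bloom_filter_fp_chance = 0.01;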

4. Compaction

  • Merges multiple SSTables
  • Removes deleted/tombstoned data
  • Improves read performance

Read Performance:

  • Bloom Filters: Reduce unnecessary reads
  • Caching: Row cache, key cache
  • Compaction: Reduces SSTable count
  • Consistency Level: Affects latency

Compaction

What is Compaction?

Purpose:

  • Merge multiple SSTables into one
  • Remove deleted/tombstoned data
  • Improve read performance
  • Reclaim disk space

Why Needed:

  • Multiple SSTables per table (from memtable flushes)
  • Reading from multiple SSTables is slow
  • Deleted data takes space
  • Old data versions accumulate

Compaction Strategies

1. SizeTieredCompactionStrategy (STCS)

  • Merges SSTables of similar size
  • Simple, good for write-heavy workloads
  • Can create large SSTables
  • More disk space needed temporarily

2. LeveledCompactionStrategy (LCS)

  • Organizes SSTables into levels
  • Each level 10x larger than previous
  • Better read performance
  • More CPU intensive
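
Switching an existing table's compaction strategy is a one-line schema change; for example, moving a read-heavy table to LCS (here the users table from the Data Model examples):

ALTER TABLE users
WITH compaction = {'class': 'LeveledCompactionStrategy'};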

3. TimeWindowCompactionStrategy (TWCS)

  • For time-series data
  • Compacts SSTables within time windows
  • Good for TTL-based data

Example:

CREATE TABLE sensor_data (
    sensor_id UUID,
    timestamp TIMESTAMP,
    value DOUBLE,
    PRIMARY KEY (sensor_id, timestamp)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};

Data Modeling

Query-Driven Design

Cassandra Data Modeling Principles:

  1. Design for Queries: Model based on query patterns
  2. Denormalize: Duplicate data if needed
  3. One Table Per Query: Create separate tables for different queries
  4. Partition Key: Choose based on query patterns

Modeling Patterns

1. Single Partition Queries

-- Query: Get user by ID
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    email TEXT,
    name TEXT
);

SELECT * FROM users WHERE user_id = ?;

2. Partition + Clustering Queries

-- Query: Get user's posts, sorted by date
CREATE TABLE user_posts (
    user_id UUID,
    post_id UUID,
    title TEXT,
    content TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);

SELECT * FROM user_posts
WHERE user_id = ?
ORDER BY created_at DESC;

3. Multiple Tables for Different Queries

-- Query 1: Get posts by user
CREATE TABLE posts_by_user (
    user_id UUID,
    post_id UUID,
    title TEXT,
    PRIMARY KEY (user_id, post_id)
);

-- Query 2: Get posts by date
CREATE TABLE posts_by_date (
    date DATE,
    post_id UUID,
    user_id UUID,
    title TEXT,
    PRIMARY KEY (date, post_id)
);

-- Denormalize: Store same data in both tables

Anti-Patterns

❌ Don’t Do:

  • Low cardinality partition keys
  • Secondary indexes on high-cardinality columns
  • Queries requiring joins
  • Queries filtering by non-partition key without clustering key

Example of Bad Design:

-- Bad: Low cardinality partition key
CREATE TABLE posts (
    status TEXT,  -- Only 'published', 'draft' (2 values)
    post_id UUID,
    title TEXT,
    PRIMARY KEY (status, post_id)
);
-- Problem: All published posts on one node (hotspot)

✅ Good Design:

-- Good: High cardinality partition key
CREATE TABLE posts (
    user_id UUID,  -- Many unique values
    post_id UUID,
    title TEXT,
    PRIMARY KEY (user_id, post_id)
);

Secondary Indexes

When to Use

Secondary Indexes:

  • Query by non-partition key column
  • Limited use cases
  • Performance considerations

Limitations:

  • Performance: Can be slow (scans all nodes)
  • Cardinality: Best for low-medium cardinality
  • Not for High Cardinality: Avoid on unique columns

Example:

-- Create secondary index
CREATE INDEX ON users (email);

-- Query using index
SELECT * FROM users WHERE email = 'user@example.com';

When NOT to Use:

  • High cardinality columns (e.g., UUID, timestamps)
  • Frequently updated columns
  • Columns with many null values

Alternatives:

  • Denormalize into a separate lookup table (see the sketch after this list)
  • Use materialized views
  • Application-level indexing
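
As a sketch of the denormalization alternative, a lookup table keyed by email avoids the secondary index entirely; the application writes to both tables (table name illustrative):

CREATE TABLE users_by_email (
    email TEXT PRIMARY KEY,
    user_id UUID,
    name TEXT
);

SELECT user_id FROM users_by_email WHERE email = 'user@example.com';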

Materialized Views

Purpose

Materialized Views:

  • Pre-computed query results
  • Automatically maintained
  • Alternative to denormalization

Example:

-- Base table
CREATE TABLE posts (
    user_id UUID,
    post_id UUID,
    title TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, post_id)
);

-- Materialized view
CREATE MATERIALIZED VIEW posts_by_date AS
SELECT created_at, post_id, user_id, title
FROM posts
WHERE created_at IS NOT NULL AND user_id IS NOT NULL AND post_id IS NOT NULL
PRIMARY KEY (created_at, user_id, post_id);

Trade-offs:

  • Automatic maintenance
  • Additional storage
  • Write overhead (updates both tables)

Performance Characteristics

Maximum Read & Write Throughput

Single Node:

  • Max Write Throughput: 10K-50K writes/sec
  • Max Read Throughput: 5K-25K reads/sec (depends on consistency level and data locality)

Cluster (Horizontal Scaling):

  • Max Write Throughput: 10K-50K writes/sec per node (linear scaling)
  • Max Read Throughput: 5K-25K reads/sec per node (linear scaling)
  • Example: 100-node cluster can handle 1M-5M writes/sec and 500K-2.5M reads/sec total

Factors Affecting Throughput:

  • Commit log disk speed (SSD recommended)
  • Memtable size and flush frequency
  • Replication factor (higher RF = more writes)
  • Consistency level (QUORUM slower than ONE)
  • Data locality (local reads faster)
  • Compaction strategy
  • Network latency between nodes
  • Number of nodes in cluster

Optimized Configuration:

  • Max Write Throughput: 50K-100K writes/sec per node (with optimized settings)
  • Max Read Throughput: 25K-50K reads/sec per node (with proper consistency level and data locality)

Write Performance

Throughput:

  • Single Node: 10K-50K writes/second
  • Cluster: Scales linearly with nodes
  • 100 Nodes: 1M-5M writes/second

Factors:

  • Commit log disk speed
  • Memtable size
  • Replication factor
  • Consistency level

Optimization:

  • Fast commit log disk (SSD)
  • Batch writes
  • Lower consistency level (if acceptable)
  • Tune memtable size

Read Performance

Latency:

  • Single Partition: < 10ms (p95)
  • Multi-Partition: Depends on consistency level
  • With Caching: < 1ms

Factors:

  • Number of SSTables
  • Bloom filter effectiveness
  • Cache hit ratio
  • Consistency level
  • Compaction strategy

Optimization:

  • Row cache for hot data
  • Key cache for partition keys
  • Reduce SSTable count (compaction)
  • Appropriate consistency level

Scalability

Linear Scalability:

  • Add nodes → increase capacity
  • No bottleneck (peer-to-peer)
  • Scales to thousands of nodes
  • Handles petabytes of data

Example:

  • 10 nodes: 100K writes/sec
  • 100 nodes: 1M writes/sec
  • 1000 nodes: 10M writes/sec

Use Cases

1. Time-Series Data

Characteristics:

  • High write volume
  • Time-ordered data
  • Rarely updated
  • Queries by time range

Example:

CREATE TABLE sensor_readings (
    sensor_id UUID,
    timestamp TIMESTAMP,
    value DOUBLE,
    PRIMARY KEY (sensor_id, timestamp)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy'
};
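
Time-series rows are often written with a TTL so expired data is dropped during compaction, which pairs naturally with TWCS. A brief sketch (literal values are illustrative):

-- Keep each reading for 30 days
INSERT INTO sensor_readings (sensor_id, timestamp, value)
VALUES (123e4567-e89b-12d3-a456-426614174000, '2024-01-01 00:00:00+0000', 21.5)
USING TTL 2592000;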

Use Cases:

  • IoT sensor data
  • Metrics and monitoring
  • Financial tick data
  • Log aggregation

2. High Write Throughput

Characteristics:

  • Many writes per second
  • Simple queries
  • Event logging

Example:

CREATE TABLE user_events (
    user_id UUID,
    event_id TIMEUUID,
    event_type TEXT,
    event_data TEXT,
    PRIMARY KEY (user_id, event_id)
);
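
Because event_id is a TIMEUUID clustering column, a user's recent events can be read with a time-bounded range query; a sketch using the built-in minTimeuuid() function (literal values illustrative):

SELECT event_type, event_data
FROM user_events
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
  AND event_id > minTimeuuid('2024-01-01 00:00:00+0000');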

Use Cases:

  • Event tracking
  • Activity logs
  • Clickstream data
  • Audit logs

3. Multi-Region Applications

Characteristics:

  • Global user base
  • Low latency requirements
  • Data locality

Example:

CREATE KEYSPACE my_app
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3,
    'eu-west': 3,
    'ap-south': 3
};

Use Cases:

  • Global applications
  • Content delivery
  • User data replication
  • Disaster recovery

4. Content Management

Characteristics:

  • Large amounts of content
  • Metadata storage
  • High availability

Example:

CREATE TABLE content_metadata (
    content_id UUID,
    user_id UUID,
    title TEXT,
    tags SET<TEXT>,
    metadata MAP<TEXT, TEXT>,
    PRIMARY KEY (content_id)
);

Use Cases:

  • Media metadata
  • Product catalogs
  • Document storage
  • User-generated content

Best Practices

1. Data Modeling

✅ Do:

  • Design tables based on query patterns
  • Use high cardinality partition keys
  • Denormalize for different queries
  • Use clustering columns for sorting

❌ Don’t:

  • Use low cardinality partition keys
  • Create too many secondary indexes
  • Try to normalize data
  • Query by non-partition key without clustering key

2. Partition Key Design

Guidelines:

  • High Cardinality: Many unique values
  • Even Distribution: Avoid hotspots
  • Query Pattern: Match your queries
  • Size: Keep partition size reasonable (< 100MB)

Example:

-- Good: High cardinality
PRIMARY KEY (user_id, post_id)

-- Bad: Low cardinality
PRIMARY KEY (status, post_id)  -- status has few values

3. Consistency Levels

Recommendations:

  • Writes: QUORUM (good balance)
  • Reads: QUORUM (for consistency) or ONE (for speed)
  • Multi-Datacenter: LOCAL_QUORUM
  • Critical Data: ALL (if acceptable latency)

4. Replication Factor

Guidelines:

  • Single Datacenter: RF=3
  • Multi-Datacenter: RF=3 per datacenter
  • High Durability: RF=5
  • Development: RF=1 (not production)

5. Compaction Strategy

Choose Based on:

  • Write-Heavy: SizeTieredCompactionStrategy
  • Read-Heavy: LeveledCompactionStrategy
  • Time-Series: TimeWindowCompactionStrategy

6. Monitoring

Key Metrics:

  • Read/Write Latency: p50, p95, p99
  • Throughput: Operations per second
  • Compaction: Pending tasks, duration
  • Node Health: Disk usage, CPU, memory
  • Repair: Last repair time, status

7. Maintenance

Regular Tasks:

  • Repair: Run nodetool repair regularly
  • Compaction: Monitor compaction status
  • Cleanup: Remove old snapshots
  • Backup: Regular backups

Configuration

Key Configuration Parameters

cassandra.yaml:

# Cluster name
cluster_name: 'MyCluster'

# Data directories
data_file_directories:
  - /var/lib/cassandra/data

# Commit log directory
commitlog_directory: /var/lib/cassandra/commitlog

# Seed nodes (for node discovery)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.1,192.168.1.2"

# Client connection address (0.0.0.0 requires broadcast_rpc_address to be set)
rpc_address: 0.0.0.0
broadcast_rpc_address: 192.168.1.1

# Native transport port (CQL clients)
native_transport_port: 9042

# Endpoint snitch (network topology)
endpoint_snitch: GossipingPropertyFileSnitch

# Partitioner (how data is distributed)
partitioner: org.apache.cassandra.dht.Murmur3Partitioner

# Compaction
compaction_throughput_mb_per_sec: 16

# Memtable settings
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048

JVM Settings

Heap Size:

  • Recommended: 8GB-32GB
  • Don’t exceed ~32GB (compressed object pointers are lost and GC pauses grow)
  • Leave memory for OS page cache

GC Settings:

JVM_OPTS="$JVM_OPTS -Xms8G"
JVM_OPTS="$JVM_OPTS -Xmx8G"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"

Comparison with Other Databases

Cassandra vs MongoDB

Aspect              Cassandra                   MongoDB
Data Model          Wide-column                 Document
Scaling             Horizontal (easy)           Horizontal (sharding)
Consistency         Tunable                     Strong (replica sets)
Write Performance   Very High                   High
Query Language      CQL                         MongoDB Query Language
Use Cases           Time-series, high writes    General purpose, flexible

Cassandra vs DynamoDB

Aspect              Cassandra                   DynamoDB
Deployment          Self-hosted                 Managed
Cost                Lower (self-hosted)         Higher (pay-per-use)
Scaling             Manual                      Automatic
Multi-Region        Built-in                    Built-in
Consistency         Tunable                     Tunable
Use Cases           Large scale, cost-sensitive Managed service needed

Cassandra vs PostgreSQL

Aspect              Cassandra                   PostgreSQL
Data Model          NoSQL (wide-column)         Relational
Scaling             Horizontal                  Vertical + sharding
Consistency         Eventual (tunable)          Strong (ACID)
Write Throughput    Very High                   High
Complex Queries     Limited                     Excellent (SQL)
Use Cases           High writes, simple queries Complex queries, transactions

Limitations & Considerations

Limitations

  1. No Joins
    • Cannot join tables
    • Must denormalize or application-level joins
  2. No Transactions
    • No ACID transactions across partitions
    • Lightweight transactions (limited; see the sketch after this list)
  3. Secondary Indexes
    • Limited performance
    • Not for high cardinality
  4. Deletes
    • Creates tombstones
    • Must wait for compaction to reclaim space
  5. Aggregations
    • Limited aggregation support
    • May need application-level aggregation
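
Lightweight transactions (item 2 above) cover single-partition compare-and-set operations only, and they use a Paxos round that is noticeably slower than a plain write. A minimal sketch against the users table (literal values illustrative):

-- Insert only if the row does not already exist
INSERT INTO users (user_id, email)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'user@example.com')
IF NOT EXISTS;

-- Conditional update
UPDATE users SET email = 'new@example.com'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
IF email = 'user@example.com';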

When NOT to Use Cassandra

❌ Not Suitable For:

  • Complex relational queries
  • ACID transactions across partitions
  • Frequent updates to same rows
  • Small datasets (< 100GB)
  • Applications requiring strong consistency everywhere

✅ Suitable For:

  • High write throughput
  • Time-series data
  • Simple query patterns
  • Multi-region requirements
  • Large scale (petabytes)

Real-World Examples

Netflix

Use Case:

  • User viewing history
  • Recommendations
  • Content metadata

Scale:

  • Billions of writes per day
  • Petabytes of data
  • Multi-region deployment

Instagram

Use Case:

  • User activity feeds
  • Direct messages
  • Media metadata

Scale:

  • Millions of users
  • High write throughput
  • Global distribution

Apple

Use Case:

  • iCloud metadata
  • Device synchronization
  • Service logs

Scale:

  • Billions of devices
  • High availability requirements
  • Multi-region

Additional Resources

Video Tutorials

  • Cassandra Tutorial Video: Comprehensive video tutorial covering Cassandra concepts, architecture, and practical examples

Official Documentation

Learning Resources

  • Cassandra University (free online courses)
  • DataStax Academy (free courses)
  • Apache Cassandra GitHub Repository

What Interviewers Look For

Cassandra Knowledge & Application

  1. Data Modeling Skills
    • Partition key design
    • Clustering columns
    • Denormalization strategies
    • Red Flags: Poor key design, hot partitions, inefficient queries
  2. Consistency Understanding
    • Tunable consistency
    • QUORUM, ALL, ONE
    • Red Flags: Wrong consistency, no understanding
  3. Replication Strategy
    • Replication factor
    • Multi-datacenter
    • Red Flags: No replication, single region, data loss

System Design Skills

  1. When to Use Cassandra
    • High write throughput
    • Time-series data
    • Multi-region
    • Red Flags: Wrong use case, low writes, can’t justify
  2. Scalability Design
    • Horizontal scaling
    • Partition strategy
    • Red Flags: Vertical scaling, hot partitions, bottlenecks
  3. Query Design
    • Query-first modeling
    • Denormalization
    • Red Flags: Normalized design, query issues, poor modeling

Problem-Solving Approach

  1. Trade-off Analysis
    • Consistency vs availability
    • Write vs read optimization
    • Red Flags: No trade-offs, dogmatic choices
  2. Edge Cases
    • Hot partitions
    • Compaction issues
    • Node failures
    • Red Flags: Ignoring edge cases, no handling
  3. Data Modeling
    • Denormalization
    • Materialized views
    • Red Flags: Normalized design, query issues, poor modeling

Communication Skills

  1. Cassandra Explanation
    • Can explain Cassandra architecture
    • Understands data modeling
    • Red Flags: No understanding, vague explanations
  2. Decision Justification
    • Explains why Cassandra
    • Discusses alternatives
    • Red Flags: No justification, no alternatives

Meta-Specific Focus

  1. Distributed Database Expertise
    • Deep Cassandra knowledge
    • Data modeling skills
    • Key: Show distributed database expertise
  2. High-Throughput Design
    • Write optimization
    • Scale thinking
    • Key: Demonstrate high-throughput expertise

Conclusion

Apache Cassandra is a powerful distributed database built for massive scale, high write throughput, and always-on availability.

Key Strengths:

  • Massive Scalability: Handle petabytes across thousands of nodes
  • High Availability: No single point of failure
  • Write Performance: Optimized for high write throughput
  • Multi-Region: Built-in support for global deployments
  • Fault Tolerance: Continues operating during failures

Best Use Cases:

  • Time-series data (IoT, metrics)
  • High write throughput applications
  • Event logging and tracking
  • Multi-region applications
  • Content management systems

Key Takeaways:

  1. Design for Queries: Model data based on query patterns
  2. Partition Key: Choose high cardinality, even distribution
  3. Denormalize: Duplicate data for different queries
  4. Tunable Consistency: Choose consistency level per operation
  5. Monitor & Maintain: Regular repair, compaction, backups

Cassandra excels when you need to handle massive scale, high write throughput, and global distribution while maintaining high availability and fault tolerance.