Introduction

Amazon DynamoDB is a fully managed NoSQL database service provided by AWS that delivers single-digit millisecond performance at any scale. DynamoDB is a key-value and document database that provides built-in security, backup and restore, and in-memory caching for internet-scale applications. It’s designed to handle massive workloads with predictable performance and seamless scalability.

What is DynamoDB?

DynamoDB is a managed NoSQL database that provides:

  • Serverless Architecture: No servers to manage, automatic scaling
  • Key-Value Store: Simple key-value data model
  • Document Store: Support for JSON documents
  • Managed Service: Fully managed by AWS
  • Global Tables: Multi-region replication
  • On-Demand Scaling: Automatic scaling based on traffic

Why DynamoDB?

Key Advantages:

  • Fully Managed: No server management, automatic backups
  • Scalability: Handles millions of requests per second
  • Performance: Single-digit millisecond latency
  • Durability: Built-in replication and backups
  • Security: Encryption at rest and in transit
  • Pay-per-Use: Pay only for what you use
  • Global Tables: Multi-region replication for low latency

Common Use Cases:

  • Mobile and web applications
  • Gaming applications
  • IoT applications
  • Real-time bidding
  • Session management
  • User profiles and preferences
  • Shopping carts
  • Leaderboards

Architecture

High-Level Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client    │     │   Client    │     │   Client    │
│ Application │     │ Application │     │ Application │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┴───────────────────┘
                           │
                           │ AWS SDK / API
                           ▼
             ┌─────────────────────────┐
             │   Amazon DynamoDB       │
             │   (Managed Service)     │
             │                         │
             │  ┌────────────┐         │
             │  │  Request   │         │
             │  │  Router    │         │
             │  └─────┬──────┘         │
             │        │                │
             │  ┌─────▼──────┐         │
             │  │ Partitions │         │
             │  │  (Shards)  │         │
             │  └────────────┘         │
             │                         │
             │  ┌───────────────────┐  │
             │  │ Storage Nodes     │  │
             │  │ (Multi-AZ)        │  │
             │  └───────────────────┘  │
             └─────────────────────────┘

Explanation:

  • Client Applications: Applications that use DynamoDB to store and retrieve data (e.g., web applications, mobile backends, microservices).
  • Amazon DynamoDB: Fully managed NoSQL database service provided by AWS. No servers to manage.
  • Request Router: Routes requests to the appropriate partition based on the partition key.
  • Partitions (Shards): Logical divisions of data. DynamoDB automatically partitions data across multiple partitions for scalability.
  • Storage Nodes (Multi-AZ): Physical storage nodes distributed across multiple Availability Zones for high availability and durability.

Core Architecture

DynamoDB Architecture:

┌─────────────────────────────────────────┐
│         Application Layer               │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│         DynamoDB Service                │
│  ┌──────────────────────────────────┐   │
│  │     Request Router               │   │
│  └──────────────┬───────────────────┘   │
│                 │                       │
│  ┌──────────────▼───────────────────┐   │
│  │     Partition Management         │   │
│  └──────────────┬───────────────────┘   │
│                 │                       │
│  ┌──────────────▼───────────────────┐   │
│  │     Storage Nodes                │   │
│  │  ┌──────┐  ┌──────┐  ┌──────┐    │   │
│  │  │Node 1│  │Node 2│  │Node 3│    │   │
│  │  └──────┘  └──────┘  └──────┘    │   │
│  └──────────────────────────────────┘   │
└─────────────────────────────────────────┘

Key Components:

  • Request Router: Routes requests to appropriate partitions
  • Partition Management: Manages data partitioning
  • Storage Nodes: Physical storage for data
  • Replication: Automatic replication across availability zones

Partitioning

Partition Key:

  • Determines which partition stores the item
  • Hash function applied to partition key
  • Even distribution across partitions
  • Must be provided for every item

Partitioning Strategy:

Partition Key → Hash Function → Partition Number → Storage Node

Example:
UserID: "user123"
Hash("user123") → 0x3A7F → Partition 5 → Node 2

Hot Partitions:

  • Uneven access patterns can create hot partitions
  • Solution: Use composite partition keys
  • Distribute load across multiple partitions
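
A common mitigation is write sharding: append a bounded suffix to a hot key so writes spread across several partitions (reads must then fan out across the suffixes and merge results). A minimal Python sketch of the idea; DynamoDB's internal hash function is not exposed, so the hash below is purely illustrative:

import hashlib
import random

def sharded_partition_key(base_key: str, shards: int = 10) -> str:
    """Spread writes for a hot key across `shards` partition key values."""
    return f"{base_key}#{random.randint(0, shards - 1)}"

def partition_for(key: str, num_partitions: int) -> int:
    """Illustration only: DynamoDB's real partition hash is internal."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

print(sharded_partition_key("user123"))   # e.g. "user123#7"
print(partition_for("user123", 8))        # stable partition index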

Replication

Automatic Replication:

  • Data replicated across 3 availability zones (AZs)
  • Synchronous replication within region
  • High availability and durability
  • No manual configuration needed

Replication Model:

Primary Partition (AZ-1)
    ├── Replica 1 (AZ-2)
    └── Replica 2 (AZ-3)

Durability:

  • Designed for very high durability via synchronous multi-AZ replication
  • Automatic failover to replicas
  • No data loss in case of node failure

Global Tables

Multi-Region Replication:

  • Replicate tables across multiple AWS regions
  • Active-active replication
  • Low latency for global users
  • Eventual consistency across regions

Global Table Architecture:

Region 1 (US-East-1)           Region 2 (EU-West-1)
┌──────────────┐               ┌──────────────┐
│   Table A    │◄─────────────►│   Table A    │
│   (Active)   │  Replication  │   (Active)   │
└──────────────┘               └──────────────┘

Use Cases:

  • Global applications
  • Disaster recovery
  • Low latency requirements
  • Multi-region compliance

Data Model

Items and Attributes

Item:

  • Collection of attributes (like a row in SQL)
  • Uniquely identified by primary key
  • Up to 400 KB in size
  • Flexible schema (no fixed schema)

Attributes:

  • Key-value pairs
  • Scalar types: String, Number, Binary
  • Set types: String Set, Number Set, Binary Set
  • Document types: List, Map

Example Item:

{
  "UserId": "user123",
  "Name": "John Doe",
  "Email": "john@example.com",
  "Age": 30,
  "Tags": ["developer", "aws"],
  "Address": {
    "Street": "123 Main St",
    "City": "Seattle",
    "State": "WA"
  }
}

Primary Key

Simple Primary Key (Partition Key Only):

  • Single attribute as partition key
  • Must be unique
  • Example: UserId (partition key)

Composite Primary Key (Partition Key + Sort Key):

  • Partition key + sort key
  • Multiple items can share partition key
  • Sort key determines order within partition
  • Example: UserId (partition) + OrderId (sort)

Primary Key Design:

Simple Key:
  Partition Key: UserId
  Items: One item per UserId

Composite Key:
  Partition Key: UserId
  Sort Key: OrderId
  Items: Multiple orders per user
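
As a sketch of how a composite key is queried with boto3 (the Orders table with UserId as partition key and OrderId as sort key is a hypothetical example):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
orders = dynamodb.Table('Orders')  # hypothetical: PK=UserId, SK=OrderId

# All orders for one user, returned sorted by OrderId within the partition
response = orders.query(
    KeyConditionExpression=Key('UserId').eq('user123')
)
for item in response['Items']:
    print(item['OrderId'])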

Data Types

Scalar Types:

  • String: UTF-8 encoded text
  • Number: Positive or negative number
  • Binary: Binary data (base64 encoded)

Set Types:

  • String Set: Set of unique strings
  • Number Set: Set of unique numbers
  • Binary Set: Set of unique binary values

Document Types:

  • List: Ordered collection of values
  • Map: Unordered collection of key-value pairs

Type Examples:

{
  "String": "Hello",
  "Number": 42,
  "Binary": "dGVzdA==",
  "StringSet": ["red", "green", "blue"],
  "NumberSet": [1, 2, 3],
  "List": [1, "two", 3.0],
  "Map": {
    "key1": "value1",
    "key2": 2
  }
}

Indexes

Global Secondary Index (GSI)

Purpose:

  • Query data using different partition key
  • Query data using different sort key
  • Access patterns different from base table

Characteristics:

  • Can have different partition key than base table
  • Can have different sort key than base table
  • Eventually consistent reads only (strongly consistent reads are not supported)
  • Separate throughput capacity

GSI Example:

Base Table:
  PK: UserId
  SK: OrderId

GSI:
  PK: OrderStatus
  SK: OrderDate
  Projected Attributes: All attributes

Use Case:

  • Query orders by status
  • Query orders by date range
  • Different access pattern than base table
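
A hedged example of querying such a GSI with boto3; the index name StatusDateIndex and the attribute values are assumptions:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb', region_name='us-east-1').Table('Orders')

# Query the GSI instead of the base table: all shipped orders in January
response = table.query(
    IndexName='StatusDateIndex',  # hypothetical GSI name
    KeyConditionExpression=(
        Key('OrderStatus').eq('SHIPPED')
        & Key('OrderDate').between('2024-01-01', '2024-01-31')
    )
)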

Local Secondary Index (LSI)

Purpose:

  • Query data using different sort key
  • Same partition key as base table
  • Alternative sort order within partition

Characteristics:

  • Same partition key as base table
  • Different sort key than base table
  • Supports both eventually and strongly consistent reads
  • Shares throughput with base table

LSI Example:

Base Table:
  PK: UserId
  SK: OrderId

LSI:
  PK: UserId (same)
  SK: OrderDate (different)

Use Case:

  • Query user orders by date
  • Query user orders by status
  • Same partition, different sort order

Index Design Best Practices

When to Use GSI:

  • Different partition key needed
  • Different access pattern
  • Can tolerate eventual consistency

When to Use LSI:

  • Same partition key
  • Different sort key
  • Need strong consistency
  • Must be created at table creation time (cannot be added later)

Index Limitations:

  • Maximum 20 GSIs per table (default quota)
  • Maximum 5 LSIs per table
  • LSIs cap each item collection (all items sharing a partition key, plus their index entries) at 10 GB

Consistency Models

Eventual Consistency

Default Read Consistency:

  • Eventually consistent reads (default)
  • May return stale data
  • Lower cost (half the read capacity units)
  • Best for read-heavy workloads

Eventual Consistency Characteristics:

  • Data replicated across 3 AZs
  • Replication lag possible
  • May read from any replica
  • Eventually all replicas consistent

Use Cases:

  • Read-heavy workloads
  • Can tolerate stale data
  • Cost optimization
  • Non-critical reads

Strong Consistency

Strongly Consistent Reads:

  • Always returns latest data
  • Higher cost (full read capacity units)
  • May have higher latency
  • Best for critical reads

Strong Consistency Characteristics:

  • Reads are served by the partition's leader node
  • Always latest data
  • Higher latency possible
  • Higher cost

Use Cases:

  • Critical reads
  • Financial transactions
  • Real-time data requirements
  • Cannot tolerate stale data
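
In boto3, consistency is chosen per read with the ConsistentRead flag; a minimal sketch against a hypothetical Users table:

import boto3

table = boto3.resource('dynamodb', region_name='us-east-1').Table('Users')

# Eventually consistent read (default, 0.5 RCU per 4 KB)
response = table.get_item(Key={'UserId': 'user123'})

# Strongly consistent read (1 RCU per 4 KB, always the latest write)
response = table.get_item(Key={'UserId': 'user123'}, ConsistentRead=True)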

Consistency Comparison

Feature          Eventual Consistency          Strong Consistency
Read Cost        0.5 RCU                       1 RCU
Latency          Lower                         Higher
Data Freshness   May be stale                  Always latest
Use Case         Read-heavy, cost-sensitive    Critical reads

Performance and Scaling

Maximum Read & Write Throughput

Provisioned Capacity Mode:

  • Max Read Throughput:
    • Per table: 40,000 RCU (Read Capacity Units) = 40,000 strongly consistent reads/sec (4 KB items) or 80,000 eventually consistent reads/sec
    • Can request limit increases (up to millions of RCU)
    • With DAX: Millions of reads/sec (cached reads)
  • Max Write Throughput:
    • Per table: 40,000 WCU (Write Capacity Units) = 40,000 writes/sec (1 KB items)
    • Can request limit increases (up to millions of WCU)

On-Demand Capacity Mode:

  • Max Read Throughput: No predefined limits (scales automatically)
  • Max Write Throughput: No predefined limits (scales automatically)
  • Typical Performance: Handles thousands to millions of requests/sec automatically

Global Tables (Multi-Region):

  • Max Read Throughput: 40,000 RCU per region (scales across regions)
  • Max Write Throughput: 40,000 WCU per region (scales across regions)

Factors Affecting Throughput:

  • Item size (larger items consume more capacity units)
  • Consistency level (strongly consistent = 2x RCU cost)
  • Partition key distribution (hot partitions limit throughput)
  • Network latency
  • DAX caching (for reads)
  • Auto-scaling configuration

Optimized Configuration:

  • Max Read Throughput: Millions of reads/sec (with DAX and proper partition key design)
  • Max Write Throughput: Hundreds of thousands of writes/sec (with proper partition key design and limit increases)

Capacity Modes

Provisioned Capacity:

  • Set read/write capacity units
  • Predictable performance
  • Cost-effective for steady workloads
  • Can use auto-scaling

On-Demand Capacity:

  • Automatic scaling
  • Pay per request
  • No capacity planning needed
  • Good for unpredictable workloads

Capacity Comparison:

Feature          Provisioned              On-Demand
Cost             Lower for steady load    Higher for steady load
Scaling          Manual/auto-scaling      Automatic
Predictability   Predictable              Variable
Use Case         Steady workloads         Unpredictable workloads

Read Capacity Units (RCU)

RCU Calculation:

  • 1 RCU = 1 strongly consistent read of 4 KB per second
  • 1 RCU = 2 eventually consistent reads of 4 KB per second
  • Larger items consume more RCUs

RCU Examples:

Item Size: 4 KB
Strongly Consistent: 1 RCU per read
Eventually Consistent: 0.5 RCU per read

Item Size: 8 KB
Strongly Consistent: 2 RCUs per read
Eventually Consistent: 1 RCU per read

Write Capacity Units (WCU)

WCU Calculation:

  • 1 WCU = 1 write of 1 KB per second
  • Larger items consume more WCUs

WCU Examples:

Item Size: 1 KB
Write: 1 WCU

Item Size: 2 KB
Write: 2 WCUs
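
The capacity arithmetic above can be expressed directly; a small illustrative Python helper:

import math

def rcu_per_read(item_size_kb: float, strongly_consistent: bool = False) -> float:
    """Reads are billed in 4 KB units; eventual consistency halves the cost."""
    units = math.ceil(item_size_kb / 4)
    return units if strongly_consistent else units / 2

def wcu_per_write(item_size_kb: float) -> int:
    """Writes are billed in 1 KB units, rounded up."""
    return math.ceil(item_size_kb)

print(rcu_per_read(8, strongly_consistent=True))  # 2
print(rcu_per_read(8))                            # 1.0
print(wcu_per_write(2))                           # 2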

Auto-Scaling

Provisioned Capacity Auto-Scaling:

  • Automatically adjust capacity based on traffic
  • Set target utilization (e.g., 70%)
  • Scale up/down based on metrics
  • Avoid throttling

Auto-Scaling Configuration:

Target Utilization: 70%
Scale Up: When utilization > 70%
Scale Down: When utilization < 70%
Cooldown: 60 seconds
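
Under the hood, DynamoDB auto-scaling is configured through the Application Auto Scaling service, not DynamoDB itself. A boto3 sketch mirroring the 70% target above (the table name and capacity bounds are assumptions):

import boto3

autoscaling = boto3.client('application-autoscaling', region_name='us-east-1')

# Register the table's read capacity as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/Users',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    MinCapacity=5,
    MaxCapacity=500
)

# Target tracking: keep consumed/provisioned utilization near 70%
autoscaling.put_scaling_policy(
    ServiceNamespace='dynamodb',
    ResourceId='table/Users',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    PolicyName='users-read-70',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBReadCapacityUtilization'
        }
    }
)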

Performance Optimization

1. Partition Key Design:

  • Distribute load evenly
  • Avoid hot partitions
  • Use composite keys when needed

2. Index Optimization:

  • Use GSI for different access patterns
  • Use LSI for different sort orders
  • Project only needed attributes

3. Batch Operations:

  • Use BatchGetItem (up to 100 items)
  • Use BatchWriteItem (up to 25 items)
  • Reduce round trips
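
A short boto3 sketch of both batch APIs (table and item names are placeholders); the resource-level batch_writer handles the 25-item chunking and retries unprocessed items for you:

import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Users')

# batch_writer buffers puts into BatchWriteItem calls (25 items per request)
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={'UserId': f'user{i}', 'Name': f'User {i}'})

# BatchGetItem: up to 100 items per call, possibly across tables
response = dynamodb.batch_get_item(
    RequestItems={
        'Users': {'Keys': [{'UserId': 'user1'}, {'UserId': 'user2'}]}
    }
)
print(response['Responses']['Users'])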

4. Caching:

  • Use DAX (DynamoDB Accelerator)
  • In-memory caching
  • Microsecond latency
  • Reduces DynamoDB costs

Advanced Features

DynamoDB Streams

Purpose:

  • Capture item-level changes
  • Time-ordered sequence of changes
  • Enable event-driven architectures
  • Integrate with Lambda, Kinesis

Stream Record Types:

  • INSERT: New item created
  • MODIFY: Item updated
  • REMOVE: Item deleted

Use Cases:

  • Real-time analytics
  • Data replication
  • Audit logging
  • Trigger Lambda functions

Stream Example:

{
  "eventID": "1",
  "eventName": "INSERT",
  "dynamodb": {
    "Keys": {
      "UserId": {"S": "user123"}
    },
    "NewImage": {
      "UserId": {"S": "user123"},
      "Name": {"S": "John Doe"}
    }
  }
}
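
A stream record like this is typically consumed by a Lambda function; a minimal handler sketch, assuming the function is subscribed to the table's stream:

# Hypothetical Lambda handler subscribed to a DynamoDB stream
def handler(event, context):
    for record in event['Records']:
        name = record['eventName']
        keys = record['dynamodb']['Keys']
        if name == 'INSERT':
            print(f"Created: {record['dynamodb']['NewImage']}")
        elif name == 'MODIFY':
            print(f"Updated: {keys}")
        elif name == 'REMOVE':
            print(f"Deleted: {keys}")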

DynamoDB Accelerator (DAX)

Purpose:

  • In-memory caching layer
  • Microsecond latency
  • Fully managed
  • Compatible with DynamoDB API

DAX Architecture:

Application ──► DAX Cluster ──(cache miss)──► DynamoDB
                     │
               (cache hit: served from DAX memory)

Benefits:

  • 10x faster reads
  • Reduces DynamoDB costs
  • Automatic cache management
  • No code changes needed

Use Cases:

  • Read-heavy workloads
  • Low latency requirements
  • Frequently accessed data
  • Gaming leaderboards

Transactions

ACID Transactions:

  • All-or-nothing operations
  • Up to 100 items per transaction (the original limit of 25 was raised in 2022)
  • Strong consistency
  • Atomic operations

Transaction Operations:

  • TransactWriteItems: Write multiple items atomically
  • TransactGetItems: Read multiple items atomically

Transaction Example:

# Transfer money between accounts atomically (boto3 low-level client;
# note the DynamoDB-typed attribute values)
import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

dynamodb.transact_write_items(
    TransactItems=[
        {
            'Update': {
                'TableName': 'Accounts',
                'Key': {'AccountId': {'S': 'account1'}},
                'UpdateExpression': 'ADD Balance :debit',
                'ExpressionAttributeValues': {':debit': {'N': '-100'}}
            }
        },
        {
            'Update': {
                'TableName': 'Accounts',
                'Key': {'AccountId': {'S': 'account2'}},
                'UpdateExpression': 'ADD Balance :credit',
                'ExpressionAttributeValues': {':credit': {'N': '100'}}
            }
        }
    ]
)

Time to Live (TTL)

Purpose:

  • Automatically delete expired items
  • No additional cost
  • Useful for session data, logs

TTL Configuration:

  • Set TTL attribute on items
  • Unix timestamp (seconds since epoch)
  • DynamoDB automatically deletes expired items
  • Deletion happens within 48 hours

TTL Example:

{
  "SessionId": "session123",
  "UserId": "user123",
  "TTL": 1733788800  // Expires on Dec 10, 2024
}
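
A sketch of enabling TTL and writing an expiring item with boto3 (the Sessions table and the "TTL" attribute name are assumptions):

import time
import boto3

client = boto3.client('dynamodb', region_name='us-east-1')

# Enable TTL on the table, using the "TTL" attribute as the expiry timestamp
client.update_time_to_live(
    TableName='Sessions',
    TimeToLiveSpecification={'Enabled': True, 'AttributeName': 'TTL'}
)

# Write a session that expires roughly one hour from now
sessions = boto3.resource('dynamodb', region_name='us-east-1').Table('Sessions')
sessions.put_item(Item={'SessionId': 'session123', 'TTL': int(time.time()) + 3600})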

Use Cases:

  • Session management
  • Temporary data
  • Log retention
  • Cache invalidation

Data Modeling

Access Patterns

Design Process:

  1. Identify access patterns
  2. Design primary key
  3. Design indexes (GSI/LSI)
  4. Optimize for queries

Common Access Patterns:

  • Get item by ID
  • Query items by partition key
  • Query items by partition key + sort key range
  • Query items by different attribute

Single Table Design

Benefits:

  • Fewer round trips
  • Better performance
  • Lower cost
  • Atomic operations across entities

Challenges:

  • Complex data model
  • Harder to understand
  • Requires careful planning

Example:

Table: ApplicationData
PK: EntityType#EntityId
SK: Attribute#Value

Items:
  User#user123 | Profile#Info → User data
  User#user123 | Order#order1 → Order data
  Order#order1 | Item#item1 → Order item
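
With this layout, a single Query can fetch an entity together with its related items; a sketch assuming the key attributes are literally named PK and SK:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb', region_name='us-east-1').Table('ApplicationData')

# One request: every order item stored under the user's partition
response = table.query(
    KeyConditionExpression=(
        Key('PK').eq('User#user123') & Key('SK').begins_with('Order#')
    )
)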

Multi-Table Design

Benefits:

  • Simpler data model
  • Easier to understand
  • Clear separation of concerns

Challenges:

  • More tables to manage
  • More round trips
  • Higher cost

Example:

Table: Users
  PK: UserId

Table: Orders
  PK: UserId
  SK: OrderId

Table: OrderItems
  PK: OrderId
  SK: ItemId

Design Patterns

1. Adjacency List Pattern:

  • Store relationships in same table
  • Use sort key for relationships
  • Query related items efficiently

2. Materialized Aggregates:

  • Pre-compute aggregations
  • Store in separate items
  • Update on writes

3. Sparse Indexes:

  • GSI with selective attributes
  • Only items with attribute in index
  • Efficient for filtering

Use Cases

1. Mobile and Web Applications

User Profiles:

  • Store user data
  • Fast lookups by user ID
  • Flexible schema for user attributes

Session Management:

  • Store session data
  • Use TTL for expiration
  • Fast session lookups

Shopping Carts:

  • Store cart items
  • User ID as partition key
  • Product ID as sort key

2. Gaming Applications

Player Profiles:

  • Store player data
  • Fast updates
  • Global leaderboards

Leaderboards:

  • Use GSI for rankings
  • Sort by score
  • Real-time updates

Game State:

  • Store game sessions
  • Fast reads/writes
  • Low latency

3. IoT Applications

Device Data:

  • Store sensor readings
  • Time-series data
  • High write throughput

Device Management:

  • Store device metadata
  • Query by device type
  • Update device status

4. Real-Time Bidding

Ad Inventory:

  • Store ad data
  • Fast lookups
  • High throughput

Bid Tracking:

  • Store bid data
  • Real-time updates
  • Low latency

Best Practices

1. Partition Key Design

Guidelines:

  • Distribute load evenly
  • Avoid hot partitions
  • Use high cardinality values
  • Consider access patterns

Bad Example:

Partition Key: Status
Values: "active", "inactive"
Problem: Hot partition (most items "active")

Good Example:

Partition Key: UserId
Values: "user1", "user2", "user3", ...
Benefit: Even distribution

2. Sort Key Design

Guidelines:

  • Use for range queries
  • Consider sort order
  • Use composite sort keys if needed

Example:

Sort Key: OrderDate
Query: Get orders between dates

3. Index Design

Guidelines:

  • Create indexes for access patterns
  • Project only needed attributes
  • Monitor index usage
  • Remove unused indexes

GSI Best Practices:

  • Use for different partition keys
  • Consider eventual consistency
  • Monitor GSI capacity

LSI Best Practices:

  • Use for different sort keys
  • Same partition key
  • Strong consistency

4. Capacity Planning

Provisioned Capacity:

  • Monitor CloudWatch metrics
  • Use auto-scaling
  • Plan for peak loads

On-Demand Capacity:

  • Good for unpredictable workloads
  • Monitor costs
  • Consider switching to provisioned for steady loads

5. Error Handling

Throttling:

  • Handle ProvisionedThroughputExceededException
  • Implement exponential backoff
  • Use retry logic

Error Handling Example:

import time
from botocore.exceptions import ClientError

def retry_with_backoff(func, max_retries=3):
    for i in range(max_retries):
        try:
            return func()
        except ClientError as e:
            code = e.response['Error']['Code']
            if code in ('ProvisionedThroughputExceededException', 'ThrottlingException'):
                time.sleep(2 ** i)  # Exponential backoff
            else:
                raise
    raise Exception("Max retries exceeded")

6. Security

Encryption:

  • Enable encryption at rest
  • Use AWS KMS for key management
  • Enable encryption in transit (HTTPS)

Access Control:

  • Use IAM policies
  • Principle of least privilege
  • Use VPC endpoints for private access

Best Practices:

  • Enable CloudTrail logging
  • Use IAM roles
  • Rotate access keys regularly
  • Use MFA for admin access

Limitations and Considerations

Limitations

Item Size:

  • Maximum 400 KB per item
  • Includes attribute names and values
  • Consider compression for large items
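
One hedged approach: gzip-compress the payload into a Binary attribute, and fall back to an S3 pointer when even the compressed form approaches the limit (table and attribute names are illustrative):

import gzip
import json
import boto3

table = boto3.resource('dynamodb', region_name='us-east-1').Table('Documents')

payload = {'body': 'x' * 500_000}
blob = gzip.compress(json.dumps(payload).encode('utf-8'))

if len(blob) > 350_000:
    # Leave headroom under the 400 KB item limit; store the blob in S3
    # and keep only a pointer in DynamoDB instead
    raise ValueError('payload too large for a single item')

table.put_item(Item={'DocId': 'doc1', 'Body': blob})  # bytes become a Binary attribute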

Throughput:

  • Per-partition limits
  • Hot partitions can throttle
  • Use proper partition key design

Indexes:

  • Maximum 20 GSIs per table (default quota)
  • Maximum 5 LSIs per table
  • LSIs limit each item collection to 10 GB

Queries:

  • Cannot query across partitions
  • Must provide partition key
  • Use GSI for different access patterns

Considerations

Cost:

  • Provisioned capacity: Pay for reserved capacity
  • On-demand: Pay per request
  • Storage costs
  • Index costs (GSI consumes capacity)

Consistency:

  • Eventual consistency default
  • Strong consistency costs more
  • Consider application requirements

Scalability:

  • Automatic scaling
  • No manual sharding needed
  • Consider partition key design

Comparison with Other Databases

DynamoDB vs Cassandra

Feature        DynamoDB         Cassandra
Managed        Fully managed    Self-managed
Scaling        Automatic        Manual
Cost           Pay-per-use      Infrastructure costs
Multi-region   Global Tables    Multi-datacenter
Consistency    Tunable          Tunable

DynamoDB vs MongoDB

Feature          DynamoDB             MongoDB
Managed          Fully managed        Self-managed (Atlas managed)
Data Model       Key-value/Document   Document
Query Language   API-based            Rich query language
Scaling          Automatic            Manual/Atlas auto-scaling
Cost             Pay-per-use          Infrastructure costs

DynamoDB vs RDS

Feature          DynamoDB                     RDS
Data Model       NoSQL                        Relational (SQL)
Scaling          Automatic                    Manual
Consistency      Eventual/Strong              ACID
Query Language   API-based                    SQL
Use Case         High-scale, simple queries   Complex queries, relationships

Deployment

Local Deployment (DynamoDB Local)

What is DynamoDB Local?

  • Self-contained local version of DynamoDB
  • Runs on your machine (no AWS account needed)
  • Perfect for development and testing
  • Free to use
  • Compatible with DynamoDB API

Use Cases:

  • Local development
  • Testing and CI/CD pipelines
  • Learning and experimentation
  • Offline development
  • Cost-free testing

Installation Methods

1. Docker (Recommended):

Pull Docker Image:

docker pull amazon/dynamodb-local

Run DynamoDB Local:

docker run -p 8000:8000 amazon/dynamodb-local

Run with Custom Port:

docker run -p 8001:8000 amazon/dynamodb-local

Run with Persistent Storage:

# The image's entrypoint is "java", so pass the full JAR invocation as arguments
docker run -p 8000:8000 \
  -v $(pwd)/dynamodb-data:/home/dynamodblocal/data \
  amazon/dynamodb-local \
  -jar DynamoDBLocal.jar -sharedDb -dbPath /home/dynamodblocal/data

2. Java JAR File:

Download DynamoDB Local:

# Download DynamoDB Local JAR
wget https://s3-us-west-2.amazonaws.com/dynamodb-local/dynamodb_local_latest.tar.gz

# Extract
tar -xzf dynamodb_local_latest.tar.gz

# Run
java -Djava.library.path=./DynamoDBLocal_lib \
  -jar DynamoDBLocal.jar \
  -sharedDb \
  -port 8000

3. Homebrew (macOS):

brew install dynamodb-local
dynamodb-local

4. npm (Node.js):

npm install -g dynamodb-local
dynamodb-local

Configuration

Command-Line Options:

java -jar DynamoDBLocal.jar \
  -port 8000 \
  -sharedDb \
  -dbPath ./data \
  -optimizeDbBeforeStartup

# -port                     Port number (default: 8000)
# -sharedDb                 Use a single database file shared by all clients
# -dbPath                   Directory for the database file
# -optimizeDbBeforeStartup  Optimize the database tables on startup

Environment Variables:

export AWS_ACCESS_KEY_ID=local
export AWS_SECRET_ACCESS_KEY=local
export AWS_DEFAULT_REGION=us-east-1

Connecting to DynamoDB Local

AWS CLI:

# Set endpoint
aws dynamodb list-tables \
  --endpoint-url http://localhost:8000

boto3 (Python):

import boto3

# Create DynamoDB client pointing to local instance
dynamodb = boto3.resource(
    'dynamodb',
    endpoint_url='http://localhost:8000',
    region_name='us-east-1',
    aws_access_key_id='local',
    aws_secret_access_key='local'
)

# Use normally
table = dynamodb.Table('my-table')

AWS SDK (JavaScript):

const AWS = require('aws-sdk');

const dynamodb = new AWS.DynamoDB({
    endpoint: 'http://localhost:8000',
    region: 'us-east-1',
    accessKeyId: 'local',
    secretAccessKey: 'local'
});

// Use normally
dynamodb.listTables({}, (err, data) => {
    console.log(data);
});

Java SDK:

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
    .withEndpointConfiguration(
        new AwsClientBuilder.EndpointConfiguration(
            "http://localhost:8000", "us-east-1"))
    .build();

Creating Tables Locally

Using AWS CLI:

aws dynamodb create-table \
  --table-name Users \
  --attribute-definitions \
    AttributeName=UserId,AttributeType=S \
  --key-schema \
    AttributeName=UserId,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --endpoint-url http://localhost:8000

Using Python:

import boto3

dynamodb = boto3.resource(
    'dynamodb',
    endpoint_url='http://localhost:8000',
    region_name='us-east-1',
    aws_access_key_id='local',
    aws_secret_access_key='local'
)

table = dynamodb.create_table(
    TableName='Users',
    KeySchema=[
        {
            'AttributeName': 'UserId',
            'KeyType': 'HASH'
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'UserId',
            'AttributeType': 'S'
        }
    ],
    BillingMode='PAY_PER_REQUEST'
)

table.wait_until_exists()

Data Management

Importing Data:

# Export from AWS DynamoDB
aws dynamodb scan \
  --table-name Users \
  --output json > data.json

# Import to Local
# Note: scan output ({"Items": [...]}) must first be reshaped into the
# batch-write-item request format ({"Users": [{"PutRequest": ...}]});
# see the migration section below for a Python example
aws dynamodb batch-write-item \
  --request-items file://data.json \
  --endpoint-url http://localhost:8000

Exporting Data:

aws dynamodb scan \
  --table-name Users \
  --endpoint-url http://localhost:8000 \
  --output json > local-data.json

Limitations of DynamoDB Local

Not Supported:

  • Global Tables
  • Streams (limited support)
  • Point-in-time recovery
  • On-demand backup
  • Some advanced features

Differences:

  • No actual network latency
  • No throttling (unless configured)
  • File-based storage (not distributed)
  • Limited to single machine

Remote Deployment (AWS Cloud)

AWS DynamoDB Service:

  • Fully managed service
  • No server management
  • Automatic scaling
  • Multi-region support
  • Production-ready

Prerequisites

1. AWS Account:

  • Create AWS account at aws.amazon.com
  • Set up billing and payment method
  • Configure IAM user/role

2. AWS CLI Installation:

# macOS
brew install awscli

# Linux
pip install awscli

# Windows
# Download from AWS website

3. AWS CLI Configuration:

aws configure
# Enter:
# - AWS Access Key ID
# - AWS Secret Access Key
# - Default region (e.g., us-east-1)
# - Default output format (json)

4. IAM Permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:*"
      ],
      "Resource": "*"
    }
  ]
}

Creating Tables in AWS

1. Using AWS Console:

Steps:

  1. Log in to AWS Console
  2. Navigate to DynamoDB service
  3. Click “Create table”
  4. Enter table name
  5. Define partition key (and sort key if needed)
  6. Choose billing mode (On-demand or Provisioned)
  7. Configure settings (encryption, tags, etc.)
  8. Click “Create table”

2. Using AWS CLI:

aws dynamodb create-table \
  --table-name Users \
  --attribute-definitions \
    AttributeName=UserId,AttributeType=S \
  --key-schema \
    AttributeName=UserId,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1

3. Using CloudFormation:

Resources:
  UsersTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: Users
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: UserId
          AttributeType: S
      KeySchema:
        - AttributeName: UserId
          KeyType: HASH

4. Using Terraform:

resource "aws_dynamodb_table" "users" {
  name           = "Users"
  billing_mode   = "PAY_PER_REQUEST"
  hash_key       = "UserId"

  attribute {
    name = "UserId"
    type = "S"
  }
}

5. Using AWS SDK (Python):

import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

table = dynamodb.create_table(
    TableName='Users',
    KeySchema=[
        {
            'AttributeName': 'UserId',
            'KeyType': 'HASH'
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'UserId',
            'AttributeType': 'S'
        }
    ],
    BillingMode='PAY_PER_REQUEST'
)

table.wait_until_exists()

Connecting to AWS DynamoDB

Using AWS SDK (Python):

import boto3

# Default credentials (from ~/.aws/credentials)
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

# Explicit credentials
dynamodb = boto3.resource(
    'dynamodb',
    region_name='us-east-1',
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY'
)

# Using IAM role (EC2/Lambda)
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

Using AWS SDK (JavaScript/Node.js):

const AWS = require('aws-sdk');

// Configure region
AWS.config.update({ region: 'us-east-1' });

// Create DynamoDB client
const dynamodb = new AWS.DynamoDB.DocumentClient();

// Use normally
dynamodb.get({
    TableName: 'Users',
    Key: { UserId: 'user123' }
}, (err, data) => {
    console.log(data);
});

Using AWS SDK (Java):

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
    .withRegion(Regions.US_EAST_1)
    .build();

Deployment Best Practices

1. Environment Configuration:

import os
import boto3

# Determine environment
ENV = os.getenv('ENVIRONMENT', 'local')

if ENV == 'local':
    dynamodb = boto3.resource(
        'dynamodb',
        endpoint_url='http://localhost:8000',
        region_name='us-east-1',
        aws_access_key_id='local',
        aws_secret_access_key='local'
    )
else:
    dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

2. Infrastructure as Code:

  • Use CloudFormation or Terraform
  • Version control your infrastructure
  • Deploy to multiple environments
  • Use parameterized templates

3. Multi-Region Deployment:

# Create table in multiple regions
aws dynamodb create-table \
  --table-name Users \
  --attribute-definitions \
    AttributeName=UserId,AttributeType=S \
  --key-schema \
    AttributeName=UserId,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1

# Enable Global Tables
aws dynamodb update-table \
  --table-name Users \
  --replica-updates \
    'Create={RegionName=eu-west-1}' \
  --region us-east-1

4. Security Configuration:

  • Use IAM roles (not access keys)
  • Enable encryption at rest
  • Enable encryption in transit
  • Use VPC endpoints for private access
  • Enable CloudTrail logging

5. Monitoring and Alerts:

# Create CloudWatch alarm
aws cloudwatch put-metric-alarm \
  --alarm-name DynamoDB-Throttling \
  --alarm-description "Alert on DynamoDB throttling" \
  --metric-name UserErrors \
  --namespace AWS/DynamoDB \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanThreshold

Migration from Local to AWS

1. Export Local Data:

aws dynamodb scan \
  --table-name Users \
  --endpoint-url http://localhost:8000 \
  --output json > local-data.json

2. Transform Data Format:

import json
import boto3

# Read local data
with open('local-data.json', 'r') as f:
    data = json.load(f)

# Transform to batch write format
items = []
for item in data['Items']:
    items.append({
        'PutRequest': {
            'Item': item
        }
    })

# Write to AWS
dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# Batch write (max 25 items per batch), retrying anything unprocessed
for i in range(0, len(items), 25):
    request = {'Users': items[i:i+25]}
    while request:
        response = dynamodb.batch_write_item(RequestItems=request)
        request = response.get('UnprocessedItems')

3. Verify Migration:

# Compare item counts
aws dynamodb describe-table \
  --table-name Users \
  --region us-east-1 \
  --query 'Table.ItemCount'

Deployment Comparison

Feature       Local (DynamoDB Local)   Remote (AWS)
Cost          Free                     Pay-per-use
Setup         Easy (Docker/JAR)        AWS account required
Scalability   Single machine           Unlimited
Features      Limited                  Full feature set
Use Case      Development/Testing      Production
Network       Localhost                Internet
Latency       Very low                 Network dependent
Backup        Manual                   Automatic
Monitoring    Limited                  CloudWatch

Choosing Deployment Method

Use Local When:

  • Developing locally
  • Running tests
  • Learning DynamoDB
  • Offline development
  • Cost-sensitive testing

Use AWS When:

  • Production deployment
  • Need full feature set
  • Require scalability
  • Need multi-region
  • Production workloads

Additional Resources

Video Tutorials

DynamoDB Deep Dive:

  • YouTube: DynamoDB Deep Dive
  • Comprehensive overview of DynamoDB architecture, data modeling, and best practices
  • Covers partitioning, indexes, consistency models, and performance optimization

Books and Courses

  • The DynamoDB Book: Comprehensive guide to DynamoDB data modeling
  • AWS Certified Solutions Architect: Covers DynamoDB in detail
  • AWS re:Invent Sessions: Annual conference sessions on DynamoDB

Community Resources

  • DynamoDB Forum: AWS Forums
  • Stack Overflow: DynamoDB tagged questions
  • GitHub: DynamoDB examples and tools

What Interviewers Look For

DynamoDB Knowledge & Application

  1. Data Modeling Skills
    • Partition key design
    • Sort key design
    • GSI/LSI usage
    • Red Flags: Poor key design, hot partitions, inefficient queries
  2. Access Pattern Understanding
    • Query vs scan
    • Index selection
    • Red Flags: Wrong access patterns, scans everywhere, poor performance
  3. Consistency Model
    • Strong vs eventual consistency
    • When to use each
    • Red Flags: Wrong consistency, no understanding

System Design Skills

  1. When to Use DynamoDB
    • High-scale applications
    • Simple access patterns
    • Cloud-native apps
    • Red Flags: Wrong use case, complex queries, can’t justify
  2. Scalability Design
    • Automatic scaling
    • Partition key design
    • Red Flags: Manual scaling, hot partitions, bottlenecks
  3. Cost Optimization
    • On-demand vs provisioned
    • Index optimization
    • Red Flags: No optimization, high costs, inefficient

Problem-Solving Approach

  1. Trade-off Analysis
    • Cost vs performance
    • Consistency vs availability
    • Red Flags: No trade-offs, dogmatic choices
  2. Edge Cases
    • Hot partitions
    • Throttling
    • Item size limits
    • Red Flags: Ignoring edge cases, no handling
  3. Data Modeling
    • Denormalization
    • Query-first design
    • Red Flags: Normalized design, query issues, poor modeling

Communication Skills

  1. DynamoDB Explanation
    • Can explain DynamoDB features
    • Understands data modeling
    • Red Flags: No understanding, vague explanations
  2. Decision Justification
    • Explains why DynamoDB
    • Discusses alternatives
    • Red Flags: No justification, no alternatives

Meta-Specific Focus

  1. NoSQL Expertise
    • Deep DynamoDB knowledge
    • Data modeling skills
    • Key: Show NoSQL expertise
  2. Cloud-Native Design
    • Managed services understanding
    • Serverless architecture
    • Key: Demonstrate cloud-native thinking

Conclusion

Amazon DynamoDB is a powerful, fully managed NoSQL database that provides predictable performance at any scale. Its serverless architecture, automatic scaling, and built-in features make it ideal for modern cloud-native applications.

Key Takeaways:

  1. Fully Managed: No server management, automatic scaling, built-in backups
  2. Performance: Single-digit millisecond latency, handles millions of requests
  3. Scalability: Automatic scaling, no manual sharding needed
  4. Flexibility: Key-value and document database, flexible schema
  5. Global: Multi-region replication with Global Tables
  6. Cost-Effective: Pay-per-use pricing, no upfront costs

When to Use DynamoDB:

  • High-scale applications
  • Predictable performance requirements
  • Simple data access patterns
  • Need for automatic scaling
  • Cloud-native applications

When Not to Use DynamoDB:

  • Complex queries across multiple entities
  • Need for complex joins
  • Very large items (>400 KB)
  • Very cost-sensitive, steady workloads where self-managed infrastructure is cheaper (provisioned capacity narrows the gap)

DynamoDB is an excellent choice for applications that need high performance, automatic scaling, and minimal operational overhead. With proper data modeling and index design, DynamoDB can handle massive workloads while maintaining low latency and high availability.