Introduction
Amazon DynamoDB is a fully managed NoSQL database service provided by AWS that delivers single-digit millisecond performance at any scale. DynamoDB is a key-value and document database that provides built-in security, backup and restore, and in-memory caching for internet-scale applications. It’s designed to handle massive workloads with predictable performance and seamless scalability.
What is DynamoDB?
DynamoDB is a managed NoSQL database that provides:
- Serverless Architecture: No servers to manage, automatic scaling
- Key-Value Store: Simple key-value data model
- Document Store: Support for JSON documents
- Managed Service: Fully managed by AWS
- Global Tables: Multi-region replication
- On-Demand Scaling: Automatic scaling based on traffic
Why DynamoDB?
Key Advantages:
- Fully Managed: No server management, automatic backups
- Scalability: Handles millions of requests per second
- Performance: Single-digit millisecond latency
- Durability: Built-in replication and backups
- Security: Encryption at rest and in transit
- Pay-per-Use: Pay only for what you use
- Global Tables: Multi-region replication for low latency
Common Use Cases:
- Mobile and web applications
- Gaming applications
- IoT applications
- Real-time bidding
- Session management
- User profiles and preferences
- Shopping carts
- Leaderboards
Architecture
High-Level Architecture
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   Client    │      │   Client    │      │   Client    │
│ Application │      │ Application │      │ Application │
└──────┬──────┘      └──────┬──────┘      └──────┬──────┘
       │                    │                    │
       └────────────────────┴────────────────────┘
│
│ AWS SDK / API
│
▼
┌─────────────────────────┐
│ Amazon DynamoDB │
│ (Managed Service) │
│ │
│ ┌──────────┐ │
│ │ Request │ │
│ │ Router │ │
│ └────┬─────┘ │
│ │ │
│ ┌────┴─────┐ │
│ │ Partitions│ │
│ │ (Shards) │ │
│ └──────────┘ │
│ │
│ ┌───────────────────┐ │
│ │ Storage Nodes │ │
│ │ (Multi-AZ) │ │
│ └───────────────────┘ │
└─────────────────────────┘
Explanation:
- Client Applications: Applications that use DynamoDB to store and retrieve data (e.g., web applications, mobile backends, microservices).
- Amazon DynamoDB: Fully managed NoSQL database service provided by AWS. No servers to manage.
- Request Router: Routes requests to the appropriate partition based on the partition key.
- Partitions (Shards): Logical divisions of data. DynamoDB automatically partitions data across multiple partitions for scalability.
- Storage Nodes (Multi-AZ): Physical storage nodes distributed across multiple Availability Zones for high availability and durability.
Core Architecture
DynamoDB Architecture:
┌─────────────────────────────────────────┐
│ Application Layer │
└──────────────┬──────────────────────────┘
│
┌──────────────▼──────────────────────────┐
│ DynamoDB Service │
│ ┌──────────────────────────────────┐ │
│ │ Request Router │ │
│ └──────────────┬───────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────┐ │
│ │ Partition Management │ │
│ └──────────────┬───────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────┐ │
│ │ Storage Nodes │ │
│ │ ┌──────┐ ┌──────┐ ┌──────┐ │ │
│ │ │Node1 │ │Node2 │ │Node3 │ │ │
│ │ └──────┘ └──────┘ └──────┘ │ │
│ └──────────────────────────────────┘ │
└─────────────────────────────────────────┘
Key Components:
- Request Router: Routes requests to appropriate partitions
- Partition Management: Manages data partitioning
- Storage Nodes: Physical storage for data
- Replication: Automatic replication across availability zones
Partitioning
Partition Key:
- Determines which partition stores the item
- Hash function applied to partition key
- Even distribution across partitions
- Must be provided for every item
Partitioning Strategy:
Partition Key → Hash Function → Partition Number → Storage Node
Example:
UserID: "user123"
Hash("user123") → 0x3A7F → Partition 5 → Node 2
Hot Partitions:
- Uneven access patterns can create hot partitions
- Solution: Use composite partition keys
- Distribute load across multiple partitions
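Write Sharding Example:
One common mitigation is write sharding: append a small, bounded suffix to the partition key so a single hot key fans out across several partitions. A minimal Python sketch, assuming a hypothetical Events table keyed on PK (reads must then query all suffixes and merge the results):
import random
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Events')  # hypothetical table

NUM_SHARDS = 10  # number of suffixes to spread writes across

def put_event(device_id, payload):
    # A bounded random suffix spreads one logical key
    # across NUM_SHARDS physical partitions
    shard = random.randint(0, NUM_SHARDS - 1)
    table.put_item(Item={
        'PK': f'{device_id}#{shard}',
        'Payload': payload
    })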
Replication
Automatic Replication:
- Data replicated across 3 availability zones (AZs)
- Synchronous replication within region
- High availability and durability
- No manual configuration needed
Replication Model:
Primary Partition (AZ-1)
├── Replica 1 (AZ-2)
└── Replica 2 (AZ-3)
Durability:
- Designed for extremely high durability (data synchronously persisted across multiple AZs)
- Automatic failover to replicas
- No data loss on single-node failure
Global Tables
Multi-Region Replication:
- Replicate tables across multiple AWS regions
- Active-active replication
- Low latency for global users
- Eventual consistency across regions
Global Table Architecture:
Region 1 (US-East-1) Region 2 (EU-West-1)
┌──────────────┐ ┌──────────────┐
│ Table A │◄─────────────►│ Table A │
│ (Primary) │ Replication │ (Replica) │
└──────────────┘ └──────────────┘
Use Cases:
- Global applications
- Disaster recovery
- Low latency requirements
- Multi-region compliance
Data Model
Items and Attributes
Item:
- Collection of attributes (like a row in SQL)
- Uniquely identified by primary key
- Up to 400 KB in size
- Flexible schema (no fixed schema)
Attributes:
- Key-value pairs
- Scalar types: String, Number, Binary
- Set types: String Set, Number Set, Binary Set
- Document types: List, Map
Example Item:
{
"UserId": "user123",
"Name": "John Doe",
"Email": "john@example.com",
"Age": 30,
"Tags": ["developer", "aws"],
"Address": {
"Street": "123 Main St",
"City": "Seattle",
"State": "WA"
}
}
Primary Key
Simple Primary Key (Partition Key Only):
- Single attribute as partition key
- Must be unique
- Example:
UserId (partition key)
Composite Primary Key (Partition Key + Sort Key):
- Partition key + sort key
- Multiple items can share partition key
- Sort key determines order within partition
- Example:
UserId (partition) + OrderId (sort)
Primary Key Design:
Simple Key:
Partition Key: UserId
Items: One item per UserId
Composite Key:
Partition Key: UserId
Sort Key: OrderId
Items: Multiple orders per user
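With the composite key above, all of a user's orders come back in a single Query. A sketch with boto3, assuming a hypothetical Orders table using the UserId/OrderId schema shown:
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
orders = dynamodb.Table('Orders')  # hypothetical table

# All orders for one user, sorted by OrderId (the sort key)
response = orders.query(
    KeyConditionExpression=Key('UserId').eq('user123')
)
for item in response['Items']:
    print(item['OrderId'])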
Data Types
Scalar Types:
- String: UTF-8 encoded text
- Number: Positive or negative numeric values, with up to 38 digits of precision
- Binary: Binary data (base64 encoded)
Set Types:
- String Set: Set of unique strings
- Number Set: Set of unique numbers
- Binary Set: Set of unique binary values
Document Types:
- List: Ordered collection of values
- Map: Unordered collection of key-value pairs
Type Examples:
{
"String": "Hello",
"Number": 42,
"Binary": "dGVzdA==",
"StringSet": ["red", "green", "blue"],
"NumberSet": [1, 2, 3],
"List": [1, "two", 3.0],
"Map": {
"key1": "value1",
"key2": 2
}
}
Indexes
Global Secondary Index (GSI)
Purpose:
- Query data using different partition key
- Query data using different sort key
- Access patterns different from base table
Characteristics:
- Can have different partition key than base table
- Can have different sort key than base table
- Eventually consistent reads only (strong consistency is not supported on GSIs)
- Separate throughput capacity
GSI Example:
Base Table:
PK: UserId
SK: OrderId
GSI:
PK: OrderStatus
SK: OrderDate
Projected Attributes: All attributes
Use Case:
- Query orders by status
- Query orders by date range
- Different access pattern than base table
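Querying a GSI is a normal Query with an IndexName. A sketch, assuming the index above is named OrderStatus-OrderDate-index (the name is illustrative):
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
orders = dynamodb.Table('Orders')

# All shipped orders placed in December, served by the GSI
response = orders.query(
    IndexName='OrderStatus-OrderDate-index',  # assumed index name
    KeyConditionExpression=(
        Key('OrderStatus').eq('SHIPPED')
        & Key('OrderDate').between('2024-12-01', '2024-12-31')
    )
)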
Local Secondary Index (LSI)
Purpose:
- Query data using different sort key
- Same partition key as base table
- Alternative sort order within partition
Characteristics:
- Same partition key as base table
- Different sort key than base table
- Supports both eventually and strongly consistent reads
- Shares throughput with base table
LSI Example:
Base Table:
PK: UserId
SK: OrderId
LSI:
PK: UserId (same)
SK: OrderDate (different)
Use Case:
- Query user orders by date
- Query user orders by status
- Same partition, different sort order
Index Design Best Practices
When to Use GSI:
- Different partition key needed
- Different access pattern
- Can tolerate eventual consistency
When to Use LSI:
- Same partition key
- Different sort key
- Need strong consistency
- Must be created at table creation time (cannot be added later)
Index Limitations:
- Maximum 20 GSIs per table (default quota; can be raised)
- Maximum 5 LSIs per table
- Tables with LSIs limit each item collection (a partition key value plus its LSI projections) to 10 GB
Consistency Models
Eventual Consistency
Default Read Consistency:
- Eventually consistent reads (default)
- May return stale data
- Lower cost (half the read capacity units)
- Best for read-heavy workloads
Eventual Consistency Characteristics:
- Data replicated across 3 AZs
- Replication lag possible
- May read from any replica
- Eventually all replicas consistent
Use Cases:
- Read-heavy workloads
- Can tolerate stale data
- Cost optimization
- Non-critical reads
Strong Consistency
Strongly Consistent Reads:
- Always returns latest data
- Higher cost (full read capacity units)
- May have higher latency
- Best for critical reads
Strong Consistency Characteristics:
- Reads served by the partition's leader replica
- Always latest data
- Higher latency possible
- Higher cost
Use Cases:
- Critical reads
- Financial transactions
- Real-time data requirements
- Cannot tolerate stale data
Consistency Comparison
| Feature | Eventual Consistency | Strong Consistency |
|---|---|---|
| Read Cost | 0.5 RCU | 1 RCU |
| Latency | Lower | Higher |
| Data Freshness | May be stale | Always latest |
| Use Case | Read-heavy, cost-sensitive | Critical reads |
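Consistency is chosen per read. A boto3 sketch, assuming the Users table from earlier examples:
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')

# Eventually consistent (default, 0.5 RCU per 4 KB)
table.get_item(Key={'UserId': 'user123'})

# Strongly consistent (1 RCU per 4 KB)
table.get_item(Key={'UserId': 'user123'}, ConsistentRead=True)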
Performance and Scaling
Maximum Read & Write Throughput
Provisioned Capacity Mode:
- Max Read Throughput:
- Per table: 40,000 RCU (Read Capacity Units) = 40,000 strongly consistent reads/sec (4 KB items) or 80,000 eventually consistent reads/sec
- Can request limit increases (up to millions of RCU)
- With DAX: Millions of reads/sec (cached reads)
- Max Write Throughput:
- Per table: 40,000 WCU (Write Capacity Units) = 40,000 writes/sec (1 KB items)
- Can request limit increases (up to millions of WCU)
On-Demand Capacity Mode:
- Max Read Throughput: No capacity planning required; scales automatically (default per-table quotas still apply and can be raised)
- Max Write Throughput: No capacity planning required; scales automatically
- Typical Performance: Handles thousands to millions of requests/sec automatically
Global Tables (Multi-Region):
- Max Read Throughput: 40,000 RCU per region (scales across regions)
- Max Write Throughput: 40,000 WCU per region (scales across regions)
Factors Affecting Throughput:
- Item size (larger items consume more capacity units)
- Consistency level (strongly consistent = 2x RCU cost)
- Partition key distribution (hot partitions limit throughput)
- Network latency
- DAX caching (for reads)
- Auto-scaling configuration
Optimized Configuration:
- Max Read Throughput: Millions of reads/sec (with DAX and proper partition key design)
- Max Write Throughput: Hundreds of thousands of writes/sec (with proper partition key design and limit increases)
Capacity Modes
Provisioned Capacity:
- Set read/write capacity units
- Predictable performance
- Cost-effective for steady workloads
- Can use auto-scaling
On-Demand Capacity:
- Automatic scaling
- Pay per request
- No capacity planning needed
- Good for unpredictable workloads
Capacity Comparison:
| Feature | Provisioned | On-Demand |
|---|---|---|
| Cost | Lower for steady load | Higher for steady load |
| Scaling | Manual/auto-scaling | Automatic |
| Predictability | Predictable | Variable |
| Use Case | Steady workloads | Unpredictable workloads |
Read Capacity Units (RCU)
RCU Calculation:
- 1 RCU = 1 strongly consistent read of 4 KB per second
- 1 RCU = 2 eventually consistent reads of 4 KB per second
- Larger items consume more RCUs
RCU Examples:
Item Size: 4 KB
Strongly Consistent: 1 RCU per read
Eventually Consistent: 0.5 RCU per read
Item Size: 8 KB
Strongly Consistent: 2 RCUs per read
Eventually Consistent: 1 RCU per read
Write Capacity Units (WCU)
WCU Calculation:
- 1 WCU = 1 write of 1 KB per second
- Larger items consume more WCUs
WCU Examples:
Item Size: 1 KB
Write: 1 WCU
Item Size: 2 KB
Write: 2 WCUs
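Capacity is consumed in whole 4 KB (read) and 1 KB (write) steps, rounded up. A small helper that mirrors the arithmetic above:
import math

def rcus_per_read(item_size_kb, strongly_consistent=True):
    # Reads are billed in 4 KB units, rounded up
    units = math.ceil(item_size_kb / 4)
    return units if strongly_consistent else units / 2

def wcus_per_write(item_size_kb):
    # Writes are billed in 1 KB units, rounded up
    return math.ceil(item_size_kb)

print(rcus_per_read(8))          # 2
print(rcus_per_read(8, False))   # 1.0
print(wcus_per_write(2))         # 2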
Auto-Scaling
Provisioned Capacity Auto-Scaling:
- Automatically adjust capacity based on traffic
- Set target utilization (e.g., 70%)
- Scale up/down based on metrics
- Avoid throttling
Auto-Scaling Configuration:
Target Utilization: 70%
Scale Up: When utilization > 70%
Scale Down: When utilization < 70%
Cooldown: 60 seconds
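Auto-scaling is configured through the Application Auto Scaling service rather than DynamoDB itself. A sketch with boto3, assuming a provisioned table named Users and illustrative capacity bounds:
import boto3

autoscaling = boto3.client('application-autoscaling')

# Register the table's read capacity as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/Users',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    MinCapacity=5,
    MaxCapacity=500
)

# Track 70% utilization, matching the configuration above
autoscaling.put_scaling_policy(
    PolicyName='users-read-scaling',
    ServiceNamespace='dynamodb',
    ResourceId='table/Users',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBReadCapacityUtilization'
        }
    }
)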
Performance Optimization
1. Partition Key Design:
- Distribute load evenly
- Avoid hot partitions
- Use composite keys when needed
2. Index Optimization:
- Use GSI for different access patterns
- Use LSI for different sort orders
- Project only needed attributes
3. Batch Operations:
- Use BatchGetItem (up to 100 items)
- Use BatchWriteItem (up to 25 items)
- Reduce round trips (see the sketch after this list)
4. Caching:
- Use DAX (DynamoDB Accelerator)
- In-memory caching
- Microsecond latency
- Reduces DynamoDB costs
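For the batch operations point above, boto3's batch_writer handles the 25-item chunking and retries unprocessed items automatically. A minimal sketch:
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')

# batch_writer buffers puts into 25-item BatchWriteItem calls
# and retries any unprocessed items for you
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={'UserId': f'user{i}', 'Active': True})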
Advanced Features
DynamoDB Streams
Purpose:
- Capture item-level changes
- Time-ordered sequence of changes
- Enable event-driven architectures
- Integrate with Lambda, Kinesis
Stream Record Types:
- INSERT: New item created
- MODIFY: Item updated
- REMOVE: Item deleted
Use Cases:
- Real-time analytics
- Data replication
- Audit logging
- Trigger Lambda functions
Stream Example:
{
"eventID": "1",
"eventName": "INSERT",
"dynamodb": {
"Keys": {
"UserId": {"S": "user123"}
},
"NewImage": {
"UserId": {"S": "user123"},
"Name": {"S": "John Doe"}
}
}
}
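A Lambda function subscribed to the stream receives batches of such records. A sketch of a handler that routes on eventName (key schema as in the example record):
def handler(event, context):
    # Each invocation delivers a batch of stream records
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            new_image = record['dynamodb']['NewImage']
            print('created:', new_image['UserId']['S'])
        elif record['eventName'] == 'MODIFY':
            print('updated keys:', record['dynamodb']['Keys'])
        elif record['eventName'] == 'REMOVE':
            print('deleted keys:', record['dynamodb']['Keys'])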
DynamoDB Accelerator (DAX)
Purpose:
- In-memory caching layer
- Microsecond latency
- Fully managed
- Compatible with DynamoDB API
DAX Architecture:
Application → DAX Cluster → DynamoDB
(Cache Hit) (Cache Miss)
Benefits:
- 10x faster reads
- Reduces DynamoDB costs
- Automatic cache management
- No code changes needed
Use Cases:
- Read-heavy workloads
- Low latency requirements
- Frequently accessed data
- Gaming leaderboards
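Because DAX is API-compatible, adopting it is mostly a client swap. A sketch using the amazon-dax-client package; the cluster endpoint is a placeholder and the exact constructor options vary by client version:
import boto3
from amazondax import AmazonDaxClient  # pip install amazon-dax-client

# Same low-level API surface as the regular client; reads hit the cache first
dax = AmazonDaxClient(
    boto3.Session(),
    endpoints=['mydaxcluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111']  # placeholder
)
resp = dax.get_item(
    TableName='Users',
    Key={'UserId': {'S': 'user123'}}
)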
Transactions
ACID Transactions:
- All-or-nothing operations
- Up to 100 items per transaction (the original limit of 25 was raised in 2022)
- Strong consistency
- Atomic operations
Transaction Operations:
- TransactWriteItems: Write multiple items atomically
- TransactGetItems: Read multiple items atomically
Transaction Example:
# Transfer money between accounts: both updates succeed or neither does
import boto3

client = boto3.client('dynamodb')
client.transact_write_items(
    TransactItems=[
        {
            'Update': {
                'TableName': 'Accounts',
                'Key': {'AccountId': {'S': 'account1'}},
                'UpdateExpression': 'ADD Balance :debit',
                'ExpressionAttributeValues': {':debit': {'N': '-100'}}
            }
        },
        {
            'Update': {
                'TableName': 'Accounts',
                'Key': {'AccountId': {'S': 'account2'}},
                'UpdateExpression': 'ADD Balance :credit',
                'ExpressionAttributeValues': {':credit': {'N': '100'}}
            }
        }
    ]
)
Time to Live (TTL)
Purpose:
- Automatically delete expired items
- No additional cost
- Useful for session data, logs
TTL Configuration:
- Set TTL attribute on items
- Unix timestamp (seconds since epoch)
- DynamoDB automatically deletes expired items
- Deletion happens within 48 hours
TTL Example:
{
"SessionId": "session123",
"UserId": "user123",
"TTL": 1733788800 // Expires on Dec 10, 2024
}
Use Cases:
- Session management
- Temporary data
- Log retention
- Cache invalidation
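Enabling TTL is a one-time table setting plus a numeric attribute per item. A sketch with boto3, assuming a hypothetical Sessions table and the TTL attribute name from the example:
import time
import boto3

client = boto3.client('dynamodb')

# One-time: tell DynamoDB which attribute holds the expiry timestamp
client.update_time_to_live(
    TableName='Sessions',
    TimeToLiveSpecification={'Enabled': True, 'AttributeName': 'TTL'}
)

# Per item: store a Unix timestamp; here, 24 hours from now
expires_at = int(time.time()) + 24 * 60 * 60
client.put_item(
    TableName='Sessions',
    Item={
        'SessionId': {'S': 'session123'},
        'TTL': {'N': str(expires_at)}
    }
)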
Data Modeling
Access Patterns
Design Process:
- Identify access patterns
- Design primary key
- Design indexes (GSI/LSI)
- Optimize for queries
Common Access Patterns:
- Get item by ID
- Query items by partition key
- Query items by partition key + sort key range
- Query items by different attribute
Single Table Design
Benefits:
- Fewer round trips
- Better performance
- Lower cost
- Atomic operations across entities
Challenges:
- Complex data model
- Harder to understand
- Requires careful planning
Example:
Table: ApplicationData
PK: EntityType#EntityId
SK: Attribute#Value
Items:
User#user123 | Profile#Info → User data
User#user123 | Order#order1 → Order data
Order#order1 | Item#item1 → Order item
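With this layout, one Query can return a user's profile together with their orders by prefix-matching the sort key. A sketch:
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ApplicationData')

# Everything stored under the user: profile item plus order items
response = table.query(
    KeyConditionExpression=Key('PK').eq('User#user123')
)

# Or only the orders, by prefix-matching the sort key
orders = table.query(
    KeyConditionExpression=(
        Key('PK').eq('User#user123') & Key('SK').begins_with('Order#')
    )
)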
Multi-Table Design
Benefits:
- Simpler data model
- Easier to understand
- Clear separation of concerns
Challenges:
- More tables to manage
- More round trips
- Higher cost
Example:
Table: Users
PK: UserId
Table: Orders
PK: UserId
SK: OrderId
Table: OrderItems
PK: OrderId
SK: ItemId
Design Patterns
1. Adjacency List Pattern:
- Store relationships in same table
- Use sort key for relationships
- Query related items efficiently
2. Materialized Aggregates:
- Pre-compute aggregations
- Store in separate items
- Update on writes
3. Sparse Indexes:
- GSI with selective attributes
- Only items with attribute in index
- Efficient for filtering
Use Cases
1. Mobile and Web Applications
User Profiles:
- Store user data
- Fast lookups by user ID
- Flexible schema for user attributes
Session Management:
- Store session data
- Use TTL for expiration
- Fast session lookups
Shopping Carts:
- Store cart items
- User ID as partition key
- Product ID as sort key
2. Gaming Applications
Player Profiles:
- Store player data
- Fast updates
- Global leaderboards
Leaderboards:
- Use GSI for rankings
- Sort by score
- Real-time updates
Game State:
- Store game sessions
- Fast reads/writes
- Low latency
3. IoT Applications
Device Data:
- Store sensor readings
- Time-series data
- High write throughput
Device Management:
- Store device metadata
- Query by device type
- Update device status
4. Real-Time Bidding
Ad Inventory:
- Store ad data
- Fast lookups
- High throughput
Bid Tracking:
- Store bid data
- Real-time updates
- Low latency
Best Practices
1. Partition Key Design
Guidelines:
- Distribute load evenly
- Avoid hot partitions
- Use high cardinality values
- Consider access patterns
Bad Example:
Partition Key: Status
Values: "active", "inactive"
Problem: Hot partition (most items "active")
Good Example:
Partition Key: UserId
Values: "user1", "user2", "user3", ...
Benefit: Even distribution
2. Sort Key Design
Guidelines:
- Use for range queries
- Consider sort order
- Use composite sort keys if needed
Example:
Sort Key: OrderDate
Query: Get orders between dates
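As a boto3 Query, the date-range lookup is a between condition on the sort key; ScanIndexForward=False returns newest first. Table name and date format are illustrative:
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
orders = dynamodb.Table('Orders')

response = orders.query(
    KeyConditionExpression=(
        Key('UserId').eq('user123')
        & Key('OrderDate').between('2024-01-01', '2024-06-30')
    ),
    ScanIndexForward=False,  # newest first
    Limit=20
)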
3. Index Design
Guidelines:
- Create indexes for access patterns
- Project only needed attributes
- Monitor index usage
- Remove unused indexes
GSI Best Practices:
- Use for different partition keys
- Consider eventual consistency
- Monitor GSI capacity
LSI Best Practices:
- Use for different sort keys
- Same partition key
- Strong consistency
4. Capacity Planning
Provisioned Capacity:
- Monitor CloudWatch metrics
- Use auto-scaling
- Plan for peak loads
On-Demand Capacity:
- Good for unpredictable workloads
- Monitor costs
- Consider switching to provisioned for steady loads
5. Error Handling
Throttling:
- Handle ProvisionedThroughputExceededException
- Implement exponential backoff
- Use retry logic
Error Handling Example:
import time
from botocore.exceptions import ClientError
def retry_with_backoff(func, max_retries=3):
    for i in range(max_retries):
        try:
            return func()
        except ClientError as e:
            if e.response['Error']['Code'] == 'ProvisionedThroughputExceededException':
                time.sleep(2 ** i)  # Exponential backoff: 1s, 2s, 4s, ...
            else:
                raise
    raise Exception("Max retries exceeded")
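Usage, assuming a table handle as in the earlier examples:
result = retry_with_backoff(
    lambda: table.get_item(Key={'UserId': 'user123'})
)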
6. Security
Encryption:
- Enable encryption at rest
- Use AWS KMS for key management
- Enable encryption in transit (HTTPS)
Access Control:
- Use IAM policies
- Principle of least privilege
- Use VPC endpoints for private access
Best Practices:
- Enable CloudTrail logging
- Use IAM roles
- Rotate access keys regularly
- Use MFA for admin access
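Encryption with a customer-managed KMS key can be set at table creation via SSESpecification; the key ARN below is a placeholder:
import boto3

client = boto3.client('dynamodb')
client.create_table(
    TableName='Users',
    AttributeDefinitions=[{'AttributeName': 'UserId', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'UserId', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST',
    SSESpecification={
        'Enabled': True,
        'SSEType': 'KMS',
        'KMSMasterKeyId': 'arn:aws:kms:us-east-1:123456789012:key/EXAMPLE'  # placeholder
    }
)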
Limitations and Considerations
Limitations
Item Size:
- Maximum 400 KB per item
- Includes attribute names and values
- Consider compression for large items
Throughput:
- Per-partition limits
- Hot partitions can throttle
- Use proper partition key design
Indexes:
- Maximum 20 GSIs per table (default quota)
- Maximum 5 LSIs per table
- LSIs cap each item collection at 10 GB per partition key value
Queries:
- Cannot query across partitions
- Must provide partition key
- Use GSI for different access patterns
Considerations
Cost:
- Provisioned capacity: Pay for reserved capacity
- On-demand: Pay per request
- Storage costs
- Index costs (GSI consumes capacity)
Consistency:
- Eventual consistency default
- Strong consistency costs more
- Consider application requirements
Scalability:
- Automatic scaling
- No manual sharding needed
- Consider partition key design
Comparison with Other Databases
DynamoDB vs Cassandra
| Feature | DynamoDB | Cassandra |
|---|---|---|
| Managed | Fully managed | Self-managed |
| Scaling | Automatic | Manual |
| Cost | Pay-per-use | Infrastructure costs |
| Multi-region | Global Tables | Multi-datacenter |
| Consistency | Tunable | Tunable |
DynamoDB vs MongoDB
| Feature | DynamoDB | MongoDB |
|---|---|---|
| Managed | Fully managed | Self-managed (Atlas managed) |
| Data Model | Key-value/Document | Document |
| Query Language | API-based | Rich query language |
| Scaling | Automatic | Manual/Atlas auto-scaling |
| Cost | Pay-per-use | Infrastructure costs |
DynamoDB vs RDS
| Feature | DynamoDB | RDS |
|---|---|---|
| Data Model | NoSQL | Relational (SQL) |
| Scaling | Automatic | Manual |
| Consistency | Eventual/Strong | ACID |
| Query Language | API-based | SQL |
| Use Case | High-scale, simple queries | Complex queries, relationships |
Deployment
Local Deployment (DynamoDB Local)
What is DynamoDB Local?
- Self-contained local version of DynamoDB
- Runs on your machine (no AWS account needed)
- Perfect for development and testing
- Free to use
- Compatible with DynamoDB API
Use Cases:
- Local development
- Testing and CI/CD pipelines
- Learning and experimentation
- Offline development
- Cost-free testing
Installation Methods
1. Docker (Recommended):
Pull Docker Image:
docker pull amazon/dynamodb-local
Run DynamoDB Local:
docker run -p 8000:8000 amazon/dynamodb-local
Run with Custom Port:
docker run -p 8001:8000 amazon/dynamodb-local
Run with Persistent Storage:
docker run -p 8000:8000 \
-v $(pwd)/dynamodb-data:/home/dynamodblocal/data \
amazon/dynamodb-local \
-jar DynamoDBLocal.jar -sharedDb -dbPath /home/dynamodblocal/data
Note: the image's entrypoint is java, so the full -jar arguments must be supplied; -dbPath points at the mounted volume.
2. Java JAR File:
Download DynamoDB Local:
# Download DynamoDB Local JAR
wget https://s3-us-west-2.amazonaws.com/dynamodb-local/dynamodb_local_latest.tar.gz
# Extract
tar -xzf dynamodb_local_latest.tar.gz
# Run
java -Djava.library.path=./DynamoDBLocal_lib \
-jar DynamoDBLocal.jar \
-sharedDb \
-port 8000
3. Homebrew (macOS):
brew install dynamodb-local
dynamodb-local
4. npm (Node.js):
npm install -g dynamodb-local
dynamodb-local
Configuration
Command-Line Options:
# -port: port number (default: 8000)
# -sharedDb: use a single database file
# -dbPath: database file path
# -optimizeDbBeforeStartup: optimize the database on startup
java -jar DynamoDBLocal.jar \
-port 8000 \
-sharedDb \
-dbPath ./data \
-optimizeDbBeforeStartup
Environment Variables:
export AWS_ACCESS_KEY_ID=local
export AWS_SECRET_ACCESS_KEY=local
export AWS_DEFAULT_REGION=us-east-1
Connecting to DynamoDB Local
AWS CLI:
# Set endpoint
aws dynamodb list-tables \
--endpoint-url http://localhost:8000
boto3 (Python):
import boto3
# Create DynamoDB client pointing to local instance
dynamodb = boto3.resource(
'dynamodb',
endpoint_url='http://localhost:8000',
region_name='us-east-1',
aws_access_key_id='local',
aws_secret_access_key='local'
)
# Use normally
table = dynamodb.Table('my-table')
AWS SDK (JavaScript):
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB({
endpoint: 'http://localhost:8000',
region: 'us-east-1',
accessKeyId: 'local',
secretAccessKey: 'local'
});
// Use normally
dynamodb.listTables({}, (err, data) => {
console.log(data);
});
Java SDK:
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
.withEndpointConfiguration(
new AwsClientBuilder.EndpointConfiguration(
"http://localhost:8000", "us-east-1"))
.build();
Creating Tables Locally
Using AWS CLI:
aws dynamodb create-table \
--table-name Users \
--attribute-definitions \
AttributeName=UserId,AttributeType=S \
--key-schema \
AttributeName=UserId,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--endpoint-url http://localhost:8000
Using Python:
import boto3
dynamodb = boto3.resource(
'dynamodb',
endpoint_url='http://localhost:8000',
region_name='us-east-1',
aws_access_key_id='local',
aws_secret_access_key='local'
)
table = dynamodb.create_table(
TableName='Users',
KeySchema=[
{
'AttributeName': 'UserId',
'KeyType': 'HASH'
}
],
AttributeDefinitions=[
{
'AttributeName': 'UserId',
'AttributeType': 'S'
}
],
BillingMode='PAY_PER_REQUEST'
)
table.wait_until_exists()
Data Management
Importing Data:
# Export from AWS DynamoDB
aws dynamodb scan \
--table-name Users \
--output json > data.json
# Import to Local (note: scan output must first be transformed into
# batch-write request format; see the migration section below)
aws dynamodb batch-write-item \
--request-items file://data.json \
--endpoint-url http://localhost:8000
Exporting Data:
aws dynamodb scan \
--table-name Users \
--endpoint-url http://localhost:8000 \
--output json > local-data.json
Limitations of DynamoDB Local
Not Supported:
- Global Tables
- Streams (limited support)
- Point-in-time recovery
- On-demand backup
- Some advanced features
Differences:
- No actual network latency
- No throttling (unless configured)
- File-based storage (not distributed)
- Limited to single machine
Remote Deployment (AWS Cloud)
AWS DynamoDB Service:
- Fully managed service
- No server management
- Automatic scaling
- Multi-region support
- Production-ready
Prerequisites
1. AWS Account:
- Create AWS account at aws.amazon.com
- Set up billing and payment method
- Configure IAM user/role
2. AWS CLI Installation:
# macOS
brew install awscli
# Linux
pip install awscli
# Windows
# Download from AWS website
3. AWS CLI Configuration:
aws configure
# Enter:
# - AWS Access Key ID
# - AWS Secret Access Key
# - Default region (e.g., us-east-1)
# - Default output format (json)
4. IAM Permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:*"
],
"Resource": "*"
}
]
}
Creating Tables in AWS
1. Using AWS Console:
Steps:
- Log in to AWS Console
- Navigate to DynamoDB service
- Click “Create table”
- Enter table name
- Define partition key (and sort key if needed)
- Choose billing mode (On-demand or Provisioned)
- Configure settings (encryption, tags, etc.)
- Click “Create table”
2. Using AWS CLI:
aws dynamodb create-table \
--table-name Users \
--attribute-definitions \
AttributeName=UserId,AttributeType=S \
--key-schema \
AttributeName=UserId,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region us-east-1
3. Using CloudFormation:
Resources:
UsersTable:
Type: AWS::DynamoDB::Table
Properties:
TableName: Users
BillingMode: PAY_PER_REQUEST
AttributeDefinitions:
- AttributeName: UserId
AttributeType: S
KeySchema:
- AttributeName: UserId
KeyType: HASH
4. Using Terraform:
resource "aws_dynamodb_table" "users" {
name = "Users"
billing_mode = "PAY_PER_REQUEST"
hash_key = "UserId"
attribute {
name = "UserId"
type = "S"
}
}
5. Using AWS SDK (Python):
import boto3
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.create_table(
TableName='Users',
KeySchema=[
{
'AttributeName': 'UserId',
'KeyType': 'HASH'
}
],
AttributeDefinitions=[
{
'AttributeName': 'UserId',
'AttributeType': 'S'
}
],
BillingMode='PAY_PER_REQUEST'
)
table.wait_until_exists()
Connecting to AWS DynamoDB
Using AWS SDK (Python):
import boto3
# Default credentials (from ~/.aws/credentials)
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
# Explicit credentials
dynamodb = boto3.resource(
'dynamodb',
region_name='us-east-1',
aws_access_key_id='YOUR_ACCESS_KEY',
aws_secret_access_key='YOUR_SECRET_KEY'
)
# Using IAM role (EC2/Lambda)
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
Using AWS SDK (JavaScript/Node.js):
const AWS = require('aws-sdk');
// Configure region
AWS.config.update({ region: 'us-east-1' });
// Create DynamoDB client
const dynamodb = new AWS.DynamoDB.DocumentClient();
// Use normally
dynamodb.get({
TableName: 'Users',
Key: { UserId: 'user123' }
}, (err, data) => {
console.log(data);
});
Using AWS SDK (Java):
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
.withRegion(Regions.US_EAST_1)
.build();
Deployment Best Practices
1. Environment Configuration:
import os
import boto3
# Determine environment
ENV = os.getenv('ENVIRONMENT', 'local')
if ENV == 'local':
    dynamodb = boto3.resource(
        'dynamodb',
        endpoint_url='http://localhost:8000',
        region_name='us-east-1',
        aws_access_key_id='local',
        aws_secret_access_key='local'
    )
else:
    dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
2. Infrastructure as Code:
- Use CloudFormation or Terraform
- Version control your infrastructure
- Deploy to multiple environments
- Use parameterized templates
3. Multi-Region Deployment:
# Create the table in the home region first
aws dynamodb create-table \
--table-name Users \
--attribute-definitions \
AttributeName=UserId,AttributeType=S \
--key-schema \
AttributeName=UserId,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region us-east-1
# Enable Global Tables by adding a replica region
aws dynamodb update-table \
--table-name Users \
--replica-updates 'Create={RegionName=eu-west-1}' \
--region us-east-1
4. Security Configuration:
- Use IAM roles (not access keys)
- Enable encryption at rest
- Enable encryption in transit
- Use VPC endpoints for private access
- Enable CloudTrail logging
5. Monitoring and Alerts:
# Create CloudWatch alarm on throttled requests
aws cloudwatch put-metric-alarm \
--alarm-name DynamoDB-Throttling \
--alarm-description "Alert on DynamoDB throttling" \
--metric-name ThrottledRequests \
--namespace AWS/DynamoDB \
--dimensions Name=TableName,Value=Users Name=Operation,Value=GetItem \
--statistic Sum \
--period 300 \
--evaluation-periods 1 \
--threshold 1 \
--comparison-operator GreaterThanThreshold
Migration from Local to AWS
1. Export Local Data:
aws dynamodb scan \
--table-name Users \
--endpoint-url http://localhost:8000 \
--output json > local-data.json
2. Transform Data Format:
import json
import boto3
# Read local data
with open('local-data.json', 'r') as f:
    data = json.load(f)
# Transform to batch write format
items = []
for item in data['Items']:
    items.append({
        'PutRequest': {
            'Item': item
        }
    })
# Write to AWS
dynamodb = boto3.client('dynamodb', region_name='us-east-1')
# Batch write (max 25 items per batch); in production, retry any
# UnprocessedItems returned in the response
for i in range(0, len(items), 25):
    batch = items[i:i + 25]
    dynamodb.batch_write_item(
        RequestItems={
            'Users': batch
        }
    )
3. Verify Migration:
# Compare item counts (note: ItemCount is updated roughly every six hours)
aws dynamodb describe-table \
--table-name Users \
--region us-east-1 \
--query 'Table.ItemCount'
Deployment Comparison
| Feature | Local (DynamoDB Local) | Remote (AWS) |
|---|---|---|
| Cost | Free | Pay-per-use |
| Setup | Easy (Docker/JAR) | AWS account required |
| Scalability | Single machine | Unlimited |
| Features | Limited | Full feature set |
| Use Case | Development/Testing | Production |
| Network | Localhost | Internet |
| Latency | Very low | Network dependent |
| Backup | Manual | Automatic |
| Monitoring | Limited | CloudWatch |
Choosing Deployment Method
Use Local When:
- Developing locally
- Running tests
- Learning DynamoDB
- Offline development
- Cost-sensitive testing
Use AWS When:
- Production deployment
- Need full feature set
- Require scalability
- Need multi-region
- Production workloads
Additional Resources
Official Documentation
- AWS DynamoDB Documentation: https://docs.aws.amazon.com/dynamodb/
- DynamoDB Best Practices: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html
- DynamoDB Data Modeling: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html#bp-data-access-patterns
Video Tutorials
DynamoDB Deep Dive:
- YouTube: DynamoDB Deep Dive
- Comprehensive overview of DynamoDB architecture, data modeling, and best practices
- Covers partitioning, indexes, consistency models, and performance optimization
Books and Courses
- The DynamoDB Book: Comprehensive guide to DynamoDB data modeling
- AWS Certified Solutions Architect: Covers DynamoDB in detail
- AWS re:Invent Sessions: Annual conference sessions on DynamoDB
Community Resources
- DynamoDB Forum: AWS Forums
- Stack Overflow: DynamoDB tagged questions
- GitHub: DynamoDB examples and tools
What Interviewers Look For
DynamoDB Knowledge & Application
- Data Modeling Skills
  - Partition key design
  - Sort key design
  - GSI/LSI usage
  - Red Flags: Poor key design, hot partitions, inefficient queries
- Access Pattern Understanding
  - Query vs scan
  - Index selection
  - Red Flags: Wrong access patterns, scans everywhere, poor performance
- Consistency Model
  - Strong vs eventual consistency
  - When to use each
  - Red Flags: Wrong consistency choice, no understanding of the trade-off
System Design Skills
- When to Use DynamoDB
  - High-scale applications
  - Simple access patterns
  - Cloud-native apps
  - Red Flags: Wrong use case, complex queries, can’t justify the choice
- Scalability Design
  - Automatic scaling
  - Partition key design
  - Red Flags: Manual scaling, hot partitions, bottlenecks
- Cost Optimization
  - On-demand vs provisioned
  - Index optimization
  - Red Flags: No optimization, high costs, inefficient design
Problem-Solving Approach
- Trade-off Analysis
  - Cost vs performance
  - Consistency vs availability
  - Red Flags: No trade-offs, dogmatic choices
- Edge Cases
  - Hot partitions
  - Throttling
  - Item size limits
  - Red Flags: Ignoring edge cases, no handling strategy
- Data Modeling
  - Denormalization
  - Query-first design
  - Red Flags: Fully normalized design, query issues, poor modeling
Communication Skills
- DynamoDB Explanation
  - Can explain DynamoDB features
  - Understands data modeling
  - Red Flags: No understanding, vague explanations
- Decision Justification
  - Explains why DynamoDB
  - Discusses alternatives
  - Red Flags: No justification, no alternatives considered
Meta-Specific Focus
- NoSQL Expertise
  - Deep DynamoDB knowledge
  - Data modeling skills
  - Key: Show NoSQL expertise
- Cloud-Native Design
  - Managed services understanding
  - Serverless architecture
  - Key: Demonstrate cloud-native thinking
Conclusion
Amazon DynamoDB is a powerful, fully managed NoSQL database that provides predictable performance at any scale. Its serverless architecture, automatic scaling, and built-in features make it ideal for modern cloud-native applications.
Key Takeaways:
- Fully Managed: No server management, automatic scaling, built-in backups
- Performance: Single-digit millisecond latency, handles millions of requests
- Scalability: Automatic scaling, no manual sharding needed
- Flexibility: Key-value and document database, flexible schema
- Global: Multi-region replication with Global Tables
- Cost-Effective: Pay-per-use pricing, no upfront costs
When to Use DynamoDB:
- High-scale applications
- Predictable performance requirements
- Simple data access patterns
- Need for automatic scaling
- Cloud-native applications
When Not to Use DynamoDB:
- Complex queries across multiple entities
- Need for complex joins
- Very large items (>400 KB)
- Steady, cost-sensitive workloads where on-demand pricing would be expensive (if you stay on DynamoDB, provisioned capacity is the cheaper fit)
DynamoDB is an excellent choice for applications that need high performance, automatic scaling, and minimal operational overhead. With proper data modeling and index design, DynamoDB can handle massive workloads while maintaining low latency and high availability.