Introduction
Smart glasses represent the next frontier in wearable technology, combining augmented reality, computer vision, and voice control to create immersive experiences. One of the most compelling use cases is automatic memory creation—users can simply say “Create a memory video of my wife and me in Paris” and the system intelligently finds, trims, and merges relevant video clips.
This post designs a scalable hybrid system architecture for smart glasses that integrates with a companion phone app and cloud services. The system supports:
- Smart Glasses: Primary device for voice-controlled capture and AR display
- Phone App: Companion app for management, viewing, and offline sync
- Cloud Services: Backend processing, storage, and AI services
The architecture handles voice-controlled media capture and intelligent memory video generation, with a focus on managing read-heavy search queries and write-heavy video ingestion/processing workloads, while supporting seamless offline/online operation.
Table of Contents
- Requirements
- Capacity Estimation
- Workload Analysis
- Core Entities
- API
- Data Flow
- Database Design
- High-Level Design
- Deep Dive
Requirements
Functional Requirements
- Voice-Controlled Media Capture (Smart Glasses)
- Users can take pictures/videos using voice commands
- “Take a picture”
- “Record a video”
- “Stop recording”
- Works offline with local storage
- Phone App Integration
- Companion app for iOS/Android
- View and manage captured media
- Create memory videos via app interface
- Offline viewing of cached content
- Sync with cloud when online
- Push notifications for completed memory videos
- Natural Language Memory Creation
- Users can request memory videos using natural language
- Example: “Create a memory video of my wife and me in Paris”
- Available on both smart glasses (voice) and phone app (text/voice)
- System finds related video clips from user’s albums
- Automatically trims and merges clips into a 2-minute video
- Returns results quickly (2-5 seconds for cached queries, under 30 seconds for new ones)
- Intelligent Video Search
- Search by location, people, objects, time
- Semantic search using natural language
- Face recognition and person identification
- Object and scene detection
- Works across smart glasses and phone app
- Hybrid Cloud/Offline Operation
- Smart glasses can operate offline
- Phone app syncs with cloud when online
- Automatic background sync
- Conflict resolution for offline edits
- Cloud processing for AI features
- Video Processing
- Automatic video trimming
- Clip merging and transitions
- Video enhancement and optimization
- Thumbnail generation
- Cloud-based processing with phone app preview
Non-Functional Requirements
- High Read Concurrency: Many users searching simultaneously
- High Write Throughput: Videos being uploaded and processed continuously
- Scalability: Handle millions of users and billions of video clips
- Low Latency: Memory video creation in 2-5 seconds (cached), < 30s (new)
- Offline Support: Smart glasses and phone app work offline
- Sync Reliability: Reliable sync between devices and cloud
- Durability: Videos and metadata never lost
- Availability: 99.9% uptime for cloud services
- Cost Efficiency: Optimize storage and processing costs
- Battery Efficiency: Optimize for smart glasses battery life
Capacity Estimation
Traffic Estimates
Users:
- 10 million active users
- 1 million concurrent users during peak hours
Media:
- Average user uploads 10 videos/day
- Average video size: 50MB (1080p, 30 seconds)
- Daily uploads: 100 million videos = 5PB/day
- Storage: ~1.8EB/year (with 3x replication ≈ 5.5EB)
Queries:
- 50 million memory video requests/day
- Peak: ~100K API requests/second across all endpoints (memory requests alone average ~580/second)
- Average query processes 10-20 video clips
Processing:
- Video processing: 2-5 seconds per memory video
- Peak processing throughput: ~10K videos/second
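As a sanity check on the figures above, a quick back-of-envelope script (all inputs are the assumed averages from this section):

USERS = 10_000_000
VIDEOS_PER_USER_PER_DAY = 10
AVG_VIDEO_MB = 50

daily_videos = USERS * VIDEOS_PER_USER_PER_DAY       # 100M videos/day
daily_upload_pb = daily_videos * AVG_VIDEO_MB / 1e9  # MB -> PB: 5.0 PB/day
annual_eb = daily_upload_pb * 365 / 1000             # ~1.8 EB/year
replicated_eb = annual_eb * 3                        # ~5.5 EB/year with 3x replication

print(f"{daily_upload_pb:.1f} PB/day, {annual_eb:.2f} EB/yr, "
      f"{replicated_eb:.2f} EB/yr replicated")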
Workload Analysis
Read-Heavy Operations
| Operation | Type | Frequency | Notes |
|---|---|---|---|
| Video metadata search | Read-heavy | 50M/day | Users querying by text tags, NLP embeddings |
| Recent memories lookup | Read-heavy | 100M/day | Frequently accessed, can cache |
| User album browsing | Read-heavy | 200M/day | Paginated queries |
| Face/person search | Read-heavy | 30M/day | Vector similarity search |
Characteristics:
- High read concurrency
- Need fast search (sub-second)
- Can benefit from caching
- Requires semantic/vector search
Write-Heavy Operations
| Operation | Type | Frequency | Notes |
|---|---|---|---|
| Video upload | Write-heavy | 100M/day | Large files to blob storage |
| Metadata ingestion | Write-heavy | 100M/day | Metadata inserted on upload |
| Video trimming jobs | Write-heavy | 50M/day | Async video processing |
| Metadata updates (tags, embeddings) | Write | 200M/day | AI processing updates |
Characteristics:
- High write throughput
- Large file storage
- Async processing needed
- Batch processing for efficiency
Storage Estimates
Video Storage:
- Daily uploads: 100 million videos = 5PB/day
- Annual storage: ~1.8EB/year
- With 3x replication: ~5.5EB/year
Metadata Storage:
- Per video: ~10KB metadata
- 100M videos/day × 10KB = 1TB/day metadata
- Annual: ~365TB metadata
Bandwidth Estimates
Upload Bandwidth:
- 100M videos/day × 50MB = 5PB/day
- Average: ~208TB/hour (~58GB/s sustained); peak several times higher
Download Bandwidth:
- Memory video downloads: 50M/day × 20MB = 1PB/day
- Video streaming: Variable based on concurrent viewers
Core Entities
User
- Attributes: user_id, username, email, created_at, subscription_tier
- Relationships: Owns videos, has memory videos, has voice commands
Video
- Attributes: video_id, user_id, file_url, duration, size, created_at, metadata
- Relationships: Belongs to user, processed into memory videos, has tags/embeddings
Memory Video
- Attributes: memory_id, user_id, query_text, video_clips, created_at, status
- Relationships: Belongs to user, contains video clips
Video Clip
- Attributes: clip_id, video_id, start_time, end_time, duration, tags
- Relationships: Belongs to video, part of memory videos
Voice Command
- Attributes: command_id, user_id, command_text, intent, executed_at, result
- Relationships: Belongs to user, triggers actions
API
Upload Video (Smart Glasses / Phone App)
POST /api/v1/videos/upload
Authorization: Bearer {token}
Content-Type: multipart/form-data
{
"video": <binary>,
"device_type": "smart_glass|phone_app",
"device_id": "device_uuid",
"metadata": {
"duration": 30,
"location": {...},
"captured_at": "2025-11-08T10:00:00Z"
},
"sync_token": "optional_sync_token_for_offline_uploads"
}
Response: 202 Accepted
{
"video_id": "uuid",
"status": "uploading",
"upload_url": "https://s3.example.com/upload/...",
"sync_token": "sync_token_for_tracking"
}
Sync Status (Phone App)
GET /api/v1/sync/status
Authorization: Bearer {token}
Response: 200 OK
{
"pending_uploads": 5,
"pending_downloads": 2,
"last_sync": "2025-11-08T10:00:00Z",
"sync_in_progress": false
}
Trigger Sync (Phone App)
POST /api/v1/sync/trigger
Authorization: Bearer {token}
Response: 200 OK
{
"status": "sync_started",
"estimated_completion": "2025-11-08T10:05:00Z"
}
Create Memory Video
POST /api/v1/memories/create
Authorization: Bearer {token}
Content-Type: application/json
{
"query": "Create a memory video of my wife and me in Paris",
"max_duration": 120
}
Response: 202 Accepted
{
"memory_id": "uuid",
"status": "processing",
"estimated_completion": "2025-11-08T10:05:00Z"
}
Get Memory Video
GET /api/v1/memories/{memory_id}
Response: 200 OK
{
"memory_id": "uuid",
"status": "completed",
"video_url": "https://cdn.example.com/memories/uuid.mp4",
"clips_used": [...],
"created_at": "2025-11-08T10:00:00Z"
}
Search Videos
POST /api/v1/videos/search
Authorization: Bearer {token}
Content-Type: application/json
{
"query": "videos with my wife in Paris",
"filters": {
"date_range": {...},
"people": [...]
}
}
Response: 200 OK
{
"videos": [
{
"video_id": "uuid",
"thumbnail_url": "...",
"duration": 30,
"matched_clips": [...]
}
],
"total": 25
}
Data Flow
Video Upload Flow (Hybrid)
Smart Glasses (Online):
- Smart Glass captures video → Local storage (temporary)
- Device → Upload Service (chunked upload) via WiFi/cellular, or relayed through the phone over Bluetooth
- Upload Service → Blob Storage (S3) - direct upload with signed URL
- Upload Service → Message Queue (Kafka) - publish video-upload event
- Message Queue → Metadata Extraction Service
- Metadata Extraction Service processes:
- Extract faces → Identify people
- Detect objects/scenes
- Generate embeddings
- Extract location/time
- Metadata Extraction Service → Metadata Database (store metadata)
- Metadata Extraction Service → Vector Database (store embeddings)
- Response returned to smart glasses
- Smart glasses → Phone App (via Bluetooth) - notification of upload
Smart Glasses (Offline):
- Smart Glass captures video → Local storage (persistent)
- Video queued for upload with sync token
- When online → Resume upload flow above
- Phone App syncs when connected
Phone App Upload:
- User selects video in phone app
- Phone App → Upload Service (chunked upload)
- Upload Service → Blob Storage (S3)
- Rest of flow same as smart glasses
- Phone App receives notification when processing complete
Sync Flow (Phone App ↔ Cloud)
- Phone App checks sync status
- Upload pending videos from phone
- Download new videos/metadata from cloud
- Resolve conflicts (last-write-wins or merge)
- Update local cache
- Notify user of sync completion
Memory Video Creation Flow
- User speaks command → Device
- Device → Voice Processing Service (NLP)
- Voice Processing Service → Memory Video Service (create request)
- Memory Video Service → Vector Search Service (search videos by query)
- Vector Search Service → Vector Database (semantic search)
- Vector Search Service → Memory Video Service (return matching clips)
- Memory Video Service → Video Processing Service (trim and merge clips)
- Video Processing Service → Blob Storage (store memory video)
- Video Processing Service → Memory Video Service (update status)
- Memory Video Service → Device (return memory video URL)
Database Design
Schema Design
Users Table:
CREATE TABLE users (
user_id VARCHAR(36) PRIMARY KEY,
username VARCHAR(255) NOT NULL,
email VARCHAR(255),
subscription_tier VARCHAR(50),
created_at TIMESTAMP,
INDEX idx_email (email)
);
Videos Table:
CREATE TABLE videos (
video_id VARCHAR(36) PRIMARY KEY,
user_id VARCHAR(36) NOT NULL,
file_url VARCHAR(512) NOT NULL,
duration INT,
size BIGINT,
metadata JSON,
created_at TIMESTAMP,
INDEX idx_user_id (user_id),
INDEX idx_created_at (created_at),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
Memory Videos Table:
CREATE TABLE memory_videos (
memory_id VARCHAR(36) PRIMARY KEY,
user_id VARCHAR(36) NOT NULL,
query_text TEXT NOT NULL,
video_clips JSON,
status ENUM('processing', 'completed', 'failed') DEFAULT 'processing',
video_url VARCHAR(512),
created_at TIMESTAMP,
completed_at TIMESTAMP,
INDEX idx_user_id (user_id),
INDEX idx_status (status),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
Video Clips Table:
CREATE TABLE video_clips (
clip_id VARCHAR(36) PRIMARY KEY,
video_id VARCHAR(36) NOT NULL,
start_time INT NOT NULL,
end_time INT NOT NULL,
duration INT,
tags JSON,
embeddings JSON,
INDEX idx_video_id (video_id),
FOREIGN KEY (video_id) REFERENCES videos(video_id)
);
Database Sharding Strategy
Shard by User ID:
- User data, videos, and memory videos on same shard
- Enables efficient user queries
- Use consistent hashing for distribution
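A minimal sketch of consistent-hash shard selection by user_id. The shard names and virtual-node count are illustrative; a production system would also handle rebalancing when nodes join or leave:

import bisect
import hashlib

class ConsistentHashRing:
    """Map user_ids to shards so all of a user's data stays co-located."""

    def __init__(self, shards, vnodes=100):
        self._ring = []  # sorted list of (hash, shard) points
        for shard in shards:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                bisect.insort(self._ring, (self._hash(f"{shard}:{i}"), shard))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, user_id: str) -> str:
        h = self._hash(user_id)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)  # wrap around
        return self._ring[idx][1]

ring = ConsistentHashRing([f"metadata-shard-{n}" for n in range(8)])
print(ring.shard_for("user_789"))  # every query for this user hits one shard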
Vector Database:
- Use specialized vector DB (Pinecone, Weaviate, Milvus)
- Partition by user_id for isolation
- Optimize for similarity search
High-Level Design
High-Level Architecture (Hybrid)
┌─────────────────────────────────────────────────────────────┐
│ Client Layer │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Smart Glasses │◄──BT───►│ Phone App │ │
│ │ - Voice Control │ │ - Management │ │
│ │ - AR Display │ │ - Viewing │ │
│ │ - Local Storage │ │ - Offline Cache │ │
│ └────────┬─────────┘ └────────┬─────────┘ │
└───────────┼─────────────────────────────┼───────────────────┘
│ │
│ HTTPS / WebSocket │ HTTPS / WebSocket
│ (WiFi/Cellular) │ (WiFi/Cellular)
│ │
┌───────────▼─────────────────────────────▼───────────────────┐
│ API Gateway / Load Balancer │
│ (Authentication, Rate Limiting, Routing) │
└───────────┬─────────────────────────────────────────────────┘
│
┌───────┴───────┐
│ │
┌───▼──────┐ ┌─────▼──────┐
│ Read Path│ │ Write Path │
│ (Search) │ │ (Ingestion)│
└──────────┘ └────────────┘
│ │
│ │
┌───▼──────┐ ┌─────▼──────┐
│ Cache │ │ Queue │
│ (Redis) │ │ (Kafka) │
└──────────┘ └────────────┘
│ │
┌───▼──────┐ ┌─────▼──────┐
│ Read DB │ │ Write DB │
│(Elastic- │ │ (Cassandra)│
│ search/ │ │ │
│Vector DB)│ │ │
└──────────┘ └────────────┘
│ │
└───────┬───────┘
│
┌───────▼───────┐
│ Blob Storage │
│ (S3/Azure Blob)│
└───────────────┘
│
┌───────▼───────┐
│ Sync Service │
│ (Phone App) │
└───────────────┘
Hybrid Architecture Components
- Smart Glasses
- Primary capture device
- Voice control interface
- AR display
- Local storage for offline operation
- Bluetooth connection to phone app
- Phone App
- Companion management app
- Media viewing and management
- Offline cache
- Cloud sync coordinator
- Push notifications
- Cloud Services
- Backend processing
- AI/ML services
- Storage and metadata
- Sync coordination
Architecture Principles
- CQRS Pattern: Separate read and write paths
- Event-Driven: Async processing for video operations
- Microservices: Independent scaling of components
- Caching: Multiple cache layers for performance
- Horizontal Scaling: All components scale independently
Deep Dive
Component Design
1. Smart Glasses
Responsibilities:
- Voice command capture and processing
- Media capture (photos/videos)
- AR display of memories
- Local storage for offline operation
- Bluetooth communication with phone app
Key Features:
- Voice recognition (on-device for basic commands, cloud for complex)
- Real-time video preview
- Background upload when online
- Local storage (up to 10GB for offline videos)
- Low-power operation
- Bluetooth Low Energy (BLE) for phone app connection
Technology:
- Embedded OS (custom or Android-based)
- On-device ML models (lightweight)
- Local SQLite for metadata cache
- BLE stack for phone connectivity
Offline Operation:
- Store videos locally when offline
- Queue uploads with sync tokens
- Resume uploads when online
- Basic voice commands work offline
1.1 Phone App (Companion App)
Responsibilities:
- Media viewing and management
- Memory video creation (text/voice input)
- Cloud sync coordination
- Offline cache management
- Push notification handling
Key Features:
- Full media library browsing
- Create memory videos via app
- Offline viewing of cached content
- Background sync with cloud
- Conflict resolution for offline edits
- Push notifications for completed processing
Technology:
- Native apps (iOS Swift, Android Kotlin)
- Local SQLite for offline cache
- Background sync service
- Push notification service (FCM/APNS)
Offline Support:
- Cache recent videos (up to 5GB)
- Cache metadata for offline search
- Queue operations for sync
- View cached content offline
Sync Strategy:
- Incremental sync (only changed data)
- Conflict resolution (last-write-wins)
- Background sync every 15 minutes
- Manual sync trigger
- Sync status indicators
2. API Gateway
Responsibilities:
- Request routing
- Authentication and authorization
- Rate limiting
- Request/response transformation
- Load balancing
Features:
- JWT token validation
- User quota management
- Request throttling
- API versioning
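Request throttling is typically a token-bucket check at the gateway; a minimal in-process sketch (the per-user rate and burst size are illustrative, and a real multi-instance gateway would back this with Redis so limits hold globally):

import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # user_id -> bucket; e.g. 10 req/s sustained, bursts of 20

def check_rate_limit(user_id: str) -> bool:
    bucket = buckets.setdefault(user_id, TokenBucket(rate=10, capacity=20))
    return bucket.allow()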
Technology:
- AWS API Gateway
- Azure API Management
- Kong / Envoy
3. Sync Service (Phone App ↔ Cloud)
Responsibilities:
- Coordinate sync between phone app and cloud
- Handle offline uploads
- Resolve conflicts
- Manage sync tokens
- Track sync status
Sync Flow:
Phone App (Offline) → Queue Operations →
When Online → Sync Service →
Upload Pending Videos →
Download New Content →
Resolve Conflicts →
Update Local Cache
Conflict Resolution:
- Last-write-wins for metadata
- Merge for tags/annotations
- User notification for conflicts
- Manual resolution option
Sync Tokens:
- Track sync state per device
- Incremental sync (only changes)
- Resume interrupted syncs
- Handle concurrent syncs
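A sketch of token-based incremental sync on the phone-app side. The /sync/changes endpoint, its response fields, and the local-cache helper are hypothetical; the point is that the cursor is persisted after each page so an interrupted sync resumes where it left off:

import json
import pathlib
import requests

STATE_FILE = pathlib.Path("sync_state.json")  # per-device sync cursor

def apply_to_local_cache(change: dict):
    ...  # placeholder: upsert/delete the changed row in the local SQLite cache

def incremental_sync(api_base: str, auth_token: str):
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    sync_token = state.get("sync_token")  # None on first run => full sync

    while True:
        # Hypothetical endpoint: returns changes after the cursor, paginated.
        resp = requests.get(
            f"{api_base}/api/v1/sync/changes",
            params={"sync_token": sync_token},
            headers={"Authorization": f"Bearer {auth_token}"},
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()

        for change in page["changes"]:
            apply_to_local_cache(change)

        sync_token = page["next_sync_token"]  # persist so a crash can resume
        STATE_FILE.write_text(json.dumps({"sync_token": sync_token}))
        if not page.get("has_more"):
            break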
Technology:
- REST API for sync operations
- WebSocket for real-time updates
- Sync queue in phone app
- Background sync service
4. Write Path (Write-Heavy)
4.1 Video Upload Service
Flow:
Smart Glass → Upload Service → Blob Storage → Metadata Extraction → Write DB
Process:
- Receive video upload (chunked upload for large files)
- Store video in blob storage (S3/Azure Blob)
- Trigger metadata extraction
- Insert metadata into write-optimized DB
- Queue video for processing
Optimizations:
- Chunked uploads (resumable)
- Compression before upload
- Direct upload to blob storage (signed URLs)
- Async metadata extraction
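The "direct upload with signed URLs" optimization might look like the following boto3 sketch; the bucket name and key layout are illustrative (the blob-storage section later adds date-based prefixes):

import boto3

s3 = boto3.client("s3")

def create_upload_url(user_id: str, video_id: str) -> str:
    """Return a short-lived URL the device can PUT the video to directly,
    so large files bypass the application servers entirely."""
    key = f"raw/{user_id}/{video_id}.mp4"
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "smart-glass-videos", "Key": key,
                "ContentType": "video/mp4"},
        ExpiresIn=900,  # 15 minutes is plenty for one chunked upload
    )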
4.2 Metadata Extraction Service
Extracted Metadata:
- Temporal: Timestamp, duration
- Spatial: GPS coordinates, location name
- People: Face detection, person identification
- Objects: Scene detection, object recognition
- Embeddings: Vector embeddings for semantic search
- Audio: Speech-to-text, audio features
- Video: Resolution, fps, codec
AI/ML Models:
- Face recognition (AWS Rekognition, Azure Face API)
- Object detection (YOLO, TensorFlow)
- Scene classification (CNN models)
- NLP embeddings (BERT, OpenAI embeddings)
- Speech-to-text (Whisper, Google Speech)
Technology:
- Microservice architecture
- GPU clusters for ML inference
- Batch processing for cost efficiency
- Real-time processing for recent videos
4.3 Write Database
Requirements:
- High write throughput (100M writes/day)
- Scalable and distributed
- Flexible schema for metadata
- Fast ingestion
Database Choice: Cassandra / DynamoDB
Why:
- Excellent write performance
- Horizontal scaling
- NoSQL flexibility for metadata
- High availability
Schema Design (Cassandra):
CREATE TABLE video_metadata (
    user_id UUID,
    video_id UUID,
    upload_timestamp TIMESTAMP,
    blob_url TEXT,
    duration_seconds INT,
    location_name TEXT,
    gps_lat DOUBLE,
    gps_lon DOUBLE,
    detected_faces LIST<UUID>,   -- Person IDs
    detected_objects LIST<TEXT>,
    scene_tags LIST<TEXT>,
    embedding_vector BLOB,       -- Vector embedding
    processing_status TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY ((user_id), upload_timestamp, video_id)
) WITH CLUSTERING ORDER BY (upload_timestamp DESC, video_id ASC);
Making user_id the partition key keeps each user's videos together, so "recent videos for user X" is a single-partition query. Location and tag lookups are served by Elasticsearch on the read path, avoiding Cassandra secondary indexes (an anti-pattern at this scale).
Partitioning:
- Partition by user_id for user queries
- Replication factor: 3
- Consistency level: QUORUM for writes
4.4 Video Processing Queue
Purpose:
- Async video processing (trimming, merging)
- Decouple upload from processing
- Handle burst traffic
- Retry failed processing
Queue Choice: Kafka
Why:
- High throughput (millions of messages/second)
- Message replayability
- Multiple consumer groups
- Long retention for reprocessing
Topics:
- video-uploads: New video uploads
- video-processing: Video trimming/merging jobs
- metadata-updates: Metadata enrichment
- memory-video-creation: Memory video generation requests
Message Format:
{
"video_id": "uuid",
"user_id": "uuid",
"operation": "trim|merge|create_memory",
"parameters": {
"start_time": 10,
"end_time": 30,
"clip_ids": ["uuid1", "uuid2"]
},
"priority": "high|normal|low",
"timestamp": "2024-01-01T00:00:00Z"
}
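Publishing one of these messages with kafka-python might look like the sketch below; the broker addresses are illustrative, and the topic names follow the list above:

import json
from datetime import datetime, timezone
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092", "kafka-2:9092"],
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for in-sync replicas: durability over latency
    retries=5,   # retry transient broker errors
)

event = {
    "video_id": "uuid",
    "user_id": "user_789",
    "operation": "create_memory",
    "parameters": {"clip_ids": ["uuid1", "uuid2"]},
    "priority": "normal",
    "timestamp": datetime.now(timezone.utc).isoformat(),
}
# Key by user_id so one user's events stay ordered within a partition.
producer.send("memory-video-creation", key=event["user_id"], value=event)
producer.flush()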
4.5 Video Processing Workers
Responsibilities:
- Video trimming
- Clip merging
- Video encoding/transcoding
- Thumbnail generation
- Quality optimization
Technology:
- FFmpeg for video processing
- GPU acceleration (NVENC)
- Containerized workers (Docker/Kubernetes)
- Auto-scaling based on queue depth
Processing Pipeline:
Video Clip → Decode → Trim/Merge → Encode → Upload → Update Metadata
Optimization:
- Parallel processing
- GPU acceleration
- Adaptive bitrate encoding
- Caching intermediate results
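A worker's trim-and-merge step can shell out to FFmpeg. A minimal sketch using stream copy for trims and the concat demuxer for the merge; re-encoding, transitions, and NVENC flags are omitted, and the concat demuxer assumes all clips share codec parameters:

import pathlib
import subprocess
import tempfile

def trim(src: str, start: float, end: float, dst: str):
    # -ss before -i seeks fast on keyframes; "-c copy" avoids re-encoding.
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-i", src,
         "-t", str(end - start), "-c", "copy", dst],
        check=True,
    )

def merge(clips: list[str], dst: str):
    # The concat demuxer joins files losslessly when codecs/params match.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.writelines(f"file '{pathlib.Path(c).resolve()}'\n" for c in clips)
        list_file = f.name
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_file, "-c", "copy", dst],
        check=True,
    )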
5. Read Path (Read-Heavy)
5.1 Query Processing Service
Flow:
User Query → NLP Processing → Cache Check → Search DB → Fetch Videos → Process → Return
Natural Language Processing:
- Intent Recognition: Extract intent (create memory video)
- Entity Extraction: Extract entities (wife, Paris, date range)
- Query Expansion: Expand to related terms
- Vector Embedding: Convert to embedding vector
Example Query Processing:
Input: "Create a memory video of my wife and me in Paris"
Processing:
- Intent: CREATE_MEMORY_VIDEO
- Entities:
- People: ["wife", "me"]
- Location: "Paris"
- Relationship: "wife" → person_id mapping
- Time: (optional, default: all time)
- Embedding: [0.123, 0.456, ...] (768-dim vector)
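Represented in code, the parsed query might look like this sketch. The extraction itself would be an NLU model; the helpers and person mapping here are hypothetical stand-ins, and only the output structure matters:

from dataclasses import dataclass, field

def embed(text: str) -> list[float]:
    return [0.0] * 768  # placeholder for a real embedding model

@dataclass
class ParsedQuery:
    intent: str                                       # e.g. CREATE_MEMORY_VIDEO
    people: list[str] = field(default_factory=list)   # resolved person_ids
    location: str | None = None
    date_range: tuple[str, str] | None = None         # None => all time
    embedding: list[float] = field(default_factory=list)  # 768-dim vector

def parse(query: str, person_map: dict[str, str]) -> ParsedQuery:
    # Stand-in for the real pipeline: intent classifier + NER + embedder.
    return ParsedQuery(
        intent="CREATE_MEMORY_VIDEO",
        people=[person_map["wife"], person_map["me"]],  # "wife" -> person_id
        location="Paris",
        embedding=embed(query),
    )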
5.2 Cache Layer (Redis)
Purpose:
- Cache hot query results
- Cache frequently accessed metadata
- Cache user-specific data
- Reduce database load
Cache Strategy:
- Query Result Cache: key user:{user_id}:query:{query_hash}, value {video_ids: [...], metadata: {...}}, TTL 10 minutes
- Recent Memories Cache: key user:{user_id}:recent:memories, value list of recent memory video IDs, TTL 1 hour
- Metadata Cache: key video:{video_id}:metadata, value video metadata JSON, TTL 1 hour
- User Profile Cache: key user:{user_id}:profile, value user profile and person mappings, TTL 24 hours
Cache Invalidation:
- Invalidate on video upload
- Invalidate on metadata update
- TTL-based expiration
- Manual invalidation API
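The query-result cache follows the classic cache-aside pattern; a redis-py sketch using the key scheme and TTL from above (run_search stands in for the Elasticsearch call in the next subsection):

import hashlib
import json
import redis

r = redis.Redis(host="redis-cluster", port=6379, decode_responses=True)

def run_search(user_id: str, query: str) -> dict:
    ...  # placeholder: the Elasticsearch query from the read-path section

def search_with_cache(user_id: str, query: str) -> dict:
    query_hash = hashlib.sha256(query.encode()).hexdigest()[:16]
    key = f"user:{user_id}:query:{query_hash}"

    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: skip Elasticsearch

    result = run_search(user_id, query)    # cache miss: query the read DB
    r.setex(key, 600, json.dumps(result))  # TTL 10 minutes, as above
    return result

def invalidate_user_queries(user_id: str):
    # On upload or metadata change, drop this user's cached query results.
    for k in r.scan_iter(f"user:{user_id}:query:*"):
        r.delete(k)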
5.3 Read Database - Search Engine
Database Choice: Elasticsearch
Why:
- Full-text search capabilities
- Vector search support (kNN)
- Fast search performance
- Horizontal scaling
- Rich query DSL
Index Design:
{
"mappings": {
"properties": {
"video_id": {"type": "keyword"},
"user_id": {"type": "keyword"},
"upload_timestamp": {"type": "date"},
"location_name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
"gps": {"type": "geo_point"},
"detected_faces": {"type": "keyword"},
"detected_objects": {"type": "text"},
"scene_tags": {"type": "text"},
"embedding_vector": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "cosine"
},
"duration_seconds": {"type": "integer"},
"processing_status": {"type": "keyword"}
}
}
}
Search Queries:
- Text Search (location, tags):
{ "query": { "bool": { "must": [ {"match": {"location_name": "Paris"}}, {"terms": {"detected_faces": ["person_123", "person_456"]}} ], "filter": [ {"term": {"user_id": "user_789"}}, {"range": {"upload_timestamp": {"gte": "2024-01-01"}}} ] } } } - Vector Search (semantic similarity):
{ "query": { "script_score": { "query": {"match_all": {}}, "script": { "source": "cosineSimilarity(params.query_vector, 'embedding_vector') + 1.0", "params": {"query_vector": [0.123, 0.456, ...]} } } } } - Hybrid Search (text + vector):
{ "query": { "bool": { "should": [ {"match": {"location_name": "Paris"}}, {"script_score": { "script": { "source": "cosineSimilarity(params.query_vector, 'embedding_vector') + 1.0", "params": {"query_vector": [...]} } }} ], "minimum_should_match": 1 } } }
5.4 Vector Database (Alternative/Complementary)
Database Choice: Milvus / Pinecone / Weaviate
Why:
- Optimized for vector search
- Better performance for large-scale vector search
- Advanced vector indexing (IVF, HNSW)
- Can complement Elasticsearch
Use Cases:
- Primary vector search for semantic queries
- Person similarity search
- Scene similarity search
- Cross-modal search (text-to-video)
Integration:
- Use for pure vector search queries
- Elasticsearch for hybrid (text + vector) queries
- Sync embeddings between systems
6. Memory Video Creation Service
Workflow:
User Query → Query Processing → Search Videos → Select Clips →
Trim & Merge → Generate Video → Store → Return URL
Step-by-Step:
- Query Processing
- Parse natural language query
- Extract entities (people, location, time)
- Generate search criteria
- Video Search
- Search Elasticsearch/Vector DB
- Filter by user, location, people, time
- Rank by relevance
- Select top N clips (10-20 clips)
- Clip Selection Algorithm
def select_clips(videos, target_duration=120):
    # Sort by relevance score
    sorted_videos = sort_by_relevance(videos)
    # Select clips covering the time range
    selected = []
    total_duration = 0
    for video in sorted_videos:
        # Extract the best segment (e.g., 10-15 seconds)
        segment = extract_best_segment(video)
        if total_duration + segment.duration <= target_duration:
            selected.append(segment)
            total_duration += segment.duration
        else:
            break
    return selected
- Video Processing
- Trim selected clips
- Add transitions
- Merge into single video
- Add music/effects (optional)
- Generate thumbnail
- Optimization
- Cache common queries
- Pre-generate popular memories
- Use GPU acceleration
- Parallel processing
Performance Optimization:
- Caching: Cache common memory videos
- Pre-computation: Pre-generate popular memories
- Lazy Generation: Generate on-demand, cache result
- Progressive Loading: Return partial results quickly
7. Blob Storage
Storage Choice: S3 / Azure Blob Storage
Organization:
s3://smart-glass-videos/
├── raw/
│ └── {user_id}/
│ └── {year}/{month}/{day}/
│ └── {video_id}.mp4
├── processed/
│ └── {user_id}/
│ └── {video_id}/
│ ├── 1080p.mp4
│ ├── 720p.mp4
│ └── thumbnail.jpg
└── memories/
└── {user_id}/
└── {memory_id}.mp4
Features:
- Lifecycle policies (move to cheaper storage)
- CDN integration (CloudFront, Azure CDN)
- Versioning for recovery
- Encryption at rest
- Cross-region replication
Optimization:
- Use different storage tiers
- Hot: Recent videos (S3 Standard)
- Warm: Older videos (S3 Standard-IA)
- Cold: Archived videos (S3 Glacier)
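The tiering above maps directly to an S3 lifecycle configuration; a boto3 sketch using the 30/90-day cutoffs from the cost section (the bucket name is illustrative):

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="smart-glass-videos",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-videos",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm
                    {"Days": 90, "StorageClass": "GLACIER"},      # cold
                ],
            }
        ]
    },
)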
Detailed Design
Video Upload Flow (Hybrid)
Smart Glasses (Online):
1. Smart Glass captures video → Local temp storage
↓
2. Upload Service receives chunked upload (via WiFi/Cellular)
↓
3. Store in S3 (direct upload with signed URL)
↓
4. Publish to Kafka topic: video-uploads
↓
5. Metadata Extraction Service processes:
- Extract faces → Identify people
- Detect objects/scenes
- Generate embeddings
- Extract GPS, timestamp
↓
6. Insert metadata into Cassandra (write DB)
↓
7. Index metadata in Elasticsearch (read DB)
↓
8. Cache metadata in Redis
↓
9. Trigger video processing (trimming, encoding)
↓
10. Notify phone app via push notification
Smart Glasses (Offline):
1. Smart Glass captures video → Local persistent storage
↓
2. Store with sync token in local queue
↓
3. When online → Resume upload flow above
↓
4. Phone app syncs when connected
Phone App Upload:
1. User selects video in phone app
↓
2. Phone App → Upload Service (chunked upload)
↓
3. Rest of flow same as smart glasses
↓
4. Phone App receives push notification when complete
Sync Flow (Phone App ↔ Cloud)
1. Phone App checks sync status
↓
2. Upload pending videos (from offline queue)
↓
3. Download new videos/metadata from cloud
↓
4. Resolve conflicts:
- Last-write-wins for metadata
- Merge for tags
- User notification for conflicts
↓
5. Update local cache
↓
6. Notify user of sync completion
Memory Video Creation Flow
1. User: "Create a memory video of my wife and me in Paris"
↓
2. API Gateway receives request
↓
3. Query Processing Service:
- NLP: Extract "wife", "me", "Paris"
- Map "wife" → person_id (from user profile)
- Generate query embedding
↓
4. Check Redis cache:
- Key: user:{user_id}:query:{hash}
- If hit → return cached result
↓
5. If cache miss → Search Elasticsearch:
- Filter: user_id, location="Paris", faces=[wife_id, user_id]
- Vector search: semantic similarity
- Return top 20 clips
↓
6. Fetch video metadata from Cassandra
↓
7. Select best clips (algorithm):
- Rank by relevance
- Select segments totaling ~2 minutes
↓
8. Video Processing:
- Trim clips
- Merge with transitions
- Generate video
↓
9. Store in S3 (memories bucket)
↓
10. Update metadata, cache result
↓
11. Return video URL to user
Scalability & Performance
Read Scaling
Strategies:
- Read Replicas: Multiple Elasticsearch replicas
- Caching: Multi-layer caching (Redis, CDN)
- Sharding: Partition data by user_id
- CDN: Cache popular memory videos
- Query Optimization: Index optimization, query tuning
Metrics:
- Elasticsearch: 10+ nodes, 3 replicas per shard
- Redis: Cluster mode, 6+ nodes
- CDN: Global edge locations
Write Scaling
Strategies:
- Horizontal Partitioning: Cassandra sharding by user_id
- Async Processing: Queue-based processing
- Batch Processing: Batch metadata updates
- Direct Upload: Signed URLs for direct S3 upload
- Worker Scaling: Auto-scale processing workers
Metrics:
- Cassandra: 10+ nodes, replication factor 3
- Kafka: 6+ brokers; partition counts per topic sized to peak throughput
- Processing Workers: Auto-scale 10-1000 instances
Performance Targets
| Operation | Target Latency | Peak Throughput |
|---|---|---|
| Video Upload | < 5s to acknowledge | 10K uploads/sec |
| Metadata Search | < 500ms | 100K queries/sec |
| Memory Video Creation | < 5s (cached), < 30s (new) | 10K creations/sec |
| Video Processing | < 30s | 10K videos/sec |
Reliability & Durability
Data Durability
- Video Storage
- S3: 99.999999999% (11 9’s) durability
- Cross-region replication
- Versioning enabled
- Metadata
- Cassandra: Replication factor 3
- Elasticsearch: Replica count 2
- Regular backups
- Queue
- Kafka: Replication factor 3
- Message retention: 7 days
- Idempotent producers
High Availability
- Multi-Region Deployment
- Active-active regions
- Data replication across regions
- Failover mechanisms
- Health Checks
- Service health endpoints
- Database connectivity checks
- Queue depth monitoring
- Circuit Breakers
- Prevent cascade failures
- Fallback mechanisms
- Graceful degradation
Disaster Recovery
- Backup Strategy
- Daily backups of metadata
- S3 versioning for videos
- Point-in-time recovery
- Recovery Procedures
- RTO: 1 hour
- RPO: 15 minutes
- Automated failover
Security & Privacy
Authentication & Authorization
- User Authentication
- OAuth 2.0 / JWT tokens
- Multi-factor authentication
- Device registration
- Authorization
- User can only access own videos
- Role-based access control
- API key management
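Gateway-side token validation with PyJWT might look like this sketch; the key file, audience, and claim names are illustrative:

import jwt  # PyJWT

PUBLIC_KEY = open("jwt_public.pem").read()  # gateway holds the verify key only

def authenticate(token: str) -> str:
    """Validate the bearer token and return the user_id it was issued to."""
    claims = jwt.decode(
        token,
        PUBLIC_KEY,
        algorithms=["RS256"],  # pin the algorithm; never trust the token header
        audience="smart-glass-api",
        options={"require": ["exp", "sub"]},
    )
    return claims["sub"]  # downstream services scope every query to this user

def authorize_video_access(user_id: str, video_owner_id: str) -> bool:
    # Users can only access their own videos (per the rules above).
    return user_id == video_owner_id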
Data Privacy
- Encryption
- Encryption at rest (AES-256)
- Encryption in transit (TLS 1.3)
- End-to-end encryption (optional)
- Face Recognition
- On-device processing option
- User consent for face recognition
- GDPR compliance
- Data Retention
- User-controlled retention
- Automatic deletion policies
- Right to deletion
Cost Optimization
Storage Costs
- Storage Tiers
- Hot: Recent videos (S3 Standard)
- Warm: 30-90 days (S3 Standard-IA)
- Cold: Archive (S3 Glacier)
- Compression
- Video compression (H.265)
- Metadata compression
- Efficient encoding
Compute Costs
- Processing Optimization
- GPU acceleration (cheaper per video than CPU for transcoding at scale)
- Batch processing
- Spot instances for non-critical jobs
- Caching
- Reduce database queries
- CDN for video delivery
- Cache hit ratio > 80%
Database Costs
- Right-Sizing
- Appropriate instance types
- Reserved instances for steady load
- Auto-scaling
Monitoring & Observability
Key Metrics
- Performance Metrics
- API latency (p50, p95, p99)
- Video processing time
- Search query latency
- Cache hit ratio
- System Metrics
- CPU, memory, disk usage
- Queue depth
- Database connection pool
- Error rates
- Business Metrics
- Videos uploaded per day
- Memory videos created
- Active users
- Storage usage
Logging
- Structured Logging
- JSON format
- Correlation IDs
- Centralized logging (ELK stack)
- Distributed Tracing
- Request tracing across services
- Performance profiling
- Error tracking
Alerting
- Critical Alerts
- Service downtime
- High error rates
- Storage capacity
- Queue backlog
Technology Stack Summary
| Component | Technology | Rationale |
|---|---|---|
| API Gateway | AWS API Gateway / Kong | Request routing, auth, rate limiting |
| Write DB | Cassandra / DynamoDB | High write throughput, scalability |
| Read DB | Elasticsearch | Full-text + vector search |
| Vector DB | Milvus / Pinecone | Optimized vector search |
| Cache | Redis Cluster | Fast in-memory caching |
| Queue | Kafka | High throughput, replayability |
| Blob Storage | S3 / Azure Blob | Scalable object storage |
| Video Processing | FFmpeg + GPU | Video encoding/transcoding |
| ML/AI | AWS Rekognition / Custom | Face recognition, object detection |
| NLP | BERT / OpenAI Embeddings | Natural language understanding |
| CDN | CloudFront / Azure CDN | Global content delivery |
Future Enhancements
- Real-Time Collaboration
- Shared memory videos
- Collaborative editing
- Real-time notifications
- Advanced AI Features
- Automatic video editing
- Music selection
- Story generation
- Emotion detection
- AR Integration
- Overlay memories in AR view
- Location-based memory triggers
- Augmented reality previews
- Social Features
- Share memory videos
- Comments and reactions
- Memory collections
What Interviewers Look For
CQRS Architecture Skills
- Read/Write Separation
- Separate read and write paths
- Appropriate database choices
- Red Flags: Single database, no separation, poor performance
- Write Path Design
- High write throughput
- Cassandra for writes
- Kafka for async processing
- Red Flags: Wrong database, synchronous processing, bottlenecks
- Read Path Design
- Fast search queries
- Elasticsearch for reads
- Redis caching
- Red Flags: Wrong database, no caching, slow queries
Video Processing Skills
- Async Processing
- Message queue for video operations
- Worker pool design
- Red Flags: Synchronous processing, blocking, poor UX
- Video Storage
- S3 for object storage
- Lifecycle policies
- Red Flags: Wrong storage, no lifecycle, high costs
- GPU Acceleration
- Video processing optimization
- Cost-effective processing
- Red Flags: No optimization, slow processing, high costs
NLP/Voice Processing Skills
- Voice Command Processing
- Speech-to-text
- Natural language understanding
- Red Flags: No NLP, poor accuracy, slow processing
- Query Understanding
- Intent extraction
- Entity recognition
- Red Flags: No query understanding, poorly constructed queries, low retrieval accuracy
Problem-Solving Approach
- Workload Analysis
- Read-heavy vs. write-heavy
- Appropriate architecture
- Red Flags: No analysis, wrong architecture, poor performance
- Edge Cases
- Video processing failures
- Search failures
- Network issues
- Red Flags: Ignoring edge cases, no handling
- Trade-off Analysis
- Consistency vs. performance
- Cost vs. features
- Red Flags: No trade-offs, dogmatic choices
System Design Skills
- Component Design
- Video service
- Search service
- NLP service
- Red Flags: Monolithic, unclear boundaries
- Caching Strategy
- Multi-layer caching
- CDN for videos
- Red Flags: No caching, poor strategy, slow delivery
- Scalability Design
- Horizontal scaling
- Independent scaling
- Red Flags: Vertical scaling, bottlenecks, no scaling
Communication Skills
- CQRS Explanation
- Can explain read/write separation
- Understands benefits
- Red Flags: No understanding, vague explanations
- Architecture Justification
- Explains design decisions
- Discusses alternatives
- Red Flags: No justification, no alternatives
Meta-Specific Focus
- CQRS Expertise
- Deep understanding of CQRS
- Appropriate use cases
- Key: Show CQRS expertise
- Workload-Aware Design
- Understanding of read/write patterns
- Appropriate architecture
- Key: Demonstrate workload analysis skills
Conclusion
This smart glass system design addresses the unique challenges of handling both read-heavy search queries and write-heavy video processing workloads through:
- CQRS Architecture: Separating read and write paths
- Appropriate Database Choices: Cassandra for writes, Elasticsearch for reads
- Multi-Layer Caching: Redis for hot data, CDN for videos
- Async Processing: Kafka queue for video operations
- Horizontal Scaling: All components scale independently
Key Design Decisions:
- Write Path: Cassandra + Kafka for high throughput
- Read Path: Elasticsearch + Redis for fast search
- Storage: S3 with lifecycle policies for cost optimization
- Processing: Async workers with GPU acceleration
The system is designed to scale to millions of users while maintaining low latency for memory video creation and high reliability for video storage and processing.