Introduction
Smart glasses represent the next frontier in wearable technology, combining augmented reality, computer vision, and voice control to create immersive experiences. One of the most compelling use cases is automatic memory creation—users can simply say “Create a memory video of my wife and me in Paris” and the system intelligently finds, trims, and merges relevant video clips.
This post designs a scalable hybrid system architecture for smart glasses that integrates with a companion phone app and cloud services. The system supports:
- Smart Glasses: Primary device for voice-controlled capture and AR display
- Phone App: Companion app for management, viewing, and offline sync
- Cloud Services: Backend processing, storage, and AI services
The architecture handles voice-controlled media capture and intelligent memory video generation, with a focus on managing read-heavy search queries and write-heavy video ingestion/processing workloads, while supporting seamless offline/online operation.
Table of Contents
- Requirements
- Capacity Estimation
- Workload Analysis
- Core Entities
- API
- Data Flow
- Database Design
- High-Level Design
- Deep Dive
Requirements
Functional Requirements
- Voice-Controlled Media Capture (Smart Glasses)
- Users can take pictures/videos using voice commands
- “Take a picture”
- “Record a video”
- “Stop recording”
- Works offline with local storage
- Phone App Integration
- Companion app for iOS/Android
- View and manage captured media
- Create memory videos via app interface
- Offline viewing of cached content
- Sync with cloud when online
- Push notifications for completed memory videos
- Natural Language Memory Creation
- Users can request memory videos using natural language
- Example: “Create a memory video of my wife and me in Paris”
- Available on both smart glasses (voice) and phone app (text/voice)
- System finds related video clips from user’s albums
- Automatically trims and merges clips into a 2-minute video
- Returns results quickly (2-5 seconds for cached queries, under 30 seconds for new ones)
- Intelligent Video Search
- Search by location, people, objects, time
- Semantic search using natural language
- Face recognition and person identification
- Object and scene detection
- Works across smart glasses and phone app
- Hybrid Cloud/Offline Operation
- Smart glasses can operate offline
- Phone app syncs with cloud when online
- Automatic background sync
- Conflict resolution for offline edits
- Cloud processing for AI features
- Video Processing
- Automatic video trimming
- Clip merging and transitions
- Video enhancement and optimization
- Thumbnail generation
- Cloud-based processing with phone app preview
Non-Functional Requirements
- High Read Concurrency: Many users searching simultaneously
- High Write Throughput: Videos being uploaded and processed continuously
- Scalability: Handle millions of users and billions of video clips
- Low Latency: Memory video creation in 2-5 seconds (cached), < 30s (new)
- Offline Support: Smart glasses and phone app work offline
- Sync Reliability: Reliable sync between devices and cloud
- Durability: Videos and metadata never lost
- Availability: 99.9% uptime for cloud services
- Cost Efficiency: Optimize storage and processing costs
- Battery Efficiency: Optimize for smart glasses battery life
Capacity Estimation
Traffic Estimates
Users:
- 10 million active users
- 1 million concurrent users during peak hours
Media:
- Average user uploads 10 videos/day
- Average video size: 50MB (1080p, 30 seconds)
- Daily uploads: 100 million videos = 5PB/day
- Storage: ~1.8EB/year (with 3x replication ≈ 5.5EB)
Queries:
- 50 million memory video requests/day
- Peak: ~100K API requests/second across all endpoints (memory requests alone average ~580/second)
- Average query processes 10-20 video clips
Processing:
- Video processing: 2-5 seconds per memory video
- Peak processing throughput: ~10K videos/second
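As a sanity check on the figures above, a quick back-of-envelope script (all inputs are the assumed averages from this section):

USERS = 10_000_000
VIDEOS_PER_USER_PER_DAY = 10
AVG_VIDEO_MB = 50

daily_videos = USERS * VIDEOS_PER_USER_PER_DAY       # 100M videos/day
daily_upload_pb = daily_videos * AVG_VIDEO_MB / 1e9  # MB -> PB: 5.0 PB/day
annual_eb = daily_upload_pb * 365 / 1000             # ~1.8 EB/year
replicated_eb = annual_eb * 3                        # ~5.5 EB/year with 3x replication

print(f"{daily_upload_pb:.1f} PB/day, {annual_eb:.2f} EB/yr, "
      f"{replicated_eb:.2f} EB/yr replicated")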
Workload Analysis
Read-Heavy Operations
| Operation | Type | Frequency | Notes |
|---|---|---|---|
| Video metadata search | Read-heavy | 50M/day | Users querying by text tags, NLP embeddings |
| Recent memories lookup | Read-heavy | 100M/day | Frequently accessed, can cache |
| User album browsing | Read-heavy | 200M/day | Paginated queries |
| Face/person search | Read-heavy | 30M/day | Vector similarity search |
Characteristics:
- High read concurrency
- Need fast search (sub-second)
- Can benefit from caching
- Requires semantic/vector search
Write-Heavy Operations
| Operation | Type | Frequency | Notes |
|---|---|---|---|
| Video upload | Write-heavy | 100M/day | Large files to blob storage |
| Metadata ingestion | Write-heavy | 100M/day | Metadata inserted on upload |
| Video trimming jobs | Write-heavy | 50M/day | Async video processing |
| Metadata updates (tags, embeddings) | Write | 200M/day | AI processing updates |
Characteristics:
- High write throughput
- Large file storage
- Async processing needed
- Batch processing for efficiency
Storage Estimates
Video Storage:
- Daily uploads: 100 million videos = 5PB/day
- Annual storage: ~1.8EB/year
- With 3x replication: ~5.5EB/year
Metadata Storage:
- Per video: ~10KB metadata
- 100M videos/day × 10KB = 1TB/day metadata
- Annual: ~365TB metadata
Bandwidth Estimates
Upload Bandwidth:
- 100M videos/day × 50MB = 5PB/day
- Average: ~208TB/hour (~58GB/s sustained); peak several times higher
Download Bandwidth:
- Memory video downloads: 50M/day × 20MB = 1PB/day
- Video streaming: Variable based on concurrent viewers
Core Entities
User
- Attributes: user_id, username, email, created_at, subscription_tier
- Relationships: Owns videos, has memory videos, has voice commands
Video
- Attributes: video_id, user_id, file_url, duration, size, created_at, metadata
- Relationships: Belongs to user, processed into memory videos, has tags/embeddings
Memory Video
- Attributes: memory_id, user_id, query_text, video_clips, created_at, status
- Relationships: Belongs to user, contains video clips
Video Clip
- Attributes: clip_id, video_id, start_time, end_time, duration, tags
- Relationships: Belongs to video, part of memory videos
Voice Command
- Attributes: command_id, user_id, command_text, intent, executed_at, result
- Relationships: Belongs to user, triggers actions
API
Upload Video (Smart Glasses / Phone App)
POST /api/v1/videos/upload
Authorization: Bearer {token}
Content-Type: multipart/form-data
{
"video": <binary>,
"device_type": "smart_glass|phone_app",
"device_id": "device_uuid",
"metadata": {
"duration": 30,
"location": {...},
"captured_at": "2025-11-08T10:00:00Z"
},
"sync_token": "optional_sync_token_for_offline_uploads"
}
Response: 202 Accepted
{
"video_id": "uuid",
"status": "uploading",
"upload_url": "https://s3.example.com/upload/...",
"sync_token": "sync_token_for_tracking"
}
Sync Status (Phone App)
GET /api/v1/sync/status
Authorization: Bearer {token}
Response: 200 OK
{
"pending_uploads": 5,
"pending_downloads": 2,
"last_sync": "2025-11-08T10:00:00Z",
"sync_in_progress": false
}
Trigger Sync (Phone App)
POST /api/v1/sync/trigger
Authorization: Bearer {token}
Response: 200 OK
{
"status": "sync_started",
"estimated_completion": "2025-11-08T10:05:00Z"
}
Create Memory Video
POST /api/v1/memories/create
Authorization: Bearer {token}
Content-Type: application/json
{
"query": "Create a memory video of my wife and me in Paris",
"max_duration": 120
}
Response: 202 Accepted
{
"memory_id": "uuid",
"status": "processing",
"estimated_completion": "2025-11-08T10:05:00Z"
}
Get Memory Video
GET /api/v1/memories/{memory_id}
Response: 200 OK
{
"memory_id": "uuid",
"status": "completed",
"video_url": "https://cdn.example.com/memories/uuid.mp4",
"clips_used": [...],
"created_at": "2025-11-08T10:00:00Z"
}
Search Videos
POST /api/v1/videos/search
Authorization: Bearer {token}
Content-Type: application/json
{
"query": "videos with my wife in Paris",
"filters": {
"date_range": {...},
"people": [...]
}
}
Response: 200 OK
{
"videos": [
{
"video_id": "uuid",
"thumbnail_url": "...",
"duration": 30,
"matched_clips": [...]
}
],
"total": 25
}
Data Flow
Video Upload Flow (Hybrid)
Smart Glasses (Online):
- Smart Glass captures video → Local storage (temporary)
- Device → Upload Service (chunked upload) via WiFi/cellular, or relayed through the phone over Bluetooth
- Upload Service → Blob Storage (S3) - direct upload with signed URL
- Upload Service → Message Queue (Kafka) - publish video-upload event
- Message Queue → Metadata Extraction Service
- Metadata Extraction Service processes:
- Extract faces → Identify people
- Detect objects/scenes
- Generate embeddings
- Extract location/time
- Metadata Extraction Service → Metadata Database (store metadata)
- Metadata Extraction Service → Vector Database (store embeddings)
- Response returned to smart glasses
- Smart glasses → Phone App (via Bluetooth) - notification of upload
Smart Glasses (Offline):
- Smart Glass captures video → Local storage (persistent)
- Video queued for upload with sync token
- When online → Resume upload flow above
- Phone App syncs when connected
Phone App Upload:
- User selects video in phone app
- Phone App → Upload Service (chunked upload)
- Upload Service → Blob Storage (S3)
- Rest of flow same as smart glasses
- Phone App receives notification when processing complete
Sync Flow (Phone App ↔ Cloud)
- Phone App checks sync status
- Upload pending videos from phone
- Download new videos/metadata from cloud
- Resolve conflicts (last-write-wins or merge)
- Update local cache
- Notify user of sync completion
Memory Video Creation Flow
- User speaks command → Device
- Device → Voice Processing Service (NLP)
- Voice Processing Service → Memory Video Service (create request)
- Memory Video Service → Vector Search Service (search videos by query)
- Vector Search Service → Vector Database (semantic search)
- Vector Search Service → Memory Video Service (return matching clips)
- Memory Video Service → Video Processing Service (trim and merge clips)
- Video Processing Service → Blob Storage (store memory video)
- Video Processing Service → Memory Video Service (update status)
- Memory Video Service → Device (return memory video URL)
Database Design
Schema Design
Users Table:
CREATE TABLE users (
user_id VARCHAR(36) PRIMARY KEY,
username VARCHAR(255) NOT NULL,
email VARCHAR(255),
subscription_tier VARCHAR(50),
created_at TIMESTAMP,
INDEX idx_email (email)
);
Videos Table:
CREATE TABLE videos (
video_id VARCHAR(36) PRIMARY KEY,
user_id VARCHAR(36) NOT NULL,
file_url VARCHAR(512) NOT NULL,
duration INT,
size BIGINT,
metadata JSON,
created_at TIMESTAMP,
INDEX idx_user_id (user_id),
INDEX idx_created_at (created_at),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
Memory Videos Table:
CREATE TABLE memory_videos (
memory_id VARCHAR(36) PRIMARY KEY,
user_id VARCHAR(36) NOT NULL,
query_text TEXT NOT NULL,
video_clips JSON,
status ENUM('processing', 'completed', 'failed') DEFAULT 'processing',
video_url VARCHAR(512),
created_at TIMESTAMP,
completed_at TIMESTAMP,
INDEX idx_user_id (user_id),
INDEX idx_status (status),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
Video Clips Table:
CREATE TABLE video_clips (
clip_id VARCHAR(36) PRIMARY KEY,
video_id VARCHAR(36) NOT NULL,
start_time INT NOT NULL,
end_time INT NOT NULL,
duration INT,
tags JSON,
embeddings JSON,
INDEX idx_video_id (video_id),
FOREIGN KEY (video_id) REFERENCES videos(video_id)
);
Database Sharding Strategy
Shard by User ID:
- User data, videos, and memory videos on same shard
- Enables efficient user queries
- Use consistent hashing for distribution
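A minimal sketch of consistent-hash shard selection by user_id. The shard names and virtual-node count are illustrative; a production system would also handle rebalancing when nodes join or leave:

import bisect
import hashlib

class ConsistentHashRing:
    """Map user_ids to shards so all of a user's data stays co-located."""

    def __init__(self, shards, vnodes=100):
        self._ring = []  # sorted list of (hash, shard) points
        for shard in shards:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                bisect.insort(self._ring, (self._hash(f"{shard}:{i}"), shard))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, user_id: str) -> str:
        h = self._hash(user_id)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)  # wrap around
        return self._ring[idx][1]

ring = ConsistentHashRing([f"metadata-shard-{n}" for n in range(8)])
print(ring.shard_for("user_789"))  # every query for this user hits one shard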
Vector Database:
- Use specialized vector DB (Pinecone, Weaviate, Milvus)
- Partition by user_id for isolation
- Optimize for similarity search
High-Level Design
High-Level Architecture (Hybrid)
┌─────────────────────────────────────────────────────────────┐
│ Client Layer │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Smart Glasses │◄──BT───►│ Phone App │ │
│ │ - Voice Control │ │ - Management │ │
│ │ - AR Display │ │ - Viewing │ │
│ │ - Local Storage │ │ - Offline Cache │ │
│ └────────┬─────────┘ └────────┬─────────┘ │
└───────────┼─────────────────────────────┼───────────────────┘
│ │
│ HTTPS / WebSocket │ HTTPS / WebSocket
│ (WiFi/Cellular) │ (WiFi/Cellular)
│ │
┌───────────▼─────────────────────────────▼───────────────────┐
│ API Gateway / Load Balancer │
│ (Authentication, Rate Limiting, Routing) │
└───────────┬─────────────────────────────────────────────────┘
│
┌───────┴───────┐
│ │
┌───▼──────┐ ┌─────▼──────┐
│ Read Path│ │ Write Path │
│ (Search) │ │ (Ingestion)│
└──────────┘ └────────────┘
│ │
│ │
┌───▼──────┐ ┌─────▼──────┐
│ Cache │ │ Queue │
│ (Redis) │ │ (Kafka) │
└──────────┘ └────────────┘
│ │
┌───▼──────┐ ┌─────▼──────┐
│ Read DB │ │ Write DB │
│(Elastic- │ │ (Cassandra)│
│ search/ │ │ │
│Vector DB)│ │ │
└──────────┘ └────────────┘
│ │
└───────┬───────┘
│
┌───────▼───────┐
│ Blob Storage │
│ (S3/Azure Blob)│
└───────────────┘
│
┌───────▼───────┐
│ Sync Service │
│ (Phone App) │
└───────────────┘
Hybrid Architecture Components
- Smart Glasses
- Primary capture device
- Voice control interface
- AR display
- Local storage for offline operation
- Bluetooth connection to phone app
- Phone App
- Companion management app
- Media viewing and management
- Offline cache
- Cloud sync coordinator
- Push notifications
- Cloud Services
- Backend processing
- AI/ML services
- Storage and metadata
- Sync coordination
Architecture Principles
- CQRS Pattern: Separate read and write paths
- Event-Driven: Async processing for video operations
- Microservices: Independent scaling of components
- Caching: Multiple cache layers for performance
- Horizontal Scaling: All components scale independently
Deep Dive
Component Design
1. Smart Glasses
Responsibilities:
- Voice command capture and processing
- Media capture (photos/videos)
- AR display of memories
- Local storage for offline operation
- Bluetooth communication with phone app
Key Features:
- Voice recognition (on-device for basic commands, cloud for complex)
- Real-time video preview
- Background upload when online
- Local storage (up to 10GB for offline videos)
- Low-power operation
- Bluetooth Low Energy (BLE) for phone app connection
Technology:
- Embedded OS (custom or Android-based)
- On-device ML models (lightweight)
- Local SQLite for metadata cache
- BLE stack for phone connectivity
Offline Operation:
- Store videos locally when offline
- Queue uploads with sync tokens
- Resume uploads when online
- Basic voice commands work offline
1.1 Phone App (Companion App)
Responsibilities:
- Media viewing and management
- Memory video creation (text/voice input)
- Cloud sync coordination
- Offline cache management
- Push notification handling
Key Features:
- Full media library browsing
- Create memory videos via app
- Offline viewing of cached content
- Background sync with cloud
- Conflict resolution for offline edits
- Push notifications for completed processing
Technology:
- Native apps (iOS Swift, Android Kotlin)
- Local SQLite for offline cache
- Background sync service
- Push notification service (FCM/APNS)
Offline Support:
- Cache recent videos (up to 5GB)
- Cache metadata for offline search
- Queue operations for sync
- View cached content offline
Sync Strategy:
- Incremental sync (only changed data)
- Conflict resolution (last-write-wins)
- Background sync every 15 minutes
- Manual sync trigger
- Sync status indicators
2. API Gateway
Responsibilities:
- Request routing
- Authentication and authorization
- Rate limiting
- Request/response transformation
- Load balancing
Features:
- JWT token validation
- User quota management
- Request throttling
- API versioning
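Request throttling is typically a token-bucket check at the gateway; a minimal in-process sketch (the per-user rate and burst size are illustrative, and a real multi-instance gateway would back this with Redis so limits hold globally):

import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # user_id -> bucket; e.g. 10 req/s sustained, bursts of 20

def check_rate_limit(user_id: str) -> bool:
    bucket = buckets.setdefault(user_id, TokenBucket(rate=10, capacity=20))
    return bucket.allow()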
Technology:
- AWS API Gateway
- Azure API Management
- Kong / Envoy
3. Sync Service (Phone App ↔ Cloud)
Responsibilities:
- Coordinate sync between phone app and cloud
- Handle offline uploads
- Resolve conflicts
- Manage sync tokens
- Track sync status
Sync Flow:
Phone App (Offline) → Queue Operations →
When Online → Sync Service →
Upload Pending Videos →
Download New Content →
Resolve Conflicts →
Update Local Cache
Conflict Resolution:
- Last-write-wins for metadata
- Merge for tags/annotations
- User notification for conflicts
- Manual resolution option
Sync Tokens:
- Track sync state per device
- Incremental sync (only changes)
- Resume interrupted syncs
- Handle concurrent syncs
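A sketch of token-based incremental sync on the phone-app side. The /sync/changes endpoint, its response fields, and the local-cache helper are hypothetical; the point is that the cursor is persisted after each page so an interrupted sync resumes where it left off:

import json
import pathlib
import requests

STATE_FILE = pathlib.Path("sync_state.json")  # per-device sync cursor

def apply_to_local_cache(change: dict):
    ...  # placeholder: upsert/delete the changed row in the local SQLite cache

def incremental_sync(api_base: str, auth_token: str):
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    sync_token = state.get("sync_token")  # None on first run => full sync

    while True:
        # Hypothetical endpoint: returns changes after the cursor, paginated.
        resp = requests.get(
            f"{api_base}/api/v1/sync/changes",
            params={"sync_token": sync_token},
            headers={"Authorization": f"Bearer {auth_token}"},
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()

        for change in page["changes"]:
            apply_to_local_cache(change)

        sync_token = page["next_sync_token"]  # persist so a crash can resume
        STATE_FILE.write_text(json.dumps({"sync_token": sync_token}))
        if not page.get("has_more"):
            break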
Technology:
- REST API for sync operations
- WebSocket for real-time updates
- Sync queue in phone app
- Background sync service
4. Write Path (Write-Heavy)
4.1 Video Upload Service
Flow:
Smart Glass → Upload Service → Blob Storage → Metadata Extraction → Write DB
Process:
- Receive video upload (chunked upload for large files)
- Store video in blob storage (S3/Azure Blob)
- Trigger metadata extraction
- Insert metadata into write-optimized DB
- Queue video for processing
Optimizations:
- Chunked uploads (resumable)
- Compression before upload
- Direct upload to blob storage (signed URLs)
- Async metadata extraction
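The "direct upload with signed URLs" optimization might look like the following boto3 sketch; the bucket name and key layout are illustrative (the blob-storage section later adds date-based prefixes):

import boto3

s3 = boto3.client("s3")

def create_upload_url(user_id: str, video_id: str) -> str:
    """Return a short-lived URL the device can PUT the video to directly,
    so large files bypass the application servers entirely."""
    key = f"raw/{user_id}/{video_id}.mp4"
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "smart-glass-videos", "Key": key,
                "ContentType": "video/mp4"},
        ExpiresIn=900,  # 15 minutes is plenty for one chunked upload
    )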
4.2 Metadata Extraction Service
Extracted Metadata:
- Temporal: Timestamp, duration
- Spatial: GPS coordinates, location name
- People: Face detection, person identification
- Objects: Scene detection, object recognition
- Embeddings: Vector embeddings for semantic search
- Audio: Speech-to-text, audio features
- Video: Resolution, fps, codec
AI/ML Models:
- Face recognition (AWS Rekognition, Azure Face API)
- Object detection (YOLO, TensorFlow)
- Scene classification (CNN models)
- NLP embeddings (BERT, OpenAI embeddings)
- Speech-to-text (Whisper, Google Speech)
Technology:
- Microservice architecture
- GPU clusters for ML inference
- Batch processing for cost efficiency
- Real-time processing for recent videos
4.3 Write Database
Requirements:
- High write throughput (100M writes/day)
- Scalable and distributed
- Flexible schema for metadata
- Fast ingestion
Database Choice: Cassandra / DynamoDB
Why:
- Excellent write performance
- Horizontal scaling
- NoSQL flexibility for metadata
- High availability
Schema Design (Cassandra):
CREATE TABLE video_metadata (
    user_id UUID,
    video_id UUID,
    upload_timestamp TIMESTAMP,
    blob_url TEXT,
    duration_seconds INT,
    location_name TEXT,
    gps_lat DOUBLE,
    gps_lon DOUBLE,
    detected_faces LIST<UUID>,   -- Person IDs
    detected_objects LIST<TEXT>,
    scene_tags LIST<TEXT>,
    embedding_vector BLOB,       -- Vector embedding
    processing_status TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY ((user_id), upload_timestamp, video_id)
) WITH CLUSTERING ORDER BY (upload_timestamp DESC, video_id ASC);
Making user_id the partition key keeps each user's videos together, so "recent videos for user X" is a single-partition query. Location and tag lookups are served by Elasticsearch on the read path, avoiding Cassandra secondary indexes (an anti-pattern at this scale).
Partitioning:
- Partition by user_id for user queries
- Replication factor: 3
- Consistency level: QUORUM for writes
4.4 Video Processing Queue
Purpose:
- Async video processing (trimming, merging)
- Decouple upload from processing
- Handle burst traffic
- Retry failed processing
Queue Choice: Kafka
Why:
- High throughput (millions of messages/second)
- Message replayability
- Multiple consumer groups
- Long retention for reprocessing
Topics:
- video-uploads: New video uploads
- video-processing: Video trimming/merging jobs
- metadata-updates: Metadata enrichment
- memory-video-creation: Memory video generation requests
Message Format:
{
"video_id": "uuid",
"user_id": "uuid",
"operation": "trim|merge|create_memory",
"parameters": {
"start_time": 10,
"end_time": 30,
"clip_ids": ["uuid1", "uuid2"]
},
"priority": "high|normal|low",
"timestamp": "2024-01-01T00:00:00Z"
}
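Publishing one of these messages with kafka-python might look like the sketch below; the broker addresses are illustrative, and the topic names follow the list above:

import json
from datetime import datetime, timezone
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092", "kafka-2:9092"],
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for in-sync replicas: durability over latency
    retries=5,   # retry transient broker errors
)

event = {
    "video_id": "uuid",
    "user_id": "user_789",
    "operation": "create_memory",
    "parameters": {"clip_ids": ["uuid1", "uuid2"]},
    "priority": "normal",
    "timestamp": datetime.now(timezone.utc).isoformat(),
}
# Key by user_id so one user's events stay ordered within a partition.
producer.send("memory-video-creation", key=event["user_id"], value=event)
producer.flush()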
4.5 Video Processing Workers
Responsibilities:
- Video trimming
- Clip merging
- Video encoding/transcoding
- Thumbnail generation
- Quality optimization
Technology:
- FFmpeg for video processing
- GPU acceleration (NVENC)
- Containerized workers (Docker/Kubernetes)
- Auto-scaling based on queue depth
Processing Pipeline:
Video Clip → Decode → Trim/Merge → Encode → Upload → Update Metadata
Optimization:
- Parallel processing
- GPU acceleration
- Adaptive bitrate encoding
- Caching intermediate results
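A worker's trim-and-merge step can shell out to FFmpeg. A minimal sketch using stream copy for trims and the concat demuxer for the merge; re-encoding, transitions, and NVENC flags are omitted, and the concat demuxer assumes all clips share codec parameters:

import pathlib
import subprocess
import tempfile

def trim(src: str, start: float, end: float, dst: str):
    # -ss before -i seeks fast on keyframes; "-c copy" avoids re-encoding.
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-i", src,
         "-t", str(end - start), "-c", "copy", dst],
        check=True,
    )

def merge(clips: list[str], dst: str):
    # The concat demuxer joins files losslessly when codecs/params match.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.writelines(f"file '{pathlib.Path(c).resolve()}'\n" for c in clips)
        list_file = f.name
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_file, "-c", "copy", dst],
        check=True,
    )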
5. Read Path (Read-Heavy)
5.1 Query Processing Service
Flow:
User Query → NLP Processing → Cache Check → Search DB → Fetch Videos → Process → Return
Natural Language Processing:
- Intent Recognition: Extract intent (create memory video)
- Entity Extraction: Extract entities (wife, Paris, date range)
- Query Expansion: Expand to related terms
- Vector Embedding: Convert to embedding vector
Example Query Processing:
Input: "Create a memory video of my wife and me in Paris"
Processing:
- Intent: CREATE_MEMORY_VIDEO
- Entities:
- People: ["wife", "me"]
- Location: "Paris"
- Relationship: "wife" → person_id mapping
- Time: (optional, default: all time)
- Embedding: [0.123, 0.456, ...] (768-dim vector)
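Represented in code, the parsed query might look like this sketch. The extraction itself would be an NLU model; the helpers and person mapping here are hypothetical stand-ins, and only the output structure matters:

from dataclasses import dataclass, field

def embed(text: str) -> list[float]:
    return [0.0] * 768  # placeholder for a real embedding model

@dataclass
class ParsedQuery:
    intent: str                                       # e.g. CREATE_MEMORY_VIDEO
    people: list[str] = field(default_factory=list)   # resolved person_ids
    location: str | None = None
    date_range: tuple[str, str] | None = None         # None => all time
    embedding: list[float] = field(default_factory=list)  # 768-dim vector

def parse(query: str, person_map: dict[str, str]) -> ParsedQuery:
    # Stand-in for the real pipeline: intent classifier + NER + embedder.
    return ParsedQuery(
        intent="CREATE_MEMORY_VIDEO",
        people=[person_map["wife"], person_map["me"]],  # "wife" -> person_id
        location="Paris",
        embedding=embed(query),
    )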
5.2 Cache Layer (Redis)
Purpose:
- Cache hot query results
- Cache frequently accessed metadata
- Cache user-specific data
- Reduce database load
Cache Strategy:
- Query Result Cache: key user:{user_id}:query:{query_hash}, value {video_ids: [...], metadata: {...}}, TTL 10 minutes
- Recent Memories Cache: key user:{user_id}:recent:memories, value list of recent memory video IDs, TTL 1 hour
- Metadata Cache: key video:{video_id}:metadata, value video metadata JSON, TTL 1 hour
- User Profile Cache: key user:{user_id}:profile, value user profile and person mappings, TTL 24 hours
Cache Invalidation:
- Invalidate on video upload
- Invalidate on metadata update
- TTL-based expiration
- Manual invalidation API
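The query-result cache follows the classic cache-aside pattern; a redis-py sketch using the key scheme and TTL from above (run_search stands in for the Elasticsearch call in the next subsection):

import hashlib
import json
import redis

r = redis.Redis(host="redis-cluster", port=6379, decode_responses=True)

def run_search(user_id: str, query: str) -> dict:
    ...  # placeholder: the Elasticsearch query from the read-path section

def search_with_cache(user_id: str, query: str) -> dict:
    query_hash = hashlib.sha256(query.encode()).hexdigest()[:16]
    key = f"user:{user_id}:query:{query_hash}"

    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: skip Elasticsearch

    result = run_search(user_id, query)    # cache miss: query the read DB
    r.setex(key, 600, json.dumps(result))  # TTL 10 minutes, as above
    return result

def invalidate_user_queries(user_id: str):
    # On upload or metadata change, drop this user's cached query results.
    for k in r.scan_iter(f"user:{user_id}:query:*"):
        r.delete(k)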
5.3 Read Database - Search Engine
Database Choice: Elasticsearch
Why:
- Full-text search capabilities
- Vector search support (kNN)
- Fast search performance
- Horizontal scaling
- Rich query DSL
Index Design:
{
"mappings": {
"properties": {
"video_id": {"type": "keyword"},
"user_id": {"type": "keyword"},
"upload_timestamp": {"type": "date"},
"location_name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
"gps": {"type": "geo_point"},
"detected_faces": {"type": "keyword"},
"detected_objects": {"type": "text"},
"scene_tags": {"type": "text"},
"embedding_vector": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "cosine"
},
"duration_seconds": {"type": "integer"},
"processing_status": {"type": "keyword"}
}
}
}
Search Queries:
- Text Search (location, tags):
{ "query": { "bool": { "must": [ {"match": {"location_name": "Paris"}}, {"terms": {"detected_faces": ["person_123", "person_456"]}} ], "filter": [ {"term": {"user_id": "user_789"}}, {"range": {"upload_timestamp": {"gte": "2024-01-01"}}} ] } } } - Vector Search (semantic similarity):
{ "query": { "script_score": { "query": {"match_all": {}}, "script": { "source": "cosineSimilarity(params.query_vector, 'embedding_vector') + 1.0", "params": {"query_vector": [0.123, 0.456, ...]} } } } } - Hybrid Search (text + vector):
{ "query": { "bool": { "should": [ {"match": {"location_name": "Paris"}}, {"script_score": { "script": { "source": "cosineSimilarity(params.query_vector, 'embedding_vector') + 1.0", "params": {"query_vector": [...]} } }} ], "minimum_should_match": 1 } } }
5.4 Vector Database (Alternative/Complementary)
Database Choice: Milvus / Pinecone / Weaviate
Why:
- Optimized for vector search
- Better performance for large-scale vector search
- Advanced vector indexing (IVF, HNSW)
- Can complement Elasticsearch
Use Cases:
- Primary vector search for semantic queries
- Person similarity search
- Scene similarity search
- Cross-modal search (text-to-video)
Integration:
- Use for pure vector search queries
- Elasticsearch for hybrid (text + vector) queries
- Sync embeddings between systems
6. Memory Video Creation Service
Workflow:
User Query → Query Processing → Search Videos → Select Clips →
Trim & Merge → Generate Video → Store → Return URL
Step-by-Step:
- Query Processing
- Parse natural language query
- Extract entities (people, location, time)
- Generate search criteria
- Video Search
- Search Elasticsearch/Vector DB
- Filter by user, location, people, time
- Rank by relevance
- Select top N clips (10-20 clips)
- Clip Selection Algorithm
def select_clips(videos, target_duration=120):
    # Sort by relevance score
    sorted_videos = sort_by_relevance(videos)
    # Select clips covering the time range
    selected = []
    total_duration = 0
    for video in sorted_videos:
        # Extract the best segment (e.g., 10-15 seconds)
        segment = extract_best_segment(video)
        if total_duration + segment.duration <= target_duration:
            selected.append(segment)
            total_duration += segment.duration
        else:
            break
    return selected
- Video Processing
- Trim selected clips
- Add transitions
- Merge into single video
- Add music/effects (optional)
- Generate thumbnail
- Optimization
- Cache common queries
- Pre-generate popular memories
- Use GPU acceleration
- Parallel processing
Performance Optimization:
- Caching: Cache common memory videos
- Pre-computation: Pre-generate popular memories
- Lazy Generation: Generate on-demand, cache result
- Progressive Loading: Return partial results quickly
7. Blob Storage
Storage Choice: S3 / Azure Blob Storage
Organization:
s3://smart-glass-videos/
├── raw/
│ └── {user_id}/
│ └── {year}/{month}/{day}/
│ └── {video_id}.mp4
├── processed/
│ └── {user_id}/
│ └── {video_id}/
│ ├── 1080p.mp4
│ ├── 720p.mp4
│ └── thumbnail.jpg
└── memories/
└── {user_id}/
└── {memory_id}.mp4
Features:
- Lifecycle policies (move to cheaper storage)
- CDN integration (CloudFront, Azure CDN)
- Versioning for recovery
- Encryption at rest
- Cross-region replication
Optimization:
- Use different storage tiers
- Hot: Recent videos (S3 Standard)
- Warm: Older videos (S3 Standard-IA)
- Cold: Archived videos (S3 Glacier)
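The tiering above maps directly to an S3 lifecycle configuration; a boto3 sketch using the 30/90-day cutoffs from the cost section (the bucket name is illustrative):

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="smart-glass-videos",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-videos",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm
                    {"Days": 90, "StorageClass": "GLACIER"},      # cold
                ],
            }
        ]
    },
)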
Detailed Design
Video Upload Flow (Hybrid)
Smart Glasses (Online):
1. Smart Glass captures video → Local temp storage
↓
2. Upload Service receives chunked upload (via WiFi/Cellular)
↓
3. Store in S3 (direct upload with signed URL)
↓
4. Publish to Kafka topic: video-uploads
↓
5. Metadata Extraction Service processes:
- Extract faces → Identify people
- Detect objects/scenes
- Generate embeddings
- Extract GPS, timestamp
↓
6. Insert metadata into Cassandra (write DB)
↓
7. Index metadata in Elasticsearch (read DB)
↓
8. Cache metadata in Redis
↓
9. Trigger video processing (trimming, encoding)
↓
10. Notify phone app via push notification
Smart Glasses (Offline):
1. Smart Glass captures video → Local persistent storage
↓
2. Store with sync token in local queue
↓
3. When online → Resume upload flow above
↓
4. Phone app syncs when connected
Phone App Upload:
1. User selects video in phone app
↓
2. Phone App → Upload Service (chunked upload)
↓
3. Rest of flow same as smart glasses
↓
4. Phone App receives push notification when complete
Sync Flow (Phone App ↔ Cloud)
1. Phone App checks sync status
↓
2. Upload pending videos (from offline queue)
↓
3. Download new videos/metadata from cloud
↓
4. Resolve conflicts:
- Last-write-wins for metadata
- Merge for tags
- User notification for conflicts
↓
5. Update local cache
↓
6. Notify user of sync completion
Memory Video Creation Flow
1. User: "Create a memory video of my wife and me in Paris"
↓
2. API Gateway receives request
↓
3. Query Processing Service:
- NLP: Extract "wife", "me", "Paris"
- Map "wife" → person_id (from user profile)
- Generate query embedding
↓
4. Check Redis cache:
- Key: user:{user_id}:query:{hash}
- If hit → return cached result
↓
5. If cache miss → Search Elasticsearch:
- Filter: user_id, location="Paris", faces=[wife_id, user_id]
- Vector search: semantic similarity
- Return top 20 clips
↓
6. Fetch video metadata from Cassandra
↓
7. Select best clips (algorithm):
- Rank by relevance
- Select segments totaling ~2 minutes
↓
8. Video Processing:
- Trim clips
- Merge with transitions
- Generate video
↓
9. Store in S3 (memories bucket)
↓
10. Update metadata, cache result
↓
11. Return video URL to user
Scalability & Performance
Read Scaling
Strategies:
- Read Replicas: Multiple Elasticsearch replicas
- Caching: Multi-layer caching (Redis, CDN)
- Sharding: Partition data by user_id
- CDN: Cache popular memory videos
- Query Optimization: Index optimization, query tuning
Metrics:
- Elasticsearch: 10+ nodes, 3 replicas per shard
- Redis: Cluster mode, 6+ nodes
- CDN: Global edge locations
Write Scaling
Strategies:
- Horizontal Partitioning: Cassandra sharding by user_id
- Async Processing: Queue-based processing
- Batch Processing: Batch metadata updates
- Direct Upload: Signed URLs for direct S3 upload
- Worker Scaling: Auto-scale processing workers
Metrics:
- Cassandra: 10+ nodes, replication factor 3
- Kafka: 6+ brokers; partition counts per topic sized to peak throughput
- Processing Workers: Auto-scale 10-1000 instances
Performance Targets
| Operation | Target Latency | Peak Throughput |
|---|---|---|
| Video Upload | < 5s to acknowledge | 10K uploads/sec |
| Metadata Search | < 500ms | 100K queries/sec |
| Memory Video Creation | < 5s (cached), < 30s (new) | 10K creations/sec |
| Video Processing | < 30s | 10K videos/sec |
Reliability & Durability
Data Durability
- Video Storage
- S3: 99.999999999% (11 9’s) durability
- Cross-region replication
- Versioning enabled
- Metadata
- Cassandra: Replication factor 3
- Elasticsearch: Replica count 2
- Regular backups
- Queue
- Kafka: Replication factor 3
- Message retention: 7 days
- Idempotent producers
High Availability
- Multi-Region Deployment
- Active-active regions
- Data replication across regions
- Failover mechanisms
- Health Checks
- Service health endpoints
- Database connectivity checks
- Queue depth monitoring
- Circuit Breakers
- Prevent cascade failures
- Fallback mechanisms
- Graceful degradation
Disaster Recovery
- Backup Strategy
- Daily backups of metadata
- S3 versioning for videos
- Point-in-time recovery
- Recovery Procedures
- RTO: 1 hour
- RPO: 15 minutes
- Automated failover
Security & Privacy
Authentication & Authorization
- User Authentication
- OAuth 2.0 / JWT tokens
- Multi-factor authentication
- Device registration
- Authorization
- User can only access own videos
- Role-based access control
- API key management
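Gateway-side token validation with PyJWT might look like this sketch; the key file, audience, and claim names are illustrative:

import jwt  # PyJWT

PUBLIC_KEY = open("jwt_public.pem").read()  # gateway holds the verify key only

def authenticate(token: str) -> str:
    """Validate the bearer token and return the user_id it was issued to."""
    claims = jwt.decode(
        token,
        PUBLIC_KEY,
        algorithms=["RS256"],  # pin the algorithm; never trust the token header
        audience="smart-glass-api",
        options={"require": ["exp", "sub"]},
    )
    return claims["sub"]  # downstream services scope every query to this user

def authorize_video_access(user_id: str, video_owner_id: str) -> bool:
    # Users can only access their own videos (per the rules above).
    return user_id == video_owner_id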
Data Privacy
- Encryption
- Encryption at rest (AES-256)
- Encryption in transit (TLS 1.3)
- End-to-end encryption (optional)
- Face Recognition
- On-device processing option
- User consent for face recognition
- GDPR compliance
- Data Retention
- User-controlled retention
- Automatic deletion policies
- Right to deletion
Cost Optimization
Storage Costs
- Storage Tiers
- Hot: Recent videos (S3 Standard)
- Warm: 30-90 days (S3 Standard-IA)
- Cold: Archive (S3 Glacier)
- Compression
- Video compression (H.265)
- Metadata compression
- Efficient encoding
Compute Costs
- Processing Optimization
- GPU acceleration (cheaper per video than CPU for transcoding at scale)
- Batch processing
- Spot instances for non-critical jobs
- Caching
- Reduce database queries
- CDN for video delivery
- Cache hit ratio > 80%
Database Costs
- Right-Sizing
- Appropriate instance types
- Reserved instances for steady load
- Auto-scaling
Monitoring & Observability
Key Metrics
- Performance Metrics
- API latency (p50, p95, p99)
- Video processing time
- Search query latency
- Cache hit ratio
- System Metrics
- CPU, memory, disk usage
- Queue depth
- Database connection pool
- Error rates
- Business Metrics
- Videos uploaded per day
- Memory videos created
- Active users
- Storage usage
Logging
- Structured Logging
- JSON format
- Correlation IDs
- Centralized logging (ELK stack)
- Distributed Tracing
- Request tracing across services
- Performance profiling
- Error tracking
Alerting
- Critical Alerts
- Service downtime
- High error rates
- Storage capacity
- Queue backlog
Technology Stack Summary
| Component | Technology | Rationale |
|---|---|---|
| API Gateway | AWS API Gateway / Kong | Request routing, auth, rate limiting |
| Write DB | Cassandra / DynamoDB | High write throughput, scalability |
| Read DB | Elasticsearch | Full-text + vector search |
| Vector DB | Milvus / Pinecone | Optimized vector search |
| Cache | Redis Cluster | Fast in-memory caching |
| Queue | Kafka | High throughput, replayability |
| Blob Storage | S3 / Azure Blob | Scalable object storage |
| Video Processing | FFmpeg + GPU | Video encoding/transcoding |
| ML/AI | AWS Rekognition / Custom | Face recognition, object detection |
| NLP | BERT / OpenAI Embeddings | Natural language understanding |
| CDN | CloudFront / Azure CDN | Global content delivery |
Future Enhancements
- Real-Time Collaboration
- Shared memory videos
- Collaborative editing
- Real-time notifications
- Advanced AI Features
- Automatic video editing
- Music selection
- Story generation
- Emotion detection
- AR Integration
- Overlay memories in AR view
- Location-based memory triggers
- Augmented reality previews
- Social Features
- Share memory videos
- Comments and reactions
- Memory collections
What Interviewers Look For
CQRS Architecture Skills
- Read/Write Separation
- Separate read and write paths
- Appropriate database choices
- Red Flags: Single database, no separation, poor performance
- Write Path Design
- High write throughput
- Cassandra for writes
- Kafka for async processing
- Red Flags: Wrong database, synchronous processing, bottlenecks
- Read Path Design
- Fast search queries
- Elasticsearch for reads
- Redis caching
- Red Flags: Wrong database, no caching, slow queries
Video Processing Skills
- Async Processing
- Message queue for video operations
- Worker pool design
- Red Flags: Synchronous processing, blocking, poor UX
- Video Storage
- S3 for object storage
- Lifecycle policies
- Red Flags: Wrong storage, no lifecycle, high costs
- GPU Acceleration
- Video processing optimization
- Cost-effective processing
- Red Flags: No optimization, slow processing, high costs
NLP/Voice Processing Skills
- Voice Command Processing
- Speech-to-text
- Natural language understanding
- Red Flags: No NLP, poor accuracy, slow processing
- Query Understanding
- Intent extraction
- Entity recognition
- Red Flags: No query understanding, poorly constructed queries, low retrieval accuracy
Problem-Solving Approach
- Workload Analysis
- Read-heavy vs. write-heavy
- Appropriate architecture
- Red Flags: No analysis, wrong architecture, poor performance
- Edge Cases
- Video processing failures
- Search failures
- Network issues
- Red Flags: Ignoring edge cases, no handling
- Trade-off Analysis
- Consistency vs. performance
- Cost vs. features
- Red Flags: No trade-offs, dogmatic choices
System Design Skills
- Component Design
- Video service
- Search service
- NLP service
- Red Flags: Monolithic, unclear boundaries
- Caching Strategy
- Multi-layer caching
- CDN for videos
- Red Flags: No caching, poor strategy, slow delivery
- Scalability Design
- Horizontal scaling
- Independent scaling
- Red Flags: Vertical scaling, bottlenecks, no scaling
Communication Skills
- CQRS Explanation
- Can explain read/write separation
- Understands benefits
- Red Flags: No understanding, vague explanations
- Architecture Justification
- Explains design decisions
- Discusses alternatives
- Red Flags: No justification, no alternatives
Meta-Specific Focus
- CQRS Expertise
- Deep understanding of CQRS
- Appropriate use cases
- Key: Show CQRS expertise
- Workload-Aware Design
- Understanding of read/write patterns
- Appropriate architecture
- Key: Demonstrate workload analysis skills
Conclusion
This smart glass system design addresses the unique challenges of handling both read-heavy search queries and write-heavy video processing workloads through:
- CQRS Architecture: Separating read and write paths
- Appropriate Database Choices: Cassandra for writes, Elasticsearch for reads
- Multi-Layer Caching: Redis for hot data, CDN for videos
- Async Processing: Kafka queue for video operations
- Horizontal Scaling: All components scale independently
Key Design Decisions:
- Write Path: Cassandra + Kafka for high throughput
- Read Path: Elasticsearch + Redis for fast search
- Storage: S3 with lifecycle policies for cost optimization
- Processing: Async workers with GPU acceleration
The system is designed to scale to millions of users while maintaining low latency for memory video creation and high reliability for video storage and processing.