Introduction

Smart glasses represent the next frontier in wearable technology, combining augmented reality, computer vision, and voice control to create immersive experiences. One of the most compelling use cases is automatic memory creation—users can simply say “Create a memory video of my wife and me in Paris” and the system intelligently finds, trims, and merges relevant video clips.

This post designs a scalable hybrid system architecture for smart glasses that integrates with a companion phone app and cloud services. The system supports:

  • Smart Glasses: Primary device for voice-controlled capture and AR display
  • Phone App: Companion app for management, viewing, and offline sync
  • Cloud Services: Backend processing, storage, and AI services

The architecture handles voice-controlled media capture and intelligent memory video generation, with a focus on managing read-heavy search queries and write-heavy video ingestion/processing workloads, while supporting seamless offline/online operation.

Table of Contents

  1. Requirements
  2. Capacity Estimation
  3. Workload Analysis
  4. Core Entities
  5. API
  6. Data Flow
  7. Database Design
  8. High-Level Design
  9. Deep Dive

Requirements

Functional Requirements

  1. Voice-Controlled Media Capture (Smart Glasses)
    • Users can take pictures/videos using voice commands
    • “Take a picture”
    • “Record a video”
    • “Stop recording”
    • Works offline with local storage
  2. Phone App Integration
    • Companion app for iOS/Android
    • View and manage captured media
    • Create memory videos via app interface
    • Offline viewing of cached content
    • Sync with cloud when online
    • Push notifications for completed memory videos
  3. Natural Language Memory Creation
    • Users can request memory videos using natural language
    • Example: “Create a memory video of my wife and me in Paris”
    • Available on both smart glasses (voice) and phone app (text/voice)
    • System finds related video clips from user’s albums
    • Automatically trims and merges clips into 2-minute video
    • Returns results quickly (2-5 seconds when cached, under 30 seconds for newly generated videos)
  4. Intelligent Video Search
    • Search by location, people, objects, time
    • Semantic search using natural language
    • Face recognition and person identification
    • Object and scene detection
    • Works across smart glasses and phone app
  5. Hybrid Cloud/Offline Operation
    • Smart glasses can operate offline
    • Phone app syncs with cloud when online
    • Automatic background sync
    • Conflict resolution for offline edits
    • Cloud processing for AI features
  6. Video Processing
    • Automatic video trimming
    • Clip merging and transitions
    • Video enhancement and optimization
    • Thumbnail generation
    • Cloud-based processing with phone app preview

Non-Functional Requirements

  • High Read Concurrency: Many users searching simultaneously
  • High Write Throughput: Videos being uploaded and processed continuously
  • Scalability: Handle millions of users and billions of video clips
  • Low Latency: Memory video creation in 2-5 seconds (cached), < 30s (new)
  • Offline Support: Smart glasses and phone app work offline
  • Sync Reliability: Reliable sync between devices and cloud
  • Durability: Videos and metadata never lost
  • Availability: 99.9% uptime for cloud services
  • Cost Efficiency: Optimize storage and processing costs
  • Battery Efficiency: Optimize for smart glasses battery life

Capacity Estimation

Traffic Estimates

Users:

  • 10 million active users
  • 1 million concurrent users during peak hours

Media:

  • Average user uploads 10 videos/day
  • Average video size: 50MB (1080p, 30 seconds)
  • Daily uploads: 100 million videos × 50MB ≈ 5PB/day
  • Storage: ~1.8EB/year (with 3x replication ≈ 5.5EB)

Queries:

  • 50 million memory video requests/day
  • Peak: 100K requests/second
  • Average query processes 10-20 video clips

Processing:

  • Video processing: 2-5 seconds per memory video
  • Peak processing throughput: 10K memory videos/second

Workload Analysis

Read-Heavy Operations

| Operation | Type | Frequency | Notes |
|---|---|---|---|
| Video metadata search | Read-heavy | 50M/day | Users querying by text tags, NLP embeddings |
| Recent memories lookup | Read-heavy | 100M/day | Frequently accessed, can cache |
| User album browsing | Read-heavy | 200M/day | Paginated queries |
| Face/person search | Read-heavy | 30M/day | Vector similarity search |

Characteristics:

  • High read concurrency
  • Need fast search (sub-second)
  • Can benefit from caching
  • Requires semantic/vector search

Write-Heavy Operations

| Operation | Type | Frequency | Notes |
|---|---|---|---|
| Video upload | Write-heavy | 100M/day | Large files to blob storage |
| Metadata ingestion | Write-heavy | 100M/day | Metadata inserted on upload |
| Video trimming jobs | Write-heavy | 50M/day | Async video processing |
| Metadata updates (tags, embeddings) | Write-heavy | 200M/day | AI processing updates |

Characteristics:

  • High write throughput
  • Large file storage
  • Async processing needed
  • Batch processing for efficiency

Storage Estimates

Video Storage:

  • Daily uploads: 100 million videos × 50MB ≈ 5PB/day
  • Annual storage: ~1.8EB/year
  • With 3x replication: ~5.5EB/year

Metadata Storage:

  • Per video: ~10KB metadata
  • 100M videos/day × 10KB = 1TB/day metadata
  • Annual: ~365TB metadata

Bandwidth Estimates

Upload Bandwidth:

  • 100M videos/day × 50MB ≈ 5PB/day
  • Average: ~208TB/hour (peak hours substantially higher)

Download Bandwidth:

  • Memory video downloads: 50M/day × 20MB = 1PB/day
  • Video streaming: Variable based on concurrent viewers
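
A few lines of arithmetic tie the assumptions above together (all inputs are the figures stated in this section):

```python
# Back-of-envelope check of the capacity estimates above.
USERS = 10_000_000
VIDEOS_PER_USER_PER_DAY = 10
AVG_VIDEO_MB = 50

videos_per_day = USERS * VIDEOS_PER_USER_PER_DAY               # 100M videos/day
upload_tb_per_day = videos_per_day * AVG_VIDEO_MB / 1_000_000  # MB -> TB
upload_pb_per_day = upload_tb_per_day / 1_000                  # TB -> PB
annual_pb = upload_pb_per_day * 365                            # before replication

metadata_tb_per_day = videos_per_day * 10 / 1_000_000_000      # 10KB/video, KB -> TB

print(f"{videos_per_day:,} videos/day")
print(f"{upload_pb_per_day:.1f} PB/day raw video upload")
print(f"{annual_pb:,.0f} PB/year before replication")
print(f"{metadata_tb_per_day:.1f} TB/day metadata")
```

Note that 100M videos at 50MB each is petabytes per day of raw ingest, which is what motivates direct-to-blob uploads and aggressive storage tiering later in this post.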

Core Entities

User

  • Attributes: user_id, username, email, created_at, subscription_tier
  • Relationships: Owns videos, has memory videos, has voice commands

Video

  • Attributes: video_id, user_id, file_url, duration, size, created_at, metadata
  • Relationships: Belongs to user, processed into memory videos, has tags/embeddings

Memory Video

  • Attributes: memory_id, user_id, query_text, video_clips, created_at, status
  • Relationships: Belongs to user, contains video clips

Video Clip

  • Attributes: clip_id, video_id, start_time, end_time, duration, tags
  • Relationships: Belongs to video, part of memory videos

Voice Command

  • Attributes: command_id, user_id, command_text, intent, executed_at, result
  • Relationships: Belongs to user, triggers actions

API

Upload Video (Smart Glasses / Phone App)

POST /api/v1/videos/upload
Authorization: Bearer {token}
Content-Type: multipart/form-data

{
  "video": <binary>,
  "device_type": "smart_glass|phone_app",
  "device_id": "device_uuid",
  "metadata": {
    "duration": 30,
    "location": {...},
    "captured_at": "2025-11-08T10:00:00Z"
  },
  "sync_token": "optional_sync_token_for_offline_uploads"
}

Response: 202 Accepted
{
  "video_id": "uuid",
  "status": "uploading",
  "upload_url": "https://s3.example.com/upload/...",
  "sync_token": "sync_token_for_tracking"
}

Sync Status (Phone App)

GET /api/v1/sync/status
Authorization: Bearer {token}

Response: 200 OK
{
  "pending_uploads": 5,
  "pending_downloads": 2,
  "last_sync": "2025-11-08T10:00:00Z",
  "sync_in_progress": false
}

Trigger Sync (Phone App)

POST /api/v1/sync/trigger
Authorization: Bearer {token}

Response: 200 OK
{
  "status": "sync_started",
  "estimated_completion": "2025-11-08T10:05:00Z"
}

Create Memory Video

POST /api/v1/memories/create
Authorization: Bearer {token}
Content-Type: application/json

{
  "query": "Create a memory video of my wife and me in Paris",
  "max_duration": 120
}

Response: 202 Accepted
{
  "memory_id": "uuid",
  "status": "processing",
  "estimated_completion": "2025-11-08T10:05:00Z"
}

Get Memory Video

GET /api/v1/memories/{memory_id}

Response: 200 OK
{
  "memory_id": "uuid",
  "status": "completed",
  "video_url": "https://cdn.example.com/memories/uuid.mp4",
  "clips_used": [...],
  "created_at": "2025-11-08T10:00:00Z"
}

Search Videos

POST /api/v1/videos/search
Authorization: Bearer {token}
Content-Type: application/json

{
  "query": "videos with my wife in Paris",
  "filters": {
    "date_range": {...},
    "people": [...]
  }
}

Response: 200 OK
{
  "videos": [
    {
      "video_id": "uuid",
      "thumbnail_url": "...",
      "duration": 30,
      "matched_clips": [...]
    }
  ],
  "total": 25
}

Data Flow

Video Upload Flow (Hybrid)

Smart Glasses (Online):

  1. Smart Glass captures video → Local storage (temporary)
  2. Device → Upload Service (chunked upload) via Bluetooth/WiFi
  3. Upload Service → Blob Storage (S3) - direct upload with signed URL
  4. Upload Service → Message Queue (Kafka) - publish video-upload event
  5. Message Queue → Metadata Extraction Service
  6. Metadata Extraction Service processes:
    • Extract faces → Identify people
    • Detect objects/scenes
    • Generate embeddings
    • Extract location/time
  7. Metadata Extraction Service → Metadata Database (store metadata)
  8. Metadata Extraction Service → Vector Database (store embeddings)
  9. Response returned to smart glasses
  10. Smart glasses → Phone App (via Bluetooth) - notification of upload

Smart Glasses (Offline):

  1. Smart Glass captures video → Local storage (persistent)
  2. Video queued for upload with sync token
  3. When online → Resume upload flow above
  4. Phone App syncs when connected

Phone App Upload:

  1. User selects video in phone app
  2. Phone App → Upload Service (chunked upload)
  3. Upload Service → Blob Storage (S3)
  4. Rest of flow same as smart glasses
  5. Phone App receives notification when processing complete

Sync Flow (Phone App ↔ Cloud)

  1. Phone App checks sync status
  2. Upload pending videos from phone
  3. Download new videos/metadata from cloud
  4. Resolve conflicts (last-write-wins or merge)
  5. Update local cache
  6. Notify user of sync completion

Memory Video Creation Flow

  1. User speaks command → Device
  2. Device → Voice Processing Service (NLP)
  3. Voice Processing Service → Memory Video Service (create request)
  4. Memory Video Service → Vector Search Service (search videos by query)
  5. Vector Search Service → Vector Database (semantic search)
  6. Vector Search Service → Memory Video Service (return matching clips)
  7. Memory Video Service → Video Processing Service (trim and merge clips)
  8. Video Processing Service → Blob Storage (store memory video)
  9. Video Processing Service → Memory Video Service (update status)
  10. Memory Video Service → Device (return memory video URL)

Database Design

Schema Design

Users Table:

CREATE TABLE users (
    user_id VARCHAR(36) PRIMARY KEY,
    username VARCHAR(255) NOT NULL,
    email VARCHAR(255),
    subscription_tier VARCHAR(50),
    created_at TIMESTAMP,
    INDEX idx_email (email)
);

Videos Table:

CREATE TABLE videos (
    video_id VARCHAR(36) PRIMARY KEY,
    user_id VARCHAR(36) NOT NULL,
    file_url VARCHAR(512) NOT NULL,
    duration INT,
    size BIGINT,
    metadata JSON,
    created_at TIMESTAMP,
    INDEX idx_user_id (user_id),
    INDEX idx_created_at (created_at),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);

Memory Videos Table:

CREATE TABLE memory_videos (
    memory_id VARCHAR(36) PRIMARY KEY,
    user_id VARCHAR(36) NOT NULL,
    query_text TEXT NOT NULL,
    video_clips JSON,
    status ENUM('processing', 'completed', 'failed') DEFAULT 'processing',
    video_url VARCHAR(512),
    created_at TIMESTAMP,
    completed_at TIMESTAMP,
    INDEX idx_user_id (user_id),
    INDEX idx_status (status),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);

Video Clips Table:

CREATE TABLE video_clips (
    clip_id VARCHAR(36) PRIMARY KEY,
    video_id VARCHAR(36) NOT NULL,
    start_time INT NOT NULL,
    end_time INT NOT NULL,
    duration INT,
    tags JSON,
    embeddings JSON,
    INDEX idx_video_id (video_id),
    FOREIGN KEY (video_id) REFERENCES videos(video_id)
);

Database Sharding Strategy

Shard by User ID:

  • User data, videos, and memory videos on same shard
  • Enables efficient user queries
  • Use consistent hashing for distribution
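
A minimal sketch of consistent hashing for the user_id → shard mapping (shard names and virtual-node count are illustrative, not from the post):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map user_id -> shard via a consistent-hash ring.

    A sketch only: production rings add replication and weighted nodes.
    Virtual nodes (vnodes) smooth out the distribution across shards.
    """

    def __init__(self, shards, vnodes=100):
        self._ring = []  # sorted list of (hash, shard)
        for shard in shards:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, user_id):
        # First ring position clockwise from the key's hash.
        idx = bisect.bisect(self._keys, self._hash(user_id)) % len(self._keys)
        return self._ring[idx][1]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
# The same user always lands on the same shard:
assert ring.shard_for("user_789") == ring.shard_for("user_789")
```

Adding or removing a shard only remaps the keys adjacent to its ring positions, which is why this beats modulo hashing for rebalancing.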

Vector Database:

  • Use specialized vector DB (Pinecone, Weaviate, Milvus)
  • Partition by user_id for isolation
  • Optimize for similarity search

High-Level Design

High-Level Architecture (Hybrid)

┌─────────────────────────────────────────────────────────────┐
│                    Client Layer                              │
│  ┌──────────────────┐         ┌──────────────────┐        │
│  │  Smart Glasses   │◄──BT───►│   Phone App       │        │
│  │  - Voice Control │         │  - Management     │        │
│  │  - AR Display    │         │  - Viewing        │        │
│  │  - Local Storage │         │  - Offline Cache  │        │
│  └────────┬─────────┘         └────────┬─────────┘        │
└───────────┼─────────────────────────────┼───────────────────┘
            │                             │
            │ HTTPS / WebSocket            │ HTTPS / WebSocket
            │ (WiFi/Cellular)              │ (WiFi/Cellular)
            │                             │
┌───────────▼─────────────────────────────▼───────────────────┐
│              API Gateway / Load Balancer                      │
│         (Authentication, Rate Limiting, Routing)            │
└───────────┬─────────────────────────────────────────────────┘
            │
    ┌───────┴───────┐
    │               │
┌───▼──────┐  ┌─────▼──────┐
│ Read Path│  │ Write Path │
│ (Search) │  │ (Ingestion)│
└──────────┘  └────────────┘
    │               │
    │               │
┌───▼──────┐  ┌─────▼──────┐
│  Cache   │  │   Queue    │
│ (Redis)  │  │  (Kafka)    │
└──────────┘  └────────────┘
    │               │
┌───▼──────┐  ┌─────▼──────┐
│ Read DB  │  │ Write DB   │
│(Elastic- │  │ (Cassandra)│
│ search/  │  │            │
│Vector DB)│  │            │
└──────────┘  └────────────┘
    │               │
    └───────┬───────┘
            │
    ┌───────▼───────┐
    │ Blob Storage  │
    │ (S3/Azure Blob)│
    └───────────────┘
            │
    ┌───────▼───────┐
    │  Sync Service │
    │  (Phone App)  │
    └───────────────┘

Hybrid Architecture Components

  1. Smart Glasses
    • Primary capture device
    • Voice control interface
    • AR display
    • Local storage for offline operation
    • Bluetooth connection to phone app
  2. Phone App
    • Companion management app
    • Media viewing and management
    • Offline cache
    • Cloud sync coordinator
    • Push notifications
  3. Cloud Services
    • Backend processing
    • AI/ML services
    • Storage and metadata
    • Sync coordination

Architecture Principles

  1. CQRS Pattern: Separate read and write paths
  2. Event-Driven: Async processing for video operations
  3. Microservices: Independent scaling of components
  4. Caching: Multiple cache layers for performance
  5. Horizontal Scaling: All components scale independently

Deep Dive

Component Design

1. Smart Glasses

Responsibilities:

  • Voice command capture and processing
  • Media capture (photos/videos)
  • AR display of memories
  • Local storage for offline operation
  • Bluetooth communication with phone app

Key Features:

  • Voice recognition (on-device for basic commands, cloud for complex)
  • Real-time video preview
  • Background upload when online
  • Local storage (up to 10GB for offline videos)
  • Low-power operation
  • Bluetooth Low Energy (BLE) for phone app connection

Technology:

  • Embedded OS (custom or Android-based)
  • On-device ML models (lightweight)
  • Local SQLite for metadata cache
  • BLE stack for phone connectivity

Offline Operation:

  • Store videos locally when offline
  • Queue uploads with sync tokens
  • Resume uploads when online
  • Basic voice commands work offline

1.1 Phone App (Companion App)

Responsibilities:

  • Media viewing and management
  • Memory video creation (text/voice input)
  • Cloud sync coordination
  • Offline cache management
  • Push notification handling

Key Features:

  • Full media library browsing
  • Create memory videos via app
  • Offline viewing of cached content
  • Background sync with cloud
  • Conflict resolution for offline edits
  • Push notifications for completed processing

Technology:

  • Native apps (iOS Swift, Android Kotlin)
  • Local SQLite for offline cache
  • Background sync service
  • Push notification service (FCM/APNS)

Offline Support:

  • Cache recent videos (up to 5GB)
  • Cache metadata for offline search
  • Queue operations for sync
  • View cached content offline

Sync Strategy:

  • Incremental sync (only changed data)
  • Conflict resolution (last-write-wins)
  • Background sync every 15 minutes
  • Manual sync trigger
  • Sync status indicators

2. API Gateway

Responsibilities:

  • Request routing
  • Authentication and authorization
  • Rate limiting
  • Request/response transformation
  • Load balancing

Features:

  • JWT token validation
  • User quota management
  • Request throttling
  • API versioning
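
Request throttling at the gateway is commonly a token bucket per user or API key; a minimal in-memory sketch (rates are illustrative, and a real gateway would keep buckets in Redis or use its built-in rate-limit plugin):

```python
import time

class TokenBucket:
    """Per-user token-bucket throttle: allows bursts up to `burst`,
    then sustains `rate_per_sec` requests/second."""

    def __init__(self, rate_per_sec, burst, now=None):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate_per_sec=10, burst=5, now=100.0)
# Burst of 5 allowed immediately, 6th rejected until tokens refill:
results = [bucket.allow(now=100.0) for _ in range(6)]
assert results == [True] * 5 + [False]
```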

Technology:

  • AWS API Gateway
  • Azure API Management
  • Kong / Envoy

3. Sync Service (Phone App ↔ Cloud)

Responsibilities:

  • Coordinate sync between phone app and cloud
  • Handle offline uploads
  • Resolve conflicts
  • Manage sync tokens
  • Track sync status

Sync Flow:

Phone App (Offline) → Queue Operations → 
When Online → Sync Service → 
Upload Pending Videos → 
Download New Content → 
Resolve Conflicts → 
Update Local Cache

Conflict Resolution:

  • Last-write-wins for metadata
  • Merge for tags/annotations
  • User notification for conflicts
  • Manual resolution option
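
The first two rules above can be sketched as a pure merge function (the record shape with an `updated_at` timestamp and a `tags` list is an assumption for illustration):

```python
from datetime import datetime

def resolve_metadata_conflict(local, remote):
    """Last-write-wins for scalar metadata fields, set-union merge for
    tags, per the conflict rules above."""
    local_ts = datetime.fromisoformat(local["updated_at"])
    remote_ts = datetime.fromisoformat(remote["updated_at"])
    winner = local if local_ts >= remote_ts else remote
    merged = dict(winner)
    # Tags are additive: union rather than overwrite, so an offline edit
    # on the phone never silently drops a tag added in the cloud.
    merged["tags"] = sorted(set(local.get("tags", [])) | set(remote.get("tags", [])))
    return merged

local = {"updated_at": "2025-11-08T10:00:00", "title": "Paris day 1", "tags": ["paris"]}
remote = {"updated_at": "2025-11-08T10:05:00", "title": "Paris trip", "tags": ["eiffel"]}
merged = resolve_metadata_conflict(local, remote)
# Remote wins on scalars (newer timestamp); tags are merged:
assert merged["title"] == "Paris trip"
assert merged["tags"] == ["eiffel", "paris"]
```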

Sync Tokens:

  • Track sync state per device
  • Incremental sync (only changes)
  • Resume interrupted syncs
  • Handle concurrent syncs

Technology:

  • REST API for sync operations
  • WebSocket for real-time updates
  • Sync queue in phone app
  • Background sync service

4. Write Path (Write-Heavy)

4.1 Video Upload Service

Flow:

Smart Glass → Upload Service → Blob Storage → Metadata Extraction → Write DB

Process:

  1. Receive video upload (chunked upload for large files)
  2. Store video in blob storage (S3/Azure Blob)
  3. Trigger metadata extraction
  4. Insert metadata into write-optimized DB
  5. Queue video for processing

Optimizations:

  • Chunked uploads (resumable)
  • Compression before upload
  • Direct upload to blob storage (signed URLs)
  • Async metadata extraction
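
The resumable chunking can be sketched independently of any storage SDK; the 8MB chunk size and offset-acknowledgement protocol here are assumptions for illustration:

```python
import io

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, an assumed chunk size

def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (offset, bytes) pairs for a chunked upload."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield offset, chunk
        offset += len(chunk)

def resume_from(received_offsets, chunk_size=CHUNK_SIZE):
    """First byte offset not yet acknowledged by the upload service.
    On reconnect, the device resumes uploading from here."""
    expected = 0
    for off in sorted(received_offsets):
        if off != expected:
            break
        expected += chunk_size
    return expected

video = io.BytesIO(b"x" * (20 * 1024 * 1024))  # fake 20 MB video
chunks = list(iter_chunks(video))
assert [off for off, _ in chunks] == [0, 8_388_608, 16_777_216]
# Server acknowledged the first two chunks before the connection dropped:
assert resume_from([0, 8_388_608]) == 16_777_216
```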

4.2 Metadata Extraction Service

Extracted Metadata:

  • Temporal: Timestamp, duration
  • Spatial: GPS coordinates, location name
  • People: Face detection, person identification
  • Objects: Scene detection, object recognition
  • Embeddings: Vector embeddings for semantic search
  • Audio: Speech-to-text, audio features
  • Video: Resolution, fps, codec

AI/ML Models:

  • Face recognition (AWS Rekognition, Azure Face API)
  • Object detection (YOLO, TensorFlow)
  • Scene classification (CNN models)
  • NLP embeddings (BERT, OpenAI embeddings)
  • Speech-to-text (Whisper, Google Speech)

Technology:

  • Microservice architecture
  • GPU clusters for ML inference
  • Batch processing for cost efficiency
  • Real-time processing for recent videos

4.3 Write Database

Requirements:

  • High write throughput (100M writes/day)
  • Scalable and distributed
  • Flexible schema for metadata
  • Fast ingestion

Database Choice: Cassandra / DynamoDB

Why:

  • Excellent write performance
  • Horizontal scaling
  • NoSQL flexibility for metadata
  • High availability

Schema Design (Cassandra):

CREATE TABLE video_metadata (
    user_id UUID,
    video_id UUID,
    upload_timestamp TIMESTAMP,
    blob_url TEXT,
    duration_seconds INT,
    location_name TEXT,
    gps_lat DOUBLE,
    gps_lon DOUBLE,
    detected_faces LIST<UUID>,   -- Person IDs
    detected_objects LIST<TEXT>,
    scene_tags LIST<TEXT>,
    embedding_vector BLOB,       -- Vector embedding
    processing_status TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY ((user_id), upload_timestamp, video_id)
) WITH CLUSTERING ORDER BY (upload_timestamp DESC);

Keying the partition on user_id keeps all of a user's videos in a single partition, ordered newest-first; location and tag lookups go through the read path (Elasticsearch) rather than Cassandra secondary indexes, which degrade at this cardinality.

Partitioning:

  • Partition by user_id for user queries
  • Replication factor: 3
  • Consistency level: QUORUM for writes

4.4 Video Processing Queue

Purpose:

  • Async video processing (trimming, merging)
  • Decouple upload from processing
  • Handle burst traffic
  • Retry failed processing

Queue Choice: Kafka

Why:

  • High throughput (millions of messages/second)
  • Message replayability
  • Multiple consumer groups
  • Long retention for reprocessing

Topics:

  • video-uploads: New video uploads
  • video-processing: Video trimming/merging jobs
  • metadata-updates: Metadata enrichment
  • memory-video-creation: Memory video generation requests

Message Format:

{
  "video_id": "uuid",
  "user_id": "uuid",
  "operation": "trim|merge|create_memory",
  "parameters": {
    "start_time": 10,
    "end_time": 30,
    "clip_ids": ["uuid1", "uuid2"]
  },
  "priority": "high|normal|low",
  "timestamp": "2024-01-01T00:00:00Z"
}
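
A producer-side sketch that validates a job against this format before publishing (Kafka client wiring, e.g. confluent-kafka, is omitted):

```python
import json
from datetime import datetime, timezone

VALID_OPERATIONS = {"trim", "merge", "create_memory"}
VALID_PRIORITIES = {"high", "normal", "low"}

def build_processing_message(video_id, user_id, operation, parameters,
                             priority="normal"):
    """Build and validate a job message in the format above, returning
    the serialized bytes to publish to the video-processing topic."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"unknown priority: {priority}")
    message = {
        "video_id": video_id,
        "user_id": user_id,
        "operation": operation,
        "parameters": parameters,
        "priority": priority,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(message).encode("utf-8")

payload = build_processing_message(
    "uuid1", "uuid2", "trim", {"start_time": 10, "end_time": 30}, "high")
assert json.loads(payload)["operation"] == "trim"
```

Validating at the producer keeps malformed jobs out of the topic, where they would otherwise poison consumers and end up in retry loops.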

4.5 Video Processing Workers

Responsibilities:

  • Video trimming
  • Clip merging
  • Video encoding/transcoding
  • Thumbnail generation
  • Quality optimization

Technology:

  • FFmpeg for video processing
  • GPU acceleration (NVENC)
  • Containerized workers (Docker/Kubernetes)
  • Auto-scaling based on queue depth

Processing Pipeline:

Video Clip → Decode → Trim/Merge → Encode → Upload → Update Metadata

Optimization:

  • Parallel processing
  • GPU acceleration
  • Adaptive bitrate encoding
  • Caching intermediate results

5. Read Path (Read-Heavy)

5.1 Query Processing Service

Flow:

User Query → NLP Processing → Cache Check → Search DB → Fetch Videos → Process → Return

Natural Language Processing:

  1. Intent Recognition: Extract intent (create memory video)
  2. Entity Extraction: Extract entities (wife, Paris, date range)
  3. Query Expansion: Expand to related terms
  4. Vector Embedding: Convert to embedding vector

Example Query Processing:

Input: "Create a memory video of my wife and me in Paris"

Processing:
- Intent: CREATE_MEMORY_VIDEO
- Entities:
  - People: ["wife", "me"]
  - Location: "Paris"
  - Relationship: "wife" → person_id mapping
- Time: (optional, default: all time)
- Embedding: [0.123, 0.456, ...] (768-dim vector)
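
The extracted entities then have to be assembled into a search query. A sketch of that step, using the field names from the Elasticsearch mapping shown later in this post (the helper itself is illustrative):

```python
def build_search_query(user_id, people_ids=None, location=None,
                       date_from=None, query_vector=None):
    """Turn extracted entities into an Elasticsearch bool query,
    optionally wrapped in script_score for the hybrid vector case."""
    must, filters = [], [{"term": {"user_id": user_id}}]
    if location:
        must.append({"match": {"location_name": location}})
    if people_ids:
        must.append({"terms": {"detected_faces": people_ids}})
    if date_from:
        filters.append({"range": {"upload_timestamp": {"gte": date_from}}})
    query = {"bool": {"must": must, "filter": filters}}
    if query_vector is not None:
        query = {
            "script_score": {
                "query": {"bool": {"must": must, "filter": filters}},
                "script": {
                    "source": "cosineSimilarity(params.query_vector, "
                              "'embedding_vector') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        }
    return {"query": query}

q = build_search_query("user_789", people_ids=["person_123"], location="Paris")
assert q["query"]["bool"]["must"][0] == {"match": {"location_name": "Paris"}}
```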

5.2 Cache Layer (Redis)

Purpose:

  • Cache hot query results
  • Cache frequently accessed metadata
  • Cache user-specific data
  • Reduce database load

Cache Strategy:

  1. Query Result Cache
    Key: user:{user_id}:query:{query_hash}
    Value: {video_ids: [...], metadata: {...}}
    TTL: 10 minutes
    
  2. Recent Memories Cache
    Key: user:{user_id}:recent:memories
    Value: List of recent memory video IDs
    TTL: 1 hour
    
  3. Metadata Cache
    Key: video:{video_id}:metadata
    Value: Video metadata JSON
    TTL: 1 hour
    
  4. User Profile Cache
    Key: user:{user_id}:profile
    Value: User profile, person mappings
    TTL: 24 hours
    

Cache Invalidation:

  • Invalidate on video upload
  • Invalidate on metadata update
  • TTL-based expiration
  • Manual invalidation API
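
The query-result cache above follows a cache-aside pattern; a dict-backed sketch of the lookup path (a real deployment would use a Redis client with SETEX, and the hash truncation is an illustrative choice):

```python
import hashlib
import time

class QueryCache:
    """Cache-aside lookup using the user:{id}:query:{hash} key scheme
    above. Backed by a plain dict here instead of Redis."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    @staticmethod
    def key_for(user_id, query_text):
        qhash = hashlib.sha256(query_text.encode()).hexdigest()[:16]
        return f"user:{user_id}:query:{qhash}"

    def get_or_compute(self, user_id, query_text, compute, ttl=600):
        key = self.key_for(user_id, query_text)
        entry = self._store.get(key)
        if entry and entry[0] > time.time():
            return entry[1]            # cache hit
        value = compute()              # cache miss: run the real search
        self._store[key] = (time.time() + ttl, value)
        return value

cache = QueryCache()
calls = []
search = lambda: calls.append(1) or ["vid_1", "vid_2"]
assert cache.get_or_compute("u1", "wife in Paris", search) == ["vid_1", "vid_2"]
assert cache.get_or_compute("u1", "wife in Paris", search) == ["vid_1", "vid_2"]
assert len(calls) == 1  # second call served from cache
```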

5.3 Read Database - Search Engine

Database Choice: Elasticsearch

Why:

  • Full-text search capabilities
  • Vector search support (kNN)
  • Fast search performance
  • Horizontal scaling
  • Rich query DSL

Index Design:

{
  "mappings": {
    "properties": {
      "video_id": {"type": "keyword"},
      "user_id": {"type": "keyword"},
      "upload_timestamp": {"type": "date"},
      "location_name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
      "gps": {"type": "geo_point"},
      "detected_faces": {"type": "keyword"},
      "detected_objects": {"type": "text"},
      "scene_tags": {"type": "text"},
      "embedding_vector": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine"
      },
      "duration_seconds": {"type": "integer"},
      "processing_status": {"type": "keyword"}
    }
  }
}

Search Queries:

  1. Text Search (location, tags):
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"location_name": "Paris"}},
            {"terms": {"detected_faces": ["person_123", "person_456"]}}
          ],
          "filter": [
            {"term": {"user_id": "user_789"}},
            {"range": {"upload_timestamp": {"gte": "2024-01-01"}}}
          ]
        }
      }
    }

  2. Vector Search (semantic similarity):
    {
      "query": {
        "script_score": {
          "query": {"match_all": {}},
          "script": {
            "source": "cosineSimilarity(params.query_vector, 'embedding_vector') + 1.0",
            "params": {"query_vector": [0.123, 0.456, ...]}
          }
        }
      }
    }

  3. Hybrid Search (text + vector):
    {
      "query": {
        "bool": {
          "should": [
            {"match": {"location_name": "Paris"}},
            {"script_score": {
              "script": {
                "source": "cosineSimilarity(params.query_vector, 'embedding_vector') + 1.0",
                "params": {"query_vector": [...]}
              }
            }}
          ],
          "minimum_should_match": 1
        }
      }
    }

5.4 Vector Database (Alternative/Complementary)

Database Choice: Milvus / Pinecone / Weaviate

Why:

  • Optimized for vector search
  • Better performance for large-scale vector search
  • Advanced vector indexing (IVF, HNSW)
  • Can complement Elasticsearch

Use Cases:

  • Primary vector search for semantic queries
  • Person similarity search
  • Scene similarity search
  • Cross-modal search (text-to-video)

Integration:

  • Use for pure vector search queries
  • Elasticsearch for hybrid (text + vector) queries
  • Sync embeddings between systems

6. Memory Video Creation Service

Workflow:

User Query → Query Processing → Search Videos → Select Clips → 
Trim & Merge → Generate Video → Store → Return URL

Step-by-Step:

  1. Query Processing
    • Parse natural language query
    • Extract entities (people, location, time)
    • Generate search criteria
  2. Video Search
    • Search Elasticsearch/Vector DB
    • Filter by user, location, people, time
    • Rank by relevance
    • Select top N clips (10-20 clips)
  3. Clip Selection Algorithm
    def select_clips(videos, target_duration=120):
        """Greedy clip selection: most relevant clips first, until the
        target duration (default 120s) is filled.

        sort_by_relevance and extract_best_segment are assumed helpers:
        the first orders videos by search relevance score, the second
        picks the best ~10-15 second highlight from a video.
        """
        sorted_videos = sort_by_relevance(videos)

        selected = []
        total_duration = 0

        for video in sorted_videos:
            # Extract best segment (e.g., 10-15 seconds)
            segment = extract_best_segment(video)

            if total_duration + segment.duration <= target_duration:
                selected.append(segment)
                total_duration += segment.duration
            # No early break: a later, shorter segment may still fit
            # into the remaining duration budget.

        return selected
    
  4. Video Processing
    • Trim selected clips
    • Add transitions
    • Merge into single video
    • Add music/effects (optional)
    • Generate thumbnail
  5. Optimization
    • Cache common queries
    • Pre-generate popular memories
    • Use GPU acceleration
    • Parallel processing

Performance Optimization:

  • Caching: Cache common memory videos
  • Pre-computation: Pre-generate popular memories
  • Lazy Generation: Generate on-demand, cache result
  • Progressive Loading: Return partial results quickly

7. Blob Storage

Storage Choice: S3 / Azure Blob Storage

Organization:

s3://smart-glass-videos/
  ├── raw/
  │   └── {user_id}/
  │       └── {year}/{month}/{day}/
  │           └── {video_id}.mp4
  ├── processed/
  │   └── {user_id}/
  │       └── {video_id}/
  │           ├── 1080p.mp4
  │           ├── 720p.mp4
  │           └── thumbnail.jpg
  └── memories/
      └── {user_id}/
          └── {memory_id}.mp4

Features:

  • Lifecycle policies (move to cheaper storage)
  • CDN integration (CloudFront, Azure CDN)
  • Versioning for recovery
  • Encryption at rest
  • Cross-region replication

Optimization:

  • Use different storage tiers
  • Hot: Recent videos (S3 Standard)
  • Warm: Older videos (S3 Standard-IA)
  • Cold: Archived videos (S3 Glacier)
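
Tier selection reduces to a lookup on video age; the 30/90-day boundaries in this sketch follow the cost-optimization section later in the post:

```python
def storage_class_for(age_days):
    """Map video age (in days since upload) to an S3 storage class,
    per the hot/warm/cold tiering above."""
    if age_days < 30:
        return "STANDARD"        # hot: recent videos
    if age_days < 90:
        return "STANDARD_IA"     # warm: infrequently accessed
    return "GLACIER"             # cold: archive

assert storage_class_for(5) == "STANDARD"
assert storage_class_for(45) == "STANDARD_IA"
assert storage_class_for(400) == "GLACIER"
```

In practice the same thresholds would be expressed declaratively as an S3 lifecycle policy rather than application code; the function form just makes the tier boundaries explicit.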

Detailed Design

Video Upload Flow (Hybrid)

Smart Glasses (Online):

1. Smart Glass captures video → Local temp storage
   ↓
2. Upload Service receives chunked upload (via WiFi/Cellular)
   ↓
3. Store in S3 (direct upload with signed URL)
   ↓
4. Publish to Kafka topic: video-uploads
   ↓
5. Metadata Extraction Service processes:
   - Extract faces → Identify people
   - Detect objects/scenes
   - Generate embeddings
   - Extract GPS, timestamp
   ↓
6. Insert metadata into Cassandra (write DB)
   ↓
7. Index metadata in Elasticsearch (read DB)
   ↓
8. Cache metadata in Redis
   ↓
9. Trigger video processing (trimming, encoding)
   ↓
10. Notify phone app via push notification

Smart Glasses (Offline):

1. Smart Glass captures video → Local persistent storage
   ↓
2. Store with sync token in local queue
   ↓
3. When online → Resume upload flow above
   ↓
4. Phone app syncs when connected

Phone App Upload:

1. User selects video in phone app
   ↓
2. Phone App → Upload Service (chunked upload)
   ↓
3. Rest of flow same as smart glasses
   ↓
4. Phone App receives push notification when complete

Sync Flow (Phone App ↔ Cloud)

1. Phone App checks sync status
   ↓
2. Upload pending videos (from offline queue)
   ↓
3. Download new videos/metadata from cloud
   ↓
4. Resolve conflicts:
   - Last-write-wins for metadata
   - Merge for tags
   - User notification for conflicts
   ↓
5. Update local cache
   ↓
6. Notify user of sync completion

Memory Video Creation Flow

1. User: "Create a memory video of my wife and me in Paris"
   ↓
2. API Gateway receives request
   ↓
3. Query Processing Service:
   - NLP: Extract "wife", "me", "Paris"
   - Map "wife" → person_id (from user profile)
   - Generate query embedding
   ↓
4. Check Redis cache:
   - Key: user:{user_id}:query:{hash}
   - If hit → return cached result
   ↓
5. If cache miss → Search Elasticsearch:
   - Filter: user_id, location="Paris", faces=[wife_id, user_id]
   - Vector search: semantic similarity
   - Return top 20 clips
   ↓
6. Fetch video metadata from Cassandra
   ↓
7. Select best clips (algorithm):
   - Rank by relevance
   - Select segments totaling ~2 minutes
   ↓
8. Video Processing:
   - Trim clips
   - Merge with transitions
   - Generate video
   ↓
9. Store in S3 (memories bucket)
   ↓
10. Update metadata, cache result
    ↓
11. Return video URL to user

Scalability & Performance

Read Scaling

Strategies:

  1. Read Replicas: Multiple Elasticsearch replicas
  2. Caching: Multi-layer caching (Redis, CDN)
  3. Sharding: Partition data by user_id
  4. CDN: Cache popular memory videos
  5. Query Optimization: Index optimization, query tuning

Metrics:

  • Elasticsearch: 10+ nodes, 3 replicas per shard
  • Redis: Cluster mode, 6+ nodes
  • CDN: Global edge locations

Write Scaling

Strategies:

  1. Horizontal Partitioning: Cassandra sharding by user_id
  2. Async Processing: Queue-based processing
  3. Batch Processing: Batch metadata updates
  4. Direct Upload: Signed URLs for direct S3 upload
  5. Worker Scaling: Auto-scale processing workers

Metrics:

  • Cassandra: 10+ nodes, replication factor 3
  • Kafka: 6+ brokers, 3 partitions per topic
  • Processing Workers: Auto-scale 10-1000 instances

Performance Targets

| Operation | Target Latency | Throughput |
|---|---|---|
| Video Upload | < 5s | 100K uploads/sec |
| Metadata Search | < 500ms | 100K queries/sec |
| Memory Video Creation | < 5s | 10K creations/sec |
| Video Processing | < 30s | 10K videos/sec |

Reliability & Durability

Data Durability

  1. Video Storage
    • S3: 99.999999999% (11 9’s) durability
    • Cross-region replication
    • Versioning enabled
  2. Metadata
    • Cassandra: Replication factor 3
    • Elasticsearch: Replica count 2
    • Regular backups
  3. Queue
    • Kafka: Replication factor 3
    • Message retention: 7 days
    • Idempotent producers

High Availability

  1. Multi-Region Deployment
    • Active-active regions
    • Data replication across regions
    • Failover mechanisms
  2. Health Checks
    • Service health endpoints
    • Database connectivity checks
    • Queue depth monitoring
  3. Circuit Breakers
    • Prevent cascade failures
    • Fallback mechanisms
    • Graceful degradation
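The circuit breaker pattern above can be sketched in a few lines. This is a minimal illustration (thresholds and the fallback value are placeholders), not a substitute for a hardened library:

```python
import time

class CircuitBreaker:
    """After `max_failures` consecutive errors the circuit opens and calls
    return the fallback immediately (fail fast / graceful degradation)
    until `reset_s` elapses, then one trial call is allowed (half-open)."""

    def __init__(self, max_failures: int = 5, reset_s: float = 30.0):
        self.max_failures = max_failures
        self.reset_s = reset_s
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback=None):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_s:
                return fallback          # open: fail fast, don't hit the dependency
            self.opened_at = None        # half-open: allow a trial call
            self.failures = 0
        try:
            result = fn()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback
```

Wrapping, say, the Elasticsearch client in a breaker means a search outage degrades to cached results instead of piling up timed-out requests and cascading upstream.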

Disaster Recovery

  1. Backup Strategy
    • Daily backups of metadata
    • S3 versioning for videos
    • Point-in-time recovery
  2. Recovery Procedures
    • RTO (Recovery Time Objective): 1 hour
    • RPO (Recovery Point Objective): 15 minutes
    • Automated failover

Security & Privacy

Authentication & Authorization

  1. User Authentication
    • OAuth 2.0 / JWT tokens
    • Multi-factor authentication
    • Device registration
  2. Authorization
    • User can only access own videos
    • Role-based access control
    • API key management

Data Privacy

  1. Encryption
    • Encryption at rest (AES-256)
    • Encryption in transit (TLS 1.3)
    • End-to-end encryption (optional)
  2. Face Recognition
    • On-device processing option
    • User consent for face recognition
    • GDPR compliance
  3. Data Retention
    • User-controlled retention
    • Automatic deletion policies
    • Right to deletion

Cost Optimization

Storage Costs

  1. Storage Tiers
    • Hot: Recent videos (S3 Standard)
    • Warm: 30-90 days (S3 Standard-IA)
    • Cold: Archive (S3 Glacier)
  2. Compression
    • Video compression (H.265)
    • Metadata compression
    • Efficient encoding
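The hot/warm/cold tiering is expressed as an S3 lifecycle configuration. A sketch of the policy document in the shape boto3's `put_bucket_lifecycle_configuration` expects (bucket prefix and rule ID are illustrative):

```python
# Transitions videos to Standard-IA after 30 days and Glacier after 90,
# matching the hot/warm/cold tiers above.
lifecycle_policy = {
    "Rules": [
        {
            "ID": "tier-videos",
            "Status": "Enabled",
            "Filter": {"Prefix": "videos/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}
```

Because the policy runs inside S3 itself, tiering costs nothing in compute and needs no application code beyond setting it once per bucket.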

Compute Costs

  1. Processing Optimization
    • GPU acceleration (lower cost per video than CPU transcoding at scale)
    • Batch processing
    • Spot instances for non-critical jobs
  2. Caching
    • Reduce database queries
    • CDN for video delivery
    • Cache hit ratio > 80%

Database Costs

  1. Right-Sizing
    • Appropriate instance types
    • Reserved instances for steady load
    • Auto-scaling

Monitoring & Observability

Key Metrics

  1. Performance Metrics
    • API latency (p50, p95, p99)
    • Video processing time
    • Search query latency
    • Cache hit ratio
  2. System Metrics
    • CPU, memory, disk usage
    • Queue depth
    • Database connection pool
    • Error rates
  3. Business Metrics
    • Videos uploaded per day
    • Memory videos created
    • Active users
    • Storage usage
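The p50/p95/p99 latency metrics above come from percentile aggregation over request samples. A nearest-rank sketch, good enough for dashboard-style stats (production systems typically use streaming estimators such as t-digest instead of sorting raw samples):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of a batch of latency samples."""
    ranked = sorted(samples)
    k = max(0, min(len(ranked) - 1, round(p / 100 * len(ranked)) - 1))
    return ranked[k]

# Illustrative latency samples in milliseconds -- note how the tail
# percentiles expose the slow outliers that the median hides.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 500, 12, 15]
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```

This is why the alerting section keys off p95/p99 rather than averages: a handful of 500 ms requests barely moves the mean but dominates the tail.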

Logging

  1. Structured Logging
    • JSON format
    • Correlation IDs
    • Centralized logging (ELK stack)
  2. Distributed Tracing
    • Request tracing across services
    • Performance profiling
    • Error tracking

Alerting

  1. Critical Alerts
    • Service downtime
    • High error rates
    • Storage capacity
    • Queue backlog

Technology Stack Summary

| Component | Technology | Rationale |
|---|---|---|
| API Gateway | AWS API Gateway / Kong | Request routing, auth, rate limiting |
| Write DB | Cassandra / DynamoDB | High write throughput, scalability |
| Read DB | Elasticsearch | Full-text + vector search |
| Vector DB | Milvus / Pinecone | Optimized vector search |
| Cache | Redis Cluster | Fast in-memory caching |
| Queue | Kafka | High throughput, replayability |
| Blob Storage | S3 / Azure Blob | Scalable object storage |
| Video Processing | FFmpeg + GPU | Video encoding/transcoding |
| ML/AI | AWS Rekognition / Custom | Face recognition, object detection |
| NLP | BERT / OpenAI Embeddings | Natural language understanding |
| CDN | CloudFront / Azure CDN | Global content delivery |

Future Enhancements

  1. Real-Time Collaboration
    • Shared memory videos
    • Collaborative editing
    • Real-time notifications
  2. Advanced AI Features
    • Automatic video editing
    • Music selection
    • Story generation
    • Emotion detection
  3. AR Integration
    • Overlay memories in AR view
    • Location-based memory triggers
    • Augmented reality previews
  4. Social Features
    • Share memory videos
    • Comments and reactions
    • Memory collections

What Interviewers Look For

CQRS Architecture Skills

  1. Read/Write Separation
    • Separate read and write paths
    • Appropriate database choices
    • Red Flags: Single database, no separation, poor performance
  2. Write Path Design
    • High write throughput
    • Cassandra for writes
    • Kafka for async processing
    • Red Flags: Wrong database, synchronous processing, bottlenecks
  3. Read Path Design
    • Fast search queries
    • Elasticsearch for reads
    • Redis caching
    • Red Flags: Wrong database, no caching, slow queries

Video Processing Skills

  1. Async Processing
    • Message queue for video operations
    • Worker pool design
    • Red Flags: Synchronous processing, blocking, poor UX
  2. Video Storage
    • S3 for object storage
    • Lifecycle policies
    • Red Flags: Wrong storage, no lifecycle, high costs
  3. GPU Acceleration
    • Video processing optimization
    • Cost-effective processing
    • Red Flags: No optimization, slow processing, high costs

NLP/Voice Processing Skills

  1. Voice Command Processing
    • Speech-to-text
    • Natural language understanding
    • Red Flags: No NLP, poor accuracy, slow processing
  2. Query Understanding
    • Intent extraction
    • Entity recognition
    • Red Flags: No query understanding, poorly formed queries, low accuracy

Problem-Solving Approach

  1. Workload Analysis
    • Read-heavy vs. write-heavy
    • Appropriate architecture
    • Red Flags: No analysis, wrong architecture, poor performance
  2. Edge Cases
    • Video processing failures
    • Search failures
    • Network issues
    • Red Flags: Ignoring edge cases, no handling
  3. Trade-off Analysis
    • Consistency vs. performance
    • Cost vs. features
    • Red Flags: No trade-offs, dogmatic choices

System Design Skills

  1. Component Design
    • Video service
    • Search service
    • NLP service
    • Red Flags: Monolithic, unclear boundaries
  2. Caching Strategy
    • Multi-layer caching
    • CDN for videos
    • Red Flags: No caching, poor strategy, slow delivery
  3. Scalability Design
    • Horizontal scaling
    • Independent scaling
    • Red Flags: Vertical scaling, bottlenecks, no scaling

Communication Skills

  1. CQRS Explanation
    • Can explain read/write separation
    • Understands benefits
    • Red Flags: No understanding, vague explanations
  2. Architecture Justification
    • Explains design decisions
    • Discusses alternatives
    • Red Flags: No justification, no alternatives

Meta-Specific Focus

  1. CQRS Expertise
    • Deep understanding of CQRS
    • Appropriate use cases
    • Key: Show CQRS expertise
  2. Workload-Aware Design
    • Understanding of read/write patterns
    • Appropriate architecture
    • Key: Demonstrate workload analysis skills

Conclusion

This smart glasses system design addresses the unique challenges of handling both read-heavy search queries and write-heavy video processing workloads through:

  1. CQRS Architecture: Separating read and write paths
  2. Appropriate Database Choices: Cassandra for writes, Elasticsearch for reads
  3. Multi-Layer Caching: Redis for hot data, CDN for videos
  4. Async Processing: Kafka queue for video operations
  5. Horizontal Scaling: All components scale independently

Key Design Decisions:

  • Write Path: Cassandra + Kafka for high throughput
  • Read Path: Elasticsearch + Redis for fast search
  • Storage: S3 with lifecycle policies for cost optimization
  • Processing: Async workers with GPU acceleration

The system is designed to scale to millions of users while maintaining low latency for memory video creation and high reliability for video storage and processing.