Introduction
Smart glasses represent the next generation of augmented reality devices, combining advanced computer vision, AI, and voice control to create immersive experiences. This post explores the current smart glass architecture and designs a new feature that enables users to generate personalized memory videos through natural language voice commands.
Example Use Case: User says “Create a 2-minute video of me and my wife’s trip in Paris” and the system automatically finds, selects, and compiles relevant video clips into a polished memory video.
Smart Glass Specifications
Audio
Speaker:
- 2 custom-built open-ear speakers
- Loudness and bass: 76.1 dB(C)
Microphone:
- 6-mic system:
- 2 microphones in left arm
- 2 microphones in right arm
- 1 microphone near nose pad
- 1 contact microphone
Camera
Resolution:
- 12 MP ultra-wide camera
Image Acquisition:
- 3024 × 4032 pixels
Video Acquisition:
- 1440 × 1920 pixels @ 30 FPS
- 3x digital zoom
Battery
Frame Capacity:
- 960 mWh (248 mAh)
- Up to 6 hours mixed-use
Frame Charging Case:
- Up to 24 additional hours with fully charged case
Neural Band:
- Capacity: 134 mAh
- Usage time: Up to 18 hours
Water Resistance
- Frame: IPx4 (splash resistant)
- Neural Band: IPx7 (water resistant up to 1 meter)
Memory
Internal Storage:
- 32 GB Flash storage
- Stores up to 1,000 photos
- Stores 100+ 30-second videos
RAM:
- 2 GB LPDDR4x
Connectivity
Wi-Fi:
- Wi-Fi 6 certified
Bluetooth:
- Bluetooth 5.3 (Frame)
- Bluetooth 5.2 (Neural Band)
OS Compatibility:
- iOS 15.2 and above
- Android 10 minimum
Current Smart Glass Architecture
High-Level Architecture
┌─────────────────────────────────────────────────┐
│ Smart Glass Hardware │
│ (Cameras, Sensors, Display, Audio, Compute) │
└──────────────┬──────────────────────────────────┘
│
┌──────────────▼──────────────────────────────────┐
│ Smart Glass OS (AOSP-based) │
│ (Android Runtime, AR Framework, Services) │
└──────────────┬──────────────────────────────────┘
│
┌───────┴───────┐
│ │
┌──────▼──────┐ ┌─────▼──────┐
│ On-Device │ │ Cloud │
│ Services │ │ Services │
└─────────────┘ └─────────────┘
Hardware Components
1. Imaging System
- Camera: 12 MP ultra-wide camera
- Image Resolution: 3024 × 4032 pixels
- Video Resolution: 1440 × 1920 pixels @ 30 FPS
- Zoom: 3x digital zoom
- Storage Capacity: Up to 1,000 photos, 100+ 30-second videos
2. Display System
- Waveguide Display: Transparent AR display overlay
- Resolution: High-resolution micro-displays
- Field of View: Wide FOV for immersive AR experience
- Brightness: High brightness for outdoor use
3. Audio System
- Speakers: 2 custom-built open-ear speakers
- Audio Quality: 76.1 dB(C) loudness and bass
- Microphones: 6-mic system for voice capture and noise cancellation:
- 2 microphones in left arm
- 2 microphones in right arm
- 1 microphone near nose pad
- 1 contact microphone
- Spatial Audio: 3D audio processing capabilities
4. Compute Platform
- SoC: Custom mobile SoC optimized for AR/ML workloads
- AI Accelerator: Dedicated NPU for on-device ML inference
- Memory: 2 GB LPDDR4x RAM
- Storage: 32 GB Flash storage
- Battery:
- Frame: 960 mWh (248 mAh), up to 6 hours mixed-use
- Charging Case: Up to 24 additional hours
- Neural Band: 134 mAh, up to 18 hours
5. Connectivity
- Wi-Fi: Wi-Fi 6 certified
- Bluetooth: Bluetooth 5.3 (Frame), Bluetooth 5.2 (Neural Band)
- OS Support: iOS 15.2+, Android 10+
6. Sensors
- IMU: Accelerometer, gyroscope, magnetometer
- GPS: Location tracking
- Ambient Light: Adaptive display brightness
- Proximity: Hand/object detection
7. Water Resistance
- Frame: IPx4 (splash resistant)
- Neural Band: IPx7 (water resistant up to 1 meter)
Software Architecture
1. Smart Glass OS Layer
Base:
- Android-based OS (AOSP)
- Custom modifications for AR/VR
- Low-latency rendering pipeline
Core Services:
- AR Framework: SLAM (Simultaneous Localization and Mapping)
- Camera Service: Image/video capture and processing
- Audio Service: Voice capture, spatial audio
- Display Service: AR overlay rendering
- Sensor Service: Sensor fusion and tracking
2. On-Device Services
Media Capture Service:
- Real-time video recording
- Image capture
- Metadata extraction (GPS, timestamp, orientation)
- On-device compression
AR Service:
- SLAM tracking
- Object detection and tracking
- Plane detection
- Hand tracking
- Eye tracking
AI/ML Service:
- On-device ML inference
- Face detection and recognition
- Object detection
- Scene understanding
- Voice recognition (on-device)
3. Cloud Services
Media Storage:
- Video and image storage (blob storage)
- Metadata database
- Backup and sync
AI/ML Services:
- Advanced ML models (too large for device)
- Video analysis and understanding
- Natural language processing
- Content generation
4. Application Layer
Core Apps:
- Camera app
- AR apps
- Voice assistant
- Settings and configuration
Current System Components
1. Media Capture Pipeline
Flow:
Camera Hardware → Camera HAL → Camera Service → Media Framework → Storage
Components:
- Camera HAL: Hardware abstraction layer
- Camera Service: Manages camera sessions
- Media Framework: Encoding, compression
- Storage: Local storage + cloud sync
Features:
- 1440 × 1920 @ 30 FPS video recording
- Real-time stabilization
- HDR processing
- Low-light enhancement
- 3x digital zoom
2. AR Framework
SLAM System:
- Visual SLAM: Camera-based tracking
- IMU Fusion: Sensor fusion for accuracy
- Relocalization: Recovery from tracking loss
- Mapping: Dense/sparse mapping
Rendering Pipeline:
- Compositor: AR overlay composition
- Rendering Engine: 3D graphics rendering
- Display Driver: Waveguide display control
3. Voice Assistant
Hardware:
- 6-mic system for superior voice capture:
- 2 microphones in left arm
- 2 microphones in right arm
- 1 microphone near nose pad
- 1 contact microphone
- 2 custom-built open-ear speakers (76.1 dB(C) loudness and bass)
On-Device Components:
- Wake Word Detection: “Hey Meta” detection
- Voice Activity Detection: Detect when user is speaking
- Beamforming: Use 6-mic array for noise cancellation and directional audio
- On-Device NLU: Basic intent recognition
- Command Execution: Execute simple commands locally
Cloud Components:
- Advanced NLU: Complex natural language understanding
- Conversational AI: Multi-turn conversations
- Knowledge Graph: Entity recognition and resolution
4. Cloud Infrastructure
Storage:
- Blob Storage: Video/image storage (S3-like)
- Metadata DB: Structured metadata (Cassandra/PostgreSQL)
- Search Index: Full-text and semantic search (Elasticsearch)
Processing:
- Video Processing: Transcoding, analysis
- AI/ML Inference: Large model inference
- Content Generation: Video editing, effects
New Feature: Voice-Controlled Video Generation
Feature Requirements
User Command: “Create a 2-minute video of me and my wife’s trip in Paris”
Functional Requirements:
- Voice Command Recognition
- Understand natural language command
- Extract entities (people, location, time, duration)
- Parse intent (create memory video)
- Video Discovery
- Search user’s video library
- Filter by location (Paris)
- Filter by people (user + wife)
- Filter by time (trip dates)
- Video Selection
- Select best clips (10-20 clips)
- Rank by relevance, quality, diversity
- Ensure 2-minute total duration
- Video Generation
- Trim selected clips
- Add transitions
- Add music/effects
- Generate final 2-minute video
- Delivery
- Return video quickly (< 5 seconds)
- Store in user’s library
- Option to share
Non-Functional Requirements
- Latency: Generate video in 2-5 seconds
- Quality: High-quality output (1080p minimum)
- Scalability: Handle millions of users
- Reliability: 99.9% success rate
- Privacy: Process user data securely
System Architecture for New Feature
High-Level Flow (Hybrid Mode)
User Voice Command (Smart Glass)
↓
Voice Recognition (On-Device)
↓
NLP Processing (On-Device or Cloud)
↓
WiFi Direct Connection to Phone
↓
Transfer Request to Phone App
↓
Phone App: Video Search (Local + Cloud Metadata)
↓
Phone App: Fetch Videos/Images via WiFi Direct
↓
Phone App: Video Selection Algorithm
↓
Phone App: Video Processing Pipeline (On-Phone)
↓
Phone App: Generate 2-minute Video
↓
Transfer Final Video to Smart Glass via WiFi Direct
↓
Display on Smart Glass
Component Architecture (Hybrid Mode)
┌─────────────────────────────────────────┐
│ Smart Glass Device (Local) │
│ ┌───────────────────────────────────┐ │
│ │ Voice Capture & Wake Word │ │
│ │ On-Device NLU │ │
│ │ WiFi Direct Client │ │
│ │ Media Display │ │
│ └──────────────┬──────────────────────┘ │
└─────────────────┼───────────────────────┘
│
┌────────▼────────┐
│ WiFi Direct │
│ (P2P Connection)│
└────────┬────────┘
│
┌─────────────────▼───────────────────────┐
│ Phone App (Local Processing) │
│ ┌───────────────────────────────────┐ │
│ │ WiFi Direct Server │ │
│ │ Media Library Manager │ │
│ │ Video Search Engine │ │
│ │ Video Processing Engine │ │
│ │ (FFmpeg, GPU Acceleration) │ │
│ └──────────────┬──────────────────────┘ │
└─────────────────┼───────────────────────┘
│
┌────────┴────────┐
│ │
┌────▼────┐ ┌─────▼─────┐
│ Phone │ │ Cloud │
│ Storage │ │ Services │
│ (Local) │ │ (Optional)│
└─────────┘ └───────────┘
│
┌──────▼──────┐
│ Metadata │
│ Database │
│ (Cloud) │
└─────────────┘
Hybrid Mode Architecture Details
Primary Path: Phone-Based Processing (WiFi Direct)
Flow:
- Smart Glass captures voice command
- Smart Glass processes intent locally or sends to cloud for NLP
- Smart Glass establishes WiFi Direct connection to phone
- Phone App receives request and searches local media library
- Phone App fetches videos/images from phone storage (or cloud if needed)
- Phone App processes videos locally (trimming, merging, effects)
- Phone App generates final 2-minute video on phone
- Phone App transfers final video to Smart Glass via WiFi Direct
- Smart Glass displays video
Fallback Path: Cloud Processing
- If WiFi Direct unavailable → Use cloud processing
- If phone processing fails → Fallback to cloud
- If videos not on phone → Fetch from cloud storage
Storage Architecture: Local vs Remote
Storage Layers:
1. Local Storage (On Smart Glass Device)
- Purpose: Temporary cache, recent media, offline access
- Capacity: 32 GB Flash storage (stores up to 1,000 photos, 100+ 30-second videos)
- Use Cases:
- Recent videos/photos cache
- Offline viewing
- Quick access to frequently viewed content
- Temporary storage before cloud sync
2. Remote Storage (Cloud - S3/Azure Blob)
- Purpose: Primary storage, long-term archive, backup
- Capacity: Unlimited (scales with usage)
- Use Cases:
- All user videos and photos (primary storage)
- Processed/generated videos
- Long-term archive
- Cross-device sync
- Backup and disaster recovery
Storage Flow:
Smart Glass Device
│
├─→ Local Cache (Recent videos, metadata cache)
│ └─→ Limited capacity, fast access
│
└─→ Cloud Sync → Remote Blob Storage (S3)
└─→ Unlimited capacity, accessible from anywhere
Why Remote Storage (S3) is Used:
- Scalability: Handle petabytes of video data
- Durability: 99.999999999% (11 9’s) durability
- Accessibility: Access from any device, anywhere
- Cost: Cost-effective for large-scale storage
- Backup: Automatic backup and replication
- Processing: Cloud services can directly access videos for processing
Hybrid Approach:
- Local: Fast access, offline capability, recent content
- Remote: Unlimited storage, backup, cross-device sync, processing
- Sync: Automatic sync between local and remote
Local WiFi Transfer Options
Option 1: Direct WiFi Transfer (Phone ↔ Smart Glass)
Architecture:
Phone (WiFi Direct/Hotspot) ←→ Smart Glass (WiFi Client)
│ │
└──→ Local WiFi Network ───────┘
Implementation Approaches:
1. WiFi Direct (Peer-to-Peer)
- Technology: WiFi Direct (P2P)
- Setup: Direct connection between devices (no router needed)
- Speed: Up to 250 Mbps (WiFi 5) or 1+ Gbps (WiFi 6)
- Range: ~200 meters
- Use Case: Fast local transfer, no internet required
Flow:
1. Phone creates WiFi Direct group
2. Smart Glass discovers and connects
3. Phone acts as Group Owner (GO)
4. Transfer media files directly
5. No internet connection needed
Pros:
- ✅ Fast transfer speed
- ✅ No internet required
- ✅ Private (local only)
- ✅ No data charges
- ✅ Low latency
Cons:
- ❌ Requires both devices nearby
- ❌ Setup complexity
- ❌ Battery drain on both devices
2. Local WiFi Network (Same Network)
- Technology: Standard WiFi (802.11)
- Setup: Both devices on same WiFi network
- Speed: Depends on WiFi standard (WiFi 6: 1+ Gbps)
- Range: WiFi router range
- Use Case: Home/office sync, multiple devices
Flow:
1. Phone and Smart Glass connect to same WiFi router
2. Devices discover each other via mDNS/Bonjour
3. Establish direct connection (IP addresses)
4. Transfer media files over local network
5. No data leaves local network
Pros:
- ✅ Fast transfer (local network speed)
- ✅ No internet required
- ✅ Can sync multiple devices
- ✅ Works with existing WiFi
- ✅ Lower battery drain than WiFi Direct
Cons:
- ❌ Requires WiFi router
- ❌ Both devices must be on same network
- ❌ Router range limitation
3. Phone Hotspot Mode
- Technology: Phone creates WiFi hotspot
- Setup: Phone creates hotspot, Smart Glass connects
- Speed: Depends on phone’s WiFi capability
- Use Case: On-the-go sync, no router available
Flow:
1. Phone enables WiFi hotspot
2. Smart Glass connects to phone's hotspot
3. Phone and Smart Glass on same network
4. Transfer media files
5. Phone uses cellular data for internet (if needed)
Pros:
- ✅ Works anywhere (no router needed)
- ✅ Phone controls connection
- ✅ Can use cellular data for cloud sync
- ✅ Portable
Cons:
- ❌ Phone battery drain
- ❌ Slower than WiFi Direct
- ❌ May use cellular data (if enabled)
Implementation Details:
Discovery Protocol:
# mDNS/Bonjour for device discovery
import zeroconf
class DeviceDiscovery:
def discover_devices(self):
# Discover Smart Glass devices on local network
zeroconf.Zeroconf().browse_services(
"_metaglass._tcp.local.",
"_services._dns-sd._udp.local."
)
Transfer Protocol:
# HTTP server on phone, client on Smart Glass
# Or use WebRTC for peer-to-peer transfer
class LocalTransfer:
def transfer_media(self, phone_ip, files):
# Connect to phone's local HTTP server
for file in files:
download(f"http://{phone_ip}:8080/media/{file}")
Security:
- Pairing: Initial pairing via QR code or Bluetooth
- Encryption: TLS/SSL for transfer
- Authentication: Device certificates or tokens
- Authorization: User approval for transfers
Option 2: Hybrid Approach (Local + Cloud)
Smart Sync Strategy:
Priority 1: Local WiFi Transfer (if available)
↓ (if local transfer fails or unavailable)
Priority 2: Cloud Sync (via internet)
Benefits:
- Fast local transfer when devices are nearby
- Fallback to cloud when devices are apart
- Best of both worlds
Implementation:
class SmartSync:
def sync_media(self, files):
# Try local WiFi first
if self.is_local_wifi_available():
try:
self.transfer_via_local_wifi(files)
return "Local transfer successful"
except Exception as e:
# Fallback to cloud
pass
# Fallback to cloud sync
self.sync_via_cloud(files)
return "Cloud sync initiated"
Option 3: Background Sync Service
Continuous Sync:
- Background service on phone
- Automatically syncs new media when devices are on same network
- Incremental sync (only new/changed files)
- Low power consumption
Features:
- Auto-Discovery: Automatically find Smart Glass when on same network
- Incremental Sync: Only transfer new/changed files
- Background Processing: Sync without user intervention
- Bandwidth Management: Throttle during active use
- Resume Support: Resume interrupted transfers
Comparison of Options:
| Option | Speed | Setup | Internet Required | Battery Impact | Use Case |
|---|---|---|---|---|---|
| WiFi Direct | Very Fast | Medium | No | High | Fast one-time transfer |
| Local WiFi | Fast | Easy | No | Low | Regular sync at home |
| Phone Hotspot | Medium | Easy | Optional | Medium | On-the-go sync |
| Cloud Sync | Medium | Easy | Yes | Low | Always available |
Recommended Approach: Hybrid
Implementation Strategy:
- Primary: Local WiFi transfer when devices are on same network
- Fallback: Cloud sync when local transfer unavailable
- User Choice: Allow user to choose sync method
- Auto-Detection: Automatically detect best method
Example User Flow:
User opens Smart Glass app on phone
↓
App detects Smart Glass on local WiFi
↓
"Sync media via local WiFi?" → User confirms
↓
Fast local transfer (no internet, no data charges)
↓
Media available on Smart Glass immediately
Technical Implementation:
Phone Side (Server):
# Flask/FastAPI server on phone
from flask import Flask, send_file
app = Flask(__name__)
@app.route('/media/<filename>')
def get_media(filename):
return send_file(f'/storage/media/{filename}')
@app.route('/sync/initiate')
def initiate_sync():
# Return list of available media
return {'files': list_media_files()}
Smart Glass Side (Client):
# Client on Smart Glass
class MediaSyncClient:
def sync_from_phone(self, phone_ip):
# Discover phone
phone_url = f"http://{phone_ip}:8080"
# Get file list
files = requests.get(f"{phone_url}/sync/initiate").json()
# Download files
for file in files['files']:
download(f"{phone_url}/media/{file}")
Benefits of Local WiFi Transfer:
- ✅ Speed: Much faster than cloud (local network speed)
- ✅ Privacy: Data never leaves local network
- ✅ Cost: No data charges
- ✅ Reliability: Works offline
- ✅ Latency: Lower latency than cloud
- ✅ Bandwidth: Doesn’t consume internet bandwidth
Detailed Component Design
1. Voice Recognition Service
On-Device (Smart Glass):
Wake Word Detection:
- Always-on wake word detection (“Hey Meta”)
- Low-power DSP processing
- < 100ms latency
Voice Capture:
- 6-mic system:
- 2 microphones in left arm
- 2 microphones in right arm
- 1 microphone near nose pad
- 1 contact microphone
- Beamforming: Use mic array for directional audio capture
- Noise cancellation: Multi-mic noise reduction
- Voice activity detection: Detect when user is speaking
- Audio quality: Optimized for voice commands
On-Device NLU (Basic):
- Simple command recognition
- Intent classification (create_video, take_photo, etc.)
- Basic entity extraction
- Fallback to cloud for complex queries
Cloud NLU (Advanced):
Natural Language Understanding:
# Example: "Create a 2-minute video of me and my wife's trip in Paris"
Intent: CREATE_MEMORY_VIDEO
Entities:
- Duration: 2 minutes
- People: ["me", "wife"]
- Location: "Paris"
- Time: (infer from context - recent trip)
- Relationship: "wife" → person_id mapping
NLP Pipeline:
- Speech-to-Text: Convert audio to text
- Intent Recognition: Classify intent
- Entity Extraction: Extract entities (NER)
- Relationship Resolution: Map “wife” to person_id
- Query Generation: Generate search query
Technology:
- Speech-to-Text: Whisper, Google Speech-to-Text
- NLU: BERT-based models, GPT models
- Entity Recognition: spaCy, NER models
- Knowledge Graph: Resolve relationships
2. WiFi Direct Connection Service
On Smart Glass (Client):
WiFi Direct Client:
- Discover phone devices
- Connect to phone’s WiFi Direct group
- Maintain connection
- Handle reconnection
Implementation:
// Android WiFi Direct API
class WiFiDirectClient {
fun discoverPeers() {
val manager = context.getSystemService(Context.WIFI_P2P_SERVICE) as WifiP2pManager
manager.discoverPeers(channel, object : WifiP2pManager.ActionListener {
override fun onSuccess() {
// Peers discovered
}
override fun onFailure(reasonCode: Int) {
// Handle failure
}
})
}
fun connectToDevice(device: WifiP2pDevice) {
val config = WifiP2pConfig().apply {
deviceAddress = device.deviceAddress
}
manager.connect(channel, config, connectionListener)
}
}
On Phone App (Server):
WiFi Direct Server:
- Create WiFi Direct group
- Accept connections from Smart Glass
- Serve media files
- Handle video generation requests
Implementation:
class WiFiDirectServer {
fun createGroup() {
manager.createGroup(channel, object : WifiP2pManager.ActionListener {
override fun onSuccess() {
// Group created, phone is Group Owner
}
override fun onFailure(reasonCode: Int) {
// Handle failure
}
})
}
fun startMediaServer() {
// Start HTTP server for media transfer
val server = NanoHTTPD(8080)
server.start()
}
}
3. Phone App: Video Search Service
Search Strategy (Hybrid):
1. Local Search (Primary):
- Search phone’s local media library
- Use MediaStore API (Android)
- Filter by location, date, people
- Fast, no network required
2. Cloud Metadata Search (Optional):
- Query cloud metadata database for video locations
- Download videos not on phone if needed
- Sync metadata for better search
Phone App Implementation:
class PhoneVideoSearch {
fun searchLocalVideos(query: VideoQuery): List<Video> {
// Search local MediaStore
val projection = arrayOf(
MediaStore.Video.Media._ID,
MediaStore.Video.Media.DATA,
MediaStore.Video.Media.DATE_TAKEN,
MediaStore.Video.Media.DURATION
)
val selection = buildSelection(query)
val cursor = contentResolver.query(
MediaStore.Video.Media.EXTERNAL_CONTENT_URI,
projection,
selection,
null,
null
)
return cursor?.use { parseVideos(it) } ?: emptyList()
}
fun searchCloudMetadata(query: VideoQuery): List<VideoMetadata> {
// Query cloud metadata database
// Return metadata for videos (including those not on phone)
return cloudService.searchMetadata(query)
}
fun fetchVideoIfNeeded(metadata: VideoMetadata): File? {
// If video not on phone, download from cloud
if (!isVideoOnPhone(metadata.videoId)) {
return downloadFromCloud(metadata.blobUrl)
}
return getLocalVideoFile(metadata.videoId)
}
}
Local Metadata Storage (SQLite on Phone):
CREATE TABLE local_video_metadata (
video_id TEXT PRIMARY KEY,
file_path TEXT,
timestamp INTEGER,
location_name TEXT,
gps_lat REAL,
gps_lon REAL,
detected_faces TEXT, -- JSON array
duration_seconds INTEGER,
file_size INTEGER,
last_modified INTEGER
);
CREATE INDEX idx_location ON local_video_metadata(location_name);
CREATE INDEX idx_timestamp ON local_video_metadata(timestamp);
Search Query (Phone App):
fun searchVideos(location: String, people: List<String>, timeRange: TimeRange): List<Video> {
// 1. Search local videos first
val localVideos = searchLocalVideos(location, people, timeRange)
// 2. If not enough results, search cloud metadata
if (localVideos.size < MIN_RESULTS) {
val cloudMetadata = searchCloudMetadata(location, people, timeRange)
val cloudVideos = cloudMetadata.map { fetchVideoIfNeeded(it) }
return localVideos + cloudVideos.filterNotNull()
}
return localVideos
}
4. Phone App: Video Selection Algorithm
Location: Phone App (Local Processing)
Selection Criteria:
1. Relevance Score:
- Location match (Paris)
- People match (user + wife)
- Time match (trip dates)
- Semantic similarity
2. Quality Score:
- Video resolution
- Stability (no shake)
- Lighting quality
- Audio quality
3. Diversity Score:
- Different scenes/locations
- Different times of day
- Different activities
- Temporal distribution
Selection Algorithm (Phone App):
class VideoSelectionEngine {
fun selectClips(videos: List<Video>, targetDuration: Int = 120): List<VideoClip> {
// Score and rank videos
val scoredVideos = videos.map { video ->
val score = (
relevanceScore(video) * 0.5f +
qualityScore(video) * 0.3f +
diversityScore(video, selected) * 0.2f
)
Pair(video, score)
}.sortedByDescending { it.second }
// Select clips to fill duration
val selected = mutableListOf<VideoClip>()
var totalDuration = 0
for ((video, score) in scoredVideos) {
// Extract best segment (e.g., 10-15 seconds)
val segment = extractBestSegment(video, duration = 12)
if (totalDuration + segment.duration <= targetDuration) {
selected.add(segment)
totalDuration += segment.duration
} else {
// Fill remaining time
val remaining = targetDuration - totalDuration
if (remaining > 5) { // Minimum segment length
val finalSegment = extractSegment(video, duration = remaining)
selected.add(finalSegment)
}
break
}
}
return selected
}
private fun extractBestSegment(video: Video, duration: Int): VideoClip {
// Analyze video for interesting moments
val sceneChanges = detectSceneChanges(video)
val keyFrames = identifyKeyFrames(video)
// Extract segment around most interesting part
val bestStart = findBestStartTime(video, sceneChanges, keyFrames)
return VideoClip(video, bestStart, bestStart + duration)
}
}
Best Segment Extraction:
- Analyze video for interesting moments
- Detect scene changes
- Identify key frames
- Extract segments around highlights
5. Phone App: Video Processing Pipeline
Location: Phone App (Local Processing with GPU Acceleration)
Processing Steps:
1. Video Trimming:
- Extract selected segments from source videos
- Precise frame-level trimming using FFmpeg
- Maintain audio sync
- Use MediaCodec API for hardware acceleration
2. Transitions:
- Add smooth transitions between clips
- Fade in/out
- Crossfade
- Wipe transitions
3. Enhancement:
- Color correction
- Stabilization (if needed)
- Audio normalization
- Noise reduction
4. Music/Effects:
- Add background music (user preference or auto-select)
- Match music tempo to video pace
- Add sound effects (optional)
- Audio mixing
5. Final Assembly:
- Combine all segments
- Add title/credits (optional)
- Final encoding (H.264/H.265)
- Generate thumbnail
Processing Pipeline (Phone App):
class PhoneVideoProcessor {
private val ffmpeg = FFmpeg.getInstance(context)
suspend fun processVideo(clips: List<VideoClip>, config: VideoConfig): File {
// Step 1: Trim clips
val trimmedClips = clips.map { clip ->
trimVideo(clip.source, clip.startTime, clip.endTime)
}
// Step 2: Add transitions
val withTransitions = addTransitions(trimmedClips)
// Step 3: Enhance
val enhanced = enhanceVideo(withTransitions)
// Step 4: Add music
val withMusic = addMusic(enhanced, config.musicTrack)
// Step 5: Final assembly
val finalVideo = assembleVideo(withMusic)
return finalVideo
}
private suspend fun trimVideo(
source: File,
start: Long,
end: Long
): File {
// Use FFmpeg or MediaCodec for trimming
val output = File(context.cacheDir, "trimmed_${System.currentTimeMillis()}.mp4")
val command = arrayOf(
"-i", source.absolutePath,
"-ss", start.toString(),
"-t", (end - start).toString(),
"-c", "copy", // Fast copy mode
output.absolutePath
)
ffmpeg.execute(command)
return output
}
private suspend fun assembleVideo(clips: List<File>): File {
// Create concat file for FFmpeg
val concatFile = createConcatFile(clips)
val output = File(context.getExternalFilesDir(null), "final_video.mp4")
val command = arrayOf(
"-f", "concat",
"-safe", "0",
"-i", concatFile.absolutePath,
"-c", "copy",
output.absolutePath
)
ffmpeg.execute(command)
return output
}
}
Technology (Phone App):
- FFmpeg Mobile: FFmpeg for Android/iOS
- MediaCodec API: Hardware-accelerated encoding/decoding (Android)
- VideoToolbox: Hardware acceleration (iOS)
- GPU Acceleration: Use device GPU for processing
- ML Kit: On-device ML for scene detection (optional)
Performance Optimization:
- Hardware Acceleration: Use MediaCodec/VideoToolbox
- Parallel Processing: Process multiple clips in parallel
- Background Threading: Use coroutines/async tasks
- Memory Management: Stream processing for large videos
- Battery Optimization: Throttle processing to save battery
5. Caching and Optimization
Caching Strategy:
1. Query Result Cache:
Key: user:{user_id}:query:{query_hash}
Value: {video_ids: [...], metadata: {...}}
TTL: 10 minutes
2. Processed Video Cache:
Key: user:{user_id}:video:{video_hash}
Value: Processed video URL
TTL: 24 hours
3. Metadata Cache:
Key: video:{video_id}:metadata
Value: Video metadata JSON
TTL: 1 hour
Optimization:
- Pre-computation: Pre-generate popular memories
- Lazy Generation: Generate on-demand, cache result
- Progressive Loading: Return partial results quickly
- Parallel Processing: Process multiple segments in parallel
6. Phone App: Video Delivery Service
Delivery Flow (Hybrid Mode):
1. WiFi Direct Transfer (Primary):
Video Generated on Phone → Transfer via WiFi Direct → Smart Glass
↓
Display on Smart Glass immediately
↓
No cloud upload needed (local only)
2. Cloud Sync (Optional):
Video Generated on Phone → Upload to Cloud (background) →
Available on other devices
Phone App Implementation:
class VideoDeliveryService {
suspend fun deliverVideo(video: File, toMetaGlass: Boolean) {
if (toMetaGlass) {
// Transfer via WiFi Direct
wifiDirectService.sendFile(video, metaGlassAddress)
}
// Optionally sync to cloud in background
lifecycleScope.launch {
cloudService.uploadVideo(video)
}
}
}
Data Flow: Complete Example (Hybrid Mode)
Scenario: “Create a 2-minute video of me and my wife’s trip in Paris”
Step 1: Voice Capture (Smart Glass)
User: "Hey Meta, create a 2-minute video of me and my wife's trip in Paris"
↓
Wake word detected → Voice capture → On-device NLU or Cloud NLP
Step 2: NLP Processing (Smart Glass or Cloud)
Speech-to-Text: "create a 2-minute video of me and my wife's trip in Paris"
↓
Intent Recognition: CREATE_MEMORY_VIDEO
↓
Entity Extraction:
- Duration: 120 seconds
- People: ["me", "wife"]
- Location: "Paris"
- Time: (infer from recent trip dates)
↓
Map "wife" → person_id (from user profile)
Step 3: WiFi Direct Connection (Smart Glass → Phone)
Smart Glass discovers phone via WiFi Direct
↓
Establish P2P connection
↓
Send video generation request to phone app
↓
Request: {
intent: "CREATE_MEMORY_VIDEO",
duration: 120,
location: "Paris",
people: [user_id, wife_person_id],
timeRange: [trip_start, trip_end]
}
Step 4: Video Search (Phone App)
Phone App receives request
↓
Search Local Media Library:
- Query MediaStore for videos in Paris
- Filter by date range (trip dates)
- Filter by detected faces (user + wife)
↓
If not enough local videos:
- Query cloud metadata database
- Download missing videos from cloud
↓
Return: 50 candidate videos (local + cloud)
Step 5: Video Selection (Phone App)
Phone App: Score videos:
- Relevance: Location, people, time match
- Quality: Resolution, stability, lighting
- Diversity: Different scenes, times
↓
Select top 15-20 clips
↓
Extract best segments (10-15 seconds each)
↓
Total duration: ~120 seconds
Step 6: Video Processing (Phone App)
Phone App: Trim selected segments
- Use FFmpeg or MediaCodec (hardware accelerated)
↓
Phone App: Add transitions between clips
↓
Phone App: Enhance video (color, stabilization)
↓
Phone App: Add background music
↓
Phone App: Assemble final video
↓
Phone App: Encode to H.264 (1080p)
↓
Generated video stored on phone
Step 7: Video Delivery (Phone → Smart Glass via WiFi Direct)
Phone App: Transfer final video to Smart Glass
- Via WiFi Direct (fast local transfer)
- No internet required
↓
Smart Glass: Receive video
↓
Smart Glass: Display video in UI
↓
Optional: Phone App uploads to cloud in background
Total Time: 2-5 seconds (processing on phone, transfer via WiFi Direct)
Timing and Data Size Analysis
Step-by-Step Timing Breakdown
Scenario: Generate 2-minute video from 50 candidate videos
| Step | Operation | Location | Time | Notes |
|---|---|---|---|---|
| 1 | Voice Capture | Smart Glass | 50-200ms | Audio capture duration |
| 2 | Voice Recognition | Smart Glass/Cloud | 100-500ms | On-device: 100ms, Cloud: 500ms |
| 3 | NLP Processing | Smart Glass/Cloud | 200-800ms | Intent + entity extraction |
| 4 | WiFi Direct Connection | Smart Glass ↔ Phone | 500-2000ms | Discovery + connection setup |
| 5 | Request Transfer | Smart Glass → Phone | 10-50ms | Small JSON payload |
| 6 | Video Search (Local) | Phone | 100-500ms | MediaStore query |
| 7 | Video Search (Cloud) | Phone → Cloud | 200-1000ms | If needed, metadata query |
| 8 | Video Selection | Phone | 50-200ms | Algorithm execution |
| 9 | Video Processing | Phone | 1000-3000ms | Trimming, merging, encoding |
| 10 | Video Transfer | Phone → Smart Glass | 200-1000ms | WiFi Direct transfer |
| 11 | Display | Smart Glass | 50-100ms | Video playback start |
Total Time: 2.3-9.2 seconds
Optimized Path (Best Case):
- On-device NLP: 100ms
- Fast WiFi Direct: 500ms
- Local search only: 100ms
- Optimized processing: 1000ms
- Fast transfer: 200ms
- Total: ~1.9 seconds
Worst Case (Cloud Fallback):
- Cloud NLP: 800ms
- Slow WiFi Direct: 2000ms
- Cloud search: 1000ms
- Complex processing: 3000ms
- Slow transfer: 1000ms
- Total: ~7.8 seconds
Detailed Timing Analysis
1. Voice Recognition Timing
On-Device Processing:
- Wake word detection: 50-100ms
- Voice capture: 100-200ms (depends on command length)
- On-device NLU: 100-300ms
- Subtotal: 250-600ms
Cloud Processing (Fallback):
- Voice capture: 100-200ms
- Upload audio: 200-500ms (depends on network)
- Cloud NLP: 500-1000ms
- Subtotal: 800-1700ms
2. WiFi Direct Connection Timing
Connection Setup:
- Device discovery: 200-1000ms
- Pairing/authentication: 100-500ms
- Connection establishment: 200-500ms
- Subtotal: 500-2000ms
Optimization:
- Maintain persistent connection: 0ms (reuse)
- Auto-reconnect: 100-300ms
- Connection pooling: Reduce setup time
3. Video Search Timing
Local Search (Phone MediaStore):
- Query MediaStore: 50-200ms
- Filter by location: 50-100ms
- Filter by date: 50-100ms
- Filter by faces: 100-200ms (if metadata cached)
- Subtotal: 250-600ms
Cloud Metadata Search (If Needed):
- Network request: 100-300ms
- Database query: 50-200ms
- Response processing: 50-100ms
- Subtotal: 200-600ms
Video Download (If Not on Phone):
- Download 50MB video: 500-2000ms (depends on connection)
- Can be parallelized or skipped if using local videos only
4. Video Selection Timing
Algorithm Execution:
- Score calculation: 10-50ms per video
- 50 videos: 500-2500ms (can be parallelized)
- Sort and select: 10-50ms
- Subtotal: 510-2550ms
Optimization:
- Parallel scoring: Reduce to 100-500ms
- Early termination: Stop when enough clips found
- Cached scores: Reuse previous calculations
5. Video Processing Timing
Per Video Clip Processing:
Trimming (15 clips × 12 seconds each):
- Single clip trim: 50-200ms (hardware accelerated)
- 15 clips sequential: 750-3000ms
- 15 clips parallel: 200-500ms (optimized)
- Subtotal: 200-3000ms
Transitions:
- Add transitions: 100-300ms
- Subtotal: 100-300ms
Enhancement:
- Color correction: 50-200ms
- Stabilization: 100-500ms (if needed)
- Audio normalization: 50-100ms
- Subtotal: 200-800ms
Music Addition:
- Audio mixing: 100-300ms
- Subtotal: 100-300ms
Final Assembly:
- Concatenate clips: 200-500ms
- Encode final video: 500-1500ms (hardware accelerated)
- Generate thumbnail: 50-100ms
- Subtotal: 750-2100ms
Total Processing Time: 1350-6500ms
Optimized Processing (Parallel + Hardware):
- Parallel trimming: 200ms
- Transitions: 100ms
- Enhancement: 200ms
- Music: 100ms
- Assembly: 500ms
- Total: ~1100ms
6. Video Transfer Timing
WiFi Direct Transfer:
2-minute video (1080p, H.264):
- Video size: ~30-60MB (depending on bitrate)
- WiFi Direct speed: 100-500 Mbps (typical)
- Transfer time: 480-4800ms (30MB at 500Mbps = 480ms)
- Subtotal: 500-5000ms
Optimization:
- Compression: Reduce to 20MB → 320ms
- Progressive transfer: Start playback while transferring
- Quality adjustment: Lower quality for faster transfer
Data Size Analysis
Input Data Sizes
Source Videos:
- Smart Glass native format: 1440 × 1920 @ 30 FPS
- Average video: 50-60MB (1440×1920, 30 seconds, H.264)
- 50 candidate videos: 2.5-3GB total
- 15 selected clips: 750-900MB (15 × 50-60MB)
Metadata:
- Per video metadata: ~2-5KB (JSON)
- 50 videos metadata: 100-250KB
- Search index: Negligible (in-memory)
Request Payload:
- Voice command: 10-50KB (audio file)
- NLP request: 1-5KB (JSON)
- Video search request: 1-2KB (JSON)
- Total request: 12-57KB
Processing Data Sizes
Intermediate Files:
Trimmed Clips:
- Per clip: 10-25MB (12 seconds, 1440×1920)
- 15 clips: 150-375MB
- Storage: Temporary (cleaned after processing)
Enhanced Clips:
- Per clip: 12-25MB (after enhancement)
- 15 clips: 180-375MB
- Storage: Temporary
Final Video:
- 2-minute video (1440×1920 @ 30 FPS - Smart Glass native): 30-60MB
- 2-minute video (1080p): 30-60MB
- 2-minute video (720p): 15-30MB
- Recommended: 1440×1920 @ 30 FPS (30-60MB) - matches Smart Glass native resolution
Transfer Data Sizes
WiFi Direct Transfer:
Request (Smart Glass → Phone):
- Size: 1-5KB (JSON)
- Time: 10-50ms
Video Transfer (Phone → Smart Glass):
- Size: 30-60MB (final video)
- Time: 500-5000ms (depends on WiFi Direct speed)
- Bandwidth: 100-500 Mbps typical
Cloud Transfer (If Used):
- Upload request: 1-5KB
- Download videos: 50MB per video (if not on phone)
- Upload final video: 30-60MB
Storage Requirements
On Phone:
Source Videos:
- Average user: 10GB-100GB
- Power user: 100GB-500GB
- Storage: Phone internal storage or SD card
Processed Videos:
- Per generated video: 30-60MB
- 10 videos: 300-600MB
- 100 videos: 3-6GB
- Storage: Phone storage (can be cleaned periodically)
Metadata (SQLite):
- Per video: 2-5KB
- 1000 videos: 2-5MB
- Storage: Negligible
On Smart Glass:
Cached Videos:
- Total storage: 32 GB Flash storage
- Available for media: ~24-27GB (after system/OS)
- Recent videos: 1-5GB
- Metadata cache: 10-50MB
- Capacity: Stores up to 1,000 photos, 100+ 30-second videos
Cloud Storage (Optional):
Backup Storage:
- Per user average: 50GB-200GB
- Per user power user: 200GB-1TB
- Storage: S3/Azure Blob (scalable)
Bandwidth Analysis
WiFi Direct Bandwidth
Smart Glass Wi-Fi Specification:
- Wi-Fi 6 certified (802.11ax)
- Supports WiFi Direct over Wi-Fi 6
Theoretical Speeds:
- WiFi 5 (802.11ac): Up to 433 Mbps (single stream)
- WiFi 6 (802.11ax): Up to 1.2 Gbps (Smart Glass supports this)
- WiFi 6E: Up to 1.2 Gbps (6GHz band)
Practical Speeds (WiFi Direct with Wi-Fi 6):
- Typical: 100-500 Mbps
- Optimal conditions: 500-1000 Mbps (Wi-Fi 6 advantage)
- Poor conditions: 50-200 Mbps
Transfer Time for 2-minute Video:
| Video Size | WiFi Speed | Transfer Time |
|---|---|---|
| 30MB | 100 Mbps | 2.4 seconds |
| 30MB | 500 Mbps | 0.48 seconds |
| 60MB | 100 Mbps | 4.8 seconds |
| 60MB | 500 Mbps | 0.96 seconds |
Cloud Bandwidth (Fallback)
Upload (Phone → Cloud):
- Cellular 4G: 10-50 Mbps
- Cellular 5G: 100-1000 Mbps
- WiFi: 50-500 Mbps
Download (Cloud → Phone):
- Cellular 4G: 20-100 Mbps
- Cellular 5G: 200-2000 Mbps
- WiFi: 100-1000 Mbps
Performance Targets
Latency Targets:
| Operation | Target | Current (Optimized) |
|---|---|---|
| Voice Recognition | < 500ms | 100-300ms |
| WiFi Direct Connection | < 1s | 500ms (reused) |
| Video Search | < 500ms | 100-500ms |
| Video Selection | < 500ms | 50-200ms |
| Video Processing | < 3s | 1-2s |
| Video Transfer | < 2s | 0.5-1s |
| End-to-End | < 5s | 2-3s |
Throughput Targets:
| Operation | Target | Achieved |
|---|---|---|
| Video Search | 1000 videos/sec | 2000+ videos/sec (local) |
| Video Processing | 1 video/min | 2-3 videos/min (phone) |
| WiFi Direct Transfer | 100 Mbps | 100-500 Mbps |
Resource Usage Analysis
Phone Resource Usage
CPU Usage:
- Video processing: 50-80% (during processing)
- Idle: 5-10%
- Peak: 80-100% (intensive processing)
Memory Usage:
- App memory: 200-500MB
- Video processing: +500MB-2GB (temporary)
- Total peak: 700MB-2.5GB
Battery Impact:
- Video processing: 5-15% battery per video
- WiFi Direct: 2-5% battery per transfer
- Total per request: 7-20% battery
Storage Usage:
- Source videos: 10-100GB (user dependent)
- Processed videos: 300MB-6GB (cache)
- App data: 100-500MB
Smart Glass Resource Usage
CPU Usage:
- Voice recognition: 10-20%
- WiFi Direct: 5-10%
- Video playback: 20-40%
Memory Usage:
- Total RAM: 2 GB LPDDR4x
- App memory: 100-300MB
- Video cache: 200-500MB (limited by 2GB RAM)
- OS and system: ~500-800MB
- Available for apps: ~700MB-1.2GB
Storage Usage:
- Total storage: 32 GB Flash storage
- System/OS: ~5-8GB
- Available for media: ~24-27GB
- Video cache: 1-5GB (stores up to 1,000 photos, 100+ 30-second videos)
- Metadata cache: 10-50MB
Battery Impact:
- Frame battery: 960 mWh (248 mAh), up to 6 hours mixed-use
- Voice command: 1-2% battery (~6-12 minutes)
- WiFi Direct: 2-5% battery (~12-30 minutes)
- Video playback: 3-5% battery per minute
- Total per request: 6-12% battery (~36-72 minutes of usage)
- With charging case: Up to 24 additional hours
Optimization Strategies
Timing Optimization
1. Parallel Processing:
- Process multiple clips simultaneously
- Savings: 3-5x faster processing
2. Hardware Acceleration:
- Use MediaCodec/VideoToolbox
- Savings: 2-3x faster encoding
3. Caching:
- Cache search results
- Cache processed videos
- Savings: 50-90% faster for repeated queries
4. Progressive Processing:
- Start transfer while processing
- Stream results as available
- Savings: Perceived latency reduced
Data Size Optimization
1. Video Compression:
- Use H.265 (HEVC) instead of H.264
- Savings: 30-50% smaller files
2. Adaptive Quality:
- Lower quality for faster transfer
- Higher quality for final storage
- Savings: 50-70% faster transfer
3. Selective Download:
- Only download needed videos
- Skip videos already on phone
- Savings: Reduce transfer by 80-90%
4. Incremental Sync:
- Only sync new/changed videos
- Savings: Reduce sync time by 90%+
Cost Analysis
Phone Processing Costs
Battery Cost:
- Per video generation: 7-20% battery
- Daily usage (10 videos): 70-200% battery (requires charging)
Storage Cost:
- Phone storage: User-owned (no cost)
- SD card: Optional, $50-200 for 256GB-1TB
Processing Cost:
- CPU/GPU usage: No direct cost (device owned)
- Wear on device: Minimal (modern devices handle it well)
Cloud Costs (If Used)
Storage Costs:
- S3 Standard: $0.023/GB/month
- 100GB/user: $2.30/month
- 1M users: $2.3M/month
Processing Costs:
- EC2 instances: $0.10-0.50/hour
- Per video: $0.001-0.01 (if processed in cloud)
- 1M videos/month: $1K-10K/month
Transfer Costs:
- Data transfer: $0.09/GB (outbound)
- Per video (60MB): $0.0054
- 1M videos: $5.4K/month
Total Cloud Cost (1M users):
- Storage: $2.3M/month
- Processing: $1K-10K/month
- Transfer: $5.4K/month
- Total: ~$2.3M/month
Hybrid Mode Savings:
- Processing on phone: Save $1K-10K/month
- Local transfer: Save $5.4K/month
- Reduced storage: Save 50-80% (only backup needed)
- Total savings: Significant cost reduction
Benefits of Hybrid Mode (Phone Processing + WiFi Direct)
Advantages of Phone-Based Processing
1. Privacy
- ✅ Videos processed locally on phone
- ✅ No video data sent to cloud (unless user chooses)
- ✅ User has full control over data
- ✅ Meets privacy regulations (GDPR, etc.)
2. Performance
- ✅ Faster processing (no network latency)
- ✅ WiFi Direct transfer is very fast (1+ Gbps)
- ✅ Lower latency than cloud processing
- ✅ No dependency on internet connection
3. Cost
- ✅ No cloud processing costs
- ✅ No data transfer charges
- ✅ Reduced cloud storage usage
- ✅ Lower infrastructure costs
4. Reliability
- ✅ Works offline (no internet needed)
- ✅ No cloud service dependencies
- ✅ More reliable for local processing
- ✅ Better user experience
5. Scalability
- ✅ Processing distributed across user devices
- ✅ No cloud processing bottleneck
- ✅ Scales naturally with user base
- ✅ Reduced cloud infrastructure needs
When Cloud Processing is Used (Fallback)
Fallback Scenarios:
- WiFi Direct unavailable (devices too far apart)
- Phone processing fails (insufficient resources)
- Videos not on phone (need to fetch from cloud)
- Complex processing requiring cloud ML models
- User preference for cloud processing
Hybrid Strategy:
Try Phone Processing First (WiFi Direct)
↓ (if fails or unavailable)
Fallback to Cloud Processing
↓
Sync result back to phone/Smart Glass
Performance Optimization
Phone Processing Optimization
1. Hardware Acceleration:
- Use MediaCodec API (Android) for hardware encoding/decoding
- Use VideoToolbox (iOS) for hardware acceleration
- GPU acceleration for video processing
- NPU acceleration for ML inference (if available)
2. Parallel Processing:
- Process multiple video segments in parallel
- Use coroutines/async tasks for concurrent operations
- Background threading to avoid blocking UI
3. Memory Management:
- Stream processing for large videos
- Reuse buffers to reduce allocations
- Clear temporary files after processing
- Monitor memory usage
4. Battery Optimization:
- Throttle processing when battery is low
- Process during charging when possible
- Optimize CPU/GPU usage
- Background processing with low priority
5. Caching:
- Cache processed videos on phone
- Cache search results
- Cache metadata locally
- Avoid reprocessing same videos
WiFi Direct Optimization
1. Connection Management:
- Maintain persistent connection
- Auto-reconnect on disconnect
- Connection pooling
- Efficient discovery
2. Transfer Optimization:
- Chunked file transfer
- Compression for transfer
- Resume interrupted transfers
- Parallel transfers for multiple files
3. Bandwidth Management:
- Prioritize video transfer
- Throttle during active use
- Background transfer when idle
- Adaptive quality based on connection
Scalability (Hybrid Mode)
Distributed Processing:
- Processing distributed across user phones
- No central processing bottleneck
- Scales naturally with user base
- Reduced cloud infrastructure
Cloud Fallback:
- Cloud processing for complex cases
- Auto-scale cloud workers when needed
- Load balancing for cloud services
- CDN for video delivery (when using cloud)
Privacy and Security
Data Privacy
On-Device Processing:
- Process sensitive data locally when possible
- Minimize data sent to cloud
- Encrypt data in transit
User Consent:
- Explicit consent for video processing
- User controls for data sharing
- Right to delete data
Data Encryption:
- Encryption at rest (AES-256)
- Encryption in transit (TLS 1.3)
- End-to-end encryption (optional)
Security
Authentication:
- OAuth 2.0 / JWT tokens
- Device authentication
- Biometric authentication
Authorization:
- User can only access own videos
- Role-based access control
- API key management
Monitoring and Observability
Key Metrics
Performance:
- Voice recognition latency
- Video search latency
- Video generation time
- End-to-end latency
Quality:
- Video generation success rate
- User satisfaction (ratings)
- Video quality metrics
- Search relevance
System:
- Request rate
- Error rate
- Resource utilization
- Queue depth
Logging
Structured Logging:
- All requests logged with correlation IDs
- Error tracking and alerting
- Performance profiling
- User behavior analytics
Technology Stack (Hybrid Mode)
| Component | Technology | Location | Rationale |
|---|---|---|---|
| Voice Recognition | On-Device ML, Whisper | Smart Glass/Phone | Low latency, privacy |
| NLP | On-Device NLU, Cloud BERT/GPT | Smart Glass/Cloud | Hybrid processing |
| WiFi Direct | Android WiFi P2P API | Phone ↔ Smart Glass | Fast local transfer |
| Video Search | MediaStore API, SQLite | Phone (Local) | Fast local search |
| Metadata DB | SQLite (Phone), Cassandra (Cloud) | Phone/Cloud | Local + cloud hybrid |
| Video Processing | FFmpeg Mobile, MediaCodec | Phone | Hardware acceleration |
| Storage | Phone Storage, S3 (Optional) | Phone/Cloud | Local primary, cloud backup |
| Cache | In-Memory Cache (Phone) | Phone | Fast local caching |
Future Enhancements
- Real-Time Preview
- Show preview while generating on phone
- Allow user to adjust selection
- Interactive editing
- Advanced AI Features
- On-device ML for scene detection
- Automatic music selection (on phone)
- Story generation
- Emotion detection
- AR Integration
- Preview video in AR overlay
- Share in AR space
- Collaborative editing
- Personalization
- Learn user preferences (on device)
- Custom video styles
- Personalized music selection
- Offline Mode
- Full functionality without internet
- Sync when online
- Background sync
What Interviewers Look For
Hybrid Architecture Skills
- Phone-Cloud Hybrid Design
- Phone-based processing (primary)
- Cloud fallback
- WiFi Direct for local transfer
- Red Flags: Cloud-only, no fallback, poor local processing
- Local-First Processing
- Privacy-preserving
- Performance optimization
- Offline support
- Red Flags: No local processing, always cloud, privacy issues
- Device Coordination
- Smart Glass ↔ Phone communication
- WiFi Direct setup
- Sync strategies
- Red Flags: Poor coordination, no sync, connection issues
Video Processing Skills
- Video Search & Selection
- Semantic search
- Clip selection algorithm
- Relevance ranking
- Red Flags: No search, poor selection, irrelevant clips
- Video Processing Pipeline
- FFmpeg processing
- Hardware acceleration
- Phone resource management
- Red Flags: Slow processing, no acceleration, resource issues
- Memory Video Generation
- Clip trimming
- Merging with transitions
- 2-minute target duration
- Red Flags: Poor quality, wrong duration, no transitions
Problem-Solving Approach
- Privacy & Performance Balance
- Local processing for privacy
- Performance optimization
- Red Flags: No privacy consideration, poor performance
- Edge Cases
- Phone unavailable
- WiFi Direct failures
- Processing failures
- Red Flags: Ignoring edge cases, no handling
- Trade-off Analysis
- Phone resources vs performance
- Local vs cloud
- Red Flags: No trade-offs, dogmatic choices
System Design Skills
- Component Design
- Voice recognition service
- Video search service
- Processing service
- Red Flags: Monolithic, unclear boundaries
- Connectivity Design
- WiFi Direct setup
- Fallback mechanisms
- Red Flags: Poor connectivity, no fallback, unreliable
- Resource Management
- Phone CPU/GPU usage
- Battery optimization
- Storage management
- Red Flags: High resource usage, poor battery, storage issues
Communication Skills
- Hybrid Architecture Explanation
- Can explain phone-cloud hybrid
- Understands trade-offs
- Red Flags: No understanding, vague explanations
- Processing Pipeline Explanation
- Can explain video processing
- Understands optimization
- Red Flags: No understanding, vague
Meta-Specific Focus
- Hybrid Systems Expertise
- Local-cloud architecture
- Privacy-first design
- Key: Show hybrid systems expertise
- Edge Computing Knowledge
- Phone processing
- Resource optimization
- Key: Demonstrate edge computing knowledge
Conclusion
This hybrid mode system design enables Smart Glass users to generate personalized memory videos through natural language voice commands, with processing primarily happening on the user’s phone via WiFi Direct connection.
Key Architecture Decisions:
- Hybrid Processing: Phone-based processing (primary) + Cloud fallback
- WiFi Direct: Fast local transfer between phone and Smart Glass
- Local-First: Process videos on phone for privacy and performance
- Cloud Optional: Use cloud only when needed (fallback, sync)
Key Components:
- Voice Recognition: On-device + cloud NLP
- WiFi Direct: P2P connection between devices
- Phone App: Video search, selection, and processing
- Local Processing: FFmpeg + hardware acceleration on phone
- Delivery: WiFi Direct transfer to Smart Glass
Benefits:
- Privacy: Videos processed locally, no cloud upload required
- Performance: Faster processing (no network latency)
- Cost: Reduced cloud infrastructure costs
- Reliability: Works offline, no internet dependency
- Scalability: Distributed processing across user devices
Trade-offs:
- Phone Resources: Uses phone CPU/GPU/battery
- Device Dependency: Requires phone nearby
- Storage: Uses phone storage
- Complexity: More complex than pure cloud solution
The hybrid approach provides the best balance of privacy, performance, and user experience while maintaining the flexibility to fall back to cloud processing when needed.