Design a Global Translation Service with Hybrid Mode - System Design Interview

Introduction

A global translation service with hybrid mode provides seamless translation capabilities that work both offline and online. When offline, it uses local translation models stored on the device. When online, it can use remote server translation for better accuracy and support for more language pairs. This pattern is essential for mobile applications that need to work in areas with poor connectivity.

This post provides a detailed walkthrough of designing a hybrid translation service, covering offline-first architecture, local model management, network detection, remote translation API, caching strategies, model updates, and synchronization between local and remote translations. This is a common system design interview question that tests your understanding of distributed systems, mobile architecture, offline-first design, ML model deployment, and edge computing.

Problem Statement
Requirements
- Functional Requirements
- Non-Functional Requirements
Capacity Estimation
Core Entities
API
Data Flow
Database Design
- Schema Design
- Database Sharding Strategy
High-Level Design
Deep Dive
Summary

Problem Statement

Design a global translation service with hybrid mode that:

Works offline using local translation models
Uses remote server translation when online (WiFi or cellular)
Automatically switches between local and remote modes
Supports 100+ language pairs
Caches translations for offline access
Updates local models periodically
Syncs translation history across devices
Provides consistent translation quality
Handles network failures gracefully
Supports batch translation

Scale Requirements:

100 million+ users
1 billion+ translations per day
Peak: 100,000 translations per second
Average translation length: 50 words
Local model size: 50-200MB per language pair
Must work completely offline
Network detection latency: < 100ms
Translation latency: < 500ms (local), < 2s (remote)

Requirements

Functional Requirements

Core Features:

Translation: Translate text between languages
Hybrid Mode: Automatic switching between local and remote
Network Detection: Detect WiFi, cellular, or offline
Local Models: Download and manage local translation models
Translation Cache: Cache translations for offline access
Batch Translation: Translate multiple texts at once
Language Detection: Auto-detect source language
Model Updates: Update local models periodically
Translation History: Store translation history
Sync Across Devices: Sync translations across user devices

Out of Scope:

Real-time voice translation
Image translation (OCR + translation)
Video subtitle translation
Translation quality scoring
User authentication (assume existing auth system)

Non-Functional Requirements

Availability: 99.9% uptime (remote); local translation is always available for language pairs whose models are already on the device
Reliability: No translation loss, graceful degradation
Performance:
- Local translation: < 500ms
- Remote translation: < 2 seconds
- Network detection: < 100ms
- Cache lookup: < 10ms
Scalability: Handle 100K+ translations per second
Offline Support: Full functionality offline
Storage Efficiency: Minimize local storage usage
Battery Efficiency: Minimize battery drain

Capacity Estimation

Traffic Estimates

Total Users: 100 million
Daily Active Users (DAU): 10 million
Translations per Day: 1 billion
Peak Translation Rate: 100,000 per second
Normal Translation Rate: 10,000 per second
Offline Translations: 30% (300 million per day)
Online Translations: 70% (700 million per day)
Average Text Length: 50 words = 250 characters

Storage Estimates

Local Models (per device):

Popular language pairs: 20 pairs × 100MB = 2GB
All language pairs: 100 pairs × 100MB = 10GB
Average user: 5 pairs × 100MB = 500MB

Translation Cache (per device):

10,000 cached translations × 1KB = 10MB
100,000 cached translations × 1KB = 100MB

Remote Storage:

Translation history: 1B translations/day × 500 bytes = 500GB/day
30-day retention: ~15TB
Model storage: 100 language pairs × 200MB = 20GB

Total Storage: ~15TB

Bandwidth Estimates

Normal Traffic:

10,000 translations/sec × 2KB = 20MB/s = 160Mbps
Request + response data

Peak Traffic:

100,000 translations/sec × 2KB = 200MB/s = 1.6Gbps

Model Downloads:

1M model downloads/day × 100MB = 100TB/day = ~1.16GB/s = ~9.3Gbps

Total Peak: ~11Gbps
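
The figures above can be reproduced with a few lines of arithmetic. This is only a sketch: it uses the numbers stated in this section, assumes decimal units (1 KB = 1,000 bytes), and the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def to_gbps(bytes_per_second: float) -> float:
    """Convert bytes per second to gigabits per second (decimal units)."""
    return bytes_per_second * 8 / 1e9


translations_per_day = 1_000_000_000
# Average rate over a day; the section rounds this to 10,000 per second
avg_rate = translations_per_day / SECONDS_PER_DAY

# API traffic: about 2 KB per request + response
normal_gbps = to_gbps(10_000 * 2_000)
peak_gbps = to_gbps(100_000 * 2_000)

# Model downloads: 1M downloads/day x 100 MB each
model_bytes_per_day = 1_000_000 * 100 * 1_000_000
model_gbps = to_gbps(model_bytes_per_day / SECONDS_PER_DAY)

# History storage: 500 bytes per translation, 30-day retention, in TB
history_tb_30d = translations_per_day * 500 * 30 / 1e12

total_peak_gbps = peak_gbps + model_gbps
```

Model downloads dominate the bandwidth budget, which is why the design serves them from a CDN rather than from the API tier.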

Core Entities

Translation Request

request_id (UUID)
user_id (UUID)
source_language (VARCHAR)
target_language (VARCHAR)
source_text (TEXT)
translation_mode (local, remote, auto)
network_status (wifi, cellular, offline)
created_at (TIMESTAMP)

Translation Result

result_id (UUID)
request_id (UUID)
translated_text (TEXT)
confidence_score (FLOAT)
translation_mode (local, remote)
model_version (VARCHAR)
latency_ms (INT)
created_at (TIMESTAMP)

Translation Cache

cache_key (VARCHAR, hash of source_text + languages)
source_text (TEXT)
source_language (VARCHAR)
target_language (VARCHAR)
translated_text (TEXT)
created_at (TIMESTAMP)
last_accessed_at (TIMESTAMP)
access_count (INT)
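
One way to derive cache_key is sketched below. The function name make_cache_key, the use of SHA-256, and the normalization steps (NFC plus whitespace collapsing) are assumptions for illustration, not part of the original design. Normalizing before hashing lets trivially different inputs share one cache entry.

```python
import hashlib
import unicodedata


def make_cache_key(source_text: str, source_language: str, target_language: str) -> str:
    """Hypothetical helper: stable key for (text, source language, target language)."""
    # Collapse runs of whitespace and use a single Unicode form so that
    # visually identical inputs map to the same entry. Case is preserved
    # because it can change the translation.
    normalized = unicodedata.normalize("NFC", " ".join(source_text.split()))
    # A control character as separator keeps field boundaries unambiguous
    raw = "\x1f".join([source_language.lower(), target_language.lower(), normalized])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


key = make_cache_key("Hello,  how are you?", "en", "fr")
```

A SHA-256 hex digest is 64 characters long. The direction of the language pair is part of the key, so en-fr and fr-en never collide.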

Local Model

model_id (UUID)
language_pair (VARCHAR, e.g., “en-fr”)
model_version (VARCHAR)
model_size_bytes (BIGINT)
model_file_path (VARCHAR)
download_url (VARCHAR)
is_downloaded (BOOLEAN)
download_progress (INT, percentage)
last_updated_at (TIMESTAMP)

Translation History

history_id (UUID)
user_id (UUID)
source_text (TEXT)
translated_text (TEXT)
source_language (VARCHAR)
target_language (VARCHAR)
translation_mode (VARCHAR)
device_id (VARCHAR)
created_at (TIMESTAMP)

API

1. Translate Text

POST /api/v1/translate
Request:
{
  "source_text": "Hello, how are you?",
  "source_language": "en",
  "target_language": "fr",
  "mode": "auto",  // auto, local, remote
  "cache_enabled": true
}

Response:
{
  "request_id": "uuid",
  "translated_text": "Bonjour, comment allez-vous?",
  "source_language": "en",
  "target_language": "fr",
  "translation_mode": "local",
  "confidence_score": 0.95,
  "model_version": "v2.1",
  "latency_ms": 350,
  "cached": false
}

2. Batch Translate

POST /api/v1/translate/batch
Request:
{
  "texts": [
    "Hello",
    "Goodbye",
    "Thank you"
  ],
  "source_language": "en",
  "target_language": "fr",
  "mode": "auto"
}

Response:
{
  "translations": [
    {
      "source_text": "Hello",
      "translated_text": "Bonjour",
      "translation_mode": "local"
    },
    {
      "source_text": "Goodbye",
      "translated_text": "Au revoir",
      "translation_mode": "local"
    },
    {
      "source_text": "Thank you",
      "translated_text": "Merci",
      "translation_mode": "local"
    }
  ]
}

3. Detect Language

POST /api/v1/translate/detect
Request:
{
  "text": "Bonjour, comment allez-vous?"
}

Response:
{
  "detected_language": "fr",
  "confidence": 0.98
}

4. Get Available Languages

GET /api/v1/languages
Response:
{
  "languages": [
    {
      "code": "en",
      "name": "English",
      "local_model_available": true,
      "model_size_mb": 120
    },
    {
      "code": "fr",
      "name": "French",
      "local_model_available": true,
      "model_size_mb": 115
    }
  ],
  "total": 100
}

5. Download Local Model

POST /api/v1/models/download
Request:
{
  "language_pair": "en-fr",
  "priority": "high"
}

Response:
{
  "model_id": "uuid",
  "language_pair": "en-fr",
  "download_url": "https://...",
  "model_size_mb": 120,
  "estimated_download_time_seconds": 60
}

6. Get Translation History

GET /api/v1/translations/history?limit=20&offset=0
Response:
{
  "translations": [
    {
      "history_id": "uuid",
      "source_text": "Hello",
      "translated_text": "Bonjour",
      "source_language": "en",
      "target_language": "fr",
      "created_at": "2025-11-13T10:00:00Z"
    }
  ],
  "total": 100,
  "limit": 20,
  "offset": 0
}

Data Flow

Translation Flow (Online - Remote Mode)

User Requests Translation:
- User submits text for translation
- Client SDK detects network status (WiFi/cellular)
- Chooses remote mode
Cache Check:
- Client SDK checks local cache
- Cache hit: Returns cached translation
- Cache miss: Proceeds to remote
Remote Translation:
- Client SDK sends request to API Gateway
- API Gateway routes to Translation Service
- Translation Service:
  - Uses remote ML model or API
  - Generates translation
  - Returns result
Cache Update:
- Client SDK stores translation in local cache
- Updates cache statistics
Response:
- Client SDK returns translation to user
- Updates UI

Translation Flow (Offline - Local Mode)

User Requests Translation:
- User submits text for translation
- Client SDK detects offline status
- Chooses local mode
Cache Check:
- Client SDK checks local cache
- Cache hit: Returns cached translation
- Cache miss: Proceeds to local model
Local Model Translation:
- Client SDK:
  - Loads local translation model
  - Runs inference on device
  - Generates translation
Cache Update:
- Client SDK stores translation in local cache
- Updates cache statistics
Response:
- Client SDK returns translation to user
- Updates UI

Translation Flow (Hybrid Mode - Auto)

User Requests Translation:
- User submits text for translation
- Client SDK detects network status
Mode Selection:
- Network Manager:
  - Checks network connectivity
  - Checks network quality (WiFi vs cellular)
  - Checks local model availability
  - Selects optimal mode
Translation Execution:
- If online + good connection: Use remote
- If online + poor connection: Use local
- If offline: Use local
- If local model unavailable: Use remote (if online)
Fallback Handling:
- If remote fails: Fallback to local
- If local fails: Return error or cached result
Response:
- Client SDK returns translation
- Updates UI
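
The selection rules in this flow can be written as a small pure function, which makes the decision table easy to unit test. This is a minimal sketch: the function name choose_mode and its parameters are hypothetical, and "good connection" is assumed to mean a quality rating of excellent or good.

```python
from typing import Optional


def choose_mode(online: bool, quality: Optional[str], has_local_model: bool) -> Optional[str]:
    """Return 'remote', 'local', or None when no translation path exists."""
    good_connection = online and quality in ("excellent", "good")
    if good_connection:
        return "remote"
    if has_local_model:
        # Offline, or online with a poor connection
        return "local"
    if online:
        # Poor connection, but it is the only option
        return "remote"
    # Offline with no local model: caller serves a cached result or an error
    return None
```

Keeping the rule free of I/O means the network probe and the model lookup can be faked in tests.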

Model Download Flow

User Requests Model Download:
- User selects language pair to download
- Client SDK sends download request
Download Preparation:
- Model Service:
  - Validates language pair
  - Gets model metadata
  - Generates download URL
Model Download:
- Client SDK:
  - Downloads model file
  - Shows progress
  - Validates download integrity
Model Installation:
- Client SDK:
  - Extracts model file
  - Stores in local storage
  - Registers model with translation engine
Verification:
- Client SDK verifies model works
- Updates model status
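
The "validates download integrity" step could look like the sketch below. It assumes the server publishes a hex-encoded SHA-256 checksum for each model file; the source does not name the hash algorithm, and the function names and file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MB chunks so a 100MB+ model never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path: Path, expected_sha256: str) -> bool:
    """Compare the downloaded file's digest with the published checksum."""
    return sha256_of_file(path) == expected_sha256.lower()


# Usage with a stand-in file instead of a real model
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "en-fr.bin"
    model_path.write_bytes(b"fake model weights")
    expected = hashlib.sha256(b"fake model weights").hexdigest()
    ok = verify_model_file(model_path, expected)
    corrupted = verify_model_file(model_path, "0" * 64)
```

A file that fails verification should be deleted and downloaded again rather than registered with the translation engine.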

Database Design

Schema Design

Translation History Table (Sharded by user_id):

-- Assuming PostgreSQL on the server side: indexes are created with
-- separate statements rather than inline INDEX clauses
CREATE TABLE translation_history_0 (
    history_id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    source_text TEXT NOT NULL,
    translated_text TEXT NOT NULL,
    source_language VARCHAR(10) NOT NULL,
    target_language VARCHAR(10) NOT NULL,
    translation_mode VARCHAR(20) NOT NULL,
    device_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_user_id ON translation_history_0 (user_id);
CREATE INDEX idx_created_at ON translation_history_0 (created_at DESC);
CREATE INDEX idx_user_created ON translation_history_0 (user_id, created_at DESC);
-- Similar tables: translation_history_1, ..., translation_history_N

Translation Cache Table (Local - SQLite/Realm):

-- SQLite has no NOW() function and no inline INDEX clause
CREATE TABLE translation_cache (
    cache_key VARCHAR(64) PRIMARY KEY,
    source_text TEXT NOT NULL,
    source_language VARCHAR(10) NOT NULL,
    target_language VARCHAR(10) NOT NULL,
    translated_text TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    access_count INT DEFAULT 1
);
CREATE INDEX idx_languages ON translation_cache (source_language, target_language);
CREATE INDEX idx_last_accessed ON translation_cache (last_accessed_at DESC);

Local Models Table (Local - SQLite/Realm):

CREATE TABLE local_models (
    model_id UUID PRIMARY KEY,
    language_pair VARCHAR(20) NOT NULL,
    model_version VARCHAR(50) NOT NULL,
    model_size_bytes BIGINT NOT NULL,
    model_file_path VARCHAR(500) NOT NULL,
    is_downloaded BOOLEAN DEFAULT FALSE,
    download_progress INT DEFAULT 0,
    last_updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- One installed model per language pair
    UNIQUE (language_pair)
);

Model Metadata Table:

-- Assuming PostgreSQL: UNIQUE KEY is MySQL syntax, so use a named constraint
CREATE TABLE model_metadata (
    model_id UUID PRIMARY KEY,
    language_pair VARCHAR(20) NOT NULL,
    model_version VARCHAR(50) NOT NULL,
    model_size_bytes BIGINT NOT NULL,
    download_url VARCHAR(1000) NOT NULL,
    checksum VARCHAR(64) NOT NULL,
    supported_features JSON,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    CONSTRAINT uk_language_version UNIQUE (language_pair, model_version)
);

Database Sharding Strategy

Translation History Table Sharding:

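Shard by a hash of user_id
1000 shards: shard_id = hash(user_id) % 1000 (this is plain modulo hashing, not consistent hashing; changing the shard count remaps most users, so either fix the count up front and map shards to nodes, or use a consistent-hash ring)
All translations for a user in same shard
Enables efficient user history queries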

Shard Key Selection:

user_id ensures all translations for a user are in same shard
Enables efficient queries for user translation history
Prevents cross-shard queries for single user
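
The routing rule shard_id = hash(user_id) % 1000 needs a hash that is stable across processes and machines. Python's built-in hash() is salted per process for strings, so the sketch below uses a SHA-256 digest instead. The function names are hypothetical; the table naming follows the translation_history_N convention from the schema.

```python
import hashlib
import uuid

NUM_SHARDS = 1000


def shard_for_user(user_id: uuid.UUID, num_shards: int = NUM_SHARDS) -> int:
    """Map a user to a shard with a hash that is identical on every server."""
    digest = hashlib.sha256(user_id.bytes).digest()
    # The first 8 bytes are plenty for an even spread over 1000 shards
    return int.from_bytes(digest[:8], "big") % num_shards


def history_table_for_user(user_id: uuid.UUID) -> str:
    """Name of the history table that holds all of this user's rows."""
    return f"translation_history_{shard_for_user(user_id)}"


example_user = uuid.UUID("12345678-1234-5678-1234-567812345678")
example_shard = shard_for_user(example_user)
example_table = history_table_for_user(example_user)
```

Because the shard depends only on user_id, a history query for one user touches exactly one table.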

Replication:

Each shard replicated 3x for high availability
Master-replica setup for read scaling
Writes go to master, reads can go to replicas

High-Level Design

┌─────────────────────────────────────────────────────────────┐
│                    Mobile/Web Client                         │
│                                                              │
│  ┌──────────────┐                                           │
│  │ Client SDK   │                                           │
│  │ - Network   │                                           │
│  │   Detection  │                                           │
│  │ - Mode      │                                           │
│  │   Selection │                                           │
│  │ - Cache     │                                           │
│  │   Manager   │                                           │
│  └──────┬───────┘                                           │
│         │                                                    │
│         ├──────────────────┬──────────────────┐            │
│         │                  │                  │            │
│  ┌──────▼──────┐  ┌───────▼──────┐  ┌───────▼──────┐     │
│  │ Local       │  │ Remote       │  │ Cache         │     │
│  │ Translation │  │ Translation  │  │ Manager       │     │
│  │ Engine      │  │ Client       │  │               │     │
│  └──────┬──────┘  └───────┬──────┘  └───────┬──────┘     │
│         │                  │                  │            │
│  ┌──────▼──────────────────▼──────────────────▼──────┐     │
│  │         Local Storage (SQLite/Realm)              │     │
│  │  - Translation models                             │     │
│  │  - Translation cache                               │     │
│  │  - Translation history                             │     │
│  └───────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────┘
         │
         │ HTTP/HTTPS (when online)
         │
┌────────▼───────────────────────────────────────────────────┐
│        API Gateway / Load Balancer                           │
│        - Rate Limiting                                       │
│        - Request Routing                                     │
└────────┬───────────────────────────────────────────────────┘
         │
         │
┌────────▼───────────────────────────────────────────────────┐
│         Translation Service                                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                │
│  │ ML Model │  │ API      │  │ Cache    │                │
│  │ Service  │  │ Gateway  │  │ Service  │                │
│  └──────────┘  └──────────┘  └──────────┘                │
└────────┬───────────────────────────────────────────────────┘
         │
         │
┌────────▼───────────────────────────────────────────────────┐
│         Model Service                                        │
│         - Model metadata                                     │
│         - Model downloads                                    │
│         - Model updates                                      │
└────────┬───────────────────────────────────────────────────┘
         │
         │
┌────────▼───────────────────────────────────────────────────┐
│         Database Cluster (Sharded)                           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                │
│  │ History  │  │ Model    │  │ Cache    │                │
│  │ (Sharded)│  │ Metadata │  │ DB       │                │
│  └──────────┘  └──────────┘  └──────────┘                │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│         Object Storage (S3)                                  │
│         - Translation models                                │
│         - Model versions                                     │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│         CDN                                                   │
│         - Model distribution                                 │
│         - Fast downloads                                     │
└─────────────────────────────────────────────────────────────┘

Deep Dive

Component Design

1. Network Detection Manager

Responsibilities:

Detect network connectivity
Determine network type (WiFi, cellular, offline)
Measure network quality
Provide network status to translation service

Key Design Decisions:

Fast Detection: < 100ms detection latency
Accurate Status: Reliable network detection
Battery Efficient: Minimal battery drain
Platform Agnostic: Works on iOS, Android, Web

Implementation:

import time

import requests


class NetworkDetectionManager:
    def __init__(self):
        self.network_status = None  # Last detected status dict, None until first check
        self.last_check = None
        self.check_interval = 5  # seconds
    
    def get_network_status(self):
        """Get current network status"""
        # Check if cached status is recent
        if self.last_check and (time.time() - self.last_check) < self.check_interval:
            return self.network_status
        
        # Detect network
        status = self.detect_network()
        self.network_status = status
        self.last_check = time.time()
        
        return status
    
    def detect_network(self):
        """Detect network connectivity"""
        try:
            # Try to reach a lightweight endpoint
            response = requests.get(
                'https://api.translation.com/health',
                timeout=0.1  # 100ms timeout
            )
            
            if response.status_code == 200:
                # Check connection type
                connection_type = self.get_connection_type()
                return {
                    'status': 'online',
                    'type': connection_type,  # wifi, cellular
                    'quality': self.measure_quality()
                }
        except requests.RequestException:
            # Timeouts, DNS failures and refused connections all count as offline
            pass
        
        return {
            'status': 'offline',
            'type': None,
            'quality': None
        }
    
    def get_connection_type(self):
        """Get connection type (WiFi or cellular)"""
        # Platform-specific implementation
        # iOS: Use Reachability framework
        # Android: Use ConnectivityManager
        # Web: Use Network Information API (navigator.connection)
        #
        # Python has no such platform API, so this sketch delegates to an
        # optional callable supplied by the host app. The attribute name
        # connection_type_probe is hypothetical.
        probe = getattr(self, 'connection_type_probe', None)
        if callable(probe):
            connection_type = probe()
            if connection_type in ('wifi', 'cellular'):
                return connection_type

        return 'unknown'
    
    def measure_quality(self):
        """Measure network quality"""
        # Measure latency and bandwidth
        # Return: 'excellent', 'good', 'poor'
        
        try:
            start = time.time()
            requests.get('https://api.translation.com/ping', timeout=1)
            latency = (time.time() - start) * 1000  # ms
            
            if latency < 100:
                return 'excellent'
            elif latency < 500:
                return 'good'
            else:
                return 'poor'
        except requests.RequestException:
            # A failed or timed-out probe is treated as a poor connection
            return 'poor'
    
    def should_use_remote(self, network_status):
        """Determine if should use remote translation"""
        if network_status['status'] == 'offline':
            return False
        
        if network_status['type'] == 'wifi':
            return True
        
        if network_status['type'] == 'cellular':
            # Use remote only if quality is good
            return network_status['quality'] in ['excellent', 'good']
        
        return False
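
A single ping is a noisy signal, since one slow round trip can flip the rating. A possible refinement, sketched below, classifies the median of a few recent latency samples using the same 100 ms and 500 ms thresholds as measure_quality. The function name classify_latency and the idea of keeping several samples are assumptions, not part of the original design.

```python
from statistics import median
from typing import Sequence


def classify_latency(samples_ms: Sequence[float]) -> str:
    """Rate a connection from recent round-trip times in milliseconds."""
    if not samples_ms:
        # No probe succeeded
        return "poor"
    typical = median(samples_ms)
    if typical < 100:
        return "excellent"
    if typical < 500:
        return "good"
    return "poor"


# One slow outlier does not change the rating
rating = classify_latency([40, 60, 900])
```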

2. Translation Mode Selector

Responsibilities:

Select optimal translation mode
Balance between local and remote
Handle fallbacks
Optimize for user experience

Key Design Decisions:

Smart Selection: Choose best mode based on context
Fallback Strategy: Graceful fallback on failure
User Preference: Respect user preferences
Performance: Fast mode selection

Implementation:

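class OfflineError(Exception):
    """Remote translation was requested while the device is offline."""


class OfflineTranslationError(OfflineError):
    """Device is offline and no local model covers the language pair."""


class LocalModelNotAvailableError(Exception):
    """No downloaded model exists for the requested language pair."""


class TranslationModeSelector: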
    def __init__(self, network_manager, cache_manager, model_manager):
        self.network_manager = network_manager
        self.cache_manager = cache_manager
        self.model_manager = model_manager
    
    def select_mode(self, source_language, target_language, user_preference='auto'):
        """Select optimal translation mode"""
        # Check user preference
        if user_preference == 'local':
            return self._select_local_mode(source_language, target_language)
        elif user_preference == 'remote':
            return self._select_remote_mode()
        else:  # auto
            return self._select_auto_mode(source_language, target_language)
    
    def _select_auto_mode(self, source_language, target_language):
        """Automatically select best mode"""
        network_status = self.network_manager.get_network_status()
        
        # Check if local model available
        local_model_available = self.model_manager.has_model(
            source_language, target_language
        )
        
        # Decision logic
        if network_status['status'] == 'offline':
            # Must use local
            if local_model_available:
                return 'local'
            else:
                raise OfflineTranslationError("No local model available")
        
        if network_status['status'] == 'online':
            # Prefer remote if good connection
            if self.network_manager.should_use_remote(network_status):
                return 'remote'
            else:
                # Poor connection, use local if available
                if local_model_available:
                    return 'local'
                else:
                    # Fallback to remote even with poor connection
                    return 'remote'
        
        return 'local'  # Default to local
    
    def _select_local_mode(self, source_language, target_language):
        """Select local mode"""
        if not self.model_manager.has_model(source_language, target_language):
            raise LocalModelNotAvailableError("Local model not available")
        return 'local'
    
    def _select_remote_mode(self):
        """Select remote mode"""
        network_status = self.network_manager.get_network_status()
        if network_status['status'] == 'offline':
            raise OfflineError("Cannot use remote mode offline")
        return 'remote'

3. Local Translation Engine

Responsibilities:

Load and manage local translation models
Execute translation inference
Handle model lifecycle
Optimize for mobile performance

Key Design Decisions:

Model Format: Use optimized format (TFLite, CoreML, ONNX)
Lazy Loading: Load models on demand
Memory Management: Efficient memory usage
Performance: Fast inference (< 500ms)

Implementation:

import time


class LocalTranslationEngine:
    def __init__(self, model_manager):
        self.model_manager = model_manager
        self.loaded_models = {}  # Cache loaded models
    
    def translate(self, source_text, source_language, target_language):
        """Translate using local model"""
        # Get model
        model = self.get_model(source_language, target_language)
        
        if not model:
            raise LocalModelNotAvailableError("Model not available")
        
        # Run inference
        start_time = time.time()
        translated_text = model.translate(source_text)
        latency = (time.time() - start_time) * 1000  # ms
        
        return {
            'translated_text': translated_text,
            'mode': 'local',
            'latency_ms': latency,
            'confidence_score': 0.9  # Placeholder; a real engine would derive this from the model's output scores
        }
    
    def get_model(self, source_language, target_language):
        """Get translation model"""
        language_pair = f"{source_language}-{target_language}"
        
        # Check cache
        if language_pair in self.loaded_models:
            return self.loaded_models[language_pair]
        
        # Load model
        model = self.model_manager.load_model(source_language, target_language)
        if model:
            self.loaded_models[language_pair] = model
        
        return model
    
    def batch_translate(self, texts, source_language, target_language):
        """Translate multiple texts"""
        model = self.get_model(source_language, target_language)
        if not model:
            raise LocalModelNotAvailableError("Model not available")
        
        # Batch inference
        translations = model.batch_translate(texts)
        
        return [
            {
                'source_text': text,
                'translated_text': translation,
                'mode': 'local'
            }
            for text, translation in zip(texts, translations)
        ]
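
The loaded_models dictionary above never drops a model, so a user who switches between many language pairs could end up with several 50-200MB models in memory. One way to bound that is sketched below: keep only the most recently used models loaded. The class name LoadedModelCache, the loader callable, and the default limit of two models are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class LoadedModelCache:
    """Keeps at most max_models models in memory, dropping the least recently used."""

    def __init__(self, loader: Callable[[str], Any], max_models: int = 2):
        self._loader = loader
        self._max_models = max_models
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, language_pair: str) -> Any:
        if language_pair in self._models:
            # Mark as most recently used
            self._models.move_to_end(language_pair)
            return self._models[language_pair]
        # Lazy load on first use
        model = self._loader(language_pair)
        self._models[language_pair] = model
        if len(self._models) > self._max_models:
            # Drop the least recently used model so its memory can be reclaimed
            self._models.popitem(last=False)
        return model

    def loaded_pairs(self) -> List[str]:
        return list(self._models)


# Usage with a fake loader that records which pairs were loaded from disk
loads: List[str] = []


def fake_loader(pair: str) -> str:
    loads.append(pair)
    return f"model:{pair}"


model_cache = LoadedModelCache(fake_loader, max_models=2)
for pair in ["en-fr", "en-de", "en-fr", "en-es"]:
    model_cache.get(pair)
```

In this run en-de is evicted when en-es loads, because en-fr was used more recently.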

4. Remote Translation Client

Responsibilities:

Communicate with remote translation API
Handle API requests and responses
Manage API errors and retries
Optimize for network efficiency

Key Design Decisions:

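HTTP/2: Use HTTP/2 for better performance (the requests library used in the sketch below speaks HTTP/1.1, so a production client would need an HTTP/2-capable library)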
Request Batching: Batch multiple translations
Retry Logic: Retry on failures
Compression: Compress requests/responses

Implementation:

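import time

import requests


class RemoteTranslationError(Exception):
    """The remote translation request failed."""


class RemoteTranslationTimeoutError(RemoteTranslationError):
    """The remote translation request timed out."""


class RemoteTranslationClient: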
    def __init__(self, api_base_url):
        self.api_base_url = api_base_url
        self.session = requests.Session()
        self.session.headers.update({
            'Content-Type': 'application/json',
            'Accept-Encoding': 'gzip'
        })
    
    def translate(self, source_text, source_language, target_language):
        """Translate using remote API"""
        url = f"{self.api_base_url}/api/v1/translate"
        
        payload = {
            'source_text': source_text,
            'source_language': source_language,
            'target_language': target_language
        }
        
        try:
            start_time = time.time()
            response = self.session.post(
                url,
                json=payload,
                timeout=5
            )
            latency = (time.time() - start_time) * 1000  # ms
            
            response.raise_for_status()
            data = response.json()
            
            return {
                'translated_text': data['translated_text'],
                'mode': 'remote',
                'latency_ms': latency,
                'confidence_score': data.get('confidence_score', 0.85)
            }
        except requests.Timeout:
            raise RemoteTranslationTimeoutError("Translation timeout")
        except requests.RequestException as e:
            raise RemoteTranslationError(f"Translation failed: {str(e)}")
    
    def batch_translate(self, texts, source_language, target_language):
        """Batch translate using remote API"""
        url = f"{self.api_base_url}/api/v1/translate/batch"
        
        payload = {
            'texts': texts,
            'source_language': source_language,
            'target_language': target_language
        }
        
        try:
            response = self.session.post(
                url,
                json=payload,
                timeout=10
            )
            response.raise_for_status()
            data = response.json()
            
            return [
                {
                    'source_text': text,
                    'translated_text': translation['translated_text'],
                    'mode': 'remote'
                }
                for text, translation in zip(texts, data['translations'])
            ]
        except requests.RequestException as e:
            raise RemoteTranslationError(f"Batch translation failed: {str(e)}")
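
The design decisions list retry logic, but the client above makes a single attempt. A minimal sketch of retries with capped exponential backoff and jitter follows. The helper name call_with_retries, its parameters, and the default delays are assumptions; the sleep function is injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 0.2,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on the given exception types with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the caller fall back to local mode
                raise
            # Full jitter: random wait up to the capped exponential delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Only transient failures such as timeouts should be retried, and the total retry budget should stay inside the 2 second remote latency target so the fallback to local mode is not delayed.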

5. Translation Cache Manager

Responsibilities:

Cache translations locally
Manage cache size and eviction
Provide fast cache lookups
Sync cache across devices

Key Design Decisions:

LRU Eviction: Evict least recently used entries
Size Limit: Limit cache size (e.g., 10MB)
Fast Lookup: < 10ms cache lookup
Persistence: Persist cache to disk

Implementation:

import hashlib
from datetime import datetime


class TranslationCacheManager:
    def __init__(self, max_size_mb=10):
        self.max_size_bytes = max_size_mb * 1024 * 1024
        self.cache = {}  # In-memory cache
        self.access_order = []  # For LRU, least recently used first
        self.current_size = 0
        self.db = CacheDatabase()  # Persistent storage (wrapper over SQLite/Realm, not shown)
    
    def get(self, source_text, source_language, target_language):
        """Get translation from cache"""
        cache_key = self._generate_key(source_text, source_language, target_language)
        
        # Check in-memory cache
        if cache_key in self.cache:
            entry = self.cache[cache_key]
            self._update_access_order(cache_key)
            entry['last_accessed_at'] = datetime.now()
            entry['access_count'] += 1
            return entry['translated_text']
        
        # Check persistent cache
        entry = self.db.get(cache_key)
        if entry:
            # Load into memory cache
            self._add_to_memory_cache(cache_key, entry)
            return entry['translated_text']
        
        return None
    
    def put(self, source_text, source_language, target_language, translated_text):
        """Store translation in cache"""
        cache_key = self._generate_key(source_text, source_language, target_language)
        
        entry = {
            'source_text': source_text,
            'source_language': source_language,
            'target_language': target_language,
            'translated_text': translated_text,
            'created_at': datetime.now(),
            'last_accessed_at': datetime.now(),
            'access_count': 1
        }
        
        entry_size = self._calculate_size(entry)
        
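        # Evict until the new entry fits. Stop once the cache is empty so
        # an entry larger than the whole cache cannot loop forever.
        while self.access_order and self.current_size + entry_size > self.max_size_bytes:
            self._evict_lru()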
        
        # Add to cache
        self._add_to_memory_cache(cache_key, entry)
        self.db.put(cache_key, entry)
    
    def _generate_key(self, source_text, source_language, target_language):
        """Generate cache key"""
        key_string = f"{source_language}:{target_language}:{source_text}"
        return hashlib.md5(key_string.encode()).hexdigest()
    
    def _add_to_memory_cache(self, cache_key, entry):
        """Add entry to memory cache"""
        if cache_key in self.cache:
            # Update existing
            old_entry = self.cache[cache_key]
            self.current_size -= self._calculate_size(old_entry)
        
        self.cache[cache_key] = entry
        self.current_size += self._calculate_size(entry)
        self._update_access_order(cache_key)
    
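    def _evict_lru(self):
        """Evict least recently used entry from the in-memory cache"""
        if not self.access_order:
            return

        lru_key = self.access_order.pop(0)
        if lru_key in self.cache:
            entry = self.cache[lru_key]
            self.current_size -= self._calculate_size(entry)
            del self.cache[lru_key]

    def _update_access_order(self, cache_key):
        """Move key to the most recently used end of the list"""
        if cache_key in self.access_order:
            self.access_order.remove(cache_key)
        self.access_order.append(cache_key)

    def _calculate_size(self, entry):
        """Approximate entry size as the UTF-8 length of its text fields"""
        return sum(
            len(entry[field].encode('utf-8'))
            for field in ('source_text', 'translated_text')
        )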

Detailed Design

Hybrid Mode Implementation

Challenge: Seamlessly switch between local and remote modes

Solution:

Unified Interface: A single API serves both modes
Automatic Switching: Detect connectivity and switch modes automatically
Fallback Strategy: Fall back to the alternative mode on failure
Transparent to User: Users do not need to know which mode served a request

Implementation:

class HybridTranslationService:
    def __init__(self):
        # Share one model manager and one cache manager across components so
        # they all see the same downloaded models and cached entries.
        self.network_manager = NetworkDetectionManager()
        self.model_manager = ModelManager()
        self.cache_manager = TranslationCacheManager()
        self.mode_selector = TranslationModeSelector(
            self.network_manager,
            self.cache_manager,
            self.model_manager
        )
        self.local_engine = LocalTranslationEngine(self.model_manager)
        self.remote_client = RemoteTranslationClient(API_BASE_URL)
    
    def translate(self, source_text, source_language, target_language, 
                  mode='auto', use_cache=True):
        """Translate with hybrid mode support"""
        # Check cache first
        if use_cache:
            cached = self.cache_manager.get(
                source_text, source_language, target_language
            )
            if cached:
                return {
                    'translated_text': cached,
                    'mode': 'cache',
                    'cached': True
                }
        
        # Select mode
        selected_mode = self.mode_selector.select_mode(
            source_language, target_language, mode
        )
        
        # Translate
        try:
            if selected_mode == 'local':
                result = self.local_engine.translate(
                    source_text, source_language, target_language
                )
            else:  # remote
                result = self.remote_client.translate(
                    source_text, source_language, target_language
                )
            
            # Cache result
            if use_cache:
                self.cache_manager.put(
                    source_text, source_language, target_language,
                    result['translated_text']
                )
            
            return result
        
        except Exception:
            # Primary mode failed; fall back to the alternative mode
            return self._fallback_translate(
                source_text, source_language, target_language,
                selected_mode, use_cache
            )
    
    def _fallback_translate(self, source_text, source_language, target_language,
                           failed_mode, use_cache):
        """Fallback to alternative mode"""
        # Pick whichever engine has not been tried yet
        if failed_mode == 'local':
            fallback_engine = self.remote_client
        else:  # remote failed
            fallback_engine = self.local_engine
        
        try:
            result = fallback_engine.translate(
                source_text, source_language, target_language
            )
            if use_cache:
                self.cache_manager.put(
                    source_text, source_language, target_language,
                    result['translated_text']
                )
            return result
        except Exception as e:
            # Both modes failed; keep the underlying cause attached.
            # Catching Exception (not a bare except) lets KeyboardInterrupt
            # and SystemExit propagate.
            raise TranslationError("Translation failed in both modes") from e

Model Management

Challenge: Manage local models efficiently

Solution:

Lazy Download: Download models on demand
Version Management: Track model versions
Update Mechanism: Update models periodically
Storage Optimization: Compress models and remove unused ones

Implementation:

class ModelManager:
    def __init__(self):
        self.local_storage = LocalStorage()
        self.model_service = ModelService()
        self.downloaded_models = set()
    
    def has_model(self, source_language, target_language):
        """Check if model is available locally"""
        language_pair = f"{source_language}-{target_language}"
        return language_pair in self.downloaded_models
    
    def download_model(self, source_language, target_language, callback=None):
        """Download model"""
        language_pair = f"{source_language}-{target_language}"
        
        # Get model metadata
        metadata = self.model_service.get_model_metadata(language_pair)
        
        # Download model
        model_path = self.local_storage.download_file(
            metadata['download_url'],
            f"models/{language_pair}.tflite",
            callback=callback
        )
        
        # Verify checksum
        if not self._verify_checksum(model_path, metadata['checksum']):
            raise ModelDownloadError("Checksum verification failed")
        
        # Register model
        self.downloaded_models.add(language_pair)
        self.local_storage.save_model_info(language_pair, metadata)
        
        return model_path
    
    def load_model(self, source_language, target_language):
        """Load model into memory"""
        language_pair = f"{source_language}-{target_language}"
        
        if language_pair not in self.downloaded_models:
            return None
        
        # Load from storage
        model_path = self.local_storage.get_model_path(language_pair)
        return self._load_model_file(model_path)
    
    def update_model(self, source_language, target_language):
        """Update model to latest version"""
        language_pair = f"{source_language}-{target_language}"
        
        # Nothing is installed for this pair, so there is nothing to update
        if not self.has_model(source_language, target_language):
            return None
        
        # Get current version
        current_info = self.local_storage.get_model_info(language_pair)
        
        # Get latest version
        latest_metadata = self.model_service.get_latest_model_metadata(language_pair)
        
        if latest_metadata['model_version'] != current_info['model_version']:
            # Download new version
            return self.download_model(source_language, target_language)
        
        # Already up to date
        return None

Scalability Considerations

Horizontal Scaling

Translation Service:

Stateless service, horizontally scalable
Multiple instances behind load balancer
Auto-scaling based on load
Model serving on GPU instances

Caching Strategy

Multi-Level Cache:

Client Cache: Local cache on device
CDN Cache: Cache popular translations
Application Cache: Cache in translation service
Database Cache: Cache in database layer
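
The lookup order through these tiers can be sketched as a read-through cache. This is a minimal illustration, not the service's actual implementation: `TieredCache` is a hypothetical name, and each tier is modeled as a plain in-memory dict rather than a real device store, CDN, or application cache.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, str]  # (source_language, target_language, source_text)


class TieredCache:
    """Hypothetical read-through cache over ordered tiers (fastest first)."""

    def __init__(self, tier_names: List[str]):
        # Each tier is a dict here; real tiers would be separate systems.
        self.tiers: List[Tuple[str, Dict[Key, str]]] = [
            (name, {}) for name in tier_names
        ]

    def get(self, key: Key, translate: Callable[[Key], str]) -> Tuple[str, str]:
        """Return (translation, name of the tier that served it or 'origin')."""
        for index, (name, store) in enumerate(self.tiers):
            if key in store:
                value = store[key]
                # Backfill the faster tiers so the next lookup stops earlier.
                for _, faster_store in self.tiers[:index]:
                    faster_store[key] = value
                return value, name
        # Missed every tier: translate once and populate all tiers.
        value = translate(key)
        for _, store in self.tiers:
            store[key] = value
        return value, "origin"


cache = TieredCache(["client", "cdn", "application"])
key = ("en", "es", "hello")
first = cache.get(key, lambda k: "hola")   # served by the translator
second = cache.get(key, lambda k: "hola")  # served by the client tier
```

Backfilling faster tiers on a hit is one possible policy; a real deployment would also need per-tier expiry and size limits.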

Security Considerations

Data Privacy

Encryption: Encrypt sensitive translations
Local Storage: Secure local model storage
API Security: Secure API communication
User Data: Don’t log sensitive user data

Model Security

Model Integrity: Verify model checksums
Model Updates: Secure model update mechanism
Malicious Models: Scan models for malware
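
The integrity check can be sketched as below. The source does not say which hash the model metadata carries, so SHA-256 is an assumption here, and `sha256_of_file` and `verify_checksum` are illustrative names rather than part of the design above.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a 50-200MB model is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as model_file:
        for chunk in iter(lambda: model_file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA-256 with the expected hex digest (assumed format)."""
    actual = sha256_of_file(path)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A checksum only protects against corruption or tampering if the expected value arrives over a trusted channel; if the metadata and the model come from the same untrusted source, a signed manifest would be needed in addition.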

Monitoring & Observability

Key Metrics

Performance Metrics:

Translation latency (local vs remote)
Cache hit rate
Model load time
Network detection latency

Usage Metrics:

Translations per second
Offline vs online usage
Language pair distribution
Model download rate

Quality Metrics:

Translation accuracy
User satisfaction
Error rate
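
As a rough sketch of how a client could collect two of these metrics, the class below tracks per-mode latency samples and cache hit rate. `TranslationMetrics` is a hypothetical name, the percentile uses the nearest-rank method, and a production system would more likely export histograms to a monitoring backend than keep raw samples in memory.

```python
import math
from collections import defaultdict
from typing import Dict, List


class TranslationMetrics:
    """Hypothetical in-memory collector for latency and cache hit rate."""

    def __init__(self) -> None:
        self.latencies_ms: Dict[str, List[float]] = defaultdict(list)
        self.cache_hits = 0
        self.cache_misses = 0

    def record_translation(self, mode: str, latency_ms: float) -> None:
        # mode is 'local' or 'remote', matching the modes used in this design.
        self.latencies_ms[mode].append(latency_ms)

    def record_cache_lookup(self, hit: bool) -> None:
        if hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    def cache_hit_rate(self) -> float:
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0

    def percentile(self, mode: str, pct: float) -> float:
        """Nearest-rank percentile of the recorded latencies for one mode."""
        samples = sorted(self.latencies_ms.get(mode, []))
        if not samples:
            raise ValueError(f"no latency samples recorded for mode {mode!r}")
        rank = max(1, math.ceil(pct / 100 * len(samples)))
        return samples[rank - 1]


metrics = TranslationMetrics()
for latency in (100.0, 200.0, 300.0, 400.0):
    metrics.record_translation("local", latency)
for hit in (True, True, True, False):
    metrics.record_cache_lookup(hit)
```

Comparing `metrics.percentile("local", 95)` against the 500ms target and the remote equivalent against the 2s target gives a direct check on the latency requirements.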

Trade-offs and Optimizations

Trade-offs

1. Model Size: Small vs Large

Small: Less storage, lower accuracy
Large: More storage, higher accuracy
Decision: Balance based on device storage

2. Cache Size: Large vs Small

Large: More cache hits, more storage
Small: Less storage, more misses
Decision: 10-100MB based on device

3. Mode Selection: Aggressive vs Conservative

Aggressive: Prefer remote, better quality
Conservative: Prefer local, better offline
Decision: Adaptive based on network
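
The adaptive decision for mode selection might look like the sketch below. `NetworkStatus`, `choose_mode`, and the 300ms round-trip threshold are assumptions for illustration; the real selector would take its inputs from the network detection manager and tune the threshold against the 2s remote latency budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkStatus:
    """Hypothetical snapshot of connectivity as seen by the client."""
    connected: bool
    rtt_ms: float = 0.0
    metered: bool = False


def choose_mode(status: NetworkStatus, has_local_model: bool,
                max_rtt_ms: float = 300.0,
                avoid_metered: bool = False) -> str:
    """Return 'remote', 'local', or 'unavailable' for one request."""
    if not status.connected:
        # Offline: only a local model can serve the request.
        return "local" if has_local_model else "unavailable"
    if not has_local_model:
        # Online without a local model: remote is the only option.
        return "remote"
    if status.rtt_ms > max_rtt_ms:
        # Slow link: prefer the faster local model over remote quality.
        return "local"
    if status.metered and avoid_metered:
        # The user opted out of spending cellular data on translations.
        return "local"
    return "remote"
```

With these defaults, a phone on fast WiFi with a downloaded model still goes remote for quality, while the same phone on a congested link stays local.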

Optimizations

1. Model Compression

Quantize models
Reduce model size
Maintain accuracy
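
To make the size and accuracy trade-off concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is illustrative only: real model toolchains quantize per tensor or per channel with their own converters, and the function names here are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to integers in [-127, 127] plus one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # An all-zero tensor gets scale 1.0 to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the quantized values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0, 0.8]
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight shrinks from 4 bytes (float32) to 1 byte, roughly a 4x reduction, and the reconstruction error per weight stays within about half of `scale`. Whether that error is acceptable for translation quality has to be measured on the actual model.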

2. Batch Translation

Batch multiple texts
Reduce API calls
Improve throughput
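
A client-side batching helper could be sketched as follows. The limits of 32 items and 5,000 characters per batch are placeholder assumptions, and `translate_batch` stands in for whatever batch endpoint the remote service exposes.

```python
from typing import Callable, Iterator, List


def make_batches(texts: List[str], max_items: int = 32,
                 max_chars: int = 5000) -> Iterator[List[str]]:
    """Group texts in order, capping each batch by item count and total size."""
    batch: List[str] = []
    batch_chars = 0
    for text in texts:
        too_many = len(batch) >= max_items
        too_long = bool(batch) and batch_chars + len(text) > max_chars
        if too_many or too_long:
            yield batch
            batch, batch_chars = [], 0
        # A single text longer than max_chars is sent alone, not dropped.
        batch.append(text)
        batch_chars += len(text)
    if batch:
        yield batch


def translate_all(texts: List[str],
                  translate_batch: Callable[[List[str]], List[str]],
                  max_items: int = 32, max_chars: int = 5000) -> List[str]:
    """Translate every text with as few backend calls as the limits allow."""
    results: List[str] = []
    for batch in make_batches(texts, max_items, max_chars):
        translated = translate_batch(batch)
        if len(translated) != len(batch):
            raise ValueError("backend returned a different number of results")
        results.extend(translated)
    return results
```

Order is preserved, so the i-th result corresponds to the i-th input, which keeps the caller's bookkeeping simple.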

3. Predictive Model Download

Download models before needed
Reduce wait time
Improve UX
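
One simple way to decide what to download ahead of time is to rank language pairs by how often the user has requested them and fill a storage budget greedily. The sketch below assumes usage history is available as a list of language-pair strings; the function name and the frequency heuristic are illustrative, and other signals such as travel location or device locale are not modeled.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def pick_models_to_prefetch(history: Iterable[str], downloaded: Set[str],
                            model_sizes_mb: Dict[str, int],
                            budget_mb: int) -> List[str]:
    """Choose not-yet-downloaded language pairs, most used first, within budget."""
    usage = Counter(history)
    chosen: List[str] = []
    remaining_mb = budget_mb
    for pair, _count in usage.most_common():
        if pair in downloaded:
            continue
        size_mb = model_sizes_mb.get(pair)
        # Skip pairs with no known model or that no longer fit the budget.
        if size_mb is None or size_mb > remaining_mb:
            continue
        chosen.append(pair)
        remaining_mb -= size_mb
    return chosen


history = ["en-es"] * 5 + ["en-fr"] * 3 + ["en-de"] * 2 + ["en-ja"]
to_prefetch = pick_models_to_prefetch(
    history,
    downloaded={"en-es"},
    model_sizes_mb={"en-fr": 120, "en-de": 150, "en-ja": 60},
    budget_mb=200,
)
```

In practice the downloads themselves would be deferred until the device is on WiFi and charging, since each model is 50-200MB.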

What Interviewers Look For

Hybrid Architecture Skills

Mode Switching
- Seamless local/remote switching
- Network detection
- Automatic fallback
- Red Flags: Manual switching, no detection, no fallback
Local Model Management
- Efficient model storage
- Model updates
- Red Flags: No local models, inefficient storage, no updates
Offline Support
- Full offline functionality
- Local translation
- Red Flags: No offline, network required, poor UX

Distributed Systems Skills

Network Detection
- Fast detection
- Accurate status
- Red Flags: Slow detection, inaccurate, no detection
Caching Strategy
- Multi-level caching
- High cache hit rate
- Red Flags: No caching, low hit rate, poor performance
Scalability Design
- Horizontal scaling
- Load balancing
- Red Flags: Vertical scaling, no load balancing, bottlenecks

Problem-Solving Approach

Performance Optimization
- Sub-500ms local translation
- Sub-2s remote translation
- Red Flags: High latency, no optimization, poor UX
Edge Cases
- Network failures
- Model updates
- Large texts
- Red Flags: Ignoring edge cases, no handling
Trade-off Analysis
- Local vs remote
- Accuracy vs speed
- Red Flags: No trade-offs, dogmatic choices

System Design Skills

Component Design
- Translation service
- Model management service
- Network detection service
- Red Flags: Monolithic, unclear boundaries
Fallback Strategy
- Graceful degradation
- Error handling
- Red Flags: No fallback, errors, poor UX
Model Updates
- Efficient updates
- Version management
- Red Flags: No updates, inefficient, version conflicts

Communication Skills

Hybrid Architecture Explanation
- Can explain mode switching
- Understands local/remote trade-offs
- Red Flags: No understanding, vague
Performance Explanation
- Can explain optimization strategies
- Understands caching
- Red Flags: No understanding, vague

Meta-Specific Focus

Hybrid Systems Expertise
- Local/remote architecture
- Offline-first design
- Key: Show hybrid systems knowledge
User Experience Focus
- Transparent switching
- Seamless operation
- Key: Demonstrate UX thinking

Summary

Designing a global translation service with hybrid mode requires careful consideration of:

Hybrid Architecture: Seamless switching between local and remote
Network Detection: Fast and accurate network status detection
Local Models: Efficient model management and storage
Caching Strategy: Multi-level caching for performance
Offline Support: Full functionality without network
Mode Selection: Smart mode selection based on context
Fallback Strategy: Graceful fallback on failures
Model Updates: Efficient model update mechanism
Performance: Sub-500ms local, sub-2s remote translation
User Experience: Transparent mode switching

Key architectural decisions:

Hybrid Mode for seamless online/offline operation
Network Detection for automatic mode switching
Local Models for offline translation
Multi-Level Cache for performance
Smart Mode Selection based on network and context
Fallback Strategy for reliability
Model Management for efficient storage
Horizontal Scaling for remote service

The design targets a peak of 100,000 translations per second, works completely offline with local models, switches automatically between local and remote modes, and aims for sub-500ms local translation latency with a cache hit rate above 90%.
