Introduction
MySQL is one of the most popular open-source relational database management systems (RDBMS) in the world. It’s widely used for web applications, content management systems, and enterprise applications. Understanding MySQL is essential for system design interviews and building scalable database systems.
This guide covers:
- MySQL Fundamentals: Core concepts, architecture, and storage engines
- Replication: Master-slave, master-master, and high availability
- Sharding: Horizontal scaling strategies
- Performance Optimization: Indexing, query optimization, and caching
- Best Practices: Schema design, security, and monitoring
What is MySQL?
MySQL is an open-source relational database management system that:
- Stores Data: Tables, rows, and columns with relationships
- ACID Compliance: Transaction support with consistency guarantees
- SQL Interface: Standard SQL query language
- Replication: Built-in replication for high availability
- Scalability: Vertical and horizontal scaling options
Key Concepts
Database: Collection of tables
Table: Collection of rows (records) and columns (fields)
Row: Single record in a table
Column: Field in a table
Index: Data structure for fast lookups
Primary Key: Unique identifier for each row
Foreign Key: Reference to another table’s primary key
Transaction: Atomic unit of work (ACID)
Replication: Copying data to multiple servers
Architecture
High-Level Architecture
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Client │────▶│ Client │────▶│ Client │
│ Application │ │ Application │ │ Application │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└────────────────────┴────────────────────┘
│
▼
┌─────────────────────────┐
│ Connection Pool │
│ (Load Balancer) │
└──────┬──────────────────┘
│
┌─────────────┴─────────────┐
│ │
┌──────▼──────┐ ┌───────▼──────┐
│ MySQL │ │ MySQL │
│ Master │ │ Replica │
│ (Writes) │ │ (Reads) │
└─────────────┘ └─────────────┘
Explanation:
- Client Applications: Applications that connect to MySQL to store and retrieve data (e.g., web applications, APIs, microservices).
- Connection Pool (Load Balancer): Manages database connections and can distribute read requests across replicas.
- MySQL Master: The primary server that handles all write operations and replicates changes to replicas.
- MySQL Replica: A copy of the master database that handles read operations and provides high availability.
Core Architecture
┌─────────────────────────────────────────────────────────┐
│ MySQL Server │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Connection Layer │ │
│ │ (Thread Pool, Connection Management) │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ SQL Parser & Optimizer │ │
│ │ (Query Parsing, Optimization, Execution Plan) │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Storage Engine Layer │ │
│ │ (InnoDB, MyISAM, Memory, etc.) │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ File System │ │
│ │ (Data Files, Log Files, Index Files) │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
Storage Engines
InnoDB (Default)
Characteristics:
- ACID transactions
- Row-level locking
- Foreign key constraints
- Crash recovery
- Clustered indexes
Use Cases:
- Most applications (default choice)
- Transaction-heavy workloads
- Applications requiring ACID
Example:
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100),
email VARCHAR(100) UNIQUE,
INDEX idx_email (email)
) ENGINE=InnoDB;
MyISAM
Characteristics:
- Table-level locking
- Fast reads
- No transactions
- Full-text search
- Lower overhead
Use Cases:
- Read-heavy workloads
- Full-text search
- Logging tables
- Note: Deprecated in MySQL 8.0+
Memory (HEAP)
Characteristics:
- In-memory storage
- Very fast
- Data lost on restart
- Hash indexes
Use Cases:
- Temporary tables
- Caching
- Session storage
Other Engines
Archive: Compressed storage for historical data
CSV: CSV file storage
Federated: Access remote tables
Replication
Master-Slave Replication
Architecture:
Master Server (Write)
│
│ Binary Log
│
├──→ Slave 1 (Read)
├──→ Slave 2 (Read)
└──→ Slave 3 (Read)
Configuration:
Master:
-- Enable binary logging
[mysqld]
log-bin=mysql-bin
server-id=1
Slave:
[mysqld]
server-id=2
relay-log=mysql-relay-bin
read-only=1
Setup:
-- On Master
CREATE USER 'replicator'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
-- On Slave
CHANGE MASTER TO
MASTER_HOST='master.example.com',
MASTER_USER='replicator',
MASTER_PASSWORD='password',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=0;
START SLAVE;
Benefits:
- Read scaling
- High availability
- Backup from slave
- Geographic distribution
Master-Master Replication
Architecture:
Master 1 ←→ Master 2
│ │
└────┬─────┘
│
Read Replicas
Use Cases:
- Multi-master writes
- Geographic distribution
- Failover scenarios
Challenges:
- Conflict resolution
- Data consistency
- Complexity
Replication Types
Statement-Based Replication (SBR):
- Replicates SQL statements
- Smaller log size
- Non-deterministic issues
Row-Based Replication (RBR):
- Replicates row changes
- More data
- More reliable
Mixed Replication:
- Uses SBR by default
- Switches to RBR when needed
Sharding
Horizontal Sharding
Strategy:
Database
│
├──→ Shard 1 (users 0-999)
├──→ Shard 2 (users 1000-1999)
└──→ Shard 3 (users 2000-2999)
Sharding Key:
-- Shard by user_id
shard_id = user_id % num_shards
-- Example
user_id = 1234
shard_id = 1234 % 3 = 1
Sharding Strategies:
1. Range-Based Sharding:
-- Shard by date range
Shard 1: 2024-01-01 to 2024-06-30
Shard 2: 2024-07-01 to 2024-12-31
2. Hash-Based Sharding:
-- Shard by hash
shard_id = hash(user_id) % num_shards
3. Directory-Based Sharding:
-- Lookup table for shard mapping
shard_mapping[user_id] = shard_id
Sharding Challenges
Cross-Shard Queries:
- Joins across shards
- Aggregations
- Transactions
Solutions:
- Denormalization
- Application-level joins
- Distributed transactions (2PC)
Rebalancing:
- Moving data between shards
- Maintaining availability
- Consistency
Performance Characteristics
Maximum Read & Write Throughput
Single Node (InnoDB Engine):
- Max Read Throughput:
- Simple queries (indexed): 10K-50K queries/sec
- Complex queries (joins): 1K-10K queries/sec
- With read replicas: 10K-50K queries/sec per replica (linear scaling)
- Max Write Throughput:
- Simple inserts: 5K-25K writes/sec
- Updates with indexes: 2K-10K writes/sec
- Batch inserts: 50K-200K rows/sec
Factors Affecting Throughput:
- Storage engine (InnoDB vs MyISAM)
- Hardware (CPU, RAM, disk I/O, SSD vs HDD)
- Query complexity and indexing
- Connection pooling
- Binary log configuration
- Buffer pool size
- Number of connections
- Replication lag (for read replicas)
Optimized Configuration:
- Max Read Throughput: 50K-100K queries/sec (with proper indexing, connection pooling, and read replicas)
- Max Write Throughput: 25K-50K writes/sec (with optimized InnoDB settings and SSD storage)
Horizontal Scaling (Read Replicas):
- Max Read Throughput: 10K-50K queries/sec per replica
- Example: 5 read replicas can handle 50K-250K queries/sec total
- Write Throughput: Limited by master (single write node)
Performance Optimization
Indexing
Primary Index:
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(100)
);
Secondary Index:
CREATE INDEX idx_email ON users(email);
Composite Index:
CREATE INDEX idx_name_email ON users(name, email);
Index Types:
- B-Tree: Default, good for range queries
- Hash: Exact match lookups
- Full-Text: Text search
- Spatial: Geographic data
Index Best Practices:
- Index frequently queried columns
- Avoid over-indexing (slows writes)
- Use composite indexes for multi-column queries
- Monitor index usage
Query Optimization
EXPLAIN:
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';
Query Optimization Tips:
- Use indexes
- Avoid SELECT *
- Limit result sets
- Use appropriate JOIN types
- Avoid subqueries when possible
Example:
-- Slow
SELECT * FROM users WHERE name LIKE '%john%';
-- Better (if possible)
SELECT * FROM users WHERE name LIKE 'john%';
-- Best (if exact match)
SELECT * FROM users WHERE name = 'john';
Caching
Query Cache:
[mysqld]
query_cache_type=1
query_cache_size=256M
Application-Level Caching:
- Redis/Memcached
- Cache frequently accessed data
- Cache query results
Connection Pooling
Benefits:
- Reuse connections
- Reduce overhead
- Limit connections
Configuration:
[mysqld]
max_connections=200
thread_cache_size=50
Schema Design
Normalization
First Normal Form (1NF):
- Atomic values
- No repeating groups
Second Normal Form (2NF):
- 1NF + no partial dependencies
Third Normal Form (3NF):
- 2NF + no transitive dependencies
Example:
-- Denormalized
CREATE TABLE orders (
id INT,
customer_name VARCHAR(100),
customer_email VARCHAR(100),
product_name VARCHAR(100),
product_price DECIMAL(10,2)
);
-- Normalized
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2)
);
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
product_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(id),
FOREIGN KEY (product_id) REFERENCES products(id)
);
Denormalization
When to Denormalize:
- Read-heavy workloads
- Performance critical queries
- Reduce JOINs
Example:
-- Denormalized for performance
CREATE TABLE user_posts (
id INT PRIMARY KEY,
user_id INT,
user_name VARCHAR(100), -- Denormalized
post_content TEXT,
created_at TIMESTAMP
);
Transactions
ACID Properties
Atomicity:
- All or nothing
- Rollback on failure
Consistency:
- Valid state transitions
- Constraints maintained
Isolation:
- Concurrent transactions don’t interfere
- Isolation levels
Durability:
- Committed changes persist
- Crash recovery
Isolation Levels
READ UNCOMMITTED:
- Lowest isolation
- Dirty reads possible
READ COMMITTED:
- No dirty reads
- Non-repeatable reads possible
REPEATABLE READ (Default):
- Consistent reads
- Phantom reads possible
SERIALIZABLE:
- Highest isolation
- No concurrency issues
Example:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
High Availability
Master-Slave with Failover
Architecture:
Master (Active)
│
├──→ Slave 1 (Standby)
└──→ Slave 2 (Standby)
Failover Process:
- Detect master failure
- Promote slave to master
- Update application configuration
- Reconnect clients
Tools:
- MySQL Router
- ProxySQL
- MHA (Master High Availability)
MySQL Cluster (NDB)
Architecture:
- Shared-nothing architecture
- Data nodes
- SQL nodes
- Management nodes
Use Cases:
- High availability
- Real-time applications
- In-memory performance
Security
User Management
Create User:
CREATE USER 'app_user'@'localhost' IDENTIFIED BY 'password';
Grant Privileges:
GRANT SELECT, INSERT, UPDATE ON database.* TO 'app_user'@'localhost';
FLUSH PRIVILEGES;
Principle of Least Privilege:
- Grant minimum required privileges
- Separate users for different purposes
- Regular privilege audits
Encryption
Data at Rest:
[mysqld]
encrypt_binlog=ON
Data in Transit:
- SSL/TLS connections
- Encrypted replication
Application-Level:
- Encrypt sensitive fields
- Use strong hashing (bcrypt, Argon2)
Monitoring
Key Metrics
Performance:
- Query response time
- Throughput (QPS)
- Connection count
- Cache hit ratio
Resources:
- CPU usage
- Memory usage
- Disk I/O
- Network I/O
Replication:
- Replication lag
- Slave status
- Binary log size
Tools
MySQL Workbench:
- GUI administration
- Query execution
- Performance monitoring
Percona Monitoring:
- Comprehensive monitoring
- Alerting
- Performance insights
Prometheus + Grafana:
- Metrics collection
- Visualization
- Alerting
Best Practices
Schema Design
- Use Appropriate Data Types:
- INT vs BIGINT
- VARCHAR vs TEXT
- DECIMAL vs FLOAT
- Index Strategically:
- Primary keys
- Foreign keys
- Frequently queried columns
- Normalize When Appropriate:
- Balance normalization vs performance
- Denormalize for read-heavy workloads
Query Optimization
- Use EXPLAIN:
- Analyze query plans
- Identify bottlenecks
- Avoid N+1 Queries:
- Use JOINs
- Batch queries
- Limit Result Sets:
- Use LIMIT
- Pagination
Performance
- Connection Pooling:
- Reuse connections
- Limit max connections
- Caching:
- Query cache
- Application cache
- Read Replicas:
- Scale reads
- Reduce master load
What Interviewers Look For
Database Fundamentals
- SQL Proficiency
- Write efficient queries
- Understand JOINs
- Index usage
- Red Flags: Poor SQL, no index understanding, inefficient queries
- Schema Design
- Normalization
- Index design
- Data types
- Red Flags: Poor schema, no normalization, wrong data types
- ACID Understanding
- Transaction concepts
- Isolation levels
- Consistency guarantees
- Red Flags: No ACID understanding, wrong isolation, no transactions
Problem-Solving Approach
- Performance Optimization
- Query optimization
- Index design
- Caching strategies
- Red Flags: No optimization, poor indexes, no caching
- Scalability
- Replication strategies
- Sharding approaches
- Read/write separation
- Red Flags: No scaling plan, wrong replication, no sharding
System Design Skills
- Database Architecture
- Master-slave setup
- Sharding design
- High availability
- Red Flags: Single database, no replication, no HA
- Trade-off Analysis
- Consistency vs availability
- Normalization vs denormalization
- Read vs write optimization
- Red Flags: No trade-offs, dogmatic choices, no analysis
Communication Skills
- Clear Explanation
- Explains database concepts
- Discusses trade-offs
- Justifies design decisions
- Red Flags: Unclear explanations, no justification, confusing
Meta-Specific Focus
- Database Expertise
- Deep MySQL knowledge
- Performance optimization
- Scalability patterns
- Key: Demonstrate database expertise
- System Design Skills
- Can design database architecture
- Understands scaling challenges
- Makes informed trade-offs
- Key: Show practical database design skills
Summary
MySQL Key Points:
- Relational Database: ACID-compliant RDBMS
- Storage Engines: InnoDB (default), MyISAM, Memory
- Replication: Master-slave, master-master
- Sharding: Horizontal scaling strategies
- Performance: Indexing, query optimization, caching
- High Availability: Replication, failover, clustering
Common Use Cases:
- Web applications
- Content management systems
- E-commerce platforms
- Enterprise applications
- Data warehousing
Best Practices:
- Use InnoDB for most applications
- Implement replication for HA
- Index frequently queried columns
- Optimize queries with EXPLAIN
- Use connection pooling
- Monitor performance metrics
- Implement proper security
MySQL is a powerful, reliable, and scalable database system that’s essential for building production applications.