Introduction

MySQL is one of the most popular open-source relational database management systems (RDBMS) in the world. It’s widely used for web applications, content management systems, and enterprise applications. Understanding MySQL is essential for system design interviews and building scalable database systems.

This guide covers:

  • MySQL Fundamentals: Core concepts, architecture, and storage engines
  • Replication: Master-slave, master-master, and high availability
  • Sharding: Horizontal scaling strategies
  • Performance Optimization: Indexing, query optimization, and caching
  • Best Practices: Schema design, security, and monitoring

What is MySQL?

MySQL is an open-source relational database management system that:

  • Stores Data: Tables, rows, and columns with relationships
  • ACID Compliance: Transaction support with consistency guarantees
  • SQL Interface: Standard SQL query language
  • Replication: Built-in replication for high availability
  • Scalability: Vertical and horizontal scaling options

Key Concepts

Database: Collection of tables

Table: Collection of rows (records) and columns (fields)

Row: Single record in a table

Column: Field in a table

Index: Data structure for fast lookups

Primary Key: Unique identifier for each row

Foreign Key: Reference to another table’s primary key

Transaction: Atomic unit of work (ACID)

Replication: Copying data to multiple servers

Architecture

High-Level Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client    │────▶│   Client    │────▶│   Client    │
│ Application │     │ Application │     │ Application │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                    │                    │
       └────────────────────┴────────────────────┘
                            │
                            ▼
              ┌─────────────────────────┐
              │   Connection Pool      │
              │   (Load Balancer)      │
              └──────┬──────────────────┘
                     │
       ┌─────────────┴─────────────┐
       │                           │
┌──────▼──────┐           ┌───────▼──────┐
│   MySQL     │           │   MySQL      │
│   Master    │           │   Replica    │
│   (Writes)  │           │   (Reads)    │
└─────────────┘           └─────────────┘

Explanation:

  • Client Applications: Applications that connect to MySQL to store and retrieve data (e.g., web applications, APIs, microservices).
  • Connection Pool (Load Balancer): Manages database connections and can distribute read requests across replicas.
  • MySQL Master: The primary server that handles all write operations and replicates changes to replicas.
  • MySQL Replica: A copy of the master database that handles read operations and provides high availability.

Core Architecture

┌─────────────────────────────────────────────────────────┐
│                    MySQL Server                          │
│                                                          │
│  ┌──────────────────────────────────────────────────┐   │
│  │         Connection Layer                         │   │
│  │  (Thread Pool, Connection Management)            │   │
│  └──────────────────────────────────────────────────┘   │
│                          │                               │
│  ┌──────────────────────────────────────────────────┐   │
│  │         SQL Parser & Optimizer                   │   │
│  │  (Query Parsing, Optimization, Execution Plan)   │   │
│  └──────────────────────────────────────────────────┘   │
│                          │                               │
│  ┌──────────────────────────────────────────────────┐   │
│  │         Storage Engine Layer                     │   │
│  │  (InnoDB, MyISAM, Memory, etc.)                  │   │
│  └──────────────────────────────────────────────────┘   │
│                          │                               │
│  ┌──────────────────────────────────────────────────┐   │
│  │         File System                              │   │
│  │  (Data Files, Log Files, Index Files)            │   │
│  └──────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────┘

Storage Engines

InnoDB (Default)

Characteristics:

  • ACID transactions
  • Row-level locking
  • Foreign key constraints
  • Crash recovery
  • Clustered indexes

Use Cases:

  • Most applications (default choice)
  • Transaction-heavy workloads
  • Applications requiring ACID

Example:

CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    email VARCHAR(100) UNIQUE,
    INDEX idx_email (email)
) ENGINE=InnoDB;

MyISAM

Characteristics:

  • Table-level locking
  • Fast reads
  • No transactions
  • Full-text search
  • Lower overhead

Use Cases:

  • Read-heavy workloads
  • Full-text search
  • Logging tables
  • Note: Deprecated in MySQL 8.0+

Memory (HEAP)

Characteristics:

  • In-memory storage
  • Very fast
  • Data lost on restart
  • Hash indexes

Use Cases:

  • Temporary tables
  • Caching
  • Session storage

Other Engines

Archive: Compressed storage for historical data

CSV: CSV file storage

Federated: Access remote tables

Replication

Master-Slave Replication

Architecture:

Master Server (Write)
    │
    │ Binary Log
    │
    ├──→ Slave 1 (Read)
    ├──→ Slave 2 (Read)
    └──→ Slave 3 (Read)

Configuration:

Master:

-- Enable binary logging
[mysqld]
log-bin=mysql-bin
server-id=1

Slave:

[mysqld]
server-id=2
relay-log=mysql-relay-bin
read-only=1

Setup:

-- On Master
CREATE USER 'replicator'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';

-- On Slave
CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_USER='replicator',
  MASTER_PASSWORD='password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=0;

START SLAVE;

Benefits:

  • Read scaling
  • High availability
  • Backup from slave
  • Geographic distribution

Master-Master Replication

Architecture:

Master 1 ←→ Master 2
    │          │
    └────┬─────┘
         │
    Read Replicas

Use Cases:

  • Multi-master writes
  • Geographic distribution
  • Failover scenarios

Challenges:

  • Conflict resolution
  • Data consistency
  • Complexity

Replication Types

Statement-Based Replication (SBR):

  • Replicates SQL statements
  • Smaller log size
  • Non-deterministic issues

Row-Based Replication (RBR):

  • Replicates row changes
  • More data
  • More reliable

Mixed Replication:

  • Uses SBR by default
  • Switches to RBR when needed

Sharding

Horizontal Sharding

Strategy:

Database
    │
    ├──→ Shard 1 (users 0-999)
    ├──→ Shard 2 (users 1000-1999)
    └──→ Shard 3 (users 2000-2999)

Sharding Key:

-- Shard by user_id
shard_id = user_id % num_shards

-- Example
user_id = 1234
shard_id = 1234 % 3 = 1

Sharding Strategies:

1. Range-Based Sharding:

-- Shard by date range
Shard 1: 2024-01-01 to 2024-06-30
Shard 2: 2024-07-01 to 2024-12-31

2. Hash-Based Sharding:

-- Shard by hash
shard_id = hash(user_id) % num_shards

3. Directory-Based Sharding:

-- Lookup table for shard mapping
shard_mapping[user_id] = shard_id

Sharding Challenges

Cross-Shard Queries:

  • Joins across shards
  • Aggregations
  • Transactions

Solutions:

  • Denormalization
  • Application-level joins
  • Distributed transactions (2PC)

Rebalancing:

  • Moving data between shards
  • Maintaining availability
  • Consistency

Performance Characteristics

Maximum Read & Write Throughput

Single Node (InnoDB Engine):

  • Max Read Throughput:
    • Simple queries (indexed): 10K-50K queries/sec
    • Complex queries (joins): 1K-10K queries/sec
    • With read replicas: 10K-50K queries/sec per replica (linear scaling)
  • Max Write Throughput:
    • Simple inserts: 5K-25K writes/sec
    • Updates with indexes: 2K-10K writes/sec
    • Batch inserts: 50K-200K rows/sec

Factors Affecting Throughput:

  • Storage engine (InnoDB vs MyISAM)
  • Hardware (CPU, RAM, disk I/O, SSD vs HDD)
  • Query complexity and indexing
  • Connection pooling
  • Binary log configuration
  • Buffer pool size
  • Number of connections
  • Replication lag (for read replicas)

Optimized Configuration:

  • Max Read Throughput: 50K-100K queries/sec (with proper indexing, connection pooling, and read replicas)
  • Max Write Throughput: 25K-50K writes/sec (with optimized InnoDB settings and SSD storage)

Horizontal Scaling (Read Replicas):

  • Max Read Throughput: 10K-50K queries/sec per replica
  • Example: 5 read replicas can handle 50K-250K queries/sec total
  • Write Throughput: Limited by master (single write node)

Performance Optimization

Indexing

Primary Index:

CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

Secondary Index:

CREATE INDEX idx_email ON users(email);

Composite Index:

CREATE INDEX idx_name_email ON users(name, email);

Index Types:

  • B-Tree: Default, good for range queries
  • Hash: Exact match lookups
  • Full-Text: Text search
  • Spatial: Geographic data

Index Best Practices:

  • Index frequently queried columns
  • Avoid over-indexing (slows writes)
  • Use composite indexes for multi-column queries
  • Monitor index usage

Query Optimization

EXPLAIN:

EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';

Query Optimization Tips:

  • Use indexes
  • Avoid SELECT *
  • Limit result sets
  • Use appropriate JOIN types
  • Avoid subqueries when possible

Example:

-- Slow
SELECT * FROM users WHERE name LIKE '%john%';

-- Better (if possible)
SELECT * FROM users WHERE name LIKE 'john%';

-- Best (if exact match)
SELECT * FROM users WHERE name = 'john';

Caching

Query Cache:

[mysqld]
query_cache_type=1
query_cache_size=256M

Application-Level Caching:

  • Redis/Memcached
  • Cache frequently accessed data
  • Cache query results

Connection Pooling

Benefits:

  • Reuse connections
  • Reduce overhead
  • Limit connections

Configuration:

[mysqld]
max_connections=200
thread_cache_size=50

Schema Design

Normalization

First Normal Form (1NF):

  • Atomic values
  • No repeating groups

Second Normal Form (2NF):

  • 1NF + no partial dependencies

Third Normal Form (3NF):

  • 2NF + no transitive dependencies

Example:

-- Denormalized
CREATE TABLE orders (
    id INT,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    product_name VARCHAR(100),
    product_price DECIMAL(10,2)
);

-- Normalized
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100)
);

CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2)
);

CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);

Denormalization

When to Denormalize:

  • Read-heavy workloads
  • Performance critical queries
  • Reduce JOINs

Example:

-- Denormalized for performance
CREATE TABLE user_posts (
    id INT PRIMARY KEY,
    user_id INT,
    user_name VARCHAR(100),  -- Denormalized
    post_content TEXT,
    created_at TIMESTAMP
);

Transactions

ACID Properties

Atomicity:

  • All or nothing
  • Rollback on failure

Consistency:

  • Valid state transitions
  • Constraints maintained

Isolation:

  • Concurrent transactions don’t interfere
  • Isolation levels

Durability:

  • Committed changes persist
  • Crash recovery

Isolation Levels

READ UNCOMMITTED:

  • Lowest isolation
  • Dirty reads possible

READ COMMITTED:

  • No dirty reads
  • Non-repeatable reads possible

REPEATABLE READ (Default):

  • Consistent reads
  • Phantom reads possible

SERIALIZABLE:

  • Highest isolation
  • No concurrency issues

Example:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

High Availability

Master-Slave with Failover

Architecture:

Master (Active)
    │
    ├──→ Slave 1 (Standby)
    └──→ Slave 2 (Standby)

Failover Process:

  1. Detect master failure
  2. Promote slave to master
  3. Update application configuration
  4. Reconnect clients

Tools:

  • MySQL Router
  • ProxySQL
  • MHA (Master High Availability)

MySQL Cluster (NDB)

Architecture:

  • Shared-nothing architecture
  • Data nodes
  • SQL nodes
  • Management nodes

Use Cases:

  • High availability
  • Real-time applications
  • In-memory performance

Security

User Management

Create User:

CREATE USER 'app_user'@'localhost' IDENTIFIED BY 'password';

Grant Privileges:

GRANT SELECT, INSERT, UPDATE ON database.* TO 'app_user'@'localhost';
FLUSH PRIVILEGES;

Principle of Least Privilege:

  • Grant minimum required privileges
  • Separate users for different purposes
  • Regular privilege audits

Encryption

Data at Rest:

[mysqld]
encrypt_binlog=ON

Data in Transit:

  • SSL/TLS connections
  • Encrypted replication

Application-Level:

  • Encrypt sensitive fields
  • Use strong hashing (bcrypt, Argon2)

Monitoring

Key Metrics

Performance:

  • Query response time
  • Throughput (QPS)
  • Connection count
  • Cache hit ratio

Resources:

  • CPU usage
  • Memory usage
  • Disk I/O
  • Network I/O

Replication:

  • Replication lag
  • Slave status
  • Binary log size

Tools

MySQL Workbench:

  • GUI administration
  • Query execution
  • Performance monitoring

Percona Monitoring:

  • Comprehensive monitoring
  • Alerting
  • Performance insights

Prometheus + Grafana:

  • Metrics collection
  • Visualization
  • Alerting

Best Practices

Schema Design

  1. Use Appropriate Data Types:
    • INT vs BIGINT
    • VARCHAR vs TEXT
    • DECIMAL vs FLOAT
  2. Index Strategically:
    • Primary keys
    • Foreign keys
    • Frequently queried columns
  3. Normalize When Appropriate:
    • Balance normalization vs performance
    • Denormalize for read-heavy workloads

Query Optimization

  1. Use EXPLAIN:
    • Analyze query plans
    • Identify bottlenecks
  2. Avoid N+1 Queries:
    • Use JOINs
    • Batch queries
  3. Limit Result Sets:
    • Use LIMIT
    • Pagination

Performance

  1. Connection Pooling:
    • Reuse connections
    • Limit max connections
  2. Caching:
    • Query cache
    • Application cache
  3. Read Replicas:
    • Scale reads
    • Reduce master load

What Interviewers Look For

Database Fundamentals

  1. SQL Proficiency
    • Write efficient queries
    • Understand JOINs
    • Index usage
    • Red Flags: Poor SQL, no index understanding, inefficient queries
  2. Schema Design
    • Normalization
    • Index design
    • Data types
    • Red Flags: Poor schema, no normalization, wrong data types
  3. ACID Understanding
    • Transaction concepts
    • Isolation levels
    • Consistency guarantees
    • Red Flags: No ACID understanding, wrong isolation, no transactions

Problem-Solving Approach

  1. Performance Optimization
    • Query optimization
    • Index design
    • Caching strategies
    • Red Flags: No optimization, poor indexes, no caching
  2. Scalability
    • Replication strategies
    • Sharding approaches
    • Read/write separation
    • Red Flags: No scaling plan, wrong replication, no sharding

System Design Skills

  1. Database Architecture
    • Master-slave setup
    • Sharding design
    • High availability
    • Red Flags: Single database, no replication, no HA
  2. Trade-off Analysis
    • Consistency vs availability
    • Normalization vs denormalization
    • Read vs write optimization
    • Red Flags: No trade-offs, dogmatic choices, no analysis

Communication Skills

  1. Clear Explanation
    • Explains database concepts
    • Discusses trade-offs
    • Justifies design decisions
    • Red Flags: Unclear explanations, no justification, confusing

Meta-Specific Focus

  1. Database Expertise
    • Deep MySQL knowledge
    • Performance optimization
    • Scalability patterns
    • Key: Demonstrate database expertise
  2. System Design Skills
    • Can design database architecture
    • Understands scaling challenges
    • Makes informed trade-offs
    • Key: Show practical database design skills

Summary

MySQL Key Points:

  • Relational Database: ACID-compliant RDBMS
  • Storage Engines: InnoDB (default), MyISAM, Memory
  • Replication: Master-slave, master-master
  • Sharding: Horizontal scaling strategies
  • Performance: Indexing, query optimization, caching
  • High Availability: Replication, failover, clustering

Common Use Cases:

  • Web applications
  • Content management systems
  • E-commerce platforms
  • Enterprise applications
  • Data warehousing

Best Practices:

  • Use InnoDB for most applications
  • Implement replication for HA
  • Index frequently queried columns
  • Optimize queries with EXPLAIN
  • Use connection pooling
  • Monitor performance metrics
  • Implement proper security

MySQL is a powerful, reliable, and scalable database system that’s essential for building production applications.