Introduction

PostgreSQL (often called Postgres) is a powerful, open-source object-relational database management system (ORDBMS) that has been in active development for over 30 years. Known for its advanced features, standards compliance, and extensibility, PostgreSQL has become one of the most popular relational databases, powering applications at companies like Instagram, Spotify, Uber, and many others.

What is PostgreSQL?

PostgreSQL is an ACID-compliant relational database that extends SQL with:

  • Advanced Data Types: JSON, arrays, ranges, geospatial
  • Extensibility: Custom functions, data types, operators
  • SQL Conformance: Implements most of the SQL standard
  • Concurrency: MVCC (Multi-Version Concurrency Control)
  • Reliability: Strong data integrity and crash recovery

Why PostgreSQL?

Key Advantages:

  • Advanced Features: JSON support, full-text search, arrays, custom types
  • ACID Compliance: Strong consistency guarantees
  • Extensibility: Custom functions, operators, data types
  • Performance: Excellent query optimizer, parallel execution
  • Open Source: Free, open-source, active community
  • Standards Compliant: Follows SQL standards closely

Common Use Cases:

  • Web applications
  • Data warehousing and analytics
  • Geospatial applications
  • Content management systems
  • Financial systems
  • Scientific applications

Architecture

High-Level Architecture

┌────────────────────────────────────────────────────────────┐
│                    Client Applications                     │
│              (psql, JDBC, ODBC, Python, etc.)              │
└──────────────────────────────┬─────────────────────────────┘
                               │
                               │ Network Connection
                               │
┌──────────────────────────────▼─────────────────────────────┐
│                      PostgreSQL Server                     │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Postmaster Process                                  │  │
│  │  (accepts connections; forks backend processes)      │  │
│  └──────────────────────────────────────────────────────┘  │
│                              │                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Backend Processes (one per client connection)       │  │
│  │  - parser, planner/optimizer, executor               │  │
│  └──────────────────────────────────────────────────────┘  │
│                              │                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Background Processes                                │  │
│  │  - background writer, WAL writer, checkpointer       │  │
│  │  - autovacuum, archiver                              │  │
│  └──────────────────────────────────────────────────────┘  │
│                              │                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Shared Memory                                       │  │
│  │  - shared buffers, WAL buffers, lock tables          │  │
│  └──────────────────────────────────────────────────────┘  │
│                              │                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Storage Layer                                       │  │
│  │  - tablespaces, data files, WAL files                │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘

Process Model

PostgreSQL Architecture:

  • Postmaster Process: Main process that manages connections
  • Backend Processes: One per client connection
  • Background Processes: Various utility processes
  • Shared Memory: Shared buffers, locks, etc.

Process Types:

  • Postmaster: Accepts connections, forks backend processes
  • Backend: Handles client queries
  • Background Writer: Writes dirty shared buffers to disk
  • WAL Writer: Writes WAL (Write-Ahead Log) to disk
  • Checkpointer: Performs checkpoints
  • Autovacuum: Cleans up dead tuples
  • Archiver: Archives WAL files
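
These processes can be observed from SQL; a quick sketch using pg_stat_activity (the backend_type column is available since PostgreSQL 10):

-- List server processes by type
SELECT pid, backend_type, state
FROM pg_stat_activity
ORDER BY backend_type;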

Storage Architecture

Storage Hierarchy:

Database
  └── Schema
      └── Table
          └── Row (Tuple)

Physical Storage:

  • Tablespace: Logical storage location
  • Database: Collection of schemas
  • Schema: Collection of objects (tables, functions, etc.)
  • Table: Collection of rows
  • Page: 8KB storage unit (default)
  • Tuple: Row data
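
These layers can be inspected with built-in functions; for example:

-- Path of a table's data file, relative to $PGDATA
SELECT pg_relation_filepath('users');

-- Table, index, and total on-disk sizes
SELECT pg_table_size('users') AS table_bytes,
       pg_indexes_size('users') AS index_bytes,
       pg_total_relation_size('users') AS total_bytes;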

File Organization:

$PGDATA/
  ├── base/          # Database files
  │   └── 12345/     # Database OID
  │       └── 12346   # Table file
  ├── global/        # Cluster-wide tables
  ├── pg_wal/        # WAL files
  └── pg_tblspc/     # Tablespace links

Memory Architecture

Shared Memory:

  • Shared Buffers: Cached data pages
  • WAL Buffers: WAL data before writing to disk
  • Lock Tables: Lock information
  • Process Arrays: Process information

Per-Process Memory:

  • Work Memory: For sorting, hashing operations
  • Maintenance Work Memory: For VACUUM, CREATE INDEX
  • Temp Buffers: For temporary tables
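
Per-process settings can be adjusted per session or per query; a short sketch (the orders table and created_at column are illustrative):

-- Give one session more room for sorts and hash tables
SET work_mem = '64MB';

-- Speed up a one-off index build
SET maintenance_work_mem = '1GB';
CREATE INDEX idx_orders_created_at ON orders (created_at);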

Advanced Data Types

JSON and JSONB

JSON Type:

  • Stores JSON data as text
  • Validates JSON on input
  • Slower queries (parsing required)

JSONB Type:

  • Binary JSON format
  • Faster to process (stored pre-parsed in binary form)
  • Supports GIN indexes
  • Recommended for most use cases

Example:

-- Create table with JSONB
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    attributes JSONB
);

-- Insert JSON data
INSERT INTO products (name, attributes) VALUES
('Laptop', '{"brand": "Dell", "ram": "16GB", "storage": "512GB"}');

-- Query JSONB
SELECT * FROM products 
WHERE attributes->>'brand' = 'Dell';

-- Index JSONB
CREATE INDEX idx_attributes ON products USING GIN (attributes);

JSONB Operators:

  • ->: Get JSON object field (returns JSON)
  • ->>: Get JSON object field as text
  • @>: Contains operator
  • ?: Key exists operator
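
Applied to the products table above, these operators look like this:

-- -> returns JSONB, ->> returns text
SELECT attributes->'ram'  AS ram_jsonb,
       attributes->>'ram' AS ram_text
FROM products;

-- @> containment (can use the GIN index created above)
SELECT * FROM products WHERE attributes @> '{"brand": "Dell"}';

-- ? key existence
SELECT * FROM products WHERE attributes ? 'storage';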

Arrays

Array Types:

  • Any data type can be an array
  • Multi-dimensional arrays supported
  • Indexed with GIN indexes

Example:

-- Create table with arrays
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT,
    tags TEXT[],
    scores INTEGER[]
);

-- Insert array data
INSERT INTO users (name, tags, scores) VALUES
('John', ARRAY['developer', 'python'], ARRAY[95, 87, 92]);

-- Query arrays
SELECT * FROM users WHERE 'python' = ANY(tags);
SELECT * FROM users WHERE 95 = ANY(scores);

-- Array functions
SELECT array_length(tags, 1) FROM users;
SELECT unnest(tags) FROM users;

Range Types

Range Types:

  • int4range: Integer ranges
  • int8range: Bigint ranges
  • numrange: Numeric ranges
  • tsrange: Timestamp ranges
  • tstzrange: Timestamp with timezone ranges
  • daterange: Date ranges

Example:

-- Create table with ranges
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    name TEXT,
    duration TSRANGE
);

-- Insert range data
INSERT INTO events (name, duration) VALUES
('Meeting', '[2024-01-01 10:00, 2024-01-01 11:00)');

-- Query ranges
SELECT * FROM events 
WHERE duration @> '2024-01-01 10:30'::timestamp;

-- Range operators
-- @>: Contains
-- <@: Contained by
-- &&: Overlaps
-- -|-: Adjacent

Geospatial (PostGIS)

PostGIS Extension:

  • Adds geospatial data types
  • Spatial indexing (GIST)
  • Spatial functions and operators

Example:

-- Enable PostGIS
CREATE EXTENSION postgis;

-- Create table with geometry
CREATE TABLE locations (
    id SERIAL PRIMARY KEY,
    name TEXT,
    geom GEOMETRY(POINT, 4326)
);

-- Insert location
INSERT INTO locations (name, geom) VALUES
('New York', ST_GeomFromText('POINT(-74.0060 40.7128)', 4326));

-- Spatial query (cast to geography so the distance is in meters;
-- with plain geometry in SRID 4326 the units would be degrees)
SELECT * FROM locations 
WHERE ST_DWithin(
    geom::geography,
    ST_GeomFromText('POINT(-74.0060 40.7128)', 4326)::geography,
    1000  -- 1000 meters
);

Advanced Features

Full-Text Search

Built-in Full-Text Search:

  • tsvector: Preprocessed text
  • tsquery: Search query
  • GIN indexes for performance

Example:

-- Create table with text
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT
);

-- Add full-text search column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
CREATE INDEX idx_search ON articles USING GIN (search_vector);

-- Update search vector (coalesce guards against NULL title/content,
-- which would otherwise NULL out the whole vector)
UPDATE articles SET search_vector = 
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''));

-- Full-text search query
SELECT * FROM articles 
WHERE search_vector @@ to_tsquery('english', 'postgresql & database');

Full-Text Search Functions:

  • to_tsvector(): Convert text to search vector
  • to_tsquery(): Convert query string to search query
  • ts_rank(): Rank search results
  • ts_headline(): Highlight matches
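
Building on the articles example, ranking and highlighting combine like this:

-- Rank matches and show highlighted snippets
SELECT title,
       ts_rank(search_vector, query) AS rank,
       ts_headline('english', content, query) AS snippet
FROM articles,
     to_tsquery('english', 'postgresql & database') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;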

Window Functions

Window Functions:

  • Perform calculations across rows
  • Don’t collapse rows (unlike GROUP BY)
  • Use OVER clause

Example:

-- Window functions
SELECT 
    name,
    salary,
    department,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank,
    AVG(salary) OVER (PARTITION BY department) as avg_salary,
    LAG(salary) OVER (ORDER BY salary) as prev_salary
FROM employees;

Common Window Functions:

  • ROW_NUMBER(): Sequential row number
  • RANK(): Rank with gaps
  • DENSE_RANK(): Rank without gaps
  • LAG() / LEAD(): Previous/next value
  • SUM() / AVG(): Aggregates over window
  • FIRST_VALUE() / LAST_VALUE(): First/last value
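
RANK and DENSE_RANK differ only when there are ties; a sketch against the same employees table:

-- For salaries 100, 90, 90, 80:
--   RANK()       -> 1, 2, 2, 4  (gap after the tie)
--   DENSE_RANK() -> 1, 2, 2, 3  (no gap)
SELECT name, salary,
       RANK()       OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;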

Common Table Expressions (CTEs)

CTEs (WITH clause):

  • Named temporary result sets
  • Can be recursive
  • Improves query readability

Example:

-- Simple CTE
WITH high_salary AS (
    SELECT * FROM employees WHERE salary > 100000
)
SELECT * FROM high_salary;

-- Recursive CTE
WITH RECURSIVE employee_hierarchy AS (
    -- Base case
    SELECT id, name, manager_id, 1 as level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case
    SELECT e.id, e.name, e.manager_id, eh.level + 1
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT * FROM employee_hierarchy;

Materialized Views

Materialized Views:

  • Pre-computed query results
  • Stored on disk
  • Not updated automatically; must be refreshed explicitly

Example:

-- Create materialized view
CREATE MATERIALIZED VIEW sales_summary AS
SELECT 
    product_id,
    SUM(quantity) as total_quantity,
    SUM(amount) as total_amount
FROM sales
GROUP BY product_id;

-- Create unique index (a unique index is required for
-- REFRESH MATERIALIZED VIEW CONCURRENTLY)
CREATE UNIQUE INDEX idx_sales_summary ON sales_summary (product_id);

-- Refresh materialized view
REFRESH MATERIALIZED VIEW sales_summary;

-- Concurrent refresh (allows reads during refresh)
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;

Partial Indexes

Partial Indexes:

  • Index only rows matching condition
  • Smaller index size
  • Faster queries

Example:

-- Partial index (only active users)
CREATE INDEX idx_active_users ON users (email)
WHERE status = 'active';

-- Query uses partial index
SELECT * FROM users WHERE status = 'active' AND email = 'user@example.com';

Expression Indexes

Expression Indexes:

  • Index on computed expressions
  • Useful for function-based queries

Example:

-- Expression index
CREATE INDEX idx_lower_email ON users (LOWER(email));

-- Query uses expression index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

Concurrency Control

MVCC (Multi-Version Concurrency Control)

How MVCC Works:

  • Each transaction sees a consistent snapshot of the data
  • Writes create new versions
  • Old versions kept until no longer needed
  • No read locks needed

Advantages:

  • Reads don’t block writes
  • Writes don’t block reads
  • Better concurrency than locking
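
The row versions MVCC keeps are visible through the hidden system columns xmin and xmax (the IDs of the transactions that created and, if any, superseded a version):

-- Inspect the system columns behind MVCC
SELECT xmin, xmax, id, balance FROM accounts WHERE id = 1;
-- After another transaction updates the row, a new version with a new
-- xmin appears; the old version remains until VACUUM reclaims it.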

Transaction Isolation Levels:

  1. Read Uncommitted
    • Accepted syntactically but behaves as Read Committed
    • Dirty reads never occur in PostgreSQL
  2. Read Committed (Default)
    • Each statement sees committed data
    • Non-repeatable reads possible
    • Phantom reads possible
  3. Repeatable Read
    • Consistent snapshot throughout transaction
    • Prevents non-repeatable reads
    • Phantom reads prevented; conflicting concurrent writes fail with serialization errors
  4. Serializable
    • Strongest isolation
    • Transactions appear serialized
    • May cause serialization failures

Example:

-- Set isolation level
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- The first query takes the transaction's snapshot; all later reads
-- in the transaction see that same snapshot
SELECT * FROM accounts WHERE id = 1;

-- Other transactions' commits not visible
COMMIT;

Locking

Lock Types:

  1. Row-Level Locks
    • FOR UPDATE: Exclusive lock
    • FOR SHARE: Shared lock
    • FOR NO KEY UPDATE / FOR KEY SHARE: Weaker locks that permit more concurrency (e.g., do not block foreign-key checks)
  2. Table-Level Locks
    • ACCESS SHARE: Taken by plain SELECT
    • ACCESS EXCLUSIVE: Taken by DDL such as DROP TABLE and many ALTER TABLE forms (blocks all other access)
    • Several intermediate modes between these extremes

Example:

-- Row-level lock
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- Other transactions blocked from updating this row
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Table-level lock
LOCK TABLE accounts IN EXCLUSIVE MODE;

Deadlocks

Deadlock Detection:

  • PostgreSQL detects deadlocks automatically
  • Aborts one transaction
  • Application should retry

Prevention:

  • Acquire locks in consistent order
  • Keep transactions short
  • Use lower isolation levels when possible
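
A classic deadlock, sketched as two sessions locking the same rows in opposite order:

-- Session A:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

-- Session B:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 2;

-- Session A (blocks, waiting for B's lock on id = 2):
UPDATE accounts SET balance = balance + 10 WHERE id = 2;

-- Session B closes the cycle; after deadlock_timeout (default 1s)
-- one session is aborted with "ERROR: deadlock detected":
UPDATE accounts SET balance = balance + 10 WHERE id = 1;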

Performance Characteristics

Maximum Read & Write Throughput

Single Node (Standard Configuration):

  • Max Read Throughput:
    • Simple queries (indexed): 10K-50K queries/sec
    • Complex queries (joins, aggregations): 1K-10K queries/sec
    • With read replicas: 10K-50K queries/sec per replica (near-linear read scaling)
  • Max Write Throughput:
    • Simple inserts: 5K-20K writes/sec
    • Updates with indexes: 2K-10K writes/sec
    • Batch inserts: 50K-200K rows/sec

Factors Affecting Throughput:

  • Hardware (CPU, RAM, disk I/O)
  • Query complexity
  • Index usage
  • Connection pooling
  • Write-Ahead Log (WAL) configuration
  • Checkpoint frequency
  • Number of connections
  • Replication lag (for read replicas)

Optimized Configuration:

  • Max Read Throughput: 50K-100K queries/sec (with proper indexing and connection pooling)
  • Max Write Throughput: 20K-50K writes/sec (with optimized WAL and checkpoints)

Horizontal Scaling (Read Replicas):

  • Max Read Throughput: 10K-50K queries/sec per replica
  • Example: 5 read replicas can handle 50K-250K queries/sec total
  • Write Throughput: Limited by primary (single write node)

Performance Optimization

Indexing

Index Types:

  1. B-tree (Default)
    • General purpose
    • Supports equality and range queries
    • Most common index type
  2. Hash
    • Equality queries only
    • Can be smaller and faster than B-tree for pure equality lookups
    • No range queries; crash-safe (WAL-logged) since PostgreSQL 10
  3. GIN (Generalized Inverted Index)
    • Arrays, JSONB, full-text search
    • Slower updates, faster queries
  4. GiST (Generalized Search Tree)
    • Geospatial, full-text search
    • Custom operators
  5. BRIN (Block Range Index)
    • Very large tables
    • Minimal storage
    • Good for sequential data

Example:

-- B-tree index (default)
CREATE INDEX idx_email ON users (email);

-- Hash index
CREATE INDEX idx_email_hash ON users USING HASH (email);

-- GIN index for JSONB
CREATE INDEX idx_attributes ON products USING GIN (attributes);

-- GiST index for geometry
CREATE INDEX idx_geom ON locations USING GIST (geom);

-- BRIN index for large sequential table
CREATE INDEX idx_timestamp ON events USING BRIN (created_at);

Query Optimization

EXPLAIN and EXPLAIN ANALYZE:

-- Explain query plan
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';

-- Explain with execution statistics
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

Query Plan Analysis:

  • Seq Scan: Full table scan (slow for large tables)
  • Index Scan: Uses index
  • Index Only Scan: Uses index only (fastest)
  • Nested Loop: Join algorithm
  • Hash Join: Join using hash table
  • Merge Join: Join using sorted data

Optimization Tips:

  • Use indexes appropriately
  • Avoid SELECT *
  • Use LIMIT when possible
  • Analyze tables regularly
  • Update statistics (ANALYZE)

Parallel Query Execution

Parallel Queries:

  • Parallel sequential scans
  • Parallel joins
  • Parallel aggregation
  • Configurable worker processes

Configuration:

-- Parallel query is on by default; these settings tune how eagerly
-- the planner chooses parallel plans
SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 1000;    -- default value
SET parallel_tuple_cost = 0.01;    -- below the 0.1 default, favoring parallelism

When Parallel Execution Works:

  • Large tables (> 8MB)
  • Sequential scans
  • Aggregations
  • Joins
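
A parallel plan is recognizable by its Gather node in EXPLAIN; the plan shape below is representative (costs and worker counts will vary):

EXPLAIN SELECT COUNT(*) FROM sales;

--  Finalize Aggregate
--    ->  Gather
--          Workers Planned: 2
--          ->  Partial Aggregate
--                ->  Parallel Seq Scan on sales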

Connection Pooling

Why Pooling:

  • Creating connections is expensive
  • Limited connections (max_connections)
  • Reuse connections efficiently

Tools:

  • PgBouncer: Lightweight connection pooler
  • PgPool-II: Advanced connection pooler
  • Application-level: Connection pools in apps

PgBouncer Configuration:

[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

Replication and High Availability

Streaming Replication

Primary/Standby Architecture:

  • Primary: Handles writes
  • Standby: Receives WAL stream
  • Hot Standby: Can serve read queries

Setup:

# On primary: postgresql.conf
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3

# On primary: pg_hba.conf
host replication replica_user 192.168.1.2/32 md5

# On replica (PostgreSQL 12+): put this in postgresql.auto.conf and
# create an empty standby.signal file in $PGDATA
# (recovery.conf was removed in version 12)
primary_conninfo = 'host=192.168.1.1 port=5432 user=replica_user'

Synchronous vs Asynchronous:

  • Asynchronous: Default; lowest latency, but the most recent commits can be lost on failover
  • Synchronous: Commits wait for standby acknowledgment; no acknowledged-commit loss, at the cost of latency (priority- or quorum-based)

Synchronous Replication:

-- Set synchronous standbys; with this priority-list syntax the first
-- connected standby listed is synchronous, the next is a fallback
ALTER SYSTEM SET synchronous_standby_names = 'standby1,standby2';
SELECT pg_reload_conf();

Logical Replication

Logical Replication:

  • Replicate specific tables/databases
  • Cross-version replication
  • Selective replication

Setup:

-- On primary: Create publication
CREATE PUBLICATION my_publication FOR TABLE users, orders;

-- On replica: Create subscription (matching table definitions
-- must already exist on the subscriber)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=primary port=5432 dbname=mydb'
PUBLICATION my_publication;

High Availability Solutions

1. Patroni

  • Automatic failover
  • ZooKeeper/etcd/Consul coordination
  • PostgreSQL cluster management

2. repmgr

  • Replication management
  • Failover automation
  • Monitoring

3. pg_auto_failover

  • Automatic failover
  • Simple setup
  • Citus Data project

Partitioning

Table Partitioning

Partitioning Types:

  1. Range Partitioning
    • Partition by range of values
    • Good for dates, numeric ranges
  2. List Partitioning
    • Partition by list of values
    • Good for categories, regions
  3. Hash Partitioning
    • Partition by hash function
    • Even distribution

Example:

-- Range partitioning
CREATE TABLE sales (
    id SERIAL,
    sale_date DATE,
    amount DECIMAL
) PARTITION BY RANGE (sale_date);

-- Create partitions
CREATE TABLE sales_2024_q1 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE sales_2024_q2 PARTITION OF sales
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- Query automatically uses correct partition
SELECT * FROM sales WHERE sale_date = '2024-02-15';
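
List and hash partitioning follow the same pattern; a sketch (the customers and logs tables are illustrative; hash partitioning requires PostgreSQL 11+):

-- List partitioning
CREATE TABLE customers (
    id SERIAL,
    region TEXT
) PARTITION BY LIST (region);

CREATE TABLE customers_emea PARTITION OF customers
    FOR VALUES IN ('EU', 'UK', 'MEA');

-- Hash partitioning
CREATE TABLE logs (
    id BIGINT,
    payload TEXT
) PARTITION BY HASH (id);

CREATE TABLE logs_p0 PARTITION OF logs
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);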

Partition Pruning:

  • PostgreSQL automatically prunes partitions
  • Only queries relevant partitions
  • Improves performance
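
Pruning can be confirmed with EXPLAIN: only the matching partition appears in the plan (output shape is representative):

EXPLAIN SELECT * FROM sales WHERE sale_date = '2024-02-15';

--  Seq Scan on sales_2024_q1 sales
--    Filter: (sale_date = '2024-02-15'::date)
-- sales_2024_q2 is pruned and never touched.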

Extensibility

Custom Functions

PL/pgSQL Functions:

-- Create function
CREATE OR REPLACE FUNCTION calculate_total(price DECIMAL, quantity INT)
RETURNS DECIMAL AS $$
BEGIN
    RETURN price * quantity;
END;
$$ LANGUAGE plpgsql;

-- Use function
SELECT calculate_total(10.50, 3);

SQL Functions:

-- SQL function (simpler)
CREATE FUNCTION get_user_email(user_id INT)
RETURNS TEXT AS $$
    SELECT email FROM users WHERE id = user_id;
$$ LANGUAGE SQL;

Other Languages:

  • Python (PL/Python)
  • Perl (PL/Perl)
  • C (most powerful)

Custom Data Types

Create Custom Type:

-- Create composite type
CREATE TYPE address AS (
    street TEXT,
    city TEXT,
    zip_code TEXT
);

-- Use in table
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT,
    home_address address
);

Extensions

Popular Extensions:

  • PostGIS: Geospatial data
  • pg_trgm: Trigram matching
  • uuid-ossp: UUID generation
  • pg_stat_statements: Query statistics
  • pg_partman: Partition management

Install Extension:

CREATE EXTENSION postgis;
CREATE EXTENSION pg_trgm;

Use Cases

1. Web Applications

Characteristics:

  • ACID transactions
  • Complex queries
  • Relational data
  • Medium scale

Example:

  • E-commerce platforms
  • Content management systems
  • Social media applications

2. Data Warehousing

Characteristics:

  • Large datasets
  • Complex analytics
  • Aggregations
  • Reporting

Features Used:

  • Partitioning
  • Parallel queries
  • Materialized views
  • Window functions

3. Geospatial Applications

Characteristics:

  • Location data
  • Spatial queries
  • Mapping applications

Features Used:

  • PostGIS extension
  • GiST indexes
  • Spatial functions

4. Financial Systems

Characteristics:

  • ACID compliance critical
  • Complex transactions
  • Audit trails
  • Data integrity

Features Used:

  • Strong consistency
  • Transactions
  • Constraints
  • Triggers

5. Scientific Applications

Characteristics:

  • Complex data types
  • Custom functions
  • Large datasets
  • Research data

Features Used:

  • Arrays
  • Custom types
  • Extensibility
  • JSONB

Best Practices

1. Database Design

✅ Do:

  • Normalize appropriately
  • Use appropriate data types
  • Create indexes on foreign keys
  • Use constraints (NOT NULL, CHECK)
  • Design for query patterns

❌ Don’t:

  • Over-normalize
  • Use TEXT for everything
  • Create too many indexes
  • Ignore constraints
  • Premature optimization
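
A small sketch combining these guidelines (table and column names are illustrative):

CREATE TABLE orders (
    id         BIGSERIAL PRIMARY KEY,
    user_id    BIGINT NOT NULL REFERENCES users (id),
    status     TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped')),
    amount     NUMERIC(12,2) NOT NULL CHECK (amount >= 0),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Foreign keys are not indexed automatically; add one for joins/filters
CREATE INDEX idx_orders_user_id ON orders (user_id);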

2. Indexing

Guidelines:

  • Index foreign keys
  • Index frequently queried columns
  • Use partial indexes when appropriate
  • Monitor index usage
  • Remove unused indexes
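
Index usage can be checked in pg_stat_user_indexes; a sketch for spotting never-used indexes:

-- Indexes that have never been scanned (removal candidates)
SELECT schemaname,
       relname AS table_name,
       indexrelname AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;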

3. Query Optimization

Tips:

  • Use EXPLAIN ANALYZE
  • Avoid SELECT *
  • Use LIMIT when possible
  • Use appropriate JOIN types
  • Update statistics regularly (ANALYZE)

4. Connection Management

Best Practices:

  • Use connection pooling
  • Limit connection count
  • Close connections properly
  • Monitor connection usage
  • Use prepared statements

5. Maintenance

Regular Tasks:

  • VACUUM: Clean up dead tuples
  • ANALYZE: Update statistics
  • REINDEX: Rebuild indexes
  • Backup: Regular backups
  • Monitoring: Monitor performance
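
The manual forms, as a sketch (my_table and idx_email reuse names from earlier examples):

-- Reclaim dead tuples and refresh statistics in one pass
VACUUM (ANALYZE, VERBOSE) my_table;

-- Refresh planner statistics only
ANALYZE my_table;

-- Rebuild an index without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_email;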

Autovacuum:

-- Configure autovacuum
ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.1,
    autovacuum_analyze_scale_factor = 0.05
);

6. Security

Security Practices:

  • Use strong passwords
  • Limit user privileges
  • Use SSL/TLS for connections
  • Regular security updates
  • Audit logging
  • Row-level security (RLS)

Row-Level Security:

-- Enable RLS
ALTER TABLE users ENABLE ROW LEVEL SECURITY;

-- Create policy (current_user_id() stands in for an application-defined
-- function, e.g. one that reads a session setting)
CREATE POLICY user_policy ON users
    FOR ALL
    TO app_user
    USING (user_id = current_user_id());

Configuration

Key Configuration Parameters

postgresql.conf:

# Memory settings
shared_buffers = 256MB          # 25% of RAM for dedicated server
effective_cache_size = 1GB      # 50-75% of RAM
work_mem = 4MB                  # Per operation
maintenance_work_mem = 64MB     # For VACUUM, CREATE INDEX

# Connection settings
max_connections = 100           # Adjust based on resources
listen_addresses = '*'          # Or specific IPs

# WAL settings
wal_level = replica             # For replication
max_wal_size = 1GB             # WAL size before checkpoint
min_wal_size = 80MB

# Checkpoint settings
checkpoint_timeout = 5min
checkpoint_completion_target = 0.9

# Query planner
random_page_cost = 1.1         # For SSD (default 4.0 for HDD)
effective_io_concurrency = 200  # For SSD

# Logging
logging_collector = on
log_destination = 'stderr'
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d.log'
log_statement = 'all'           # Or 'ddl', 'mod', 'none'

Performance Tuning

Memory Configuration:

# For 8GB RAM server
shared_buffers = 2GB            # 25% of RAM
effective_cache_size = 6GB      # 75% of RAM
work_mem = 16MB                 # Adjust based on max_connections
maintenance_work_mem = 512MB

Connection Pooling:

  • Use PgBouncer for connection pooling
  • Reduce max_connections on PostgreSQL
  • Increase pool size in PgBouncer

Monitoring

Key Metrics

Performance Metrics:

  • Query latency (p50, p95, p99)
  • Throughput (queries/second)
  • Cache hit ratio
  • Index usage
  • Connection count

System Metrics:

  • CPU usage
  • Memory usage
  • Disk I/O
  • Network I/O
  • Replication lag

Database Metrics:

  • Table sizes
  • Index sizes
  • Dead tuples
  • Bloat
  • Vacuum status
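
Several of these metrics are one query away; two sketches:

-- Cluster-wide cache hit ratio (healthy OLTP systems sit above 0.99)
SELECT sum(blks_hit)::float
       / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database;

-- Dead tuples per table (autovacuum candidates)
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;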

Monitoring Tools

1. pg_stat_statements

-- Enable extension (also requires shared_preload_libraries =
-- 'pg_stat_statements' in postgresql.conf and a server restart)
CREATE EXTENSION pg_stat_statements;

-- View top queries (columns are total_exec_time/mean_exec_time since
-- PostgreSQL 13; total_time/mean_time in older releases)
SELECT 
    query,
    calls,
    total_exec_time,
    mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

2. pg_stat_activity

-- View active connections
SELECT 
    pid,
    usename,
    application_name,
    state,
    query
FROM pg_stat_activity
WHERE state = 'active';

3. pg_stat_database

-- Database statistics
SELECT 
    datname,
    numbackends,
    xact_commit,
    xact_rollback,
    blks_read,
    blks_hit,
    tup_returned,
    tup_fetched
FROM pg_stat_database;

External Tools:

  • pgAdmin: GUI administration tool
  • Prometheus + postgres_exporter: Metrics collection
  • Grafana: Visualization
  • pgBadger: Log analysis

Backup and Recovery

Backup Methods

1. pg_dump

# Dump database
pg_dump -U postgres -d mydb > backup.sql

# Dump specific schema
pg_dump -U postgres -d mydb -n public > backup.sql

# Custom format (compressed)
pg_dump -U postgres -d mydb -Fc > backup.dump

2. pg_dumpall

# Dump all databases
pg_dumpall -U postgres > all_databases.sql

3. Continuous Archiving (PITR)

# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /backup/wal/%f'

4. pg_basebackup

# Physical backup
pg_basebackup -D /backup/base -Ft -z -P

Recovery

Point-in-Time Recovery:

# PostgreSQL 12+: set these in postgresql.conf and create an empty
# recovery.signal file in $PGDATA (recovery.conf was removed in v12)
restore_command = 'cp /backup/wal/%f %p'
recovery_target_time = '2024-01-01 12:00:00'

Comparison with Other Databases

PostgreSQL vs MySQL

Aspect             PostgreSQL                 MySQL
-----------------  -------------------------  ---------------------------
ACID               Full ACID                  Full ACID (InnoDB)
Advanced features  JSON, arrays, full-text    More limited
Performance        Excellent optimizer        Good
Extensibility      Highly extensible          Limited
Replication        Streaming, logical         Primarily primary-replica
Use cases          Complex apps, analytics    Web apps, simpler workloads

PostgreSQL vs Oracle

Aspect       PostgreSQL            Oracle
-----------  --------------------  -----------------------
Cost         Free, open-source     Expensive license
Features     Most Oracle features  Enterprise features
Performance  Excellent             Excellent
Support      Community             Commercial support
Use cases    General purpose       Enterprise, large scale

PostgreSQL vs MongoDB

Aspect       PostgreSQL          MongoDB
-----------  ------------------  --------------------------------------
Data model   Relational          Document
ACID         Full ACID           Multi-document transactions (4.0+)
Queries      SQL, complex joins  Document queries, aggregation pipeline
Consistency  Strong              Strong on primary by default; tunable
Use cases    Relational data     Flexible schema

Limitations & Considerations

Limitations

  1. Horizontal Scaling
    • Primarily vertical scaling
    • Sharding requires application logic
    • No built-in auto-sharding
  2. Write Throughput
    • Lower than NoSQL databases
    • Limited by single node
    • Can use read replicas for reads
  3. Complexity
    • More complex than simple databases
    • Requires tuning for performance
    • Steeper learning curve

When NOT to Use PostgreSQL

❌ Not Suitable For:

  • Very high write throughput (> 1M writes/sec)
  • Simple key-value storage
  • Document-only use cases (consider MongoDB)
  • Extremely large scale (consider NoSQL)

✅ Suitable For:

  • Relational data
  • ACID transactions required
  • Complex queries
  • Analytics and reporting
  • Medium to large scale applications

Real-World Examples

Instagram

Use Case:

  • User data
  • Photo metadata
  • Social graph
  • Activity feeds

Scale:

  • Millions of users
  • Billions of photos
  • High read/write throughput

Spotify

Use Case:

  • Music metadata
  • User playlists
  • Recommendations
  • Analytics

Scale:

  • Millions of songs
  • Billions of streams
  • Complex queries

Uber

Use Case:

  • Trip data
  • Driver/rider information
  • Analytics
  • Some services

Scale:

  • Millions of trips
  • Real-time data
  • Multi-region

What Interviewers Look For

SQL Database Knowledge & Application

  1. ACID Understanding
    • Transaction isolation
    • Consistency guarantees
    • Red Flags: No ACID understanding, wrong isolation, consistency issues
  2. Query Optimization
    • Index design
    • Query planning
    • EXPLAIN usage
    • Red Flags: No indexes, slow queries, poor optimization
  3. Advanced Features
    • JSON/JSONB support
    • Full-text search
    • Geospatial
    • Red Flags: Not using features, inefficient, poor choice

System Design Skills

  1. When to Use PostgreSQL
    • Relational data
    • Complex queries
    • ACID requirements
    • Red Flags: Wrong use case, simple key-value, can’t justify
  2. Scalability Design
    • Read replicas
    • Sharding strategies
    • Red Flags: No scaling, single instance, bottlenecks
  3. Performance Optimization
    • Connection pooling
    • Query optimization
    • Red Flags: No pooling, slow queries, poor performance

Problem-Solving Approach

  1. Trade-off Analysis
    • Consistency vs performance
    • Features vs complexity
    • Red Flags: No trade-offs, dogmatic choices
  2. Edge Cases
    • Connection limits
    • Lock contention
    • Query performance
    • Red Flags: Ignoring edge cases, no handling
  3. Schema Design
    • Normalization
    • Index strategy
    • Red Flags: Poor schema, missing indexes, query issues

Communication Skills

  1. PostgreSQL Explanation
    • Can explain PostgreSQL features
    • Understands ACID
    • Red Flags: No understanding, vague explanations
  2. Decision Justification
    • Explains why PostgreSQL
    • Discusses alternatives
    • Red Flags: No justification, no alternatives

Meta-Specific Focus

  1. SQL Database Expertise
    • Deep PostgreSQL knowledge
    • ACID understanding
    • Key: Show SQL database expertise
  2. Complex Query Design
    • Query optimization
    • Index design
    • Key: Demonstrate query optimization skills

Conclusion

PostgreSQL is a powerful, feature-rich relational database that excels at:

Key Strengths:

  • Advanced Features: JSON, arrays, full-text search, geospatial
  • ACID Compliance: Strong consistency guarantees
  • Extensibility: Custom functions, types, operators
  • Performance: Excellent query optimizer, parallel execution
  • Reliability: Strong data integrity, crash recovery

Best Use Cases:

  • Web applications with relational data
  • Data warehousing and analytics
  • Geospatial applications
  • Financial systems requiring ACID
  • Applications needing complex queries

Key Takeaways:

  1. Use Appropriate Data Types: JSONB, arrays, ranges when needed
  2. Index Strategically: Index foreign keys, frequently queried columns
  3. Optimize Queries: Use EXPLAIN ANALYZE, proper indexes
  4. Connection Pooling: Use PgBouncer or similar
  5. Regular Maintenance: VACUUM, ANALYZE, backups
  6. Monitor Performance: Use pg_stat_statements, monitoring tools

PostgreSQL is an excellent choice for applications requiring relational data, ACID transactions, complex queries, and advanced features while maintaining high performance and reliability.