Introduction
PostgreSQL (often called Postgres) is a powerful, open-source object-relational database management system (ORDBMS) that has been in active development for over 30 years. Known for its advanced features, standards compliance, and extensibility, PostgreSQL has become one of the most popular relational databases, powering applications at companies like Instagram, Spotify, Uber, and many others.
What is PostgreSQL?
PostgreSQL is an ACID-compliant relational database that extends SQL with:
- Advanced Data Types: JSON, arrays, ranges, geospatial
- Extensibility: Custom functions, data types, operators
- Strong SQL Compliance: Supports most of the SQL standard
- Concurrency: MVCC (Multi-Version Concurrency Control)
- Reliability: Strong data integrity and crash recovery
Why PostgreSQL?
Key Advantages:
- Advanced Features: JSON support, full-text search, arrays, custom types
- ACID Compliance: Strong consistency guarantees
- Extensibility: Custom functions, operators, data types
- Performance: Excellent query optimizer, parallel execution
- Open Source: Free, open-source, active community
- Standards Compliant: Follows SQL standards closely
Common Use Cases:
- Web applications
- Data warehousing and analytics
- Geospatial applications
- Content management systems
- Financial systems
- Scientific applications
Architecture
High-Level Architecture
┌──────────────────────────────────────────────────────────┐
│                   Client Applications                    │
│             (psql, JDBC, ODBC, Python, etc.)             │
└─────────────────────────────┬────────────────────────────┘
                              │
                              │ Network Connection
                              │
┌─────────────────────────────▼────────────────────────────┐
│                    PostgreSQL Server                     │
│                                                          │
│   ┌──────────────────────────────────────────────────┐   │
│   │               Postmaster Process                 │   │
│   │       (Connection Manager, Process Forker)       │   │
│   └──────────────────────────────────────────────────┘   │
│                            │                             │
│   ┌──────────────────────────────────────────────────┐   │
│   │               Backend Processes                  │   │
│   │           (One per client connection)            │   │
│   │           - Query Parser                         │   │
│   │           - Query Optimizer                      │   │
│   │           - Query Executor                       │   │
│   └──────────────────────────────────────────────────┘   │
│                            │                             │
│   ┌──────────────────────────────────────────────────┐   │
│   │              Background Processes                │   │
│   │           - Background Writer                    │   │
│   │           - WAL Writer                           │   │
│   │           - Checkpointer                         │   │
│   │           - Autovacuum                           │   │
│   │           - Archiver                             │   │
│   └──────────────────────────────────────────────────┘   │
│                            │                             │
│   ┌──────────────────────────────────────────────────┐   │
│   │                 Shared Memory                    │   │
│   │           - Shared Buffers                       │   │
│   │           - WAL Buffers                          │   │
│   │           - Lock Tables                          │   │
│   └──────────────────────────────────────────────────┘   │
│                            │                             │
│   ┌──────────────────────────────────────────────────┐   │
│   │                 Storage Layer                    │   │
│   │           - Tablespaces                          │   │
│   │           - Data Files                           │   │
│   │           - WAL Files                            │   │
│   └──────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────┘
Process Model
PostgreSQL Architecture:
- Postmaster Process: Main process that manages connections
- Backend Processes: One per client connection
- Background Processes: Various utility processes
- Shared Memory: Shared buffers, locks, etc.
Process Types:
- Postmaster: Accepts connections, forks backend processes
- Backend: Handles client queries
- Writer: Writes dirty buffers to disk
- WAL Writer: Writes WAL (Write-Ahead Log) to disk
- Checkpointer: Performs checkpoints
- Autovacuum: Cleans up dead tuples
- Archiver: Archives WAL files
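These processes can be observed directly from SQL. A quick check, assuming PostgreSQL 10 or newer (where pg_stat_activity exposes backend_type):
-- List server processes and their types
SELECT pid, backend_type, state
FROM pg_stat_activity
ORDER BY backend_type;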
Storage Architecture
Storage Hierarchy:
Database
└── Schema
    └── Table
        └── Row (Tuple)
Physical Storage:
- Tablespace: Logical storage location
- Database: Collection of schemas
- Schema: Collection of objects (tables, functions, etc.)
- Table: Collection of rows
- Page: 8KB storage unit (default)
- Tuple: Row data
File Organization:
$PGDATA/
├── base/           # Database files
│   └── 12345/      # Database OID
│       └── 12346   # Table file
├── global/         # Cluster-wide tables
├── pg_wal/         # WAL files
└── pg_tblspc/      # Tablespace links
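The mapping from logical objects to physical files can be inspected from SQL. A small sketch (the users table is an assumed example):
-- Locate the data directory and the file backing a table
SHOW data_directory;
SELECT pg_relation_filepath('users');  -- e.g. base/12345/12346
SELECT oid FROM pg_database WHERE datname = current_database();  -- database OID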
Memory Architecture
Shared Memory:
- Shared Buffers: Cached data pages
- WAL Buffers: WAL data before writing to disk
- Lock Tables: Lock information
- Process Arrays: Process information
Per-Process Memory:
- Work Memory: For sorting, hashing operations
- Maintenance Work Memory: For VACUUM, CREATE INDEX
- Temp Buffers: For temporary tables
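These settings are visible at runtime, for example:
-- Inspect memory-related settings
SHOW shared_buffers;
SHOW work_mem;
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'wal_buffers', 'work_mem',
'maintenance_work_mem', 'temp_buffers');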
Advanced Data Types
JSON and JSONB
JSON Type:
- Stores JSON data as text
- Validates JSON on input
- Slower queries (parsing required)
JSONB Type:
- Binary JSON format (parsed once on input)
- Faster to query (no reparsing needed)
- Supports GIN indexes
- Recommended for most use cases
Example:
-- Create table with JSONB
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
attributes JSONB
);
-- Insert JSON data
INSERT INTO products (name, attributes) VALUES
('Laptop', '{"brand": "Dell", "ram": "16GB", "storage": "512GB"}');
-- Query JSONB
SELECT * FROM products
WHERE attributes->>'brand' = 'Dell';
-- Index JSONB
CREATE INDEX idx_attributes ON products USING GIN (attributes);
JSONB Operators:
- ->: Get JSON object field (returns JSON)
- ->>: Get JSON object field as text
- @>: Contains operator
- ?: Key exists operator
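Building on the products table above, the containment and key-existence operators look like this:
-- Containment: attributes must include the given JSON
SELECT * FROM products WHERE attributes @> '{"brand": "Dell"}';
-- Key existence: attributes must have a "ram" key
SELECT * FROM products WHERE attributes ? 'ram';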
Arrays
Array Types:
- Any data type can be an array
- Multi-dimensional arrays supported
- Indexed with GIN indexes
Example:
-- Create table with arrays
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT,
tags TEXT[],
scores INTEGER[]
);
-- Insert array data
INSERT INTO users (name, tags, scores) VALUES
('John', ARRAY['developer', 'python'], ARRAY[95, 87, 92]);
-- Query arrays
SELECT * FROM users WHERE 'python' = ANY(tags);
SELECT * FROM users WHERE 95 = ANY(scores);
-- Array functions
SELECT array_length(tags, 1) FROM users;
SELECT unnest(tags) FROM users;
Range Types
Range Types:
- int4range: Integer ranges
- int8range: Bigint ranges
- numrange: Numeric ranges
- tsrange: Timestamp ranges
- tstzrange: Timestamp with time zone ranges
- daterange: Date ranges
Example:
-- Create table with ranges
CREATE TABLE events (
id SERIAL PRIMARY KEY,
name TEXT,
duration TSRANGE
);
-- Insert range data
INSERT INTO events (name, duration) VALUES
('Meeting', '[2024-01-01 10:00, 2024-01-01 11:00)');
-- Query ranges
SELECT * FROM events
WHERE duration @> '2024-01-01 10:30'::timestamp;
-- Range operators
-- @>: Contains
-- <@: Contained by
-- &&: Overlaps
-- -|-: Adjacent
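Using the events table above, the overlap operator supports scheduling-style queries, and ranges pair naturally with exclusion constraints (a common pattern, sketched here):
-- Events overlapping a given window
SELECT * FROM events
WHERE duration && '[2024-01-01 10:30, 2024-01-01 12:00)'::tsrange;
-- Prevent overlapping events entirely
ALTER TABLE events ADD CONSTRAINT no_overlap EXCLUDE USING GIST (duration WITH &&);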
Geospatial (PostGIS)
PostGIS Extension:
- Adds geospatial data types
- Spatial indexing (GIST)
- Spatial functions and operators
Example:
-- Enable PostGIS
CREATE EXTENSION postgis;
-- Create table with geometry
CREATE TABLE locations (
id SERIAL PRIMARY KEY,
name TEXT,
geom GEOMETRY(POINT, 4326)
);
-- Insert location
INSERT INTO locations (name, geom) VALUES
('New York', ST_GeomFromText('POINT(-74.0060 40.7128)', 4326));
-- Spatial query
SELECT * FROM locations
WHERE ST_DWithin(
geom,
ST_GeomFromText('POINT(-74.0060 40.7128)', 4326),
1000 -- 1000 meters
);
Advanced Features
Full-Text Search
Built-in Full-Text Search:
- tsvector: Preprocessed text
- tsquery: Search query
- GIN indexes for performance
Example:
-- Create table with text
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT
);
-- Add full-text search column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
CREATE INDEX idx_search ON articles USING GIN (search_vector);
-- Update search vector
UPDATE articles SET search_vector =
to_tsvector('english', title || ' ' || content);
-- Full-text search query
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql & database');
Full-Text Search Functions:
- to_tsvector(): Convert text to search vector
- to_tsquery(): Convert query string to search query
- ts_rank(): Rank search results
- ts_headline(): Highlight matches
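A sketch combining ranking and highlighting against the articles table above:
-- Rank matches and show highlighted snippets
SELECT
title,
ts_rank(search_vector, query) AS rank,
ts_headline('english', content, query) AS snippet
FROM articles, to_tsquery('english', 'postgresql & database') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;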
Window Functions
Window Functions:
- Perform calculations across rows
- Don’t collapse rows (unlike GROUP BY)
- Use OVER clause
Example:
-- Window functions
SELECT
name,
salary,
department,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank,
AVG(salary) OVER (PARTITION BY department) as avg_salary,
LAG(salary) OVER (ORDER BY salary) as prev_salary
FROM employees;
Common Window Functions:
- ROW_NUMBER(): Sequential row number
- RANK(): Rank with gaps
- DENSE_RANK(): Rank without gaps
- LAG()/LEAD(): Previous/next value
- SUM()/AVG(): Aggregates over window
- FIRST_VALUE()/LAST_VALUE(): First/last value
Common Table Expressions (CTEs)
CTEs (WITH clause):
- Named temporary result sets
- Can be recursive
- Improves query readability
Example:
-- Simple CTE
WITH high_salary AS (
SELECT * FROM employees WHERE salary > 100000
)
SELECT * FROM high_salary;
-- Recursive CTE
WITH RECURSIVE employee_hierarchy AS (
-- Base case
SELECT id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case
SELECT e.id, e.name, e.manager_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT * FROM employee_hierarchy;
Materialized Views
Materialized Views:
- Pre-computed query results
- Stored on disk
- Must be refreshed explicitly (REFRESH MATERIALIZED VIEW)
Example:
-- Create materialized view
CREATE MATERIALIZED VIEW sales_summary AS
SELECT
product_id,
SUM(quantity) as total_quantity,
SUM(amount) as total_amount
FROM sales
GROUP BY product_id;
-- Create a unique index (required for REFRESH ... CONCURRENTLY)
CREATE UNIQUE INDEX idx_sales_summary ON sales_summary (product_id);
-- Refresh materialized view
REFRESH MATERIALIZED VIEW sales_summary;
-- Concurrent refresh (allows reads during refresh)
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;
Partial Indexes
Partial Indexes:
- Index only rows matching condition
- Smaller index size
- Faster queries
Example:
-- Partial index (only active users)
CREATE INDEX idx_active_users ON users (email)
WHERE status = 'active';
-- Query uses partial index
SELECT * FROM users WHERE status = 'active' AND email = 'user@example.com';
Expression Indexes
Expression Indexes:
- Index on computed expressions
- Useful for function-based queries
Example:
-- Expression index
CREATE INDEX idx_lower_email ON users (LOWER(email));
-- Query uses expression index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
Concurrency Control
MVCC (Multi-Version Concurrency Control)
How MVCC Works:
- Each transaction sees snapshot of data
- Writes create new versions
- Old versions kept until no longer needed
- No read locks needed
Advantages:
- Reads don’t block writes
- Writes don’t block reads
- Better concurrency than locking
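Row versions can be observed through the hidden system columns xmin (creating transaction) and xmax (deleting/updating transaction). A quick look, assuming an accounts table:
-- Inspect row version metadata
SELECT xmin, xmax, * FROM accounts;
SELECT txid_current();  -- ID of the current transaction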
Transaction Isolation Levels:
- Read Uncommitted
- Accepted syntactically, but behaves exactly like Read Committed
- Dirty reads never occur in PostgreSQL
- Read Committed (Default)
- Each statement sees committed data
- Non-repeatable reads possible
- Phantom reads possible
- Repeatable Read
- Consistent snapshot throughout transaction
- Prevents non-repeatable reads
- Phantom reads prevented (snapshot isolation); write conflicts raise serialization errors
- Serializable
- Strongest isolation
- Transactions appear serialized
- May cause serialization failures
Example:
-- Set isolation level
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Transaction sees a consistent snapshot (taken at its first query)
SELECT * FROM accounts WHERE id = 1;
-- Commits by other transactions after that point are not visible
COMMIT;
Locking
Lock Types:
- Row-Level Locks
- FOR UPDATE: Exclusive row lock
- FOR SHARE: Shared row lock
- FOR NO KEY UPDATE: Weaker exclusive lock that does not block FOR KEY SHARE
- Table-Level Locks
- ACCESS SHARE: Read lock
- ACCESS EXCLUSIVE: Write lock
- Various other lock modes
Example:
-- Row-level lock
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- Other transactions blocked from updating this row
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
-- Table-level lock (must be run inside a transaction block)
LOCK TABLE accounts IN EXCLUSIVE MODE;
Deadlocks
Deadlock Detection:
- PostgreSQL detects deadlocks automatically
- Aborts one transaction
- Application should retry
Prevention:
- Acquire locks in consistent order
- Keep transactions short
- Use lower isolation levels when possible
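A minimal sketch of how a deadlock arises (accounts table assumed; two concurrent sessions):
-- Session 1: BEGIN; UPDATE accounts SET balance = balance - 10 WHERE id = 1;
-- Session 2: BEGIN; UPDATE accounts SET balance = balance - 10 WHERE id = 2;
-- Session 1: UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- blocks
-- Session 2: UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- deadlock
-- PostgreSQL aborts one session with SQLSTATE 40P01; the application retries.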
Performance Characteristics
Maximum Read & Write Throughput
Single Node (Standard Configuration):
- Max Read Throughput:
- Simple queries (indexed): 10K-50K queries/sec
- Complex queries (joins, aggregations): 1K-10K queries/sec
- With read replicas: 10K-50K queries/sec per replica (linear scaling)
- Max Write Throughput:
- Simple inserts: 5K-20K writes/sec
- Updates with indexes: 2K-10K writes/sec
- Batch inserts: 50K-200K rows/sec
Factors Affecting Throughput:
- Hardware (CPU, RAM, disk I/O)
- Query complexity
- Index usage
- Connection pooling
- Write-Ahead Log (WAL) configuration
- Checkpoint frequency
- Number of connections
- Replication lag (for read replicas)
Optimized Configuration:
- Max Read Throughput: 50K-100K queries/sec (with proper indexing and connection pooling)
- Max Write Throughput: 20K-50K writes/sec (with optimized WAL and checkpoints)
Horizontal Scaling (Read Replicas):
- Max Read Throughput: 10K-50K queries/sec per replica
- Example: 5 read replicas can handle 50K-250K queries/sec total
- Write Throughput: Limited by primary (single write node)
Performance Optimization
Indexing
Index Types:
- B-tree (Default)
- General purpose
- Supports equality and range queries
- Most common index type
- Hash
- Equality queries only
- Faster than B-tree for equality
- No range queries
- GIN (Generalized Inverted Index)
- Arrays, JSONB, full-text search
- Slower updates, faster queries
- GiST (Generalized Search Tree)
- Geospatial, full-text search
- Custom operators
- BRIN (Block Range Index)
- Very large tables
- Minimal storage
- Good for sequential data
Example:
-- B-tree index (default)
CREATE INDEX idx_email ON users (email);
-- Hash index
CREATE INDEX idx_email_hash ON users USING HASH (email);
-- GIN index for JSONB
CREATE INDEX idx_attributes ON products USING GIN (attributes);
-- GiST index for geometry
CREATE INDEX idx_geom ON locations USING GIST (geom);
-- BRIN index for large sequential table
CREATE INDEX idx_timestamp ON events USING BRIN (created_at);
Query Optimization
EXPLAIN and EXPLAIN ANALYZE:
-- Explain query plan
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';
-- Explain with execution statistics
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
Query Plan Analysis:
- Seq Scan: Full table scan (slow for large tables)
- Index Scan: Uses index
- Index Only Scan: Uses index only (fastest)
- Nested Loop: Join algorithm
- Hash Join: Join using hash table
- Merge Join: Join using sorted data
Optimization Tips:
- Use indexes appropriately
- Avoid SELECT *
- Use LIMIT when possible
- Analyze tables regularly
- Update statistics (ANALYZE)
Parallel Query Execution
Parallel Queries:
- Parallel sequential scans
- Parallel joins
- Parallel aggregation
- Configurable worker processes
Configuration:
-- Enable parallel queries
SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 1000;
SET parallel_tuple_cost = 0.01;
When Parallel Execution Works:
- Large tables (> 8MB)
- Sequential scans
- Aggregations
- Joins
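To confirm the planner chose a parallel plan, look for a Gather node in EXPLAIN output (big_table is an illustrative name):
EXPLAIN SELECT count(*) FROM big_table;
-- A parallel plan looks roughly like:
-- Finalize Aggregate
--   -> Gather (Workers Planned: 4)
--        -> Partial Aggregate
--             -> Parallel Seq Scan on big_table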
Connection Pooling
Why Pooling:
- Creating connections is expensive
- Limited connections (max_connections)
- Reuse connections efficiently
Tools:
- PgBouncer: Lightweight connection pooler
- PgPool-II: Advanced connection pooler
- Application-level: Connection pools in apps
PgBouncer Configuration:
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
Replication and High Availability
Streaming Replication
Primary-Standby Architecture:
- Primary: Handles writes
- Standby: Receives WAL stream
- Hot Standby: Can serve read queries
Setup:
# On primary: postgresql.conf
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3
# On primary: pg_hba.conf
host replication replica_user 192.168.1.2/32 md5
# On replica (PostgreSQL 12+): add to postgresql.conf and create an empty
# standby.signal file in the data directory (before 12 this went in recovery.conf)
primary_conninfo = 'host=192.168.1.1 port=5432 user=replica_user'
Synchronous vs Asynchronous:
- Asynchronous: Default, faster, some data loss risk
- Synchronous: Slower, no committed-transaction loss; can be configured for quorum commit
Synchronous Replication:
-- Set synchronous standby
ALTER SYSTEM SET synchronous_standby_names = 'standby1,standby2';
SELECT pg_reload_conf();
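Replication health can be checked from the primary; a sketch using the built-in statistics view:
-- Connected standbys, their sync status, and approximate replay lag
SELECT client_addr, state, sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;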
Logical Replication
Logical Replication:
- Replicate specific tables/databases
- Cross-version replication
- Selective replication
Setup:
-- On primary: Create publication
CREATE PUBLICATION my_publication FOR TABLE users, orders;
-- On replica: Create subscription
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=primary port=5432 dbname=mydb'
PUBLICATION my_publication;
High Availability Solutions
1. Patroni
- Automatic failover
- ZooKeeper/etcd/Consul coordination
- PostgreSQL cluster management
2. repmgr
- Replication management
- Failover automation
- Monitoring
3. pg_auto_failover
- Automatic failover
- Simple setup
- Citus Data project
Partitioning
Table Partitioning
Partitioning Types:
- Range Partitioning
- Partition by range of values
- Good for dates, numeric ranges
- List Partitioning
- Partition by list of values
- Good for categories, regions
- Hash Partitioning
- Partition by hash function
- Even distribution
Example:
-- Range partitioning
CREATE TABLE sales (
id SERIAL,
sale_date DATE,
amount DECIMAL
) PARTITION BY RANGE (sale_date);
-- Create partitions
CREATE TABLE sales_2024_q1 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE sales_2024_q2 PARTITION OF sales
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
-- Query automatically uses correct partition
SELECT * FROM sales WHERE sale_date = '2024-02-15';
Partition Pruning:
- PostgreSQL automatically prunes partitions
- Only queries relevant partitions
- Improves performance
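Pruning is easy to verify with EXPLAIN against the sales table above; the plan should touch only the matching partition:
EXPLAIN SELECT * FROM sales WHERE sale_date = '2024-02-15';
-- Expect a scan on sales_2024_q1 only, not sales_2024_q2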
Extensibility
Custom Functions
PL/pgSQL Functions:
-- Create function
CREATE OR REPLACE FUNCTION calculate_total(price DECIMAL, quantity INT)
RETURNS DECIMAL AS $$
BEGIN
RETURN price * quantity;
END;
$$ LANGUAGE plpgsql;
-- Use function
SELECT calculate_total(10.50, 3);
SQL Functions:
-- SQL function (simpler)
CREATE FUNCTION get_user_email(user_id INT)
RETURNS TEXT AS $$
SELECT email FROM users WHERE id = user_id;
$$ LANGUAGE SQL;
Other Languages:
- Python (PL/Python)
- Perl (PL/Perl)
- C (most powerful)
Custom Data Types
Create Custom Type:
-- Create composite type
CREATE TYPE address AS (
street TEXT,
city TEXT,
zip_code TEXT
);
-- Use in table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT,
home_address address
);
Extensions
Popular Extensions:
- PostGIS: Geospatial data
- pg_trgm: Trigram matching
- uuid-ossp: UUID generation
- pg_stat_statements: Query statistics
- pg_partman: Partition management
Install Extension:
CREATE EXTENSION postgis;
CREATE EXTENSION pg_trgm;
Use Cases
1. Web Applications
Characteristics:
- ACID transactions
- Complex queries
- Relational data
- Medium scale
Example:
- E-commerce platforms
- Content management systems
- Social media applications
2. Data Warehousing
Characteristics:
- Large datasets
- Complex analytics
- Aggregations
- Reporting
Features Used:
- Partitioning
- Parallel queries
- Materialized views
- Window functions
3. Geospatial Applications
Characteristics:
- Location data
- Spatial queries
- Mapping applications
Features Used:
- PostGIS extension
- GiST indexes
- Spatial functions
4. Financial Systems
Characteristics:
- ACID compliance critical
- Complex transactions
- Audit trails
- Data integrity
Features Used:
- Strong consistency
- Transactions
- Constraints
- Triggers
5. Scientific Applications
Characteristics:
- Complex data types
- Custom functions
- Large datasets
- Research data
Features Used:
- Arrays
- Custom types
- Extensibility
- JSONB
Best Practices
1. Database Design
✅ Do:
- Normalize appropriately
- Use appropriate data types
- Create indexes on foreign keys
- Use constraints (NOT NULL, CHECK)
- Design for query patterns
❌ Don’t:
- Over-normalize
- Use TEXT for everything
- Create too many indexes
- Ignore constraints
- Premature optimization
2. Indexing
Guidelines:
- Index foreign keys
- Index frequently queried columns
- Use partial indexes when appropriate
- Monitor index usage
- Remove unused indexes
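A sketch for the last two guidelines, using the built-in statistics views:
-- Indexes that have never been scanned (removal candidates)
SELECT schemaname, relname, indexrelname, idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;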
3. Query Optimization
Tips:
- Use EXPLAIN ANALYZE
- Avoid SELECT *
- Use LIMIT when possible
- Use appropriate JOIN types
- Update statistics regularly (ANALYZE)
4. Connection Management
Best Practices:
- Use connection pooling
- Limit connection count
- Close connections properly
- Monitor connection usage
- Use prepared statements
5. Maintenance
Regular Tasks:
- VACUUM: Clean up dead tuples
- ANALYZE: Update statistics
- REINDEX: Rebuild indexes
- Backup: Regular backups
- Monitoring: Monitor performance
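The first three tasks in SQL (my_table and idx_email are placeholder names):
VACUUM (VERBOSE, ANALYZE) my_table;  -- reclaim dead tuples, refresh statistics
ANALYZE;                             -- update planner statistics for all tables
REINDEX INDEX idx_email;             -- rebuild a bloated index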
Autovacuum:
-- Configure autovacuum
ALTER TABLE my_table SET (
autovacuum_vacuum_scale_factor = 0.1,
autovacuum_analyze_scale_factor = 0.05
);
6. Security
Security Practices:
- Use strong passwords
- Limit user privileges
- Use SSL/TLS for connections
- Regular security updates
- Audit logging
- Row-level security (RLS)
Row-Level Security:
-- Enable RLS
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
-- Create policy (current_user_id() is an application-defined helper function)
CREATE POLICY user_policy ON users
FOR ALL
TO app_user
USING (user_id = current_user_id());
Configuration
Key Configuration Parameters
postgresql.conf:
# Memory settings
shared_buffers = 256MB # 25% of RAM for dedicated server
effective_cache_size = 1GB # 50-75% of RAM
work_mem = 4MB # Per operation
maintenance_work_mem = 64MB # For VACUUM, CREATE INDEX
# Connection settings
max_connections = 100 # Adjust based on resources
listen_addresses = '*' # Or specific IPs
# WAL settings
wal_level = replica # For replication
max_wal_size = 1GB # WAL size before checkpoint
min_wal_size = 80MB
# Checkpoint settings
checkpoint_timeout = 5min
checkpoint_completion_target = 0.9
# Query planner
random_page_cost = 1.1 # For SSD (default 4.0 for HDD)
effective_io_concurrency = 200 # For SSD
# Logging
logging_collector = on
log_destination = 'stderr'
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d.log'
log_statement = 'all' # Or 'ddl', 'mod', 'none'
Performance Tuning
Memory Configuration:
# For 8GB RAM server
shared_buffers = 2GB # 25% of RAM
effective_cache_size = 6GB # 75% of RAM
work_mem = 16MB # Adjust based on max_connections
maintenance_work_mem = 512MB
Connection Pooling:
- Use PgBouncer for connection pooling
- Reduce max_connections on PostgreSQL
- Increase pool size in PgBouncer
Monitoring
Key Metrics
Performance Metrics:
- Query latency (p50, p95, p99)
- Throughput (queries/second)
- Cache hit ratio
- Index usage
- Connection count
System Metrics:
- CPU usage
- Memory usage
- Disk I/O
- Network I/O
- Replication lag
Database Metrics:
- Table sizes
- Index sizes
- Dead tuples
- Bloat
- Vacuum status
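The cache hit ratio, for example, can be computed from pg_stat_database:
-- Cache hit percentage per database
SELECT datname,
round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database;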
Monitoring Tools
1. pg_stat_statements
-- Enable extension
CREATE EXTENSION pg_stat_statements;
-- View top queries (PostgreSQL 13+ column names; older versions use
-- total_time / mean_time)
SELECT
query,
calls,
total_exec_time,
mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
2. pg_stat_activity
-- View active connections
SELECT
pid,
usename,
application_name,
state,
query
FROM pg_stat_activity
WHERE state = 'active';
3. pg_stat_database
-- Database statistics
SELECT
datname,
numbackends,
xact_commit,
xact_rollback,
blks_read,
blks_hit,
tup_returned,
tup_fetched
FROM pg_stat_database;
External Tools:
- pgAdmin: GUI administration tool
- Prometheus + postgres_exporter: Metrics collection
- Grafana: Visualization
- pgBadger: Log analysis
Backup and Recovery
Backup Methods
1. pg_dump
# Dump database
pg_dump -U postgres -d mydb > backup.sql
# Dump specific schema
pg_dump -U postgres -d mydb -n public > backup.sql
# Custom format (compressed)
pg_dump -U postgres -d mydb -Fc > backup.dump
2. pg_dumpall
# Dump all databases
pg_dumpall -U postgres > all_databases.sql
3. Continuous Archiving (PITR)
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /backup/wal/%f'
4. pg_basebackup
# Physical backup
pg_basebackup -D /backup/base -Ft -z -P
Recovery
Point-in-Time Recovery:
# postgresql.conf plus a recovery.signal file (PostgreSQL 12+; before 12, recovery.conf)
restore_command = 'cp /backup/wal/%f %p'
recovery_target_time = '2024-01-01 12:00:00'
Comparison with Other Databases
PostgreSQL vs MySQL
| Aspect | PostgreSQL | MySQL |
|---|---|---|
| ACID | Full ACID | Full ACID (InnoDB) |
| Advanced Features | JSON, arrays, full-text, custom types | JSON, full-text (fewer advanced types) |
| Performance | Excellent optimizer | Good |
| Extensibility | Highly extensible | Limited |
| Replication | Streaming, logical | Master-replica |
| Use Cases | Complex apps, analytics | Web apps, simple |
PostgreSQL vs Oracle
| Aspect | PostgreSQL | Oracle |
|---|---|---|
| Cost | Free, open-source | Expensive license |
| Features | Most Oracle features | Enterprise features |
| Performance | Excellent | Excellent |
| Support | Community | Commercial support |
| Use Cases | General purpose | Enterprise, large scale |
PostgreSQL vs MongoDB
| Aspect | PostgreSQL | MongoDB |
|---|---|---|
| Data Model | Relational | Document |
| ACID | Full ACID | Multi-document transactions (4.0+) |
| Queries | SQL, complex joins | Document queries |
| Consistency | Strong | Strong on primary; tunable |
| Use Cases | Relational data | Flexible schema |
Limitations & Considerations
Limitations
- Horizontal Scaling
- Primarily vertical scaling
- Sharding requires application logic
- No built-in auto-sharding
- Write Throughput
- Lower than NoSQL databases
- Limited by single node
- Can use read replicas for reads
- Complexity
- More complex than simple databases
- Requires tuning for performance
- Steeper learning curve
When NOT to Use PostgreSQL
❌ Not Suitable For:
- Very high write throughput (> 1M writes/sec)
- Simple key-value storage
- Document-only use cases (consider MongoDB)
- Extremely large scale (consider NoSQL)
✅ Suitable For:
- Relational data
- ACID transactions required
- Complex queries
- Analytics and reporting
- Medium to large scale applications
Real-World Examples
Instagram
Use Case:
- User data
- Photo metadata
- Social graph
- Activity feeds
Scale:
- Millions of users
- Billions of photos
- High read/write throughput
Spotify
Use Case:
- Music metadata
- User playlists
- Recommendations
- Analytics
Scale:
- Millions of songs
- Billions of streams
- Complex queries
Uber
Use Case:
- Trip data
- Driver/rider information
- Analytics
- Some services
Scale:
- Millions of trips
- Real-time data
- Multi-region
What Interviewers Look For
SQL Database Knowledge & Application
- ACID Understanding
- Transaction isolation
- Consistency guarantees
- Red Flags: No ACID understanding, wrong isolation, consistency issues
- Query Optimization
- Index design
- Query planning
- EXPLAIN usage
- Red Flags: No indexes, slow queries, poor optimization
- Advanced Features
- JSON/JSONB support
- Full-text search
- Geospatial
- Red Flags: Not using features, inefficient, poor choice
System Design Skills
- When to Use PostgreSQL
- Relational data
- Complex queries
- ACID requirements
- Red Flags: Wrong use case, simple key-value, can’t justify
- Scalability Design
- Read replicas
- Sharding strategies
- Red Flags: No scaling, single instance, bottlenecks
- Performance Optimization
- Connection pooling
- Query optimization
- Red Flags: No pooling, slow queries, poor performance
Problem-Solving Approach
- Trade-off Analysis
- Consistency vs performance
- Features vs complexity
- Red Flags: No trade-offs, dogmatic choices
- Edge Cases
- Connection limits
- Lock contention
- Query performance
- Red Flags: Ignoring edge cases, no handling
- Schema Design
- Normalization
- Index strategy
- Red Flags: Poor schema, missing indexes, query issues
Communication Skills
- PostgreSQL Explanation
- Can explain PostgreSQL features
- Understands ACID
- Red Flags: No understanding, vague explanations
- Decision Justification
- Explains why PostgreSQL
- Discusses alternatives
- Red Flags: No justification, no alternatives
Meta-Specific Focus
- SQL Database Expertise
- Deep PostgreSQL knowledge
- ACID understanding
- Key: Show SQL database expertise
- Complex Query Design
- Query optimization
- Index design
- Key: Demonstrate query optimization skills
Conclusion
PostgreSQL is a powerful, feature-rich relational database that excels at:
Key Strengths:
- Advanced Features: JSON, arrays, full-text search, geospatial
- ACID Compliance: Strong consistency guarantees
- Extensibility: Custom functions, types, operators
- Performance: Excellent query optimizer, parallel execution
- Reliability: Strong data integrity, crash recovery
Best Use Cases:
- Web applications with relational data
- Data warehousing and analytics
- Geospatial applications
- Financial systems requiring ACID
- Applications needing complex queries
Key Takeaways:
- Use Appropriate Data Types: JSONB, arrays, ranges when needed
- Index Strategically: Index foreign keys, frequently queried columns
- Optimize Queries: Use EXPLAIN ANALYZE, proper indexes
- Connection Pooling: Use PgBouncer or similar
- Regular Maintenance: VACUUM, ANALYZE, backups
- Monitor Performance: Use pg_stat_statements, monitoring tools
PostgreSQL is an excellent choice for applications requiring relational data, ACID transactions, complex queries, and advanced features while maintaining high performance and reliability.