Introduction
Apache Pinot is a real-time, distributed OLAP datastore designed to serve analytical queries with sub-second latency on large-scale data. Understanding Pinot is valuable for system design interviews involving real-time analytics and OLAP workloads.
This guide covers:
- Pinot Fundamentals: Tables, segments, and real-time ingestion
- Data Model: Dimensions, metrics, and time columns
- Querying: SQL queries via the broker REST API (the legacy PQL dialect is deprecated)
- Performance: Star-tree indexing and optimization
- Best Practices: Schema design, partitioning, and scaling
What is Apache Pinot?
Apache Pinot is a real-time OLAP database that provides:
- Real-Time Ingestion: Stream data ingestion
- Low Latency: Sub-second query performance
- Column-Oriented: Optimized for analytical queries
- Scalable: Horizontal scaling
- SQL Interface: Standard SQL queries
Key Concepts
Table: Logical collection of data; defined as OFFLINE, REALTIME, or hybrid (both)
Segment: Unit of storage, partitioning, and replication
Dimension: Column used for filtering and grouping
Metric: Column used for aggregation
Time Column: Timestamp column used for retention, time-based segment management, and query pruning
Star-Tree Index: Pre-aggregated index for fast aggregation queries
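For example, a single clickstream event might look like the record below: user_id, event_type, and country are dimensions, revenue is a metric, and timestamp is the time column (field names are illustrative and match the schema used later in this guide):
{
  "user_id": 12345,
  "event_type": "purchase",
  "country": "US",
  "revenue": 19.99,
  "timestamp": 1609459200000
}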
Architecture
High-Level Architecture
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client    │     │   Client    │     │   Client    │
│ Application │     │ Application │     │ Application │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┴───────────────────┘
                           │
                           │ SQL Queries
                           ▼
              ┌─────────────────────────┐
              │      Pinot Broker       │
              │     (Query Router)      │
              └────────────┬────────────┘
                           │
             ┌─────────────┴─────────────┐
             │                           │
      ┌──────▼──────┐             ┌──────▼──────┐
      │   Server    │             │    Minion   │
      │    Nodes    │             │ (Background │
      │ (Segments)  │             │   Tasks)    │
      └──────┬──────┘             └──────┬──────┘
             │                           │
             └─────────────┬─────────────┘
                           │
                           ▼
              ┌─────────────────────────┐
              │      Deep Storage       │
              │       (S3, HDFS)        │
              └─────────────────────────┘
Explanation:
- Client Applications: Applications that query Pinot for real-time OLAP analytics (e.g., dashboards, analytics platforms, ad-hoc queries).
- Pinot Broker: Accepts queries, routes them to the servers hosting the relevant segments, and merges the partial results into a single response.
- Server Nodes: Store and serve segments and execute queries against them; for real-time tables, servers also consume events directly from the stream.
- Minion: Runs background tasks such as batch segment generation, segment merge/rollup, and data purging; minions are not on the query path.
- Deep Storage: Long-term storage for segments (e.g., S3, HDFS). Server nodes load segments from deep storage.
Core Architecture
┌──────────────────────────────────────────────────────────┐
│                      Pinot Cluster                       │
│                                                          │
│   ┌──────────────────────────────────────────────────┐   │
│   │                    Controller                    │   │
│   │    (Metadata Management, Segment Assignment)     │   │
│   └──────────────────────────────────────────────────┘   │
│                            │                             │
│   ┌──────────────────────────────────────────────────┐   │
│   │                      Broker                      │   │
│   │         (Query Routing, Result Merging)          │   │
│   └──────────────────────────────────────────────────┘   │
│                            │                             │
│   ┌──────────────────────────────────────────────────┐   │
│   │                      Server                      │   │
│   │        (Segment Storage, Query Execution)        │   │
│   └──────────────────────────────────────────────────┘   │
│                            │                             │
│   ┌──────────────────────────────────────────────────┐   │
│   │                      Minion                      │   │
│   │     (Background Tasks: Segment Merge, Purge)     │   │
│   └──────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────┘
Table Schema
Schema Definition
{
"schemaName": "events",
"dimensionFieldSpecs": [
{
"name": "user_id",
"dataType": "LONG"
},
{
"name": "event_type",
"dataType": "STRING"
},
{
"name": "country",
"dataType": "STRING"
}
],
"metricFieldSpecs": [
{
"name": "count",
"dataType": "LONG",
"defaultNullValue": 0
},
{
"name": "revenue",
"dataType": "DOUBLE",
"defaultNullValue": 0.0
}
],
"timeFieldSpec": {
"incomingGranularitySpec": {
"timeType": "MILLISECONDS",
"dataType": "LONG",
"name": "timestamp"
},
"outgoingGranularitySpec": {
"timeType": "MILLISECONDS",
"dataType": "LONG",
"name": "timestamp"
}
}
}
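Note: timeFieldSpec is the legacy way to declare the time column; newer Pinot versions prefer dateTimeFieldSpecs. A rough equivalent for the same column (a sketch, assuming epoch milliseconds) would be:
"dateTimeFieldSpecs": [
  {
    "name": "timestamp",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }
]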
Table Configuration
Offline Table
{
"tableName": "events_OFFLINE",
"tableType": "OFFLINE",
"segmentsConfig": {
"timeColumnName": "timestamp",
"timeType": "MILLISECONDS",
"segmentPushType": "APPEND",
"segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
"replication": "1",
"schemaName": "events"
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"tableIndexConfig": {
"loadMode": "MMAP"
},
"metadata": {
"customConfigs": {}
}
}
Real-Time Table
{
"tableName": "events_REALTIME",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "timestamp",
"timeType": "MILLISECONDS",
"schemaName": "events",
"replicasPerPartition": "1"
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"tableIndexConfig": {
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.topic.name": "events",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": "localhost:9092"
}
}
}
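In practice, streamConfigs usually also carries consumer and flush tuning. A hedged sketch of commonly used keys (the values here are illustrative, not recommendations):
"stream.kafka.consumer.prop.auto.offset.reset": "smallest",
"realtime.segment.flush.threshold.rows": "1000000",
"realtime.segment.flush.threshold.time": "6h"
These control where consumption starts and when an in-memory consuming segment is sealed, built, and persisted to deep storage.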
Data Ingestion
Batch Ingestion
Create Table:
pinot-admin.sh AddTable \
-schemaFile schema.json \
-tableConfigFile table-config.json \
-exec
Generate and Push Segments:
pinot-admin.sh LaunchDataIngestionJob \
-jobSpecFile job-spec.yaml
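The job spec referenced above is a YAML file describing the input location, record format, and target table. A minimal sketch for CSV input using the standalone execution framework (paths, file pattern, and controller URI are assumptions; verify class names against your Pinot version):
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/data/events/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/data/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: 'org.apache.pinot.spi.filesystem.LocalPinotFS'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'events'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'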
Real-Time Ingestion
Kafka Stream:
{
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.topic.name": "events",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.broker.list": "localhost:9092",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
}
}
Querying
SQL Queries
Basic Query:
SELECT
country,
COUNT(*) AS events,
SUM(revenue) AS total_revenue
FROM events
WHERE "timestamp" >= 1609459200000
GROUP BY country
ORDER BY total_revenue DESC
LIMIT 10
Time-Based Query:
SELECT
DATETRUNC('hour', "timestamp", 'MILLISECONDS') AS hour,
COUNT(*) AS events
FROM events
WHERE "timestamp" >= 1609459200000
GROUP BY DATETRUNC('hour', "timestamp", 'MILLISECONDS')
ORDER BY hour
Filtering:
SELECT
event_type,
COUNT(*) AS events
FROM events
WHERE "timestamp" >= 1609459200000
AND country = 'US'
AND user_id > 1000
GROUP BY event_type
Query REST API
Pinot's legacy PQL dialect is deprecated; queries are written in standard SQL and submitted to the broker as a JSON payload.
Aggregation:
{
"sql": "SELECT country, COUNT(*), SUM(revenue) FROM events WHERE \"timestamp\" >= 1609459200000 GROUP BY country"
}
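A minimal way to submit this payload is an HTTP POST to the broker's /query/sql endpoint (the host and port here assume a local quickstart broker on 8099):
curl -X POST http://localhost:8099/query/sql \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT country, COUNT(*), SUM(revenue) FROM events WHERE \"timestamp\" >= 1609459200000 GROUP BY country"}'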
Star-Tree Index
Configuration
{
"tableIndexConfig": {
"starTreeIndexConfigs": [
{
"dimensionsSplitOrder": ["country", "event_type"],
"functionColumnPairs": ["COUNT__*", "SUM__revenue"],
"maxLeafRecords": 10000
}
]
}
}
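With this configuration, aggregation queries that filter or group only on country and event_type and aggregate with COUNT or SUM(revenue) can be answered from the pre-aggregated star-tree records, for example:
SELECT
country,
event_type,
COUNT(*) AS events,
SUM(revenue) AS total_revenue
FROM events
GROUP BY country, event_type
Queries that also filter on columns outside the split order (such as the time column) fall back to the regular indexes.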
Benefits:
- Pre-aggregates records across the configured dimension combinations
- Predictable, low latency for covered aggregation queries (scan cost is bounded by maxLeafRecords)
- Trade-off: extra storage and segment build time for the pre-aggregated records
Performance Optimization
Indexing
Inverted Index:
{
"tableIndexConfig": {
"invertedIndexColumns": ["country", "event_type"]
}
}
Range Index:
{
"tableIndexConfig": {
"rangeIndexColumns": ["timestamp", "user_id"]
}
}
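As a rough guide, the inverted index accelerates equality and IN filters, while the range index accelerates range predicates, e.g.:
SELECT COUNT(*) AS events
FROM events
WHERE country = 'US'
AND event_type IN ('click', 'view')
AND "timestamp" >= 1609459200000
AND user_id > 1000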
Partitioning
Time-Based Partitioning:
{
"segmentsConfig": {
"timeColumnName": "timestamp",
"timeType": "MILLISECONDS",
"segmentPushType": "APPEND"
}
}
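Segments can also be partitioned on a dimension so that brokers prune segments for filters on that column; a hedged sketch (the partition function and count are illustrative):
Column-Based Partitioning:
{
  "tableIndexConfig": {
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "user_id": {
          "functionName": "Murmur",
          "numPartitions": 4
        }
      }
    }
  },
  "routing": {
    "segmentPrunerTypes": ["partition"]
  }
}
Note that pruning only helps if the incoming data is actually partitioned on that column (e.g., a Kafka topic keyed by user_id).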
Compression
Compression Codec:
The codec for raw (no-dictionary) columns is configured per column via fieldConfigList; supported codecs include SNAPPY, ZSTANDARD, and LZ4:
{
  "tableIndexConfig": {
    "noDictionaryColumns": ["revenue"]
  },
  "fieldConfigList": [
    {
      "name": "revenue",
      "encodingType": "RAW",
      "compressionCodec": "SNAPPY"
    }
  ]
}
Best Practices
1. Schema Design
- Use appropriate data types
- Design dimensions for filtering
- Design metrics for aggregation
- Use time column for partitioning
2. Table Configuration
- Configure replication for fault tolerance (more than one replica in production)
- Target a reasonable segment size (commonly a few hundred MB)
- Use star-tree indexes for the most frequent aggregation patterns
- Match inverted/range indexes to the columns you filter on
3. Query Performance
- Always use time filters
- Filter on dimensions early
- Use appropriate aggregations
- Limit result sets
4. Scaling
- Scale servers for storage
- Scale brokers for queries
- Monitor segment distribution
- Plan for capacity
What Interviewers Look For
OLAP Database Understanding
- Pinot Concepts
- Understanding of columnar storage
- Real-time ingestion
- Star-tree indexing
- Red Flags: No Pinot understanding, wrong model, no indexing
- Real-Time Analytics
- Low-latency queries
- Stream ingestion
- Pre-aggregation strategies
- Red Flags: No latency awareness, poor ingestion, no aggregation
- Performance
- Query optimization
- Indexing strategies
- Compression
- Red Flags: No optimization, poor indexes, no compression
Problem-Solving Approach
- Schema Design
- Dimension and metric design
- Time partitioning
- Index selection
- Red Flags: Poor design, no partitioning, wrong indexes
- Query Optimization
- Time filter usage
- Dimension filtering
- Aggregation design
- Red Flags: No time filters, poor filtering, no aggregation
System Design Skills
- Analytics Architecture
- Pinot cluster design
- Ingestion pipeline
- Query optimization
- Red Flags: No architecture, poor ingestion, no optimization
- Scalability
- Horizontal scaling
- Segment management
- Performance tuning
- Red Flags: No scaling, poor segments, no tuning
Communication Skills
- Clear Explanation
- Explains Pinot concepts
- Discusses trade-offs
- Justifies design decisions
- Red Flags: Unclear explanations, no justification, confusing
Meta-Specific Focus
- Real-Time Analytics Expertise
- Understanding of OLAP systems
- Pinot mastery
- Low-latency optimization
- Key: Demonstrate real-time analytics expertise
- System Design Skills
- Can design analytics systems
- Understands OLAP challenges
- Makes informed trade-offs
- Key: Show practical analytics design skills
Summary
Apache Pinot Key Points:
- Real-Time OLAP: Low-latency analytical queries
- Column-Oriented: Optimized for analytics
- Real-Time Ingestion: Stream data ingestion
- Star-Tree Index: Pre-aggregated indexes
- Scalable: Horizontal scaling
Common Use Cases:
- Real-time dashboards
- OLAP analytics
- Time-series analytics
- Event analytics
- Business intelligence
- User analytics
Best Practices:
- Design dimensions and metrics carefully
- Use time column for partitioning
- Configure star-tree for common queries
- Always use time filters in queries
- Optimize indexes
- Monitor query performance
- Plan for horizontal scaling
Apache Pinot is a powerful real-time OLAP database optimized for sub-second analytical queries on large datasets.