Designing a Scalable Chat System: WhatsApp/Discord Architecture Deep Dive
Designing a chat system that handles millions of concurrent users, delivers messages in real-time, and maintains high availability requires careful engineering decisions at every layer. This comprehensive guide explores the architecture, trade-offs, and design patterns needed to build a scalable chat system like WhatsApp or Discord.
Requirements Analysis
Functional Requirements
Core Messaging:
- One-on-one messaging between users
- Group messaging (multiple participants)
- Media sharing (images, videos, files)
- Message status (sent, delivered, read)
- Typing indicators
- Message search and history
User Management:
- User registration and authentication
- Contact management
- Presence status (online, offline, away)
- User profiles and settings
Advanced Features:
- Message reactions and replies
- File attachments
- Voice and video calls (optional)
- End-to-end encryption
- Message deletion and editing
Non-Functional Requirements
- Scalability: support 500 million daily active users, ~25 billion messages per day
- Availability: 99.9% uptime
- Latency: message delivery < 100ms at p99
- Durability: messages stored permanently, no data loss
- Consistency: eventually consistent (brief message-ordering delays acceptable)
Capacity Estimation
Traffic Estimates
- Daily Active Users (DAU): 500 million
- Peak Concurrent Users: 10% of DAU = 50 million
- Average Messages per User: 50 messages/day
- Total Messages per Day: 500M × 50 = 25 billion messages/day
- Peak Messages per Second: 25B / (24 × 3600) × 3 (peak factor) ≈ 870K messages/sec
Storage Estimates
- Average Message Size: 100 bytes (text) + 1KB metadata = 1.1KB
- Daily Message Storage: 25B × 1.1KB = 27.5 TB/day
- Annual Storage: 27.5 TB × 365 ≈ 10 PB/year
- Media Messages: 20% of messages are media, average 200KB
- Daily Media Storage: 25B × 0.2 × 200KB = 1 PB/day
Bandwidth Estimates
- Incoming Messages: 870K msg/sec × 1.1KB ≈ 957 MB/sec
- Outgoing Messages: 870K msg/sec × 1.1KB × 2 (avg recipients) ≈ 1.9 GB/sec
- Total Bandwidth: ~2.9 GB/sec ≈ 23.2 Gbps
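The arithmetic above is easy to sanity-check in a few lines. The constants below are the stated assumptions from this section; a real capacity plan would refine the peak factor and message size from production data.

```python
# Reproducing the back-of-the-envelope estimates from this section.
DAU = 500_000_000           # daily active users
MSGS_PER_USER = 50          # average messages per user per day
PEAK_FACTOR = 3             # peak-to-average traffic ratio
MSG_SIZE_BYTES = 1_100      # ~100 B text + ~1 KB metadata

msgs_per_day = DAU * MSGS_PER_USER                        # 25 billion
peak_msgs_per_sec = msgs_per_day / (24 * 3600) * PEAK_FACTOR
daily_storage_tb = msgs_per_day * MSG_SIZE_BYTES / 1e12   # decimal TB
incoming_mb_per_sec = peak_msgs_per_sec * MSG_SIZE_BYTES / 1e6

print(f"{peak_msgs_per_sec:,.0f} msg/sec")   # ~868,056 msg/sec
print(f"{daily_storage_tb} TB/day")          # 27.5 TB/day
print(f"{incoming_mb_per_sec:,.0f} MB/sec")  # ~955 MB/sec
```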
System APIs
Core APIs
sendMessage(userId, chatId, message, mediaUrl)
- Send message to individual or group chat
- Returns: messageId, timestamp
getMessages(userId, chatId, limit, offset)
- Retrieve message history
- Returns: List of messages
markAsRead(userId, chatId, messageIds[])
- Mark messages as read
- Returns: success status
getChats(userId)
- Get list of user's chats
- Returns: List of chats with last message
updatePresence(userId, status)
- Update user presence (online, offline, away)
- Returns: success status
uploadMedia(userId, file)
- Upload media file
- Returns: mediaUrl
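As a sketch, the first three APIs can be modeled with a small in-memory service. The class and its dict-backed storage are illustrative stand-ins, not the production design; persistence, authentication, and media upload are out of scope here.

```python
# Minimal in-memory sketch of sendMessage / getMessages / markAsRead.
# Method names and return values mirror the API list above.
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Message:
    message_id: str
    chat_id: str
    sender_id: str
    content: str
    created_at: float
    read_by: set = field(default_factory=set)

class MessageService:
    def __init__(self):
        self._by_chat = {}  # chat_id -> list[Message]; stand-in for the DB

    def send_message(self, user_id, chat_id, content):
        msg = Message(str(uuid.uuid4()), chat_id, user_id, content, time.time())
        self._by_chat.setdefault(chat_id, []).append(msg)
        return msg.message_id, msg.created_at   # messageId, timestamp

    def get_messages(self, user_id, chat_id, limit=50, offset=0):
        return self._by_chat.get(chat_id, [])[offset:offset + limit]

    def mark_as_read(self, user_id, chat_id, message_ids):
        wanted = set(message_ids)
        for msg in self._by_chat.get(chat_id, []):
            if msg.message_id in wanted:
                msg.read_by.add(user_id)
        return True
```

A real implementation would route these calls through the API gateway and message service shown in the architecture diagram below.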
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│ Mobile/Web Clients │
└───────────────┬───────────────────────────────┬─────────────┘
│ │
┌───────▼────────┐ ┌────────▼────────┐
│ Load Balancer │ │ API Gateway │
│ (WebSocket) │ │ (REST API) │
└───────┬────────┘ └────────┬────────┘
│ │
┌───────────┼───────────┐ │
│ │ │ │
┌───▼───┐ ┌───▼───┐ ┌───▼───┐ ┌─────▼─────┐
│ WS │ │ WS │ │ WS │ │ Message │
│Server │ │Server │ │Server │ │ Service │
└───┬───┘ └───┬───┘ └───┬───┘ └─────┬─────┘
│ │ │ │
└───────────┼───────────┘ │
│ │
┌───────▼───────────────────────────────▼───────┐
│ Message Queue (Kafka/RabbitMQ) │
└───────┬───────────────────────────────┬───────┘
│ │
┌───────▼────────┐ ┌────────▼────────┐
│ Metadata │ │ Message │
│ Service │ │ Storage │
│ (Cassandra) │ │ (Cassandra) │
└────────────────┘ └────────────────┘
│ │
┌───────▼───────────────────────────────▼───────┐
│ Media Storage (Object Storage) │
└───────────────────────────────────────────────┘
Detailed Component Design
Communication Protocol: WebSocket vs HTTP Long Polling
Approach 1: WebSocket
How It Works: Persistent bidirectional connection between client and server.
Client WebSocket Server
------ ---------------
|--HTTP Upgrade--------->|
|<--101 Switching--------|
| Protocols |
| |
|<=====Persistent Connection=====>|
| |
|--Message-------------->|
|<--Acknowledgment-------|
| |
|<--Push Message---------|
|--Acknowledgment------->|
Pros:
- Low Latency: No connection overhead after initial handshake
- Bidirectional: Server can push messages immediately
- Efficient: Lower overhead than HTTP polling
- Real-time: True real-time communication
Cons:
- Connection Management: Must manage persistent connections
- Stateful Servers: Servers maintain connection state
- Scaling Complexity: Harder to scale (sticky sessions needed)
- Firewall Issues: Some networks block WebSocket
When to Use: Real-time requirements, low latency critical, high message frequency.
Approach 2: HTTP Long Polling
How It Works: Client sends request, server holds it open until message arrives or timeout.
Client HTTP Server
------ -----------
|--GET /messages-------->|
| | (Hold connection)
| | (Wait for message)
|<--Message (after 30s)--|
| |
|--GET /messages-------->| (Immediately poll again)
Pros:
- Stateless: Servers remain stateless
- Firewall Friendly: Works through most firewalls
- Simple Scaling: Easy to scale horizontally
- Fallback: Can fallback to short polling
Cons:
- Higher Latency: Up to polling interval delay
- Resource Usage: Many open connections
- Not True Real-time: Messages delayed until next poll
When to Use: Firewall restrictions, simpler scaling, acceptable latency.
Approach 3: Server-Sent Events (SSE)
How It Works: Server pushes events to client over HTTP connection.
Pros:
- One-Way Push: Efficient for server-to-client
- HTTP-Based: Works through firewalls
- Automatic Reconnection: Built-in reconnection
Cons:
- One-Way Only: Client must use separate HTTP for sending
- Connection Limits: no native support in older browsers (e.g. Internet Explorer), and HTTP/1.1 per-origin connection caps constrain how many streams a client can hold open
When to Use: One-way push scenarios, notification systems.
Decision: Use WebSocket for real-time chat, with HTTP long polling as fallback.
Message Delivery Strategies
Strategy 1: Direct Delivery (Online Users)
How It Works: If recipient is online, deliver directly via WebSocket.
Sender Message Service Recipient (Online)
------ -------------- ------------------
|--Send--------->| |
| | Store message |
| |--Push via WS------->|
| |<--ACK---------------|
|<--Success------| |
Pros:
- Low Latency: Immediate delivery
- Efficient: No queuing overhead
- Real-time: True real-time delivery
Cons:
- Requires Online: Only works for online users
- Connection Dependency: Requires active WebSocket
Strategy 2: Message Queue for Offline Users
How It Works: Store messages in queue, deliver when user comes online.
Sender Message Service Queue Recipient (Offline)
------ -------------- ----- -------------------
|--Send--------->| | |
| | Store message | |
| |--Enqueue----------->| |
|<--Success------| | |
| | | |
| | | (User comes online)
| |<--Dequeue-----------| |
| |--Deliver--------------------------->|
Queue Options:
Apache Kafka:
- Pros: High throughput, distributed, durable
- Cons: Complex setup, overkill for simple use cases
- When: High message volume, need ordering guarantees
RabbitMQ:
- Pros: Flexible routing, good management UI
- Cons: Lower throughput than Kafka
- When: Complex routing needs, moderate volume
Redis Pub/Sub:
- Pros: Simple, fast, low latency
- Cons: Not durable (messages lost if subscriber offline)
- When: Real-time only, don’t need persistence
Amazon SQS:
- Pros: Managed service, auto-scaling
- Cons: Vendor lock-in, cost at scale
- When: AWS ecosystem, want managed solution
Decision: Use Kafka for message queue (durability, ordering), Redis for online user delivery (low latency).
Strategy 3: Hybrid Approach
- Online Users: deliver via WebSocket immediately
- Offline Users: store in Kafka, deliver on reconnection
- Group Messages: store in Kafka, fan out to all members
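The hybrid path can be sketched as a small delivery router: push to connected users over their live socket, queue everyone else, and drain the queue on reconnect. `FakeSocket` and the in-memory deques are stand-ins for real WebSocket connections and Kafka.

```python
# Hybrid delivery sketch: online -> immediate push, offline -> queue.
from collections import deque

class DeliveryRouter:
    def __init__(self):
        self.connections = {}    # user_id -> live socket
        self.offline_queue = {}  # user_id -> deque of pending messages

    def connect(self, user_id, socket):
        self.connections[user_id] = socket
        # Drain anything queued while the user was offline.
        for msg in self.offline_queue.pop(user_id, deque()):
            socket.send(msg)

    def disconnect(self, user_id):
        self.connections.pop(user_id, None)

    def deliver(self, user_id, message):
        sock = self.connections.get(user_id)
        if sock is not None:
            sock.send(message)   # online: immediate real-time push
        else:                    # offline: durable queue (Kafka in the design)
            self.offline_queue.setdefault(user_id, deque()).append(message)

class FakeSocket:
    """Test double standing in for a WebSocket connection."""
    def __init__(self):
        self.sent = []
    def send(self, msg):
        self.sent.append(msg)
```

In production the queue would be a Kafka topic keyed by user or chat, so pending messages survive router restarts.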
Presence Management
Approach 1: Heartbeat-Based
How It Works: Clients send periodic heartbeats, server tracks last heartbeat time.
Client Presence Service
------ ----------------
|--Heartbeat (every 30s)->|
|<--ACK-------------------|
| |
| (If no heartbeat for 60s, mark offline)
Pros:
- Simple: Easy to implement
- Accurate: Good accuracy with frequent heartbeats
Cons:
- Network Overhead: Constant heartbeat traffic
- Battery Drain: Mobile devices drain battery
- Scalability: High overhead at scale (50M concurrent users at one heartbeat per 30s ≈ 1.7M requests/sec)
Optimization: Adaptive heartbeat (increase interval when idle).
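The heartbeat logic above amounts to a timestamp table plus a timeout check. In this sketch the clock is injected so the timeout behavior is testable; the intervals match the diagram (30s heartbeats, 60s offline threshold).

```python
# Heartbeat-based presence sketch with an injectable clock.
import time

class PresenceTracker:
    def __init__(self, timeout_secs=60, clock=time.time):
        self.timeout = timeout_secs
        self.clock = clock
        self.last_heartbeat = {}  # user_id -> timestamp of last heartbeat

    def heartbeat(self, user_id):
        self.last_heartbeat[user_id] = self.clock()

    def status(self, user_id):
        last = self.last_heartbeat.get(user_id)
        if last is None or self.clock() - last >= self.timeout:
            return "offline"
        return "online"
```

At scale this table would live in Redis with a TTL per key, so expiry happens without a sweep.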
Approach 2: Event-Based
How It Works: Update presence only on state changes (app open/close, screen lock).
Client Presence Service
------ ----------------
|--App Open-------------->|
| (Mark online) |
| |
|--Screen Lock----------->|
| (Mark away) |
| |
|--App Close------------->|
| (Mark offline) |
Pros:
- Efficient: Minimal network usage
- Battery Friendly: No constant heartbeats
- Accurate: Based on actual user actions
Cons:
- Delayed Updates: May not detect crashes immediately
- Platform Dependent: Different events on different platforms
When to Use: Mobile-first applications, battery optimization critical.
Approach 3: Hybrid (Heartbeat + Events)
How It Works: Combine event-based updates with occasional heartbeats for accuracy.
Pros:
- Balanced: Good accuracy with low overhead
- Resilient: Handles edge cases (crashes, network issues)
Cons:
- More Complex: Must handle both mechanisms
Decision: Use hybrid approach - event-based primary, heartbeat as backup.
Database Design
Schema Design
Users Table:
user_id (PK)
username
email
phone_number
created_at
last_seen_at
profile_picture_url
Chats Table:
chat_id (PK)
chat_type (1-on-1, group)
created_at
updated_at
Chat_Participants Table:
chat_id (FK)
user_id (FK)
joined_at
role (admin, member)
Messages Table:
message_id (PK)
chat_id (FK)
sender_id (FK)
content
message_type (text, image, video, file)
media_url
created_at
Message_Status Table:
message_id (FK)
user_id (FK)
status (sent, delivered, read)
updated_at
User_Presence Table:
user_id (PK)
status (online, offline, away)
last_seen_at
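The Messages and Message_Status tables above can be sketched in sqlite3 so the schema is runnable anywhere. This is only a relational stand-in: the production design stores messages in Cassandra with chat_id as the partition key, a role the composite index plays here.

```python
# Messages and Message_Status schemas, sketched with sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        message_id   TEXT PRIMARY KEY,
        chat_id      TEXT NOT NULL,
        sender_id    TEXT NOT NULL,
        content      TEXT,
        message_type TEXT NOT NULL DEFAULT 'text',
        media_url    TEXT,
        created_at   REAL NOT NULL
    );
    -- Chat history is always read by chat, newest-first.
    CREATE INDEX idx_messages_chat ON messages (chat_id, created_at);

    CREATE TABLE message_status (
        message_id TEXT NOT NULL REFERENCES messages(message_id),
        user_id    TEXT NOT NULL,
        status     TEXT NOT NULL CHECK (status IN ('sent','delivered','read')),
        updated_at REAL NOT NULL,
        PRIMARY KEY (message_id, user_id)
    );
""")
```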
Database Choice: SQL vs NoSQL
SQL (PostgreSQL):
- Pros: ACID transactions, complex queries, relationships
- Cons: Harder to scale horizontally
- When: Need transactions, complex queries
NoSQL (Cassandra):
- Pros: Horizontal scaling, high write throughput, partition tolerance
- Cons: Eventual consistency, limited queries
- When: High write volume, need horizontal scaling
Decision: Use Cassandra for messages (high write volume), PostgreSQL for user data (complex queries).
Sharding Strategy
Shard by User ID:
- Hash user_id to determine shard
- All user’s chats on same shard
- Pros: Efficient user queries
- Cons: Cross-shard group chats expensive
Shard by Chat ID:
- Hash chat_id to determine shard
- All messages for chat on same shard
- Pros: Efficient chat queries
- Cons: User’s chats spread across shards
Hybrid Approach:
- Messages sharded by chat_id
- User metadata sharded by user_id
- Chat list cached per user
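The hybrid rule reduces to one hash function applied to different keys: messages hash on chat_id, user metadata on user_id. A stable hash (not Python's per-process randomized `hash()`) keeps placement deterministic across servers; the shard count here is illustrative.

```python
# Deterministic shard placement sketch for the hybrid strategy above.
import hashlib

NUM_SHARDS = 16  # illustrative; real systems often use many virtual shards

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

message_shard = shard_for("chat:42")    # all of chat 42's messages -> one shard
user_shard = shard_for("user:alice")    # alice's profile/contacts -> one shard
```

Note that simple modulo placement reshuffles most keys when `num_shards` changes; consistent hashing is the usual fix when shards are added frequently.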
Caching Strategy
Multi-Level Caching
Level 1: Client Cache:
- Recent messages cached locally
- Offline access
- Reduces server load
Level 2: Redis Cache:
- Active chats cached
- User presence cached
- Recent messages cached
Cache Invalidation:
- Write-Through: Write to cache and DB simultaneously
- Write-Back: Write to cache, flush to DB asynchronously
- TTL-Based: Expire after time period
Cache Keys:
user:{userId}:presence
chat:{chatId}:messages:recent
user:{userId}:chats
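The write-through option above can be sketched in a few lines: every write hits the durable store and the cache in the same call, so reads can trust a cache hit. The dicts stand in for Redis and the primary database, and the key builder follows the patterns listed above.

```python
# Write-through cache sketch; dicts stand in for Redis and the DB.
class WriteThroughCache:
    def __init__(self):
        self.cache = {}   # stand-in for Redis
        self.db = {}      # stand-in for the primary database

    def put(self, key, value):
        self.db[key] = value      # durable write first
        self.cache[key] = value   # then populate the cache

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)  # cache miss: fall back to the DB
        if value is not None:
            self.cache[key] = value
        return value

def presence_key(user_id: str) -> str:
    return f"user:{user_id}:presence"   # key pattern from the list above
```

A real deployment would also attach TTLs (especially to presence keys) so stale entries expire on their own.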
Media Handling
Storage Architecture
- Object Storage: store media files in S3/GCS
- CDN: serve media via CDN for fast delivery
- Thumbnails: generate and cache thumbnails
Processing Pipeline
Upload → Validation → Storage → Thumbnail Generation → CDN Distribution
Optimization:
- Compression: Compress images/videos
- Multiple Formats: Generate different sizes
- Lazy Loading: Load full media on demand
Scalability Patterns
Horizontal Scaling
- Stateless Servers: design servers to be stateless where possible
- Load Balancing: distribute connections across servers
- Sharding: partition data across multiple databases
Message Fan-out for Groups
Challenge: Group with 1000 members, one message = 1000 deliveries
Approach 1: Synchronous Fan-out:
- Send to all members immediately
- Pros: Low latency
- Cons: Slow if any member slow
Approach 2: Asynchronous Fan-out:
- Queue message, fan-out asynchronously
- Pros: Fast response
- Cons: Slight delay for some members
Approach 3: Hybrid:
- Send to online members synchronously
- Queue for offline members
- Pros: Balance latency and throughput
Decision: Use hybrid approach.
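The hybrid fan-out reduces to a partition of the member list: push synchronously to online members, enqueue the rest. The `push` and `enqueue` callables here are stand-ins for the WebSocket delivery and Kafka producer from earlier sections.

```python
# Hybrid group fan-out sketch: online members get a synchronous push,
# offline members are queued for later delivery.
def fan_out(message, members, online_users, push, enqueue):
    pushed, queued = 0, 0
    for member in members:
        if member in online_users:
            push(member, message)      # synchronous, low latency
            pushed += 1
        else:
            enqueue(member, message)   # asynchronous, durable
            queued += 1
    return pushed, queued
```

For very large groups, even the online half is usually chunked and dispatched from worker pools rather than in one loop on the request path.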
Read Replicas
- Write to Primary: all writes go to the primary database
- Read from Replicas: distribute reads across replicas
- Replication Lag: acceptable for chat (eventual consistency)
Real-World Implementations
WhatsApp Architecture
- Protocol: custom protocol (not WebSocket)
- Message Delivery: store-and-forward
- Encryption: end-to-end encryption (Signal Protocol)
- Scaling: Erlang-based, handles billions of messages
- Storage: messages stored primarily on the user’s device
Discord Architecture
- Protocol: WebSocket for real-time, REST for API
- Message Delivery: real-time via WebSocket
- Scaling: microservices architecture
- Storage: PostgreSQL for metadata, object storage for media
- Presence: event-based with heartbeat fallback
Telegram Architecture
- Protocol: custom MTProto protocol
- Message Delivery: cloud-based, multi-datacenter
- Encryption: optional end-to-end encryption
- Scaling: distributed across multiple data centers
- Storage: messages stored in the cloud
Trade-offs Summary
WebSocket vs HTTP Long Polling:
- WebSocket: Lower latency, harder to scale
- HTTP: Easier to scale, higher latency
Synchronous vs Asynchronous Delivery:
- Synchronous: Lower latency, lower throughput
- Asynchronous: Higher throughput, slight latency
SQL vs NoSQL:
- SQL: Strong consistency, harder to scale
- NoSQL: Eventual consistency, easier to scale
Heartbeat vs Event-Based Presence:
- Heartbeat: More accurate, higher overhead
- Event-Based: Lower overhead, less accurate
Conclusion
Designing a scalable chat system requires balancing multiple concerns: real-time delivery, scalability, consistency, and user experience. The key is choosing the right trade-offs for your specific requirements.
Key decisions:
- WebSocket for real-time communication
- Kafka for message queuing
- Cassandra for message storage (high write volume)
- Redis for caching and online delivery
- Hybrid presence (events + heartbeat)
- Asynchronous fan-out for group messages
By understanding these trade-offs and making informed decisions, we can build chat systems that scale to millions of users while maintaining low latency and high availability.