Twitter System Design
Twitter is a social media platform that allows users to post and interact with messages known as "tweets". Let's analyze its system design.
System Requirements
Functional Requirements
- Users can post tweets (up to 280 characters; the limit was 140 until 2017)
- Users can follow other users
- Users can view their timeline
- Users can like and retweet tweets
- Users receive real-time notifications
Non-Functional Requirements
- High availability
- Low latency
- Scalability for millions of users
- Data consistency (eventual consistency is acceptable for timelines)
- Real-time updates
Architecture
High-Level Design
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Web/App   │     │  API Layer  │     │  Services   │
│   Clients   │────▶│  (REST/WS)  │────▶│    Layer    │
└─────────────┘     └─────────────┘     └─────────────┘
                           │                   │
                           ▼                   ▼
                    ┌─────────────┐     ┌─────────────┐
                    │   Caching   │     │  Database   │
                    │    Layer    │     │    Layer    │
                    └─────────────┘     └─────────────┘
Components
- Load Balancers
  - Distribute traffic across API servers
  - Handle SSL termination
  - Implement health checks
- API Servers
  - Handle user requests
  - Implement rate limiting
  - Manage authentication
- Services
  - Tweet Service
  - User Service
  - Timeline Service
  - Notification Service
- Caching
  - Redis for timeline caching
  - Memcached for user data
  - CDN for media content
- Database
  - MySQL for user data
  - Cassandra for tweets
  - Redis for timelines
  - MongoDB for analytics
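The rate limiting listed under API Servers is often done with a token bucket. The sketch below is one minimal way to express it, not the limiter Twitter actually runs; the class name `TokenBucket`, its parameters, and the injectable `clock` are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Hypothetical per-client rate limiter using the token-bucket technique."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained request rate
        self.clock = clock                    # injectable so tests can fake time
        self.tokens = float(capacity)         # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

An API server would typically keep one bucket per user or per API key and reject a request (for example with HTTP 429) when `allow()` returns False. In a multi-server deployment the bucket state would need to live in shared storage, which this in-process sketch does not cover.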
Data Models
User
{
  "user_id": "123",
  "username": "john_doe",
  "name": "John Doe",
  "email": "john@example.com",
  "followers": ["456", "789"],
  "following": ["789"],
  "created_at": "2023-01-01T00:00:00Z"
}
Tweet
{
  "tweet_id": "abc123",
  "user_id": "123",
  "content": "Hello, Twitter!",
  "created_at": "2023-01-01T00:00:00Z",
  "likes": 100,
  "retweets": 50,
  "replies": 25
}
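The two JSON documents above could be mirrored in application code roughly as follows. This is a sketch only: the field names come from the JSON, while the class layout, the defaults, and the `MAX_TWEET_LENGTH` check are assumptions (the 280 figure is taken from the functional requirements).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

MAX_TWEET_LENGTH = 280  # from the functional requirements


def _utc_now():
    return datetime.now(timezone.utc)


@dataclass
class User:
    user_id: str
    username: str
    name: str
    email: str
    followers: List[str] = field(default_factory=list)
    following: List[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=_utc_now)


@dataclass
class Tweet:
    tweet_id: str
    user_id: str
    content: str
    created_at: datetime = field(default_factory=_utc_now)
    likes: int = 0
    retweets: int = 0
    replies: int = 0

    def __post_init__(self):
        # Simplification: len() counts Unicode code points, which may differ
        # from how a production system weighs characters.
        if not 0 < len(self.content) <= MAX_TWEET_LENGTH:
            raise ValueError("tweet content must be 1-280 characters")
```

Keeping the counters as plain integers on the tweet matches the JSON shown above; a larger system might store them separately so that frequent like and retweet updates do not rewrite the tweet record.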
Key Features
Timeline Generation
- Pull Model
  - User requests timeline
  - System fetches tweets from followed users
  - Merge and sort tweets
  - Return to user
- Push Model
  - Pre-compute timelines
  - Store in cache
  - Update on new tweets
  - Serve from cache
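The two models can be contrasted in a few lines. In the sketch below, `pull_timeline` merges per-author lists at read time, while `PushTimelines` writes each new tweet into every follower's precomputed list. The function and class names, the in-memory dictionaries standing in for the database and cache, and the `max_len` default are all assumptions for illustration.

```python
import heapq
from collections import defaultdict, deque
from itertools import islice


def pull_timeline(following, tweets_by_user, limit=20):
    """Pull model: k-way merge at read time.

    Assumes each author's list is already sorted newest-first and that
    created_at values are ISO 8601 UTC strings in one format, so string
    order matches time order.
    """
    streams = [tweets_by_user.get(uid, []) for uid in following]
    merged = heapq.merge(*streams, key=lambda t: t["created_at"], reverse=True)
    return list(islice(merged, limit))


class PushTimelines:
    """Push model: timelines are precomputed when a tweet is written."""

    def __init__(self, max_len=100):  # max_len is an arbitrary illustrative cap
        self.timelines = defaultdict(lambda: deque(maxlen=max_len))

    def on_new_tweet(self, tweet, follower_ids):
        # Newest tweets go to the front; the deque drops the oldest when full.
        for follower_id in follower_ids:
            self.timelines[follower_id].appendleft(tweet)

    def read(self, user_id, limit=20):
        return list(islice(self.timelines[user_id], limit))
```

The trade-off is visible in where the loop sits: the pull model does work proportional to the number of followed accounts on every read, while the push model does work proportional to the number of followers on every write.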
Real-time Updates
- WebSocket connections
- Fan-out on write
- Event-driven architecture
- Message queues
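The items above fit together roughly like this: a new tweet is published as an event on a queue, and a worker fans it out to every follower who currently has an open connection. The sketch uses `asyncio.Queue` as a stand-in for both the message queue and the per-user WebSocket connections; `fan_out_worker`, `demo`, and the sample IDs are hypothetical.

```python
import asyncio


async def fan_out_worker(events, followers_of, connections):
    """Consume tweet events and deliver them to connected followers."""
    while True:
        tweet = await events.get()
        for follower_id in followers_of.get(tweet["user_id"], ()):
            conn = connections.get(follower_id)
            if conn is not None:       # offline followers are skipped here
                await conn.put(tweet)  # stand-in for a WebSocket send
        events.task_done()


async def demo():
    events = asyncio.Queue()                 # stand-in for a message queue
    followers_of = {"123": ["456", "789"]}   # author -> followers
    connections = {"456": asyncio.Queue()}   # only user 456 is online
    worker = asyncio.create_task(fan_out_worker(events, followers_of, connections))

    await events.put({"tweet_id": "abc123", "user_id": "123",
                      "content": "Hello, Twitter!"})
    await events.join()                      # wait until the event is processed

    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return connections["456"].get_nowait()
```

Running `asyncio.run(demo())` returns the tweet delivered to user 456. A real deployment would replace the in-process queue with a durable broker and would also update the offline follower's stored timeline, neither of which is shown here.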
Scaling Strategies
Database Scaling
- Sharding by user_id
- Read replicas
- Caching layer
- Data partitioning
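Sharding by user_id needs a deterministic mapping from a user to a shard. A minimal sketch, assuming a fixed shard count and simple hash-modulo routing (`NUM_SHARDS` and `shard_for` are hypothetical names):

```python
import hashlib

NUM_SHARDS = 16  # hypothetical; chosen only for illustration


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user_id to a shard index in [0, num_shards)."""
    # A stable hash is used instead of Python's built-in hash(), which is
    # randomized per process for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

All rows for one user land on the same shard, so a user's own tweets can be read without a cross-shard query. The drawback of plain modulo routing is that changing `num_shards` remaps most keys; consistent hashing is a common alternative when shards are added often.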
Caching Strategy
- Multi-level caching
- Cache invalidation
- Write-through cache
- Cache warming
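Three of the items above (write-through, invalidation, and warming) can be shown with one small class. This is a sketch under simplifying assumptions: a plain dictionary stands in for the database, there is no eviction or TTL, and the class and method names are hypothetical.

```python
class WriteThroughCache:
    """Cache in front of a backing store; writes go to both."""

    def __init__(self, store):
        self.store = store  # dict standing in for the database
        self.cache = {}

    def put(self, key, value):
        # Write-through: persist first, then update the cache, so the
        # cache never holds a value the store does not have.
        self.store[key] = value
        self.cache[key] = value

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)   # cache miss: fall back to the store
        if value is not None:
            self.cache[key] = value
        return value

    def invalidate(self, key):
        # Drop a cached entry, e.g. after the store was changed elsewhere.
        self.cache.pop(key, None)

    def warm(self, keys):
        # Cache warming: preload keys that are expected to be hot.
        for key in keys:
            self.get(key)
```

Write-through keeps reads after a write consistent at the cost of slower writes, which is why it suits data that is read far more often than it is written.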
Challenges
- Timeline Generation
  - Handling celebrity users
  - Real-time updates
  - Consistency vs. Performance
- Data Consistency
  - Eventual consistency
  - Conflict resolution
  - Data synchronization
- Performance
  - Latency optimization
  - Resource utilization
  - Cost management
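For the celebrity-user challenge, one commonly described approach is a hybrid of the push and pull models: tweets from ordinary accounts are pushed to followers at write time, while tweets from accounts above a follower threshold are stored once and merged in at read time. The sketch below illustrates that idea only; `HybridTimeline`, `CELEBRITY_THRESHOLD`, and the in-memory structures are assumptions, not a description of Twitter's implementation.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # hypothetical cutoff, for illustration only


class HybridTimeline:
    def __init__(self, followers_of, threshold=CELEBRITY_THRESHOLD):
        self.followers_of = followers_of           # author_id -> follower ids
        self.threshold = threshold
        self.pushed = defaultdict(list)            # follower_id -> tweets
        self.celebrity_tweets = defaultdict(list)  # author_id -> tweets

    def post(self, tweet):
        author = tweet["user_id"]
        followers = self.followers_of.get(author, [])
        if len(followers) >= self.threshold:
            # Celebrity: store once, skip the expensive fan-out.
            self.celebrity_tweets[author].append(tweet)
        else:
            # Ordinary account: fan out on write.
            for follower_id in followers:
                self.pushed[follower_id].append(tweet)

    def timeline(self, user_id, following, limit=20):
        # Start from the precomputed part, then pull celebrity tweets.
        tweets = list(self.pushed.get(user_id, []))
        for author in following:
            tweets.extend(self.celebrity_tweets.get(author, []))
        # Assumes created_at values are ISO 8601 UTC strings in one format.
        tweets.sort(key=lambda t: t["created_at"], reverse=True)
        return tweets[:limit]
```

This bounds the write cost of any single tweet by the threshold, at the price of extra read-time work for users who follow many celebrity accounts, which is the consistency-versus-performance trade-off listed above.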