How WhatsApp Handles 100 Billion Messages a Day - Explained Like You're in an Interview
One Message, A Million Hops
It is 11:47 PM. You type "goodnight" to your friend in Tokyo. You tap send. One second later, their phone buzzes 9,000 kilometers away.
That single message - nine characters, barely a dozen bytes - just traveled through a persistent WebSocket connection, hit a load balancer in Virginia, got routed to a connection server, passed through an encryption layer, landed in a message queue, got picked up by a routing service, discovered your friend was connected to a server in Singapore, hopped across an internal backbone, and triggered that buzz.
All in under one second.
Now here is the part that breaks brains in interviews: WhatsApp does this 100 billion times a day. And for most of its history, the engineering team was roughly 50 people. Fifty engineers. Two billion monthly active users.
When an interviewer asks you to "design WhatsApp," they want to see that you understand the fundamental architectural patterns that make real-time messaging work at scale - persistent connections, message ordering, delivery guarantees, and the trade-offs behind each decision.
Let us trace that "goodnight" message through every system it touches. By the end, you will have a mental framework for designing any real-time communication system an interviewer throws at you.
Connection Layer: WebSockets and the Long-Polling Fallback
The first thing your phone does when you open WhatsApp is establish a persistent connection to a server. This is the most fundamental architectural decision in any real-time messaging system, and it is the first thing you should mention in an interview.
Why Not Just HTTP?
Standard HTTP follows a request-response model. Your phone sends a request, the server responds, and the connection closes. To check for new messages, you would have to keep asking: "Any new messages? How about now? Now?"
This is called polling, and it is terrible for messaging. Poll every 5 seconds and you waste battery. Poll every 30 seconds and messages feel delayed.
Long polling improves this slightly. Your phone sends a request, and the server holds the connection open until it has something to send back (or until a 30-60 second timeout). This reduces unnecessary traffic, but each message still requires a full HTTP round-trip to re-establish the waiting connection.
WebSockets: The Right Tool
WhatsApp uses a custom protocol built on top of TCP (historically based on XMPP, later replaced with an internal binary protocol), but the concept maps directly to WebSockets - which is the technology you should discuss in an interview.
A WebSocket connection starts as an HTTP request (the "upgrade handshake"), then converts into a full-duplex, persistent TCP connection. Both the client and server can send data at any time without the overhead of HTTP headers on every message. For a messaging app, this means:
- Instant delivery: The server pushes messages to the client the moment they arrive. No polling delay.
- Low overhead: After the initial handshake, each message frame adds only 2-14 bytes of framing overhead, compared to hundreds of bytes of HTTP headers.
- Battery efficiency: One persistent connection is far cheaper than repeatedly opening and closing connections.
Connection Management at Scale
Here is where interviews get interesting. Two billion users means potentially two billion simultaneous persistent connections. How do you manage that?
Connection servers are the answer. These are specialized, lightweight servers whose only job is to hold open WebSocket connections and route messages. They are stateful - each one knows which users are connected to it - but they are horizontally scalable.
A single modern server can handle roughly 500,000 to 1 million concurrent WebSocket connections if you tune the OS correctly (increasing file descriptor limits, using epoll/kqueue, minimizing per-connection memory). WhatsApp famously achieved 2 million connections per server using Erlang, which has lightweight processes perfectly suited for this workload.
Behind these connection servers sits a connection registry - a distributed data store (think Redis cluster or a custom in-memory store) that maps each user ID to the connection server they are currently connected to. When a message arrives for User B, the routing service looks up User B in this registry to find which connection server to forward the message to.
User A's Phone
↓ (WebSocket)
Load Balancer
↓
Connection Server #42
↓
Message Router → Connection Registry → "User B is on Server #187"
↓
Connection Server #187
↓ (WebSocket)
User B's Phone
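The registry lookup in the diagram above can be sketched in a few lines. A plain dict stands in for the distributed store (Redis or similar); the class and names are illustrative, not WhatsApp's actual API:

```python
# Minimal connection-registry sketch: maps each user to the connection
# server currently holding their WebSocket. A dict stands in for Redis.

class ConnectionRegistry:
    def __init__(self):
        self._user_to_server = {}  # user_id -> connection server id

    def register(self, user_id, server_id):
        # Called when a user's WebSocket lands on a connection server.
        self._user_to_server[user_id] = server_id

    def deregister(self, user_id):
        # Called when the connection dies or the user logs out.
        self._user_to_server.pop(user_id, None)

    def lookup(self, user_id):
        # Returns the server to forward to, or None if the user is offline.
        return self._user_to_server.get(user_id)

registry = ConnectionRegistry()
registry.register("user_b", "server_187")
assert registry.lookup("user_b") == "server_187"  # route the message here
assert registry.lookup("user_c") is None          # offline -> offline queue instead
```

The `None` case is exactly the branch where the offline queue (discussed below) takes over.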
Heartbeats: Keeping Connections Alive
Persistent connections die all the time. The user walks into an elevator, switches from Wi-Fi to cellular, or their phone's OS kills the background process to save battery. You need a way to detect dead connections quickly.
Heartbeat messages solve this. The client sends a small ping every 30-60 seconds. If the server does not receive a heartbeat within the expected window, it marks the connection as dead and cleans up resources. If the client detects a drop, it reconnects and re-registers.
This interval is a trade-off worth mentioning in interviews. Too frequent (every 5 seconds) wastes battery. Too infrequent (every 5 minutes) and dead connections linger, causing failed deliveries. WhatsApp uses intervals around 30 seconds, adjusted dynamically based on network conditions.
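The detection logic itself is simple bookkeeping. Here is a toy version that marks a connection dead once no ping has arrived within twice the expected interval (the 2x grace factor and the interval values are illustrative assumptions, not WhatsApp's exact policy):

```python
import time

HEARTBEAT_INTERVAL = 30          # seconds between client pings (illustrative)
DEAD_AFTER = 2 * HEARTBEAT_INTERVAL  # grace window before declaring a connection dead

class HeartbeatTracker:
    def __init__(self, now=time.time):
        self._now = now                # injectable clock, so tests run instantly
        self._last_ping = {}           # connection id -> last ping timestamp

    def on_ping(self, conn_id):
        self._last_ping[conn_id] = self._now()

    def dead_connections(self):
        cutoff = self._now() - DEAD_AFTER
        return [c for c, t in self._last_ping.items() if t < cutoff]

# Simulated clock so the example is deterministic.
clock = [1000.0]
tracker = HeartbeatTracker(now=lambda: clock[0])
tracker.on_ping("conn_a")
clock[0] += 45                         # 45s since last ping -> still alive
assert tracker.dead_connections() == []
clock[0] += 45                         # 90s since last ping -> dead
assert tracker.dead_connections() == ["conn_a"]
```

Injecting the clock is also how you would unit-test this in a real server.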
Message Queue: The Unsung Hero
Your "goodnight" message has arrived at a connection server. But that server's job is to hold connections, not to process business logic. It needs to hand the message off to something else. Fast.
This is where message queues enter the picture - arguably the most important component in the entire system.
Store-and-Forward Architecture
WhatsApp uses a store-and-forward model. When your message arrives at the server, it is immediately written to a persistent message queue (or message store) and an acknowledgment is sent back to your phone. That single checkmark you see in WhatsApp? That means the server has durably stored your message. It has not been delivered yet, but it will not be lost.
This decoupling is critical. The sender's experience (message accepted) is separated from the receiver's experience (message delivered). If User B is offline, their messages sit in the queue until they reconnect. If a server crashes after accepting the message but before delivering it, the message survives in the queue and gets retried.
Message IDs and Idempotency
Every message gets a globally unique message ID, typically generated on the client side (a UUID or a combination of user ID + timestamp + random component). This ID serves multiple purposes:
- Deduplication: Networks are unreliable. Your phone might send the same message twice because it did not receive the server's acknowledgment (maybe the ACK packet was lost). The server uses the message ID to detect and discard duplicates. This is called idempotent processing - processing the same message multiple times produces the same result as processing it once.
- Ordering: Message IDs (combined with timestamps) help establish the order of messages within a conversation. More on this in a moment.
- Delivery tracking: The ID follows the message through its entire lifecycle - sent, delivered, read - enabling the familiar checkmark system.
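Both ID schemes mentioned above are easy to sketch. These two generators are illustrative, assuming nothing about WhatsApp's real format:

```python
import uuid
import time
import random

def uuid_message_id():
    # Scheme 1: a random UUID. Collisions are astronomically unlikely.
    return str(uuid.uuid4())

def composite_message_id(user_id):
    # Scheme 2: user ID + millisecond timestamp + random suffix.
    # Roughly time-sortable, and unique enough in practice.
    return f"{user_id}-{int(time.time() * 1000)}-{random.randrange(1 << 32):08x}"

a = uuid_message_id()
b = uuid_message_id()
assert a != b
mid = composite_message_id("user_a")
assert mid.startswith("user_a-")
```

The composite scheme has a useful property for debugging: IDs from the same sender sort approximately by send time.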
Guaranteed Delivery: At-Least-Once Semantics
WhatsApp guarantees that your message will be delivered. In distributed systems terms, it provides at-least-once delivery. The mechanism works like this:
- Your phone sends the message and starts a retry timer.
- The server receives the message, stores it, and sends back an ACK.
- If your phone does not receive the ACK within the timeout, it resends the message (with the same message ID).
- The server deduplicates using the message ID.
- Once User B's device receives the message, it sends an ACK back to the server.
- The server records delivery and notifies your phone (second checkmark).
- If User B's device does not ACK, the server retries delivery when User B reconnects.
This retry-with-deduplication pattern is a fundamental building block of reliable distributed systems. In an interview, demonstrating that you understand why you need both retries and deduplication shows real depth.
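The whole retry-with-deduplication pattern fits in a short simulation. Here the first ACK is "lost," the client retries with the same message ID, and the server's dedup set makes the retry harmless (all names are illustrative):

```python
# At-least-once delivery + idempotent processing, in miniature.

class Server:
    def __init__(self):
        self.stored = []        # durable message store
        self.seen_ids = set()   # dedup index keyed by message ID

    def receive(self, msg_id, body):
        if msg_id not in self.seen_ids:   # idempotency check
            self.seen_ids.add(msg_id)
            self.stored.append((msg_id, body))
        return "ACK"                      # always re-ACK, even for duplicates

def send_with_retry(server, msg_id, body, ack_lost_first_time=True):
    ack = server.receive(msg_id, body)
    if ack_lost_first_time:
        # The client never saw the ACK, so its retry timer fires and it
        # resends -- with the SAME message ID. That is the crucial detail.
        ack = server.receive(msg_id, body)
    return ack

server = Server()
send_with_retry(server, "m1", "goodnight")
assert len(server.stored) == 1   # stored exactly once despite the duplicate send
```

Note that the server re-ACKs duplicates: a retry means the client never learned the first attempt succeeded, so suppressing the ACK would just trigger more retries.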
Message Routing
The routing layer decides where each message goes. For a one-to-one chat, the logic is straightforward: look up the recipient in the connection registry, find their connection server, and forward the message. But the routing service also handles:
- Multi-device delivery: The message must fan out to all active devices (phone, desktop, web).
- Offline queuing: If the recipient is not connected, the message goes into a durable offline queue.
- Priority handling: Call signaling might need priority routing over regular text.
Presence and Typing Indicators (The Hard Part)
Here is a secret that most candidates do not know: the hardest part of designing a messaging system is not the messages themselves. It is the presence system - the "online," "last seen," and "typing..." indicators.
Why? Because presence is a fundamentally different problem from messaging. A message is a discrete event that happens once and must be delivered reliably. Presence is a continuous state that changes constantly, applies to every contact you have, and - here is the kicker - does not need to be perfectly accurate.
The Scale Problem
Think about what "online" status means. When you open WhatsApp, it shows you which of your contacts are currently online. If you have 500 contacts, your phone needs to know the presence state of all 500. If 500 million users are online simultaneously and each has 500 contacts, the system needs to handle 250 billion presence relationships. Updating all of them in real time every time someone opens or closes the app would melt any system.
The Solution: Eventual Consistency and Gossip
WhatsApp solves this with a few clever strategies:
Lazy presence updates: Your phone does not subscribe to every contact's status. Instead, when you open a chat, your client subscribes to that specific person's presence. This dramatically reduces active subscriptions.
Gossip protocols: Connection servers gossip presence information among themselves. When User A connects to Server #42, that server tells neighboring servers. They propagate to their neighbors. Eventually, every server that needs to know learns about it. This is eventual consistency - not instant, but convergent within seconds.
Last seen timestamps: Rather than real-time online/offline tracking, the system stores a "last seen" timestamp updated on activity. Viewing a contact's status is a simple key-value lookup.
Typing indicators: These are fire-and-forget - no persistence, no retry, no guarantee. If lost, nobody cares. This is a perfect design trade-off: for ephemeral, non-critical data, you trade reliability for performance.
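The "last seen" strategy really is just a key-value write on activity and a lookup on read. A minimal sketch (the store and method names are illustrative):

```python
# "Last seen" as a plain key-value store: write on any activity
# (connect, send, heartbeat), read only when someone opens the chat.

class LastSeenStore:
    def __init__(self):
        self._last_seen = {}  # user_id -> unix timestamp of last activity

    def touch(self, user_id, now):
        self._last_seen[user_id] = now

    def last_seen(self, user_id):
        return self._last_seen.get(user_id)  # None = never seen

store = LastSeenStore()
store.touch("user_b", 1_700_000_000)
assert store.last_seen("user_b") == 1_700_000_000
assert store.last_seen("user_c") is None
```

Because reads happen only when a chat is opened (the lazy-subscription idea above), this store sees far less traffic than a naive "broadcast every status change to every contact" design would.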
What to Say in the Interview
The key insight interviewers want to hear is that different types of data have different consistency and reliability requirements. Messages need strong delivery guarantees. Presence needs eventual consistency. Typing indicators need neither. A good system design recognizes these differences and uses appropriate strategies for each.
End-to-End Encryption Without Killing Performance
In 2016, WhatsApp rolled out end-to-end encryption for all messages, calls, photos, and videos. Over a billion users at the time. Overnight. And nobody noticed a performance difference.
How? Because good encryption, done right, adds almost no latency to individual messages.
The Signal Protocol
WhatsApp uses the Signal Protocol (originally called the Axolotl Ratchet), developed by Open Whisper Systems. Here is a simplified version of how it works:
Initial key exchange: Devices perform a key exchange using long-term identity keys and ephemeral pre-keys, based on the Extended Triple Diffie-Hellman (X3DH) handshake. Pre-keys are uploaded to the server in advance, so the exchange happens asynchronously - User B does not need to be online.
Double Ratchet Algorithm: Every message uses a new encryption key derived from the previous one (the "ratchet" only moves forward). This provides forward secrecy: compromising your current key cannot decrypt past messages, because those used different keys that no longer exist.
Why it does not add latency: The actual encryption and decryption of a message is a symmetric-key operation (AES-256), which takes microseconds on modern hardware. The expensive asymmetric operations (Diffie-Hellman) only happen during the initial key exchange, and even that uses pre-uploaded keys to avoid waiting for the other user.
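The forward-secrecy property of the ratchet can be shown with a greatly simplified KDF chain in the spirit of (but not identical to) the Signal Protocol's: each step derives a one-time message key plus a new chain key, and the old chain key is discarded, so past message keys cannot be recomputed:

```python
import hmac
import hashlib

# Simplified symmetric-key ratchet (illustrative, NOT the real Signal
# algorithm): HMAC with distinct constants derives a message key and the
# next chain key from the current chain key.

def ratchet_step(chain_key):
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

ck = b"\x00" * 32            # stand-in for the secret from the initial key exchange
keys = []
for _ in range(3):
    mk, ck = ratchet_step(ck)   # old chain key is overwritten -> forward secrecy
    keys.append(mk)

assert len(set(keys)) == 3      # every message encrypted under a fresh key
```

Each `ratchet_step` is two HMAC calls - microseconds of work - which is why per-message encryption cost is negligible.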
The Server Sees Nothing
With end-to-end encryption, the server is just a dumb pipe - it stores and forwards encrypted blobs it cannot read. This simplifies some things (no content parsing needed) but makes others harder: no server-side search, spam detection limited to metadata, and cloud backup requires separate encryption management.
In an interview, focus on architectural implications over cryptographic details: the server cannot read messages, key exchange uses pre-uploaded keys for async operation, and per-message overhead is negligible because it uses symmetric-key cryptography.
Group Chats: Where Architectures Go to Die
One-to-one messaging is relatively straightforward. Group chats are where things get genuinely difficult, and where your system design answer can really shine.
The Fan-Out Problem
When you send a message to a group of 256 members (WhatsApp's maximum group size for most of its history), the system needs to deliver that message to up to 255 other people. There are two fundamental strategies:
Fan-out on write (sender-side fan-out): When you send a group message, the system immediately creates 255 individual copies of the message, one for each recipient, and routes each copy independently. This is what WhatsApp actually does, and it is the simpler approach conceptually.
Pros: Each recipient's message delivery is independent. If User C is offline, their copy waits in their queue without affecting anyone else. Delivery tracking (checkmarks) works per-recipient. The read path is simple - each user just reads from their own inbox.
Cons: A single group message generates 255 write operations. For a 256-person group with active discussions, this creates significant write amplification. Storage usage is higher because the message content is duplicated.
Fan-out on read (recipient-side fan-out): The message is stored once, and each recipient reads from a shared group timeline. This saves storage and reduces writes, but the read path becomes more complex - you need to track each user's read position in each group, handle message deletion (User A deleted a message, but User B already read it), and manage access control.
WhatsApp's choice of fan-out-on-write makes sense given their constraints: relatively small group sizes (max 256 when this architecture was designed), the need for independent delivery tracking per recipient, and end-to-end encryption (each recipient's copy is encrypted with a different key).
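Fan-out on write is short enough to sketch directly - one group message becomes one copy per recipient inbox, so every delivery proceeds independently (toy in-memory model, illustrative names):

```python
# Fan-out on write: materialize one copy per recipient at send time.

def fan_out_on_write(inboxes, group_members, sender, msg_id, body):
    for member in group_members:
        if member != sender:
            # Each recipient gets their own copy in their own inbox,
            # so one slow or offline member never blocks the others.
            inboxes.setdefault(member, []).append((msg_id, body))

inboxes = {}
members = ["alice", "bob", "carol"]
fan_out_on_write(inboxes, members, sender="alice", msg_id="m1", body="hi all")

assert "alice" not in inboxes                    # sender gets no copy
assert inboxes["bob"] == [("m1", "hi all")]
assert inboxes["carol"] == [("m1", "hi all")]    # if carol is offline, her copy just waits
```

The write amplification is visible right in the loop: one send, N-1 appends.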
Message Ordering in Groups
This is a subtle problem that trips up many candidates. In a one-to-one conversation, message order is straightforward - there are only two participants, and each one sees messages in the order they were sent and received.
In a group chat with 50 active members, you might have 10 people sending messages simultaneously. What order should everyone see them in?
Server-side ordering: The simplest approach is to let the server assign a sequence number to each message as it arrives. The server processes messages in the order it receives them, stamps each one with an incrementing counter, and all clients display messages in that order.
But this creates a problem: if User A and User B send messages at the same instant, the server's ordering might not match the order those users intended. User A might see their message first (because from their perspective, they sent it first), but the server received User B's message first.
Lamport timestamps and vector clocks: More sophisticated systems use logical clocks to track causal ordering. If message B is a reply to message A, the system guarantees B appears after A, regardless of server arrival order. But for concurrent messages (sent independently, neither caused by the other), the system accepts that any consistent ordering is acceptable.
For a WhatsApp-like system in an interview, server-assigned sequence numbers per group are sufficient. The key point is that you acknowledge the ordering problem and explain your approach.
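A Lamport clock itself is only a few lines, and it captures exactly the causal guarantee described above - a reply always carries a larger timestamp than the message it replies to:

```python
# Minimal Lamport clock: tick on send, jump-and-tick on receive.

class LamportClock:
    def __init__(self):
        self.time = 0

    def send(self):
        # Tick before stamping an outgoing message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Jump ahead of the incoming timestamp, then tick.
        self.time = max(self.time, msg_time) + 1

a, b = LamportClock(), LamportClock()
t_msg = a.send()            # A sends at logical time 1
b.receive(t_msg)            # B's clock jumps to 2 on receipt
t_reply = b.send()          # B's reply is stamped 3
assert t_reply > t_msg      # the reply always sorts after its cause
```

For truly concurrent messages the timestamps break ties arbitrarily (typically by sender ID), which is exactly the "any consistent ordering is acceptable" stance above.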
Membership Management
Group membership changes (adding/removing members) need special handling because they affect encryption. When a member is removed from a group, the group's encryption keys must be rotated so the removed member cannot decrypt future messages. When a member is added, they need the current encryption state but should not be able to decrypt messages sent before they joined.
This key rotation is called a sender key update. WhatsApp uses a "sender key" mechanism for group encryption - each group member generates a sender key that is distributed to all other members. When membership changes, affected sender keys are regenerated and redistributed. This is more efficient than encrypting each message individually for each recipient (which would not scale for groups).
Storage: The "Last Seen" Problem
We have covered how messages travel through the system in real time. But what about messages that cannot be delivered immediately? And what about the billions of photos, videos, and voice messages sent every day?
Message Storage Strategy
WhatsApp follows a transient storage model for messages. The server stores a message only until it is delivered to the recipient's device. Once the recipient's phone acknowledges receipt (the second checkmark), the message is deleted from the server. The permanent copy lives on the users' devices.
This is a deliberate architectural choice with major implications:
- Storage costs are dramatically lower than a system like Slack or Discord, which stores all messages permanently on the server.
- Privacy is enhanced - there is no server-side archive of your conversations.
- But message history is device-dependent. If you lose your phone without a backup, your messages are gone. This is why WhatsApp introduced cloud backups (encrypted separately from the in-transit encryption).
Discuss this trade-off explicitly in interviews. Consumer apps like WhatsApp use transient storage (users expect messages on their devices). Enterprise apps like Slack need permanent server-side storage (searchable history across devices).
Offline Message Queuing
When a user is offline, undelivered messages accumulate in a durable offline queue - persisted to disk, ordered within conversations, and bounded by a retention policy (WhatsApp stores undelivered messages up to 30 days). When the user reconnects, queued messages are delivered in order, with the client loading incrementally rather than dumping everything at once.
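That enqueue-on-offline, drain-on-reconnect behavior can be sketched in memory. The 30-day figure mirrors the text; everything else here (names, structure) is illustrative:

```python
RETENTION_SECONDS = 30 * 24 * 3600   # 30-day retention, as in the text

class OfflineQueue:
    def __init__(self):
        self._queues = {}  # user_id -> list of (enqueued_at, msg), in arrival order

    def enqueue(self, user_id, msg, now):
        self._queues.setdefault(user_id, []).append((now, msg))

    def drain(self, user_id, now):
        # Called on reconnect: expired messages are dropped, the rest
        # are delivered in the order they were queued.
        pending = self._queues.pop(user_id, [])
        return [m for t, m in pending if now - t <= RETENTION_SECONDS]

q = OfflineQueue()
q.enqueue("user_b", "old news", now=0)
q.enqueue("user_b", "goodnight", now=2_000_000)   # ~23 days after the first
delivered = q.drain("user_b", now=3_000_000)      # ~35 days after the first
assert delivered == ["goodnight"]                 # the first message expired
```

A real implementation would persist the queue to disk and page messages out incrementally rather than returning them all at once, as the text notes.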
Media Storage: A Completely Different Problem
Text messages are tiny. But WhatsApp also handles billions of photos, videos, and voice messages daily. Media storage is architected completely differently from text message routing.
When you send a photo:
- Your phone encrypts the photo with a random symmetric key.
- The encrypted photo is uploaded to a blob storage service (similar to Amazon S3) and a reference URL is returned.
- The symmetric key and the blob URL are sent as a regular message to the recipient.
- The recipient's phone downloads the encrypted blob from the storage service and decrypts it locally.
This separation means the real-time messaging infrastructure never has to handle large binary data. The connection servers and message queues only deal with small text messages and metadata. The heavy lifting of media transfer is offloaded to a dedicated storage and CDN infrastructure optimized for large files.
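The four steps above can be modeled end to end with an in-memory blob store. XOR stands in for real symmetric encryption purely for illustration - a real client would use AES - and all names are assumptions:

```python
import secrets

blob_store = {}  # url -> encrypted bytes (stand-in for S3-style storage + CDN)

def xor_bytes(data, key):
    # Toy cipher for illustration only; a real client uses AES.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def send_photo(photo: bytes):
    key = secrets.token_bytes(32)             # 1. random symmetric key
    blob = xor_bytes(photo, key)              # 2. encrypt locally, upload...
    url = f"blob://{secrets.token_hex(8)}"    #    ...and get back a reference URL
    blob_store[url] = blob
    return {"url": url, "key": key}           # 3. tiny regular message: key + URL

def receive_photo(message):
    blob = blob_store[message["url"]]         # 4. download the encrypted blob...
    return xor_bytes(blob, message["key"])    #    ...and decrypt locally

photo = b"\x89PNG...lots of pixels..."
msg = send_photo(photo)
assert receive_photo(msg) == photo            # messaging path never saw the pixels
```

Notice the only thing that crossed the messaging infrastructure was the small `{"url", "key"}` message - the pixels went through the blob path.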
Media expiration: To manage storage costs, media stored on the server is temporary. If a recipient does not download a photo within a certain period, the server-side copy may be deleted. The recipient would then see a "download failed" message and the sender might need to resend.
The Database Question
What database powers all this? WhatsApp historically used Mnesia, an Erlang-native distributed database, for connection state and presence data. For message storage, they evolved through several systems, eventually building custom storage solutions optimized for their specific access patterns (write-heavy, time-ordered, short-lived data).
In an interview, you do not need to name the exact databases. What matters is that you recognize different components need different storage solutions:
| Component | Access Pattern | Good Fit |
|---|---|---|
| Connection registry | Key-value, high read/write, in-memory | Redis, Memcached |
| User profiles | Key-value, read-heavy | Cassandra, DynamoDB |
| Message queues | Append-only, ordered, durable | Kafka, custom WAL |
| Offline messages | Write-then-read-once, TTL | Cassandra with TTL |
| Media blobs | Write-once, read-few, large | S3, GCS, blob store |
| Group metadata | Read-heavy, low write | PostgreSQL, MySQL |
Putting It All Together
Let us trace our "goodnight" message one final time, with every component in place:
- Your phone encrypts the message using the Signal Protocol's current session key.
- The encrypted message is sent over a persistent WebSocket connection to a connection server.
- The connection server forwards the message to the message routing service via an internal message queue.
- The routing service assigns a server-side timestamp and stores the message in the offline queue (durable storage).
- It sends an ACK back to your phone (first checkmark - message received by server).
- The routing service looks up the recipient in the connection registry.
- The recipient is online, connected to a different connection server. The message is forwarded there.
- That connection server pushes the encrypted message down the recipient's WebSocket connection.
- The recipient's phone decrypts the message, displays it, and sends an ACK back to the server.
- The server records delivery and sends a delivery notification to your phone (second checkmark).
- When the recipient reads the message, a read receipt is sent back (blue checkmarks).
- The server deletes the message from its storage - the only copies now live on the two devices.
Twelve steps. Under one second. One hundred billion times a day.
The Real Takeaway for Interviews
When you design a messaging system in an interview, the interviewer is not expecting you to recreate WhatsApp. They want to see that you can:
- Start with the right connection model (WebSockets for real-time, with fallback).
- Separate concerns (connection handling, message routing, storage, presence - each is its own service).
- Reason about delivery guarantees (at-least-once delivery with idempotent processing).
- Acknowledge trade-offs (fan-out on write vs. read, transient vs. permanent storage, strong vs. eventual consistency).
- Handle failure cases (offline users, dead connections, network partitions, server crashes).
- Discuss encryption at the architectural level (where keys are exchanged, what the server can and cannot see).
If you can trace a message from send to delivery, explain what happens at each hop, and articulate why each component exists, you will crush the messaging system design interview.
FAQ
Should I use WebSocket or HTTP long-polling?
Use WebSockets as your primary transport for any real-time messaging system. WebSockets provide full-duplex communication over a single persistent TCP connection, which means lower latency, less overhead per message, and better battery life on mobile devices. Long-polling should be discussed as a fallback for environments where WebSockets are not available - older browsers, restrictive corporate proxies, or certain mobile networks that strip WebSocket upgrade headers. In an interview, mention both: lead with WebSockets as the primary approach, then acknowledge long-polling as a fallback. If the interviewer pushes on trade-offs, explain that long-polling is simpler to implement and works everywhere HTTP works, but it introduces latency (each message delivery requires a new HTTP connection) and consumes more server resources at scale.
How do you handle message ordering in distributed systems?
Message ordering has different levels of guarantee depending on your requirements. For one-to-one chats, you can rely on per-conversation sequence numbers assigned by the server - since there are only two participants, a single sequence counter per conversation provides a total order. For group chats, the same approach works if a single server (or partition) handles each group. For systems that span multiple data centers, you need to be more careful: use Lamport timestamps or vector clocks to capture causal ordering (message B that replies to message A always appears after A), and accept that truly concurrent messages (sent independently at the same time) can appear in any consistent order. In practice, server-arrival order combined with client-side timestamps for display is sufficient for most messaging applications. The important thing in an interview is to acknowledge the problem, explain your chosen strategy, and discuss what guarantees it does and does not provide.
What database does WhatsApp use?
WhatsApp's technology choices evolved over time. They originally built on Erlang and used Mnesia (Erlang's built-in distributed database) for connection state, session management, and presence data. For message storage, they developed custom solutions tuned for their write-heavy, time-ordered, short-lived data patterns. In an interview, rather than naming a specific database, focus on matching storage technology to access patterns. Use an in-memory key-value store (like Redis) for connection registry and presence data that requires microsecond lookups. Use a distributed log or queue (like Kafka) for message transit and offline queuing. Use a wide-column store (like Cassandra or HBase) for per-user message storage with time-based ordering and automatic TTL expiration. Use blob storage (like S3) for media files. The key insight is that no single database handles all these workloads well - a real messaging system uses multiple specialized storage systems.
How do read receipts work at scale?
Read receipts are essentially lightweight messages that flow through the same infrastructure as regular messages, but with lower priority and weaker delivery guarantees. When User B opens a chat and their client renders messages from User A, the client sends a "read" event back to the server containing the message IDs that were displayed. The server records these read events and forwards them to User A's device, which updates the checkmark UI (single gray check for sent, double gray for delivered, double blue for read). The scale challenge is that read receipts multiply message traffic - every message read generates a receipt flowing in the reverse direction. To manage this, systems batch read receipts (sending one "I read everything up to message #347" instead of individual receipts for each message) and treat them as best-effort delivery. If a read receipt is lost, the sender sees delivered instead of read - an acceptable degradation. In group chats, read receipts are even more complex because each member reads at different times, so the system tracks per-member read state and often only shows "read by" counts rather than sending individual receipt notifications to the sender for each group member.
Want to practice designing real-time systems with an AI mentor that gives you real-time feedback? Check out Levelop's System Design Canvas - it's the closest thing to a live interview you can get without the sweaty palms.
Next up: Learn how load balancers and API gateways actually differ in production, or dive into database sharding with Instagram's story.