This is a distributed chat application where users can talk in real time. I built it because I was curious about how distributed systems work, how they coordinate, and what problems they face. The best way to learn that is by doing, so I started this project. My initial target is handling 10K online users. I have documented my learning journey and the problems I faced in this README itself.
- Designed for 10K online users
- Utilizes bucketing strategy to store, retrieve, and send messages faster and more efficiently
- How RabbitMQ works
- AMQP Protocol
- How different distributed systems coordinate
- Different schema design patterns like bucket, tree, approximation, and attribute
- MongoDB ObjectIds contain a timestamp, which makes them roughly sortable by creation time. That is suitable for this case.
Now I know all of these! 😄
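To make the ObjectId point concrete, here is a minimal sketch of reading the embedded creation time without any driver. It relies on the documented ObjectId layout (12 bytes, the first 4 being seconds since the Unix epoch); the function names and the sample ids are made up for illustration and are not taken from this project.

```javascript
// Sketch: read the creation time embedded in a MongoDB ObjectId.
// An ObjectId is 12 bytes (24 hex characters); the first 4 bytes are a
// big-endian count of seconds since the Unix epoch.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Comparator that orders ids by their embedded timestamp.
// Resolution is one second, so ordering is only "roughly" by creation time.
function compareByCreation(a, b) {
  return objectIdToDate(a) - objectIdToDate(b);
}

// Hypothetical ids, invented for this example.
const older = "65a000000000000000000001";
const newer = "65a000100000000000000001";
console.log(objectIdToDate(older).toISOString());
console.log(compareByCreation(older, newer) < 0);
```

Because the timestamp sits at the front of the id, sorting messages by `_id` gives approximately chronological order without a separate date index.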
These are the two main problems I faced.
- How to handle so many messages?
  - I chose a RabbitMQ queue to hold messages before the server handles them. Kafka is overkill for this, and Redis does not give the durability I wanted. RabbitMQ came out as the best option here.
- How to know which server a user is connected to?
  - Solved it by using a RabbitMQ queue and exchange. For user 1 on server 2, I bind the server's queue to the exchange with the userId as the routing key. I didn't want to add more stack like Redis to maintain a "Server to User" table, as this would add more complexity, so I came up with this solution. After discussing the potential performance impact of having so many bindings with the RabbitMQ team, I finalized this architecture.
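The binding idea can be sketched with a small in-memory model of a direct exchange. This is not RabbitMQ client code and not this project's implementation; the class `DirectExchangeModel`, the queue name `server-2`, and the id `user-1` are hypothetical names chosen to mirror the "user 1 on server 2" example.

```javascript
// In-memory model of a direct exchange (illustration only, no broker involved).
class DirectExchangeModel {
  constructor() {
    this.bindings = new Map(); // routingKey -> Set of queue names
    this.queues = new Map();   // queue name -> array of delivered messages
  }

  // Called when a user connects to a server: bind that server's queue by userId.
  bind(queueName, routingKey) {
    if (!this.queues.has(queueName)) this.queues.set(queueName, []);
    if (!this.bindings.has(routingKey)) this.bindings.set(routingKey, new Set());
    this.bindings.get(routingKey).add(queueName);
  }

  // Called when the user disconnects: remove the binding again.
  unbind(queueName, routingKey) {
    const bound = this.bindings.get(routingKey);
    if (!bound) return;
    bound.delete(queueName);
    if (bound.size === 0) this.bindings.delete(routingKey);
  }

  // Route a message by userId; returns how many queues received it.
  publish(routingKey, message) {
    const bound = this.bindings.get(routingKey);
    if (!bound) return 0; // no binding: the user is not connected anywhere
    for (const queueName of bound) this.queues.get(queueName).push(message);
    return bound.size;
  }
}

const exchange = new DirectExchangeModel();
exchange.bind("server-2", "user-1");                  // user-1 connects to server 2
const delivered = exchange.publish("user-1", { text: "hello" });
exchange.unbind("server-2", "user-1");                // user-1 disconnects
const afterDisconnect = exchange.publish("user-1", { text: "anyone there?" });
console.log(delivered, afterDisconnect);
```

The sender never needs to look up which server holds the connection: it publishes with the recipient's userId and the broker's bindings act as the "Server to User" table.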
Here is the flow of a message from sender to receiver.
- Time synchronization for bucketing:
  - Problem: For bucketing to work properly, the servers' clocks need to be in sync.
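The following sketch illustrates why clock skew matters for time-based buckets. It assumes a bucket is identified by a conversation id plus a fixed time window; the one-hour window, the `bucketKey` helper, and the `room-42` id are assumptions made for this example, not the project's actual schema.

```javascript
// Hypothetical bucket window: one hour, expressed in milliseconds.
const BUCKET_MS = 60 * 60 * 1000;

// Derive a bucket key from a conversation id and a timestamp.
function bucketKey(conversationId, timestampMs) {
  const bucket = Math.floor(timestampMs / BUCKET_MS);
  return `${conversationId}:${bucket}`;
}

// Two servers handle messages at the same real instant, one second before
// an hour boundary, but server B's clock runs two seconds ahead.
const realTime = Date.UTC(2024, 0, 9, 11, 59, 59);
const skewMs = 2000;

const keyOnA = bucketKey("room-42", realTime);
const keyOnB = bucketKey("room-42", realTime + skewMs);

// The keys differ, so the two messages land in different buckets.
console.log(keyOnA, keyOnB, keyOnA === keyOnB);
```

Near a bucket boundary, even a small difference between server clocks puts messages from the same moment into different buckets, which is the problem described above.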
Dec 29 - Jan 4
- Started researching different technologies used in distributed systems, and learned about:
  - Kafka
  - Redis
  - RabbitMQ
  - Nginx
Jan 4 - Jan 5
- Set up Docker environment for fast setup
- Read Discord blogs to understand how they work
- Read about WhatsApp's use of ejabberd
- Tried RabbitMQ with a queue for each user [Failed 1][Having so many queues caused significant delays during RabbitMQ node restarts.]
Jan 6
- Found possible approaches to solve the first two problems
- Prepared architecture diagram
- Studied Discord message schema
- Explored compound keys
- Learned about MongoDB IDs being sortable and their composition
Jan 7
- Tried RabbitMQ for message and user updates [Failed 2][Not ideal: if a node goes down, it becomes unaware of who is online]
Jan 8
- Used RabbitMQ for message delivery and Redis for "Server to User" map [Failed 3][Complexity is too high to handle each user's presence. Redis will become a single point of failure and it will require three different stacks (Server, RabbitMQ, Redis) to scale up together]
Jan 9
- Created a queue for each server and bound it to an exchange based on userID [Green Go - Only problem is having so many bindings, solved this by creating multiple exchanges and partitioning users to them by a simple hash function.]
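Partitioning users across several exchanges with a simple hash could look roughly like this. The hash below (a 31-multiplier string hash), the exchange count of 8, and the `user-exchange-N` naming are all assumptions for illustration; the project's actual hash function and names may differ.

```javascript
// Hypothetical number of exchanges to spread the bindings over.
const EXCHANGE_COUNT = 8;

// Simple deterministic string hash, kept within unsigned 32-bit range.
function simpleHash(text) {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Pick the exchange responsible for a given user.
function exchangeForUser(userId, exchangeCount = EXCHANGE_COUNT) {
  const index = simpleHash(String(userId)) % exchangeCount;
  return `user-exchange-${index}`;
}

// Both the server that binds and the server that publishes compute the
// same exchange name from the userId, so no shared lookup table is needed.
console.log(exchangeForUser("user-1"));
console.log(exchangeForUser("user-1") === exchangeForUser("user-1"));
```

Each exchange then holds only a fraction of the bindings, which is what keeps the per-exchange binding count manageable.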
To keep the setup simple, I will provide only one way of doing it. Clone the repo and spin up the Docker container:
docker compose up --build -d
Start the Frontend:
cd client && npm i && npm run dev
- MERN
- WebSocket
- RabbitMQ (AMQP Protocol)
- Redis
- Popular Chat and Instant Messaging Protocols
- How Discord Stores Trillions of Messages
- How Discord Stores Billions of Messages
- MongoDB Indexes
- Schema Design Process
- Schema Design Patterns
I'm happy to collaborate. Contact me.