Understanding Kafka in simple English
Kafka (and message queues in general) is a topic that is simple, yet difficult to understand. Let us try to understand it in simple English.
Different ways of data transfer / processing
Before we understand what Kafka is, we need to understand the different ways data is transferred in a software application.
Broadly, they can be classified as:
- Synchronous
- Asynchronous
In synchronous communication, the caller makes a request and waits until the response or data is returned.
In asynchronous communication, a response may or may not be sent. If it is, it arrives once the request has been processed, and the caller does not need to wait for it in the meantime.
Kafka deals with the latter, i.e. asynchronous communication.
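To make the difference concrete, here is a small sketch in plain Python. This is not Kafka code. The synchronous call blocks until the result comes back. The asynchronous version drops the request into an in-memory `queue.Queue` and returns immediately, leaving a background worker thread to process it. The function names are hypothetical.

```python
import queue
import threading
import time

def place_order(item: str) -> str:
    """Hypothetical slow operation, e.g. processing an order."""
    time.sleep(0.1)  # simulate work
    return f"processed {item}"

# Synchronous: the caller waits for the result before doing anything else.
def synchronous_call() -> str:
    return place_order("book")

# Asynchronous: the caller hands the request to a queue and moves on.
request_queue: "queue.Queue[str]" = queue.Queue()
results: list[str] = []

def worker() -> None:
    # Background thread: takes requests off the queue one by one.
    while True:
        item = request_queue.get()
        results.append(place_order(item))
        request_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def asynchronous_call(item: str) -> None:
    request_queue.put(item)  # returns without waiting for processing

if __name__ == "__main__":
    print(synchronous_call())   # blocks ~0.1 s, then prints "processed book"
    asynchronous_call("pen")    # returns immediately
    print("caller keeps working")
    request_queue.join()        # demo only: wait so we can show the result
    print(results)              # ['processed pen']
```

Kafka plays the role of `request_queue` here, except that it lives outside the application, survives restarts, and can be read by many services.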
Short (non-technical) examples of each
- Synchronous
a. You give money to the shopkeeper and they hand you the product: a REST API or function call
b. You wait patiently in the salon until your haircut is done: a long-running synchronous call
- Asynchronous
a. The luggage conveyor belt at an airport, where the airport staff drop luggage on the belt without worrying about who will pick it up or when
b. Placing an order on an e-commerce website: you carry on with your day while the order is processed and shipped independently
c. Making an announcement at a gathering: only the interested people respond, while the others ignore it
As you might have noticed, I related the synchronous examples to their coding analogies but not the asynchronous ones, because we are going to explore those in detail.
Technical yet simple understanding of Kafka
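Let us now start understanding Kafka.
Kafka is a messaging queue: a queue (first in, first out) that stores messages. Strictly speaking, Kafka stores messages in a log, and the first-in, first-out order is guaranteed only within a single partition, which we will cover below.
Think of it like the conveyor belt in a sushi restaurant: a queue full of sushi plates (messages), where each person takes the ones they like and ignores the rest. Kafka works much the same way, with one difference: a message is not removed when someone reads it. It stays until a configured retention period expires, so every interested listener can read the same message.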
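The "messages stay after they are read" idea can be sketched in a few lines of Python. This is a toy in-memory model, not the Kafka API. The topic is an append-only list, and each reader remembers its own position in it (its offset), so two readers both see every message in the order it was written. The class and method names are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ToyLog:
    """A toy append-only log: messages are never removed when read."""
    messages: list[str] = field(default_factory=list)

    def append(self, message: str) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the new message

    def read_from(self, offset: int) -> list[str]:
        return self.messages[offset:]


class ToyReader:
    """Each reader tracks its own offset, independently of other readers."""
    def __init__(self, log: ToyLog) -> None:
        self.log = log
        self.offset = 0

    def poll(self) -> list[str]:
        batch = self.log.read_from(self.offset)
        self.offset += len(batch)  # move past what this reader has seen
        return batch


belt = ToyLog()
for plate in ["salmon", "tuna", "eel"]:
    belt.append(plate)

alice, bob = ToyReader(belt), ToyReader(belt)
print(alice.poll())  # ['salmon', 'tuna', 'eel']
print(bob.poll())    # ['salmon', 'tuna', 'eel'] (still there for Bob)
belt.append("egg")
print(alice.poll())  # ['egg']
```

Because each reader only moves its own offset, Alice reading a plate does not take it away from Bob. That is the main way Kafka differs from the sushi belt.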
But before diving deeper into how this happens, let us understand some terminology.
Kafka Terminologies
- Topic: the category or classification of a message. Listeners respond only to the messages that belong to the topics they are listening on.
- Partitions: a topic is split into one or more partitions, each an ordered sequence of messages, and the partitions are spread across the servers (brokers) in the Kafka cluster. Think of them as the conveyor belts in the restaurant. With 3 belts, more people can be served at the same time, and if one belt breaks down, the other two keep running. In Kafka, more partitions mainly means more parallel processing; protection against losing a server comes from replicas.
- Replicas: the number of copies kept of each partition. Each copy lives on a different server (broker), because a server holding two copies of the same partition would not help if that server failed. That means
replication factor <= number of brokers.
Replicas are important for durability. If a node crashes completely along with the messages on it, a copy of those messages still exists on another node, which takes over serving them.
- Group: in the microservice world, it is very common to run more than one instance of a service, and if all of them respond to every incoming message, the processing gets duplicated. To avoid this, consumers that share the same group ID split the work between them, so only one listener (consumer) in the group processes each message.
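To tie these terms together, here is a toy Python model. Again, this is not the real Kafka client, and it rests on three simplifying assumptions:

- A message's key picks its partition through `zlib.crc32(key) % partitions`. Kafka's default partitioner uses a murmur2 hash instead, but the idea is the same.
- The replicas of each partition are placed on distinct brokers in simple round-robin order.
- The consumers in a group split the partitions round-robin. Kafka offers several assignment strategies.

All function names are hypothetical.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Same key -> same partition, so messages with one key stay in order.
    crc32 is a stand-in for Kafka's murmur2-based default partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def place_replicas(num_partitions: int, brokers: list[str],
                   replication_factor: int) -> dict[int, list[str]]:
    """Put each partition's copies on distinct brokers (simplified round-robin).
    A broker never holds two copies of the same partition, hence the check."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }

def assign_partitions(num_partitions: int,
                      consumers: list[str]) -> dict[str, list[int]]:
    """Within one consumer group, each partition goes to exactly one consumer,
    so each message is processed once per group."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

if __name__ == "__main__":
    print(place_replicas(3, ["broker-1", "broker-2", "broker-3"], 2))
    # {0: ['broker-1', 'broker-2'], 1: ['broker-2', 'broker-3'], 2: ['broker-3', 'broker-1']}
    print(assign_partitions(3, ["order-service-a", "order-service-b"]))
    # {'order-service-a': [0, 2], 'order-service-b': [1]}
    print(partition_for("user-42", 3) == partition_for("user-42", 3))  # True
```

If a group has more consumers than partitions, the extra consumers receive nothing. This mirrors Kafka, where a partition is never shared between two consumers of the same group.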
When to use Kafka then? And when not to…
As we now understand, Kafka can deliver messages or events asynchronously and reliably, and with replication in place it avoids losing information. Hence it suits asynchronous tasks that require data consistency. A few such examples are:
- Sending an email to a customer: emails might contain important information that could get lost if the email servers are not responding. Because the message stays in Kafka after it is read, the consumer can retry sending the email several times before finally giving up.
- Processing user information: some user information or requests take time to process, and you don’t want to keep the user waiting during this time.
- Publishing an event: your server might have updated some data or performed a task, and now you want to notify all the other servers that depend on it. Calling each of them individually can be troublesome; instead, a single Kafka event can be produced, and all the dependent servers can listen to it and respond accordingly.
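In the email example, the retries come from how the consumer handles each message. Kafka does not retry the email on its own. A common pattern is to try processing a message a few times and record progress (commit the offset) only after success, so a message that was never handled can be picked up again.

Below is a minimal Python sketch of that loop. The `send_email` function, the retry count, and the backoff values are hypothetical. A real consumer would commit offsets through its Kafka client library rather than return a number.

```python
import time
from typing import Callable

def process_with_retry(message: str,
                       handler: Callable[[str], None],
                       max_attempts: int = 3,
                       base_delay: float = 0.01) -> bool:
    """Try the handler up to max_attempts times with exponential backoff.
    Returns True on success, False if every attempt failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except ConnectionError:
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))  # 0.01 s, 0.02 s, ...
    return False

def consume(messages: list[str], handler: Callable[[str], None]) -> int:
    """Process messages in order and 'commit' only after success.
    Returns the committed offset: the index of the next message to process."""
    committed = 0
    for offset, message in enumerate(messages):
        if not process_with_retry(message, handler):
            break  # stop here; this message is retried on the next run
        committed = offset + 1
    return committed

# Hypothetical email sender that fails twice before succeeding.
failures = {"count": 0}

def send_email(message: str) -> None:
    if failures["count"] < 2:
        failures["count"] += 1
        raise ConnectionError("mail server not responding")
    print(f"sent: {message}")

if __name__ == "__main__":
    print(consume(["welcome mail", "invoice mail"], send_email))
```

Stopping at the first message that keeps failing preserves the order of the remaining messages. The trade-off is that one bad message holds up the ones behind it.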
Other, simpler asynchronous tasks can still be performed using multithreading and may not require Kafka:
- Performing a large calculation
- Inserting some non critical records to DB
- Performing some cleanup operations
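For tasks like these, a plain thread pool inside the same application is often enough. Here is a minimal Python sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`. The `cleanup_old_records` task and the `audit_log` table name are hypothetical.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

def cleanup_old_records(table: str) -> str:
    """Hypothetical non-critical background task."""
    time.sleep(0.05)  # simulate work
    return f"cleaned {table}"

# A small pool of worker threads shared by the application.
executor = ThreadPoolExecutor(max_workers=2)

def handle_request() -> Future:
    # Hand the cleanup to the pool and return to the caller right away.
    return executor.submit(cleanup_old_records, "audit_log")

if __name__ == "__main__":
    future = handle_request()
    print("request handled, cleanup running in background")
    print(future.result())  # demo only: waits and prints "cleaned audit_log"
```

The trade-off compared with Kafka is that work still sitting in the pool is lost if the process crashes. That is acceptable only because these tasks are non-critical.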
I will wrap up the introduction here. In my next article, I will walk through the technical details and the integration with Spring Boot.