Apache Kafka workflow


May 26, 2021 16:00 Apache Kafka




So far, we've discussed Kafka's core concepts. Let's now look at Kafka's workflow.

Kafka is simply a collection of topics, each split into one or more partitions. A Kafka partition is a linearly ordered sequence of messages, where each message is identified by its index, called the offset. All the data in a Kafka cluster is the disjoint union of its partitions. Incoming messages are written at the end of a partition, and consumers read messages sequentially. Durability is provided by replicating messages to different brokers.
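
The idea of a partition as an append-only log can be illustrated with a minimal in-memory sketch. The `Partition` class and its `append` and `read` methods are hypothetical names chosen for this illustration; they are not part of any Kafka client API, and replication to other brokers is left out.

```python
class Partition:
    """In-memory stand-in for one partition: an append-only, ordered log."""

    def __init__(self):
        self._log = []

    def append(self, message):
        """Write a message at the end of the log and return its offset."""
        self._log.append(message)
        return len(self._log) - 1  # the offset is simply the index in the log

    def read(self, offset, max_messages=10):
        """Return (offset, message) pairs in order, starting at `offset`."""
        end = min(offset + max_messages, len(self._log))
        return [(i, self._log[i]) for i in range(offset, end)]


partition = Partition()
offsets = [partition.append(m) for m in ["a", "b", "c"]]
records = partition.read(1)  # sequential read starting at offset 1
```

Because messages are only ever added at the end, an offset keeps pointing to the same message, which is what lets a consumer resume from a remembered position.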

Kafka provides both pub-sub and queue-based messaging in a fast, reliable, persistent, fault-tolerant, zero-downtime manner. In both cases, producers simply send messages to a topic, and consumers choose either messaging model depending on their needs. The following sections walk through how consumers use each model.

The workflow for publish-subscribe messaging

Here's the step-by-step workflow for pub-sub messaging:

  • Producers periodically send messages to topics.
  • The Kafka broker stores all messages in the partitions configured for that particular topic. It ensures that messages are shared equally between partitions. If the producer sends two messages and the topic has two partitions, Kafka stores the first message in the first partition and the second message in the second partition.
  • Consumers subscribe to specific topics.
  • Once the consumer subscribes to a topic, Kafka provides the consumer with the topic's current offset and also saves the offset in ZooKeeper.
  • The consumer requests new messages from Kafka at a regular interval (for example, every 100 ms).
  • Once Kafka receives messages from producers, it delivers them to the consumer in response to these requests.
  • The consumer will receive the message and process it.
  • Once the message is processed, the consumer sends an acknowledgement to the Kafka broker.
  • Once Kafka receives the acknowledgement, it changes the offset to the new value and updates it in ZooKeeper. Because the offsets are maintained in ZooKeeper, the consumer can read the next message correctly even after a server outage.
  • The above process repeats until the consumer stops requesting messages.
  • A consumer can rewind or skip to any desired offset of a topic at any time and read all subsequent messages.
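
The steps above can be sketched as a small in-memory model. Everything here is an assumption made for illustration: `Topic`, `OffsetStore`, and `Consumer` are hypothetical classes, not a Kafka client API; `OffsetStore` merely stands in for the place where committed offsets live (ZooKeeper in the description above); and plain round-robin is used as one simple way to share messages equally between partitions.

```python
from itertools import count


class Topic:
    """Toy topic: a fixed number of partitions, each a plain list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self._next = count()  # round-robin counter

    def send(self, message):
        """Append to the next partition in turn; return (partition, offset)."""
        p = next(self._next) % len(self.partitions)
        self.partitions[p].append(message)
        return p, len(self.partitions[p]) - 1


class OffsetStore:
    """Stand-in for the external store that keeps committed offsets."""

    def __init__(self):
        self._committed = {}

    def commit(self, partition, offset):
        self._committed[partition] = offset

    def fetch(self, partition):
        return self._committed.get(partition, 0)


class Consumer:
    def __init__(self, topic, store):
        self.topic = topic
        self.store = store
        # On subscribe, resume from the last committed offset of each partition.
        self.position = {p: store.fetch(p) for p in range(len(topic.partitions))}

    def poll(self):
        """Request every message after the current position."""
        batch = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.position[p]:])
            self.position[p] = len(log)
        return batch

    def ack(self):
        """Acknowledge processing: persist the new offsets."""
        for p, offset in self.position.items():
            self.store.commit(p, offset)

    def seek(self, partition, offset):
        """Rewind or skip to an arbitrary offset of one partition."""
        self.position[partition] = offset


topic, store = Topic(num_partitions=2), OffsetStore()
placements = [topic.send("m1"), topic.send("m2")]  # one message per partition

first = Consumer(topic, store)
first_batch = first.poll()
first.ack()                       # offsets are now saved in the store

topic.send("m3")                  # lands in partition 0 again
restarted = Consumer(topic, store)  # simulates a consumer coming back after an outage
resumed_batch = restarted.poll()  # only the message not yet acknowledged

restarted.seek(0, 0)              # rewind partition 0 to the beginning
replayed_batch = restarted.poll()
```

In this sketch the restarted consumer picks up only `m3`, because the offsets acknowledged by the first consumer were kept outside the consumer itself, and the final `seek` replays partition 0 from the start.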

The workflow for queue messaging / consumer groups

In a queue messaging system, instead of a single consumer, a group of consumers sharing the same group ID subscribes to a topic. Simply put, consumers that subscribe to a topic with the same group ID are considered a single group, and the messages are shared among them. Let's look at the actual workflow of this system.

  • Producers send messages to a topic at regular intervals.
  • Kafka stores all messages in the partition configured for that particular topic, similar to the previous scenario.
  • A single consumer subscribes to a specific topic, say Topic-01, with group ID Group-1.
  • Kafka interacts with this consumer in the same way as in pub-sub messaging until a new consumer subscribes to the same topic, Topic-01, with the same group ID, Group-1.
  • Once the new consumer arrives, Kafka switches to shared mode and shares the data between the two consumers. This sharing continues until the number of consumers reaches the number of partitions configured for that topic.
  • Once the number of consumers exceeds the number of partitions, a new consumer will not receive any messages until one of the existing consumers unsubscribes. This happens because each active consumer in the group is assigned at least one partition, and once all partitions are assigned to existing consumers, new consumers have to wait.
  • This feature is known as a consumer group. In this way, Kafka offers the best of both systems in a simple and efficient manner.
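
The sharing behaviour described above can be illustrated with a short sketch of how partitions might be divided among the members of one group. The `assign_partitions` function is a hypothetical helper that uses simple round-robin; Kafka's actual assignment strategies are more involved, so treat this only as a model of the outcome.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partitions over group members; surplus members get nothing."""
    assignment = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for p in range(num_partitions):
        # Each partition goes to exactly one consumer in the group.
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# A topic with two partitions and a growing group.
one_member = assign_partitions(2, ["c1"])
two_members = assign_partitions(2, ["c1", "c2"])
three_members = assign_partitions(2, ["c1", "c2", "c3"])
```

With one member, that consumer reads both partitions, just as in the pub-sub case. With two members, each reads one partition. With three members, `c3` is assigned an empty list and stays idle until one of the others leaves the group.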

ZooKeeper's role

A key dependency of Apache Kafka is Apache ZooKeeper, a distributed configuration and synchronization service. ZooKeeper serves as the coordination interface between Kafka brokers and consumers. Kafka servers share information through the ZooKeeper cluster. Kafka stores basic metadata in ZooKeeper, such as information about topics, brokers, consumer offsets (queue readers), and so on.

Because all critical information is stored in ZooKeeper, which normally replicates this data across its ensemble, the failure of a Kafka broker or a ZooKeeper node does not affect the state of the Kafka cluster. Kafka restores its state once ZooKeeper restarts. This gives Kafka zero downtime. Leader election among Kafka brokers is also done through ZooKeeper in the event of a leader failure.