Apache Kafka Architecture and Design

Apache Kafka was originally developed at LinkedIn and was open sourced through the Apache Software Foundation in 2011. Since then, Kafka has quickly evolved from a messaging queue into a full-fledged event streaming platform.

Kafka is not just a message queue but a distributed, horizontally scalable, fault-tolerant streaming platform that can be used in the following three ways:

1) Publish + Subscribe (a message bus with very high throughput, on the order of millions of records per second)
2) Store (durably retain very large amounts of data)
3) Process (real-time stream processing)

Kafka is distributed, meaning it runs as a cluster of multiple nodes called "brokers", and these brokers can even span multiple datacenters. Being a distributed system makes Kafka horizontally scalable and fault-tolerant.

Being horizontally scalable, Kafka handles increased load by adding more machines to the cluster. This avoids the drawbacks of vertical scaling, such as hardware limits (a single machine cannot be scaled indefinitely) and downtime (increasing the resources (CPU, RAM, SSD) of a machine usually requires taking it offline). Adding a new machine does not require downtime, nor is there any limit to the number of machines you can have in your cluster.

A Kafka cluster stores streams of records in categories called topics. Each record consists of a key, a value, and a timestamp. Communication between clients and servers in a Kafka cluster is done with a simple, high-performance, language-agnostic TCP protocol.
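
As a concrete illustration of the record structure, here is a minimal sketch using the Java client (the topic name, key, and value are made-up placeholders):

    import org.apache.kafka.clients.producer.ProducerRecord;

    public class RecordExample {
        public static void main(String[] args) {
            // Arguments: topic, partition (null = let Kafka choose), timestamp, key, value
            ProducerRecord<String, String> record = new ProducerRecord<>(
                    "page-views", null, System.currentTimeMillis(), "user-42", "viewed /pricing");
            System.out.println(record);
        }
    }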


Kafka Architecture

A "topic" is the core abstraction in Kafka, a topic is a category or feed name to which records are published. For each "topic" Kafka maintains a "portioned log", each "record" published to a particular "topic" is appended to any one of it's partition log.

A "record" published to a particular "topic" is not necessarily be appended to a single log, but the log can be partitioned in several "portioned logs" present on different "brokers". This helps the log to scale beyond a size that will fit on a single server, secondly "portioned logs" helps in "parallelism" while data is read by consumers.

Each "partitioned log" may be replicated to a configurable number of "brokers", this makes each partition to has one server which acts as the "leader" and zero or more servers which act as "followers".

The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
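
To make this concrete, here is a minimal sketch of creating such a topic with the Java AdminClient; the broker address, topic name, partition count, and replication factor are placeholders, not a recommendation:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    import java.util.Collections;
    import java.util.Properties;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Address of one (or more) brokers in the cluster.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // 3 partitions, each replicated to 2 brokers (one leader + one follower).
                NewTopic topic = new NewTopic("page-views", 3, (short) 2);
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }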

Each partition of a "topic" is a structured commit log, i.e. an ordered, immutable sequence of records that is continually appended to. A sequential id number (the "offset") is assigned to each record in a partition; this offset uniquely identifies each record within that partition.

Kafka supports a configurable retention period: all published records are persisted by Kafka (whether consumed or not) for the time specified by the retention period, after which the data is deleted to free up space.
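
The retention period can be tuned per topic via the retention.ms configuration (there are also broker-wide defaults). As a rough sketch, again using the hypothetical "page-views" topic from the example above, it could be overridden with the AdminClient like this:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;

    public class SetRetentionExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Keep records of this topic for 7 days (value is in milliseconds).
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "page-views");
                AlterConfigOp setRetention = new AlterConfigOp(
                        new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(
                        Map.of(topic, Collections.singletonList(setRetention))).all().get();
            }
        }
    }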

Retaining data in Kafka for a long time is not a problem, because Kafka's performance is effectively constant with respect to the size of the data it holds.



An "offset" helps "consumers" to read data from a particular topic and is controlled by consumers itself. In general consumers read data from a topic sequentially but they can reset their position anytime to a previous offset to replay the data.

The only information a Kafka cluster maintains about a consumer is the "offset", i.e. the position of the consumer's last read. This makes consumers very cheap: they can come and go without much impact on the cluster.

The "producer" controls which "record" should go to which "partition" within a topic, this can be done using a partition function based on some key in the record, or a producer may choose to append data in round-robin fashion simply to balance load.

Each "Consumer" is associated with a "consumer group", and each record published to a topic is delivered to one consumer instance within the group of "consumers" having same "consumer group". This is where "portioned logs" comes handy, for example if a topic has "n" partitions we can start "n" consumers with same "consumer group" to make each consumer to read data parallelly from n "portioned logs" of the topic.

The Kafka protocol dynamically balances topic partitions across consumer instances, making sure that each consumer instance is the exclusive consumer of a "fair share" of partitions at any point in time.

If new consumer instances join the group, they take over some partitions from other members of the group; on the other hand, if a consumer instance dies, its partitions are redistributed among the remaining instances.

Kafka only provides a total order over records within a partition, not across different partitions in a topic; however, a total order over all records can be achieved by using a topic with only one partition.

Kafka guarantees that "messages" sent by a producer to a particular topic partition will be appended in the order they are sent.

That's it for this article. You can also read about:

1) Multi-Broker Apache Kafka + ZooKeeper Cluster Setup
2) How to Write a Kafka Producer in Java - Example
3) How to Write a Kafka Consumer in Java - Automatic Offset Commit
4) How to Write a Kafka Consumer in Java - Manual Offset Commit
5) How to Write a Kafka Consumer in Java - Assignable to a Specific Partition
6) Download and Install Apache ZooKeeper on Ubuntu