Zero to Hero — Master Kafka Tutorial 1 (Basic Theory)

This is the first tutorial in this series; it covers the basic concepts of Kafka.

Basic Concepts — Data Topics

  1. Topic: A particular stream of data, identified by its topic name.
  2. Partition: A topic is split into partitions; each partition is an ordered queue of messages (see the creation sketch after the notes below).
  3. Offset: The incremental id assigned to each message within a partition.

Note

  1. Offsets guarantee the order of messages within a partition.
  2. An offset (the incremental id) is only meaningful within its partition (e.g. the message with offset 2 in partition 1 is not guaranteed to be older than the message with offset 0 in partition 2).
  3. Messages are kept in a partition for a limited retention period (one week by default).
  4. Messages in a partition are immutable; once written, they cannot be changed.
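
To make this concrete, here is a minimal sketch (Java, using Kafka's AdminClient) that creates a topic with 3 partitions. The topic name, broker address, and replication factor are assumptions for illustration, not values from this tutorial.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address; replace with your cluster's bootstrap server.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // "orders" is an illustrative topic name: 3 partitions, replication factor 1.
            NewTopic orders = new NewTopic("orders", 3, (short) 1);
            admin.createTopics(Collections.singletonList(orders)).all().get();
        }
    }
}
```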

Kafka Cluster

  1. Cluster: A cluster contains multiple brokers.
  2. Broker: A broker is a single server, identified by an integer ID; topic partitions are distributed across all the brokers in the cluster.
  3. Replication: Each topic has a replication factor; its partitions are replicated across multiple brokers so the data survives a broker failure.

Note

  1. At any time, only one broker hosts the leader of a given partition (it accepts reads and writes); the other brokers host followers (they only replicate the data).
  2. If the broker hosting the leader goes down, a new leader is elected from the followers (see the sketch below).
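
As a sketch of how this looks from the client side, the AdminClient can describe a topic and report, per partition, which broker is the leader and which brokers hold the replicas. The topic name is the same assumed one as in the creation sketch above.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.Collections;
import java.util.Properties;

public class DescribeTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin
                    .describeTopics(Collections.singletonList("orders"))
                    .all().get()
                    .get("orders");

            // Each partition has exactly one leader broker; the other replicas are followers.
            for (TopicPartitionInfo p : description.partitions()) {
                System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                        p.partition(), p.leader(), p.replicas(), p.isr());
            }
        }
    }
}
```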

Producer & Consumer

Producer

  1. Writes data to topics.
  2. Acknowledgement mode: acks=0, no acknowledgement (possible data loss); acks=1, acknowledgement from the leader only (limited data loss); acks=all, acknowledgement from the leader and all in-sync replicas (no data loss).
  3. Message Key: Determines which partition the data is sent to; with key=null, data is sent round robin across partitions; with key=some_field, messages with the same value for that field are sent to the same partition (see the producer sketch below).
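
Below is a minimal producer sketch showing both settings: acks=all for the strongest acknowledgement, and a non-null key so that all messages for the same key land in the same partition. The topic name, key, and broker address are illustrative assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for leader + in-sync replicas

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key ("user-42") => same partition => ordering preserved for that key.
            producer.send(new ProducerRecord<>("orders", "user-42", "order created"));
            producer.send(new ProducerRecord<>("orders", "user-42", "order shipped"));
            producer.flush();
        }
    }
}
```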

Consumer

  1. Reads data from topics.
  2. Reads from multiple partitions in parallel, and reads data in order within each partition.
  3. Consumer Group: Each consumer belongs to exactly one consumer group; a consumer group can have multiple consumers; together, the consumers of a group read all the partitions of a topic, and each consumer within the group reads from an exclusive set of partitions (e.g. with 2 consumer groups, each message is consumed twice, once by each group).
  4. Consumer Offset: Kafka stores the offsets each consumer group has committed in an internal topic (__consumer_offsets), so that when a consumer dies, it can resume from where it left off after it recovers.
  5. Consumer Offset Committing: At most once — the offset is committed as soon as the message is received, which can lose messages if processing fails; At least once — the offset is committed only after the message is processed, which can process a message multiple times, so processing must be idempotent (see the consumer sketch after this list); Exactly once — can only be achieved for Kafka-to-Kafka workflows.
  6. Consumer Connection: A consumer only needs to connect to one broker in the cluster; from the returned metadata it learns about all the other brokers, topics, and partitions.
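
The sketch below shows a consumer in a group that commits offsets only after processing each batch, which corresponds to the at-least-once semantics above. The group id and topic name are assumptions for illustration.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");        // assumed group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Process the message first ...
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
                // ... then commit, giving at-least-once semantics (processing must be idempotent).
                consumer.commitSync();
            }
        }
    }
}
```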

Zookeeper

  1. Manages brokers and handles leader election for partitions.
  2. Sends notifications to Kafka about topic changes, broker status, etc.
  3. Usually runs as an ensemble with an odd number of ZooKeeper servers (1, 3, 5, …).
  4. One of the ZooKeeper servers is the leader (handles writes); the rest are followers (handle reads).
