This is the first tutorial in this series; it covers the basic concepts of Kafka.
Basic Concept — Data Topics
- Topic: A particular stream of data, identified by the topic name.
- Partition: A topic is split into partitions; each partition is an ordered, append-only sequence of messages.
- Offset: The incremental ID assigned to each message in a partition.
- Offsets guarantee the order of messages within a partition.
- An offset is only meaningful within its own partition (i.e. the message with offset 2 in partition 1 is not guaranteed to be older than the message with offset 0 in partition 2).
- Messages are kept in a partition for a limited time (default 1 week).
- Messages in a partition are immutable.
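To make the point about offsets concrete, here is a minimal sketch in plain Python. It is a toy model, not the Kafka API: the `Partition` class and the two-partition `topic` dictionary are hypothetical names introduced only for illustration. It shows that each partition numbers its own messages independently.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy stand-in for one partition: an append-only list of messages."""
    messages: list = field(default_factory=list)

    def append(self, value):
        # The offset is simply the position of the message in this partition.
        self.messages.append(value)
        return len(self.messages) - 1

    def read(self, offset):
        # Messages are never modified, only read back by offset.
        return self.messages[offset]


# A hypothetical topic with two partitions.
topic = {0: Partition(), 1: Partition()}

first = topic[0].append("a")   # first message in partition 0
second = topic[1].append("b")  # first message in partition 1
third = topic[0].append("c")   # second message in partition 0
```

Here `first` and `second` both get offset 0 even though they were written at different times, which is why offsets from different partitions cannot be compared.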
- Cluster: A cluster contains multiple brokers.
- Broker: One broker is one server, with a specific ID; the topic partitions are distributed across all the brokers in the cluster.
- Replication: Each partition of a topic is copied to multiple brokers, so that the data survives the loss of a broker.
- At any time, only one broker hosts the leader of a given partition (it accepts reads and writes); the other brokers host the followers (they only sync the data).
- If the broker hosting the leader goes down, a new leader is elected from the followers.
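The leader and follower behaviour described above can be sketched with a small toy model. This is not how Kafka implements election internally; the `PartitionReplicas` class, the broker IDs, and the "promote the first remaining follower" rule are all assumptions made for illustration.

```python
class PartitionReplicas:
    """Toy model of one partition's replicas, identified by broker ID."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = list(followers)

    def broker_down(self, broker_id):
        if broker_id == self.leader:
            if not self.followers:
                raise RuntimeError("no follower left to promote")
            # Promote the first remaining follower (an arbitrary rule for this sketch).
            self.leader = self.followers.pop(0)
        elif broker_id in self.followers:
            # A failed follower just drops out of the replica set.
            self.followers.remove(broker_id)


# Hypothetical brokers 101, 102 and 103 hold the same partition.
replicas = PartitionReplicas(leader=101, followers=[102, 103])
replicas.broker_down(101)  # the leader's broker fails
```

After the failure, broker 102 is the leader and broker 103 is the only follower, so the partition stays available for reads and writes.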
Producer & Consumer
- Producer: Writes data to a topic.
- Acknowledge mode:
acks=0: no data acknowledgment;
acks=1: data acknowledgment from the leader;
acks=all: data acknowledgment from the leader and all in-sync replicas.
- Message Key: Determines how the data is distributed across the partitions;
key=null: data is sent round robin;
key=some_field: messages with the same value for the field are sent to the same partition.
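The key-to-partition rule can be illustrated with a short sketch. It is a simplified model rather than Kafka's own partitioner: `choose_partition` and `NUM_PARTITIONS` are hypothetical names, and `zlib.crc32` is used here only as a stand-in for the hash function (Kafka's default partitioner hashes the key bytes with murmur2).

```python
import itertools
import zlib

NUM_PARTITIONS = 3  # assumed partition count for this sketch

# Cycles 0, 1, 2, 0, ... for messages without a key.
_round_robin = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key):
    """Pick a partition for a message with the given key (or None)."""
    if key is None:
        # No key: spread messages evenly over the partitions.
        return next(_round_robin)
    # Same key -> same hash -> same partition, as long as the
    # number of partitions does not change.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS
```

Calling `choose_partition("user-42")` repeatedly always returns the same partition, which is what preserves per-key ordering. Note that the mapping depends on the partition count, so adding partitions to a topic changes where a key lands.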
- Consumer: Reads data from a topic.
- Reads data from multiple partitions in parallel, and reads data in order within each partition.
- Consumer Group: Each consumer belongs to one consumer group, and a consumer group can have multiple consumers. Each consumer group reads all the partitions of a topic (e.g. with 2 consumer groups, each message is consumed twice, once by consumer group 1 and once by consumer group 2). Within a consumer group, each partition is read by exactly one consumer.
- Consumer Offset: Kafka stores the offsets that each consumer group has read in an internal topic named __consumer_offsets, so that when a consumer dies, it can resume from where it left off after it recovers.
- Consumer Offset Committing:
At most once: the offset is committed as soon as the message is received, which can result in message loss if the processing goes wrong;
At least once: the offset is committed only after the message is processed, which can result in a message being processed multiple times, so the processing needs to be idempotent;
Exactly once: can only be achieved in Kafka-to-Kafka workflows.
- Consumer Connection: A consumer only needs to connect to one broker in the cluster; from that broker's metadata it learns about all the other brokers, topics and partitions.
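The difference between at-most-once and at-least-once committing is easiest to see by simulating a crash. The sketch below is a toy model in plain Python, not a Kafka client: the `consume` function and its `commit_first` and `crash_at` parameters are hypothetical, and the "crash" always happens between the two steps (commit and process) of one message.

```python
def consume(messages, committed, commit_first, crash_at=None):
    """Handle messages from the committed offset onward.

    Returns (handled_messages, new_committed_offset). If crash_at is set,
    the consumer dies halfway through the message at that offset.
    """
    handled = []
    for offset in range(committed, len(messages)):
        crash = offset == crash_at
        if commit_first:
            committed = offset + 1            # at most once: commit on receipt
            if crash:
                break                         # died before processing
            handled.append(messages[offset])
        else:
            handled.append(messages[offset])  # at least once: process first
            if crash:
                break                         # died before committing
            committed = offset + 1
    return handled, committed


messages = ["m0", "m1", "m2"]

# At most once: crash while handling m1, then restart from the committed offset.
done, pos = consume(messages, 0, commit_first=True, crash_at=1)
at_most_once = done + consume(messages, pos, commit_first=True)[0]

# At least once: same crash, same restart.
done, pos = consume(messages, 0, commit_first=False, crash_at=1)
at_least_once = done + consume(messages, pos, commit_first=False)[0]
```

In this run `at_most_once` ends up as `["m0", "m2"]` (m1 is lost), while `at_least_once` ends up as `["m0", "m1", "m1", "m2"]` (m1 is handled twice), which is why at-least-once processing should be idempotent.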
Zookeeper
- Manages brokers, and handles leader election for partitions.
- Sends notifications to Kafka on topics changes, broker status, etc.
- Usually we have an odd number of zookeepers (1, 3, 5, …).
- One of the zookeepers is the leader (handles writes); the rest are followers (handle reads).
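The preference for an odd number of zookeepers follows from majority voting: the ensemble stays available only while more than half of its servers are up. A small calculation shows why an even count adds nothing; the function names below are hypothetical and exist only for this sketch.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# Ensemble size -> number of failures it can survive.
table = {n: tolerated_failures(n) for n in range(1, 7)}
```

The resulting `table` is `{1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}`: going from 3 to 4 servers (or from 5 to 6) does not allow any additional failure, so the extra server only adds cost.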