Kafka Zero to Hero

Beginner-friendly Kafka tutorial content with runnable examples and Docker-based local setup.

What is Kafka?

Kafka is a distributed event streaming platform. Put simply, it lets one part of your application publish events while other parts consume those events in near real time.

Simple real-time scenario

Think about a food delivery app:

A customer places an order.
The restaurant dashboard needs that order immediately.
The delivery partner app needs updates when the order is ready.
The analytics team wants to count orders by city.

Instead of every service calling every other service directly, the order service can publish an event like order_created to Kafka. Every interested system can read that event independently.

That makes Kafka a central event highway for your data.

Why Kafka?

Kafka is useful when many systems need the same information at the same time, but for different reasons.

Same real-time scenario

In the food delivery example, one order event can be used by:

the restaurant service to start preparing food
the delivery system to plan pickup
the notification service to send updates to the customer
the analytics service to build dashboards

Without Kafka, the order service would need direct integrations with every consumer. That creates tight coupling.

With Kafka:

producers and consumers stay decoupled
multiple consumers can read the same event
messages are retained for a configurable period, so consumers can replay them
the platform can scale as traffic grows
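The decoupling and replay points above can be sketched with a toy in-memory log. This is not the Kafka API: the ToyLog class, its method names, and the reader names are all made up for illustration. It only shows the idea that each reader tracks its own position in a shared, append-only list.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a single-partition topic (not the Kafka API)."""

    def __init__(self):
        self._events = []                   # append-only list of events
        self._positions = defaultdict(int)  # next index to read, per reader

    def publish(self, event):
        self._events.append(event)

    def poll(self, reader):
        """Return events this reader has not seen yet and advance its position."""
        start = self._positions[reader]
        self._positions[reader] = len(self._events)
        return self._events[start:]

    def rewind(self, reader):
        """Reset a reader to the start so it can replay stored events."""
        self._positions[reader] = 0


log = ToyLog()
log.publish({"type": "order_created", "order_id": 1, "city": "Pune"})
log.publish({"type": "order_created", "order_id": 2, "city": "Delhi"})

restaurant_events = log.poll("restaurant")  # both events
analytics_events = log.poll("analytics")    # the same two events, read independently
nothing_new = log.poll("restaurant")        # empty list: restaurant is caught up

log.rewind("analytics")
replayed = log.poll("analytics")            # both events again
```

The producer never needs to know who is reading. Adding a new reader is just a new name with its own position.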

Kafka Building Blocks

Producer

The application that sends data to Kafka.

Example: an order service publishing order_created events.

Topic

A named stream of events.

Example: orders, payments, notifications.

Partition

Each topic is split into partitions. Partitions allow Kafka to scale and process events in parallel.

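Important rule: ordering is guaranteed only inside a single partition. Events that share a key are sent to the same partition, so giving related events the same key (for example, the order id) keeps them in order.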
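A minimal sketch of key-based partitioning, assuming a topic with three partitions. Kafka's Java client hashes keys with murmur2. CRC32 is used here only as a stand-in so the example runs with the standard library. The choose_partition helper and the sample keys are hypothetical.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for this sketch


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition. CRC32 is a stand-in for Kafka's real hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = {p: [] for p in range(NUM_PARTITIONS)}
events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "ready"),
    ("order-2", "ready"),
    ("order-1", "delivered"),
]
for key, status in events:
    partitions[choose_partition(key)].append((key, status))

# Events for one key always land in one partition, in the order they were sent.
order_1_history = [
    status
    for key, status in partitions[choose_partition("order-1")]
    if key == "order-1"
]
print(order_1_history)
```

Nothing here promises an order between order-1 and order-2 events if they end up in different partitions. That is the trade-off partitions make for parallelism.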

Broker

A Kafka server. A Kafka cluster is made of one or more brokers.

Consumer

An application that reads messages from Kafka.

Example: a notification service reading from the orders topic.

Consumer Group

Multiple consumer instances can share work under one group id.

same group: the topic's partitions are divided across the consumers, so each message is handled by one consumer in the group
different groups: each group reads every message in the topic independently
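The two group behaviours can be sketched as a simple round-robin assignment. Kafka's real assignors are more involved and also handle rebalancing. The assign_partitions function and the consumer names below are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Round-robin partitions across the consumers of one group (toy model)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


topic_partitions = [0, 1, 2, 3]

# One group with two instances: the partitions are split between them.
delivery_group = assign_partitions(topic_partitions, ["delivery-a", "delivery-b"])

# A separate group with one instance: it reads every partition on its own.
analytics_group = assign_partitions(topic_partitions, ["analytics-a"])

print(delivery_group)
print(analytics_group)
```

One consequence of this model: a group with more consumers than partitions leaves some consumers idle, because a partition is never shared inside a group.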

Offset

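Every message in a partition has an offset, a sequential number that marks its position. Consumers commit offsets back to Kafka to record how far they have read, which lets them resume from the right place after a restart.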
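A toy illustration of offsets, using a Python list as the partition and a dict as the committed-offset store. Both are stand-ins for what Kafka manages internally, and the consume function and group id are made up for this sketch.

```python
partition_log = ["order-1", "order-2", "order-3", "order-4"]  # list index == offset
committed = {}  # group id -> next offset to read (toy stand-in for Kafka's store)


def consume(group_id, max_messages):
    """Read up to max_messages from the group's committed position, then commit."""
    start = committed.get(group_id, 0)
    batch = partition_log[start:start + max_messages]
    committed[group_id] = start + len(batch)
    return batch


first_batch = consume("notifications", 3)   # offsets 0, 1, 2
second_batch = consume("notifications", 3)  # resumes at offset 3
print(first_batch, second_batch, committed)
```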

Visual Model

flowchart LR
    A[Order Service Producer] --> B[Kafka Topic: orders]
    B --> C[Restaurant Consumer Group]
    B --> D[Delivery Consumer Group]
    B --> E[Analytics Consumer Group]

Kafka Setup

Prerequisites

Docker Desktop
Python 3.11 or later

1. Create a virtual environment

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .

2. Start Kafka

./scripts/start-kafka.sh

This starts a single-node Kafka broker on localhost:9092 using KRaft mode.

Kafka UI is also started and available in the browser at http://localhost:8080.
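If you want a quick check that the broker port is accepting connections, a small standard-library sketch like the one below may help. It only tests that something is listening on localhost:9092. It does not speak the Kafka protocol, and the broker_reachable helper is hypothetical, not part of this repo.

```python
import socket


def broker_reachable(host="localhost", port=9092, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("broker reachable:", broker_reachable())
```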

3. Stop Kafka

./scripts/stop-kafka.sh

Live Terminal Demo (Recommended For Presentations)

This flow matches how you would explain Kafka in real time:

Start Kafka.
Start producer and show events being written.
Start consumer and show near real-time reads.

Terminal 1: Start Kafka

./scripts/start-kafka.sh

Optional browser view:

Kafka UI: http://localhost:8080

Terminal 2: Create topic once

source .venv/bin/activate
python examples/live_topic_setup.py --topic order-events-live

Terminal 3: Start producer

source .venv/bin/activate
python examples/live_producer.py --topic order-events-live --interval 1

You will see PRODUCED_EVENT ... logs continuously.

Terminal 4: Start consumer

source .venv/bin/activate
python examples/live_consumer.py --topic order-events-live

You will see CONSUMED_EVENT ... logs almost immediately after each produced event.

Useful options:

Read old events too: python examples/live_consumer.py --topic order-events-live --from-beginning
Produce a fixed number of events: python examples/live_producer.py --topic order-events-live --max-events 20

What it demonstrates

creating a topic
publishing JSON events continuously
consuming those events continuously
seeing near real-time flow from producer to consumer
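Kafka transports message keys and values as bytes, so JSON events have to be encoded before sending and decoded after reading. The sketch below shows one way that round trip might look. The field names and helper functions are assumptions for illustration and may differ from what live_producer.py and live_consumer.py actually do.

```python
import json
import uuid
from datetime import datetime, timezone


def build_order_event(order_id: int, city: str) -> dict:
    """Build an event dict. The field names are hypothetical."""
    return {
        "event_id": str(uuid.uuid4()),
        "type": "order_created",
        "order_id": order_id,
        "city": city,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def serialize(event: dict) -> bytes:
    """Encode an event as UTF-8 JSON bytes, the form a producer would send."""
    return json.dumps(event).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Decode UTF-8 JSON bytes back into a dict, as a consumer would."""
    return json.loads(payload.decode("utf-8"))


event = build_order_event(101, "Pune")
payload = serialize(event)
round_tripped = deserialize(payload)
print(round_tripped["type"], round_tripped["order_id"])
```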

Challenges with Kafka

Kafka is powerful, but it introduces real engineering challenges.

1. Ordering is not global

Ordering is guaranteed only within a partition, not across the whole topic.

2. Duplicate processing can happen

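Kafka consumers typically get at-least-once delivery: if a consumer crashes or the group rebalances after processing a message but before committing its offset, the message is delivered again. Applications should therefore be idempotent.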
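One common way to get idempotent processing is to give every event a unique id and skip ids that were already handled. The sketch below assumes an event_id field, which is hypothetical, and keeps the seen ids in memory. A real service would keep them in a durable store.

```python
processed_ids = set()  # in a real service this would live in a durable store
orders_by_city = {}


def handle(event):
    """Apply an event once. Return False if it was a duplicate."""
    if event["event_id"] in processed_ids:
        return False
    orders_by_city[event["city"]] = orders_by_city.get(event["city"], 0) + 1
    processed_ids.add(event["event_id"])
    return True


deliveries = [
    {"event_id": "e1", "city": "Pune"},
    {"event_id": "e2", "city": "Delhi"},
    {"event_id": "e1", "city": "Pune"},  # redelivered after a simulated crash
]
results = [handle(event) for event in deliveries]
print(results, orders_by_city)
```

The redelivered event is ignored, so the Pune count stays at 1 instead of being inflated to 2.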

3. Schema changes need discipline

If event structure changes carelessly, consumers can break.
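A consumer can be made more tolerant of schema changes by requiring only the fields it truly needs, supplying defaults for optional ones, and ignoring fields it does not recognise. This is a minimal sketch of that idea. The field names and the schema_version convention are assumptions, not something this repo defines.

```python
import json


def parse_order_event(payload: bytes) -> dict:
    """Extract the fields this consumer cares about from a JSON payload."""
    raw = json.loads(payload)
    return {
        "order_id": raw["order_id"],                     # required: fail loudly if absent
        "city": raw.get("city", "unknown"),              # optional: default when missing
        "schema_version": raw.get("schema_version", 1),  # assume version 1 if not stated
    }


# An older producer that sends only the order id.
old_event = parse_order_event(b'{"order_id": 7}')

# A newer producer that adds fields this consumer does not know about.
new_event = parse_order_event(
    b'{"order_id": 8, "city": "Pune", "schema_version": 2, "coupon": "X1"}'
)
print(old_event)
print(new_event)
```

Adding optional fields is then safe for existing consumers. Removing or renaming a required field is still a breaking change and needs coordination.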

4. Operations become more complex

You need monitoring, alerting, topic planning, retention settings, and capacity planning.
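As a rough starting point for capacity planning, disk usage can be estimated from throughput, message size, retention, and replication. The function and the sample numbers below are illustrative assumptions. The estimate ignores compression, index files, and headroom, so treat it as a back-of-the-envelope figure only.

```python
def estimate_storage_gb(messages_per_second, avg_message_bytes,
                        retention_days, replication_factor):
    """Estimate total cluster disk usage in GB (1 GB = 10**9 bytes)."""
    retention_seconds = retention_days * 24 * 60 * 60
    total_bytes = (messages_per_second * avg_message_bytes
                   * retention_seconds * replication_factor)
    return total_bytes / 1e9


# Hypothetical workload: 100 messages/s, 1000 bytes each, kept 7 days, 3 replicas.
estimate = estimate_storage_gb(100, 1000, 7, 3)
print(f"{estimate:.2f} GB")
```

For this sample workload the estimate comes to about 181 GB across the cluster.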

5. It can be overkill

For a very small application with simple request-response communication, Kafka may add unnecessary complexity.

Repo Structure

.
├── README.md
├── compose.yaml
├── examples
│   ├── live_consumer.py
│   ├── live_producer.py
│   └── live_topic_setup.py
├── scripts
│   ├── start-kafka.sh
│   └── stop-kafka.sh
├── src
│   └── kafka_zero_to_hero
│       ├── __init__.py
│       └── common.py