Apache Kafka: A Quick Overview
Apache Kafka is an open-source distributed streaming platform that allows you to build real-time data pipelines and streaming applications. Kafka is designed to handle high volumes of data in real time, making it an ideal choice for applications that require fast and reliable data processing.
It provides a distributed architecture for handling data streams and supports various use cases, including real-time data processing, messaging, and log aggregation. Kafka is built around three core capabilities: publishing, storing, and processing streams of records. Producers write data to Kafka topics, which are essentially streams of records, and consumers read data from these topics in real time.
This quick start guide will cover the basics of setting up and using Apache Kafka for your data streaming needs.
Prerequisites
Before getting started with Apache Kafka, you will need the following prerequisites:
- Java JDK 8 or later
- Apache Kafka binary
Setting up Apache Kafka
- Download the Apache Kafka binary distribution and extract it to a directory of your choice.
- Navigate to the Kafka directory and start the ZooKeeper server by running the following command in a terminal window.
- In a new terminal window, start the Kafka broker by running the following command.
Windows (the .bat scripts live under bin\windows):
bin\windows\zookeeper-server-start.bat config\zookeeper.properties
bin\windows\kafka-server-start.bat config\server.properties
Linux:
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
This will start the Kafka server, which will listen on port 9092 by default.
Creating a Topic
Before sending data to Kafka, you need to create a topic. A topic is a named stream of records and is Kafka's primary unit of data storage.
To create a topic, run the following command. This will create a topic named “test-topic” with one partition and a replication factor of 1.
Windows:
bin\windows\kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test-topic
Linux:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test-topic
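As an alternative to the command line, topics can also be created programmatically with Kafka's `AdminClient`. The sketch below is a minimal example, assuming a broker running on localhost:9092 and the `kafka-clients` library on the classpath; it mirrors the CLI example above (one partition, replication factor 1).

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic name, partition count, and replication factor
            // match the CLI example above.
            NewTopic topic = new NewTopic("test-topic", 1, (short) 1);
            // Block until the broker confirms the topic was created.
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```

Note that this only runs against a live broker, so it is not something you can execute standalone; it is useful when topic creation needs to happen inside an application rather than from a terminal.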
Sending and Receiving Messages
Once you have created a topic, you can send and receive messages using the Kafka command line tools.
Run the following command to send a message to the “test-topic” topic. This sends the message “Hello World” to the topic.
Windows:
echo "Hello World" | bin\windows\kafka-console-producer.bat --bootstrap-server localhost:9092 --topic test-topic
Linux:
echo "Hello World" | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic
Run the following command to receive messages from the “test-topic” topic. This starts a console-based consumer that reads messages from the topic and prints them to the terminal.
Windows:
bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test-topic --from-beginning
Linux:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning
Simple Java Producer and Consumer
The repository below implements a simple Java program that produces and consumes messages using Apache Kafka.
https://github.com/thiagomarsal/java-kafka
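For reference, a minimal producer and consumer along these lines might look like the following. This is a sketch, not the repository's exact code; it assumes a broker on localhost:9092, the “test-topic” topic created earlier, and the `kafka-clients` library on the classpath.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducerConsumer {
    public static void main(String[] args) {
        // Producer: send a single message to test-topic.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "Hello World"));
        } // close() flushes any buffered records before returning.

        // Consumer: read messages from the beginning of test-topic.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "quickstart-group");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            // Poll once for up to five seconds and print whatever arrived.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
```

In a real application the consumer would poll in a loop rather than once; a single poll is enough here to demonstrate the round trip.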
Conclusion
Apache Kafka is a powerful tool for building real-time data pipelines and streaming applications. With its distributed architecture and support for high volumes of data, it is an ideal choice for applications that require fast and reliable data processing.
This quickstart guide covered the basics of setting up and using Apache Kafka, including creating a topic, sending and receiving messages, and more. With this knowledge, you can build your data streaming applications using Apache Kafka.
I will cover some key features and configurations for high-volume data handling in a dedicated article, since that topic deserves a more in-depth treatment.