Introduction: Setting Up Kafka
Apache Kafka is an open-source, scalable, high-throughput messaging system maintained by the Apache Software Foundation and written in Scala and Java. It is designed to let a single cluster serve as the central data backbone for a large environment. It typically achieves much higher throughput than message brokers such as ActiveMQ and RabbitMQ, and it handles large volumes of real-time data efficiently. You can deploy Kafka on a single server or in a distributed, clustered environment.
The general features of Kafka are as follows:
Persists messages on disk in structures that provide constant-time performance.
High throughput, with disk structures that support hundreds of thousands of messages per second.
Distributed system that scales easily with no downtime.
Supports multiple subscribers and automatically rebalances consumers on failure.
This tutorial shows how to install and configure Apache Kafka on an Ubuntu 16.04 server.
To follow it, you will need:
An Ubuntu 16.04 server.
A non-root user account with superuser privileges set up on your server.
Step 1: Getting Started and Installing Java
1) Start by making sure that your Ubuntu 16.04 server is fully up to date.
You can update your server by running the following commands:
sudo apt-get update -y
sudo apt-get upgrade -y
2) Installing Java
Check whether Java is already installed, and which version is the default, by running java -version.
If Java is installed but the version is too old, you will have to upgrade it.
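The version check above can be scripted. The following is a minimal sketch, assuming version strings of the form "1.8.0_121" (legacy numbering) or "11.0.2" (newer numbering). The function name java_major_version is hypothetical, and the threshold of 8 in the usage comment is an assumption based on this tutorial installing JDK 8.

```shell
# Extract the major Java version from a version string such as
# "1.8.0_121" (legacy scheme, major = 8) or "11.0.2" (major = 11).
java_major_version() {
    version=$1
    case $version in
        1.*) version=${version#1.} ;;   # legacy "1.x" scheme: drop the leading "1."
    esac
    # Keep only the leading run of digits, which is the major number.
    major=${version%%[!0-9]*}
    printf '%s\n' "$major"
}

# Hypothetical usage against the installed JVM (java -version prints to stderr):
#   installed=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   [ "$(java_major_version "$installed")" -ge 8 ] || echo "Java needs upgrading"
```

Parsing the string once keeps the comparison numeric, so "1.7" and "1.8" are not compared as text.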
You can install Java by running:
sudo apt-get install default-jdk
Alternatively, you can install Oracle JDK 8 using the Webupd8 team PPA repository.
To add the repository, refresh the package index, and install the package, run the following commands:
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer -y
Step 2: Install ZooKeeper
What is ZooKeeper?
ZooKeeper is a centralised service for maintaining configuration information, naming, providing distributed synchronisation, and providing group services. Distributed applications use all of these kinds of services in some form or another. Each time they are implemented, a lot of work goes into fixing the bugs and race conditions that are inevitable. Because these services are difficult to implement, applications usually skimp on them at first, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
Before installing Apache Kafka, you will need to have ZooKeeper available and running.
1) The ZooKeeper package is available in Ubuntu's default repository.
You can install it by running the following command:
sudo apt-get install zookeeperd
Once installation is finished, ZooKeeper starts automatically as a daemon. By default, it listens on port 2181.
You can test it by running the following command:
netstat -ant | grep :2181
The output should show that port 2181 is in the LISTEN state.
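A plain grep for :2181 also matches client connections and longer port numbers such as :21810. As a stricter sketch, the hypothetical helper below reads netstat-style output on standard input and succeeds only when a socket in the LISTEN state is bound to the given port. It assumes the usual netstat -ant column layout, with the local address in the fourth column and the state in the last.

```shell
# Read netstat-style output on stdin and succeed if some socket in LISTEN
# state has the given port at the end of its local address (column 4).
port_is_listening() {
    port=$1
    awk -v port="$port" '
        $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Hypothetical usage:
#   netstat -ant | port_is_listening 2181 && echo "ZooKeeper is listening"
```

Because the helper returns an exit status instead of printing, it can be used directly in if statements or retry loops.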
Step 3: Install and Start Kafka Server
Now that Java and ZooKeeper are installed, it is time to download Kafka from the Apache website and extract it.
1) Download Kafka version 0.10.1.1 with curl or wget, saving the archive to your Downloads directory.
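A minimal sketch of the download step follows. It assumes the 0.10.1.1 release is fetched from the Apache archive; the host and directory layout in KAFKA_MIRROR are assumptions, so verify them against the Kafka downloads page before relying on them. The function names are hypothetical. The archive file name matches the one used in the extraction step of this tutorial.

```shell
# Versions used throughout this tutorial.
SCALA_VERSION=2.11
KAFKA_VERSION=0.10.1.1

# Assumed location of older Kafka releases; verify before relying on it.
KAFKA_MIRROR=https://archive.apache.org/dist/kafka

# Print the archive file name, e.g. kafka_2.11-0.10.1.1.tgz
kafka_archive_name() {
    printf 'kafka_%s-%s.tgz\n' "$SCALA_VERSION" "$KAFKA_VERSION"
}

# Print the full download URL for that archive.
kafka_download_url() {
    printf '%s/%s/%s\n' "$KAFKA_MIRROR" "$KAFKA_VERSION" "$(kafka_archive_name)"
}

# Hypothetical usage, saving into the directory the extraction step reads from:
#   wget -P "$HOME/Downloads" "$(kafka_download_url)"
```

Keeping the versions in variables means a different release only needs a change in one place.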
2) Create a directory for Kafka
Next, create a directory for the Kafka installation:
sudo mkdir /opt/kafka
3) Extract the downloaded archive
sudo tar -zxvf /home/user_name/Downloads/kafka_2.11-0.10.1.1.tgz -C /opt/kafka/
Replace user_name with your own username.
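The extracted directory, /opt/kafka/kafka_2.11-0.10.1.1, appears in every Kafka command in this tutorial. As an optional sketch, you could capture it once in a variable. KAFKA_HOME and kafka_bin are hypothetical names introduced here for illustration; Kafka does not require them.

```shell
# Install location used in this tutorial.
KAFKA_HOME=/opt/kafka/kafka_2.11-0.10.1.1

# Print the full path of a script in Kafka's bin directory.
kafka_bin() {
    printf '%s/bin/%s\n' "$KAFKA_HOME" "$1"
}

# Hypothetical usage, equivalent to the full-path server start command:
#   sudo "$(kafka_bin kafka-server-start.sh)" "$KAFKA_HOME/config/server.properties"
```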
4) Start the Kafka server
The next step is to start the Kafka server. You can do so by running the kafka-server-start.sh script, located in the /opt/kafka/kafka_2.11-0.10.1.1/bin/ directory, with the following command:
sudo /opt/kafka/kafka_2.11-0.10.1.1/bin/kafka-server-start.sh /opt/kafka/kafka_2.11-0.10.1.1/config/server.properties
5) Check that the Kafka server is working
You now have a Kafka server running and listening on port 9092.
You can confirm this by checking the listening ports:
- ZooKeeper: 2181
- Kafka: 9092
netstat -ant | grep -E ':2181|:9092'
Step 4: Test Your Kafka Server
Now it is time to verify that the Kafka server is operating correctly.
1) Create a new topic
To test Kafka, create a sample topic named "testing" using the following command:
/opt/kafka/kafka_2.11-0.10.1.1/bin/kafka-topics.sh --create --topic testing --zookeeper localhost:2181 --partitions 1 --replication-factor 1
2) Check that your topic was created successfully
Now ask ZooKeeper to list the topics available in Apache Kafka by running the following command:
/opt/kafka/kafka_2.11-0.10.1.1/bin/kafka-topics.sh --list --zookeeper localhost:2181
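The list command prints one topic per line, so the check can be automated. The sketch below uses a hypothetical helper, topic_in_list, that reads such a list on standard input and succeeds only if the given name appears as a whole line, so "testing2" does not count as "testing".

```shell
# Read a topic list (one name per line) on stdin and succeed only if the
# given name matches an entire line exactly.
topic_in_list() {
    grep -Fxq -- "$1"
}

# Hypothetical usage:
#   /opt/kafka/kafka_2.11-0.10.1.1/bin/kafka-topics.sh --list --zookeeper localhost:2181 \
#       | topic_in_list testing && echo "topic exists"
```

The -F option treats the name as a literal string, and -x requires a full-line match.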
3) Publish a message to the topic you created
echo "hello world" | /opt/kafka/kafka_2.11-0.10.1.1/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testing
4) Receive the message from the topic
/opt/kafka/kafka_2.11-0.10.1.1/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testing --from-beginning
5) Send the contents of a file to a topic
/opt/kafka/kafka_2.11-0.10.1.1/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testing < /home/user_name/Downloads/dataset25000records.txt
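The console producer reads its input line by line and publishes each line as a separate message, so the number of lines in the file is the number of messages you should expect on the topic. If you do not have a dataset at hand, the sketch below generates a small sample file and counts its records. The function names and the record-N format are hypothetical and only serve as test data.

```shell
# Write N numbered sample records, one per line, to the given file.
make_sample_records() {
    count=$1
    file=$2
    i=1
    : > "$file"                      # create or truncate the file
    while [ "$i" -le "$count" ]; do
        printf 'record-%s\n' "$i" >> "$file"
        i=$((i + 1))
    done
}

# Print the number of lines (and so messages) in a file.
count_records() {
    wc -l < "$1" | tr -d ' '
}

# Hypothetical usage:
#   make_sample_records 100 /tmp/sample.txt
#   count_records /tmp/sample.txt
```

Comparing this count with the number of lines the console consumer prints gives a quick check that nothing was lost.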