Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It is widely adopted for handling large-scale, real-time data feeds. By installing Kafka on your VPS, you can easily set up a robust messaging system for processing high-throughput data streams.
In this guide, we will walk you through the process of installing Apache Kafka on your VPS for data streaming.
Prerequisites
Before proceeding, ensure you have the following:
- A VPS running Ubuntu 20.04 or later (Other distributions like CentOS or Debian are also supported, but this guide uses Ubuntu).
- Java 8 or later (Kafka requires Java to run).
- A domain name (Optional for easier access).
- Basic knowledge of Linux commands.
- Root or sudo user privileges.
Step 1: Update and Install Java
Apache Kafka is written in Java and requires a Java runtime on your system. If Java is not installed, follow these steps.
Install OpenJDK 11
- Update your package index and install OpenJDK 11 (or another supported version):
sudo apt update
sudo apt install openjdk-11-jdk -y
Verify Java Installation
- Check that Java is installed correctly:
java -version
- You should see output similar to:
openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment (build 11.0.9+11-post-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.9+11-post-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
Set Java as the Default
- If multiple Java versions are installed, you can set the default with update-alternatives:
sudo update-alternatives --config java
Step 2: Download and Install Apache Kafka
Create a Kafka Directory
- Create a directory for Kafka and change into it:
sudo mkdir /opt/kafka
cd /opt/kafka
Download Kafka
- Go to the Apache Kafka download page and copy the link to the latest binary release. Then use wget to download it (replace the version numbers below with the release you copied):
sudo wget https://downloads.apache.org/kafka/2.8.0/kafka_2.13-2.8.0.tgz
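Note that a hard-coded version in the URL will age out: once a release is superseded it moves to the archive and the link 404s. The tarball name always follows the pattern kafka_&lt;scala-version&gt;-&lt;kafka-version&gt;.tgz, so you can assemble the URL from the version numbers shown on the download page. A sketch (the versions here are examples, not necessarily current):

```shell
# Build the download URL from version numbers (example values; check the
# Apache Kafka download page for the current release).
KAFKA_VERSION=3.7.0
SCALA_VERSION=2.13
URL="https://downloads.apache.org/kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz"
echo "$URL"
```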
Extract Kafka
- Extract the downloaded archive and move its contents into /opt/kafka:
sudo tar -xvzf kafka_2.13-2.8.0.tgz
sudo mv kafka_2.13-2.8.0/* /opt/kafka/
Verify Kafka Installation
- List the contents to make sure Kafka is correctly extracted:
ls /opt/kafka
- You should see various directories and files related to Kafka.
Step 3: Install and Configure Zookeeper
Kafka uses Zookeeper for distributed coordination. You need to install and configure Zookeeper before running Kafka.
Install Zookeeper
- Install Zookeeper on your VPS:
sudo apt install zookeeperd -y
Check Zookeeper Status
- Ensure that the Zookeeper service is running:
sudo systemctl status zookeeper
- If it’s not running, start it using:
sudo systemctl start zookeeper
Configure Zookeeper
- Zookeeper works with its default configuration, but you can edit /etc/zookeeper/conf/zoo.cfg to adjust the settings if needed. The defaults should suffice for most single-node setups.
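For reference, the settings you are most likely to touch in zoo.cfg look like the following. The values shown are the typical defaults shipped by Ubuntu's zookeeperd package, so treat this as a sketch rather than a set of required changes:

```ini
# Directory where ZooKeeper stores its database snapshots
dataDir=/var/lib/zookeeper
# Port on which clients (including Kafka) connect
clientPort=2181
# Basic time unit in milliseconds, used for heartbeats and timeouts
tickTime=2000
```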
Step 4: Configure Kafka
Edit Kafka Configuration
- Open the Kafka configuration file:
sudo nano /opt/kafka/config/server.properties
Adjust Kafka Settings
- Here are some of the key settings you can modify:
- broker.id: A unique identifier for each Kafka broker (set it to 1 if it's the only broker).
- listeners: Define the protocol and address. Use PLAINTEXT://<your-vps-ip>:9092.
- zookeeper.connect: The address of your Zookeeper instance, which is usually localhost:2181.
Example configuration:
broker.id=1
listeners=PLAINTEXT://<your-vps-ip>:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
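One related setting worth knowing about: if clients connect to the broker via an address different from the one it binds to (for example, over a public IP while the broker listens on all interfaces), Kafka tells clients where to connect through advertised.listeners. A sketch, with a placeholder address:

```ini
# Address the broker binds to (all interfaces)
listeners=PLAINTEXT://0.0.0.0:9092
# Address handed back to clients in metadata responses (placeholder IP)
advertised.listeners=PLAINTEXT://<your-vps-ip>:9092
```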
Save and Exit
- Save the changes and exit the editor (CTRL + X, then Y, then Enter).
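If you would rather script these edits than open nano, sed can patch server.properties in place. The sketch below works on a scratch copy so it is safe to run anywhere; on a real server you would point FILE at /opt/kafka/config/server.properties (and run with sudo), and 203.0.113.10 is a documentation placeholder for your VPS IP:

```shell
# Work on a scratch copy with stock-style defaults (assumption: on a real
# server, FILE would be /opt/kafka/config/server.properties).
FILE=$(mktemp)
printf 'broker.id=0\nlisteners=PLAINTEXT://:9092\nzookeeper.connect=localhost:2181\n' > "$FILE"

# Set a unique broker id and bind the listener to the VPS IP
# (203.0.113.10 is a placeholder address, not a real one).
sed -i 's/^broker\.id=.*/broker.id=1/' "$FILE"
sed -i 's|^listeners=.*|listeners=PLAINTEXT://203.0.113.10:9092|' "$FILE"

cat "$FILE"
```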
Step 5: Start Kafka Server
Start Kafka Server
- Now that Kafka is configured, start the Kafka service:
sudo nohup /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties &
- This will start Kafka in the background.
Verify Kafka is Running
- You can check if Kafka is running by listing its processes:
ps aux | grep kafka
Check Kafka Logs
- The broker writes its application logs to /opt/kafka/logs/ (see server.log there); the /tmp/kafka-logs directory configured above holds the topic data itself. Check the application logs for any errors.
Step 6: Set Up Kafka Topics
Create a Kafka Topic
- To create a new Kafka topic, use the kafka-topics.sh script:
/opt/kafka/bin/kafka-topics.sh --create --topic my-topic --bootstrap-server <your-vps-ip>:9092 --partitions 1 --replication-factor 1
- This command creates a topic named my-topic.
List Kafka Topics
- To view the created topics:
/opt/kafka/bin/kafka-topics.sh --list --bootstrap-server <your-vps-ip>:9092
Step 7: Test Kafka with Producer and Consumer
Start Kafka Producer
- The Kafka producer sends messages to a topic. You can run the console producer from the command line:
/opt/kafka/bin/kafka-console-producer.sh --broker-list <your-vps-ip>:9092 --topic my-topic
- Messages you type into the terminal will be sent to the my-topic topic.
Start Kafka Consumer
- To receive messages from the topic, start a Kafka consumer in a second terminal:
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server <your-vps-ip>:9092 --topic my-topic --from-beginning
- You should now see the messages that you typed into the producer.
Step 8: Configure Kafka as a System Service (Optional)
To automatically start Kafka on boot, you can configure it as a system service.
Create a Kafka Service File
- Create a new systemd service file for Kafka:
sudo nano /etc/systemd/system/kafka.service
Add the Following Configuration
- Add the following content to the kafka.service file:
[Unit]
Description=Apache Kafka
Documentation=http://kafka.apache.org/documentation.html
After=zookeeper.service

[Service]
Type=simple
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
User=kafka
Group=kafka

[Install]
WantedBy=multi-user.target
- The unit runs Kafka as a dedicated kafka user, which does not exist yet. Create it and give it ownership of the Kafka files first:
sudo useradd -r -s /usr/sbin/nologin kafka
sudo chown -R kafka:kafka /opt/kafka
Enable and Start the Kafka Service
- Reload systemd and enable Kafka to start on boot:
sudo systemctl daemon-reload
sudo systemctl enable kafka
sudo systemctl start kafka
Verify Kafka Service
- Check the status of the Kafka service:
sudo systemctl status kafka
Conclusion
You have successfully installed Apache Kafka on your VPS for real-time data streaming. With Kafka, you now have a powerful messaging system that can handle high-throughput data streams and build real-time data pipelines for various applications.
Kafka’s distributed nature allows you to scale your system as needed. You can create multiple brokers, producers, and consumers to handle large amounts of data in real time. Whether you’re building a data pipeline, monitoring system, or real-time analytics platform, Kafka is a versatile solution to support your needs.