How to Install Apache Kafka on VPS for Data Streaming

Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It is widely adopted for handling large-scale, real-time data feeds. By installing Kafka on your VPS, you can easily set up a robust messaging system for processing high-throughput data streams.

In this guide, we will walk you through the process of installing Apache Kafka on your VPS for data streaming.


Prerequisites

Before proceeding, ensure you have the following:

  1. A VPS running Ubuntu 20.04 or later (Other distributions like CentOS or Debian are also supported, but this guide uses Ubuntu).
  2. Java 8 or later (Kafka requires Java to run; this guide installs OpenJDK 11).
  3. A domain name (Optional for easier access).
  4. Basic knowledge of Linux commands.
  5. Root or sudo user privileges.

Step 1: Update and Install Java

Apache Kafka is written in Java and requires a Java runtime on your system. If Java is not installed, follow these steps.

  1. Install OpenJDK 11

    • Update your package index and install OpenJDK 11 (or a newer LTS release if needed):
       
      sudo apt update
      sudo apt install openjdk-11-jdk -y
  2. Verify Java Installation

    • Check if Java is installed correctly:
       
      java -version
    • You should see output similar to:
       
      openjdk version "11.0.9" 2020-10-20
      OpenJDK Runtime Environment (build 11.0.9+11-post-Ubuntu-0ubuntu1.20.04)
      OpenJDK 64-Bit Server VM (build 11.0.9+11-post-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
  3. Set Java as the Default

    • You can set the installed Java as the default by using update-alternatives:
       
      sudo update-alternatives --config java
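
If several JDKs are installed, a quick way to confirm which one is currently active is shown below; the exact paths reported will depend on your system:

      # List the Java installations registered with update-alternatives
      update-alternatives --list java
      # Show the JDK binary the "java" command currently resolves to
      readlink -f "$(which java)"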

Step 2: Download and Install Apache Kafka

  1. Create a Kafka Directory

    • Create a directory for Kafka:
       
      sudo mkdir /opt/kafka
      cd /opt/kafka
  2. Download Kafka

    • Go to the Apache Kafka download page and copy the link to the latest binary release. Then, use wget to download it (replace the version numbers below if you copied a newer release):
       
      sudo wget https://downloads.apache.org/kafka/2.8.0/kafka_2.13-2.8.0.tgz
  3. Extract Kafka

    • Extract the downloaded file:
       
      sudo tar -xvzf kafka_2.13-2.8.0.tgz
      sudo mv kafka_2.13-2.8.0/* /opt/kafka/
  4. Verify Kafka Installation

    • List the contents to make sure Kafka is correctly extracted:
       
      ls /opt/kafka
    • You should see various directories and files related to Kafka.
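
As an additional sanity check, you can ask the bundled tooling to report its version; this assumes the /opt/kafka layout created above and a Kafka release whose scripts support the --version flag:

      # Print the Kafka version reported by the bundled scripts
      /opt/kafka/bin/kafka-topics.sh --version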

Step 3: Install and Configure Zookeeper

Kafka uses Zookeeper for distributed coordination. You need to install and configure Zookeeper before running Kafka.

  1. Install Zookeeper

    • Install Zookeeper on your VPS:
       
      sudo apt install zookeeperd -y
  2. Check Zookeeper Status

    • Ensure that the Zookeeper service is running:
       
      sudo systemctl status zookeeper
    • If it’s not running, start it using:
       
      sudo systemctl start zookeeper
  3. Configure Zookeeper

    • Zookeeper is configured by default, but you can edit /etc/zookeeper/conf/zoo.cfg to adjust the settings if needed. The default configuration should suffice for most setups.
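
Before moving on, it can help to confirm that Zookeeper is reachable on its client port (2181 is the default assumed here; adjust if you changed zoo.cfg):

      # Confirm something is listening on the default Zookeeper client port
      ss -ltn | grep 2181
      # On older Zookeeper builds (e.g. the 3.4.x shipped by the zookeeperd package) the "ruok"
      # four-letter command should reply "imok"; newer releases may require whitelisting it in zoo.cfg
      echo ruok | nc localhost 2181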

Step 4: Configure Kafka

  1. Edit Kafka Configuration

    • Open the Kafka configuration file:
       
      sudo nano /opt/kafka/config/server.properties
  2. Adjust Kafka Settings

    • Here are some of the key settings you can modify:
      • broker.id: A unique identifier for each Kafka broker (set it to 1 if it's the only broker).
      • listeners: Define the protocol and address. Use PLAINTEXT://<your-vps-ip>:9092.
      • zookeeper.connect: Set this to the address of your Zookeeper instance, which is usually localhost:2181.
      • log.dirs: The directory where Kafka stores topic data. The /tmp/kafka-logs default in the example below is fine for testing, but /tmp is cleared on reboot, so point it at a persistent path for production use.

    Example configuration:

    broker.id=1
    listeners=PLAINTEXT://<your-vps-ip>:9092
    log.dirs=/tmp/kafka-logs
    zookeeper.connect=localhost:2181
  3. Save and Exit

    • Save the changes and exit the editor (CTRL + X, then Y, then Enter).
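
To double-check the edits without reopening the editor, you can print just the keys changed above (the path assumes the /opt/kafka layout used in this guide):

      # Show the current values of the settings modified in this step
      grep -E '^(broker\.id|listeners|log\.dirs|zookeeper\.connect)=' /opt/kafka/config/server.properties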

Step 5: Start Kafka Server

  1. Start Kafka Server

    • Now that Kafka is configured, start the Kafka service:
       
      sudo nohup /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties &
    • This will start Kafka in the background.
  2. Verify Kafka is Running

    • You can check if Kafka is running by listing the processes or using the following command:
       
      ps aux | grep kafka
  3. Check Kafka Logs

    • Kafka's application logs are written to /opt/kafka/logs (for example server.log), while /tmp/kafka-logs holds the topic data configured via log.dirs. Check the application logs for any startup errors (see the quick check below).
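
For a quick health check (assuming the default layout and ports used in this guide), confirm the broker is listening and scan the tail of its server log:

      # Verify the broker is listening on its configured port (9092 assumed)
      ss -ltn | grep 9092
      # Inspect the most recent application log entries for startup errors
      tail -n 50 /opt/kafka/logs/server.log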

Step 6: Set Up Kafka Topics

  1. Create a Kafka Topic

    • To create a new Kafka topic, use the kafka-topics.sh script:
       
      /opt/kafka/bin/kafka-topics.sh --create --topic my-topic --bootstrap-server <your-vps-ip>:9092 --partitions 1 --replication-factor 1
    • This command creates a topic named my-topic.
  2. List Kafka Topics

    • To view the created topics:
       
      /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server <your-vps-ip>:9092
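
You can also inspect a topic's partition count, replication factor, and leader assignment with the same script (the topic name and address mirror the examples above):

      # Show partition, replication, and leader details for the topic
      /opt/kafka/bin/kafka-topics.sh --describe --topic my-topic --bootstrap-server <your-vps-ip>:9092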

Step 7: Test Kafka with Producer and Consumer

  1. Start Kafka Producer

    • The Kafka producer sends messages to a topic. You can run the console producer from the command line (newer Kafka releases prefer --bootstrap-server over the older --broker-list flag):
       
      /opt/kafka/bin/kafka-console-producer.sh --broker-list <your-vps-ip>:9092 --topic my-topic
    • You can type messages into the terminal. These messages will be sent to the my-topic topic.
  2. Start Kafka Consumer

    • To receive messages from a topic, start a Kafka Consumer:
       
      /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server <your-vps-ip>:9092 --topic my-topic --from-beginning
    • You should now see the messages that you typed into the producer.
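
If you prefer a scripted round trip instead of typing interactively, you can pipe a message into the console producer and read it back with a bounded consumer; adjust the address and topic to your setup:

      # Publish a single test message non-interactively
      echo "hello from kafka" | /opt/kafka/bin/kafka-console-producer.sh --broker-list <your-vps-ip>:9092 --topic my-topic
      # Consume from the beginning of the topic and exit after one message
      /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server <your-vps-ip>:9092 --topic my-topic --from-beginning --max-messages 1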

Step 8: Configure Kafka as a System Service (Optional)

To automatically start Kafka on boot, you can configure it as a system service.

  1. Create a Kafka Service File

    • Create a new service file for Kafka:
       
      sudo nano /etc/systemd/system/kafka.service
  2. Add the Following Configuration

    • Add the following content to the kafka.service file (it assumes a dedicated kafka user and group own the Kafka installation; see the note after this list if you have not created one yet):
       
      [Unit]
      Description=Apache Kafka
      Documentation=http://kafka.apache.org/documentation.html
      After=zookeeper.service

      [Service]
      Type=simple
      ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
      ExecStop=/opt/kafka/bin/kafka-server-stop.sh
      User=kafka
      Group=kafka

      [Install]
      WantedBy=multi-user.target
  3. Enable and Start the Kafka Service

    • Reload systemd and enable Kafka to start on boot:
       
      sudo systemctl daemon-reload
      sudo systemctl enable kafka
      sudo systemctl start kafka
  4. Verify Kafka Service

    • Check the status of the Kafka service:
       
      sudo systemctl status kafka
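
If you referenced a dedicated kafka user in the unit file but have not created one yet, the sketch below shows one way to set it up and how to follow the service logs; the user name, shell, and ownership choices are assumptions you can adapt:

      # Create a system account with no login shell to run the broker (name "kafka" assumed)
      sudo useradd -r -s /usr/sbin/nologin kafka
      # Give that account ownership of the Kafka installation
      sudo chown -R kafka:kafka /opt/kafka
      # If the broker already wrote data as root in Step 5, hand over the data directory too
      sudo chown -R kafka:kafka /tmp/kafka-logs
      # Follow the Kafka service logs through systemd's journal
      sudo journalctl -u kafka -f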

Conclusion

You have successfully installed Apache Kafka on your VPS for real-time data streaming. You now have a robust messaging system that can handle high-throughput data streams and serve as the foundation for real-time data pipelines across a wide range of applications.

Kafka’s distributed nature allows you to scale your system as needed. You can create multiple brokers, producers, and consumers to handle large amounts of data in real time. Whether you’re building a data pipeline, monitoring system, or real-time analytics platform, Kafka is a versatile solution to support your needs.

 