Setting up your own Apache Kafka cluster with Vagrant – Step by Step

May 7, 2016

If you just want a ready-made cluster of Vagrant virtual machines configured with Kafka, take a look at this awesome blog post. It sets up all the VMs for you and configures each node in the cluster in one fell swoop.

However, if you want to learn how to install and configure a Kafka cluster yourself, using your own Vagrant boxes, then read on. This step-by-step walk-through will guide you through building a Kafka cluster from the ground up, with vanilla Debian as a base. Kafka requires Apache Zookeeper, a service that coordinates distributed applications. In this walk-through, we will set up our first box from scratch. We will then package that box and use it as the base box for the other nodes in the cluster. When we’re finished, we’ll have a fully functional 3-node Zookeeper and Kafka cluster. It would probably be better practice to automate this with existing Chef recipes, but that’s hardly walk-through material. We are going to do it the simple, long-winded way, and I think you’ll find that it isn’t too painful. Onward!

Part I – Setting up a single Zookeeper/Kafka node, starting from a Vagrant base box

1. Download and install VirtualBox from virtualbox.org
Note: This walk-through uses a Vagrant base box that requires VirtualBox 4.2.10. If you already have Vagrant configured to work with VMware, there is a VMware Fusion version of the same base box. I will point it out in step 3 below.

2. Download and install Vagrant from vagrantup.com

3. Initialize a new Vagrant box. This particular box is vanilla Debian from Puppet Labs. I recommend creating it in a directory with a name that accurately describes what the box represents. If you are new to Vagrant, it’s easy to get carried away and wind up with an over-abundance of VMs on your machine.


mkdir debian-cluster-node-1
cd debian-cluster-node-1
vagrant init debian-cluster-node-1 http://puppet-vagrant-boxes.puppetlabs.com/debian-70rc1-x64-vbox4210.box

Or, if you’re using Vagrant with VMware:

vagrant init debian-cluster-node-1 http://puppet-vagrant-boxes.puppetlabs.com/debian-70rc1-x64-vf503.box

This will create a Vagrantfile in the directory. You use this file to configure your VM.

4. Edit the Vagrantfile to your liking
It’s a good idea to bump up the memory. 2048 MB should be sufficient.

config.vm.provider :virtualbox do |vb|
  vb.customize ["modifyvm", :id, "--memory", "2048"]
end

The only other setting of note is the private IP. This allows the host (your computer’s OS) and other VMs to access your new Vagrant box via a local network IP address.

Find the line

# config.vm.network :private_network, ip: "192.168.33.10"

Uncomment it and change the IP address if you like; otherwise, just leave it as is. I set mine to 192.168.33.21, and I will refer to that IP address throughout this walk-through.
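
For reference, here is a minimal sketch of how the relevant parts of the Vagrantfile should end up looking after both edits (the box name is whatever you passed to vagrant init in step 3):

Vagrant.configure("2") do |config|
  config.vm.box = "debian-cluster-node-1"

  # Private IP used throughout this walk-through
  config.vm.network :private_network, ip: "192.168.33.21"

  # Give the VM 2048 MB of memory
  config.vm.provider :virtualbox do |vb|
    vb.customize ["modifyvm", :id, "--memory", "2048"]
  end
end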

5. Set up the Vagrant box
Start the box:

vagrant up

The first run takes quite a while, since Vagrant needs to download and unpack the box first.

Log in to the box:

vagrant ssh

Install the dependencies (you only need Java, though you might want to install a text editor too):

sudo apt-get update
sudo apt-get install openjdk-7-jdk
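
Before moving on, it’s worth verifying the Java installation:

java -version   # should report an OpenJDK 1.7 runtime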

For the following steps, change to the root user (sudo su).

6. Download, build, and install Kafka
I’ve had issues getting things up and running with just the binary download, so we’ll build from source. The Kafka Quick Start recommends the same. Don’t worry, it’s easy.
Note: You don’t have to install it in /usr/local/kafka. You can put it wherever you want.

wget https://archive.apache.org/dist/kafka/kafka-0.8.0-beta1-src.tgz
mkdir /usr/local/kafka
tar -zxvf kafka-0.8.0-beta1-src.tgz
cd kafka-0.8.0-beta1-src
./sbt update                        # fetch the build dependencies
./sbt package                       # compile and package Kafka
./sbt assembly-package-dependency   # bundle the dependency jars
cd ..
mv kafka-0.8.0-beta1-src /usr/local/kafka


7. Install Zookeeper
Note: You don't have to install it in /usr/local/zookeeper. You can put it wherever you want.

wget http://apache.claz.org/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
mkdir /usr/local/zookeeper
tar -zxvf zookeeper-3.4.6.tar.gz --directory /usr/local/zookeeper
cp /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo_sample.cfg /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo.cfg

8. Configure Zookeeper
Before configuring, create a directory for the Zookeeper data.

mkdir -p /var/zookeeper/data

Edit the Zookeeper configuration file, /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo.cfg

Change the dataDir property to the directory you created above.

dataDir=/var/zookeeper/data

Find the list of servers that’s commented out. If these lines aren’t there, add them.

#server.1=zookeeper1:2888:3888
#server.2=zookeeper2:2888:3888
#server.3=zookeeper3:2888:3888

Uncomment the server.1 property, and change “zookeeper1” to the private IP address that you assigned to this VM.

server.1=192.168.33.21:2888:3888

An important step that is often forgotten!
We need to create a myid file in the data directory.
Zookeeper uses a file named “myid” to identify itself within the cluster. It holds a single number between 1 and 255. Let’s set it to 1.

echo "1" > /var/zookeeper/data/myid
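
A quick sanity check, if you like:

cat /var/zookeeper/data/myid   # should print: 1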

9. Configure Kafka
If you followed the above installation instructions, the config directory will be here:
/usr/local/kafka/kafka-0.8.0-beta1-src/config
Edit the server.properties file
Take note of the broker.id value. Each Kafka instance will need to have a unique broker.id, just as each Zookeeper instance needs to have a distinct value in the myid file. Let’s set this to 1.

broker.id=1

Uncomment #host.name=localhost and set it to the private IP address of the VM.

host.name=192.168.33.21

Locate the zookeeper.connect property. The default would work for a single node, but since we will be adding more nodes as we build up the cluster, change “localhost” to the IP address of the VM.

zookeeper.connect=192.168.33.21:2181
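
To recap, these are the three properties we changed in server.properties for this first node:

broker.id=1
host.name=192.168.33.21
zookeeper.connect=192.168.33.21:2181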

10. Test the current setup
You will probably want to add these environment variables to your ~/.bash_profile first (and then source the file so they take effect):

export ZK_HOME=/usr/local/zookeeper/zookeeper-3.4.6/
export KAFKA_HOME=/usr/local/kafka/kafka-0.8.0-beta1-src/
export PATH=$ZK_HOME/bin:$KAFKA_HOME/bin:$PATH

Start Zookeeper

sudo $ZK_HOME/bin/zkServer.sh start
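
You can verify that Zookeeper is up. With only one server entry in zoo.cfg, it runs in standalone mode:

sudo $ZK_HOME/bin/zkServer.sh status   # should report Mode: standalone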

Start Kafka

sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &

Test Kafka
List the topics (there should not be any to start with):

$KAFKA_HOME/bin/kafka-list-topic.sh --zookeeper 192.168.33.21:2181

Create a new topic

$KAFKA_HOME/bin/kafka-create-topic.sh --zookeeper 192.168.33.21:2181 --replica 1 --partition 1 --topic topic-1

Produce messages to that topic from the console

$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.21:9092 --topic topic-1
Hi
My
Name
Is
Kafka

(ctrl-c to kill the console producer)
Run the console consumer to verify that the messages are there for the new topic

$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.33.21:2181 --topic topic-1 --from-beginning


You should see the output

Hi
My
Name
Is
Kafka

11. Package the box
Assuming that everything works, it’s time to package up this box so that we can use it as our new base box for the other VMs in the cluster.

On your host, find the name of your current VM.

VBoxManage list vms

Mine happens to be “vagrant_default_1399123653833_13594”

Now package it up into a box.

vagrant package --base vagrant_default_1399123653833_13594 --output debian-cluster.box

Put the box in a more easily recognizable location.

mkdir ~/boxes
mv debian-cluster.box ~/boxes

12. Shut down the VM

vagrant halt

Part II – Adding new nodes to the cluster from the newly created base box

1. Make a directory for a new cluster node and cd to it. “debian-cluster-node-2” sounds good to me.

mkdir debian-cluster-node-2
cd debian-cluster-node-2
vagrant init debian-cluster-node-2 ~/boxes/debian-cluster.box

2. Edit the Vagrantfile; do NOT overwrite it with the Vagrantfile from your other box.
Set the memory to 2048 MB, and set the private IP address to something different this time. I will use 192.168.33.22.
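
The relevant lines in the new Vagrantfile should look something like this:

config.vm.network :private_network, ip: "192.168.33.22"

config.vm.provider :virtualbox do |vb|
  vb.customize ["modifyvm", :id, "--memory", "2048"]
end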

3. Start up the new box and log in

vagrant up
vagrant ssh

4. Edit the Kafka config settings
If you set $KAFKA_HOME in your .bash_profile before packaging the box in Part I of this walk-through, it will be here:
$KAFKA_HOME/config/server.properties

Set the following properties:

broker.id=2
host.name=192.168.33.22

Leave the Zookeeper settings alone for now.

5. In another terminal window, start your first Vagrant box up again and log in. (Cluster Node 1)

vagrant up
vagrant ssh

6. Start Zookeeper and Kafka

sudo $ZK_HOME/bin/zkServer.sh start
sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &

7. Go back to your newly created VM for your second cluster node and start Kafka (Cluster Node 2)

sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &

That’s it! Your Kafka servers are now clustered together.
To test, go back to the terminal window for node 1.
Produce some messages to the topic that you created earlier, but this time use your new VM as the broker.

$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.22:9092 --topic topic-1
Hello
From
Broker 2

(ctrl-c)

Check to see that your messages were successfully produced

$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.33.21:2181 --topic topic-1 --from-beginning

You should be able to produce messages to either broker now, or you can pass in both brokers to the console producer:

$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.21:9092,192.168.33.22:9092 --topic topic-1

What about Zookeeper?

Zookeeper uses a “majority rule” strategy to make its decisions. If we were to set up a 2-server Zookeeper cluster and one server died, only one of the two servers would remain, which is not enough for a majority. See the Zookeeper documentation for a fuller explanation.
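
The rule of thumb: a cluster of n servers keeps working only while a majority, floor(n/2) + 1, of them are running. That’s why odd-sized ensembles make sense, and why 3 nodes is the practical minimum:

servers   quorum   failures tolerated
2         2        0
3         2        1
5         3        2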

Now let’s add a third node so that we can configure a 3-node Zookeeper cluster.
Follow steps 1-3 above, but name this node “debian-cluster-node-3”, and give it a different private IP in the Vagrantfile. I will use 192.168.33.23. At step 4, we’ll do things a little differently, so come back here when you’ve finished steps 1-3.

4. Edit the Kafka server properties

$KAFKA_HOME/config/server.properties

Just as we did for the second node, we set the broker.id and host.name properties.

broker.id=3
host.name=192.168.33.23

This time, since we will have a 3-node Zookeeper cluster, we will also edit the zookeeper.connect property.

zookeeper.connect=192.168.33.21:2181,192.168.33.22:2181,192.168.33.23:2181

At this time, go back and edit the server.properties file in your other two boxes and set the zookeeper.connect property to be the same as what you have here.

5. Edit the Zookeeper config (for all servers)
We ignored this step when setting up the second node in the cluster, because we didn’t have enough servers for a proper Zookeeper cluster yet. We’re going to have to go back and take care of that now.

On all three of your servers, open the $ZK_HOME/conf/zoo.cfg file and make sure you have the following:

server.1=192.168.33.21:2888:3888
server.2=192.168.33.22:2888:3888
server.3=192.168.33.23:2888:3888
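
Assuming you kept the zoo_sample.cfg defaults, the relevant portion of zoo.cfg should now look roughly like this on all three servers:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper/data
clientPort=2181
server.1=192.168.33.21:2888:3888
server.2=192.168.33.22:2888:3888
server.3=192.168.33.23:2888:3888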

6. Set the myid file for the second and third servers
Remember that little myid file we created on the first box?
We need to do the same thing for the second and third servers, or our Zookeeper cluster will not work.
Since we used “1” for the first server, let’s keep it simple for the other servers.

On your second server:

echo "2" > /var/zookeeper/data/myid

On your third server:

echo "3" > /var/zookeeper/data/myid

7. Shut them all down, hurry!
Ok, no hurry, but let’s shut down all the boxes and then bring them up one at a time, just to be sure we’re starting fresh.

For each VM

exit
vagrant halt

8. Start them all up again
For each VM:

vagrant up
vagrant ssh

9. Start Zookeeper and Kafka on each server
Zookeeper first
It’s a good idea to start up all the Zookeeper instances first before starting Kafka, so for each VM:

sudo $ZK_HOME/bin/zkServer.sh start
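
Once all three instances are running, you can confirm that the ensemble actually formed. One node should report Mode: leader, and the other two Mode: follower:

sudo $ZK_HOME/bin/zkServer.sh status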

Now start Kafka on each node.

sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &

Your cluster should be in full swing now!

Test again with the console producer, this time using the third node as the broker.

$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.23:9092 --topic topic-1
Hello
From
Broker
3

(ctrl-c)

And then use the console consumer to read the topic. This time, use one of your new Zookeeper nodes for the --zookeeper argument.

$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.33.23:2181 --topic topic-1 --from-beginning

Now let’s create a new replicated topic and produce some messages to it.

$KAFKA_HOME/bin/kafka-create-topic.sh --zookeeper 192.168.33.22:2181 --replica 3 --partition 1 --topic replicated-topic-1
$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.23:9092 --topic replicated-topic-1
I
Am
A
Replicated
Topic

Now consume the new topic from one of your other servers.

$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.33.21:2181 --topic replicated-topic-1 --from-beginning
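
If you are curious how the replicated topic is laid out (which broker leads the partition and which replicas are in sync), the same list-topic script from earlier will show you:

$KAFKA_HOME/bin/kafka-list-topic.sh --zookeeper 192.168.33.21:2181 --topic replicated-topic-1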

Play around with producing to different brokers and consuming with different zookeepers. Hopefully, it all works!

You can do A LOT with Zookeeper and Kafka. The purpose of this walk-through is just to get you to the point where you’re ready to explore all of Kafka’s goodness in a clustered environment. For more information, please read the documentation:

http://kafka.apache.org/documentation.html
http://zookeeper.apache.org/doc/r3.4.6/

Reference: https://objectpartners.com/2014/05/06/setting-up-your-own-apache-kafka-cluster-with-vagrant-step-by-step/
