Setting up your own Apache Kafka cluster with Vagrant – Step by Step

May 7, 2016

If you just want a cluster of Vagrant virtual machines configured with Kafka, take a look at this awesome blog post. It sets up all the VMs for you and configures each node in the cluster in one fell swoop.

However, if you want to learn how to install and configure a Kafka cluster yourself, using your own Vagrant boxes, then read on. This step-by-step walk-through will guide you through building a Kafka cluster from the ground up, with vanilla Debian as a base. Kafka requires Apache Zookeeper, a service that coordinates distributed applications. In this walk-through, we will set up our first box from scratch. We will then package that box and use it as the base box for the other nodes in the cluster. When we’re finished, we’ll have a fully functional 3-node Zookeeper and Kafka cluster. It would probably be better practice to automate this with existing Chef recipes, but that’s hardly walk-through material. We are going to do it the simple, long-winded way, and I think you will find that it isn’t too painful. Onward!

Part I – Setting up a single Zookeeper/Kafka node, starting from a Vagrant base box

1. Download and install VirtualBox.
Note: This walk-through uses a Vagrant base box that requires VirtualBox 4.2.10. If you already have Vagrant configured to work with VMware, there is a VMware Fusion version of the same base box. I will point it out in step 3 below.

2. Download and install Vagrant.

3. Initialize a new Vagrant box. This particular box is vanilla Debian from Puppet Labs. I recommend creating it in a directory with a name that accurately describes what the box represents. If you are new to Vagrant, it’s easy to get carried away and wind up with an over-abundance of VMs on your machine.

mkdir debian-cluster-node-1
cd debian-cluster-node-1
vagrant init debian-cluster-node-1

Or, if you’re using Vagrant with VMware:

vagrant init debian-cluster-node-1

This will create a Vagrantfile in the directory. You use this file to configure your VM.

4. Edit the Vagrantfile to your liking
It’s a good idea to bump up the memory. 2048 MB should be sufficient.

config.vm.provider :virtualbox do |vb|
  vb.customize ["modifyvm", :id, "--memory", "2048"]
end

The only other setting of note is the private IP. This allows the host (your computer’s OS) and other VMs to access your new Vagrant box via a local network IP address.

Find the line

# :private_network, ip: ""


Uncomment it, and change the IP address if you feel like it; otherwise just leave it as is. I will be referring to that IP address throughout this walk-through.

5. Set up the Vagrant box
Start the box:

vagrant up

The first run takes quite a while, because Vagrant has to download and unpack the box first.

Log in to the box:

vagrant ssh

Install the dependencies (you only need Java, though you might want to install a text editor too):

sudo apt-get update
sudo apt-get install openjdk-7-jdk

For the following steps, change to the root user (sudo su).

6. Download, build, and install Kafka
I’ve had issues trying to get things up and running with just the binary download, so we’ll build from source. Even the Kafka Quick Start tells you to build from source, so that’s what we’re going to do here. Don’t worry, it’s easy.
Note: You don’t have to install it in /usr/local/kafka. You can put it wherever you want.

mkdir /usr/local/kafka
tar -zxvf kafka-0.8.0-beta1-src.tgz
cd kafka-0.8.0-beta1-src
./sbt update
./sbt package
./sbt assembly-package-dependency
cd ../
mv kafka-0.8.0-beta1-src /usr/local/kafka

7. Install Zookeeper
Note: You don't have to install it in /usr/local/zookeeper. You can put it wherever you want.

mkdir /usr/local/zookeeper
tar -zxvf zookeeper-3.4.6.tar.gz --directory /usr/local/zookeeper
cp /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo_sample.cfg /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo.cfg


8. Configure Zookeeper
Before configuring, create a directory for the Zookeeper data.

mkdir -p /var/zookeeper/data

Edit the Zookeeper configuration file, /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo.cfg

Change the dataDir property to the directory you created above.
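
As a rough sketch, the same change can be scripted. The set_data_dir helper below is hypothetical (it is not part of Zookeeper). It assumes zoo.cfg already contains a dataDir= line, as the sample config does, and it rewrites the file through a temporary copy.

```shell
# set_data_dir CFG DIR: point the existing dataDir= line in CFG at DIR.
# Hypothetical helper for illustration; assumes CFG already has a dataDir= line.
set_data_dir() {
    cfg=$1
    dir=$2
    tmp=$(mktemp) || return 1
    # Use | as the sed delimiter because DIR contains slashes.
    sed "s|^dataDir=.*|dataDir=$dir|" "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (run as root, after creating the directory):
# set_data_dir /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo.cfg /var/zookeeper/data
```

Editing the file by hand works just as well; the helper only saves typing when you repeat the setup.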


Find the list of servers that’s commented out. If these lines aren’t there, add them.


Uncomment the server.1 property, and change “zookeeper1” to the private IP address that you assigned to this VM.


Important step, often forgotten!
We need to create a myid file in the data directory.
Zookeeper uses a file named “myid” to identify itself within the cluster. The file holds a single number from 1 to 255, and that number must be unique within the cluster. Let’s set it to 1.

echo "1" > /var/zookeeper/data/myid
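
If you end up scripting this for several nodes, a small guard against bad ids can help. This is only a sketch: write_myid is a hypothetical helper, not something Zookeeper ships, and it simply enforces the 1-255 range mentioned above before writing the file.

```shell
# write_myid DATA_DIR ID: write ID to DATA_DIR/myid if it is a whole number in 1-255.
# Hypothetical helper for illustration; not part of Zookeeper.
write_myid() {
    data_dir=$1
    id=$2
    case $id in
        ''|*[!0-9]*)
            echo "myid must be a whole number: '$id'" >&2
            return 1
            ;;
    esac
    if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
        echo "myid must be between 1 and 255: $id" >&2
        return 1
    fi
    mkdir -p "$data_dir" && printf '%s\n' "$id" > "$data_dir/myid"
}

# Example (run as root on the first node):
# write_myid /var/zookeeper/data 1
```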

9. Configure Kafka
If you followed the above installation instructions, the config directory will be here:
Edit the file
Take note of the value. Each Kafka instance needs a unique value here, just as each Zookeeper instance needs a distinct value in its myid file. Let’s set this to 1.

Uncomment and set it to the private IP address of the VM.

Locate the zookeeper.connect property. The default setting is fine, but we will be adding more nodes as we build up the cluster.
Change “localhost” to the IP address of the VM.


10. Test the current setup
You probably want to add these to your ~/.bash_profile first:

export ZK_HOME=/usr/local/zookeeper/zookeeper-3.4.6
export KAFKA_HOME=/usr/local/kafka/kafka-0.8.0-beta1-src
export PATH=$ZK_HOME/bin:$KAFKA_HOME/bin:$PATH

Start Zookeeper

sudo $ZK_HOME/bin/ start

Start Kafka

sudo $KAFKA_HOME/bin/ $KAFKA_HOME/config/ &

Test Kafka
List topics (should not have any to start with)

$KAFKA_HOME/bin/ --zookeeper


Create a new topic

$KAFKA_HOME/bin/ --zookeeper --replica 1 --partition 1 --topic topic-1


Produce messages to that topic from the console

$KAFKA_HOME/bin/ --broker-list --topic topic-1


(ctrl-c to kill the console producer)
Run the console consumer to verify that the messages are there for the new topic

$KAFKA_HOME/bin/ --zookeeper --topic topic-1 --from-beginning

You should see the output


11. Package the box
Assuming that everything works, it’s time to package up this box so that we can use it as our new base box for the other VMs in the cluster.

On your host, find the name of your current VM.

VBoxManage list vms


Mine happens to be “vagrant_default_1399123653833_13594”

Now package it up into a box.

vagrant package --base vagrant_default_1399123653833_13594 --output


Put the box in a more easily recognizable location.

mkdir ~/boxes
mv ~/boxes


12. Shut down the VM

vagrant halt

Part II – Adding new nodes to the cluster from the newly created base box

1. Make a directory for a new cluster node and cd to it. “debian-cluster-node-2” sounds good to me.

vagrant init debian-cluster-node-2 ~/boxes/

2. Edit the Vagrantfile. Do NOT overwrite it with the Vagrantfile from your other box.
Set the memory to 2048 and set the private IP address to something different this time. I will use this:

3. Start up the new box and log in

vagrant up
vagrant ssh

4. Edit the Kafka config settings
If you set $KAFKA_HOME in your .bash_profile before packaging the box in Part I of this walk-through, it will be here:

Set the following properties:


Leave the Zookeeper settings alone for now.

5. In another terminal window, start your first Vagrant box up again and log in. (Cluster Node 1)

vagrant up
vagrant ssh


6. Start Zookeeper and Kafka

sudo $ZK_HOME/bin/ start
sudo $KAFKA_HOME/bin/ $KAFKA_HOME/config/ &


7. Go back to your newly created VM for your second cluster node and start Kafka (Cluster Node 2)

sudo $KAFKA_HOME/bin/ $KAFKA_HOME/config/ &

That’s it! Your Kafka servers are now clustered together.
To test, go back to the terminal window for node 1.
Produce some messages to the topic that you created earlier, but this time use your new VM as the broker.

$KAFKA_HOME/bin/ --broker-list --topic topic-1
Broker 2



Check to see that your messages were successfully produced

$KAFKA_HOME/bin/ --zookeeper --topic topic-1 --from-beginning

You should be able to produce messages to either broker now, or you can pass in both brokers to the console producer:

$KAFKA_HOME/bin/ --broker-list, --topic topic-1


What about Zookeeper?

Zookeeper uses a “majority rule” strategy to make its decisions. If we were to set up a 2-server Zookeeper cluster and 1 server died, only 1 out of 2 would remain, which is not enough to be a “majority.” A 2-server cluster therefore tolerates no failures at all, while a 3-server cluster can lose one server and keep going. See this post for a better explanation.
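
The arithmetic behind that rule is easy to check. The two helper functions below are made up for this illustration; they only compute "more than half" for a given cluster size and are not part of Zookeeper.

```shell
# majority N: the smallest number of servers that is more than half of N.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: how many servers can die while a majority still remains.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "$n servers: majority is $(majority "$n"), can lose $(tolerated_failures "$n")"
done
```

Note that 2 servers tolerate no more failures than 1, and 4 tolerate no more than 3, which is why odd cluster sizes are the usual choice.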

Now let’s add a third node so that we can configure a 3-node Zookeeper cluster.
Follow steps 1-3 above, but name this node “debian-cluster-node-3”, and give it a different private IP in the Vagrantfile. At step 4, we’ll do things a little differently, so come back here when you’ve finished steps 1-3.

4. Edit the Kafka server properties


Just as we did for the second node, we set the same two properties.


This time, since we will have a 3-node Zookeeper cluster, we will also edit the zookeeper.connect property.
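
The value is a comma-separated list of host:port pairs, one per Zookeeper server. As a sketch, the hypothetical zk_connect helper below builds such a list. The 192.0.2.x addresses are placeholders, not the IPs used in this walk-through, so substitute the private IPs from your own Vagrantfiles. Port 2181 is assumed because it is the clientPort in the sample zoo.cfg.

```shell
# zk_connect IP...: join the given addresses into a host:port,host:port list.
# Hypothetical helper; 2181 is the clientPort from the sample zoo.cfg.
zk_connect() {
    out=
    for ip in "$@"; do
        # ${out:+$out,} adds a comma only when out is already non-empty.
        out="${out:+$out,}$ip:2181"
    done
    echo "$out"
}

# Placeholder addresses; use your own private IPs.
echo "zookeeper.connect=$(zk_connect 192.0.2.11 192.0.2.12 192.0.2.13)"
```

With the placeholder addresses this prints zookeeper.connect=192.0.2.11:2181,192.0.2.12:2181,192.0.2.13:2181.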


Now go back and edit the same file on your other two boxes, and set the zookeeper.connect property to match what you have here.

5. Edit the Zookeeper config (for all servers)
We ignored this step when setting up the second node in the cluster, because we didn’t have enough servers for a proper Zookeeper cluster yet. We’re going to have to go back and take care of that now.

On all three of your servers, open the $ZK_HOME/conf/zoo.cfg file and make sure you have the following:
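
A minimal sketch of what the server entries look like, assuming placeholder addresses (192.0.2.11 through 192.0.2.13, which you should replace with the private IPs from your Vagrantfiles) and Zookeeper’s conventional peer and leader-election ports, 2888 and 3888. The zk_server_lines helper is hypothetical and only prints the lines; it does not touch zoo.cfg.

```shell
# zk_server_lines IP...: print one server.N entry per address, numbered from 1.
# Hypothetical helper; 2888 and 3888 are the conventional peer and election ports.
zk_server_lines() {
    n=1
    for ip in "$@"; do
        echo "server.$n=$ip:2888:3888"
        n=$((n + 1))
    done
}

# Placeholder addresses; use your own private IPs, in the same order on every server.
zk_server_lines 192.0.2.11 192.0.2.12 192.0.2.13
```

The number after "server." must match the number in that machine’s myid file, so the list has to be identical on every server.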



6. Set the myid file for the second and third servers
Remember that 1-character long file we created on the first box?
We need to do the same thing for the second and third servers, or our Zookeeper cluster will not work.
Since we used “1” for the first server, let’s keep it simple for the other servers.

On your second server:

echo "2" > /var/zookeeper/data/myid

On your third server:

echo "3" > /var/zookeeper/data/myid

7. Shut them all down, hurry!
Ok, no hurry, but let’s shut down all the boxes and then bring them up one at a time, just to be sure we’re starting fresh.

For each VM

vagrant halt

8. Start them all up again

vagrant up
vagrant ssh

9. Start Zookeeper and Kafka on each server
Zookeeper first
It’s a good idea to start up all the Zookeeper instances first before starting Kafka, so for each VM:

sudo $ZK_HOME/bin/ start

Now start Kafka on each node.

sudo $KAFKA_HOME/bin/ $KAFKA_HOME/config/ &

Your cluster should be in full swing now!

Test again with the console producer, this time using the third node as the broker.

$KAFKA_HOME/bin/ --broker-list --topic topic-1


And then use the console consumer to read the topic. This time, use one of your new Zookeeper nodes for the --zookeeper argument.

$KAFKA_HOME/bin/ --zookeeper --topic topic-1 --from-beginning

Now let’s create a new replicated topic and produce some messages to it.

$KAFKA_HOME/bin/ --zookeeper --replica 3 --partition 1 --topic replicated-topic-1
$KAFKA_HOME/bin/ --broker-list --topic replicated-topic-1

Now consume the new topic from one of your other servers.

$KAFKA_HOME/bin/ --zookeeper --topic replicated-topic-1 --from-beginning

Play around with producing to different brokers and consuming with different zookeepers. Hopefully, it all works!

You can do A LOT with Zookeeper and Kafka. The purpose of this walk-through is just to get you to a point where you can be ready to explore all of Kafka’s goodness within a clustered environment. For more information, please read the documentation.

