Tuesday, August 23, 2016

Kafka and Load Balancer

I was reviewing a Kafka - Spark Streaming application architecture for a client. The client proposed the following architecture on the Kafka producer side.

Kafka Producer --> F5 --> Kafka Broker cluster

The Kafka Broker cluster is composed of 3 nodes and is hidden from the Kafka producer behind the F5 load balancer. The producer cannot connect to the Kafka brokers directly without going through F5. I immediately pointed out that such an architecture does not work.

There are two steps when a Kafka producer sends messages to a Kafka broker.

The first step is to retrieve the cluster metadata. During this step, the producer uses the metadata.broker.list configuration (called bootstrap.servers in the newer Java producer) to get a list of bootstrap brokers. This list does not need to include ALL brokers in the Kafka cluster, because any broker in the cluster can return the metadata. We usually recommend listing at least 3 brokers to achieve HA. It is OK to use a load balancer during the metadata retrieval step.
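To make the first step concrete, here is a minimal Python sketch of the bootstrap idea. It does not use a real Kafka client: fetch_metadata, REACHABLE, PARTITION_LEADERS, and the broker host names are all hypothetical, and the broker that is simulated as down is only there to show why listing several brokers helps.

```python
# Simulated cluster state (all names are made up for illustration).
# broker1 is simulated as down, so it cannot answer metadata requests.
REACHABLE = {"broker1:9092": False, "broker2:9092": True, "broker3:9092": True}

# Which broker currently leads each partition of the topic.
PARTITION_LEADERS = {0: "broker2:9092", 1: "broker2:9092", 2: "broker3:9092"}


def fetch_metadata(bootstrap_brokers):
    """Ask each bootstrap broker in turn; the first reachable one answers.

    Any reachable broker returns the same picture of the cluster, which
    is why the list does not need to contain every broker.
    """
    for broker in bootstrap_brokers:
        if REACHABLE.get(broker, False):
            return {"answered_by": broker, "leaders": dict(PARTITION_LEADERS)}
    raise ConnectionError("no bootstrap broker reachable")


metadata = fetch_metadata(["broker1:9092", "broker2:9092", "broker3:9092"])
print(metadata["answered_by"])  # broker2:9092, because broker1 is simulated as down
```

With only broker1 in the list, the same call would fail, which is the reason for recommending several bootstrap brokers.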

However, once the Kafka producer has the metadata, during the second step it connects to the brokers directly, without F5 sitting in the middle. The producer is a smart client. For example, it uses the partition key to determine the destination partition of a message, and then sends the message to the broker that leads that partition. By default, a hash-based partitioner determines the partition id from the key, and people can plug in customized partitioners too. Hiding the whole Kafka broker cluster behind the load balancer defeats the purpose.
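The routing decision in the second step can be sketched roughly as follows. This is an illustration only, not Kafka's actual partitioner: CRC32 is a stand-in hash, and LEADERS, choose_partition, and route are hypothetical names.

```python
import zlib

# Hypothetical metadata from step one: partition id -> leader broker address.
LEADERS = {0: "broker1:9092", 1: "broker2:9092", 2: "broker3:9092"}


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Hash the key and map it onto a partition id.

    CRC32 is a stand-in for illustration; real Kafka clients use their
    own hash function, so actual partition ids will differ.
    """
    return zlib.crc32(key) % num_partitions


def route(key: bytes) -> str:
    """Return the broker address the producer connects to for this key."""
    partition = choose_partition(key, len(LEADERS))
    return LEADERS[partition]


# The same key always maps to the same partition, hence the same broker.
assert route(b"user-42") == route(b"user-42")
print(route(b"user-42"))
```

A load balancer that picks a backend on its own (round robin, for instance) knows nothing about this mapping, so it could forward a message to a broker that does not lead the target partition.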

What if the event producer side has to go through a load balancer to access the Kafka brokers? One possible solution is to build a RESTful service that acts as the Kafka producer. The event generators post events to the RESTful service endpoint, which sits behind a load balancer and can scale out based on the volume of events. The RESTful service then sends messages to the Kafka brokers directly, without a load balancer in the middle. If you don't feel like writing your own RESTful service as a Kafka producer client, you can use this open source project: https://github.com/confluentinc/kafka-rest. However, building a RESTful service is not very hard if you decide to DIY.
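For the DIY route, a bare-bones sketch using only the Python standard library might look like this. The JSON shape (topic, key, value), the port, and send_to_kafka are assumptions made for illustration; send_to_kafka is a placeholder that only records the event, where a real service would call an actual Kafka producer.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SENT = []  # records "produced" so far, for illustration only


def send_to_kafka(topic, key, value):
    """Placeholder (hypothetical) for a real producer call.

    A real service would hand the record to a Kafka producer that
    talks to the brokers directly, with no load balancer in between.
    """
    SENT.append((topic, key, value))


def handle_event(body: bytes):
    """Validate one posted event and forward it. Returns (status, reply)."""
    try:
        event = json.loads(body)
        topic, key, value = event["topic"], event["key"], event["value"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with topic, key and value"}
    send_to_kafka(topic, key, value)
    return 202, {"status": "accepted"}


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body and delegate to the pure function above.
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_event(self.rfile.read(length))
        payload = json.dumps(reply).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8080):
    """Build (but do not start) the HTTP server; call serve_forever() to run."""
    return HTTPServer(("", port), EventHandler)
```

Calling make_server().serve_forever() starts one instance. Several such instances can sit behind F5, since each one holds no state besides its own producer.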
