Adventures with Kafka

Published: Aug 22, 2025

Recently I had to solve a rather curious problem: connecting to an external Kafka cluster from a secure environment. The addresses of three cluster nodes are known and are passed to the Python client as input. The pods in the secure environment have no direct internet access; all HTTP requests are routed through a dedicated proxy. Kafka, however, does not speak HTTP: it uses its own binary protocol over TCP. For such cases the proxy has a special mode: the host and port to tunnel to are passed in the HTTP headers, and after the header exchange the HTTP connection stays open. The socket behind that connection can then be used as a TCP tunnel to the specified host and port. One problem remains: the Kafka client cannot be handed an already-open socket to use for its connection.
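To make the tunnel mode concrete, here is a minimal sketch of the handshake. It assumes the proxy implements the standard HTTP CONNECT method; the actual proxy may expect different headers or authentication. The function names and the example host names are hypothetical.

```python
import socket


def build_connect_request(target_host: str, target_port: int) -> bytes:
    """Build a CONNECT request asking the proxy to tunnel to host:port."""
    authority = f"{target_host}:{target_port}"
    return (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        "\r\n"
    ).encode("ascii")


def open_tunnel(proxy_host: str, proxy_port: int,
                target_host: str, target_port: int,
                timeout: float = 10.0) -> socket.socket:
    """Return a socket that behaves as a raw TCP tunnel to the target.

    Assumption: the proxy answers a successful CONNECT with a 2xx status
    line (checked here as exactly 200) followed by an empty line.
    """
    sock = socket.create_connection((proxy_host, proxy_port), timeout=timeout)
    try:
        sock.sendall(build_connect_request(target_host, target_port))
        # Read one byte at a time up to the end of the headers, so that no
        # tunnel payload is consumed by accident.
        response = b""
        while not response.endswith(b"\r\n\r\n"):
            chunk = sock.recv(1)
            if not chunk:
                raise ConnectionError("proxy closed the connection during handshake")
            response += chunk
        status_line = response.split(b"\r\n", 1)[0]
        parts = status_line.split(None, 2)
        if len(parts) < 2 or parts[1] != b"200":
            raise ConnectionError(f"proxy refused the tunnel: {status_line!r}")
    except Exception:
        sock.close()
        raise
    sock.settimeout(None)  # back to blocking mode for the tunnel traffic
    return sock
```

After `open_tunnel` returns, everything written to the socket goes to the target host, and everything read from it comes from there.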

Solution to the tunnel issue

Within the same Python process, a listening socket is opened on a local port of the loopback interface; it forwards all traffic to the HTTP socket obtained from the external proxy. The Kafka client, in turn, connects to the local port.
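A local forwarder of this kind can be sketched as follows. This is an illustration under assumptions, not the exact code from the project: `serve_forwarder`, `relay`, and `pump` are hypothetical names, and `open_upstream` stands for any callable that returns an already-established tunnel socket.

```python
import socket
import threading
from typing import Callable


def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF or an error occurs."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # propagate EOF to the other side
        except OSError:
            pass


def relay(client: socket.socket, upstream: socket.socket) -> None:
    """Forward traffic in both directions, then close both sockets."""
    back = threading.Thread(target=pump, args=(upstream, client), daemon=True)
    back.start()
    pump(client, upstream)
    back.join()
    client.close()
    upstream.close()


def serve_forwarder(listen_host: str, listen_port: int,
                    open_upstream: Callable[[], socket.socket]) -> socket.socket:
    """Listen locally and relay every accepted connection to a new upstream."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((listen_host, listen_port))
    server.listen()

    def accept_loop() -> None:
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # the listening socket is no longer usable
            try:
                upstream = open_upstream()
            except OSError:
                client.close()
                continue
            threading.Thread(target=relay, args=(client, upstream),
                             daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server
```

Each accepted connection gets its own upstream socket, because a Kafka client may open several connections to the same broker. The Kafka client is then pointed at the local address and port instead of the real broker.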

However, a connection to a single node of the cluster is not enough. When the client connects to any node, it receives the current cluster topology, and from then on it connects to the addresses listed in that topology rather than to the one it was given initially. For stable operation it must be able to reach every node. One tunnel is therefore insufficient. Opening a tunnel to each target host is not a problem, but the Kafka client must somehow be told to use these tunnels.

Solution to the problem with multiple nodes

At the transport level there is no way to route by domain name: a TCP connection carries only an IP address and a port, and all of our local listeners would sit on the same loopback interface. On the other hand, nothing prevents us from creating additional virtual loopback interfaces, each with its own address, and mapping each domain of the Kafka cluster to one of them in the hosts file. From then on, everything works completely transparently for the Kafka client.
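The mapping itself is simple bookkeeping. The sketch below assigns each broker host name its own loopback address and renders the matching hosts-file lines. The function names, the starting address `127.0.0.2`, and the example host names are assumptions for illustration.

```python
import ipaddress
from typing import Dict, Iterable


def assign_loopback_addresses(broker_hosts: Iterable[str],
                              first: str = "127.0.0.2") -> Dict[str, str]:
    """Give every broker host name a distinct address in 127.0.0.0/8."""
    start = ipaddress.IPv4Address(first)
    mapping: Dict[str, str] = {}
    for offset, host in enumerate(broker_hosts):
        addr = start + offset
        if not addr.is_loopback:
            raise ValueError(f"{addr} is outside the loopback range")
        mapping[host] = str(addr)
    return mapping


def render_hosts_entries(mapping: Dict[str, str]) -> str:
    """Render lines in hosts-file format: '<address><TAB><host name>'."""
    return "".join(f"{addr}\t{host}\n" for host, addr in mapping.items())


if __name__ == "__main__":
    # Hypothetical broker names; the real ones come from the cluster config.
    brokers = ["kafka-1.example.com", "kafka-2.example.com", "kafka-3.example.com"]
    print(render_hosts_entries(assign_loopback_addresses(brokers)), end="")
```

A local forwarder for each broker then listens on that broker's assigned address, using the same port the broker advertises, so the host and port from the cluster topology resolve straight to the tunnel. Editing the hosts file normally requires elevated privileges. On Linux the whole 127.0.0.0/8 range is typically reachable on `lo` without extra setup, while other systems may need each address added as an alias first.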