
Preparing to run the Kafka - Confluent Cloud collector

Setting up pre-requisites for running the collector

Make sure that the machine from where you are running the collector meets the following hardware and software requirements.

Table 1.

Item                           Requirement

Hardware
  RAM                          8 GB
  CPU                          2 GHz processor

Software
  Docker                       Click here to get Docker.
  Java Runtime Environment     OpenJDK 17 is supported and available here.

data.world specific objects
  Dataset                      You must have a ddw-catalogs (or other) dataset set up to hold the catalog files after you run the collector.
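Before running the collector, you can script a quick check of the software prerequisites in Table 1. The following Python sketch is not part of the collector: it assumes `docker` and `java` should be on the PATH, treats OpenJDK 17 as the required Java version, and does not check RAM or CPU. The helper names `java_major_version` and `check_prerequisites` are hypothetical.

```python
import re
import shutil
import subprocess
from typing import List, Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_381" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def check_prerequisites(required_java: int = 17) -> List[str]:
    """Return a list of problems; an empty list means the software checks passed."""
    problems: List[str] = []
    if shutil.which("docker") is None:
        problems.append("docker was not found on PATH")
    java = shutil.which("java")
    if java is None:
        problems.append("java was not found on PATH")
    else:
        # `java -version` prints its banner to stderr, not stdout.
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
        major = java_major_version(result.stderr)
        if major != required_java:
            problems.append(f"expected Java {required_java}, found {major}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

An empty result means Docker and a Java 17 runtime were found; any other output lists what to fix before running the collector.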



Setting up Kafka - Confluent Cloud

Setting up a Kafka resource-specific API key

For the collector to access your Kafka clusters and their resources, you need to set up Kafka API keys. Each key is valid for only one Kafka cluster, so you need a separate key for every cluster you want the collector to harvest.

  • In Confluent Cloud, navigate to the cluster you want to add the API key to, and create the key using the instructions here.
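When a Kafka client connects to Confluent Cloud, the API key and its secret are sent as SASL/PLAIN credentials over TLS. The Python sketch below shows standard librdkafka-style client settings for one cluster and enforces the one-key-per-cluster rule with a lookup table. It does not show how the collector itself receives its credentials; the function name, cluster IDs, and bootstrap address are hypothetical placeholders.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical mapping: Kafka cluster ID -> (API key, API secret).
# A Kafka API key is valid for only one cluster, so each cluster needs its own entry.
ClusterKeys = Mapping[str, Tuple[str, str]]


def kafka_client_config(cluster_id: str, bootstrap_servers: str,
                        keys: ClusterKeys) -> Dict[str, str]:
    """Build librdkafka-style client settings for one Confluent Cloud cluster."""
    if cluster_id not in keys:
        raise KeyError(f"no API key configured for cluster {cluster_id!r}")
    api_key, api_secret = keys[cluster_id]
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",  # TLS-encrypted connection with SASL auth
        "sasl.mechanisms": "PLAIN",       # key/secret sent as SASL/PLAIN username/password
        "sasl.username": api_key,
        "sasl.password": api_secret,
    }
```

With the confluent-kafka Python package, a dictionary like this can be passed to `confluent_kafka.admin.AdminClient` to test that a key works against its cluster.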

Setting up a Schema Registry resource-specific API key

For the collector to access your Confluent Cloud Schema Registry, you need to set up a Confluent Schema Registry API key. Each key is valid for only one Schema Registry. This step is optional: set up the key only if you want the collector to harvest information from Confluent Schema Registry.

  • In Confluent Cloud, navigate to the environment that contains your Schema Registry, then add the API key using the instructions here.
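Schema Registry API keys are sent as HTTP Basic authentication credentials, with the key as the user name and the secret as the password. This Python sketch builds, but does not send, an authenticated request to the Schema Registry `GET /subjects` endpoint using only the standard library. The registry URL and the helper name `schema_registry_request` are hypothetical.

```python
import base64
import urllib.request


def schema_registry_request(registry_url: str, api_key: str, api_secret: str,
                            path: str = "/subjects") -> urllib.request.Request:
    """Build an authenticated GET request against a Schema Registry REST endpoint."""
    # HTTP Basic auth header value: base64("<api_key>:<api_secret>")
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(registry_url.rstrip("/") + path, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the request to `urllib.request.urlopen` sends it; a successful response confirms the key works for that registry. If you use the confluent-kafka Python package instead, the same credentials go into `SchemaRegistryClient` as `{"url": "<registry URL>", "basic.auth.user.info": "<key>:<secret>"}`.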

Setting up permissions for topics and topic procedures

Review the Confluent permissions for Confluent resources.

  • For the collector to harvest metadata about a topic (including partitions, consumers, consumer groups, and schemas), the cluster user passed to the collector must have DESCRIBE permission on the topic. Without DESCRIBE permission, the collector cannot see the topic and writes no information about it to the catalog graph.

  • For the collector to harvest information about a topic’s producers, the cluster user passed to the collector must have READ permission on the topic. If the user lacks READ permission, the collector writes a warning message stating that producer information cannot be harvested. A user with READ permission on a topic automatically has DESCRIBE permission as well.
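The two permission rules above can be summarized as a small decision function. This Python sketch models, for one topic, what the collector is able to harvest given the operations granted to its cluster user. It illustrates the documented behavior only and is not the collector's code; the function name `harvest_scope` and the warning text are hypothetical.

```python
from typing import Dict, Iterable, Optional


def harvest_scope(topic: str, operations: Iterable[str]) -> Dict[str, object]:
    """Model what the collector can harvest for one topic, given the user's permissions."""
    ops = {op.upper() for op in operations}
    if "READ" in ops:
        ops.add("DESCRIBE")  # READ on a topic implies DESCRIBE
    if "DESCRIBE" not in ops:
        # Without DESCRIBE the topic is invisible: nothing reaches the catalog graph.
        return {"visible": False, "metadata": False, "producers": False, "warning": None}
    warning: Optional[str] = None
    if "READ" not in ops:
        # Hypothetical wording; the collector's real message may differ.
        warning = f"cannot harvest producer information for topic {topic!r}: READ permission missing"
    return {
        "visible": True,
        "metadata": True,  # partitions, consumers, consumer groups, schemas
        "producers": "READ" in ops,
        "warning": warning,
    }
```

Granting READ alone is therefore enough for full coverage of a topic, while DESCRIBE alone yields everything except producer information.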