Preparing to run the Confluent Platform Collector
Setting up pre-requisites for running the collector
Make sure that the machine on which you run the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (Note: These specs assume one collector process running at a time. Adjust the hardware if you run multiple collectors at the same time.) | |
RAM | 8 GB |
CPU | 2 GHz processor |
Software | |
Docker | Click here to get Docker. |
data.world specific objects | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
Network connection | |
Allowlist IPs and domains | |
Setting up Kafka - Confluent Platform
Setting up a user
For the collector to access your Confluent Platform, work with your Confluent Platform administrator to set up a user that the collector will use to authenticate to Confluent.
Setting up Schema Registry specific resource API key
For the collector to access your Confluent Schema Registry, you need to set up a standard Confluent Schema Registry API key. Each key is valid for one specific Schema Registry. This step is optional and is needed only if you want the collector to harvest information from Confluent Schema Registry. Set up the API key using the instructions here.
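Before passing the key to the collector, it can help to confirm that the key is accepted by the Schema Registry it was created for. The following is a minimal sketch, not part of the collector itself. It assumes your Schema Registry accepts the API key and secret as HTTP Basic credentials and exposes the standard `GET /subjects` REST endpoint. The function name `build_subjects_request` and the example URL are hypothetical.

```python
import base64
import urllib.request


def build_subjects_request(base_url: str, api_key: str, api_secret: str) -> urllib.request.Request:
    """Build (but do not send) a GET /subjects request.

    Assumption: the Schema Registry accepts the API key and secret
    as HTTP Basic credentials in the form "key:secret".
    """
    credentials = f"{api_key}:{api_secret}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    # Normalize the base URL so a trailing slash does not produce "//subjects".
    url = base_url.rstrip("/") + "/subjects"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


# Hypothetical endpoint and credentials, for illustration only.
request = build_subjects_request("https://schema-registry.example.com/", "my-key", "my-secret")
```

To actually send the request, pass it to `urllib.request.urlopen(request)`. A successful response containing a JSON list of subjects suggests that the key is valid for that Schema Registry. An authentication error suggests that the key belongs to a different registry or was entered incorrectly.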
Setting up permissions for topics and topic procedures
Review the Confluent permissions for Confluent resources.
For the Kafka collector to harvest metadata about a topic (including partitions, consumers, consumer groups, and schemas), the cluster user passed to the collector must have DESCRIBE permission on the topic. If the user does not have DESCRIBE permission, the collector does not see that topic and cannot write any information about it to the catalog graph.
In order for the collector to harvest information about a topic’s producers, the cluster user passed to the collector must have READ permission on the topic. If the user lacks READ permission, the collector will write a warning message indicating that producer information cannot be harvested. If a user has READ permission for a topic, that user automatically has DESCRIBE permission as well.
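The permission rules above can be summarized as a small decision table. The sketch below only models the behavior described in this section and is not the collector's actual implementation. The function names `effective_permissions` and `harvest_plan` and the result keys are hypothetical.

```python
def effective_permissions(granted):
    """Return the permissions a user effectively holds on a topic.

    Models the rule stated above: READ implies DESCRIBE.
    """
    perms = {p.upper() for p in granted}
    if "READ" in perms:
        perms.add("DESCRIBE")
    return perms


def harvest_plan(granted):
    """Describe what the collector could harvest for one topic."""
    perms = effective_permissions(granted)
    if "DESCRIBE" not in perms:
        # The topic is invisible to the collector: nothing is written.
        return {"topic_metadata": False, "producers": False, "warning": None}
    if "READ" not in perms:
        # Topic metadata is harvested, but producer information is skipped.
        return {
            "topic_metadata": True,
            "producers": False,
            "warning": "producer information cannot be harvested",
        }
    return {"topic_metadata": True, "producers": True, "warning": None}


for grants in ([], ["DESCRIBE"], ["READ"]):
    print(grants, harvest_plan(grants))
```

In practice, granting READ on the topics you want cataloged covers both cases, since READ carries DESCRIBE with it.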