About the Kafka - Confluent Cloud collector

Use this collector to harvest metadata from a Kafka cluster running in Confluent Cloud. The collector can optionally also harvest metadata from Avro, JSON Schema, and Protobuf schemas stored in Confluent Schema Registry.

Important

The Kafka - Confluent Cloud collector can be run in the cloud or on-premises using Docker or JAR files.

What is cataloged

The collector catalogs the following information.

Important

Note that the collector only harvests schemas in the Confluent Schema Registry that are registered under a subject matching a topic’s key or value, following the default TopicNameStrategy naming strategy described in the Confluent Schema Registry documentation. Schemas registered under other subjects are not currently harvested.
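
For example, under TopicNameStrategy a topic named orders is associated with the subjects orders-key and orders-value. The following sketch, assuming the confluent-kafka Python package and hypothetical placeholder credentials, registers a value schema under a subject the collector would harvest:

```python
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

# Hypothetical Confluent Cloud Schema Registry endpoint and API key.
client = SchemaRegistryClient({
    "url": "https://psrc-xxxxx.us-east-2.aws.confluent.cloud",
    "basic.auth.user.info": "SR_API_KEY:SR_API_SECRET",
})

value_schema = Schema(
    schema_str='{"type": "record", "name": "Order", '
               '"fields": [{"name": "id", "type": "string"}]}',
    schema_type="AVRO",
)

# TopicNameStrategy: the subject is "<topic>-value" (or "<topic>-key"),
# so a schema for the "orders" topic is registered under "orders-value".
# The collector harvests this subject; a schema registered under some
# other subject, say "orders-events", would be skipped.
schema_id = client.register_schema("orders-value", value_schema)
print(f"Registered schema id: {schema_id}")
```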

Table 1. Cataloged objects

  • Cluster: Identifier, Display name

  • Producer: Identifier

  • Consumer: Identifier, Client ID, Client host

  • Broker: Identifier, Display name, Host, Port, Rack

  • Partition: Partition number

  • Schema: Identifier, Title, Is Current Schema, Schema Version, Type (avro, json, protobuf), Schema text

  • Consumer Group: Identifier, State, Partition assignor

  • Topic: Name, Identifier, Is internal (whether the topic is internal)

  • Environment: Identifier, Display name
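
Much of this metadata corresponds to what a standard Kafka admin client can report. The following sketch, assuming a recent confluent-kafka Python package and a placeholder bootstrap address, enumerates brokers, topics, partitions, and consumer groups similar to the objects above; it illustrates the metadata, not the collector's actual implementation:

```python
from confluent_kafka.admin import AdminClient

# Placeholder connection settings; see the Authentication section below
# for the SASL/SSL options used with Confluent Cloud.
admin = AdminClient({
    "bootstrap.servers": "pkc-xxxxx.us-east-2.aws.confluent.cloud:9092",
})

# Cluster, broker, topic, and partition metadata.
md = admin.list_topics(timeout=10)
print(f"Cluster id: {md.cluster_id}")
for broker in md.brokers.values():
    print(f"Broker {broker.id} at {broker.host}:{broker.port}")
for topic in md.topics.values():
    print(f"Topic {topic.topic} with {len(topic.partitions)} partition(s)")

# Consumer group metadata: identifier, state, partition assignor,
# and each member's client ID and host.
groups = admin.list_consumer_groups().result()
group_ids = [g.group_id for g in groups.valid]
if group_ids:
    for group_id, future in admin.describe_consumer_groups(group_ids).items():
        desc = future.result()
        print(f"Group {desc.group_id}: state={desc.state}, "
              f"assignor={desc.partition_assignor}")
        for member in desc.members:
            print(f"  member client_id={member.client_id} host={member.host}")
```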



Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page links to the related resource pages. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 2. Relationships between resource pages

  • Cluster: Brokers within the Cluster; Topics hosted by the Cluster

  • Producer: Partitions that receive messages from the Producer

  • Consumer Group: Consumers that are members of the Consumer Group

  • Consumer: Partitions that the Consumer is assigned to

  • Broker: Cluster containing the Broker; Partitions with replicas on the Broker

  • Topic: Cluster hosting the Topic; Partitions that segment the Topic; Schemas that constrain the Topic’s keys and values

  • Partition: Topic segmented into the Partition; Consumers assigned to the Partition; Brokers hosting replicas of the Partition

  • Schema: Topic keys and values constrained by the Schema; Other schemas related to the Schema

  • Environment: Clusters provisioned within the Environment



Versions supported

  • The collector supports V2 of the Confluent Cloud Cluster and Organization APIs.
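
As an illustration of those APIs, the following sketch lists environments and the clusters within them over REST. The endpoints (api.confluent.cloud/org/v2 and /cmk/v2) and the placeholder API key are assumptions based on Confluent Cloud's public API documentation, not part of this collector:

```python
import requests

# Hypothetical Confluent Cloud API key, created in the Cloud console.
AUTH = ("CLOUD_API_KEY", "CLOUD_API_SECRET")
BASE = "https://api.confluent.cloud"

# Organization API v2: list environments.
envs = requests.get(f"{BASE}/org/v2/environments", auth=AUTH, timeout=30)
envs.raise_for_status()
for env in envs.json()["data"]:
    print(f"Environment {env['id']}: {env['display_name']}")

    # Cluster API v2: list clusters within an environment.
    clusters = requests.get(
        f"{BASE}/cmk/v2/clusters",
        params={"environment": env["id"]},
        auth=AUTH,
        timeout=30,
    )
    clusters.raise_for_status()
    for cluster in clusters.json()["data"]:
        print(f"  Cluster {cluster['id']}: {cluster['spec']['display_name']}")
```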

Authentication supported

  • The collector authenticates to a Kafka cluster using Simple Authentication and Security Layer (SASL) with a username and password credential. By default, the collector assumes that SASL is used over Secure Sockets Layer (SSL). In cases where SSL is disabled (for example, on internal test clusters), you can disable SSL for the collector. Consult the Apache Kafka documentation for more information on Kafka security.
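
In Kafka client terms, SASL over SSL corresponds to the SASL_SSL security protocol, and SASL without SSL to SASL_PLAINTEXT. A minimal sketch of the equivalent client settings, assuming the confluent-kafka Python package and placeholder credentials:

```python
from confluent_kafka.admin import AdminClient

# SASL over SSL (the default assumption): SASL_SSL with the PLAIN
# mechanism, as used by Confluent Cloud clusters. Credentials and the
# bootstrap address are placeholders.
conf = {
    "bootstrap.servers": "pkc-xxxxx.us-east-2.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "CLUSTER_API_KEY",
    "sasl.password": "CLUSTER_API_SECRET",
}

# For a cluster with SSL disabled (for example, an internal test
# cluster), the same credential is sent over SASL_PLAINTEXT instead:
# conf["security.protocol"] = "SASL_PLAINTEXT"

admin = AdminClient(conf)
print(admin.list_topics(timeout=10).cluster_id)
```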