About the Amazon Managed Streaming for Kafka collector

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a managed service offered by Amazon that provides Apache Kafka clusters on AWS infrastructure. Use this collector to harvest metadata from Amazon MSK clusters.

Important

The Amazon MSK collector can be run on-premises using Docker or JAR files.

Note

The latest version of the collector is 2.200. To view the release notes for this version and all previous versions, see the collector release notes.

What is cataloged

The collector catalogs the following information.

Table 1.

Object

Information cataloged

Cluster

  • Identifier, Display name

Producer

  • Identifier

Consumer

  • Identifier, Client ID, Client host

Broker

  • Identifier, Display name, Host, Port, Rack

Partition

  • Partition number

Consumer Group

  • Identifier, State, Partition assignor

Topic

  • Name, Identifier, Is internal (whether the topic is internal)



Note

The collector can harvest schema registry information only from Confluent Schema Registry. Other schema registry implementations, such as AWS Glue Schema Registry, are not supported.

Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 2.

Resource page

Relationship

Cluster

  • Brokers within Cluster

  • Topics hosted by Cluster

Producer

  • Partitions that receive messages from Producer

Consumer Group

  • Consumers that are members of Consumer Group

Consumer

  • Partitions that Consumer is assigned to

Broker

  • Cluster containing Broker

  • Partitions that have replicas on Broker

Topic

  • Cluster hosting Topic

  • Partitions that segment Topic

Partition

  • Topic segmented into this Partition

  • Consumer that is assigned to Partition

  • Brokers that hold a replica of Partition
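
The relationships in Table 2 can be read as a small directed graph between resource types. The following is a minimal sketch of that reading in Python. The `RELATIONSHIPS` map and both helper functions are hypothetical names introduced for illustration; they do not represent the collector's actual output format or API.

```python
# Hypothetical adjacency map that mirrors Table 2: each resource page
# points at the resource types it links to. Illustrative only.
RELATIONSHIPS = {
    "Cluster": ["Broker", "Topic"],
    "Producer": ["Partition"],
    "Consumer Group": ["Consumer"],
    "Consumer": ["Partition"],
    "Broker": ["Cluster", "Partition"],
    "Topic": ["Cluster", "Partition"],
    "Partition": ["Topic", "Consumer", "Broker"],
}


def related_types(resource_type):
    """Return the resource types linked from the given resource page."""
    return sorted(RELATIONSHIPS.get(resource_type, []))


def one_way_links():
    """Return (source, target) pairs that have no reverse entry in the map."""
    return sorted(
        (src, dst)
        for src, targets in RELATIONSHIPS.items()
        for dst in targets
        if src not in RELATIONSHIPS.get(dst, [])
    )


if __name__ == "__main__":
    print(related_types("Partition"))
    print(one_way_links())
```

Under this reading of the table, a Partition page links to its Topic, Consumer, and Broker, while the Producer-to-Partition and Consumer Group-to-Consumer links are listed in one direction only.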



Versions supported

  • The collector currently supports Apache Kafka 2.7.x and later with limited support, and Apache Kafka 3.3.x and later with full support.

    Note

    Limited support applies because Kafka versions prior to 3.x do not guarantee unique topic identifiers. For those versions, the collector relies on the topic name to create a distinct catalog resource identifier for each topic.
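
The fallback described in the note can be sketched as follows. This is an illustration, not the collector's implementation: the function names, the `cluster_id/...` key format, and the `FULL_SUPPORT_FROM` threshold of 3.3 are assumptions. The source does not say how versions from 3.0 up to 3.3 are handled, so this sketch treats them as limited support.

```python
# Assumed threshold: full support starts at Kafka 3.3.x.
FULL_SUPPORT_FROM = (3, 3)


def parse_major_minor(version):
    """Turn a version string such as '3.5.0' or '2.7.x' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def topic_resource_key(cluster_id, topic_name, topic_id, kafka_version):
    """Choose the value that distinguishes a topic in the catalog.

    Uses the topic identifier when the Kafka version has full support and an
    identifier is available; otherwise falls back to the topic name.
    The returned key format is hypothetical.
    """
    if topic_id and parse_major_minor(kafka_version) >= FULL_SUPPORT_FROM:
        return f"{cluster_id}/{topic_id}"
    return f"{cluster_id}/{topic_name}"


if __name__ == "__main__":
    print(topic_resource_key("cluster-a", "orders", "Xy12AbCd", "3.5.0"))
    print(topic_resource_key("cluster-a", "orders", None, "2.8.1"))
```

One consequence of keying on the topic name is that a topic that is deleted and recreated with the same name on an older cluster would map to the same key in this sketch.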

Authentication supported

  • The collector supports authentication to a Kafka cluster using Simple Authentication and Security Layer (SASL) with a username/password credential. For SASL, the collector supports both the PLAIN and SCRAM-SHA-512 authentication mechanisms.

    By default, the collector assumes that SASL is used over Secure Sockets Layer (SSL). In cases where SSL is disabled (for example, internal test clusters in Kafka), you can disable SSL for the collector. Consult the Apache Kafka documentation for more information on Kafka security.

  • The collector cannot authenticate to clusters that use any other authentication mechanism (for example, IAM access control on Amazon MSK).
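
To make the supported combinations concrete, the sketch below builds the standard Apache Kafka client properties for SASL with PLAIN or SCRAM-SHA-512, over SSL or without it. The property names and login module classes are Apache Kafka client settings; the function name `kafka_sasl_properties` and its parameters are hypothetical, and this is not the collector's own command-line or configuration syntax.

```python
# Login modules that ship with the Apache Kafka client for each mechanism.
LOGIN_MODULES = {
    "PLAIN": "org.apache.kafka.common.security.plain.PlainLoginModule",
    "SCRAM-SHA-512": "org.apache.kafka.common.security.scram.ScramLoginModule",
}


def kafka_sasl_properties(username, password, mechanism="SCRAM-SHA-512", use_ssl=True):
    """Build Kafka client properties for username/password SASL authentication.

    Raises ValueError for any mechanism other than PLAIN or SCRAM-SHA-512.
    Note: this sketch does not escape quotes inside the credentials.
    """
    if mechanism not in LOGIN_MODULES:
        raise ValueError(f"Unsupported SASL mechanism: {mechanism}")
    jaas = (
        f"{LOGIN_MODULES[mechanism]} required "
        f'username="{username}" password="{password}";'
    )
    return {
        # SASL over SSL is the default; SASL_PLAINTEXT applies when SSL is disabled.
        "security.protocol": "SASL_SSL" if use_ssl else "SASL_PLAINTEXT",
        "sasl.mechanism": mechanism,
        "sasl.jaas.config": jaas,
    }


if __name__ == "__main__":
    for key, value in kafka_sasl_properties("svc-user", "example-password").items():
        print(f"{key}={value}")
```

Passing `use_ssl=False` corresponds to the internal test cluster case described above, and a mechanism such as `AWS_MSK_IAM` is rejected because only PLAIN and SCRAM-SHA-512 are supported.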