About the Hive metastore collector

Use this collector to harvest metadata for Hive metastore tables and columns across enterprise systems and make it searchable and discoverable in data.world.

The Hive Metastore collector connects directly to the Hive Metastore relational database (PostgreSQL, Oracle, MySQL/MariaDB, or Microsoft SQL Server) and harvests structural metadata (databases, schemas, tables, columns, and Hive-specific table properties) without requiring HiveServer2.
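To make "structural metadata in a relational database" concrete, here is a minimal sketch of reading tables, columns, and table properties from metastore-style tables. It is not the collector's actual implementation or query set. The table and column names (DBS, TBLS, SDS, COLUMNS_V2, TABLE_PARAMS) follow the standard Hive metastore schema, but only the columns needed here are created. An in-memory SQLite database stands in for the real backend, and the sample rows and the helper names list_columns and table_properties are hypothetical.

```python
import sqlite3

# In-memory stand-in for a metastore database (assumption for illustration).
# A real metastore runs on PostgreSQL, Oracle, MySQL/MariaDB, or SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER,
                   TBL_NAME TEXT, TBL_TYPE TEXT);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT,
                         INTEGER_IDX INTEGER);
CREATE TABLE TABLE_PARAMS (TBL_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);

-- Hypothetical sample data: one database, one external table, two columns.
INSERT INTO DBS VALUES (1, 'sales');
INSERT INTO SDS VALUES (10, 100);
INSERT INTO TBLS VALUES (5, 1, 10, 'orders', 'EXTERNAL_TABLE');
INSERT INTO COLUMNS_V2 VALUES (100, 'order_id', 'bigint', 0);
INSERT INTO COLUMNS_V2 VALUES (100, 'amount', 'decimal(10,2)', 1);
INSERT INTO TABLE_PARAMS VALUES (5, 'numRows', '42');
INSERT INTO TABLE_PARAMS VALUES (5, 'EXTERNAL', 'TRUE');
""")


def list_columns(conn):
    """Return (database, table, column, type) rows in column order."""
    sql = """
        SELECT d.NAME, t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME
        FROM TBLS t
        JOIN DBS d ON d.DB_ID = t.DB_ID
        JOIN SDS s ON s.SD_ID = t.SD_ID
        JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID
        ORDER BY d.NAME, t.TBL_NAME, c.INTEGER_IDX
    """
    return conn.execute(sql).fetchall()


def table_properties(conn, db_name, table_name):
    """Return the Hive table properties of one table as a dict."""
    sql = """
        SELECT p.PARAM_KEY, p.PARAM_VALUE
        FROM TABLE_PARAMS p
        JOIN TBLS t ON t.TBL_ID = p.TBL_ID
        JOIN DBS d ON d.DB_ID = t.DB_ID
        WHERE d.NAME = ? AND t.TBL_NAME = ?
    """
    return dict(conn.execute(sql, (db_name, table_name)).fetchall())


columns = list_columns(conn)
props = table_properties(conn, "sales", "orders")
```

Because everything is read with plain SQL against the backing database, no HiveServer2 session is involved, which is the point of this collector.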

When to use this collector:

HiveServer2 is unavailable or not network-accessible
You need only structural metadata (functions and column statistics are not required)
Direct metastore access is simpler than JDBC connectivity

Important

The Hive Metastore collector can be run on-premises using Docker or JAR files.

Note

The latest version of the collector is 2.330. Release notes for this version and all previous versions are available on the collector release notes page.

What is cataloged

The collector catalogs the following information.

Table 1.

Object	Information cataloged
Columns	Name, Description, JDBC type, Column Type, Is Nullable, Default Value, Key type (Primary, Foreign), Column size, Column index
Table	Name, Description, Primary key, Schema, Last DDL time, Last modified, Last modified by, Row count, Total size, Raw data size, File count, Erasure coded file count, Bucketing version, Is external, External table purge, Is translated to external, Column stats accurate
Views	Name, Description, SQL definition
Schema	Identifier
Database	Type, Name, Identifier, Server, Port, Environment, JDBC URL

Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 1.

Resource page	Relationship
Table	Columns, Table Indexes
Columns	Table, Table Indexes
Table Indexes	Columns
Schema	Database that contains Schema, Table that is part of Schema
Database	Schema contained in Database

Lineage for Hive Metastore

The following lineage information is collected by the Hive Metastore collector.

Table 2.

Object	Lineage available
Tables in View	The collector identifies only table-level lineage

Authentication supported

The collector supports username/password authentication to the Hive Metastore database.

IRI consistency between Hive Metastore and Hive collectors

If you are using both the Hive Metastore collector and the Hive collector to catalog the same Hive environment, configure them with matching parameters to ensure they produce identical IRIs for the same resources. This prevents duplicate entries in data.world and maintains a unified view of your metadata.

Configuration alignment:

Table 3.

Hive Metastore collector parameter	Hive collector parameter
--hive-server-host	--server
--hive-server-port	--port
--hive-database	--database
--hive-database-id	--database-id

When these values match across both collectors, metadata harvested by either one resolves to the same resources in your catalog.
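One way to keep the two configurations aligned is to derive both argument lists from a single set of shared values. The sketch below is illustrative only. The flag names come from the table above, while the sample values, the dictionary layout, and the build_args helper are hypothetical, and each collector needs further options that are not shown here.

```python
# Shared settings used by both collectors (illustrative values, not defaults).
shared = {
    "host": "hive.example.com",
    "port": "10000",
    "database": "analytics",
    "database_id": "analytics-prod",
}

# Flag names taken from the configuration alignment table.
METASTORE_FLAGS = {
    "host": "--hive-server-host",
    "port": "--hive-server-port",
    "database": "--hive-database",
    "database_id": "--hive-database-id",
}
HIVE_FLAGS = {
    "host": "--server",
    "port": "--port",
    "database": "--database",
    "database_id": "--database-id",
}


def build_args(flags, settings):
    """Pair each collector flag with its shared value (hypothetical helper)."""
    return [(flag, settings[key]) for key, flag in flags.items()]


metastore_args = build_args(METASTORE_FLAGS, shared)
hive_args = build_args(HIVE_FLAGS, shared)

# Both collectors receive identical values, only the flag names differ.
values_match = [v for _, v in metastore_args] == [v for _, v in hive_args]
```

Keeping the values in one place means a change to the host, port, database, or database ID reaches both collectors together, so the two runs cannot drift apart.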
