Preparing to run the Hive metastore collector
Setting up pre-requisites for running the collector
Make sure that the machine on which you run the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (Note: The following specs assume that you run one collector process at a time. Adjust the hardware if you run multiple collectors at the same time.) | |
RAM | 8 GB |
CPU | 2 GHz processor |
Software | |
Docker | Click here to get Docker. |
data.world specific objects | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
Network connection | |
Allowlist IPs and domains |
Capturing table metadata properties from the Hive metastore
The Hive Collector can capture table metadata properties and harvest other valuable table-level metadata from the Hive metastore. To catalog information from the metastore, use the following Collector parameters:
--hive-metastore-jdbc-url=<hiveMetastoreJdbcUrl>
- The JDBC URL for the Hive Metastore database. Pass the same value that you specify for javax.jdo.option.ConnectionURL in your Hive config.
--hive-metastore-password=<hiveMetastorePassword>
- The password to use in authenticating to the Hive Metastore database.
--hive-metastore-user=<hiveMetastoreUser>
- The user to use in authenticating to the Hive Metastore.
You must pass all three --hive-metastore options for the collector to attempt to harvest anything from the Hive metastore. If --hive-metastore-jdbc-url isn't passed, the collector writes a warning and harvests the standard JDBC collector content. This does not prevent cataloging of the basic JDBC database, schema, table, and column objects; the collector just won't get the table-level metadata from the metastore.
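The all-or-nothing rule for the three options can be checked before the collector is launched. The following Python sketch is only an illustration: the helper name metastore_args is hypothetical and not part of the collector, and the host, database name, and credentials in the usage lines are placeholders. Only the three option names come from this page. The helper is deliberately stricter than the collector itself: it rejects a partial set of options instead of falling back to standard JDBC harvesting.

```python
from typing import List, Optional


def metastore_args(
    jdbc_url: Optional[str],
    user: Optional[str],
    password: Optional[str],
) -> List[str]:
    """Build the --hive-metastore options for a collector command line.

    Hypothetical helper: returns an empty list when no metastore option
    is supplied (standard JDBC harvesting only) and raises ValueError
    when only some of the three options are supplied.
    """
    values = {
        "--hive-metastore-jdbc-url": jdbc_url,
        "--hive-metastore-user": user,
        "--hive-metastore-password": password,
    }
    supplied = [name for name, value in values.items() if value]
    if not supplied:
        # Nothing passed: the collector would harvest only the basic JDBC objects.
        return []
    missing = [name for name in values if name not in supplied]
    if missing:
        raise ValueError("missing metastore options: " + ", ".join(missing))
    return [f"{name}={value}" for name, value in values.items()]


# Placeholder connection details, for illustration only.
args = metastore_args("jdbc:postgresql://metastore-host:5432/metastore", "hive", "secret")
print(args)
```

Append the returned list to the rest of your collector arguments. An empty list means that no table-level metadata will be requested from the metastore.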
Important
Make sure to supply a JDBC driver for the specific database as needed. In particular, if your metastore database is Oracle or MySQL, you need to put the driver JAR in the JDBC drivers directory (just as you would if you were running those databases' collectors). If your metastore database is PostgreSQL, Derby, or SQL Server, the necessary drivers ship with the data.world Collector.
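As a quick pre-flight check, you can infer from the metastore JDBC URL whether you need to supply a driver yourself. The sketch below is an assumption-laden illustration, not collector functionality: the function name driver_jar_required is hypothetical, and the mapping relies on the URL prefixes commonly used by each vendor's driver (jdbc:oracle:, jdbc:mysql:, jdbc:postgresql:, jdbc:derby:, jdbc:sqlserver:).

```python
# Subprotocols grouped by the driver guidance in the note above.
# Assumption: the metastore URL uses the standard vendor prefix.
NEEDS_OWN_JAR = {"oracle", "mysql"}
BUNDLED = {"postgresql", "derby", "sqlserver"}


def driver_jar_required(jdbc_url: str) -> bool:
    """Return True if you must place a driver in the JDBC drivers directory.

    Hypothetical helper: raises ValueError for URLs that are not JDBC URLs
    or whose database type is not covered by the note above.
    """
    parts = jdbc_url.split(":", 2)
    if len(parts) < 3 or parts[0].lower() != "jdbc":
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    subprotocol = parts[1].lower()
    if subprotocol in NEEDS_OWN_JAR:
        return True
    if subprotocol in BUNDLED:
        return False
    raise ValueError(f"unrecognized metastore database type: {subprotocol!r}")


# Placeholder URLs, for illustration only.
print(driver_jar_required("jdbc:mysql://metastore-host:3306/metastore"))
print(driver_jar_required("jdbc:postgresql://metastore-host:5432/metastore"))
```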