
About the Qlik Talend Data Integration collector

Warning

This collector is in public preview. It has passed our standard testing, but it is not yet widely adopted. You might encounter unforeseen edge cases in your environment. data.world is committed to promptly addressing any issues with public preview collectors. If you face any problems, please report them through your Customer Success Director, implementation team, or support team for assistance.

Use this collector to harvest lineage from Talend jobs. Users run jobs from Talend Studio, which generates the job files (.properties and .item files). You must give the collector the location of the Talend workspace that contains these job files. The collector scans the workspace location, parsing each job's .properties file to harvest job properties and its .item file to harvest lineage. The collector should run on the same system where the Talend workspace with the job files is located.
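Before running the collector, it can help to confirm that the workspace location actually contains both kinds of job file. The following Python snippet is a minimal sketch of such a check and is not part of the collector. The function names `find_job_files` and `complete_jobs` are hypothetical, and the snippet assumes that the .properties and .item files for a job share the same path and base name.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: a job's two files differ only in their extension.
JOB_SUFFIXES = (".properties", ".item")


def find_job_files(workspace: Path) -> Dict[str, Dict[str, Path]]:
    """Group .properties and .item files under `workspace` by shared path stem."""
    jobs: Dict[str, Dict[str, Path]] = {}
    for path in sorted(workspace.rglob("*")):
        if path.is_file() and path.suffix in JOB_SUFFIXES:
            # Key is the path relative to the workspace, minus the extension.
            key = path.with_suffix("").relative_to(workspace).as_posix()
            jobs.setdefault(key, {})[path.suffix.lstrip(".")] = path
    return jobs


def complete_jobs(jobs: Dict[str, Dict[str, Path]]) -> List[str]:
    """Return the keys of jobs that have both a .properties and an .item file."""
    return sorted(k for k, files in jobs.items()
                  if {"properties", "item"} <= set(files))
```

Calling `complete_jobs(find_job_files(Path("/path/to/workspace")))` lists the jobs for which both files are present; any key missing from that list points to a job with only one of the two files.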

Important

The Talend collector can be run on-premises using Docker or JAR files.

Note

The latest version of the collector is 2.258. Release notes are available for this version and all previous versions.

What is cataloged?

The collector catalogs the following information.

Table 1.

| Object | Information cataloged |
|---|---|
| Job | Name, ID, Created At, Modified At |



Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 2.

| Resource page | Relationship |
|---|---|
| Job | Generates Database table; Generates files in local system; Generates Amazon S3 files |
| Database Table | Was generated by Job |
| Amazon S3 files | Was generated by Job |
| Files in local system | Was generated by Job |
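The relationships in Table 2 are pairs of a forward label and its inverse. As an illustration only, the following Python sketch models them as triples and derives the inverse side. The names `FORWARD` and `inverse` are hypothetical and do not reflect how the collector or the catalog stores relationships.

```python
from typing import List, Tuple

Relationship = Tuple[str, str, str]  # (resource page, relationship, related resource)

# Forward relationships as listed for the Job resource page in Table 2.
FORWARD: List[Relationship] = [
    ("Job", "Generates", "Database table"),
    ("Job", "Generates", "Files in local system"),
    ("Job", "Generates", "Amazon S3 files"),
]


def inverse(relationships: List[Relationship]) -> List[Relationship]:
    """Derive the 'Was generated by' side shown on the other resource pages."""
    return [(target, "Was generated by", source)
            for source, _, target in relationships]
```

Here `inverse(FORWARD)` yields one "Was generated by Job" entry for each of the three generated resource types, matching the remaining rows of the table.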



Lineage for Talend

The collector catalogs the following lineage information.

Table 3.

| Object | Lineage available |
|---|---|
| Database (relational database) | Sources data from relational databases, Amazon S3, and files in the local file system |
| Amazon S3 | Sources data from relational databases, Amazon S3, and files in the local file system |
| Files in the local file system | Sources data from relational databases, Amazon S3, and files in the local file system |