
About the Informatica Cloud Data Integration (CDI) collector

Important

This collector is available in Private Preview. If you would like access to this collector, please contact your Customer Success Director.

The Informatica CDI collector harvests metadata from Informatica Cloud Data Integration (CDI), such as tasks, mappings, mapplets, taskflows, workflows, and connections. The collector also harvests lineage information between database tables, columns, S3 objects, and Azure BLOBs from connections, task targets, and sources.

Note

The latest version of the collector is 2.237. To view the release notes for this version and all previous versions, see the collector release notes.

What is cataloged

The collector catalogs the following information.

Note

Cataloging of the Python transformation is not supported.

Table 1.

Object

Information cataloged

Mapping task

ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Is Valid Mapping Task

Dynamic mapping task

ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Is Mapping Task Valid

Masking task

ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Source Object Name, Target Operation

Data transfer task

ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag

Replication task

ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Table Prefix

Synchronization task

ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Operation, Target Object, Preprocessing Command, Postprocessing Command

PowerCenter task

ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Tag

Mass ingestion task

ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Tag

Mapping

ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Is Mapping Valid

Mapplet

ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time

Taskflow

ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time

Workflow

ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time

Connection

ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID

JDBC connection (Redshift, MS SQL Server, Netezza, Snowflake, MySQL, Oracle, Postgres, Teradata)

ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Host, Port, Database, Schema, DB User Name, JDBC URL

ODBC connection

ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, ODBC Subtype, Authentication Type, Client ID, Host, Port, Database, Schema, DB User Name, JDBC URL

Amazon S3 v2 connection

ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, ODBC Subtype, Authentication Type, Client ID, AWS Region Name, Bucket Name, S3 Account Type, IAM Role ARN

BigQuery v2 connection

ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Project ID, Dataset Name

Flat file connection

ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Directory, Date Format, Code Page

Azure v1 connection

ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account Name, Client ID, Informatica Type, Informatica Subtype

Azure v2 connection

ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Tenant ID, File System Name

Job

ID v2, Title, Ended At Time, Started At Time, Was Started By, Error Message, Failed Source Rows, Failed Target Rows, Job State Code, Success Source Rows, Success Target Rows

Runtime Environment

ID v3, Title, Description, Update By, Update Time, Created By, Create Time

Secure Agent

ID v2, ID v3, Title, Description, Update By, Update Time, Agent Host, Agent Version, Platform

Shared Sequence

ID v2, ID v3, Title, Path, Description, Update By, Update Time

Saved Query

ID v2, ID v3, Title, Path, Description, Update By, Update Time

User Defined Function

ID v2, ID v3, Title, Path, Description, Update By, Update Time



Relationships between objects

By default, the data.world catalog includes catalog pages for the following resource types. Each catalog page shows its relationships to the related resource types. The catalog presentation and relationships are fully configurable, so the table below lists only the default configuration.

Table 2.

Resource page

Relationship

Mapping task

  • Uses: Mapping, Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Runtime Environment, DB Column, BigQuery column, BigQuery Table, DB Table, S3 object, and Azure BLOB

  • Generates: DB Column, BigQuery column, BigQuery Table, DB Table, S3 object, Azure BLOB, Job

  • Is used by: Taskflow

  • Has: Tag

Dynamic mapping task

  • Uses: Mapping, Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Runtime Environment

  • Generates: Job

  • Is used by: Taskflow

  • Has: Tag

Masking task

  • Uses: Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Runtime Environment

  • Generates: Job

  • Is used by: Taskflow

  • Has: Tag

Data transfer task

  • Uses: Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Runtime Environment

  • Generates: Job

  • Is used by: Taskflow

  • Has: Tag

Replication task

  • Uses: DB Columns, DB Table, Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection

  • Generates: DB Columns, DB Table, Job

  • Is used by: Taskflow

  • Has: Tag

Synchronization task

  • Uses: DB Columns, DB Table, Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection

  • Generates: DB Columns, DB Table, Job

  • Is used by: Taskflow

  • Has: Tag

Mapping

  • Uses: Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Shared Sequence, Saved Query, User Defined Function, Mapplet

  • Generates: Job

  • Is used by: Mapplet, Task

  • Has: Tag

Mapplet

  • Uses: Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Shared Sequence, Saved Query, User Defined Function, Mapplet

  • Is used by: Mapping, Mapplet

  • Has: Tag

Taskflow

  • Uses: Task

  • Has: Tag

JDBC connection (Redshift, MS SQL Server, Netezza, Snowflake, MySQL, Oracle, Postgres, Teradata)

  • Uses: Runtime Environment

  • Is used by: Mapping Task, Dynamic mapping Task, Masking Task, Data transfer Task, Replication Task, Synchronization Task, Mapping

ODBC connection

  • Uses: Runtime Environment

  • Is used by: Mapping Task, Mapping

Amazon S3 v2 connection

  • Uses: Runtime Environment

  • Is used by: Mapping Task, Mapping

BigQuery connection

  • Uses: Runtime Environment

  • Is used by: Mapping Task, Mapping, Data transfer Task

Flat file connection

  • Uses: Runtime Environment

  • Is used by: Mapping Task, Mapping

Azure connection

  • Uses: Runtime Environment

  • Is used by: Mapping Task, Mapping

Job

  • Is run on: Runtime Environment

  • Is associated with: Mapping Task, Masking Task, Data transfer Task, Replication Task, Synchronization Task, Mapping

Runtime Environment

  • Contains: Secure Agent

  • Runs: Job

  • Is used by: Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Mapping Task, Masking Task, Data transfer Task, Replication Task, Synchronization Task, Mapping

Secure Agent

  • Is part of: Runtime Environment

Shared Sequence

  • Is used by: Mapping, Mapplet

Saved Query

  • Is used by: Mapping, Mapplet

User Defined Function

  • Is used by: Mapping, Mapplet

DB Column

  • Is used by: Mapping Task, Data transfer Task, Replication Task, and Synchronization Task

  • Was generated by: Mapping Task, Data transfer Task, Replication Task, and Synchronization Task

DB Table

  • Is used by: Mapping Task, Data transfer Task, Replication Task, and Synchronization Task

  • Was generated by: Mapping Task, Data transfer Task, Replication Task, and Synchronization Task

BigQuery column

  • Is used by: Mapping Task

  • Was generated by: Mapping Task

BigQuery Table

  • Is used by: Mapping Task

  • Was generated by: Mapping Task

S3 object

  • Is used by: Mapping Task

  • Was generated by: Mapping Task

Azure BLOB

  • Is used by: Mapping Task

  • Was generated by: Mapping Task



Relationships between tasks and objects

Relationships for Sources in Mapping tasks

The following table outlines the types of relationships that can be established between tasks and data sources in mapping tasks. It specifies the supported connection types (e.g., JDBC, BigQuery, Azure, S3, Salesforce), the levels at which tasks are associated with these sources, and any conditions or exceptions, such as sorting/filtering rules.

Table 3.

Connection type

Query source type

Single object source type

Multiple object source type

JDBC Connections

Query source type:

  • Table level (task is associated with table)

Single object source type:

  • Table level (task is associated with table).

    Note: Skipped if sorting and/or filtering are used.

  • Column level (task is associated with column).

    Note: Created only if sorting and/or filtering are used.

Multiple object source type:

  • Table level (task is associated with table).

    Note: Skipped if sorting and/or filtering are used.

  • Column level (task is associated with column).

    Note: Created only if sorting and/or filtering are used.

  • Both regular and configured relationships are supported.

BigQuery Connection

Query source type:

  • Table level (task is associated with BigQuery table)

Single object source type:

  • Table level (task is associated with BigQuery table).

    Note: Skipped if sorting and/or filtering are used.

  • Column level (task is associated with column).

    Note: Created only if sorting and/or filtering are used.

Multiple object source type:

  • Table level (task is associated with table).

    Note: Skipped if sorting and/or filtering are used.

  • Column level (task is associated with column).

    Note: Created only if sorting and/or filtering are used.

  • Both regular and configured relationships are supported.

Azure Connection

Query source type: Not supported in Informatica CDI

Single object source type:

  • BLOB level (task is associated with Azure BLOB)

Multiple object source type: Not supported in Informatica CDI

S3 Connection

Query source type: Not supported in Informatica CDI

Single object source type:

  • S3 Object level (task is associated with S3 Object)

Multiple object source type: Not supported in Informatica CDI

Salesforce Connection

Query source type: Not supported by Collector

Single object source type: Not supported by Collector

Multiple object source type: Not supported by Collector
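The source rules for JDBC and BigQuery connections in Table 3 can be read as a small decision function. The following is a minimal sketch of that reading, not the collector's implementation; the names `SourceType` and `source_relationship_level` are hypothetical and exist only for illustration.

```python
from enum import Enum


class SourceType(Enum):
    """Source types from Table 3 (hypothetical enum for illustration)."""
    QUERY = "query"
    SINGLE_OBJECT = "single object"
    MULTIPLE_OBJECT = "multiple object"


def source_relationship_level(source_type: SourceType, uses_sort_or_filter: bool) -> str:
    """Return the level ('table' or 'column') at which a mapping task is
    associated with a JDBC or BigQuery source, following Table 3."""
    if source_type is SourceType.QUERY:
        # Query sources are listed as table level only.
        return "table"
    # Single and multiple object sources: sorting and/or filtering switches
    # the association from table level to column level.
    return "column" if uses_sort_or_filter else "table"
```

For example, a single object source with a filter yields a column-level association, while the same source without sorting or filtering yields a table-level one.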



Relationships for Targets in Mapping tasks

The following table details the connections and operations supported for mapping tasks, specifically for different types of data sources (e.g., JDBC, BigQuery, Azure, S3, Salesforce). It outlines how tasks are associated with new and existing objects during various operations like Insert, Update, Upsert, Delete, and Data-Driven, and specifies conditions where certain associations are created or skipped.

Table 4.

Connection type

New object. Operation: Insert

Existing object. Operation: Insert

Existing object. Operation: Update, Upsert, Delete, and Data-Driven

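JDBC Connections

New object, operation Insert:

  • Table level (table was generated by task)

Existing object, operation Insert:

  • Table level (task is associated with table)

Existing object, operation Update, Upsert, Delete, or Data-Driven:

  • Table level (task is associated with table).

    Note: Skipped if target update table columns are specified.

  • Column level (task is associated with column).

    Note: Created only if target update table columns are specified.

BigQuery Connection

New object, operation Insert:

  • Table level (BigQuery table was generated by task)

Existing object, operation Insert:

  • Table level (task is associated with BigQuery table)

Existing object, operation Update, Upsert, Delete, or Data-Driven:

  • Table level (task is associated with BigQuery table).

    Note: Skipped if target update table columns are specified.

  • Column level (task is associated with column).

    Note: Created only if target update table columns are specified.

Azure Connection

New object, operation Insert:

  • BLOB level (Azure BLOB was generated by task)

Existing object, operation Insert:

  • BLOB level (task is associated with Azure BLOB)

Existing object, operation Update, Upsert, Delete, or Data-Driven: Not supported in Informatica CDI

S3 Connection

New object, operation Insert:

  • S3 Object level (S3 Object was generated by task)

Existing object, operation Insert:

  • S3 Object level (task is associated with S3 Object)

Existing object, operation Update, Upsert, Delete, or Data-Driven:

  • S3 Object level (task is associated with S3 Object)

Salesforce Connection

New object, operation Insert: Not supported by Collector

Existing object, operation Insert: Not supported by Collector

Existing object, operation Update, Upsert, Delete, or Data-Driven: Not supported by Collector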
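The target rules for JDBC and BigQuery connections in Table 4 can be sketched the same way. This is an illustrative reading of the table under stated assumptions, not the collector's code: the function name, the lowercase operation strings, and the returned phrases are all hypothetical, and combinations the table does not list raise an error instead of guessing.

```python
# Operations that Table 4 groups together for existing objects.
NON_INSERT_OPERATIONS = {"update", "upsert", "delete", "data-driven"}


def target_relationship(is_new_object: bool, operation: str,
                        update_columns_specified: bool = False) -> str:
    """Describe the relationship created for a JDBC or BigQuery target
    in a mapping task, following Table 4."""
    op = operation.lower()
    if op == "insert":
        if is_new_object:
            return "table was generated by task"
        return "task is associated with table"
    if op in NON_INSERT_OPERATIONS and not is_new_object:
        # Specifying target update columns replaces the table-level
        # association with a column-level one.
        if update_columns_specified:
            return "task is associated with column"
        return "task is associated with table"
    # Table 4 does not cover this combination.
    raise ValueError(f"combination not covered by Table 4: new={is_new_object}, op={operation}")
```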



Relationships for Sources in Replication tasks

The following table outlines how relationships are established between tasks and data sources in replication tasks, specifically for different connection types (JDBC, BigQuery, Azure, S3, Salesforce). It describes the conditions under which table-level and column-level relationships are created or skipped, with specific notes for unsupported connection types in Informatica CDI.

Table 5.

Connection type

Relationship

JDBC Connections

If the task property replicate all objects is set to false, the list of source tables is available. Some table columns are used as filters, and the list of table columns is available in this case. Columns marked as excluded in task properties are excluded from the list of table columns to collect in the catalog.

  • Table level (task is associated with table) relationship is created.

    Note: Skipped if the list of column names is available; created when only the table name is available.

  • Column level (task is associated with column).

    Note: Created only if the list of column names is available.

BigQuery Connection

Not supported in Informatica CDI

Azure Connection

Not supported in Informatica CDI

S3 Connection

Not supported in Informatica CDI

Salesforce Connection

Not supported by Collector
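One possible reading of the JDBC rule for replication tasks is sketched below: when a column list is available, excluded columns are dropped and column-level relationships are created; otherwise a single table-level relationship is created. The function and parameter names are hypothetical, and the actual collector behavior may differ in edge cases that the table does not describe.

```python
from typing import Iterable, List, Optional, Tuple


def replication_relationships(table: str,
                              columns: Optional[List[str]] = None,
                              excluded: Iterable[str] = ()) -> List[Tuple[str, str]]:
    """Return (level, object name) pairs for one replicated JDBC table."""
    if columns:
        skip = set(excluded)
        # Column level: created only when the column list is available.
        # Columns marked as excluded in the task are not cataloged.
        return [("column", f"{table}.{name}") for name in columns if name not in skip]
    # Table level: used when only the table name is available.
    return [("table", table)]
```

For instance, `replication_relationships("orders", ["id", "total", "notes"], excluded=["notes"])` produces column-level entries for `orders.id` and `orders.total` only.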



Relationships for Targets in Replication tasks

The following table details how relationships are established between tasks and target data sources in replication tasks. It specifies the conditions under which table-level and column-level relationships are created for various connection types, and notes any unsupported connections in Informatica CDI and Salesforce.

Table 6.

Connection type

Relationship

JDBC Connections

If the task property replicate all objects is set to false, the list of source tables is available. Some table columns are used as filters, and the list of table columns is available in this case. Columns marked as excluded in task properties are excluded from the list of table columns to collect in the catalog.

  • Table level (task is associated with table) relationship is created.

    Note: Skipped if the list of column names is available; created when only the table name is available.

  • Column level (task is associated with column).

    Note: Created only if the list of column names is available.

BigQuery Connection

Not supported in Informatica CDI

Azure Connection

Not supported in Informatica CDI

S3 Connection

Not supported in Informatica CDI

Salesforce Connection

Not supported by Collector



Relationships for Sources in Synchronization tasks

The following table explains how relationships are established for sources in synchronization tasks, detailing the support level for various connection types (JDBC, BigQuery, Azure, S3, Salesforce). It specifies whether saved query sources, single object sources, and multiple object sources are supported, highlighting that most connection types are not supported for synchronization tasks within Informatica CDI and Salesforce.

Table 7.

Connection type

Saved query source type

Single object source type

Multiple object source type

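JDBC Connections

Saved query source type: Not supported by Collector

Single object source type: Column level (task is associated with column)

Multiple object source type: Column level (task is associated with column)

BigQuery Connection

Saved query source type: Not supported in Informatica CDI

Single object source type: Not supported in Informatica CDI

Multiple object source type: Not supported in Informatica CDI

Azure Connection

Saved query source type: Not supported in Informatica CDI

Single object source type: Not supported in Informatica CDI

Multiple object source type: Not supported in Informatica CDI

S3 Connection

Saved query source type: Not supported in Informatica CDI

Single object source type: Not supported in Informatica CDI

Multiple object source type: Not supported in Informatica CDI

Salesforce Connection

Saved query source type: Not supported by Collector

Single object source type: Not supported by Collector

Multiple object source type: Not supported by Collector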



Relationships for Targets in Synchronization tasks

The following table details how relationships are established between tasks and target data sources in synchronization tasks for various connection types.

Table 8.

Connection type

Relationship

JDBC Connections

Column level (task is associated with column)

BigQuery Connection

Not supported in Informatica CDI

Azure Connection

Not supported in Informatica CDI

S3 Connection

Not supported in Informatica CDI

Salesforce Connection

Not supported by Collector



Lineage for Informatica CDI

Table 9.

Object

Lineage available

DB Column

  • Synchronization tasks: The collector identifies the source DB columns from which the target DB columns get their data.

    Note: Available when both the source and target are JDBC connections (Redshift, MS SQL Server, Netezza, Snowflake, MySQL, Oracle, Postgres, Teradata).

  • Replication tasks: The collector identifies the source DB columns from which the target DB columns get their data.

    Note: Available when both the source and target are JDBC connections (Redshift, MS SQL Server, Netezza, Snowflake, MySQL, Oracle, Postgres, Teradata).
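Column-level lineage of this kind can be pictured as a list of source-to-target column pairs. The sketch below is only a way to visualize the result; `ColumnLineage`, `column_lineage`, and the `field_map` shape (target column name mapped to source column name) are hypothetical and do not describe how the collector stores lineage.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ColumnLineage:
    source: str  # qualified source column, for example "crm.accounts.id"
    target: str  # qualified target column that gets its data from the source


def column_lineage(source_table: str, target_table: str,
                   field_map: Dict[str, str]) -> List[ColumnLineage]:
    """Build one lineage edge per mapped column.

    field_map maps each target column name to the source column it reads from.
    """
    return [
        ColumnLineage(source=f"{source_table}.{src}", target=f"{target_table}.{tgt}")
        for tgt, src in field_map.items()
    ]
```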



Authentication supported

  • The Informatica CDI collector supports username and password authentication.
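For orientation, the sketch below builds (but does not send) a username and password login request of the kind the Informatica Intelligent Cloud Services REST API version 3 accepts. It is an assumption that the collector authenticates this way; this page only states that username and password authentication is supported. The login host shown is the commonly used US one and varies by region, so treat `LOGIN_URL` as a placeholder, and note that `build_login_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed login host; the correct host depends on your Informatica region.
LOGIN_URL = "https://dm-us.informaticacloud.com/saas/public/core/v3/login"


def build_login_request(username: str, password: str,
                        url: str = LOGIN_URL) -> urllib.request.Request:
    """Prepare a JSON POST carrying the credentials; nothing is sent here."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would perform the login; credentials should come from a secret store or environment variables, not from source code.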