About the Informatica Cloud Data Integration (CDI) collector

Warning

This collector is in public preview. It has passed our standard testing, but it is not yet widely adopted. You might encounter unforeseen edge cases in your environment. data.world is committed to promptly addressing any issues with public preview collectors. If you face any problems, please report them through your Customer Success Director, implementation team, or support team for assistance.

The Informatica CDI collector harvests metadata from Informatica Cloud Data Integration (CDI), such as tasks, mappings, mapplets, taskflows, workflows, and connections. The collector also harvests lineage information between database tables, columns, S3 objects, and Azure BLOBs from connections, task targets, and sources.

Note

The latest version of the collector is 2.330. Release notes for this version and all previous versions are available in the collector release notes.

What is cataloged

The collector catalogs the following information.

Note

Cataloging of the Python transformation is not supported.

Table 1.

Object	Information cataloged
Mapping task	ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Is Valid Mapping Task
Dynamic mapping task	ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Is Mapping Task Valid
Masking task	ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Source Object Name, Target Operation
Data transfer task	ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag
Replication task	ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Table Prefix
Synchronization task	ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Operation, Target Object, Preprocessing Command, Postprocessing Command
PowerCenter task	ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Tag
Mass ingestion task	ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Tag
Mapping	ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Is Mapping Valid
Mapplet	ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time
Taskflow	ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time
Workflow	ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time
Connection	ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID
JDBC connection (Redshift, MS SQL Server, Netezza, Snowflake, MySQL, Oracle, Postgres, Teradata)	ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Host, Port, Database, Schema, DB User Name, JDBC URL
ODBC connection	ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, ODBC Subtype, Authentication Type, Client ID, Host, Port, Database, Schema, DB User Name, JDBC URL
Amazon S3 v2 connection	ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, ODBC Subtype, Authentication Type, Client ID, AWS Region Name, Bucket Name, S3 Account Type, IAM Role ARN
BigQuery v2 connection	ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Project ID, Dataset Name
Flat file connection	ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Directory, Date Format, Code Page
Azure v1 connection	ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account Name, Client ID, Informatica Type, Informatica Subtype
Azure v2 connection	ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Tenant ID, File System Name
Job	ID v2, Title, Ended At Time, Started At Time, Was Started By, Error Message, Failed Source Rows, Failed Target Rows, Job State Code, Success Source Rows, Success Target Rows
Runtime Environment	ID v3, Title, Description, Update By, Update Time, Created By, Create Time
Secure Agent	ID v2, ID v3, Title, Description, Update By, Update Time, Agent Host, Agent Version, Platform
Shared Sequence	ID v2, ID v3, Title, Path, Description, Update By, Update Time
Saved Query	ID v2, ID v3, Title, Path, Description, Update By, Update Time
User Defined Function	ID v2, ID v3, Title, Path, Description, Update By, Update Time

Relationships between objects

By default, the data.world catalog includes catalog pages for the following resource types. Each catalog page has relationships to the related resource types. The catalog presentation and relationships are fully configurable, so the table below lists only the default configuration.

Table 2.

Resource page	Relationship
Mapping task	Uses: Mapping, Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Runtime Environment, DB Column, BigQuery column, BigQuery Table, DB Table, S3 object, and Azure BLOB Generates: DB Column, BigQuery column, BigQuery Table, DB Table, S3 object, Azure BLOB, Job Is used by: Taskflow Has: Tag
Dynamic mapping task	Uses: Mapping, Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Runtime Environment Generates: Job Is used by: Taskflow Has: Tag
Masking task	Uses: Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Runtime Environment Generates: Job Is used by: Taskflow Has: Tag
Data transfer task	Uses: Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Runtime Environment Generates: Job Is used by: Taskflow Has: Tag
Replication task	Uses: DB Columns, DB Table, Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection Generates: DB Columns, DB Table, Job Is used by: Taskflow Has: Tag
Synchronization task	Uses: DB Columns, DB Table, Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection Generates: DB Columns, DB Table, Job Is used by: Taskflow Has: Tag
Mapping	Uses: Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Shared Sequence, Saved Query, User Defined Function, Mapplet Generates: Job Is used by: Mapplet, Task Has: Tag
Mapplet	Uses: Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Shared Sequence, Saved Query, User Defined Function, Mapplet Is used by: Mapping, Mapplet Has: Tag
Taskflow	Uses: Task Has: Tag
JDBC connection (Redshift, MS SQL Server, Netezza, Snowflake, MySQL, Oracle, Postgres, Teradata)	Uses: Runtime Environment Is used by: Mapping Task, Dynamic mapping Task, Masking Task, Data transfer Task, Replication Task, Synchronization Task, Mapping
ODBC connection	Uses: Runtime Environment Is used by: Mapping Task, Mapping
Amazon S3 v2 connection	Uses: Runtime Environment Is used by: Mapping Task, Mapping
BigQuery connection	Uses: Runtime Environment Is used by: Mapping Task, Mapping, Data transfer Task
Flat file connection	Uses: Runtime Environment Is used by: Mapping Task, Mapping
Azure connection	Uses: Runtime Environment Is used by: Mapping Task, Mapping
Job	Is run on: Runtime Environment Is associated with: Mapping Task, Masking Task, Data transfer Task, Replication Task, Synchronization Task, Mapping
Runtime Environment	Contains: Secure Agent Runs: Job Is used by: Connection, JDBC Connection, Azure Connection, BigQuery Connection, Flat file connection, AWS S3 Connection, Mapping Task, Masking Task, Data transfer Task, Replication Task, Synchronization Task, Mapping
Secure Agent	Is part of: Runtime Environment
Shared Sequence	Is used by: Mapping, Mapplet
Saved Query	Is used by: Mapping, Mapplet
User Defined Function	Is used by: Mapping, Mapplet
DB Column	Is used by: Mapping Task, Data transfer Task, Replication Task, and Synchronization Task Was generated by: Mapping Task, Data transfer Task, Replication Task, and Synchronization Task
DB Table	Is used by: Mapping Task, Data transfer Task, Replication Task, and Synchronization Task Was generated by: Mapping Task, Data transfer Task, Replication Task, and Synchronization Task
BigQuery column	Is used by: Mapping Task Was generated by: Mapping Task
BigQuery Table	Is used by: Mapping Task Was generated by: Mapping Task
S3 object	Is used by: Mapping Task Was generated by: Mapping Task
Azure BLOB	Is used by: Mapping Task Was generated by: Mapping Task

Relationships between tasks and objects

Relationships for Sources in Mapping tasks

The following table outlines the types of relationships that can be established between tasks and data sources in mapping tasks. It specifies the supported connection types (e.g., JDBC, BigQuery, Azure, S3, Salesforce), the levels at which tasks are associated with these sources, and any conditions or exceptions, such as sorting/filtering rules.

Table 3.

Connection type	Query source type	Single object source type	Multiple object source type
JDBC Connections	Table level (task is associated with table)	Table level (task is associated with table). Note: Skipped if sorting and/or filtering are used. Column level (task is associated with column). Note: Created only if sorting and/or filtering are used.	Table level (task is associated with table). Note: Skipped if sorting and/or filtering are used. Column level (task is associated with column). Note: Created only if sorting and/or filtering are used. Both regular and configured relationships are supported.
BigQuery Connection	Table level (task is associated with BigQuery table)	Table level (task is associated with BigQuery table). Note: Skipped if sorting and/or filtering are used. Column level (task is associated with column). Note: Created only if sorting and/or filtering are used.	Table level (task is associated with table). Note: Skipped if sorting and/or filtering are used. Column level (task is associated with column). Note: Created only if sorting and/or filtering are used. Both regular and configured relationships are supported.
Azure Connection	Not supported in Informatica CDI	BLOB level (task is associated with Azure BLOB)	Not supported in Informatica CDI
S3 Connection	Not supported in Informatica CDI	S3 Object level (task is associated with S3 Object)	Not supported in Informatica CDI
Salesforce Connection	Not supported by Collector	Not supported by Collector	Not supported by Collector
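
The rules in the table above can be read as a small decision function. The following is only an illustrative sketch, not the collector's implementation: the function name, the connection keys (`jdbc`, `bigquery`, `azure`, `s3`, `salesforce`), the source type keys, and the returned labels are all hypothetical names chosen for this example.

```python
from typing import Optional

# Hypothetical labels for this sketch; they are not collector identifiers.
TABLE_LIKE = {"jdbc", "bigquery"}
OBJECT_LEVEL = {"azure": "blob", "s3": "s3_object"}


def source_relationship_level(
    connection: str, source_type: str, sorts_or_filters: bool = False
) -> Optional[str]:
    """Return the level of the task-to-source relationship, or None if unsupported."""
    if source_type not in {"query", "single", "multiple"}:
        raise ValueError(f"unknown source type: {source_type}")
    if connection in TABLE_LIKE:
        # Query sources are always associated at table level.
        if source_type == "query":
            return "table"
        # Single and multiple object sources switch to column level
        # only when sorting and/or filtering is used.
        return "column" if sorts_or_filters else "table"
    if connection in OBJECT_LEVEL:
        # Azure and S3 are listed only for single object sources.
        return OBJECT_LEVEL[connection] if source_type == "single" else None
    # Salesforce is not supported by the collector. Treating any other
    # connection type the same way is an assumption of this sketch.
    return None
```

For example, `source_relationship_level("jdbc", "single", sorts_or_filters=True)` yields the column level, while the same call without sorting or filtering yields the table level.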

Relationships for Targets in Mapping tasks

The following table details the connections and operations supported for mapping tasks, specifically for different types of data sources (e.g., JDBC, BigQuery, Azure, S3, Salesforce). It outlines how tasks are associated with new and existing objects during various operations like Insert, Update, Upsert, Delete, and Data-Driven, and specifies conditions where certain associations are created or skipped.

Table 4.

Connection type	New object. Operation: Insert	Existing object. Operation: Insert	Existing object. Operation: Update, Upsert, Delete, and Data-Driven
JDBC Connections	Table level (table was generated by task)	Table level (task is associated with table)	Table level (task is associated with table). Note: Skipped if target update table columns are specified. Column level (task is associated with column). Note: Created only if target update table columns are specified
BigQuery Connection	Table level (BigQuery table was generated by task)	Table level (task is associated with BigQuery table)	Table level (task is associated with BigQuery table). Note: Skipped if target update table columns are specified. Column level (task is associated with column). Note: Created only if target update table columns are specified
Azure Connection	BLOB level (Azure BLOB was generated by task)	BLOB level (task is associated with Azure BLOB)	Not supported in Informatica CDI
S3 Connection	S3 Object level (S3 Object was generated by task)	S3 Object level (task is associated with S3 Object)	S3 Object level (task is associated with S3 Object)
Salesforce Connection	Not supported by Collector	Not supported by Collector	Not supported by Collector
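
As a compact restatement of the target rules in the table above, the sketch below maps a connection type and operation to a relationship level and direction. It is a hedged illustration only: the function, its parameters, and the string labels are hypothetical and do not correspond to any collector API.

```python
from typing import Optional, Tuple

# Operations grouped in the last column of the table.
UPDATE_OPERATIONS = {"update", "upsert", "delete", "data-driven"}


def target_relationship(
    connection: str,
    operation: str,
    new_object: bool = False,
    update_columns_specified: bool = False,
) -> Optional[Tuple[str, str]]:
    """Return (level, relationship) for a mapping task target, or None if unsupported."""
    op = operation.lower()
    if op != "insert" and op not in UPDATE_OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    if connection in {"jdbc", "bigquery"}:
        if op == "insert":
            # A new object is generated by the task; an existing one is associated.
            return ("table", "was generated by" if new_object else "is associated with")
        # new_object is ignored here: the table only lists existing objects
        # for Update, Upsert, Delete, and Data-Driven.
        level = "column" if update_columns_specified else "table"
        return (level, "is associated with")
    if connection in {"azure", "s3"}:
        level = "blob" if connection == "azure" else "s3_object"
        if op == "insert":
            return (level, "was generated by" if new_object else "is associated with")
        # Listed for S3; not supported in Informatica CDI for Azure.
        return (level, "is associated with") if connection == "s3" else None
    # Salesforce targets are not supported by the collector.
    return None
```

The `(level, relationship)` tuple keeps the two dimensions of the table separate: the level depends on the connection type and on whether update columns are specified, while the direction depends on whether the object is new.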

Relationships for Sources in Replication tasks

The following table outlines how relationships are established between tasks and data sources in replication tasks, specifically for different connection types (JDBC, BigQuery, Azure, S3, Salesforce). It describes the conditions under which table-level and column-level relationships are created or skipped, with specific notes for unsupported connection types in Informatica CDI.

Table 5.

Connection type	Relationship
JDBC Connections	If the task property replicate all objects is set to false, the list of source tables is available. Some table columns are used as filters, and the list of table columns is available in this case. Columns marked as excluded in task properties are excluded from the list of table columns to collect in the catalog. Table level (task is associated with table) relationship is created. Note: Skipped if column names list is available and only table name is available. Column level (task is associated with column). Note: Created only if the list of column names is available.
BigQuery Connection	Not supported in Informatica CDI
Azure Connection	Not supported in Informatica CDI
S3 Connection	Not supported in Informatica CDI
Salesforce Connection	Not supported by Collector
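
One possible reading of the JDBC row above is sketched below: excluded columns are dropped first, column-level relationships are created where a column list remains, and a table-level relationship is created where only the table name is known. This interpretation, the function name, and the data shapes are assumptions made for illustration; falling back to table level when every column is excluded is also an assumption.

```python
from typing import Dict, Iterable, List, Tuple


def replication_source_relationships(
    tables: Dict[str, Iterable[str]],
    excluded: Iterable[Tuple[str, str]] = (),
) -> List[Tuple[str, str]]:
    """Return ("column", "table.column") or ("table", "table") pairs."""
    excluded_set = set(excluded)
    relationships: List[Tuple[str, str]] = []
    for table, columns in tables.items():
        # Columns marked as excluded in the task properties are not cataloged.
        kept = [c for c in columns if (table, c) not in excluded_set]
        if kept:
            # A column list is available: create column-level relationships.
            relationships.extend(("column", f"{table}.{c}") for c in kept)
        else:
            # Only the table name is available: create a table-level relationship.
            relationships.append(("table", table))
    return relationships


# Hypothetical task: one table with known columns, one with only a name.
example = replication_source_relationships(
    {"orders": ["id", "amount", "internal_note"], "customers": []},
    excluded=[("orders", "internal_note")],
)
```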

Relationships for Targets in Replication tasks

The following table details how relationships are established between tasks and target data sources in replication tasks. It specifies the conditions under which table-level and column-level relationships are created for various connection types, and notes the connection types that are not supported by Informatica CDI or by the collector.

Table 6.

Connection type	Relationship
JDBC Connections	If the task property replicate all objects is set to false, the list of source tables is available. Some table columns are used as filters, and the list of table columns is available in this case. Columns marked as excluded in task properties are excluded from the list of table columns to collect in the catalog. Table level (task is associated with table) relationship is created. Note: Skipped if column names list is available and only table name is available. Column level (task is associated with column). Note: Created only if the list of column names is available.
BigQuery Connection	Not supported in Informatica CDI
Azure Connection	Not supported in Informatica CDI
S3 Connection	Not supported in Informatica CDI
Salesforce Connection	Not supported by Collector

Relationships for Sources in Synchronization tasks

The following table explains how relationships are established for sources in synchronization tasks, detailing the support level for various connection types (JDBC, BigQuery, Azure, S3, Salesforce). It specifies whether saved query sources, single object sources, and multiple object sources are supported. Most connection types are not supported for synchronization tasks, either by Informatica CDI or by the collector.

Table 7.

Connection type	Saved query source type	Single object source type	Multiple object source type
JDBC Connections	Not supported by Collector	Column level (task is associated with column)	Column level (task is associated with column)
BigQuery Connection	Not supported in Informatica CDI	Not supported in Informatica CDI	Not supported in Informatica CDI
Azure Connection	Not supported in Informatica CDI	Not supported in Informatica CDI	Not supported in Informatica CDI
S3 Connection	Not supported in Informatica CDI	Not supported in Informatica CDI	Not supported in Informatica CDI
Salesforce Connection	Not supported by Collector	Not supported by Collector	Not supported by Collector

Relationships for Targets in Synchronization tasks

The following table details how relationships are established between tasks and target data sources in synchronization tasks for various connection types.

Table 8.

Connection type	Relationship
JDBC Connections	Column level (task is associated with column)
BigQuery Connection	Not supported in Informatica CDI
Azure Connection	Not supported in Informatica CDI
S3 Connection	Not supported in Informatica CDI
Salesforce Connection	Not supported by Collector

Lineage for Informatica CDI

Table 9.

Object	Lineage available
DB Column	Synchronization tasks: The collector identifies the target DB columns that source data from source DB columns. Note: Applies when the source and target are JDBC connections (Redshift, MS SQL Server, Netezza, Snowflake, MySQL, Oracle, Postgres, Teradata). Replication tasks: The collector identifies the target DB columns that source data from source DB columns. Note: Applies when the source and target are JDBC connections (Redshift, MS SQL Server, Netezza, Snowflake, MySQL, Oracle, Postgres, Teradata).

Authentication supported

The Informatica CDI collector supports username and password authentication.
