About the Informatica Cloud Data Integration (CDI) collector
Important
This collector is available in Private Preview. If you would like access to this collector, please contact your Customer Success Director.
The Informatica CDI collector harvests metadata from Informatica Cloud Data Integration (CDI), such as tasks, mappings, mapplets, taskflows, workflows, and connections. The collector also harvests lineage information between database tables, columns, S3 objects, and Azure BLOBs from connections, task targets, and sources.
Note
The latest version of the Collector is 2.239. To view the release notes for this version and all previous versions, please go here.
What is cataloged
The collector catalogs the following information.
Note
Cataloging of the Python transformation is not supported.
Object | Information cataloged |
---|---|
Mapping task | ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Is Valid Mapping Task |
Dynamic mapping task | ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Is Mapping Task Valid |
Masking task | ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Source Object Name, Target Operation |
Data transfer task | ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag |
Replication task | ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Table Prefix |
Synchronization task | ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Parameter File Name, Parameter File Type, Tag, Operation, Target Object, Preprocessing Command, Postprocessing Command |
PowerCenter task | ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Tag |
Mass ingestion task | ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Tag |
Mapping | ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time, Is Mapping Valid |
Mapplet | ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time |
Taskflow | ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time |
Workflow | ID v2, ID v3, Title, Description, Path, Update By, Update Time, Created By, Create Time |
Connection | ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID |
JDBC connection (Redshift, MS SQL Server, Netezza, Snowflake, MySQL, Oracle, Postgres, Teradata) | ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Host, Port, Database, Schema, DB User Name, JDBC URL |
ODBC connection | ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, ODBC Subtype, Authentication Type, Client ID, Host, Port, Database, Schema, DB User Name, JDBC URL |
Amazon S3 v2 connection | ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, ODBC Subtype, Authentication Type, Client ID, AWS Region Name, Bucket Name, S3 Account Type, IAM Role ARN |
BigQuery v2 connection | ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Project ID, Dataset Name |
Flat file connection | ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Directory, Date Format, Code Page |
Azure v1 connection | ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account Name, Client ID, Informatica Type, Informatica Subtype |
Azure v2 connection | ID v2, ID v3, Title, Description, Update By, Update Time, Created By, Create Time, Service URL, Account User Name, Account Name, Informatica Type, Informatica Subtype, Authentication Type, Client ID, Tenant ID, File System Name |
Job | ID v2, Title, Ended At Time, Started At Time, Was Started By, Error Message, Failed Source Rows, Failed Target Rows, Job State Code, Success Source Rows, Success Target Rows |
Runtime Environment | ID v3, Title, Description, Update By, Update Time, Created By, Create Time |
Secure Agent | ID v2, ID v3, Title, Description, Update By, Update Time, Agent Host, Agent Version, Platform |
Shared Sequence | ID v2, ID v3, Title, Path, Description, Update By, Update Time |
Saved Query | ID v2, ID v3, Title, Path, Description, Update By, Update Time |
User Defined Function | ID v2, ID v3, Title, Path, Description, Update By, Update Time |
Relationships between objects
By default, the data.world catalog includes catalog pages for the following resource types. Each catalog page has relationships to the related resource types. The catalog presentation and relationships are fully configurable, so the table below reflects only the default configuration.
Resource page | Relationship |
---|---|
Mapping task | |
Dynamic mapping task | |
Masking task | |
Data transfer task | |
Replication task | |
Synchronization task | |
Mapping | |
Mapplet | |
Taskflow | |
JDBC connection (Redshift, MS SQL Server, Netezza, Snowflake, MySQL, Oracle, Postgres, Teradata) | |
ODBC connection | |
Amazon S3 v2 connection | |
BigQuery connection | |
Flat file connection | |
Azure connection | |
Job | |
Runtime Environment | |
Secure Agent | |
Shared Sequence | |
Saved Query | |
User Defined Function | |
DB Column | |
DB Table | |
BigQuery column | |
BigQuery Table | |
S3 object | |
Azure BLOB | |
Relationships between tasks and objects
Relationships for Sources in Mapping tasks
The following table outlines the types of relationships that can be established between tasks and data sources in mapping tasks. It specifies the supported connection types (e.g., JDBC, BigQuery, Azure, S3, Salesforce), the levels at which tasks are associated with these sources, and any conditions or exceptions, such as sorting/filtering rules.
Connection type | Query source type | Single object source type | Multiple object source type |
---|---|---|---|
JDBC Connections | Table level (task is associated with table) | | |
BigQuery Connection | Table level (task is associated with BigQuery table) | | |
Azure Connection | Not supported in Informatica CDI | | Not supported in Informatica CDI |
S3 Connection | Not supported in Informatica CDI | | Not supported in Informatica CDI |
Salesforce Connection | Not supported by Collector | Not supported by Collector | Not supported by Collector |
Relationships for Targets in Mapping tasks
The following table details how relationships are established between tasks and targets in mapping tasks for each connection type (JDBC, BigQuery, Azure, S3, Salesforce). It outlines how tasks are associated with new and existing objects for the Insert, Update, Upsert, Delete, and Data-Driven operations, and specifies the conditions under which associations are created or skipped.
Connection type | New object. Operation: Insert | Existing object. Operation: Insert | Existing object. Operation: Update, Upsert, Delete, and Data-Driven |
---|---|---|---|
JDBC Connections | | | |
BigQuery Connection | | | |
Azure Connection | | | Not supported in Informatica CDI |
S3 Connection | | | |
Salesforce Connection | Not supported by Collector | Not supported by Collector | Not supported by Collector |
Relationships for Sources in Replication tasks
The following table outlines how relationships are established between tasks and data sources in replication tasks for each connection type (JDBC, BigQuery, Azure, S3, Salesforce). It describes the conditions under which table-level and column-level relationships are created or skipped, and notes the connection types that are not supported in Informatica CDI or by the collector.
Connection type | Relationship |
---|---|
JDBC Connections | If the task property replicate all objects is set to false, the list of source tables is available. Some table columns are used as filters, and the list of table columns is available in this case. Columns marked as excluded in task properties are excluded from the list of table columns to collect in the catalog. |
BigQuery Connection | Not supported in Informatica CDI |
Azure Connection | Not supported in Informatica CDI |
S3 Connection | Not supported in Informatica CDI |
Salesforce Connection | Not supported by Collector |
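The JDBC rule in the table above can be illustrated with a short Python sketch. This is not the collector's implementation: the `ReplicationTask` class, its field names, and the `columns_to_catalog` helper are hypothetical, chosen only to mirror the task properties the table describes. The sketch also assumes that nothing is collected when replicate all objects is true, since the table says the list of source tables is available only when it is false.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationTask:
    """Hypothetical, simplified view of a replication task's properties."""
    replicate_all_objects: bool
    # table name -> column names known for that table
    source_tables: dict[str, list[str]] = field(default_factory=dict)
    # table name -> columns marked as excluded in the task properties
    excluded_columns: dict[str, set[str]] = field(default_factory=dict)


def columns_to_catalog(task: ReplicationTask) -> dict[str, list[str]]:
    """Return the columns to collect per table, or {} when no table list exists."""
    if task.replicate_all_objects:
        # Assumption: without an explicit table list there is nothing to relate.
        return {}
    return {
        table: [
            column
            for column in columns
            if column not in task.excluded_columns.get(table, set())
        ]
        for table, columns in task.source_tables.items()
    }


task = ReplicationTask(
    replicate_all_objects=False,
    source_tables={"orders": ["id", "amount", "internal_note"]},
    excluded_columns={"orders": {"internal_note"}},
)
print(columns_to_catalog(task))  # {'orders': ['id', 'amount']}
```

In this example the excluded column `internal_note` is dropped, so only `id` and `amount` would be candidates for column-level relationships.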
Relationships for Targets in Replication tasks
The following table details how relationships are established between tasks and target data sources in replication tasks. It specifies the conditions under which table-level and column-level relationships are created for each connection type, and notes the connection types that are not supported in Informatica CDI or by the collector.
Connection type | Relationship |
---|---|
JDBC Connections | If the task property replicate all objects is set to false, the list of source tables is available. Some table columns are used as filters, and the list of table columns is available in this case. Columns marked as excluded in task properties are excluded from the list of table columns to collect in the catalog. |
BigQuery Connection | Not supported in Informatica CDI |
Azure Connection | Not supported in Informatica CDI |
S3 Connection | Not supported in Informatica CDI |
Salesforce Connection | Not supported by Collector |
Relationships for Sources in Synchronization tasks
The following table explains how relationships are established for sources in synchronization tasks for each connection type (JDBC, BigQuery, Azure, S3, Salesforce). It specifies whether saved query sources, single object sources, and multiple object sources are supported. Only JDBC connections produce relationships; the other connection types are either not supported for synchronization tasks in Informatica CDI or, in the case of Salesforce, not supported by the collector.
Connection type | Saved query source type | Single object source type | Multiple object source type |
---|---|---|---|
JDBC Connections | Not supported by Collector | Column level (task is associated with column) | Column level (task is associated with column) |
BigQuery Connection | Not supported in Informatica CDI | Not supported in Informatica CDI | Not supported in Informatica CDI |
Azure Connection | Not supported in Informatica CDI | Not supported in Informatica CDI | Not supported in Informatica CDI |
S3 Connection | Not supported in Informatica CDI | Not supported in Informatica CDI | Not supported in Informatica CDI |
Salesforce Connection | Not supported by Collector | Not supported by Collector | Not supported by Collector |
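For readers who want to check these combinations programmatically, the table above can be encoded as a small lookup. This is a minimal sketch: the names `SYNC_SOURCE_SUPPORT`, `sync_source_relationship`, and `creates_relationship` are hypothetical, the cell values are copied from the table, and nothing here reflects the collector's internal code.

```python
NOT_IN_CDI = "Not supported in Informatica CDI"
NOT_BY_COLLECTOR = "Not supported by Collector"
COLUMN_LEVEL = "Column level (task is associated with column)"

# Column order of the table above.
SOURCE_TYPES = ("saved query", "single object", "multiple object")

# One row per connection type, values in SOURCE_TYPES order.
SYNC_SOURCE_SUPPORT = {
    "JDBC": (NOT_BY_COLLECTOR, COLUMN_LEVEL, COLUMN_LEVEL),
    "BigQuery": (NOT_IN_CDI,) * 3,
    "Azure": (NOT_IN_CDI,) * 3,
    "S3": (NOT_IN_CDI,) * 3,
    "Salesforce": (NOT_BY_COLLECTOR,) * 3,
}


def sync_source_relationship(connection_type: str, source_type: str) -> str:
    """Return the table cell for a connection type and source type."""
    row = SYNC_SOURCE_SUPPORT[connection_type]
    return row[SOURCE_TYPES.index(source_type)]


def creates_relationship(connection_type: str, source_type: str) -> bool:
    """True only when the table lists a column-level relationship."""
    return sync_source_relationship(connection_type, source_type) == COLUMN_LEVEL


print(creates_relationship("JDBC", "single object"))
print(creates_relationship("JDBC", "saved query"))
```

Keeping the cell text as the stored value, rather than a bare boolean, preserves the distinction the table draws between combinations that Informatica CDI does not offer and combinations the collector does not harvest.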
Relationships for Targets in Synchronization tasks
The following table details how relationships are established between tasks and target data sources in synchronization tasks for various connection types.
Connection type | Relationship |
---|---|
JDBC Connections | Column level (task is associated with column) |
BigQuery Connection | Not supported in Informatica CDI |
Azure Connection | Not supported in Informatica CDI |
S3 Connection | Not supported in Informatica CDI |
Salesforce Connection | Not supported by Collector |
Lineage for Informatica CDI
Object | Lineage available |
---|---|
DB Column | |
Authentication supported
The Informatica CDI collector supports username and password authentication.