Skip to main content

About the Microsoft SQL Server collector

Use this collector for the following data sources:

  • Microsoft SQL Server

  • Azure Synapse Analytics: Note that the collector currently only supports SQL databases with dedicated SQL pools.

The collector is suitable for both self-hosted Microsoft SQL Server instances and managed Microsoft SQL Server instances, such as those hosted on AWS RDS or Azure SQL.

Important

The Microsoft SQL Server collector can be run in the Cloud or on-premise using Docker or Jar files.

Note

The latest version of the Collector is 2.251. To view the release notes for this version and all previous versions, please go here.

What is cataloged

The collector catalogs the following information.

Note

The collector harvests all versions of overloaded functions and stored procedures. Each version has its own title/name in the catalog, but a distinct identifier.

Table 1.

Object

Information cataloged

Columns

Name, Description, JDBC type, Column Type, Is Nullable, Default Value, Key type (Primary, foreign), Column size, Column index

Table

Name, Description, Primary key, Schema

Extended metadata: Created date, Modified date

Views

Name, Description, SQL definition

Materialized View

Name, description

Schema

Identifier, Name

Extended metadata: Created date, Modified date

Database

Type, Name, Identifier, Server, Port, Environment, JDBC URL

Functions

Name, Description, Function Type

Stored Procedures

Name, Description, Stored Procedure Type

Extended metadata: Definition, Created, Last modified



Relationship between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 3.

Resource page

Relationship

Table

Columns

View

Schema that contains Views, Columns that are part of Views

Materialized View

Schema that contains Materialized Views, Columns that are part of Materialized Views

Columns

Table

Schema

Database that contains Schema, Table that is part of Schema, View that is part of Schema, Materialized View that is part of Schema

Database

Schema contained in Database



Lineage and dependencies for Microsoft SQL Server

Lineage

The following lineage information is collected by the Microsoft SQL Server collector. This lineage information is available only for the target server and databases specified while running the collector. Harvesting lineage from referenced objects located in another server is not supported.

Table 4.

Object

Lineage available

View

The collector identifies the associated column in an upstream view or table:

  • Where the data is sourced from

  • That sort the rows via ORDER BY

  • That filter the rows via WHERE/HAVING

  • That aggregate the rows via GROUP BY

Note: For Views, the collector first tries to parse the view SQL to harvest lineage metadata. If the SQL parser of the collector cannot parse the view SQL, the collector will catalog some lineage relationships using the dm_sql_referencing_entities system function, when available. For each row in the referenced entities, if is_selected or is_select_all is true, the collector will catalog a relationship between the referencing entity and the database column. This function is not supported by Azure Synapse.

Stored Procedure

The collector identifies:

  • The associated column in an upstream view or table:

    • Where the data is sourced from

    • That sort the rows via ORDER BY

    • That filter the rows via WHERE/HAVING

    • That aggregate the rows via GROUP BY

  • The downstream table that has its tables updated.

Note: The following stored procedures are not supported:

  • Stored procedures with multitable inserts. However, multiple separate insert statements that insert into one table are supported.

  • Stored procedure with multiple SELECT and INSERT statements not separated by a semicolon delimiter

Note: For Stored Procedures, the collector attempts to parse INSERT, UPDATE, and SELECT statements within the procedure definition. Additionally, the collector catalogs some lineage relationships using the dm_sql_referencing_entities system function, when available. This function is used to retrieve columns and tables which are referenced by views or stored procedures. For each row in the referenced entities, if is_selected or is_select_all is true, the collector will catalog a relationship between the referencing entity and the database column. For each row in the referenced entities, if is_updated or is_insert_all is true, the collector will catalog a relationship between the database table which the referenced column is part of and the referencing entity. This function is not supported by Azure Synapse.



Dependencies

The SQL Server collector catalogs dependencies between tables, views, and stored procedures using  sys.sql_expression_dependencies.

Dependencies are created when one entity appears by name in a persisted SQL expression of another entity. For each row in this view, if the referenced or referencing entity is a view, table, or stored procedure, the collector catalogs a dependency relationship between the two objects. See the Microsoft documentation for more information about the system dependencies view.

Authentication supported

  • The collector supports the following authentication methods to Azure Synapse Analytics and Microsoft SQL Server - username/password authentication, NTLM authentication, and Microsoft Entra authentication methods such Active directory service principal and Active directory password authentication.