About Sensitive Data Discovery

Warning

This is a Beta feature and is not generally available to all customers at this time. Using this feature requires the purchase of an add-on. Contact your Customer Success specialist for details.

What is Sensitive Data Discovery?

The Sensitive Data Discovery collector (DWCC-SDD) uses machine learning to automatically detect and tag sensitive data in the system, such as Personally Identifiable Information (PII), Protected Health Information (PHI), and Payment Card Industry (PCI) data. Once the data is detected and tagged, users can see the tags whenever they work with the data, so they know that it is sensitive.

The collector can scan any table that can be queried with SQL and that contains at least one row of data. There is no limit on the number of tables. Tables are scanned at the column level.
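To make "scanned at the column level" concrete, here is a minimal sketch of how per-column samples could be gathered from SQL-queryable tables. It is an illustration only, not the DWCC-SDD implementation: it uses Python's built-in sqlite3 module as a stand-in source, and the function names (sample_columns, demo), the sample size, and the example tables are all hypothetical.

```python
import sqlite3


def sample_columns(conn, sample_size=5):
    """Return {table: {column: [sampled values]}} for every non-empty table.

    Hypothetical helper: sample_size and the return shape are assumptions
    made for this sketch, not documented DWCC-SDD behavior.
    """
    samples = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for table in tables:
        # Quote the identifier, doubling any embedded double quotes.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (sample_size,))
        rows = cursor.fetchall()
        if not rows:
            continue  # A table needs at least one row to be scanned.
        columns = [description[0] for description in cursor.description]
        # Regroup the sampled rows so each column's values sit together.
        samples[table] = {
            column: [row[index] for row in rows]
            for index, column in enumerate(columns)
        }
    return samples


def demo():
    """Build a throwaway in-memory database and sample it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
    conn.execute("CREATE TABLE empty_table (note TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("Ada Example", "ada@example.com"), ("Bob Example", "bob@example.com")],
    )
    return sample_columns(conn)
```

Calling demo() returns samples for the customers table only, grouped by column; empty_table is skipped because it has no rows. Each column's sampled values would then be the unit that a detector inspects and tags.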

Note

The Sensitive Data Discovery collector does not scan live connected datasets or virtualized data assets.

DWCC-SDD uses private.ai internally to process and tag the data. The private.ai virtual machine runs in the customer's environment to discover the sensitive data. The collector runs on sample data from the data source. That sample data is never stored in the application and is never shared with private.ai.

Sensitive_Data_Discovery_diagram.png

Key Features

There are four key features of Sensitive Data Discovery:

  • Scan: Scan your different data sources. The tool is pre-trained using machine learning to identify 30+ sensitive data types out-of-the-box.

  • Classify: Classifications let you distinguish between sensitive data types and define the rules for working with each type. "Confidential", for example, may have a specific meaning within your organization. Applying a Confidential classification allows you to apply your own business logic to the data.

  • Take action: All information is fully reportable. You can create a report that shows a tabular view of all of your assets and the sensitive data types and classifications that are applied to them.

  • Integrate: You can then export reports to your favorite BI tool to leverage as part of a broader system or initiative.
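As an illustration of the tabular report described above, the following sketch renders tagged columns as CSV, a format most BI tools can import. It is not the product's report format: the function name build_report, the column headers, and the sample rows are hypothetical. Only the entity type names (EMAIL_ADDRESS, CREDIT_CARD) come from the list of scanned entity types.

```python
import csv
import io


def build_report(tagged_columns):
    """Render tagged columns as CSV text, one row per tagged column.

    Hypothetical helper: the input shape and the header names are
    assumptions made for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["table", "column", "sensitive_data_type", "classification"])
    # Sort so the report is stable regardless of input order.
    for row in sorted(tagged_columns, key=lambda r: (r["table"], r["column"])):
        writer.writerow(
            [row["table"], row["column"], row["entity_type"], row["classification"]]
        )
    return buffer.getvalue()


# Hypothetical scan results, for illustration only.
EXAMPLE_TAGS = [
    {
        "table": "payments",
        "column": "card_number",
        "entity_type": "CREDIT_CARD",
        "classification": "Confidential",
    },
    {
        "table": "customers",
        "column": "email",
        "entity_type": "EMAIL_ADDRESS",
        "classification": "Confidential",
    },
]

report_text = build_report(EXAMPLE_TAGS)
```

Here report_text holds a header line followed by one line per tagged column, sorted by table and column name.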

Scanned entity types

DWCC-SDD scans the following entity types. This is a subset of the entity types supported by private.ai. For a description of these entity types, see the Private AI documentation.

  • LOCATION_ADDRESS

  • DATE

  • EMAIL_ADDRESS

  • SSN

  • NAME

  • PASSPORT_NUMBER

  • NUMERICAL_PII

  • ORGANIZATION

  • OCCUPATION

  • ORIGIN

  • PASSWORD

  • PHYSICAL_ATTRIBUTE

  • POLITICAL_AFFILIATION

  • RELIGION

  • TIME

  • URL

  • ZODIAC_SIGN

  • CREDIT_CARD

  • CREDIT_CARD_EXPIRATION

  • CVV

  • BANK_ACCOUNT

  • ROUTING_NUMBER

  • ID_NUMBER

  • IP_ADDRESS

  • USERNAME

  • HEALTHCARE_NUMBER

  • BLOOD_TYPE

  • MEDICAL_CONDITION

  • DRUG

  • INJURY

  • MEDICAL_PROCESS

  • MEDICAL_OTHER

  • MEDICAL_STATISTICS

Supported sources

The Sensitive Data Discovery collector can be run on the following sources:

  • Amazon Athena

  • Amazon Redshift

  • Google BigQuery

  • PostgreSQL

  • Snowflake

Supported languages for the content processed from the source

The Collector can be run on content in various languages. For the complete list of languages supported, see the Private AI documentation.

Important things to note

  • Before you start running the Sensitive Data Discovery collector, ensure that your organization has a database that has already been cataloged.

  • The Sensitive Data Discovery collector does not work on datasets that have already been imported and virtualized into the system. Run it along with the regular Collector so that data is tagged as it comes into the application.

  • Re-run the Sensitive Data Discovery collector every time new data is brought into the application using the regular Collector.