About Sensitive Data Discovery
Warning
This is a Beta feature and is not generally available to all customers at this time. Using it requires purchasing an add-on. Contact your Customer Success specialist for details.
What is Sensitive Data Discovery?
The Sensitive Data Discovery collector (DWCC-SDD) uses machine learning to automatically detect and tag sensitive data in the system, such as Personally Identifiable Information (PII), Protected Health Information (PHI), and Payment Card Industry (PCI) data. Once the data is detected and tagged, users can see the tags whenever they work with the data, so they know it is sensitive.
The collector can scan any table that can be queried with SQL and has at least one row of data. There is no limit on the number of tables it can scan. Tables are scanned at the column level.
Note
The Sensitive Data Discovery collector does not scan live connected datasets or virtualized data assets.
DWCC-SDD uses private.ai internally to process and tag the data. The private.ai virtual machine runs in the customer's environment to discover the sensitive data. The collector runs on sample data from the data source; that sample data is never stored in the application and is never shared with private.ai.
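The scanning scope described above (SQL-queryable tables with at least one row, sampled column by column) can be illustrated with a minimal sketch. It is not the collector's implementation: it uses an in-memory SQLite database as a stand-in source, and the function names scannable_tables and sample_columns, the table names, and the sample size are all assumptions made for illustration.

```python
import sqlite3


def scannable_tables(conn):
    """Return the names of tables that hold at least one row."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    eligible = []
    for name in names:
        # Table names come from the database catalog itself, so quoting them
        # as identifiers is sufficient for this illustration.
        first_row = conn.execute(f'SELECT 1 FROM "{name}" LIMIT 1').fetchone()
        if first_row is not None:
            eligible.append(name)
    return eligible


def sample_columns(conn, table, limit=5):
    """Return {column_name: [sampled values]} for one table.

    The sample lives only in memory and is discarded by the caller,
    mirroring the idea that sample data is not stored.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT {int(limit)}')
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    return {col: [row[i] for row in rows] for i, col in enumerate(columns)}


# Hypothetical stand-in for a cataloged source.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (name TEXT, email TEXT);
    CREATE TABLE empty_table (id INTEGER);
    INSERT INTO customers VALUES ('Ada Example', 'ada@example.com');
    """
)

eligible = scannable_tables(conn)
samples = {table: sample_columns(conn, table) for table in eligible}
```

In this sketch, empty_table is skipped because it has no rows, and each remaining table yields one list of sampled values per column, which is the unit a column-level scan would inspect.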

Key Features
There are four key features of Sensitive Data Discovery:
Scan: Scan your data sources. The tool is pre-trained with machine learning to identify more than 30 sensitive data types out of the box.
Classify: Classifications let you distinguish between sensitive data types and define rules for how each type should be handled. "Confidential", for example, may have a specific meaning within your organization. Applying a Confidential classification lets you attach your own business logic to the data.
Take action: All information is fully reportable. You can create a report that shows a tabular view of all of your assets and the sensitive data types and classifications that are applied to them.
Integrate: You can export reports to your preferred BI tool and use them as part of a broader system or initiative.
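The Classify and Take action features can be pictured with a short sketch that maps detected sensitive data types to organization-defined classifications and writes a tabular report as CSV, a format most BI tools can import. Everything here is hypothetical: the classification names, the rule mapping, the findings, and the function names classify and build_report are illustrative assumptions, not the product's API or export format.

```python
import csv
import io

# Hypothetical rules: which classification an organization attaches to
# each detected sensitive data type.
CLASSIFICATION_RULES = {
    "SSN": "Restricted",
    "CREDIT_CARD": "Restricted",
    "EMAIL_ADDRESS": "Confidential",
    "NAME": "Confidential",
}
DEFAULT_CLASSIFICATION = "Internal"


def classify(entity_type):
    """Look up the classification for a detected sensitive data type."""
    return CLASSIFICATION_RULES.get(entity_type, DEFAULT_CLASSIFICATION)


def build_report(findings):
    """Build CSV text from (table, column, entity_type) tuples."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["table", "column", "sensitive_data_type", "classification"])
    for table, column, entity_type in sorted(findings):
        writer.writerow([table, column, entity_type, classify(entity_type)])
    return buffer.getvalue()


# Hypothetical scan findings.
findings = [
    ("customers", "email", "EMAIL_ADDRESS"),
    ("customers", "ssn", "SSN"),
    ("visits", "visit_date", "DATE"),
]

report = build_report(findings)
print(report)
```

Keeping the rules in a plain mapping separates detection (what the data is) from classification (how the organization wants it handled), so the rules can change without rescanning.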
Scanned entity types
DWCC-SDD scans the following entity types. This is a subset of the entity types supported by private.ai. For a description of these entity types, see the Private AI documentation.
LOCATION_ADDRESS
DATE
EMAIL_ADDRESS
SSN
NAME
PASSPORT_NUMBER
NUMERICAL_PII
ORGANIZATION
OCCUPATION
ORIGIN
PASSWORD
PHYSICAL_ATTRIBUTE
POLITICAL_AFFILIATION
RELIGION
TIME
URL
ZODIAC_SIGN
CREDIT_CARD
CREDIT_CARD_EXPIRATION
CVV
BANK_ACCOUNT
ROUTING_NUMBER
ID_NUMBER
IP_ADDRESS
USERNAME
HEALTHCARE_NUMBER
BLOOD_TYPE
MEDICAL_CONDITION
DRUG
INJURY
MEDICAL_PROCESS
MEDICAL_OTHER
MEDICAL_STATISTICS
Supported sources
The Sensitive Data Discovery collector can be run on the following sources:
Amazon Athena
Amazon Redshift
Google BigQuery
PostgreSQL
Snowflake
Supported languages for the content processed from the source
The collector can process content in multiple languages. For the complete list of supported languages, see the Private AI documentation.
Important things to note
Before you start running the Sensitive Data Discovery collector, ensure that your organization has a database that has already been cataloged.
The Sensitive Data Discovery collector does not work on datasets that have already been imported and virtualized into the system. Run it alongside the regular collector so that data is tagged as it comes into the application.
Re-run the Sensitive Data Discovery collector every time the regular collector brings new data into the application.