Archie FAQ

What data does Archie send to the AI model?

Depending on the task, Archie wraps the needed contextual metadata from the knowledge graph (including resource type, description, raw SQL, etc.) into a prompt that it sends to our internally hosted AI model. Where applicable (for example, text-to-SQL), Archie also includes user inputs in the prompt to improve responses.

Does data.world use any customer data to train Archie?

No. Archie is powered by a foundation model. We do not train the AI model or Archie with Customer Data, Inputs, or Outputs. We assess the accuracy of Archie Outputs via Feedback (within the app and via support) as well as benchmark testing with our own test metadata.

Is the data processed through Archie encrypted?

Yes. We employ industry-standard security practices designed to ensure that data are encrypted both in transit on the network and at rest.

What terms apply to the use of Archie?

Customers and their users who use our platform and features, including Archie, agree to abide by our Acceptable Use Policy and the Documentation for Archie. Customers remain responsible and liable for the use or distribution of their Inputs and the Outputs generated by Archie. Any Outputs that users decide to keep, manage, and maintain are the sole responsibility of those users.

Does Archie process personal data or other sensitive data?

Archie does not require raw data to function, nor does it attempt to mask personal or sensitive data. Your data.world catalog is not structured to hold personal or sensitive data; rather, it catalogs metadata, which does not typically include any personal or sensitive data. However, if you decide to add personal or sensitive data to your data.world catalog, Archie will process it.

What is the data retention policy for Archie?

We store the most recent Archie Chat conversations for the convenience of the user. These conversations are available from the UI. Retention of Archie Enrich Outputs that a user has decided to save to their graph (for example, a saved suggested description) is at the customer’s discretion. All user event logs, including those created using Archie, follow the same retention policy outlined in our Policies and Guidelines.

How does data.world manage security vulnerabilities for Archie?

Access to data.world is controlled via SSO, and data processed by Archie is not accessible to or shared with third-party generative AI systems. Our identity and access management controls are designed to prevent unauthorized access to data across customer accounts, and we employ industry-standard security practices designed to ensure that data are encrypted both in transit on the network and at rest. Information is passed to our private instance of the AI model only as transient state.

How does data.world manage the performance and scale for Archie?

data.world runs an internally hosted LLM distributed across multiple nodes. We apply standard rate limits and usage thresholds designed to govern access and isolate user traffic, maintaining uptime and performance across our user base.

Does data.world use user interactions and data exchange with the Generative AI model to train and refine the model?

No. Archie is powered by a foundation model. We do not train Archie with Customer Data, Inputs, or Outputs. We assess the accuracy of Archie Outputs via Feedback (within the app and via support) as well as benchmark testing with our own test metadata.

Does data.world share customer data with third parties to provide the Generative AI services?

No. We do not share Customer Data with third parties to provide the Generative AI services. Archie does not use the underlying data of a customer’s data catalog. Depending on the task, Archie wraps the needed contextual metadata from the knowledge graph (including resource type, description, raw SQL, etc.) into a prompt that it sends to our internally hosted AI model. Where applicable (for example, text-to-SQL), Archie also includes Inputs in the prompt to improve responses.

What specific customer data from data.world is used to guide the Generative AI?

For instance, when suggesting a description for a table or column, is it based on just the resource name, other table and column names, other descriptions in the collection, or data from datasets?

Archie does not use the underlying data of a customer’s data catalog. Depending on the task, Archie wraps the needed contextual metadata from the knowledge graph (including resource type, description, raw SQL, etc.) into a prompt that it sends to the model. Where applicable (for example, text-to-SQL), Archie also includes Inputs in the prompt to improve responses.

Does the Archie chat store any customer data?

Chat inputs are saved in the user’s chat window for future use and context. The conversations are not used for foundation model training. Interaction data is used in metrics and insights in order to understand user needs and improve the product (excluding training the foundation model). Customers can archive the chats and submit them to our support team for debugging and feedback.

Does this feature comply with any specific regulations (e.g., GDPR, SOC 2)?

Security: Archie does not weaken our security assurance programs. Read about those programs here. Because Inputs are not accessible to external third-party generative AI systems, our security assurance programs apply to Archie as designed, ensuring the same security measures across all platform data, including Archie Inputs and Outputs.

Privacy: Archie is no different from other features in our platform and is subject to our Privacy Policy.

Is Archie available to all users in my private instance?

Not automatically. Archie features can be turned on for specific users or all users in any organization on your Private Instance. During the Public Preview period, customers should reach out to their customer service representative to discuss their preferred preview method for Archie Chat and/or Archie Enrich features.

Once enabled, Archie features will begin to show up for users entitled to preview these features.

Does Archie respect access control when making suggestions?

Yes. Archie checks user access before suggesting a resource or term and respects those boundaries when responding to questions. If a user does not have access to a resource in the catalog, Archie Chat will not include that resource in its response to the user.

What happens if the chat cannot answer a question?

If an answer seems incorrect and you are fairly certain Archie should know it, try rephrasing the question. Archie is generally good at letting users know about its expected behavior and the limits of its knowledge of the catalog. For example, if you ask Archie to aggregate or compute a metric about catalog resources, it usually explains why it can’t and offers a helpful alternative.

What data does Archie access when I chat with it?

Archie has access to data.world product documentation, data.world website resources (blogs, the Catalog & Cocktails podcast, etc.), and a set of indexed catalog resources (for example, business terms, analytics assets, etc.), enabling it to answer a wide range of user questions.

Are there any usage limits for Archie Chat?

Yes, the free preview comes with usage limits. Contact your Customer Success Director to discuss these limits and additional purchasing options.

Important

data.world reserves the right to throttle usage performance, both in general and depending on the selected package or plan.
