Archie FAQ

What AI model does Archie use?

Archie is powered by our own private instance of Meta Llama 3.

What data does Archie send to the AI model?

Archie does not use data from customer datasets. Depending on the task, we wrap the needed contextual metadata from the knowledge graph (including resource type, description, and raw SQL) into a prompt that we send to the AI model. Where applicable (e.g., text-to-SQL), we also include user inputs in the prompt to get better responses.
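As a rough illustration of what wrapping metadata into a prompt could look like, the sketch below assembles a prompt from a few metadata fields. The `ResourceMetadata` class, the `build_prompt` function, the field names, and the prompt wording are all hypothetical and are not data.world's actual implementation. The point is only that the prompt is built from catalog metadata and, where applicable, user input, rather than from rows in a dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceMetadata:
    """Hypothetical container for catalog metadata (never dataset rows)."""
    resource_type: str
    name: str
    description: Optional[str] = None
    raw_sql: Optional[str] = None


def build_prompt(task: str, meta: ResourceMetadata,
                 user_input: Optional[str] = None) -> str:
    """Assemble a prompt from metadata fields, skipping any that are missing."""
    lines = [
        f"Task: {task}",
        f"Resource type: {meta.resource_type}",
        f"Name: {meta.name}",
    ]
    if meta.description:
        lines.append(f"Description: {meta.description}")
    if meta.raw_sql:
        lines.append(f"Raw SQL: {meta.raw_sql}")
    # User input is only included for tasks that need it (e.g., text-to-SQL).
    if user_input:
        lines.append(f"User input: {user_input}")
    return "\n".join(lines)


# Example: a description-suggestion task that uses metadata only.
meta = ResourceMetadata(
    resource_type="table",
    name="orders",
    raw_sql="SELECT id, total FROM orders",
)
prompt = build_prompt("suggest a description", meta)
```

For a text-to-SQL style task, the same hypothetical function would be called with `user_input` set to the user's question, so that the question travels alongside the metadata in one prompt.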

Does data.world use any customer metadata or data to train Archie?

No. Archie is powered by a generally trained model. We do not “train” the AI model or Archie with customer metadata, data, or inputs. We assess the accuracy of Archie outputs via user feedback (within the app and via support) as well as benchmark testing with our own test metadata.

Is the data processed through Archie encrypted?

Our identity and access management controls are designed to prevent unauthorized access to data across customer accounts, and we employ industry-standard security practices designed to ensure that data are encrypted in transit on the network and at rest. Information is passed along as transient state information to our private instance of the AI model.

What terms apply to the use of Archie?

Customers who use our platform and features, including Archie, agree to abide by our Acceptable Use Policy. By using Archie, customers and their users also agree to abide by Meta’s Llama 3 license. Customers remain responsible and liable for the use or distribution of their inputs and of the outputs generated by Archie. Any outputs that users decide to keep, manage, and maintain are the sole responsibility of those users.

Does Archie process PII or other sensitive data?

Archie does not require raw data to function, nor does it attempt to mask PII or sensitive data. To the extent that your catalog or user inputs contain PII or sensitive data, Archie will process it.

What is the data retention policy for Archie?

User inputs and Archie outputs are stored as logs for diagnostic purposes. Current retention is 7 days. For Archie outputs that a user has decided to save to the user’s graph (e.g., a suggested description, questions, or queries), retention is at the customer’s discretion.
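To make the 7-day window concrete, here is a minimal sketch of how a retention cutoff can be applied to timestamped log entries. Only the 7-day figure comes from this FAQ. The `prune_logs` function, the log entry shape, and the example dates are assumptions for illustration and do not describe how data.world actually stores or purges its logs.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # retention window stated in this FAQ


def prune_logs(logs, now):
    """Return only the log entries that are still inside the retention window."""
    cutoff = now - RETENTION
    return [entry for entry in logs if entry["created_at"] >= cutoff]


# Example with an arbitrary reference time and two hypothetical log entries.
now = datetime(2024, 6, 15, tzinfo=timezone.utc)
logs = [
    {"id": 1, "created_at": now - timedelta(days=8)},  # older than 7 days
    {"id": 2, "created_at": now - timedelta(days=3)},  # within 7 days
]
kept = prune_logs(logs, now)
```

Outputs that a user saves to their graph are outside this window by design, since their retention is at the customer's discretion.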

How does data.world manage security vulnerabilities for Archie?

Access to data.world is controlled via SSO, and this data is not accessible by or given to third-party generative AI systems. Our identity and access management controls are designed to prevent unauthorized access to data across customer accounts, and we employ industry-standard security practices designed to ensure that data are encrypted in transit on the network and at rest. Information is passed along as transient state information to our private instance of the AI model.

How does data.world manage the performance and scale for Archie?

data.world runs an internally hosted LLM distributed across multiple nodes. We have standard rate limiting and usage thresholds in place, which are designed to govern access and isolate user traffic in order to protect uptime and performance across our user base.
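The FAQ does not say which rate-limiting scheme is used. As one common approach, the sketch below shows a per-user token bucket, where each user has an independent allowance so that one user's burst does not consume another's capacity. The `TokenBucket` class, its parameters, and the numbers in the example are assumptions for illustration only, not data.world's implementation.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Per-user token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # Each user starts with a full bucket, stored as (tokens, last_seen).
        self.buckets = defaultdict(lambda: (capacity, clock()))

    def allow(self, user_id):
        """Spend one token for `user_id` if available; return whether allowed."""
        tokens, last = self.buckets[user_id]
        now = self.clock()
        # Refill based on elapsed time, never exceeding the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[user_id] = (tokens - 1, now)
            return True
        self.buckets[user_id] = (tokens, now)
        return False


# Example: 1 request per second sustained, bursts of up to 2 per user.
limiter = TokenBucket(rate=1.0, capacity=2)
first_allowed = limiter.allow("user-a")
```

Because buckets are keyed by user, a user who exhausts their own allowance is throttled without affecting requests from other users.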

Does data.world use user interactions and data exchange with the Generative AI model to train and refine the model?

No. Archie is powered by a generally trained model. We do not “train” Archie with customer metadata, data, or inputs. We assess the accuracy of Archie outputs via user feedback (within the app and via support) as well as benchmark testing with our own test metadata.

Does data.world share customer data with third parties to provide the Generative AI services?

No. We do not share customer data with third parties to provide the Generative AI services. Archie does not use data from customer datasets. Depending on the task, we wrap the needed contextual metadata from the knowledge graph (including resource type, description, and raw SQL) into a prompt that we send to our internally hosted AI model. Where applicable (e.g., text-to-SQL), we also include user inputs in the prompt to get better responses.

What specific customer data from data.world is used to guide the Generative AI?

For instance, when suggesting a description for a table or column, is it based on just the resource name, other table and column names, other descriptions in the collection, or data from datasets?

Archie does not use data from customer datasets. Depending on the task, we wrap the needed contextual metadata from the knowledge graph (including resource type, description, and raw SQL) into a prompt that we send to the model. Where applicable (e.g., text-to-SQL), we also include user inputs in the prompt to get better responses.