Data Analysis: File Security

Introduction

At Glean, we prioritize security, especially when it comes to executing code in a shared environment. Our code sandbox solution for Data Analysis is designed to ensure that all code is executed in a secure, isolated, and controlled environment, safeguarding both user data and system integrity.

How the Sandbox Works

The sandbox is a virtual environment that allows customers to upload and analyze data files, such as spreadsheets or CSVs, using custom code. Each session gets its own isolated instance, so code can be executed safely without impacting other users or systems. Each sandbox is temporary: it is created for a single user session and destroyed afterward so that no data persists.

[Figure: lifecycle of a data analysis request]

[Figure: sandbox architecture]

Key Features

Resource Limits: The sandbox restricts the amount of time, memory, and CPU that code can consume.
Isolation: Each sandbox operates independently, so code execution in one session is isolated from all other users. No network egress is allowed, which ensures that no data can be sent out of the sandbox environment.
Temporary Nature: The sandbox is terminated once a session ends or after a period of inactivity. No code or data is stored beyond the active session.
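
To make the resource limits above more concrete, here is a minimal Python sketch of how time, memory, and CPU caps can be applied to a child process on Linux. It is an illustration only and not Glean's implementation: the function name run_limited and the default values are assumptions chosen for the example, and the real sandbox enforces its limits at the container level.

```python
import resource
import subprocess
import sys


def _lower_limit(kind: int, value: int) -> None:
    """Lower the soft and hard limit for `kind`, never exceeding the current hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_limited(
    code: str,
    wall_seconds: float = 5.0,               # hypothetical wall-clock budget
    cpu_seconds: int = 2,                    # hypothetical CPU-time budget
    memory_bytes: int = 512 * 1024 * 1024,   # hypothetical address-space cap
) -> subprocess.CompletedProcess:
    """Run `code` in a separate Python process with time, CPU, and memory caps.

    Raises subprocess.TimeoutExpired if the wall-clock budget is exceeded.
    """

    def apply_limits() -> None:
        # Runs in the child process just before the interpreter starts.
        _lower_limit(resource.RLIMIT_CPU, cpu_seconds)
        _lower_limit(resource.RLIMIT_AS, memory_bytes)

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env vars
        capture_output=True,
        text=True,
        timeout=wall_seconds,
        preexec_fn=apply_limits,             # POSIX only; not safe in multi-threaded parents
    )
```

Calling run_limited("print(1 + 1)") returns a completed process whose stdout holds the result, while a busy loop is killed by the operating system once it exhausts its CPU budget. Process-level limits like these do not provide network or filesystem isolation on their own, which is why the sandbox described here relies on containers.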

Security Implications

Our sandboxing solution has been carefully designed with security in mind to minimize potential risks. The following controls are implemented to ensure a high level of security:

Network Isolation: Sandboxes are strictly network isolated, preventing data from leaving the sandbox environment. The only allowed network traffic is ingress from authenticated internal systems for code execution purposes.
No Data Sharing: Each sandbox is assigned to a specific session, ensuring that data and code uploaded during that session are not shared across sandboxes. Once the sandbox is destroyed, all session data is wiped.
Resource Monitoring: Each sandbox is containerized and subject to Kubernetes pod resource limits, which are monitored to prevent overuse of system resources. This helps mitigate denial-of-service (DoS) risks by ensuring users cannot consume excessive amounts of memory or CPU.
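
The controls above map onto standard Kubernetes objects. The sketch below builds, as plain Python dictionaries, a pod manifest with resource limits and a NetworkPolicy that denies all egress and admits ingress only from a labeled internal caller. The field names follow the public Kubernetes API, but every label, name, image, and number is a hypothetical placeholder; this is not Glean's actual configuration.

```python
import json

# Hypothetical labels, used only for this illustration.
SANDBOX_LABELS = {"app": "analysis-sandbox"}
DISPATCHER_LABELS = {"role": "sandbox-dispatcher"}


def sandbox_pod(session_id: str) -> dict:
    """Pod manifest for one session, with CPU and memory limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"sandbox-{session_id}", "labels": SANDBOX_LABELS},
        "spec": {
            "restartPolicy": "Never",
            "activeDeadlineSeconds": 3600,           # hypothetical upper bound on pod lifetime
            "automountServiceAccountToken": False,   # no cluster credentials inside the pod
            "containers": [
                {
                    "name": "runner",
                    "image": "example.invalid/sandbox-runner:latest",  # placeholder image
                    "resources": {
                        "requests": {"cpu": "500m", "memory": "512Mi"},
                        "limits": {"cpu": "1", "memory": "1Gi"},
                    },
                }
            ],
        },
    }


def sandbox_network_policy() -> dict:
    """NetworkPolicy: ingress only from the dispatcher, no egress at all."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "sandbox-isolation"},
        "spec": {
            "podSelector": {"matchLabels": SANDBOX_LABELS},
            # Listing "Egress" here while defining no egress rules denies all egress.
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [
                {"from": [{"podSelector": {"matchLabels": DISPATCHER_LABELS}}]}
            ],
        },
    }
```

Serializing either dictionary with json.dumps yields a manifest that Kubernetes tooling can consume, since the API accepts JSON as well as YAML. Note that a NetworkPolicy restricts traffic by source and destination only; authenticating the internal caller, as described above, would be handled separately.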

User Controls and Customization

Users have control over the following aspects of the sandbox environment:

Code Execution: LLM-generated code will be executed in the sandbox. Code can be executed and re-executed during the session until the sandbox is reset or destroyed.
Session Termination: Sandboxes are automatically cleaned up after a session ends, and users can also clear their chat session from the sidebar.

The sandbox always operates within predefined security and resource limits. Within those limits, users control how long they keep their session active and how much data they upload for analysis.
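
The session lifecycle described here (a sandbox lives while its session is active and is cleaned up when the session ends or goes idle) can be sketched as a small registry that tracks the last activity per session. This is a simplified model under stated assumptions: the class name SandboxRegistry, its methods, and the 15-minute default timeout are hypothetical and do not reflect the actual inactivity period or implementation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SandboxRegistry:
    """Tracks active sessions and reports which sandboxes should be destroyed."""

    idle_timeout: float = 900.0                    # hypothetical: 15 minutes of inactivity
    clock: Callable[[], float] = time.monotonic    # injectable so the logic can be tested
    _last_seen: Dict[str, float] = field(default_factory=dict)

    def touch(self, session_id: str) -> None:
        """Record activity (for example, a code execution) for a session."""
        self._last_seen[session_id] = self.clock()

    def end(self, session_id: str) -> None:
        """Session ended or the user cleared the chat: forget the sandbox."""
        self._last_seen.pop(session_id, None)

    def reap_idle(self) -> List[str]:
        """Remove and return every session that has been idle for too long."""
        now = self.clock()
        expired = [
            sid for sid, seen in self._last_seen.items()
            if now - seen >= self.idle_timeout
        ]
        for sid in expired:
            self.end(sid)
        return expired

    def active(self) -> List[str]:
        """Session IDs that still have a live sandbox."""
        return list(self._last_seen)
```

In such a design, a periodic job would call reap_idle() and destroy the sandbox behind each returned session ID, while an explicit session end or chat clear would call end() directly.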

Costs and Efficiency

Our sandbox solution is designed to be both cost-effective and scalable. The sandbox runs on cloud infrastructure, which allows us to optimize resource usage based on demand.

Cost Per Sandbox: There is a fixed monthly cost for the machine on which the sandboxed pods run, typically $35-60/month for most deployments (see the FAQ for details). This makes the sandbox solution highly affordable for customers who need to execute code on demand.
Scalability: A fixed number of sandboxes determines how many user sessions can run in parallel, but we can work with you to scale resourcing up or down as needed.
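
Because the machine cost is fixed and the number of sandboxes is fixed, the cost per parallel session slot is a simple division. The sketch below shows the arithmetic using the $35-60/month range quoted above; the slot count of 4 is a purely hypothetical value chosen for illustration, not a figure from any real deployment.

```python
def cost_per_slot(monthly_cost_usd: float, sandbox_slots: int) -> float:
    """Amortized monthly cost of one parallel sandbox slot."""
    if sandbox_slots <= 0:
        raise ValueError("sandbox_slots must be positive")
    return monthly_cost_usd / sandbox_slots


# Monthly machine cost range from the text above.
LOW_USD, HIGH_USD = 35.0, 60.0
# Hypothetical number of sandboxes hosted on the machine.
SLOTS = 4

low = cost_per_slot(LOW_USD, SLOTS)    # 8.75
high = cost_per_slot(HIGH_USD, SLOTS)  # 15.0
print(f"${low:.2f} to ${high:.2f} per slot per month")
```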

Summary

Our code sandbox provides a secure, isolated, and cost-effective environment for running code as part of the File Analyst tool. With its security features, including resource limits, network isolation, and session-based sandboxing, you can analyze data without worrying about system integrity or data leakage. The solution is also affordable and scalable, making it a strong choice for companies that need to perform complex data analysis under strict security requirements. For more technical information on the architecture, please refer to this technical document.

Frequently Asked Questions (FAQ)

What happens to my data after a session ends?

Can I control how long the sandbox stays active?

Are there any limitations on the code I can run?

What kind of network restrictions does the sandbox have?

How does the sandbox handle security vulnerabilities?

How is resource usage monitored in the sandbox?

What if my organization needs more concurrent sessions?

What happens if the sandbox reaches its resource limits?

Can I integrate third-party tools or libraries into the sandbox?

How much does the sandbox cost?
