Overview
Glean lets customers deploy Glean software inside their own Amazon Web Services (AWS) account. This deployment requires your AWS admin to:
- Create a new empty AWS account.
- Associate a valid billing account.
- Notify Glean of any tags that need to be added to any resources that are created.
- Execute the provided Glean CloudFormation template to bootstrap the AWS environment.
- Notify Glean of the AWS account ID, AWS account name, and AWS region selected.
Prerequisites: Cross-Account Access Requirements
Glean’s deployment process requires access to specific resources hosted in Glean’s central AWS account (518642952506, us-east-1 region). Your AWS account must not have policies that block access to these resources.
Most customers do not need to take any action for these requirements. However, if your organization has restrictive Service Control Policies (SCPs), IAM policies, or VPC endpoint policies in place, verify that they permit the cross-account resource access described below.
ECR - Container Images
Glean distributes container images via Amazon Elastic Container Registry (ECR) for all Glean services deployed in your environment.
Required Resource Access: ecr:GetAuthorizationToken, ecr:BatchGetImage, ecr:GetDownloadUrlForLayer
AWS Signer - Code Signing Verification
Glean uses AWS Signer to verify the integrity and authenticity of deployed code.
Required Resource Access: signer:Describe*, signer:Get*, signer:List*
KMS & Secrets Manager - Control Plane Bootstrap
Glean uses AWS Key Management Service (KMS) and Secrets Manager to securely bootstrap the IPJC protocol, which is Glean’s control plane communication protocol.
Required Resource Access:

| Service | Resource ARN |
|---|---|
| KMS (Decrypt) | arn:aws:kms:us-east-1:518642952506:key/* |
| Secrets Manager | arn:aws:secretsmanager:us-east-1:518642952506:secret:* |

Required actions: kms:Decrypt, secretsmanager:GetSecretValue
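As a rough aid for reviewing your own policies, the sketch below scans a policy document (an SCP or IAM policy loaded as a Python dict) for Deny statements that appear to cover the two actions above on the Glean central account ARNs. It is an illustration only and is not the Glean-provided SCP Checker. The function name `find_blocking_denies` and the sample policy are hypothetical, and the check ignores `Condition`, `NotAction`, and `NotResource`, so treat any result as a hint rather than a verdict.

```python
import fnmatch

# Actions and resource ARNs copied from the table above.
REQUIRED = {
    "kms:Decrypt": "arn:aws:kms:us-east-1:518642952506:key/*",
    "secretsmanager:GetSecretValue": "arn:aws:secretsmanager:us-east-1:518642952506:secret:*",
}


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def find_blocking_denies(policy):
    """Return (sid, action) pairs for Deny statements that seem to cover a required action.

    Simplification (assumption): Condition, NotAction and NotResource are ignored,
    and a Deny on one specific key or secret ARN is not detected.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is allowed without a list
        statements = [statements]

    hits = []
    for stmt in statements:
        if stmt.get("Effect") != "Deny":
            continue
        for action, arn in REQUIRED.items():
            # Action names are case-insensitive, so compare in lower case.
            action_hit = any(
                fnmatch.fnmatchcase(action.lower(), pattern.lower())
                for pattern in _as_list(stmt.get("Action"))
            )
            # Compare the Deny resource pattern against the required ARN string.
            resource_hit = any(
                fnmatch.fnmatchcase(arn, pattern)
                for pattern in _as_list(stmt.get("Resource"))
            )
            if action_hit and resource_hit:
                hits.append((stmt.get("Sid", "<no Sid>"), action))
    return hits


# Hypothetical SCP used only to exercise the function.
example_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "DenyAllKms", "Effect": "Deny", "Action": "kms:*", "Resource": "*"},
        {"Sid": "DenyS3Delete", "Effect": "Deny", "Action": ["s3:DeleteBucket"], "Resource": "*"},
    ],
}
print(find_blocking_denies(example_scp))
```

An empty result does not prove that access is permitted; it only means this simple pattern match found no obvious Deny.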
S3 - Deployment Artifacts & ML Resources
Glean distributes deployment artifacts and requires access to AWS machine learning resources via S3.
Glean Central S3 Buckets (deployment artifacts): s3:GetObject, s3:ListBucket
AMIs - Compute Resources
Glean uses Amazon Machine Images (AMIs) for deploying compute resources in your environment.
AWS Bottlerocket AMIs (AWS-managed): Glean programmatically retrieves the latest Bottlerocket AMI for each Kubernetes version during deployment, following AWS’s recommended approach. These are publicly available AWS-owned AMIs and require no special configuration.
Glean Custom Ubuntu AMIs (Glean-managed): Glean maintains custom Ubuntu images based on Ubuntu 22.04+ LTS with the latest security patches and optional Wiz and CrowdStrike sensor support. These AMIs are hosted in Glean’s central AWS account.
Required Capability: Ability to copy AMIs from account 518642952506 in region us-east-1 to your deployment region.
Glean custom AMIs contain only base operating system components and security tooling. They do not contain any customer data or sensitive Glean intellectual property.
Additional Security Configurations
EBS Encryption: Glean enables EBS encryption by default for all volumes during the initial AWS account setup. This configuration is applied once during bootstrapping.
S3 Public Access Block: Glean enables account-wide S3 public access blocking for security. Customers can toggle this setting if specific use cases require it.
1. Create a New AWS Account
Glean deploys to an empty AWS account to provide a layer of isolation from any other services you have running in AWS. This also simplifies the deployment process, as it eliminates potential conflicts with existing services.
- Select a supported AWS region for Glean to build your environment in.
- Create a new AWS account in the selected region.
- Notify Glean of the following:
  - AWS Account ID (e.g. 182333000101)
  - AWS Account Name (e.g. aws-glean-companyname)
  - AWS Region (e.g. us-east-1)
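Before sending these details, a quick format check can catch typos. The sketch below is a minimal example under stated assumptions: the function name `check_deployment_details` is hypothetical, the region pattern only checks the general shape of an AWS region code, and nothing here verifies that the region is one Glean supports or that the account exists.

```python
import re

# AWS account IDs are 12 decimal digits.
ACCOUNT_ID_RE = re.compile(r"[0-9]{12}")
# Shape of a region code such as us-east-1 or us-gov-west-1 (format check only).
REGION_RE = re.compile(r"[a-z]{2}(-[a-z]+)+-[0-9]+")


def check_deployment_details(account_id, account_name, region):
    """Return a list of human-readable problems; an empty list means the format looks fine."""
    problems = []
    if not ACCOUNT_ID_RE.fullmatch(account_id):
        problems.append("account ID must be exactly 12 digits")
    if not account_name.strip():
        problems.append("account name must not be empty")
    if not REGION_RE.fullmatch(region):
        problems.append("region does not look like an AWS region code")
    return problems


# Example values taken from the list above.
print(check_deployment_details("182333000101", "aws-glean-companyname", "us-east-1"))
```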
2. Company Tag Policy
You must notify Glean if your company policy enforces the use of specific tags on AWS resources. Glean can add these tags to nearly all resources that it creates, but you must notify us before the build of your Glean tenant begins so that the tags can be seeded in the initial deployment. Provide your Glean engineer with a list of all required tags in the following format:
3. Run the SCP Check Tool
Glean provides an SCP Checker script that you can use to check whether any of your SCPs could conflict with the Glean build. If a conflict is detected, please notify your Glean deployment engineer, who can provide further guidance.
4. Deploy the Glean CloudFormation Template
A Glean-provided CloudFormation template is used to bootstrap the AWS account so that the Glean build system can access and deploy Glean services. This process is automated. For more information on how Glean securely accesses your AWS account, see AWS Account Access and Deployment Model.
- Log in to your new AWS account as an Administrator.
- Navigate to the AWS CloudFormation console at https://console.aws.amazon.com/cloudformation/
  - Ensure that the correct region is selected before proceeding!
  - If the CF template is installed to an incorrect region, you can delete it and re-install it to the correct one.
- Click Create stack.
- On the Specify template page, select Amazon S3 URL and paste the following URL for Glean’s CloudFormation template. Click Next to proceed.
- Specify a name for the stack, e.g. glean-bootstrap. Click Next to proceed.
  - When prompted, provide an email address to which Glean will send a magic link post-setup.
  - Skip all other fields on the options page and click Next again at the bottom.
- On the final review page, scroll to the bottom, agree to the capabilities acknowledgment, and click Submit to deploy the template.
  - More information: Selecting a Stack Template (docs.aws.amazon.com)
5. (Strongly Recommended) Create the Admin Service Role
Glean recommends that customers set up an admin service role that gives Glean on-call engineers break-glass access to triage any issues with the build (and with the environment once deployed). This access is short-lived, auditable, and only used to triage issues. Glean engineers require approval from Glean Senior Leadership to access this role. We especially recommend that this role be created during the initial setup of Glean to help get the deployment to a stable state.
This is not a required IAM role. However, we recommend that customers create this role and use an SCP attached to the Glean AWS account to disable all access to this role (example below). Whenever this access is needed, customers can detach the SCP to permit access and then re-attach it once access is no longer required.
Note: If you do not create this role (or if you block access with the SCP), you must have an established process in place through which Glean can request access 24/7 in response to service issues. Without this process, Glean will not be able to honor any Service or Support SLAs.
- Following the CloudFormation process above, deploy the following template, which defines the Admin Service role:
  - You will need to specify the External ID in the above template script. This is unique for each environment and must be provided by your Glean deployment engineer. Please notify them when you reach this step.
- Optional: If you want to lock down this role via an SCP, create an SCP that looks like this:
6. Request Quota Changes
The resource quotas that Glean requires are detailed below.

| Quota Type | Service | Quota name | Location | New Value | Justification |
|---|---|---|---|---|---|
| All Quotas | SageMaker | ml.p3.16xlarge for training job usage | primary deployment region | 1 | The Glean search system trains a custom AI language model on the corpus, enabling features such as semantic search, synonyms, and more. We use these NVIDIA V100 multi-GPU machines to power the training. |
| All Quotas | SageMaker | ml.g4dn.xlarge for training job usage | primary deployment region | 4 | These are the standard machine types used for our general training jobs, several of which can run in parallel. We want to increase the quota to alleviate this contention. |
| All Quotas | SageMaker | Small and Medium deployments: ml.m5.2xlarge for training job usage; Large and Extra Large deployments: ml.m5.4xlarge for training job usage | primary deployment region | 1 | Used for CPU-based ML training jobs. |
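To make the size-dependent row easier to apply, the sketch below turns the SageMaker rows shown above into a lookup by deployment size. It is only an illustration: the function name `required_sagemaker_quotas` and the size labels used as keys are assumptions, and the result covers only the three rows in this table, all in the primary deployment region.

```python
# CPU training instance type per deployment size, as listed in the table above.
CPU_TRAINING_INSTANCE = {
    "small": "ml.m5.2xlarge",
    "medium": "ml.m5.2xlarge",
    "large": "ml.m5.4xlarge",
    "extra large": "ml.m5.4xlarge",
}


def required_sagemaker_quotas(deployment_size):
    """Map each SageMaker quota name from the table to its requested value."""
    key = deployment_size.strip().lower()
    if key not in CPU_TRAINING_INSTANCE:
        raise ValueError(f"unknown deployment size: {deployment_size!r}")
    return {
        "ml.p3.16xlarge for training job usage": 1,
        "ml.g4dn.xlarge for training job usage": 4,
        f"{CPU_TRAINING_INSTANCE[key]} for training job usage": 1,
    }


# Print the quota names and values for a Large deployment.
for name, value in required_sagemaker_quotas("Large").items():
    print(f"{name}: {value}")
```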