AWS Deployment Guide

Overview

Glean provides customers the ability to deploy Glean software inside their own Amazon Web Services (AWS) account. This deployment requires your AWS admin to:

Create a new empty AWS account.
Associate a valid billing account.
Notify Glean of any tags that need to be added to any resources that are created.
Execute the provided Glean CloudFormation template to bootstrap the AWS environment.
Notify Glean of the AWS account ID, AWS account name, and AWS region selected.

After you complete these steps, Glean’s systems automatically build and deploy the required compute, workflows, and software into your AWS account. Glean then notifies you that your tenant is ready, and your admins can proceed with the setup process in the Getting Started guide. This document covers the steps your AWS admins must take to prepare an AWS account for your Glean build.

Prerequisites: Cross-Account Access Requirements

Glean’s deployment process requires access to specific resources hosted in Glean’s central AWS account (518642952506, us-east-1 region). Your AWS account must not have policies that block access to these resources.

Most customers do not need to take any action for these requirements. However, if your organization has restrictive Service Control Policies (SCPs), IAM policies, or VPC endpoint policies in place, verify that they permit the cross-account resource access described below.

For detailed information about Glean’s account access model, see AWS Account Access and Deployment Model.

ECR - Container Images

Glean distributes container images via Amazon Elastic Container Registry (ECR) for all Glean services deployed in your environment. Required Resource Access:

arn:aws:ecr:us-east-1:518642952506:repository/*

Permissions Used: ecr:GetAuthorizationToken, ecr:BatchGetImage, ecr:GetDownloadUrlForLayer

AWS Signer - Code Signing Verification

Glean uses AWS Signer to verify the integrity and authenticity of deployed code. Required Resource Access:

arn:aws:signer:us-east-1:518642952506:/signing-profiles/*

Permissions Used: signer:Describe*, signer:Get*, signer:List*

KMS & Secrets Manager - Control Plane Bootstrap

Glean uses AWS Key Management Service (KMS) and Secrets Manager to securely bootstrap the IPJC protocol, which is Glean’s control plane communication protocol. Required Resource Access:

Service	Resource ARN
KMS (Decrypt)	`arn:aws:kms:us-east-1:518642952506:key/*`
Secrets Manager	`arn:aws:secretsmanager:us-east-1:518642952506:secret:*`

Permissions Used: kms:Decrypt, secretsmanager:GetSecretValue

S3 - Deployment Artifacts & ML Resources

Glean distributes deployment artifacts and requires access to AWS machine learning resources via S3. Glean Central S3 Buckets (deployment artifacts):

arn:aws:s3:::scio-cloud-function-sources
arn:aws:s3:::scio-cloud-function-sources/*
arn:aws:s3:::scio-beam-runners-flink
arn:aws:s3:::scio-beam-runners-flink/*
arn:aws:s3:::elastic-plugin-hotswaps
arn:aws:s3:::elastic-plugin-hotswaps/*
arn:aws:s3:::glean-chomp-dags
arn:aws:s3:::glean-chomp-dags/*
arn:aws:s3:::glean-frontend-versions
arn:aws:s3:::glean-frontend-versions/*

Amazon SageMaker & AWS Glue S3 Buckets (ML operations):

arn:aws:s3:::*SageMaker*
arn:aws:s3:::*SageMaker*/*
arn:aws:s3:::*aws-glue*
arn:aws:s3:::*aws-glue*/*

Permissions Used: s3:GetObject, s3:ListBucket

AMIs - Compute Resources

Glean uses Amazon Machine Images (AMIs) to deploy compute resources in your environment.

AWS Bottlerocket AMIs (AWS-managed): Glean programmatically retrieves the latest Bottlerocket AMI for each Kubernetes version during deployment, following AWS’s recommended approach. These are publicly available AWS-owned AMIs and require no special configuration.

Glean Custom Ubuntu AMIs (Glean-managed): Glean maintains custom Ubuntu images based on Ubuntu 22.04+ LTS with the latest security patches and optional Wiz and CrowdStrike sensor support. These AMIs are hosted in Glean’s central AWS account.

Required Capability: Ability to copy AMIs from account 518642952506 in region us-east-1 to your deployment region.

Glean custom AMIs contain only base operating system components and security tooling. They do not contain any customer data or sensitive Glean intellectual property.
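If your policies are restrictive, one way to confirm the AMI copy capability ahead of the build is an EC2 dry-run request. The sketch below assumes Python with boto3: it calls `copy_image` with `DryRun=True`, which checks authorization without copying anything (EC2 responds with `DryRunOperation` when the call would be allowed and `UnauthorizedOperation` when it is denied). The function name, the image name, and the AMI ID in the usage comment are hypothetical placeholders; this guide does not list Glean’s AMI IDs, so ask your deployment engineer which image to test against. The EC2 client is passed in so the logic can be exercised without AWS credentials.

```python
# Minimal sketch (assumption: `ec2` is a boto3 EC2 client created in your
# deployment region; names here are illustrative, not Glean tooling).
GLEAN_SOURCE_REGION = "us-east-1"


def can_copy_glean_ami(ec2, source_ami_id: str) -> bool:
    """Return True if CopyImage into the client's region would be authorized."""
    try:
        ec2.copy_image(
            Name="glean-ami-copy-permission-check",  # hypothetical; nothing is created
            SourceImageId=source_ami_id,
            SourceRegion=GLEAN_SOURCE_REGION,
            DryRun=True,
        )
    except Exception as exc:
        # botocore's ClientError carries the AWS error code in exc.response.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "DryRunOperation":
            return True  # the request would have succeeded
        if code == "UnauthorizedOperation":
            return False  # an IAM policy or SCP denies the copy
        raise  # any other error (for example, an unknown AMI ID) is re-raised
    # A dry run never succeeds outright, so reaching here is unexpected.
    raise RuntimeError("CopyImage with DryRun=True returned without an error")


# Usage (placeholder region and AMI ID):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-west-2")
# print(can_copy_glean_ami(ec2, "ami-0123456789abcdef0"))
```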

Additional Security Configurations

EBS Encryption: Glean enables EBS encryption by default for all volumes during the initial AWS account setup. This configuration is applied once during bootstrapping.

S3 Public Access Block: Glean enables account-wide S3 public access blocking for security. Customers can toggle this setting if specific use cases require it.
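To confirm these settings after bootstrapping, a security team could read them back through the EC2 and S3 Control APIs. This is a sketch assuming Python with boto3; `check_bootstrap_security` is a hypothetical helper name, and it treats "fully blocked" as all four account-level Public Access Block flags being enabled, which may be stricter than what Glean actually configures. EBS encryption by default is a per-region setting, so the EC2 client should point at your deployment region.

```python
# Minimal sketch: read back the two account settings described above.
# Assumptions: `ec2` is a boto3 EC2 client in your deployment region and
# `s3control` is a boto3 S3 Control client; both are passed in for testability.
PUBLIC_ACCESS_BLOCK_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def check_bootstrap_security(ec2, s3control, account_id: str) -> dict:
    """Report whether EBS default encryption and the account-level S3 Public Access Block are on."""
    ebs = ec2.get_ebs_encryption_by_default()["EbsEncryptionByDefault"]
    # Raises NoSuchPublicAccessBlockConfiguration if no account-level block exists.
    config = s3control.get_public_access_block(AccountId=account_id)[
        "PublicAccessBlockConfiguration"
    ]
    return {
        "ebs_encryption_by_default": bool(ebs),
        # True only when all four account-level block settings are enabled.
        "s3_public_access_fully_blocked": all(
            config.get(flag, False) for flag in PUBLIC_ACCESS_BLOCK_FLAGS
        ),
    }


# Usage (placeholder region and account ID):
# import boto3
# result = check_bootstrap_security(
#     boto3.client("ec2", region_name="us-west-2"),
#     boto3.client("s3control", region_name="us-west-2"),
#     "182333000101",
# )
```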

Action Required: If your organization enforces restrictive cross-account access policies, review the resources above with your security team and ensure they are permitted. Contact your Glean deployment engineer if you have questions or need assistance configuring policy exceptions.
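As a first pass before that review, the resources above can be compared against the Deny statements in an SCP. The sketch below is an illustrative Python approximation, not Glean’s SCP Checker: it translates IAM `*` and `?` wildcards into regular expressions and flags any Deny statement whose `Action` and `Resource` patterns match one of the required actions and ARN patterns (the required ARNs are matched as literal strings, so a Deny on one specific repository or bucket would not be caught). It does not evaluate `Condition` blocks, and it reports every `NotAction`/`NotResource` Deny for manual review, so treat the output as a list of statements to inspect rather than a verdict. All function and variable names are hypothetical.

```python
import json
import re

# Required cross-account access, copied from the lists above.
_ECR = ["arn:aws:ecr:us-east-1:518642952506:repository/*"]
_SIGNER = ["arn:aws:signer:us-east-1:518642952506:/signing-profiles/*"]
_KMS = ["arn:aws:kms:us-east-1:518642952506:key/*"]
_SECRETS = ["arn:aws:secretsmanager:us-east-1:518642952506:secret:*"]
_BUCKETS = [
    "scio-cloud-function-sources", "scio-beam-runners-flink", "elastic-plugin-hotswaps",
    "glean-chomp-dags", "glean-frontend-versions", "*SageMaker*", "*aws-glue*",
]
_S3 = [f"arn:aws:s3:::{b}{suffix}" for b in _BUCKETS for suffix in ("", "/*")]

REQUIRED_ACCESS = {
    **{a: _ECR for a in ("ecr:GetAuthorizationToken", "ecr:BatchGetImage",
                         "ecr:GetDownloadUrlForLayer")},
    **{a: _SIGNER for a in ("signer:Describe*", "signer:Get*", "signer:List*")},
    "kms:Decrypt": _KMS,
    "secretsmanager:GetSecretValue": _SECRETS,
    "s3:GetObject": _S3,
    "s3:ListBucket": _S3,
}


def _iam_pattern(pattern: str, ignore_case: bool) -> re.Pattern:
    """Translate an IAM wildcard pattern (* and ?) into an anchored regex."""
    body = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.compile(f"^{body}$", re.IGNORECASE if ignore_case else 0)


def _as_list(value) -> list:
    """IAM allows either a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def find_blocking_statements(scp_json: str) -> list:
    """Return (sid, action, resource, reason) for Deny statements that may block Glean."""
    statements = json.loads(scp_json).get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Deny":
            continue
        sid = stmt.get("Sid", f"statement {index}")
        if "NotAction" in stmt or "NotResource" in stmt:
            findings.append((sid, "*", "*", "uses NotAction/NotResource; review manually"))
            continue
        # Action names match case-insensitively; ARNs are compared case-sensitively.
        actions = [_iam_pattern(a, True) for a in _as_list(stmt.get("Action", []))]
        resources = [_iam_pattern(r, False) for r in _as_list(stmt.get("Resource", "*"))]
        reason = "has a Condition; review manually" if "Condition" in stmt else "denies"
        for action, arns in REQUIRED_ACCESS.items():
            if not any(p.match(action) for p in actions):
                continue
            for arn in arns:
                if any(p.match(arn) for p in resources):
                    findings.append((sid, action, arn, reason))
    return findings
```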

1. Create a New AWS Account

Glean deploys to an empty AWS account to provide a layer of isolation from any other services you run in AWS. A dedicated account also simplifies deployment by eliminating potential conflicts with existing services.

Select a supported AWS region for Glean to build your environment in.
Create a new AWS account. AWS accounts are not tied to a single region; the region you selected determines where Glean builds your environment.

You must notify Glean of the following information:

AWS Account ID (e.g. 182333000101)
AWS Account Name (e.g. aws-glean-companyname)
AWS Region (e.g. us-east-1)

The selected region cannot be changed once your tenant has been built. Changing the region requires a complete rebuild of your tenant.
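If your organization provisions accounts with AWS Organizations, the account can also be created programmatically. The sketch below assumes Python with boto3 and credentials in your organization’s management account; it calls `create_account`, polls `describe_create_account_status` until the asynchronous request finishes, and returns the new 12-digit account ID to send to Glean. The function name, email address, and timeout values are illustrative assumptions. The call takes no region; the region you report to Glean is a separate choice.

```python
import time


def create_glean_account(org, email: str, account_name: str,
                         poll_seconds: float = 10.0, timeout: float = 900.0) -> str:
    """Create a member account with AWS Organizations and return its account ID."""
    status = org.create_account(Email=email, AccountName=account_name)["CreateAccountStatus"]
    deadline = time.monotonic() + timeout
    # Account creation is asynchronous; poll until it leaves IN_PROGRESS.
    while status["State"] == "IN_PROGRESS":
        if time.monotonic() > deadline:
            raise TimeoutError(f"account request {status['Id']} is still in progress")
        time.sleep(poll_seconds)
        status = org.describe_create_account_status(
            CreateAccountRequestId=status["Id"]
        )["CreateAccountStatus"]
    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"account creation failed: {status.get('FailureReason', 'unknown')}")
    return status["AccountId"]


# Usage (from the management account; values are placeholders):
# import boto3
# account_id = create_glean_account(
#     boto3.client("organizations"), "aws-glean@example.com", "aws-glean-companyname"
# )
```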

2. Company Tag Policy

You must notify Glean if your company policy enforces the use of specific tags on AWS resources. Glean can add these tags to nearly all resources it creates, but we must be notified before the build of your Glean tenant begins so that the tags can be seeded in the initial deployment. Provide your Glean engineer with a list of all required tags in the following format:

tag1:value1--tag2:value2--tag3:value3--...
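A small helper can produce this string from a mapping of your required tags. This is a minimal Python sketch; `format_glean_tags` is a hypothetical name, and its rejection of tag text that contains `:` or `--` (or starts or ends with `-`) is our own assumption, made because such text would make the string ambiguous. AWS tag keys and values may legitimately contain `:`, so if one of your tags does, raise it with your Glean engineer rather than sending the string as-is.

```python
def format_glean_tags(tags: dict) -> str:
    """Render tags as key:value pairs joined by '--' (tag1:value1--tag2:value2)."""
    parts = []
    for key, value in tags.items():
        for text in (key, value):
            # Assumption: reject text that would collide with the ':' / '--' delimiters.
            if ":" in text or "--" in text or text.startswith("-") or text.endswith("-"):
                raise ValueError(f"tag text would make the format ambiguous: {text!r}")
        parts.append(f"{key}:{value}")
    return "--".join(parts)


# Usage:
# print(format_glean_tags({"cost-center": "1234", "owner": "platform-team"}))
# prints cost-center:1234--owner:platform-team
```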

3. Run the SCP Check Tool

Glean provides an SCP Checker script that checks whether any of your SCPs conflict with the Glean build. If a conflict is detected, notify your Glean deployment engineer, who can provide further guidance.
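When reviewing a conflict, it helps to see every SCP that applies to the new account, including those inherited from parent OUs and the organization root. This Python sketch is not the Glean SCP Checker; it assumes a boto3 Organizations client with permission to read policies (for example, in the management account), and `collect_scps` is a hypothetical helper name. It walks up the hierarchy with `list_parents`, lists the SCPs at each level with `list_policies_for_target`, and parses each policy’s JSON from `describe_policy`.

```python
import json


def collect_scps(org, account_id: str) -> list:
    """Return (target_id, policy_name, policy_document) for SCPs on the account and its ancestors."""
    # Walk from the account up to the organization root.
    targets = [account_id]
    child = account_id
    while True:
        parent = org.list_parents(ChildId=child)["Parents"][0]
        targets.append(parent["Id"])
        if parent["Type"] == "ROOT":
            break
        child = parent["Id"]

    results = []
    for target in targets:
        kwargs = {"TargetId": target, "Filter": "SERVICE_CONTROL_POLICY"}
        while True:  # follow NextToken pagination
            page = org.list_policies_for_target(**kwargs)
            for summary in page["Policies"]:
                content = org.describe_policy(PolicyId=summary["Id"])["Policy"]["Content"]
                results.append((target, summary["Name"], json.loads(content)))
            if not page.get("NextToken"):
                break
            kwargs["NextToken"] = page["NextToken"]
    return results


# Usage (placeholder account ID):
# import boto3
# for target, name, doc in collect_scps(boto3.client("organizations"), "182333000101"):
#     print(target, name, json.dumps(doc))
```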

4. Deploy the Glean CloudFormation Template

A Glean provided CloudFormation template is used to bootstrap the AWS account so that the Glean build system can access and deploy Glean services. This process is automated. For more information on how Glean securely accesses your AWS account, see AWS Account Access and Deployment Model.

Log in to your new AWS account as an Administrator.
Navigate to the AWS CloudFormation console at https://console.aws.amazon.com/cloudformation/
- Ensure that the correct region is selected before proceeding!
- If the stack is created in the wrong region, delete it and re-create it in the correct one.
Click Create stack.
On the Specify template page, select Amazon S3 URL and paste the following URL for Glean’s CloudFormation template. Click Next to proceed.
```
https://glean-public-marketplace-resources.s3.amazonaws.com/glean-cloudformation-template.yaml
```
Specify a name for the stack, e.g. glean-bootstrap. Click Next to proceed.
When prompted, provide an email address to which Glean will send a magic link post-setup.
Skip all other fields on the options page and click Next again at the bottom.
On the final review page, scroll to the bottom, agree to the capabilities acknowledgment, and click Submit to deploy the template.
- More information: Selecting a Stack Template (docs.aws.amazon.com)
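The same stack can also be created from a script, for example in an automated account-vending pipeline. This Python sketch assumes boto3. Because this guide does not name the template’s parameters, it reads them with `get_template_summary` and picks the single parameter whose key contains "email" (a hypothetical heuristic, so check the key against the console form). It leaves all other parameters at their defaults, mirroring the console step that skips them, and passes through whatever capabilities the summary reports, which corresponds to the acknowledgment on the review page. `build_bootstrap_stack_request` is an illustrative name.

```python
TEMPLATE_URL = (
    "https://glean-public-marketplace-resources.s3.amazonaws.com/"
    "glean-cloudformation-template.yaml"
)


def build_bootstrap_stack_request(summary: dict, email: str,
                                  stack_name: str = "glean-bootstrap") -> dict:
    """Build create_stack keyword arguments from a get_template_summary response."""
    # Assumption: exactly one template parameter carries the magic-link email.
    email_keys = [p["ParameterKey"] for p in summary.get("Parameters", [])
                  if "email" in p["ParameterKey"].lower()]
    if len(email_keys) != 1:
        raise ValueError(f"expected one email parameter, found {email_keys}")
    return {
        "StackName": stack_name,
        "TemplateURL": TEMPLATE_URL,
        # Other parameters are omitted so their template defaults apply.
        "Parameters": [{"ParameterKey": email_keys[0], "ParameterValue": email}],
        # Equivalent to accepting the capabilities acknowledgment in the console.
        "Capabilities": summary.get("Capabilities", []),
    }


# Usage (run in the region you selected; values are placeholders):
# import boto3
# cfn = boto3.client("cloudformation", region_name="us-west-2")
# summary = cfn.get_template_summary(TemplateURL=TEMPLATE_URL)
# cfn.create_stack(**build_bootstrap_stack_request(summary, "aws-admin@example.com"))
# cfn.get_waiter("stack_create_complete").wait(StackName="glean-bootstrap")
```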

5. (Strongly Recommended) Create the Admin Service Role

Glean recommends that customers set up an admin service role that gives Glean on-call engineers break-glass access to triage issues with the build (and with the environment once deployed). This access is short-lived, auditable, and used only for triage, and Glean engineers require approval from Glean Senior Leadership to use this role. We especially recommend creating this role during the initial setup of Glean to help bring the deployment to a stable state.

This IAM role is not required. However, we recommend that customers create it and attach an SCP to the Glean AWS account that blocks all access to the role (example below). Whenever access is needed, customers can detach the SCP to permit access and then re-attach it once access is no longer required.

Note: If you do not create this role (or block access with the SCP), then you must have an established process in place where Glean can request access 24/7 in response to service issues. Without this process, Glean will not be able to honor any Service or Support SLAs.

Following the CloudFormation process above, deploy the following template, which defines the Admin Service Role:
```
https://drive.google.com/file/d/134mvNzg24ZyhjvBhgUUPXttOIH9xWxU-/view?usp=drive_link
```
You must specify the External ID when deploying this template. The External ID is unique to each environment and is provided by your Glean deployment engineer, so notify them when you reach this step.

Optional: To lock down this role with an SCP, create an SCP like the following, replacing GLEAN-ACCOUNT-ID with the ID of your Glean AWS account (the account created in step 1):

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "ArnEquals": {
                    "aws:PrincipalArn": "arn:aws:iam::GLEAN-ACCOUNT-ID:role/GleanAdmin"
                }
            }
        }
    ]
}
```
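The detach and re-attach workflow for this SCP can be scripted with AWS Organizations. This Python sketch assumes boto3 credentials in your management account; `glean_admin_lock_scp` and `set_break_glass` are hypothetical helper names, and the default role name `GleanAdmin` is taken from the example SCP, so confirm it matches the role your template actually creates.

```python
import json


def glean_admin_lock_scp(account_id: str, role_name: str = "GleanAdmin") -> str:
    """Return the deny-all SCP from the example, filled in for one account and role."""
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"ArnEquals": {"aws:PrincipalArn": role_arn}},
        }],
    }, indent=4)


def set_break_glass(org, policy_id: str, account_id: str, allow_access: bool) -> None:
    """Detach the lock SCP to permit break-glass access, or re-attach it to block access."""
    if allow_access:
        org.detach_policy(PolicyId=policy_id, TargetId=account_id)
    else:
        org.attach_policy(PolicyId=policy_id, TargetId=account_id)


# Usage (placeholder account ID and names):
# import boto3
# org = boto3.client("organizations")
# policy = org.create_policy(
#     Content=glean_admin_lock_scp("182333000101"),
#     Description="Block the Glean break-glass role until access is approved",
#     Name="glean-admin-lock",
#     Type="SERVICE_CONTROL_POLICY",
# )
# policy_id = policy["Policy"]["PolicySummary"]["Id"]
# set_break_glass(org, policy_id, "182333000101", allow_access=False)  # lock the role
```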

6. Request Quota Changes

The resource quotas that Glean requires are detailed below.

Quota Type	Service	Quota name	Location	New Value	Justification
All Quotas	SageMaker	ml.p3.16xlarge for training job usage	primary deployment region	1	The Glean search system trains a custom AI language model on your corpus, enabling features such as semantic search, synonyms, and more. These NVIDIA V100 multi-GPU machines power that training.
All Quotas	SageMaker	ml.g4dn.xlarge for training job usage	primary deployment region	4	The standard machine type for general training jobs, several of which can run in parallel. The increased quota reduces contention between them.
All Quotas	SageMaker	ml.m5.2xlarge for training job usage (Small and Medium deployments)	primary deployment region	1	Used for CPU-based ML training jobs.
All Quotas	SageMaker	ml.m5.4xlarge for training job usage (Large and Extra Large deployments)	primary deployment region	1	Used for CPU-based ML training jobs.

You will only need to request quota increases for the AWS region that you have selected for your deployment.
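Checking current values and filing the requests can be scripted with the Service Quotas API. This Python sketch assumes boto3 and that the AWS quota names match the names in the table; `required_sagemaker_quotas` and `plan_quota_requests` are hypothetical helpers. Because `list_service_quotas` may omit quotas whose applied value is not available, the usage below fetches the defaults from `list_aws_default_service_quotas` first and lets applied values override them.

```python
def required_sagemaker_quotas(deployment_size: str) -> dict:
    """Quota name -> minimum value, per the quota table."""
    size = deployment_size.strip().lower()
    if size in ("small", "medium"):
        cpu_quota = "ml.m5.2xlarge for training job usage"
    elif size in ("large", "extra large"):
        cpu_quota = "ml.m5.4xlarge for training job usage"
    else:
        raise ValueError(f"unknown deployment size: {deployment_size!r}")
    return {
        "ml.p3.16xlarge for training job usage": 1,
        "ml.g4dn.xlarge for training job usage": 4,
        cpu_quota: 1,
    }


def plan_quota_requests(quotas: list, required: dict) -> list:
    """Return (QuotaCode, QuotaName, DesiredValue) for each quota below its minimum."""
    # Later entries win, so pass default values first and applied values after them.
    by_name = {q["QuotaName"]: q for q in quotas}
    plan = []
    for name, minimum in required.items():
        if name not in by_name:
            raise KeyError(f"quota not found in this region: {name}")
        quota = by_name[name]
        if quota["Value"] < minimum:
            plan.append((quota["QuotaCode"], name, float(minimum)))
    return plan


# Usage (run against your deployment region):
# import boto3
# sq = boto3.client("service-quotas", region_name="us-west-2")
# def fetch(op):
#     pages = sq.get_paginator(op).paginate(ServiceCode="sagemaker")
#     return [q for page in pages for q in page["Quotas"]]
# quotas = fetch("list_aws_default_service_quotas") + fetch("list_service_quotas")
# for code, name, value in plan_quota_requests(quotas, required_sagemaker_quotas("medium")):
#     sq.request_service_quota_increase(ServiceCode="sagemaker", QuotaCode=code, DesiredValue=value)
```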

7. Review Cost Reduction Recommendations

Glean publishes several recommendations for controlling and reducing your AWS resource costs; see AWS Hosting Cost & Reduction Recommendations. We recommend that you review them and apply any that are relevant, for example, purchasing Reserved Instances for EC2 and RDS instead of relying on On-Demand pricing.

8. Notify Glean to Begin the Build

Once the above steps are complete, notify your Glean deployment engineer, who will initiate the build process for your environment.

FAQ

Glean on AWS FAQ
